License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2017.3
URN: urn:nbn:de:0030-drops-76425
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2017/7642/
Go to the corresponding LIPIcs Volume Portal


Brubach, Brian ; Ghurye, Jay ; Pop, Mihai ; Srinivasan, Aravind

Better Greedy Sequence Clustering with Fast Banded Alignment

pdf-format:
LIPIcs-WABI-2017-3.pdf (0.8 MB)


Abstract

Comparing a string to a large set of sequences is a key subroutine in greedy heuristics for clustering genomic data. Clustering 16S rRNA gene sequences into operational taxonomic units (OTUs) is a common method used in studying microbial communities. We present a new approach to greedy clustering using a trie-like data structure and Four Russians speedup. We evaluate the running time of our method in terms of the number of comparisons it makes during clustering and show in experimental results that the number of comparisons grows linearly with the size of the dataset as opposed to the quadratic running time of other methods. We compare the clusters output by our method to the popular greedy clustering tool UCLUST. We show that the clusters we generate can be both tighter and larger.

BibTeX - Entry

@InProceedings{brubach_et_al:LIPIcs:2017:7642,
  author =	{Brian Brubach and Jay Ghurye and Mihai Pop and Aravind Srinivasan},
  title =	{{Better Greedy Sequence Clustering with Fast Banded Alignment}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{3:1--3:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Russell Schwartz and Knut Reinert},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7642},
  URN =		{urn:nbn:de:0030-drops-76425},
  doi =		{10.4230/LIPIcs.WABI.2017.3},
  annote =	{Keywords: Sequence Clustering, Metagenomics, String Algorithms}
}

Keywords: Sequence Clustering, Metagenomics, String Algorithms
Collection: 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)
Issue Date: 2017
Date of publication: 11.08.2017


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI