License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2018.2
URN: urn:nbn:de:0030-drops-93044
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/9304/
Go to the corresponding LIPIcs Volume Portal


Boucher, Christina ; Gagie, Travis ; Kuhnle, Alan ; Manzini, Giovanni

Prefix-Free Parsing for Building Big BWTs

pdf-format:
LIPIcs-WABI-2018-2.pdf (0.7 MB)


Abstract

High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive - a characteristic that can be exploited and enable the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. In this paper, we introduce a preprocessing algorithm, referred to as prefix-free parsing, that takes a text T as input, and in one-pass generates a dictionary D and a parse P of T with the property that the BWT of T can be constructed from D and P using workspace proportional to their total size and O(|T|)-time. Our experiments show that D and P are significantly smaller than T in practice, and thus, can fit in a reasonable internal memory even when T is very large. Therefore, prefix-free parsing eases BWT construction, which is pertinent to many bioinformatics applications.

BibTeX - Entry

@InProceedings{boucher_et_al:LIPIcs:2018:9304,
  author =	{Christina Boucher and Travis Gagie and Alan Kuhnle and Giovanni Manzini},
  title =	{{Prefix-Free Parsing for Building Big BWTs}},
  booktitle =	{18th International Workshop on Algorithms in  Bioinformatics (WABI 2018)},
  pages =	{2:1--2:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Laxmi Parida and Esko Ukkonen},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2018/9304},
  URN =		{urn:nbn:de:0030-drops-93044},
  doi =		{10.4230/LIPIcs.WABI.2018.2},
  annote =	{Keywords: Burrows-Wheeler Transform, prefix-free parsing, compression-aware algorithms, genomic databases}
}

Keywords: Burrows-Wheeler Transform, prefix-free parsing, compression-aware algorithms, genomic databases
Collection: 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Issue Date: 2018
Date of publication: 02.08.2018
Supplementary Material: Source code: https://gitlab.com/manzai/Big-BWT


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI