License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2018.2
URN: urn:nbn:de:0030-drops-93044
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/9304/
Boucher, Christina ;
Gagie, Travis ;
Kuhnle, Alan ;
Manzini, Giovanni
Prefix-Free Parsing for Building Big BWTs
Abstract
High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive - a characteristic that can be exploited and enable the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. In this paper, we introduce a preprocessing algorithm, referred to as prefix-free parsing, that takes a text T as input, and in one-pass generates a dictionary D and a parse P of T with the property that the BWT of T can be constructed from D and P using workspace proportional to their total size and O(|T|)-time. Our experiments show that D and P are significantly smaller than T in practice, and thus, can fit in a reasonable internal memory even when T is very large. Therefore, prefix-free parsing eases BWT construction, which is pertinent to many bioinformatics applications.
BibTeX - Entry
@InProceedings{boucher_et_al:LIPIcs:2018:9304,
author = {Christina Boucher and Travis Gagie and Alan Kuhnle and Giovanni Manzini},
title = {{Prefix-Free Parsing for Building Big BWTs}},
booktitle = {18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
pages = {2:1--2:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-082-8},
ISSN = {1868-8969},
year = {2018},
volume = {113},
editor = {Laxmi Parida and Esko Ukkonen},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2018/9304},
URN = {urn:nbn:de:0030-drops-93044},
doi = {10.4230/LIPIcs.WABI.2018.2},
annote = {Keywords: Burrows-Wheeler Transform, prefix-free parsing, compression-aware algorithms, genomic databases}
}
Keywords: |
|
Burrows-Wheeler Transform, prefix-free parsing, compression-aware algorithms, genomic databases |
Collection: |
|
18th International Workshop on Algorithms in Bioinformatics (WABI 2018) |
Issue Date: |
|
2018 |
Date of publication: |
|
02.08.2018 |
Supplementary Material: |
|
Source code: https://gitlab.com/manzai/Big-BWT |