License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2021.16
URN: urn:nbn:de:0030-drops-139675
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13967/
Italiano, Giuseppe F.; Prezza, Nicola; Sinaimeri, Blerina; Venturini, Rossano
Compressed Weighted de Bruijn Graphs
Abstract
We propose a new compressed representation for weighted de Bruijn graphs, which is based on the idea of delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: to give an idea, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state-of-the-art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.
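To illustrate the idea of delta-encoding abundances along a tree, here is a minimal Python sketch. It is not the authors' implementation: it uses a plain rooted tree in place of a spanning branching, stores the deltas uncompressed, and answers a query by naively walking to the root. The names `parent`, `abundance`, `delta_encode`, and `abundance_query`, as well as the toy values, are hypothetical and chosen only for this example.

```python
def delta_encode(parent, abundance):
    """Store, for each node, the difference between its abundance and its
    parent's abundance. A root (parent == -1) keeps its full abundance."""
    return [
        a if parent[v] == -1 else a - abundance[parent[v]]
        for v, a in enumerate(abundance)
    ]


def abundance_query(parent, delta, v):
    """Recover the abundance of node v as the partial sum of the deltas
    on the path from v up to the root (naive walk, no sampling)."""
    total = 0
    while v != -1:
        total += delta[v]
        v = parent[v]
    return total


# Toy tree (assumed data): node 0 is the root; parent[v] is the parent of v.
parent = [-1, 0, 1, 1, 3]
abundance = [16, 17, 17, 15, 16]

delta = delta_encode(parent, abundance)
recovered = [abundance_query(parent, delta, v) for v in range(len(parent))]
```

When neighbouring k-mers have similar abundances, the deltas are small integers, which is what makes them cheap to encode. A practical structure would also need to bound the length of the walk to the root, which this sketch does not attempt.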
BibTeX - Entry
@InProceedings{italiano_et_al:LIPIcs.CPM.2021.16,
author = {Italiano, Giuseppe F. and Prezza, Nicola and Sinaimeri, Blerina and Venturini, Rossano},
title = {{Compressed Weighted de Bruijn Graphs}},
booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
pages = {16:1--16:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-186-3},
ISSN = {1868-8969},
year = {2021},
volume = {191},
editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/13967},
URN = {urn:nbn:de:0030-drops-139675},
doi = {10.4230/LIPIcs.CPM.2021.16},
annote = {Keywords: weighted de Bruijn graphs, k-mer annotation, compressed data structures, partial sums}
}
Keywords: weighted de Bruijn graphs, k-mer annotation, compressed data structures, partial sums
Collection: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
Issue Date: 2021
Date of publication: 30.06.2021
Supplementary Material: Software (Source Code), written in C++: https://github.com/nicolaprezza/cw-dBg