License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2021.16
URN: urn:nbn:de:0030-drops-139675
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13967/
Go to the corresponding LIPIcs Volume Portal


Italiano, Giuseppe F. ; Prezza, Nicola ; Sinaimeri, Blerina ; Venturini, Rossano

Compressed Weighted de Bruijn Graphs

pdf-format:
LIPIcs-CPM-2021-16.pdf (0.8 MB)


Abstract

We propose a new compressed representation for weighted de Bruijn graphs, which is based on the idea of delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: to give an idea, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state of the art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.

BibTeX - Entry

@InProceedings{italiano_et_al:LIPIcs.CPM.2021.16,
  author =	{Italiano, Giuseppe F. and Prezza, Nicola and Sinaimeri, Blerina and Venturini, Rossano},
  title =	{{Compressed Weighted de Bruijn Graphs}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{16:1--16:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/13967},
  URN =		{urn:nbn:de:0030-drops-139675},
  doi =		{10.4230/LIPIcs.CPM.2021.16},
  annote =	{Keywords: weighted de Bruijn graphs, k-mer annotation, compressed data structures, partial sums}
}

Keywords: weighted de Bruijn graphs, k-mer annotation, compressed data structures, partial sums
Collection: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
Issue Date: 2021
Date of publication: 30.06.2021
Supplementary Material: The code is written in C++ and is available at
Software (Source Code): https://github.com/nicolaprezza/cw-dBg


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI