License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2020.9
URN: urn:nbn:de:0030-drops-127982
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12798/
Mukherjee, Kingshuk ;
Rossi, Massimiliano ;
Salmela, Leena ;
Boucher, Christina
Fast and Efficient Rmap Assembly Using the Bi-Labelled de Bruijn Graph
Abstract
Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there exist very few choices for assembling Rmap data. There is only one publicly available non-proprietary method for assembly and one proprietary method that is available via an executable. Furthermore, the publicly available method, by Valouev et al. (2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph-based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method was the only one able to run successfully on all three genomes. The method of Valouev et al. (2006) ran successfully only on E. coli, and Bionano Solve ran successfully on E. coli and human but not on the fish genome. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies.
BibTeX - Entry
@InProceedings{mukherjee_et_al:LIPIcs:2020:12798,
author = {Kingshuk Mukherjee and Massimiliano Rossi and Leena Salmela and Christina Boucher},
title = {{Fast and Efficient Rmap Assembly Using the Bi-Labelled de Bruijn Graph}},
booktitle = {20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
pages = {9:1--9:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-161-0},
ISSN = {1868-8969},
year = {2020},
volume = {172},
editor = {Carl Kingsford and Nadia Pisanti},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12798},
URN = {urn:nbn:de:0030-drops-127982},
doi = {10.4230/LIPIcs.WABI.2020.9},
annote = {Keywords: optical maps, de Bruijn graph, assembly}
}
Keywords: optical maps, de Bruijn graph, assembly
Collection: 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)
Issue Date: 2020
Date of publication: 25.08.2020
Supplementary Material: Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.