License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2019.17
URN: urn:nbn:de:0030-drops-110470
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/11047/
Jain, Chirag ;
Zhang, Haowen ;
Dilthey, Alexander ;
Aluru, Srinivas
Validating Paired-End Read Alignments in Sequence Graphs
Abstract
Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.
BibTeX - Entry
@InProceedings{jain_et_al:LIPIcs:2019:11047,
author = {Chirag Jain and Haowen Zhang and Alexander Dilthey and Srinivas Aluru},
title = {{Validating Paired-End Read Alignments in Sequence Graphs}},
booktitle = {19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
pages = {17:1--17:13},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-123-8},
ISSN = {1868-8969},
year = {2019},
volume = {143},
editor = {Katharina T. Huber and Dan Gusfield},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/11047},
URN = {urn:nbn:de:0030-drops-110470},
doi = {10.4230/LIPIcs.WABI.2019.17},
annote = {Keywords: Sequence graphs, read mapping, index, sparse matrix-matrix multiplication}
}
Keywords: |
|
Sequence graphs, read mapping, index, sparse matrix-matrix multiplication |
Collection: |
|
19th International Workshop on Algorithms in Bioinformatics (WABI 2019) |
Issue Date: |
|
2019 |
Date of publication: |
|
03.09.2019 |