License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2023.12
URN: urn:nbn:de:0030-drops-186389
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18638/
Rajput, Jyotshna ;
Chandra, Ghanshyam ;
Jain, Chirag
Co-Linear Chaining on Pangenome Graphs
Abstract
Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to these graphs with complex topology and cycles can be challenging. The seed-chain-extend based alignment algorithms use co-linear chaining as a standard technique to identify a good cluster of exact seed matches that can be combined to form an alignment. Recent works show how the co-linear chaining problem can be efficiently solved for acyclic pangenome graphs by exploiting their small width [Makinen et al., TALG'19] and how incorporating gap cost in the scoring function improves alignment accuracy [Chandra and Jain, RECOMB'23]. However, it remains open on how to effectively generalize these techniques for general pangenome graphs which contain cycles. Here we present the first practical formulation and an exact algorithm for co-linear chaining on cyclic pangenome graphs. We rigorously prove the correctness and computational complexity of the proposed algorithm. We evaluate the empirical performance of our algorithm by aligning simulated long reads from the human genome to a cyclic pangenome graph constructed from 95 publicly available haplotype-resolved human genome assemblies. While the existing heuristic-based algorithms are faster, the proposed algorithm provides a significant advantage in terms of accuracy.
BibTeX - Entry
@InProceedings{rajput_et_al:LIPIcs.WABI.2023.12,
author = {Rajput, Jyotshna and Chandra, Ghanshyam and Jain, Chirag},
title = {{Co-Linear Chaining on Pangenome Graphs}},
booktitle = {23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
pages = {12:1--12:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-294-5},
ISSN = {1868-8969},
year = {2023},
volume = {273},
editor = {Belazzougui, Djamal and Ouangraoua, A\"{i}da},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2023/18638},
URN = {urn:nbn:de:0030-drops-186389},
doi = {10.4230/LIPIcs.WABI.2023.12},
annote = {Keywords: Sequence alignment, variation graph, genome sequencing, path cover}
}
Keywords: |
|
Sequence alignment, variation graph, genome sequencing, path cover |
Collection: |
|
23rd International Workshop on Algorithms in Bioinformatics (WABI 2023) |
Issue Date: |
|
2023 |
Date of publication: |
|
29.08.2023 |
Supplementary Material: |
|
Software (Source Code): https://github.com/at-cg/PanAligner |