License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2023.12
URN: urn:nbn:de:0030-drops-186389
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18638/
Go to the corresponding LIPIcs Volume Portal


Rajput, Jyotshna ; Chandra, Ghanshyam ; Jain, Chirag

Co-Linear Chaining on Pangenome Graphs

pdf-format:
LIPIcs-WABI-2023-12.pdf (2 MB)


Abstract

Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to these graphs with complex topology and cycles can be challenging. The seed-chain-extend based alignment algorithms use co-linear chaining as a standard technique to identify a good cluster of exact seed matches that can be combined to form an alignment. Recent works show how the co-linear chaining problem can be efficiently solved for acyclic pangenome graphs by exploiting their small width [Makinen et al., TALG'19] and how incorporating gap cost in the scoring function improves alignment accuracy [Chandra and Jain, RECOMB'23]. However, it remains open on how to effectively generalize these techniques for general pangenome graphs which contain cycles. Here we present the first practical formulation and an exact algorithm for co-linear chaining on cyclic pangenome graphs. We rigorously prove the correctness and computational complexity of the proposed algorithm. We evaluate the empirical performance of our algorithm by aligning simulated long reads from the human genome to a cyclic pangenome graph constructed from 95 publicly available haplotype-resolved human genome assemblies. While the existing heuristic-based algorithms are faster, the proposed algorithm provides a significant advantage in terms of accuracy.

BibTeX - Entry

@InProceedings{rajput_et_al:LIPIcs.WABI.2023.12,
  author =	{Rajput, Jyotshna and Chandra, Ghanshyam and Jain, Chirag},
  title =	{{Co-Linear Chaining on Pangenome Graphs}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{12:1--12:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18638},
  URN =		{urn:nbn:de:0030-drops-186389},
  doi =		{10.4230/LIPIcs.WABI.2023.12},
  annote =	{Keywords: Sequence alignment, variation graph, genome sequencing, path cover}
}

Keywords: Sequence alignment, variation graph, genome sequencing, path cover
Collection: 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Issue Date: 2023
Date of publication: 29.08.2023
Supplementary Material: Software (Source Code): https://github.com/at-cg/PanAligner


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI