License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2023.14
URN: urn:nbn:de:0030-drops-186403
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18640/
Go to the corresponding LIPIcs Volume Portal


Schulz, Tizian ; Medvedev, Paul

Exact Sketch-Based Read Mapping

pdf-format:
LIPIcs-WABI-2023-14.pdf (0.8 MB)


Abstract

Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a "similar sequence". Traditionally, "similar sequence" was defined as having a high alignment score and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold.
In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in {O}(|t| + |p| + ?²) time and Θ(?²) space, where |t| is the number of k-mers inside the sketch of the reference, |p| is the number of k-mers inside the read’s sketch and ? is the number of times that k-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm’s performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. For an equivalent level of precision as minimap2, the recall of our algorithm is 0.88, compared to only 0.76 of minimap2.

BibTeX - Entry

@InProceedings{schulz_et_al:LIPIcs.WABI.2023.14,
  author =	{Schulz, Tizian and Medvedev, Paul},
  title =	{{Exact Sketch-Based Read Mapping}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18640},
  URN =		{urn:nbn:de:0030-drops-186403},
  doi =		{10.4230/LIPIcs.WABI.2023.14},
  annote =	{Keywords: Sequence Sketching, Long-read Mapping, Exact Algorithm, Dynamic Programming}
}

Keywords: Sequence Sketching, Long-read Mapping, Exact Algorithm, Dynamic Programming
Collection: 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)
Issue Date: 2023
Date of publication: 29.08.2023
Supplementary Material: Software (Source Code): https://github.com/medvedevgroup/eskemap archived at: https://archive.softwareheritage.org/swh:1:dir:a8eaacfde8a32bab844b58efd605f7f7196c00eb


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI