License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.SEA.2018.16
URN: urn:nbn:de:0030-drops-89515
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/8951/


Pissis, Solon P. ; Retha, Ahmad

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line

pdf-format:
LIPIcs-SEA-2018-16.pdf (0.6 MB)


Abstract

An elastic-degenerate string is a sequence of n sets of strings of total length N. It was introduced to represent multiple sequence alignments of closely-related sequences in a compact form. For a standard pattern of length m, pattern matching in an elastic-degenerate text can be solved on-line in time O(nm^2+N) with pre-processing time and space O(m) (Grossi et al., CPM 2017). A fast bit-vector algorithm requiring time O(N * ceil[m/w]) with pre-processing time and space O(m * ceil[m/w]), where w is the size of the computer word, was also presented. In this paper we consider the same problem for a set of patterns of total length M. A straightforward generalization of the existing bit-vector algorithm would require time O(N * ceil[M/w]) with pre-processing time and space O(M * ceil[M/w]), which is prohibitive in practice. We present a new on-line O(N * ceil[M/w])-time algorithm with pre-processing time and space O(M). We present experimental results on both synthetic and real data demonstrating the performance of the algorithm. We further demonstrate a real application of our algorithm in a pipeline for discovery and verification of minimal absent words (MAWs) in the human genome, showing that a significant number of previously discovered MAWs are in fact false positives when a population's variants are considered.
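
To make the problem statement concrete, the following is a minimal illustrative sketch in Python, not the algorithm of the paper. It assumes an elastic-degenerate text is given as a list of segments, each segment being a list of alternative strings (the empty string stands for a deletion), and that all patterns are non-empty. The function name ed_dictionary_match and this input layout are hypothetical choices made for the example. Each pattern is handled separately with a textbook Shift-And bit-vector, carrying the union of partial-match states across segment boundaries; the paper's contribution is a more efficient treatment of the whole pattern set, which this sketch does not attempt to reproduce.

```python
def ed_dictionary_match(ed_text, patterns):
    """Naive on-line dictionary matching in an elastic-degenerate text.

    ed_text  : list of segments; each segment is a list of alternative
               strings ("" models a deletion). Hypothetical layout.
    patterns : iterable of non-empty strings (assumption of this sketch).

    Returns a sorted list of (segment_index, pattern) pairs such that an
    occurrence of the pattern ends inside that segment.
    """
    hits = set()
    for p in patterns:
        m = len(p)
        # Shift-And masks: bit j of masks[c] is set iff p[j] == c.
        masks = {}
        for j, c in enumerate(p):
            masks[c] = masks.get(c, 0) | (1 << j)
        accept = 1 << (m - 1)

        # Bit j of `incoming` is set iff p[:j+1] matches a suffix of some
        # path through the segments read so far.
        incoming = 0
        for i, segment in enumerate(ed_text):
            outgoing = 0
            for s in segment:
                state = incoming
                for c in s:
                    # Standard Shift-And step: extend every partial match
                    # by one character and start a new match at this position.
                    state = ((state << 1) | 1) & masks.get(c, 0)
                    if state & accept:
                        hits.add((i, p))
                # Any alternative may be the one chosen, so the states
                # reachable at the segment boundary are the union.
                outgoing |= state
            incoming = outgoing
    return sorted(hits)


# The text below spells ACT, AT, or AGTT depending on the alternative chosen.
text = [["A"], ["C", "", "GT"], ["T"]]
print(ed_dictionary_match(text, ["AT", "CT", "GTT", "AC", "AA"]))
# prints [(1, 'AC'), (2, 'AT'), (2, 'CT'), (2, 'GTT')]
```

Because the text is read one segment at a time and only the boundary state is kept, the sketch is on-line in the same sense as in the abstract, but its running time grows with the number of patterns since each pattern is scanned independently.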

BibTeX - Entry

@InProceedings{pissis_et_al:LIPIcs:2018:8951,
  author =	{Solon P. Pissis and Ahmad Retha},
  title =	{{Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line}},
  booktitle =	{17th International Symposium on Experimental Algorithms  (SEA 2018)},
  pages =	{16:1--16:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-070-5},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{103},
  editor =	{Gianlorenzo D'Angelo},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2018/8951},
  URN =		{urn:nbn:de:0030-drops-89515},
  doi =		{10.4230/LIPIcs.SEA.2018.16},
  annote =	{Keywords: on-line algorithms, algorithms on strings, dictionary matching, elastic-degenerate string, Variant Call Format}
}

Keywords: on-line algorithms, algorithms on strings, dictionary matching, elastic-degenerate string, Variant Call Format
Collection: 17th International Symposium on Experimental Algorithms (SEA 2018)
Issue Date: 2018
Date of publication: 19.06.2018

