License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2021.3
URN: urn:nbn:de:0030-drops-139548
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2021/13954/


Pisanti, Nadia

On-Line Pattern Matching on D-Texts (Invited Talk)

pdf-format:
LIPIcs-CPM-2021-3.pdf (0.4 MB)


Abstract

The Elastic Degenerate String Matching (EDSM) problem is that of finding the occurrences of a pattern P of length m in an ED-text T. A D-text (Degenerate text) is a string that represents a set of similar and aligned strings (e.g., a pan-genome [The Computational Pan-Genomics Consortium, 2018]) by collapsing common fragments into a standard string and representing variants with sets of alternative substrings. When such substrings are not required to have the same size, we speak of elastic D-strings (ED-strings). In [R.Grossi et al., 2017] we gave an O(nm^{2}+N) time on-line algorithm for EDSM, where n is the length of T and N is its size, defined as the total number of letters. A fundamental tool of our algorithm is the O(m^{2}+N) time solution of what was later called the Active Prefixes (AP) problem. In [K.Aoyama et al., 2018], an O(m^{1.5} √(log m)+N) solution for AP was shown, leading to an O(nm^{1.5} √(log m)+N) time solution for EDSM. The natural open problem was thus whether the exponent 1.5 could be decreased further. In [G.Bernardini et al., 2019], we prove several properties that answer this and other questions: we give a conditional O(nm^{1.5}+N) lower bound for EDSM, proving that a combinatorial algorithm solving EDSM in O(nm^{1.5-ε}+N) time would break the Boolean Matrix Multiplication (BMM) conjecture; we use this result as a hint to devise a non-combinatorial algorithm that solves EDSM in O(nm^{1.381}+N) time; and we do so by combining the Fast Fourier Transform with properties of string periodicity. In my talk I will give an overview of the results above, as well as of some interesting side results: the extension to a dictionary rather than a single pattern [S.P.Pissis and A.Retha, 2018], the introduction of errors [G.Bernardini et al., 2020], and a notion of matching among D-strings with its linear time solution [M.Alzamel et al., 2020].
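
To make the problem statement concrete, the following is a minimal Python sketch of on-line matching on an ED-text. It is a naive illustration, not the algorithm of any of the cited papers, and it makes no attempt to reach the stated time bounds. The representation of an ED-text as a list of sets of strings, the function name `edsm_end_positions`, and the choice to report the index of the segment in which an occurrence ends are all assumptions made for this example. The set `active` holds the lengths of the proper prefixes of P that match up to the end of the segment just read, which is the idea behind the Active Prefixes problem as described above.

```python
def edsm_end_positions(pattern, ed_text):
    """Return indices of segments in which an occurrence of `pattern` ends.

    `ed_text` is assumed to be a list of sets of strings (one set per
    segment); the empty string is allowed as an alternative.
    `pattern` is assumed to be non-empty.
    """
    m = len(pattern)
    active = set()   # prefix lengths j (0 < j < m) alive at the segment boundary
    ends = []
    for i, segment in enumerate(ed_text):
        found = False
        next_active = set()
        for s in segment:
            # Case 1: the whole pattern lies inside one alternative.
            if pattern in s:
                found = True
            # Case 2: a proper prefix of the pattern starts inside s
            # and reaches the end of s.
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    next_active.add(k)
            # Case 3: extend prefixes that were active before this segment.
            for j in active:
                rest = pattern[j:]
                if s.startswith(rest):
                    found = True                   # occurrence completes in s
                elif rest.startswith(s):
                    next_active.add(j + len(s))    # s is consumed entirely
        if found:
            ends.append(i)
        active = next_active
    return ends


# Hypothetical ED-text: AC, then one of {G, GT, empty}, then TA.
example = [{"AC"}, {"G", "GT", ""}, {"TA"}]
print(edsm_end_positions("CGTT", example))  # via AC + GT + TA
print(edsm_end_positions("CTA", example))   # via AC + "" + TA
```

Each segment is read once and only the set of active prefix lengths is carried across segment boundaries, which is what makes the scan on-line. The per-segment work here uses plain string comparisons; the cited results concern how to perform this update step much faster.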

BibTeX - Entry

@InProceedings{pisanti:LIPIcs.CPM.2021.3,
  author =	{Pisanti, Nadia},
  title =	{{On-Line Pattern Matching on D-Texts}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{3:1--3:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/13954},
  URN =		{urn:nbn:de:0030-drops-139548},
  doi =		{10.4230/LIPIcs.CPM.2021.3},
  annote =	{Keywords: pattern matching, elastic-degenerate string, matrix multiplication}
}

Keywords: pattern matching, elastic-degenerate string, matrix multiplication
Collection: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
Issue Date: 2021
Date of publication: 30.06.2021

