License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2020.8
URN: urn:nbn:de:0030-drops-121336
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12133/
Charalampopoulos, Panagiotis ;
Kociumaka, Tomasz ;
Mohamed, Manal ;
Radoszewski, Jakub ;
Rytter, Wojciech ;
Straszyński, Juliusz ;
Waleń, Tomasz ;
Zuba, Wiktor
Counting Distinct Patterns in Internal Dictionary Matching
Abstract
We consider the problem of preprocessing a text T of length n and a dictionary 𝒟 in order to be able to efficiently answer queries CountDistinct(i,j), that is, given i and j return the number of patterns from 𝒟 that occur in the fragment T[i..j]. The dictionary is internal in the sense that each pattern in 𝒟 is given as a fragment of T. This way, the dictionary takes space proportional to the number of patterns d=|𝒟| rather than their total length, which could be Θ(n⋅d). An Õ(n+d)-size data structure that answers CountDistinct(i,j) queries O(log n)-approximately in Õ(1) time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an Õ(n+d)-size data structure that answers CountDistinct(i,j) queries 2-approximately in Õ(1) time. Using range queries, for any m, we give an Õ(min(nd/m,n²/m²)+d)-size data structure that answers CountDistinct(i,j) queries exactly in Õ(m) time. We also consider the special case when the dictionary consists of all square factors of the string. We design an O(n log² n)-size data structure that allows us to count distinct squares in a text fragment T[i..j] in O(log n) time.
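To make the query semantics concrete, the following is a minimal brute-force sketch of CountDistinct(i,j) and of distinct-square counting. It is not the data structure from the paper: it does no preprocessing and takes time polynomial in the fragment length per query. The function names, the 1-based inclusive indexing, the representation of the dictionary as a list of (a, b) position pairs, and the choice to count equal pattern strings only once are all assumptions made for illustration.

```python
def count_distinct(text, dictionary, i, j):
    """Brute-force CountDistinct(i, j).

    text       -- the text T as a Python string
    dictionary -- list of (a, b) pairs; each pair denotes the pattern T[a..b]
                  (1-based, inclusive; an assumed representation)
    i, j       -- 1-based, inclusive endpoints of the fragment T[i..j]

    Returns the number of distinct pattern strings that occur in T[i..j].
    """
    fragment = text[i - 1:j]
    # Collapse equal fragments into one pattern (assumption: the dictionary is a set of strings).
    patterns = {text[a - 1:b] for (a, b) in dictionary}
    return sum(1 for p in patterns if p in fragment)


def count_distinct_squares(text, i, j):
    """Brute-force count of distinct squares (strings of the form XX) in T[i..j]."""
    fragment = text[i - 1:j]
    m = len(fragment)
    squares = set()
    for start in range(m):
        # half is the length of X; the square XX must fit inside the fragment.
        for half in range(1, (m - start) // 2 + 1):
            left = fragment[start:start + half]
            right = fragment[start + half:start + 2 * half]
            if left == right:
                squares.add(left + right)
    return len(squares)


if __name__ == "__main__":
    T = "abaabaab"
    # Patterns: "ab", "ba", "aa", "aba", and a second occurrence of "ab".
    D = [(1, 2), (2, 3), (3, 4), (1, 3), (4, 5)]
    print(count_distinct(T, D, 1, 3))
    print(count_distinct_squares(T, 1, 8))
```

A baseline like this is mainly useful as a reference against which a faster approximate or exact structure could be checked on small inputs.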
BibTeX - Entry
@InProceedings{charalampopoulos_et_al:LIPIcs:2020:12133,
author = {Panagiotis Charalampopoulos and Tomasz Kociumaka and Manal Mohamed and Jakub Radoszewski and Wojciech Rytter and Juliusz Straszyński and Tomasz Waleń and Wiktor Zuba},
title = {{Counting Distinct Patterns in Internal Dictionary Matching}},
booktitle = {31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
pages = {8:1--8:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-149-8},
ISSN = {1868-8969},
year = {2020},
volume = {161},
editor = {Inge Li G{\o}rtz and Oren Weimann},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12133},
URN = {urn:nbn:de:0030-drops-121336},
doi = {10.4230/LIPIcs.CPM.2020.8},
annote = {Keywords: dictionary matching, internal pattern matching, squares}
}
Keywords: dictionary matching, internal pattern matching, squares
Collection: 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Issue Date: 2020
Date of publication: 09.06.2020