License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2016.6
URN: urn:nbn:de:0030-drops-60825
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/6082/


Kopelowitz, Tsvi ; Porat, Ely ; Rozen, Yaron

Succinct Online Dictionary Matching with Improved Worst-Case Guarantees



Abstract

In the online dictionary matching problem, the goal is to preprocess a set of patterns D={P_1,...,P_d} over an alphabet Sigma so that, given an online text (arriving one character at a time), we report all occurrences of patterns that are a suffix of the current text before the following character arrives. We introduce a succinct Aho-Corasick-like data structure for the online dictionary matching problem. Our solution uses a new succinct representation for multi-labeled trees, in which each node has a set of labels from a universe of size lambda. We consider lowest labeled ancestor (LLA) queries on multi-labeled trees: given a node and a label, return the lowest proper ancestor of the node that has the queried label.
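To make the LLA query concrete, here is a minimal Python sketch of its semantics on a plain pointer-based tree. It is not the succinct representation from the paper: it simply walks parent pointers, so a query takes time proportional to the depth of the node. The class name MultiLabeledTree and its methods are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Set


class MultiLabeledTree:
    """Pointer-based multi-labeled tree (NOT succinct; illustration only)."""

    def __init__(self) -> None:
        self.parent: List[Optional[int]] = []  # parent[v] is None for the root
        self.labels: List[Set[int]] = []       # labels[v] is the label set of v

    def add_node(self, parent: Optional[int], labels: Set[int]) -> int:
        """Append a node with the given parent and labels; return its id."""
        self.parent.append(parent)
        self.labels.append(set(labels))
        return len(self.parent) - 1

    def lla(self, node: int, label: int) -> Optional[int]:
        """Lowest PROPER ancestor of `node` that carries `label`, else None.

        Naive upward walk: O(depth) time, unlike the O(log log lambda)
        bound stated in the abstract.
        """
        current = self.parent[node]
        while current is not None:
            if label in self.labels[current]:
                return current
            current = self.parent[current]
        return None


# A path root -> a -> b -> c with labels drawn from the universe {1, 2, 3}.
tree = MultiLabeledTree()
root = tree.add_node(None, {1, 2})
a = tree.add_node(root, {2})
b = tree.add_node(a, {2, 3})
c = tree.add_node(b, {1})

print(tree.lla(c, 2))  # id of b: the nearest proper ancestor labeled 2
print(tree.lla(c, 1))  # id of root: c's own label 1 does not count
print(tree.lla(b, 3))  # None: only b itself carries label 3
```

Note that the query considers proper ancestors only, so a label on the query node itself is never reported.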

We introduce a succinct representation of multi-labeled trees for lambda=omega(1) that supports LLA queries in O(log(log(lambda))) time. Using this representation, we introduce a succinct data structure for the online dictionary matching problem when sigma=omega(1), where sigma is the size of the alphabet Sigma. In this solution the worst-case cost per character is O(log(log(sigma)) + occ) time, where occ is the size of the current output. Moreover, the amortized cost per character is O(1+occ) time.
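For readers unfamiliar with the problem setting, the following Python sketch shows online dictionary matching with a textbook Aho-Corasick automaton. It is a non-succinct baseline, not the data structure from the paper. Its failure-link walk is amortized constant per character (assuming expected O(1) dictionary lookups), but a single character can trigger a walk as long as the longest pattern, which is the kind of worst-case cost the bounds above address. The class name OnlineDictionaryMatcher and the method feed are hypothetical names for illustration.

```python
from collections import deque
from typing import Dict, List


class OnlineDictionaryMatcher:
    """Textbook Aho-Corasick automaton (NOT succinct; illustration only)."""

    def __init__(self, patterns: List[str]) -> None:
        self.patterns = [p for p in patterns if p]  # ignore empty patterns
        self.goto: List[Dict[str, int]] = [{}]      # trie edges per state
        self.fail: List[int] = [0]                  # failure links
        self.out: List[List[int]] = [[]]            # patterns ending at state
        self.report: List[int] = [-1]               # next state with output
        self.state = 0
        for idx, pattern in enumerate(self.patterns):
            s = 0
            for ch in pattern:
                if ch not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.report.append(-1)
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.out[s].append(idx)
        # Breadth-first pass: compute failure and report links by depth.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, child in self.goto[s].items():
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                fl = self.fail[child]
                self.report[child] = fl if self.out[fl] else self.report[fl]
                queue.append(child)

    def feed(self, ch: str) -> List[str]:
        """Consume one character; return patterns that are a suffix of the text so far."""
        s = self.state
        while s and ch not in self.goto[s]:
            s = self.fail[s]  # this walk is O(1) amortized, not worst case
        s = self.goto[s].get(ch, 0)
        self.state = s
        matches: List[str] = []
        r = s if self.out[s] else self.report[s]
        while r != -1:  # report links visit only states that have output
            matches.extend(self.patterns[i] for i in self.out[r])
            r = self.report[r]
        return matches


matcher = OnlineDictionaryMatcher(["he", "she", "his", "hers"])
for position, ch in enumerate("ushers"):
    print(position, ch, matcher.feed(ch))
```

Each call to feed returns its matches before the next character is supplied, which mirrors the online requirement in the problem statement. The report links ensure that the output step costs time proportional to occ rather than to the length of the failure chain.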

BibTeX - Entry

@InProceedings{kopelowitz_et_al:LIPIcs:2016:6082,
  author =	{Tsvi Kopelowitz and Ely Porat and Yaron Rozen},
  title =	{{Succinct Online Dictionary Matching with Improved Worst-Case Guarantees}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{6:1--6:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Roberto Grossi and Moshe Lewenstein},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/6082},
  URN =		{urn:nbn:de:0030-drops-60825},
  doi =		{10.4230/LIPIcs.CPM.2016.6},
  annote =	{Keywords: Succinct indexing, dictionary matching, Aho-Corasick, labeled trees}
}

Keywords: Succinct indexing, dictionary matching, Aho-Corasick, labeled trees
Collection: 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Issue Date: 2016
Date of publication: 27.06.2016

