License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.STACS.2020.10
URN: urn:nbn:de:0030-drops-118716
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/11871/
Go to the corresponding LIPIcs Volume Portal


Bille, Philip ; Gørtz, Inge Li ; Steiner, Teresa Anna

String Indexing with Compressed Patterns

pdf-format:
LIPIcs-STACS-2020-10.pdf (0.5 MB)


Abstract

Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.

BibTeX - Entry

@InProceedings{bille_et_al:LIPIcs:2020:11871,
  author =	{Philip Bille and Inge Li G{\o}rtz and Teresa Anna Steiner},
  title =	{{String Indexing with Compressed Patterns}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{10:1--10:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Christophe Paul and Markus Bl{\"a}ser},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/11871},
  URN =		{urn:nbn:de:0030-drops-118716},
  doi =		{10.4230/LIPIcs.STACS.2020.10},
  annote =	{Keywords: string indexing, compression, pattern matching}
}

Keywords: string indexing, compression, pattern matching
Collection: 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Issue Date: 2020
Date of publication: 04.03.2020


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI