License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2017.16
URN: urn:nbn:de:0030-drops-73254
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2017/7325/


Bille, Philip ; Ettienne, Mikko Berggren ; Gørtz, Inge Li ; Vildhøj, Hjalte Wedel

Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing



Abstract

Given a string S, the compressed indexing problem is to preprocess S into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of S while supporting fast queries. We present a compressed index based on the Lempel-Ziv 1977 compression scheme. Let n and z denote the size of the input string and of the compressed LZ77 string, respectively. We obtain the following time-space trade-offs. Given a pattern string P of length m, and letting occ denote the number of occurrences of P in S, we can solve the problem in

(i) O(m + occ lglg n) time using O(z lg(n/z) lglg z) space, or
(ii) O(m(1 + lg^e z / lg(n/z)) + occ(lglg n + lg^e z)) time using O(z lg(n/z)) space, for any 0 < e < 1

In particular, (i) improves the leading term in the query time of the previous best solution from O(m lg m) to O(m) at the cost of increasing the space by a factor of lglg z. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of O(m(1 + lg^e z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^{1-d}) for constant d > 0, we have lg(n/z) = Ω(lg n) while lg^e z = o(lg n), so this leading term becomes O(m). Our index also supports extraction of any substring of length l in O(l + lg(n/z)) time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.
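
To make the parameters n and z concrete, the following is a minimal sketch of a naive LZ77 parse in Python. It is not the index described above and not the authors' construction; it only illustrates how z, the number of phrases, relates to n on repetitive input. The function names lz77_parse and lz77_decode are hypothetical, and the parse variant used here (longest earlier match, overlap with the current phrase allowed, followed by one explicit character) is an assumption that may differ in detail from the variant used in the paper. The quadratic-time search is for clarity only.

```python
def lz77_parse(s):
    """Naive LZ77 parse (illustrative assumption, not the paper's algorithm).

    Returns a list of phrases (source, length, next_char): copy `length`
    characters starting at position `source` of the text decoded so far,
    then append `next_char`. `source` is -1 when `length` is 0.
    """
    phrases = []
    n = len(s)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position j < i as a copy source.
        for j in range(i):
            length = 0
            # Stop one short of the end so an explicit character remains.
            while i + length < n - 1 and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        phrases.append((best_src, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Invert lz77_parse by replaying the copies character by character."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            # Appending one character at a time makes overlapping copies work.
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for text in ["abababab", "a" * 16, "abcdefgh"]:
        z = len(lz77_parse(text))
        print(f"n = {len(text):2d}, z = {z}")
```

On highly repetitive strings z is far smaller than n, which is the regime where space bounds stated in terms of z, such as O(z lg(n/z)), are attractive.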

BibTeX - Entry

@InProceedings{bille_et_al:LIPIcs:2017:7325,
  author =	{Philip Bille and Mikko Berggren Ettienne and Inge Li G{\o}rtz and Hjalte Wedel Vildh{\o}j},
  title =	{{Time-Space Trade-Offs for Lempel-Ziv Compressed Indexing}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{16:1--16:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{Juha K{\"a}rkk{\"a}inen and Jakub Radoszewski and Wojciech Rytter},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2017/7325},
  URN =		{urn:nbn:de:0030-drops-73254},
  doi =		{10.4230/LIPIcs.CPM.2017.16},
  annote =	{Keywords: compressed indexing, pattern matching, LZ77, prefix search}
}

Keywords: compressed indexing, pattern matching, LZ77, prefix search
Collection: 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Issue Date: 2017
Date of publication: 30.06.2017

