License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2018.7
URN: urn:nbn:de:0030-drops-87025
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2018/8702/
Go to the corresponding LIPIcs Volume Portal


Bannai, Hideo ; Gagie, Travis ; I, Tomohiro

Online LZ77 Parsing and Matching Statistics with RLBWTs

pdf-format:
LIPIcs-CPM-2018-7.pdf (0.5 MB)


Abstract

Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T, to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of T while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further - albeit making it static again and increasing its space by a factor proportional to the size of the alphabet - such that later, given another string S and O(log log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log log n) time.

BibTeX - Entry

@InProceedings{bannai_et_al:LIPIcs:2018:8702,
  author =	{Hideo Bannai and Travis Gagie and Tomohiro I},
  title =	{{Online LZ77 Parsing and Matching Statistics with RLBWTs}},
  booktitle =	{Annual Symposium on Combinatorial Pattern Matching  (CPM 2018)},
  pages =	{7:1--7:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-074-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{105},
  editor =	{Gonzalo Navarro and David Sankoff and Binhai Zhu},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2018/8702},
  URN =		{urn:nbn:de:0030-drops-87025},
  doi =		{10.4230/LIPIcs.CPM.2018.7},
  annote =	{Keywords: Lempel-Ziv 1977, Matching Statistics, Run-Length Compressed Burrows-Wheeler Transform}
}

Keywords: Lempel-Ziv 1977, Matching Statistics, Run-Length Compressed Burrows-Wheeler Transform
Collection: Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Issue Date: 2018
Date of publication: 18.05.2018


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI