License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2023.74
URN: urn:nbn:de:0030-drops-187274
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2023/18727/


Köppl, Dominik ; Kurpicz, Florian ; Meyer, Daniel

Faster Block Tree Construction



Abstract

The block tree [Belazzougui et al. J. Comput. Syst. Sci. '21] is a compressed text index that can answer access (extract the character at a given position), rank (count the occurrences of a specified character in a prefix of the text), and select (find the size of the smallest prefix in which a specified character has a specified rank) queries. It requires O(z log(n/z)) words of space, where n is the length of the text and z is the number of Lempel-Ziv factors of the text. For some highly repetitive inputs, a block tree can require as little as 0.015 bits per character of the text. Small values of z make the block tree a space-efficient alternative to the wavelet tree, which is another index for these three types of queries. While wavelet trees can be constructed quickly in practice, compressed versions of the wavelet tree have so far leveraged only statistical compression, meaning that they are blind to spaced repetitions.
To make block trees usable in practice, a first step is to find ways to construct them efficiently. We address this problem by presenting a practically efficient construction algorithm for block trees, which is up to an order of magnitude faster than previous implementations. Additionally, we parallelize our implementation, making it the first block tree construction implementation that works in parallel in shared memory.
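
To make the three query types named in the abstract concrete, the following is a minimal reference sketch that answers them by scanning an uncompressed string. It is not the block tree and not the authors' implementation; the function names, the 0-based positions for access, and the convention that rank counts occurrences in the first i characters are assumptions chosen for illustration. A block tree answers the same queries without storing the text explicitly.

```python
# Naive reference semantics for access, rank, and select on a plain string.
# Assumptions (not from the paper): 0-based positions for access,
# rank over the prefix of length i, select returning a prefix length.

def access(text: str, i: int) -> str:
    """Return the character at 0-based position i."""
    return text[i]


def rank(text: str, c: str, i: int) -> int:
    """Return the number of occurrences of c in the first i characters."""
    return text[:i].count(c)


def select(text: str, c: str, k: int) -> int:
    """Return the length of the smallest prefix containing k occurrences
    of c, or -1 if c occurs fewer than k times."""
    if k <= 0:
        raise ValueError("k must be positive")
    seen = 0
    for length, ch in enumerate(text, start=1):
        if ch == c:
            seen += 1
            if seen == k:
                return length
    return -1


if __name__ == "__main__":
    text = "abracadabra"
    print(access(text, 4))       # character at position 4
    print(rank(text, "a", 5))    # occurrences of 'a' in "abrac"
    print(select(text, "a", 3))  # smallest prefix holding three 'a's
```

Under these conventions rank and select are inverse to each other: if select(text, c, k) returns a length m other than -1, then rank(text, c, m) equals k.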

BibTeX - Entry

@InProceedings{koppl_et_al:LIPIcs.ESA.2023.74,
  author =	{K\"{o}ppl, Dominik and Kurpicz, Florian and Meyer, Daniel},
  title =	{{Faster Block Tree Construction}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{74:1--74:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2023/18727},
  URN =		{urn:nbn:de:0030-drops-187274},
  doi =		{10.4230/LIPIcs.ESA.2023.74},
  annote =	{Keywords: compressed data structure, block tree, Lempel-Ziv compression, longest previous factor array, rank and select}
}

Keywords: compressed data structure, block tree, Lempel-Ziv compression, longest previous factor array, rank and select
Collection: 31st Annual European Symposium on Algorithms (ESA 2023)
Issue Date: 2023
Date of publication: 30.08.2023
Supplementary Material: Software (Source Code): https://github.com/pasta-toolbox/block_tree archived at: https://archive.softwareheritage.org/swh:1:dir:534632174136011114d40181f0dcd87e61ddfc4f
Software (Comparison with Competitors and Raw Data): https://github.com/pasta-toolbox/block_tree_experiments archived at: https://archive.softwareheritage.org/swh:1:dir:add3d6b114766d0a06532e612ad0f6d08cebdff9

