License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.STACS.2020.22
URN: urn:nbn:de:0030-drops-118836
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/11883/
Go to the corresponding LIPIcs Volume Portal


Gańczorz, Michał

Using Statistical Encoding to Achieve Tree Succinctness Never Seen Before

pdf-format:
LIPIcs-STACS-2020-22.pdf (0.6 MB)


Abstract

We propose new entropy measures for trees, the known ones are H_k(?), the k-th order (tree label) entropy (Ferragina at al. 2005), and tree entropy H(?) (Jansson et al. 2006), the former considers only the tree labels and the latter only tree shape. The proposed entropy measures, H_k(?|L) and H_k(L|?), exploit the relation between the labels and the tree shape. We prove that they lower bound label entropy and tree entropy, respectively, i.e. H_k(?|L) ≤ H(?) and H_k(L|?) ≤ H_k(L). Besides being theoretically superior, the new measures are significantly smaller in practice.
We also propose a new succinct representation of labeled trees which represents a tree T using one of the following bounds: |T|(H(?) + H_k(L|?)) or |T|(H_k(?|L) + H_k(L)). The representation is based on a new, simple method of partitioning the tree, which preserves both tree shape and node degrees. The previous state-of-the-art method of compressing the tree achieved |T|(H(?) + H_k(L)) bits, by combining the results of Ferragina at al. 2005 and Jansson et al. 2006; so proposed representation is not worse and often superior. Moreover, our representation supports standard tree navigation in constant time as well as more complex queries. Such a structure achieving this space bounds was not known before: aforementioned solution only worked for compression alone, our structure is the first which achieves H_k(?) for k>0 and supports such queries. Lastly, our data structure is fairly simple, both conceptually and in terms of the implementation, moreover it uses known tools, which is a counter-argument to the claim that methods based on tree-partitioning are impractical.

BibTeX - Entry

@InProceedings{gaczorz:LIPIcs:2020:11883,
  author =	{Michał Gańczorz},
  title =	{{Using Statistical Encoding to Achieve Tree Succinctness Never Seen Before}},
  booktitle =	{37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
  pages =	{22:1--22:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-140-5},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{154},
  editor =	{Christophe Paul and Markus Bl{\"a}ser},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/11883},
  URN =		{urn:nbn:de:0030-drops-118836},
  doi =		{10.4230/LIPIcs.STACS.2020.22},
  annote =	{Keywords: succinct data structures, labeled tree, ordered tree, entropy, tree entropy}
}

Keywords: succinct data structures, labeled tree, ordered tree, entropy, tree entropy
Collection: 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Issue Date: 2020
Date of publication: 04.03.2020


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI