Abstract
We propose new entropy measures for trees, the known ones are H_k(?), the kth order (tree label) entropy (Ferragina at al. 2005), and tree entropy H(?) (Jansson et al. 2006), the former considers only the tree labels and the latter only tree shape. The proposed entropy measures, H_k(?L) and H_k(L?), exploit the relation between the labels and the tree shape. We prove that they lower bound label entropy and tree entropy, respectively, i.e. H_k(?L) ≤ H(?) and H_k(L?) ≤ H_k(L). Besides being theoretically superior, the new measures are significantly smaller in practice.
We also propose a new succinct representation of labeled trees which represents a tree T using one of the following bounds: T(H(?) + H_k(L?)) or T(H_k(?L) + H_k(L)). The representation is based on a new, simple method of partitioning the tree, which preserves both tree shape and node degrees. The previous stateoftheart method of compressing the tree achieved T(H(?) + H_k(L)) bits, by combining the results of Ferragina at al. 2005 and Jansson et al. 2006; so proposed representation is not worse and often superior. Moreover, our representation supports standard tree navigation in constant time as well as more complex queries. Such a structure achieving this space bounds was not known before: aforementioned solution only worked for compression alone, our structure is the first which achieves H_k(?) for k>0 and supports such queries. Lastly, our data structure is fairly simple, both conceptually and in terms of the implementation, moreover it uses known tools, which is a counterargument to the claim that methods based on treepartitioning are impractical.
BibTeX  Entry
@InProceedings{gaczorz:LIPIcs:2020:11883,
author = {Michał Gańczorz},
title = {{Using Statistical Encoding to Achieve Tree Succinctness Never Seen Before}},
booktitle = {37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)},
pages = {22:122:29},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959771405},
ISSN = {18688969},
year = {2020},
volume = {154},
editor = {Christophe Paul and Markus Bl{\"a}ser},
publisher = {Schloss DagstuhlLeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/11883},
URN = {urn:nbn:de:0030drops118836},
doi = {10.4230/LIPIcs.STACS.2020.22},
annote = {Keywords: succinct data structures, labeled tree, ordered tree, entropy, tree entropy}
}
Keywords: 

succinct data structures, labeled tree, ordered tree, entropy, tree entropy 
Collection: 

37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020) 
Issue Date: 

2020 
Date of publication: 

04.03.2020 