License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2019.11
URN: urn:nbn:de:0030-drops-104822
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/10482/
Go to the corresponding LIPIcs Volume Portal


Ganczorz, Michal

Entropy Lower Bounds for Dictionary Compression

pdf-format:
LIPIcs-CPM-2019-11.pdf (0.5 MB)


Abstract

We show that a wide class of dictionary compression methods (including LZ77, LZ78, grammar compressors as well as parsing-based structures) require |S|H_k(S) + Omega (|S|k log sigma/log_sigma |S|) bits to encode their output. This matches known upper bounds and improves the information-theoretic lower bound of |S|H_k(S). To this end, we abstract the crucial properties of parsings created by those methods, construct a certain family of strings and analyze the parsings of those strings. We also show that for k = alpha log_sigma |S|, where 0 < alpha < 1 is a constant, the aforementioned methods produce an output of size at least 1/(1-alpha)|S|H_k(S) bits. Thus our results separate dictionary compressors from context-based one (such as PPM) and BWT-based ones, as the those include methods achieving |S|H_k(S) + O(sigma^k log sigma) bits, i.e. the redundancy depends on k and sigma but not on |S|.

BibTeX - Entry

@InProceedings{ganczorz:LIPIcs:2019:10482,
  author =	{Michal Ganczorz},
  title =	{{Entropy Lower Bounds for Dictionary Compression}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{11:1--11:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Nadia Pisanti and Solon P. Pissis},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2019/10482},
  URN =		{urn:nbn:de:0030-drops-104822},
  doi =		{10.4230/LIPIcs.CPM.2019.11},
  annote =	{Keywords: compression, empirical entropy, parsing, lower bounds}
}

Keywords: compression, empirical entropy, parsing, lower bounds
Collection: 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)
Issue Date: 2019
Date of publication: 06.06.2019


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI