License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2022.76
URN: urn:nbn:de:0030-drops-170140
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/17014/


Łukasiewicz, Aleksander ; Uznański, Przemysław

Cardinality Estimation Using Gumbel Distribution



Abstract

Cardinality estimation is the task of approximating the number of distinct elements in a large dataset with possibly repeating elements. LogLog and HyperLogLog (cf. Durand and Flajolet [ESA 2003], Flajolet et al. [Discrete Math Theor. 2007]) are small-space sketching schemes for cardinality estimation that have strong theoretical performance guarantees and are highly effective in practice. This makes them a popular solution with many implementations in big-data systems (e.g. Algebird, Apache DataSketches, BigQuery, Presto and Redis). However, despite having a simple and elegant formulation, the analyses of both LogLog and HyperLogLog are extremely involved, spanning tens of pages of analytic combinatorics and complex function analysis.
We propose a modification to both LogLog and HyperLogLog that replaces the discrete geometric distribution with the continuous Gumbel distribution. This leads to a very short, simple and elementary analysis of the estimation guarantees, and to smoother behavior of the estimator.
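
To illustrate the idea in the abstract, here is a minimal Python sketch of a Gumbel-based cardinality estimator. It is not the authors' algorithm: the names `GumbelSketch` and `gumbel_hash` and the parameter `m` are hypothetical, the registers are stored as plain floats, and every item is hashed once per register, so none of the space or update-time optimizations a real sketch needs are attempted. It relies only on a standard fact: the maximum of n independent standard Gumbel variables is Gumbel-distributed with location ln n, so exp(-max) is exponential with rate n, and (m - 1) divided by the sum of m such values is an unbiased estimate of n.

```python
import hashlib
import math


def gumbel_hash(item: str, j: int) -> float:
    """Map (item, register index j) to a deterministic standard Gumbel value."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=j.to_bytes(8, "little"),  # a different salt per register
    ).digest()
    # Keep 52 bits so that u lies strictly inside (0, 1) in double precision.
    h52 = int.from_bytes(digest, "little") >> 12
    u = (h52 + 0.5) / 2**52
    return -math.log(-math.log(u))  # inverse CDF of the standard Gumbel


class GumbelSketch:
    """Naive illustration: m float registers, each keeping a running maximum."""

    def __init__(self, m: int = 128) -> None:
        if m < 2:
            raise ValueError("m must be at least 2")
        self.registers = [-math.inf] * m

    def add(self, item: str) -> None:
        # Duplicates hash to the same values, so they never change a maximum.
        for j in range(len(self.registers)):
            g = gumbel_hash(item, j)
            if g > self.registers[j]:
                self.registers[j] = g

    def estimate(self) -> float:
        if self.registers[0] == -math.inf:
            return 0.0  # nothing has been added yet
        m = len(self.registers)
        # exp(-register) behaves like an Exp(n) sample; their sum is Gamma(m, n).
        total = sum(math.exp(-g) for g in self.registers)
        return (m - 1) / total


sketch = GumbelSketch(m=128)
for i in range(1000):
    sketch.add(f"user-{i}")
    sketch.add(f"user-{i}")  # repeated elements do not affect the sketch
print(round(sketch.estimate()))
```

Under these assumptions the relative standard error is roughly 1/sqrt(m - 2), so m = 128 registers give an error of about 9%.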

BibTeX - Entry

@InProceedings{lukasiewicz_et_al:LIPIcs.ESA.2022.76,
  author =	{{\L}ukasiewicz, Aleksander and Uzna\'{n}ski, Przemys{\l}aw},
  title =	{{Cardinality Estimation Using Gumbel Distribution}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{76:1--76:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2022/17014},
  URN =		{urn:nbn:de:0030-drops-170140},
  doi =		{10.4230/LIPIcs.ESA.2022.76},
  annote =	{Keywords: Streaming algorithms, Cardinality estimation, Sketching, Gumbel distribution}
}

Keywords: Streaming algorithms, Cardinality estimation, Sketching, Gumbel distribution
Collection: 30th Annual European Symposium on Algorithms (ESA 2022)
Issue Date: 2022
Date of publication: 01.09.2022

