License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2016.6
URN: urn:nbn:de:0030-drops-57754
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/5775/


Dasgupta, Anirban ; Lang, Kevin J. ; Rhodes, Lee ; Thaler, Justin

A Framework for Estimating Stream Expression Cardinalities



Abstract

Given m distributed data streams A_1,..., A_m, we consider the problem of estimating the number of unique identifiers in streams defined by set expressions over A_1,..., A_m. We identify a broad class of algorithms for solving this problem, and show that the estimators output by any algorithm in this class are perfectly unbiased and satisfy strong variance bounds. Our analysis unifies and generalizes a variety of earlier results in the literature. To demonstrate its generality, we describe several novel sampling algorithms in our class, and show that they achieve a novel tradeoff between accuracy, space usage, update speed, and applicability.
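
To make the problem statement concrete, the following is a minimal sketch of estimating the number of unique identifiers in a union and an intersection of two streams from small hash samples. It uses a generic k-minimum-values style sample with a threshold; this is an assumption chosen for illustration, not an algorithm confirmed by this page as the authors' method. The names `unit_hash`, `build_sketch`, `union`, `intersection`, and `estimate` are hypothetical, and the sketch is built from a fully materialized stream rather than incrementally.

```python
import hashlib


def unit_hash(item):
    """Map an identifier to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always < 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def build_sketch(stream, k):
    """Return (theta, kept): a threshold and the hashes strictly below it.

    Illustrative assumption: theta is the (k+1)-th smallest distinct hash,
    or 1.0 when the stream has at most k distinct identifiers.
    """
    hashes = sorted({unit_hash(x) for x in stream})
    if len(hashes) <= k:
        return 1.0, set(hashes)
    return hashes[k], set(hashes[:k])


def union(a, b):
    """Sketch of A ∪ B: shared threshold, all retained hashes below it."""
    theta = min(a[0], b[0])
    return theta, {v for v in a[1] | b[1] if v < theta}


def intersection(a, b):
    """Sketch of A ∩ B: shared threshold, hashes present in both samples."""
    theta = min(a[0], b[0])
    return theta, {v for v in a[1] & b[1] if v < theta}


def estimate(sketch):
    """Estimated number of unique identifiers: sample size scaled by 1/theta."""
    theta, kept = sketch
    return len(kept) / theta


# Two overlapping streams of identifiers (hypothetical data).
stream_a = range(0, 100)
stream_b = range(50, 150)

# With k larger than the number of distinct identifiers, theta stays 1.0
# and the estimates equal the exact counts.
sketch_a = build_sketch(stream_a, k=1000)
sketch_b = build_sketch(stream_b, k=1000)
union_estimate = estimate(union(sketch_a, sketch_b))
intersection_estimate = estimate(intersection(sketch_a, sketch_b))
```

In this toy setting the two samples can be built independently, for example on different machines, and combined afterwards, which is the sense in which such summaries are mergeable. With a smaller `k` the results become approximate, and the intersection estimate is typically noisier than the union estimate because fewer sampled hashes fall in both streams.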

BibTeX - Entry

@InProceedings{dasgupta_et_al:LIPIcs:2016:5775,
  author =	{Anirban Dasgupta and Kevin J. Lang and Lee Rhodes and Justin Thaler},
  title =	{{A Framework for Estimating Stream Expression Cardinalities}},
  booktitle =	{19th International Conference on Database Theory (ICDT 2016)},
  pages =	{6:1--6:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-002-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{48},
  editor =	{Wim Martens and Thomas Zeume},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2016/5775},
  URN =		{urn:nbn:de:0030-drops-57754},
  doi =		{10.4230/LIPIcs.ICDT.2016.6},
  annote =	{Keywords: sketching, data stream algorithms, mergeability, distinct elements, set operations}
}

Keywords: sketching, data stream algorithms, mergeability, distinct elements, set operations
Collection: 19th International Conference on Database Theory (ICDT 2016)
Issue Date: 2016
Date of publication: 14.03.2016

