License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.AofA.2022.12
URN: urn:nbn:de:0030-drops-160987
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/16098/


Lumbroso, Jérémie ; Martínez, Conrado

Affirmative Sampling: Theory and Applications



Abstract

Affirmative Sampling is a practical and efficient novel algorithm for obtaining random samples of distinct elements from a data stream. Its most salient feature is that the size S of the sample will, in expectation, grow with the (unknown) number n of distinct elements in the data stream. As every distinct element has the same probability of being sampled, and the sample size is greater when the "diversity" (the number of distinct elements) is greater, the samples that Affirmative Sampling delivers are more representative than those produced by any scheme where the sample size is fixed a priori; hence its name. Our algorithm is straightforward to implement, and several implementations already exist.

BibTeX - Entry

@InProceedings{lumbroso_et_al:LIPIcs.AofA.2022.12,
  author =	{Lumbroso, J\'{e}r\'{e}mie and Mart{\'\i}nez, Conrado},
  title =	{{Affirmative Sampling: Theory and Applications}},
  booktitle =	{33rd International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2022)},
  pages =	{12:1--12:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-230-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{225},
  editor =	{Ward, Mark Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2022/16098},
  URN =		{urn:nbn:de:0030-drops-160987},
  doi =		{10.4230/LIPIcs.AofA.2022.12},
  annote =	{Keywords: Data streams, Distinct sampling, Random sampling, Cardinality estimation, Analysis of algorithms}
}

Keywords: Data streams, Distinct sampling, Random sampling, Cardinality estimation, Analysis of algorithms
Collection: 33rd International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2022)
Issue Date: 2022
Date of publication: 08.06.2022

