License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2020.7
URN: urn:nbn:de:0030-drops-119315
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/11931/
Go to the corresponding LIPIcs Volume Portal


Chen, Yu ; Yi, Ke

Random Sampling and Size Estimation Over Cyclic Joins

pdf-format:
LIPIcs-ICDT-2020-7.pdf (0.7 MB)


Abstract

Computing joins is expensive, and often unnecessary when the output size is large. In 1999, Chaudhuri et al. [Surajit Chaudhuri et al., 1999] posed the problem of random sampling over joins as a potentially effective approach to avoiding computing the join in full, while obtaining important statistical information about the join results. Unfortunately, no significant progress has been made in the last 20 years, except for the case of acyclic joins. In this paper, we present the first non-trivial result on sampling over cyclic joins. We show that after a linear-time preprocessing step, a join result can be drawn uniformly at random in expected time O(IN^ρ/OUT), where IN^ρ is known as the AGM bound of the join and OUT is its output size. This result holds for all joins on binary relations, as well as certain joins on relations of higher arity. We further show how this algorithm immediately leads to a join size estimation algorithm with the same running time.

BibTeX - Entry

@InProceedings{chen_et_al:LIPIcs:2020:11931,
  author =	{Yu Chen and Ke Yi},
  title =	{{Random Sampling and Size Estimation Over Cyclic Joins}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Carsten Lutz and Jean Christoph Jung},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2020/11931},
  URN =		{urn:nbn:de:0030-drops-119315},
  doi =		{10.4230/LIPIcs.ICDT.2020.7},
  annote =	{Keywords: Random sampling, joins, join size estimation}
}

Keywords: Random sampling, joins, join size estimation
Collection: 23rd International Conference on Database Theory (ICDT 2020)
Issue Date: 2020
Date of publication: 11.03.2020
Supplementary Material: Video of the Presentation: https://doi.org/10.5446/46833


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI