License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2020.7
URN: urn:nbn:de:0030-drops-119315
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/11931/
Chen, Yu ;
Yi, Ke
Random Sampling and Size Estimation Over Cyclic Joins
Abstract
Computing joins is expensive, and often unnecessary when the output size is large. In 1999, Chaudhuri et al. [Surajit Chaudhuri et al., 1999] posed the problem of random sampling over joins as a potentially effective approach to avoiding computing the join in full, while obtaining important statistical information about the join results. Unfortunately, no significant progress has been made in the last 20 years, except for the case of acyclic joins. In this paper, we present the first non-trivial result on sampling over cyclic joins. We show that after a linear-time preprocessing step, a join result can be drawn uniformly at random in expected time O(IN^ρ/OUT), where IN^ρ is known as the AGM bound of the join and OUT is its output size. This result holds for all joins on binary relations, as well as certain joins on relations of higher arity. We further show how this algorithm immediately leads to a join size estimation algorithm with the same running time.
BibTeX - Entry
@InProceedings{chen_et_al:LIPIcs:2020:11931,
author = {Yu Chen and Ke Yi},
title = {{Random Sampling and Size Estimation Over Cyclic Joins}},
booktitle = {23rd International Conference on Database Theory (ICDT 2020)},
pages = {7:1--7:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-139-9},
ISSN = {1868-8969},
year = {2020},
volume = {155},
editor = {Carsten Lutz and Jean Christoph Jung},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/11931},
URN = {urn:nbn:de:0030-drops-119315},
doi = {10.4230/LIPIcs.ICDT.2020.7},
annote = {Keywords: Random sampling, joins, join size estimation}
}
Keywords: |
|
Random sampling, joins, join size estimation |
Collection: |
|
23rd International Conference on Database Theory (ICDT 2020) |
Issue Date: |
|
2020 |
Date of publication: |
|
11.03.2020 |
Supplementary Material: |
|
Video of the Presentation: https://doi.org/10.5446/46833 |