License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICDT.2015.265
URN: urn:nbn:de:0030-drops-49895
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2015/4989/
Hu, Xiaocheng ;
Tao, Yufei ;
Yang, Yi ;
Zhang, Shengyu ;
Zhou, Shuigeng
On The I/O Complexity of Dynamic Distinct Counting
Abstract
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains $S$ in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O(N/B log_{M/B} (N/B)) I/Os by sorting, where N=|S|, B is the block size, and M is the memory size.
In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B}), then it must incur Omega(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps from 0 dramatically to almost linearity as soon as the insertion cost drops slightly below Omega(1).
BibTeX - Entry
@InProceedings{hu_et_al:LIPIcs:2015:4989,
author = {Xiaocheng Hu and Yufei Tao and Yi Yang and Shengyu Zhang and Shuigeng Zhou},
title = {{On The I/O Complexity of Dynamic Distinct Counting}},
booktitle = {18th International Conference on Database Theory (ICDT 2015)},
pages = {265--276},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-939897-79-8},
ISSN = {1868-8969},
year = {2015},
volume = {31},
editor = {Marcelo Arenas and Mart{\'i}n Ugarte},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2015/4989},
URN = {urn:nbn:de:0030-drops-49895},
doi = {10.4230/LIPIcs.ICDT.2015.265},
annote = {Keywords: distinct counting, lower bound, external memory}
}
Keywords: |
|
distinct counting, lower bound, external memory |
Collection: |
|
18th International Conference on Database Theory (ICDT 2015) |
Issue Date: |
|
2015 |
Date of publication: |
|
19.03.2015 |