License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ESA.2016.32
URN: urn:nbn:de:0030-drops-63838
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2016/6383/
Crouch, Michael ;
McGregor, Andrew ;
Valiant, Gregory ;
Woodruff, David P.
Stochastic Streams: Sample Complexity vs. Space Complexity
Abstract
We address the trade-off between the computational resources needed to process a large data set and the number of samples available from the data set. Specifically, we consider the following abstraction: we receive a potentially infinite stream of IID samples from some unknown distribution D, and are tasked with computing some function f(D). If the stream is observed for time t, how much memory, s, is required to estimate f(D)? We refer to t as the sample complexity and s as the space complexity. The main focus of this paper is investigating the trade-offs between the space and sample complexity. We study these trade-offs for several canonical problems studied in the data stream model: estimating the collision probability, i.e., the second moment of a distribution, deciding if a graph is connected, and approximating the dimension of an unknown subspace. Our results are based on techniques for simulating different classical sampling procedures in this model, emulating random walks given a sequence of IID samples, as well as leveraging a characterization between communication bounded protocols and statistical query algorithms.
BibTeX - Entry
@InProceedings{crouch_et_al:LIPIcs:2016:6383,
author = {Michael Crouch and Andrew McGregor and Gregory Valiant and David P. Woodruff},
title = {{Stochastic Streams: Sample Complexity vs. Space Complexity}},
booktitle = {24th Annual European Symposium on Algorithms (ESA 2016)},
pages = {32:1--32:15},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-015-6},
ISSN = {1868-8969},
year = {2016},
volume = {57},
editor = {Piotr Sankowski and Christos Zaroliagis},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2016/6383},
URN = {urn:nbn:de:0030-drops-63838},
doi = {10.4230/LIPIcs.ESA.2016.32},
annote = {Keywords: data streams, sample complexity, frequency moments, graph connectivity, subspace approximation}
}
Keywords: |
|
data streams, sample complexity, frequency moments, graph connectivity, subspace approximation |
Collection: |
|
24th Annual European Symposium on Algorithms (ESA 2016) |
Issue Date: |
|
2016 |
Date of publication: |
|
18.08.2016 |