License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.APPROX-RANDOM.2019.29
URN: urn:nbn:de:0030-drops-112443
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/11244/
Jayaram, Rajesh ;
Woodruff, David P.
Towards Optimal Moment Estimation in Streaming and Distributed Models
Abstract
One of the oldest problems in the data stream model is to approximate the p-th moment ||X||_p^p = sum_{i=1}^n X_i^p of an underlying non-negative vector X in R^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is when p in (0,2]. Although a tight space bound of Theta(epsilon^-2 log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity of this problem when all updates are positive. Specifically, the upper bound is O(epsilon^-2 log n) bits, while the lower bound is only Omega(epsilon^-2 + log n) bits. Recently, an upper bound of O~(epsilon^-2 + log n) bits was obtained under the assumption that the updates arrive in a random order.
We show that for p in (0, 1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of O~(epsilon^-2 + log n) bits for estimating |X |_p^p. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p in (1,2], in the natural coordinator and blackboard distributed communication topologies, there is an O~(epsilon^-2) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an O~(epsilon^2 log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Omega(epsilon^-2 log n) bit lower bound for p in (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.
BibTeX - Entry
@InProceedings{jayaram_et_al:LIPIcs:2019:11244,
author = {Rajesh Jayaram and David P. Woodruff},
title = {{Towards Optimal Moment Estimation in Streaming and Distributed Models}},
booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
pages = {29:1--29:21},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-125-2},
ISSN = {1868-8969},
year = {2019},
volume = {145},
editor = {Dimitris Achlioptas and L{\'a}szl{\'o} A. V{\'e}gh},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/11244},
URN = {urn:nbn:de:0030-drops-112443},
doi = {10.4230/LIPIcs.APPROX-RANDOM.2019.29},
annote = {Keywords: Streaming, Sketching, Message Passing, Moment Estimation}
}
Keywords: |
|
Streaming, Sketching, Message Passing, Moment Estimation |
Collection: |
|
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019) |
Issue Date: |
|
2019 |
Date of publication: |
|
17.09.2019 |