License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following:
DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.24
URN: urn:nbn:de:0030-drops-126277
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2020/12627/
Canonne, Clément L.; Wimmer, Karl
Testing Data Binnings
Abstract
Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) q and samples from an unknown distribution p, both over [n] = {1,2,…,n}, whether p equals q, or is significantly different from it.
In this paper, we introduce the related question of identity up to binning, where the reference distribution q is over k ≪ n elements: the question is then whether there exists a suitable binning of the domain [n] into k intervals such that, once "binned," p is equal to q. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne [Clément L. Canonne, 2019]. Finally, we discuss several extensions and related research directions.
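To make the notion of "identity up to binning" concrete, the following minimal Python sketch checks whether a fully known distribution over [n] can be partitioned into k consecutive intervals whose masses match a given reference distribution over k elements. This only illustrates the definition and is not the paper's tester: the paper's setting gives sample access to the unknown distribution, whereas this sketch assumes its probabilities are known exactly. The function name `find_binning`, the use of exact fractions, and the convention that an interval may be empty are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import accumulate


def find_binning(p, q):
    """Return cut points 0 = c_0 <= c_1 <= ... <= c_k = n such that
    sum(p[c_{j-1}:c_j]) == q[j-1] for every j, or None if no such
    binning into k consecutive intervals exists.

    Assumption: p and q are given exactly (e.g. as Fractions) and
    intervals are allowed to be empty.
    """
    # prefix[i] is the total mass of the first i elements of p.
    prefix = [Fraction(0)] + list(accumulate(Fraction(x) for x in p))
    if not q or prefix[-1] != sum(Fraction(x) for x in q):
        return None

    cuts = [0]
    target = Fraction(0)
    pos = 0
    for mass in q:
        target += Fraction(mass)
        # Prefix sums are nondecreasing, so scanning forward is enough.
        while pos < len(p) and prefix[pos] < target:
            pos += 1
        if prefix[pos] != target:
            return None  # no cut point yields exactly this cumulative mass
        cuts.append(pos)

    # Any trailing zero-mass elements are absorbed by the last interval.
    cuts[-1] = len(p)
    return cuts


p = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
print(find_binning(p, [Fraction(1, 4), Fraction(3, 4)]))  # intervals {1,2} and {3,4,5}
print(find_binning(p, [Fraction(1, 3), Fraction(2, 3)]))  # no valid binning
```

Because all probabilities are nonnegative, the prefix sums are nondecreasing, so a single left-to-right scan suffices in this known-distribution setting. The testing problem studied in the paper is harder precisely because the unknown distribution is only observed through samples.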
BibTeX - Entry
@InProceedings{canonne_et_al:LIPIcs:2020:12627,
author = {Cl{\'e}ment L. Canonne and Karl Wimmer},
title = {{Testing Data Binnings}},
booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)},
pages = {24:1--24:13},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-164-1},
ISSN = {1868-8969},
year = {2020},
volume = {176},
editor = {Jaros{\l}aw Byrka and Raghu Meka},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12627},
URN = {urn:nbn:de:0030-drops-126277},
doi = {10.4230/LIPIcs.APPROX/RANDOM.2020.24},
annote = {Keywords: property testing, distribution testing, identity testing, hypothesis testing}
}
Keywords: property testing, distribution testing, identity testing, hypothesis testing
Collection: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)
Issue Date: 2020
Date of publication: 11.08.2020