License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.APPROX/RANDOM.2022.27
URN: urn:nbn:de:0030-drops-171497
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2022/17149/
Chakraborty, Sourav ;
Fischer, Eldar ;
Ghosh, Arijit ;
Mishra, Gopinath ;
Sen, Sayantan
Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing
Abstract
The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far in some distance measure from satisfying it. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted.
This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap.
When moving to general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) many samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using ?(n) many samples. Being non-concentrated is a strong requirement on properties, as we also prove a close to linear lower bound against their tolerant tests.
Apart from the case where the distribution is non-concentrated, we also show if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only ?(s) many samples. The learning procedure adapts to the input, and works without knowing s in advance.
BibTeX - Entry
@InProceedings{chakraborty_et_al:LIPIcs.APPROX/RANDOM.2022.27,
author = {Chakraborty, Sourav and Fischer, Eldar and Ghosh, Arijit and Mishra, Gopinath and Sen, Sayantan},
title = {{Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing}},
booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)},
pages = {27:1--27:23},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-249-5},
ISSN = {1868-8969},
year = {2022},
volume = {245},
editor = {Chakrabarti, Amit and Swamy, Chaitanya},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/17149},
URN = {urn:nbn:de:0030-drops-171497},
doi = {10.4230/LIPIcs.APPROX/RANDOM.2022.27},
annote = {Keywords: Distribution Testing, Tolerant Testing, Non-tolerant Testing, Sample Complexity}
}
Keywords: |
|
Distribution Testing, Tolerant Testing, Non-tolerant Testing, Sample Complexity |
Collection: |
|
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022) |
Issue Date: |
|
2022 |
Date of publication: |
|
15.09.2022 |