License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.APPROX-RANDOM.2019.44
URN: urn:nbn:de:0030-drops-112592
URL: http://dagstuhl.sunsite.rwth-aachen.de/volltexte/2019/11259/
Ban, Frank ;
Chen, Xi ;
Servedio, Rocco A. ;
Sinha, Sandip
Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions
Abstract
A number of recent works have considered the trace reconstruction problem, in which an unknown source string x in {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the asymptotically best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [De et al., 2017; Fedor Nazarov and Yuval Peres, 2017], several highly efficient algorithms are known [Yuval Peres and Alex Zhai, 2017; Nina Holden et al., 2018] for the average-case version of the problem, in which the source string x is chosen uniformly at random from {0,1}^n. In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string there is an unknown distribution over s unknown source strings x^1,...,x^s in {0,1}^n, and each sample given to the algorithm is independently generated by drawing some x^i from this distribution and returning an independent trace of x^i. Building on the results of [Yuval Peres and Alex Zhai, 2017] and [Nina Holden et al., 2018], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size 1 <= s <= exp(Theta(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x^1,...,x^s} subset {0,1}^n, for every distribution D supported on {x^1,...,x^s}, our algorithm can efficiently recover D up to total variation distance at most epsilon with high probability, given access to independent traces of independent draws from D. The running time of our algorithm is poly(n,s,1/epsilon) and its sample complexity is poly (s,1/epsilon,exp(log^{1/3} n)). This polynomial dependence on the support size s is in sharp contrast with the worst-case version of the problem (when x^1,...,x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [Frank Ban et al., 2019] is doubly exponential in s.
BibTeX - Entry
@InProceedings{ban_et_al:LIPIcs:2019:11259,
author = {Frank Ban and Xi Chen and Rocco A. Servedio and Sandip Sinha},
title = {{Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions}},
booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)},
pages = {44:1--44:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-125-2},
ISSN = {1868-8969},
year = {2019},
volume = {145},
editor = {Dimitris Achlioptas and L{\'a}szl{\'o} A. V{\'e}gh},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2019/11259},
URN = {urn:nbn:de:0030-drops-112592},
doi = {10.4230/LIPIcs.APPROX-RANDOM.2019.44},
annote = {Keywords: population recovery, deletion channel, trace reconstruction}
}
Keywords: |
|
population recovery, deletion channel, trace reconstruction |
Collection: |
|
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019) |
Issue Date: |
|
2019 |
Date of publication: |
|
17.09.2019 |