A Polynomial Time Algorithm for Lossy Population Recovery

Authors: Ankur Moitra, Michael Saks
Publication date: 01/01/2013

We give a polynomial time algorithm for the lossy population recovery problem. In this problem, the goal is to approximately learn an unknown distribution on binary strings of length $n$ from lossy samples: for some parameter $\mu$, each coordinate of the sample is preserved with probability $\mu$ and otherwise is replaced by a '?'. The running time and number of samples needed for our algorithm are polynomial in $n$ and $1/\varepsilon$ for each fixed $\mu > 0$. This improves on the algorithm of Wigderson and Yehudayoff, which runs in quasi-polynomial time for any $\mu > 0$, and on the polynomial time algorithm of Dvir et al., which was shown to work for $\mu \gtrapprox 0.30$ by Batman et al. In fact, our algorithm also works in the more general framework of Batman et al., in which there is no a priori bound on the size of the support of the distribution. The algorithm we analyze is implicit in previous work; our main contribution is to analyze it by showing (via linear programming duality and connections to complex analysis) that a certain matrix associated with the problem has a robust local inverse even though its condition number is exponentially small. A corollary of our result is the first polynomial time algorithm for learning DNFs in the restriction access model of Dvir et al.
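The lossy sampling model described above is simple to simulate. The Python sketch below is illustrative only: the function names and the `{string: probability}` representation of the population are hypothetical, and it is not the algorithm analyzed in the paper. It also includes the naive rescaling estimator for the probability of a pattern on a coordinate set $S$, which divides the observed frequency by $\mu^{|S|}$; its variance grows roughly like $\mu^{-|S|}$, which suggests why small $\mu$ calls for a more careful analysis.

```python
import random


def lossy_sample(x: str, mu: float, rng: random.Random) -> str:
    """Keep each coordinate of x with probability mu; otherwise replace it by '?'."""
    return "".join(c if rng.random() < mu else "?" for c in x)


def draw_lossy_samples(population: dict, mu: float, num: int, seed: int = 0) -> list:
    """Draw num lossy samples from a (hypothetical) population {string: probability}."""
    rng = random.Random(seed)
    strings = list(population)
    weights = [population[s] for s in strings]
    samples = []
    for _ in range(num):
        x = rng.choices(strings, weights=weights, k=1)[0]  # draw x from the population
        samples.append(lossy_sample(x, mu, rng))  # then erase coordinates independently
    return samples


def estimate_pattern_prob(samples: list, pattern: dict, mu: float) -> float:
    """Unbiased estimate of Pr[x_i == pattern[i] for every i in pattern].

    A sample agrees with the pattern only if all len(pattern) coordinates survive
    (probability mu ** len(pattern), independent of x) and match, so the observed
    frequency is rescaled by that factor. Requires mu > 0.
    """
    hits = sum(all(s[i] == b for i, b in pattern.items()) for s in samples)
    return hits / len(samples) / mu ** len(pattern)


# Usage: the true probability of (x_0, x_1) = ("0", "1") in this population is 0.7,
# and the rescaled estimate is unbiased for it.
samples = draw_lossy_samples({"0110": 0.7, "1010": 0.3}, mu=0.6, num=20000)
estimate = estimate_pattern_prob(samples, {0: "0", 1: "1"}, mu=0.6)
```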

arXiv.org e-Print Archive

CiteSeerX

Crossref

Noisy population recovery in polynomial time

Authors: Anindya De, Michael Saks, Sijian Tang
Publication date: 24/02/2016

In the noisy population recovery problem of Dvir et al., the goal is to learn an unknown distribution $f$ on binary strings of length $n$ from noisy samples. For some parameter $\mu \in [0,1]$, a noisy sample is generated by flipping each coordinate of a sample from $f$ independently with probability $(1-\mu)/2$. We assume an upper bound $k$ on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error $\varepsilon$. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We show that for $\mu > 0$, the sample complexity (and hence the algorithmic complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$, improving upon the previous best result of $\mathsf{poly}(k^{\log\log k}, n, 1/\varepsilon)$ due to Lovett and Zhang. Our proof combines ideas from Lovett and Zhang with a noise-attenuated version of Möbius inversion. In turn, the latter crucially uses the construction of a robust local inverse due to Moitra and Saks.
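A minimal sketch of the noisy sampling model, assuming bit strings are Python `str` values. It also checks the elementary attenuation fact that noise-robust estimators rely on: flipping a bit with probability $(1-\mu)/2$ multiplies $\mathbb{E}[(-1)^{\text{bit}}]$ by $\mu$, so a parity over a coordinate set $S$ is damped by $\mu^{|S|}$ and can be rescaled. This is only that identity, not the noise-attenuated Möbius inversion of the paper; the function names are hypothetical.

```python
import random


def noisy_sample(x: str, mu: float, rng: random.Random) -> str:
    """Flip each bit of x independently with probability (1 - mu) / 2."""
    flip_prob = (1.0 - mu) / 2.0
    return "".join(
        ("1" if c == "0" else "0") if rng.random() < flip_prob else c for c in x
    )


def estimate_parity_bias(samples: list, coords: list, mu: float) -> float:
    """Estimate E_x[(-1)^(sum of x_i over coords)] from noisy samples.

    A flip with probability (1 - mu) / 2 multiplies E[(-1)^bit] by
    1 - 2 * (1 - mu) / 2 = mu. Flips are independent across coordinates, so the
    noisy average is attenuated by mu ** len(coords); divide it back out (mu > 0).
    """
    total = 0
    for s in samples:
        parity = sum(int(s[i]) for i in coords) % 2
        total += -1 if parity else 1
    return total / len(samples) / mu ** len(coords)


# Usage: coordinates 0 and 2 of "1011" have parity 0, so the true bias is +1;
# the rescaled estimate from noisy samples is unbiased for that value.
rng = random.Random(0)
noisy = [noisy_sample("1011", 0.4, rng) for _ in range(50000)]
estimate = estimate_parity_bias(noisy, [0, 2], 0.4)
```

The price of rescaling is variance: dividing by $\mu^{|S|}$ amplifies sampling error, which is why such estimators are applied with care when $\mu$ is small.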

arXiv.org e-Print Archive

Crossref

Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions

Authors: Frank Ban, Xi Chen, Rocco A. Servedio, Sandip Sinha
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

A number of recent works have considered the trace reconstruction problem, in which an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of $x$. The goal is to reconstruct the original string $x$ from independent traces of $x$. While the asymptotically best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces [De et al., 2017; Fedor Nazarov and Yuval Peres, 2017], several highly efficient algorithms are known [Yuval Peres and Alex Zhai, 2017; Nina Holden et al., 2018] for the average-case version of the problem, in which the source string $x$ is chosen uniformly at random from $\{0,1\}^n$. In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string there is an unknown distribution over $s$ unknown source strings $x^1, \ldots, x^s \in \{0,1\}^n$, and each sample given to the algorithm is independently generated by drawing some $x^i$ from this distribution and returning an independent trace of $x^i$. Building on the results of [Yuval Peres and Alex Zhai, 2017] and [Nina Holden et al., 2018], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size $1 \le s \le \exp(\Theta(n^{1/3}))$, for a $1 - o(1)$ fraction of all $s$-element support sets $\{x^1, \ldots, x^s\} \subseteq \{0,1\}^n$, for every distribution $D$ supported on $\{x^1, \ldots, x^s\}$, our algorithm can efficiently recover $D$ up to total variation distance at most $\varepsilon$ with high probability, given access to independent traces of independent draws from $D$. The running time of our algorithm is $\mathrm{poly}(n, s, 1/\varepsilon)$ and its sample complexity is $\mathrm{poly}(s, 1/\varepsilon, \exp(\log^{1/3} n))$.
This polynomial dependence on the support size $s$ is in sharp contrast with the worst-case version of the problem (when $x^1, \ldots, x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm [Frank Ban et al., 2019] is doubly exponential in $s$.
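The sampling process in this abstract can be sketched as follows. The exact channel is an assumption made for illustration: before each source bit, a geometric number of uniformly random bits is inserted (continuation probability `q_ins`), and each source bit is then deleted with probability `q_del`. The helper names are hypothetical, and the sketch only generates samples; it does not implement the recovery algorithm.

```python
import random


def trace(x: str, q_del: float, q_ins: float, rng: random.Random) -> str:
    """One trace of x through an assumed insertion-deletion channel.

    Before each source bit, uniform random bits are inserted one at a time while
    a coin with heads-probability q_ins (< 1) comes up heads; the source bit
    itself is then deleted with probability q_del.
    """
    out = []
    for c in x:
        while rng.random() < q_ins:  # geometric number of random insertions
            out.append(rng.choice("01"))
        if rng.random() >= q_del:  # keep the source bit with probability 1 - q_del
            out.append(c)
    return "".join(out)


def population_traces(sources: list, weights: list, q_del: float, q_ins: float,
                      num: int, seed: int = 0) -> list:
    """Each sample draws some x^i with probability proportional to weights[i]
    and returns an independent trace of it."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num):
        x = rng.choices(sources, weights=weights, k=1)[0]
        samples.append(trace(x, q_del, q_ins, rng))
    return samples
```

Note that traces have variable length and carry no alignment information, which is what separates this setting from the lossy and noisy population recovery models with fixed coordinate positions.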

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Super-resolution, Extremal Functions and the Condition Number of Vandermonde Matrices

Authors: A. Beurling, V. Chandrasekaran, E. Cheney, K. Do Ba, C. Fernandez-Granda, R. Horn, Y. Hua, H. Montgomery, S. Park, G. Stewart, A. Timan
Publication date: 01/04/2015

Super-resolution is a fundamental task in imaging, where the goal is to extract fine-grained structure from coarse-grained measurements. Here we are interested in a popular mathematical abstraction of this problem that has been widely studied in the statistics, signal processing and machine learning communities. We exactly resolve the threshold at which noisy super-resolution is possible. In particular, we establish a sharp phase transition for the relationship between the cutoff frequency ($m$) and the separation ($\Delta$). If $m > 1/\Delta + 1$, our estimator converges to the true values at an inverse polynomial rate in terms of the magnitude of the noise. And when $m < (1-\epsilon)/\Delta$, no estimator can distinguish between a particular pair of $\Delta$-separated signals even if the magnitude of the noise is exponentially small. Our results involve making novel connections between extremal functions and the spectral properties of Vandermonde matrices. We establish a sharp phase transition for their condition number, which in turn allows us to give the first noise tolerance bounds for the matrix pencil method. Moreover, we show that our methods can be interpreted as giving preconditioners for Vandermonde matrices, and we use this observation to design faster algorithms for super-resolution. We believe that these ideas may have other applications in designing faster algorithms for other basic tasks in signal processing.

Comment: 19 pages
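To make the quantities in this abstract concrete, the sketch below builds an $m \times k$ Vandermonde matrix with entries $e^{2\pi i j t_\ell}$ for nodes $t_\ell \in [0,1)$ and frequencies $j = 0, \ldots, m-1$ (one common convention, assumed here), measures the wrap-around separation $\Delta$, and reports the condition number, i.e. the largest over the smallest singular value, via `numpy.linalg.cond`. The node values and function names are hypothetical, and $m$ is kept at least the number of nodes so that the smallest singular value is meaningful. It illustrates the definitions only, not the extremal-function bounds.

```python
import numpy as np


def vandermonde(nodes, m: int) -> np.ndarray:
    """m x k matrix with entries exp(2*pi*i * j * t_l), j = 0..m-1, nodes t_l in [0, 1)."""
    freqs = np.arange(m).reshape(-1, 1)
    t = np.asarray(nodes, dtype=float).reshape(1, -1)
    return np.exp(2j * np.pi * freqs * t)


def wrap_separation(nodes) -> float:
    """Minimum distance between nodes measured around the circle [0, 1)."""
    t = np.sort(np.mod(np.asarray(nodes, dtype=float), 1.0))
    gaps = np.diff(np.append(t, t[0] + 1.0))  # includes the wrap-around gap
    return float(gaps.min())


# Condition number as the number of frequencies m grows, for fixed nodes;
# the second column flags whether m exceeds 1/Delta + 1.
nodes = np.array([0.10, 0.35, 0.62, 0.80])
delta = wrap_separation(nodes)
for m in (4, 6, 8, 12):
    print(m, m > 1 / delta + 1, np.linalg.cond(vandermonde(nodes, m)))
```

As a sanity check, equally spaced nodes $t_\ell = \ell/m$ with $m$ rows give $\sqrt{m}$ times a unitary DFT matrix, so the condition number is 1, while two nearly coincident nodes make it large.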

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Finding heavy hitters from lossy or noisy data

Authors: Cody Murray, Lucia Batman, Ramamohan Paturi, Russell Impagliazzo
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2013

Abstract. Motivated by Dvir et al. and Wigderson and Yehudayoff [3

CiteSeerX

Noise-Resilient Group Testing: Limitations and Constructions

Authors: A. De Bonis, A. Dyachkov, A. Macula, A. Ta-Shma, A.G. D’yachkov, B. Chlebus, D. Eppstein, D.-Z. Du, D.Z. Du, E. Knill, L. Trevisan, R. Raz, Ruszinkó, T. Berger, W. Kautz, Y. Cheng, Z. Füredi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

We study combinatorial group testing schemes for learning $d$-sparse Boolean vectors using highly unreliable disjunctive measurements. We consider an adversarial noise model that only limits the number of false observations, and show that any noise-resilient scheme in this model can only approximately reconstruct the sparse vector. On the positive side, we take this barrier to our advantage and show that approximate reconstruction (within a satisfactory degree of approximation) allows us to break the information theoretic lower bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of $d$-sparse vectors of length $n$ via non-adaptive measurements, by a multiplicative factor $\tilde{\Omega}(d)$. Specifically, we give simple randomized constructions of non-adaptive measurement schemes, with $m = O(d \log n)$ measurements, that allow efficient reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the presence of $\delta m$ false positives and $O(m/d)$ false negatives within the measurement outcomes, for any constant $\delta < 1$. We show that, information theoretically, none of these parameters can be substantially improved without dramatically affecting the others. Furthermore, we obtain several explicit constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions that allow fast reconstruction in time $\mathrm{poly}(m)$, which would be sublinear in $n$ for sufficiently sparse vectors. The main tool used in our construction is the list-decoding view of randomness condensers and extractors.

Comment: Full version. A preliminary summary of this work appears (under the same title) in proceedings of the 17th International Symposium on Fundamentals of Computation Theory (FCT 2009).
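The measurement model in this abstract (disjunctive tests whose outcomes may contain false positives and false negatives) can be simulated with the Python sketch below. It is not the paper's construction, which relies on randomness condensers and extractors: the Bernoulli test design, the random (rather than adversarial) choice of corrupted outcomes, the simple threshold decoder, and all function names are assumptions for illustration.

```python
import random


def random_design(n: int, m: int, p: float, rng: random.Random) -> list:
    """m non-adaptive tests; each test includes each of the n items with probability p."""
    return [{i for i in range(n) if rng.random() < p} for _ in range(m)]


def measure(design: list, support: set) -> list:
    """Disjunctive measurement: a test is positive iff it contains a positive item."""
    return [bool(test & support) for test in design]


def corrupt(outcomes: list, false_pos: int, false_neg: int, rng: random.Random) -> list:
    """Stand-in for the noise: turn up to false_pos negative outcomes positive and
    up to false_neg positive outcomes negative (chosen at random, not adversarially)."""
    out = list(outcomes)
    negatives = [j for j, y in enumerate(out) if not y]
    positives = [j for j, y in enumerate(out) if y]
    for j in rng.sample(negatives, min(false_pos, len(negatives))):
        out[j] = True
    for j in rng.sample(positives, min(false_neg, len(positives))):
        out[j] = False
    return out


def threshold_decode(design: list, outcomes: list, n: int, threshold: float) -> set:
    """Declare item i positive if at least a `threshold` fraction of the tests
    containing it are positive; items that appear in no test are declared positive."""
    appears = [0] * n
    positive = [0] * n
    for test, y in zip(design, outcomes):
        for i in test:
            appears[i] += 1
            positive[i] += int(y)
    return {i for i in range(n)
            if appears[i] == 0 or positive[i] >= threshold * appears[i]}


# Usage: a 4-sparse vector of length 500 measured with 80 random tests.
rng = random.Random(1)
n, d, m = 500, 4, 80
support = {7, 123, 250, 499}
design = random_design(n, m, 1 / d, rng)
clean = measure(design, support)
noisy = corrupt(clean, false_pos=4, false_neg=2, rng=rng)
decoded = threshold_decode(design, noisy, n, threshold=0.7)
```

With `threshold=1.0` and no noise, every positive item is returned, since all tests containing it are positive; lowering the threshold tolerates some false-negative outcomes at the cost of admitting more false positives, loosely mirroring the approximate-reconstruction trade-off described in the abstract.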

arXiv.org e-Print Archive

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository