Noisy population recovery in polynomial time

Authors: Anindya De, Michael Saks, Sijian Tang
Publication date: 24/02/2016

In the noisy population recovery problem of Dvir et al., the goal is to learn an unknown distribution $f$ on binary strings of length $n$ from noisy samples. For some parameter $\mu \in [0,1]$, a noisy sample is generated by flipping each coordinate of a sample from $f$ independently with probability $(1-\mu)/2$. We assume an upper bound $k$ on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error $\varepsilon$. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We show that for $\mu > 0$, the sample complexity (and hence the algorithmic complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$, improving upon the previous best result of $\mathsf{poly}(k^{\log\log k}, n, 1/\varepsilon)$ due to Lovett and Zhang. Our proof combines ideas from Lovett and Zhang with a noise-attenuated version of Möbius inversion. In turn, the latter crucially uses the construction of a robust local inverse due to Moitra and Saks.
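The noise model described above can be simulated in a few lines. The following Python sketch illustrates only the sampling process, not the authors' recovery algorithm; the helper names `noisy_sample` and `debiased_bit_means` and the example distribution are hypothetical. It also hints at why $\mu > 0$ matters: writing each bit $b$ as a sign $(-1)^b$, the noise multiplies the expected sign by $\mu$, so first moments of $f$ can be estimated by dividing empirical averages by $\mu$.

```python
import random
from typing import Dict, List, Tuple

BitString = Tuple[int, ...]


def noisy_sample(f: Dict[BitString, float], mu: float, rng: random.Random) -> BitString:
    """Draw x ~ f, then flip each bit of x independently with probability (1 - mu) / 2."""
    strings = list(f.keys())
    x = rng.choices(strings, weights=[f[s] for s in strings], k=1)[0]
    flip_p = (1.0 - mu) / 2.0
    return tuple(b ^ 1 if rng.random() < flip_p else b for b in x)


def debiased_bit_means(samples: List[BitString], mu: float) -> List[float]:
    """Estimate E_f[(-1)^{x_i}] for every coordinate i from noisy samples (requires mu > 0).

    The noise satisfies E[(-1)^{y_i} | x] = mu * (-1)^{x_i}, so dividing the empirical
    average of (-1)^{y_i} by mu undoes the attenuation.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    n = len(samples[0])
    totals = [0.0] * n
    for y in samples:
        for i, b in enumerate(y):
            totals[i] += 1.0 if b == 0 else -1.0
    return [t / (len(samples) * mu) for t in totals]


# Hypothetical example: a support of size k = 2 on strings of length n = 3.
rng = random.Random(0)
f = {(0, 0, 1): 0.7, (1, 1, 1): 0.3}
samples = [noisy_sample(f, mu=0.5, rng=rng) for _ in range(20000)]
print(debiased_bit_means(samples, mu=0.5))  # close to the true values [0.4, 0.4, -1.0]
```

First moments alone do not determine $f$ in general, so this is far from a full recovery procedure; it only shows how the noise attenuates each coordinate by a factor of $\mu$.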

arXiv.org e-Print Archive

Crossref

Phase Transitions in the Pooled Data Problem

Authors: Volkan Cevher, Jonathan Scarlett
Publication date: 05/09/2017

In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a phase transition between complete success and complete failure. In addition, we present a novel noisy variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an approximate recovery setting, where a given number of errors is allowed in the decoded labels.
Comment: Accepted to NIPS 201
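To make the measurement model concrete, the Python sketch below implements a noiseless pooled test (it reveals how many items in the pool carry each label) together with a brute-force decoder that lists every labeling consistent with the test outcomes. The random pool design, in which each item joins each pool with probability 1/2, and all function names are illustrative assumptions, not the paper's test design or decoder; brute force is only feasible for a handful of items.

```python
import itertools
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pooled_test(labels: Sequence[str], pool: Sequence[int]) -> Dict[str, int]:
    """Noiseless pooled test: the count of each label among the items in `pool`."""
    return dict(Counter(labels[i] for i in pool))


def random_pools(n: int, m: int, rng: random.Random) -> List[List[int]]:
    """Assumed design: each of n items joins each of m pools independently with prob. 1/2."""
    return [[i for i in range(n) if rng.random() < 0.5] for _ in range(m)]


def consistent_labelings(
    n: int, alphabet: str, pools: List[List[int]], outcomes: List[Dict[str, int]]
) -> List[Tuple[str, ...]]:
    """Brute-force decoding: every labeling that reproduces all observed test outcomes."""
    return [
        cand
        for cand in itertools.product(alphabet, repeat=n)
        if all(pooled_test(cand, p) == o for p, o in zip(pools, outcomes))
    ]


rng = random.Random(0)
truth = ("a", "b", "a", "c", "b", "a")
pools = random_pools(len(truth), m=8, rng=rng)
outcomes = [pooled_test(truth, p) for p in pools]
solutions = consistent_labelings(len(truth), "abc", pools, outcomes)
# The true labeling is always consistent; recovery is exact iff it is the only one.
print(truth in solutions, len(solutions))
```

Varying `m` in this toy setup shows the qualitative question the paper answers asymptotically: with too few tests several labelings remain consistent, and with enough tests only the true one survives.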

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Non-Asymptotic Analysis of Tangent Space Perturbation

Authors: Daniel N. Kaslovsky, Francois G. Meyer
Publication date: 05/12/2013

Constructing an efficient parameterization of a large, noisy data set of points lying close to a smooth manifold in high dimension remains a fundamental problem. One approach consists of recovering a local parameterization using the local tangent plane. Principal component analysis (PCA) is often the tool of choice, as it returns an optimal basis in the case of noise-free samples from a linear subspace. To process noisy data samples from a nonlinear manifold, PCA must be applied locally, at a scale small enough that the manifold is approximately linear, but large enough that structure may be discerned from noise. Using eigenspace perturbation theory and non-asymptotic random matrix theory, we study the stability of the subspace estimated by PCA as a function of scale, and bound (with high probability) the angle it forms with the true tangent space. By adaptively selecting the scale that minimizes this bound, our analysis reveals an appropriate scale for local tangent plane recovery. We also introduce a geometric uncertainty principle quantifying the limits of noise-curvature perturbation for stable recovery. With the purpose of providing perturbation bounds that can be used in practice, we propose plug-in estimates that make it possible to directly apply the theoretical results to real data sets.
Comment: 53 pages. Revised manuscript with new content addressing application of results to real data set
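The basic local PCA step can be sketched as follows: collect the samples within a chosen radius of a point, take the top $d$ principal directions of the centered neighborhood, and measure the largest principal angle to a reference subspace. This Python/NumPy sketch assumes the intrinsic dimension $d$ and the radius are given; it does not implement the paper's perturbation bound, its adaptive scale selection, or its plug-in estimates, and the function names and circle example are hypothetical.

```python
import numpy as np


def local_tangent_basis(points: np.ndarray, center: np.ndarray, radius: float, d: int) -> np.ndarray:
    """Local PCA: orthonormal basis (ambient_dim x d) of the top-d principal directions
    of the points lying within `radius` of `center`."""
    nbrs = points[np.linalg.norm(points - center, axis=1) <= radius]
    if len(nbrs) <= d:
        raise ValueError("too few points in the neighborhood; increase the radius")
    centered = nbrs - nbrs.mean(axis=0)
    # The right singular vectors of the centered data are the eigenvectors of the
    # sample covariance, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:d].T


def largest_principal_angle(U: np.ndarray, V: np.ndarray) -> float:
    """Largest principal angle (radians) between the column spaces of orthonormal U and V,
    both with d columns: the cosines of the principal angles are the singular values of U^T V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))


# Hypothetical example: noisy samples from the unit circle (a 1-dimensional manifold in R^2).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 2000)
pts = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.standard_normal((2000, 2))
center = np.array([1.0, 0.0])
true_tangent = np.array([[0.0], [1.0]])  # tangent line to the circle at (1, 0)
for r in (0.05, 0.2, 0.8):
    U = local_tangent_basis(pts, center, r, d=1)
    print(f"radius={r}: angle={largest_principal_angle(U, true_tangent):.4f} rad")
```

The radius plays the role of the scale discussed in the abstract; the paper's contribution is a bound on this angle that can guide the choice of radius, which the sketch does not attempt.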

arXiv.org e-Print Archive

CiteSeerX