A Polynomial Time Algorithm for Lossy Population Recovery
We give a polynomial time algorithm for the lossy population recovery
problem. In this problem, the goal is to approximately learn an unknown
distribution on binary strings of length $n$ from lossy samples: for some
parameter $\mu$ each coordinate of the sample is preserved with probability
$\mu$ and otherwise is replaced by a `?'. The running time and number of
samples needed for our algorithm are polynomial in $n$ and $1/\varepsilon$ for
each fixed $\mu > 0$. This improves on the algorithm of Wigderson and Yehudayoff that
runs in quasi-polynomial time for any $\mu > 0$ and the polynomial time
algorithm of Dvir et al. which was shown to work for $\mu \gtrapprox 0.30$ by
Batman et al. In fact, our algorithm also works in the more general framework
of Batman et al. in which there is no a priori bound on the size of the support
of the distribution. The algorithm we analyze is implicit in previous work; our
main contribution is to analyze the algorithm by showing (via linear
programming duality and connections to complex analysis) that a certain matrix
associated with the problem has a robust local inverse even though its
condition number is exponentially small. A corollary of our result is the first
polynomial time algorithm for learning DNFs in the restriction access model of
Dvir et al.
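The lossy sampling model described in this abstract can be illustrated with a minimal sketch. It only simulates how samples are generated and is not the recovery algorithm of the paper. The function names, the dictionary representation of the distribution, and the name `mu` for the preservation probability are assumptions made for illustration.

```python
import random


def lossy_sample(x, mu, rng):
    """Keep each coordinate of x with probability mu; otherwise emit '?'."""
    return "".join(c if rng.random() < mu else "?" for c in x)


def draw_lossy_samples(dist, mu, count, seed=0):
    """Draw `count` lossy samples.

    `dist` is a hypothetical representation of the unknown distribution:
    a dict mapping binary strings of equal length to their probabilities.
    """
    rng = random.Random(seed)
    strings = list(dist)
    weights = [dist[s] for s in strings]
    samples = []
    for _ in range(count):
        x = rng.choices(strings, weights)[0]  # draw a string from the distribution
        samples.append(lossy_sample(x, mu, rng))  # erase coordinates independently
    return samples
```

With `mu = 1.0` every sample is an unmodified string from the support, and with `mu = 0.0` every coordinate is erased. The recovery problem concerns the regime in between.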
Noisy population recovery in polynomial time
In the noisy population recovery problem of Dvir et al., the goal is to learn
an unknown distribution $f$ on binary strings of length $n$ from noisy samples.
For some parameter $\mu \in [0,1]$, a noisy sample is generated by flipping
each coordinate of a sample from $f$ independently with probability
$(1-\mu)/2$. We assume an upper bound $k$ on the size of the support of the
distribution, and the goal is to estimate the probability of any string to
within some given error $\varepsilon$. It is known that the algorithmic
complexity and sample complexity of this problem are polynomially related to
each other.
We show that for $\mu > 0$, the sample complexity (and hence the algorithmic
complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$,
improving upon the previous best result of $\mathsf{poly}(k^{\log\log k}, n, 1/\varepsilon)$ due to Lovett and Zhang.
Our proof combines ideas from Lovett and Zhang with a \emph{noise attenuated}
version of M\"{o}bius inversion. In turn, the latter crucially uses the
construction of \emph{robust local inverse} due to Moitra and Saks.
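The noisy sampling model in this abstract can be sketched in the same spirit. Here each coordinate of a drawn string is flipped independently with a probability passed in as `flip_prob`. The function names and the dictionary representation of the distribution are assumptions made for illustration, and the sketch covers only sample generation, not the estimation procedure.

```python
import random


def noisy_sample(x, flip_prob, rng):
    """Flip each bit of x independently with probability flip_prob."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[c] if rng.random() < flip_prob else c for c in x)


def draw_noisy_samples(dist, flip_prob, count, seed=0):
    """Draw `count` noisy samples.

    `dist` is a hypothetical representation of the unknown distribution:
    a dict mapping binary strings of equal length to their probabilities.
    """
    rng = random.Random(seed)
    strings = list(dist)
    weights = [dist[s] for s in strings]
    return [
        noisy_sample(rng.choices(strings, weights)[0], flip_prob, rng)
        for _ in range(count)
    ]
```

A flip probability of 0 returns the drawn strings unchanged, while a flip probability of 1/2 makes every sample uniformly random and therefore uninformative.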
Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions
A number of recent works have considered the trace reconstruction problem, in which an unknown source string x in {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the asymptotically best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [De et al., 2017; Fedor Nazarov and Yuval Peres, 2017], several highly efficient algorithms are known [Yuval Peres and Alex Zhai, 2017; Nina Holden et al., 2018] for the average-case version of the problem, in which the source string x is chosen uniformly at random from {0,1}^n. In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string there is an unknown distribution over s unknown source strings x^1,...,x^s in {0,1}^n, and each sample given to the algorithm is independently generated by drawing some x^i from this distribution and returning an independent trace of x^i. Building on the results of [Yuval Peres and Alex Zhai, 2017] and [Nina Holden et al., 2018], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size 1 <= s <= exp(Theta(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x^1,...,x^s} subset {0,1}^n, for every distribution D supported on {x^1,...,x^s}, our algorithm can efficiently recover D up to total variation distance at most epsilon with high probability, given access to independent traces of independent draws from D. The running time of our algorithm is poly(n,s,1/epsilon) and its sample complexity is poly (s,1/epsilon,exp(log^{1/3} n)). 
This polynomial dependence on the support size s is in sharp contrast with the worst-case version of the problem (when x^1,...,x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [Frank Ban et al., 2019] is doubly exponential in s.
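A trace of a source string can be simulated with a short sketch. The abstract only says the channel may randomly delete coordinates or insert random bits, so the specific channel below is an assumption: each source bit is deleted independently with probability `delete_prob`, and before each source bit a uniformly random bit is inserted repeatedly, each time with probability `insert_prob`. All names are hypothetical.

```python
import random


def trace(x, delete_prob, insert_prob, rng):
    """Pass the binary string x through an assumed insertion/deletion channel.

    insert_prob must be strictly less than 1 so the insertion loop terminates.
    """
    out = []
    for c in x:
        # Insert uniformly random bits before this position.
        while rng.random() < insert_prob:
            out.append(rng.choice("01"))
        # Keep the source bit unless it is deleted.
        if rng.random() >= delete_prob:
            out.append(c)
    return "".join(out)


def draw_traces(dist, delete_prob, insert_prob, count, seed=0):
    """Each sample is an independent trace of an independent draw from dist,
    where dist maps source strings to probabilities (hypothetical format)."""
    rng = random.Random(seed)
    strings = list(dist)
    weights = [dist[s] for s in strings]
    return [
        trace(rng.choices(strings, weights)[0], delete_prob, insert_prob, rng)
        for _ in range(count)
    ]
```

Unlike lossy or noisy samples, traces lose positional alignment with the source string, so their lengths vary from sample to sample.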
Super-resolution, Extremal Functions and the Condition Number of Vandermonde Matrices
Super-resolution is a fundamental task in imaging, where the goal is to
extract fine-grained structure from coarse-grained measurements. Here we are
interested in a popular mathematical abstraction of this problem that has been
widely studied in the statistics, signal processing and machine learning
communities. We exactly resolve the threshold at which noisy super-resolution
is possible. In particular, we establish a sharp phase transition for the
relationship between the cutoff frequency ($m$) and the separation ($\Delta$).
If $m > 1/\Delta + 1$, our estimator converges to the true values at an inverse
polynomial rate in terms of the magnitude of the noise. And when $m < (1-\epsilon)/\Delta$, no estimator can distinguish between a particular pair of
$\Delta$-separated signals even if the magnitude of the noise is exponentially
small.
Our results involve making novel connections between {\em extremal functions}
and the spectral properties of Vandermonde matrices. We establish a sharp phase
transition for their condition number which in turn allows us to give the first
noise tolerance bounds for the matrix pencil method. Moreover we show that our
methods can be interpreted as giving preconditioners for Vandermonde matrices,
and we use this observation to design faster algorithms for super-resolution.
We believe that these ideas may have other applications in designing faster
algorithms for other basic tasks in signal processing.
Comment: 19 pages
Finding heavy hitters from lossy or noisy data
Abstract. Motivated by Dvir et al. and Wigderson and Yehudayoff [3
Noise-Resilient Group Testing: Limitations and Constructions
We study combinatorial group testing schemes for learning $d$-sparse Boolean
vectors using highly unreliable disjunctive measurements. We consider an
adversarial noise model that only limits the number of false observations, and
show that any noise-resilient scheme in this model can only approximately
reconstruct the sparse vector. On the positive side, we take this barrier to
our advantage and show that approximate reconstruction (within a satisfactory
degree of approximation) allows us to break the information theoretic lower
bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of
$d$-sparse vectors of length $n$ via non-adaptive measurements, by a
multiplicative factor $\tilde{\Omega}(d)$.
Specifically, we give simple randomized constructions of non-adaptive
measurement schemes, with $m = O(d \log n)$ measurements, that allow efficient
reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the
presence of $\delta m$ false positives and $O(m/d)$ false negatives within the
measurement outcomes, for any constant $\delta < 1$. We show that, information
theoretically, none of these parameters can be substantially improved without
dramatically affecting the others. Furthermore, we obtain several explicit
constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions
that allow fast reconstruction in time $\poly(m)$, which would be sublinear in
$n$ for sufficiently sparse vectors. The main tool used in our construction is
the list-decoding view of randomness condensers and extractors.
Comment: Full version. A preliminary summary of this work appears (under the
same title) in proceedings of the 17th International Symposium on
Fundamentals of Computation Theory (FCT 2009).