
    A Polynomial Time Algorithm for Lossy Population Recovery

    We give a polynomial time algorithm for the lossy population recovery problem. In this problem, the goal is to approximately learn an unknown distribution on binary strings of length $n$ from lossy samples: for some parameter $\mu$, each coordinate of the sample is preserved with probability $\mu$ and otherwise is replaced by a `?'. The running time and number of samples needed for our algorithm are polynomial in $n$ and $1/\varepsilon$ for each fixed $\mu > 0$. This improves on the algorithm of Wigderson and Yehudayoff, which runs in quasi-polynomial time for any $\mu > 0$, and on the polynomial time algorithm of Dvir et al., which was shown to work for $\mu \gtrapprox 0.30$ by Batman et al. In fact, our algorithm also works in the more general framework of Batman et al., in which there is no a priori bound on the size of the support of the distribution. The algorithm we analyze is implicit in previous work; our main contribution is to analyze the algorithm by showing (via linear programming duality and connections to complex analysis) that a certain matrix associated with the problem has a robust local inverse even though its condition number is exponentially small. A corollary of our result is the first polynomial time algorithm for learning DNFs in the restriction access model of Dvir et al.
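The lossy sampling model described in this abstract can be sketched as follows. This is only an illustration of how samples are generated, not the recovery algorithm from the paper; the function names, the dictionary representation of the distribution, and the `seed` parameter are assumptions introduced here.

```python
import random

def lossy_sample(x, mu, rng):
    """Keep each coordinate of x with probability mu; otherwise replace it with '?'."""
    return "".join(c if rng.random() < mu else "?" for c in x)

def draw_lossy_samples(distribution, mu, num_samples, seed=0):
    """Draw strings from `distribution` (a dict: string -> probability)
    and pass each one through the lossy channel.

    The dict representation and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    strings = list(distribution)
    weights = [distribution[s] for s in strings]
    samples = []
    for _ in range(num_samples):
        x = rng.choices(strings, weights)[0]
        samples.append(lossy_sample(x, mu, rng))
    return samples
```

For example, `draw_lossy_samples({"0110": 0.7, "1001": 0.3}, mu=0.5, num_samples=5)` returns five strings of length 4 over the alphabet `0`, `1`, `?`.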

    Noisy population recovery in polynomial time

    In the noisy population recovery problem of Dvir et al., the goal is to learn an unknown distribution $f$ on binary strings of length $n$ from noisy samples. For some parameter $\mu \in [0,1]$, a noisy sample is generated by flipping each coordinate of a sample from $f$ independently with probability $(1-\mu)/2$. We assume an upper bound $k$ on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error $\varepsilon$. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We show that for $\mu > 0$, the sample complexity (and hence the algorithmic complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$, improving upon the previous best result of $\mathsf{poly}(k^{\log\log k}, n, 1/\varepsilon)$ due to Lovett and Zhang. Our proof combines ideas from Lovett and Zhang with a noise attenuated version of Möbius inversion. In turn, the latter crucially uses the construction of a robust local inverse due to Moitra and Saks.
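The noise model in this abstract can be sketched in a few lines, together with the elementary debiasing identity for a single coordinate: if a bit is flipped with probability $(1-\mu)/2$, then $\Pr[y_i = 1] = (1-\mu)/2 + \mu \Pr[x_i = 1]$. This is an illustrative sketch, not the paper's algorithm (which estimates probabilities of whole strings); the function names are assumptions introduced here.

```python
import random

def noisy_sample(x, mu, rng):
    """Flip each bit of x independently with probability (1 - mu) / 2."""
    flip_prob = (1.0 - mu) / 2.0
    return "".join(
        ("1" if c == "0" else "0") if rng.random() < flip_prob else c
        for c in x
    )

def debiased_bit_mean(samples, i, mu):
    """Estimate Pr[x_i = 1] from noisy samples by inverting
    Pr[y_i = 1] = (1 - mu)/2 + mu * Pr[x_i = 1].  Requires mu > 0."""
    observed = sum(s[i] == "1" for s in samples) / len(samples)
    return (observed - (1.0 - mu) / 2.0) / mu
```

The division by $\mu$ shows why small $\mu$ is hard even for one coordinate: the sampling error in `observed` is amplified by a factor of $1/\mu$.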

    Efficient Average-Case Population Recovery in the Presence of Insertions and Deletions

    A number of recent works have considered the trace reconstruction problem, in which an unknown source string x in {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the asymptotically best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [De et al., 2017; Fedor Nazarov and Yuval Peres, 2017], several highly efficient algorithms are known [Yuval Peres and Alex Zhai, 2017; Nina Holden et al., 2018] for the average-case version of the problem, in which the source string x is chosen uniformly at random from {0,1}^n. In this paper we consider a generalization of the above-described average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, rather than a single unknown source string there is an unknown distribution over s unknown source strings x^1,...,x^s in {0,1}^n, and each sample given to the algorithm is independently generated by drawing some x^i from this distribution and returning an independent trace of x^i. Building on the results of [Yuval Peres and Alex Zhai, 2017] and [Nina Holden et al., 2018], we give an efficient algorithm for the average-case population recovery problem in the presence of insertions and deletions. For any support size 1 <= s <= exp(Theta(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x^1,...,x^s} subset {0,1}^n, for every distribution D supported on {x^1,...,x^s}, our algorithm can efficiently recover D up to total variation distance at most epsilon with high probability, given access to independent traces of independent draws from D. The running time of our algorithm is poly(n,s,1/epsilon) and its sample complexity is poly(s,1/epsilon,exp(log^{1/3} n)). This polynomial dependence on the support size s is in sharp contrast with the worst-case version of the problem (when x^1,...,x^s may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [Frank Ban et al., 2019] is doubly exponential in s.
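The channel described in this abstract can be illustrated with a small simulation. The abstract does not fix the exact channel parameters, so the model below is an assumption: each source bit is deleted independently with probability `delete_prob`, and before each source bit a uniformly random bit is inserted repeatedly while a coin with bias `insert_prob` comes up heads. All names are hypothetical.

```python
import random

def trace(x, delete_prob, insert_prob, rng):
    """Return one trace of the bit string x through an assumed
    insertion/deletion channel (insert_prob must be < 1 to terminate)."""
    out = []
    for c in x:
        # Assumed insertion rule: a geometric number of random bits.
        while rng.random() < insert_prob:
            out.append(rng.choice("01"))
        # Each source bit survives with probability 1 - delete_prob.
        if rng.random() >= delete_prob:
            out.append(c)
    return "".join(out)

def population_traces(strings, weights, delete_prob, insert_prob, count, seed=0):
    """Each sample: draw a source string from the population, return a trace of it."""
    rng = random.Random(seed)
    return [
        trace(rng.choices(strings, weights)[0], delete_prob, insert_prob, rng)
        for _ in range(count)
    ]
```

With `insert_prob=0` every trace is a subsequence of its source string, which is the pure deletion channel.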

    Super-resolution, Extremal Functions and the Condition Number of Vandermonde Matrices

    Super-resolution is a fundamental task in imaging, where the goal is to extract fine-grained structure from coarse-grained measurements. Here we are interested in a popular mathematical abstraction of this problem that has been widely studied in the statistics, signal processing and machine learning communities. We exactly resolve the threshold at which noisy super-resolution is possible. In particular, we establish a sharp phase transition for the relationship between the cutoff frequency ($m$) and the separation ($\Delta$). If $m > 1/\Delta + 1$, our estimator converges to the true values at an inverse polynomial rate in terms of the magnitude of the noise. And when $m < (1-\epsilon)/\Delta$, no estimator can distinguish between a particular pair of $\Delta$-separated signals even if the magnitude of the noise is exponentially small. Our results involve making novel connections between extremal functions and the spectral properties of Vandermonde matrices. We establish a sharp phase transition for their condition number, which in turn allows us to give the first noise tolerance bounds for the matrix pencil method. Moreover, we show that our methods can be interpreted as giving preconditioners for Vandermonde matrices, and we use this observation to design faster algorithms for super-resolution. We believe that these ideas may have other applications in designing faster algorithms for other basic tasks in signal processing. Comment: 19 pages

    Finding heavy hitters from lossy or noisy data

    Abstract. Motivated by Dvir et al. and Wigderson and Yehudayoff [3

    Noise-Resilient Group Testing: Limitations and Constructions

    We study combinatorial group testing schemes for learning $d$-sparse Boolean vectors using highly unreliable disjunctive measurements. We consider an adversarial noise model that only limits the number of false observations, and show that any noise-resilient scheme in this model can only approximately reconstruct the sparse vector. On the positive side, we take this barrier to our advantage and show that approximate reconstruction (within a satisfactory degree of approximation) allows us to break the information theoretic lower bound of $\tilde{\Omega}(d^2 \log n)$ that is known for exact reconstruction of $d$-sparse vectors of length $n$ via non-adaptive measurements, by a multiplicative factor $\tilde{\Omega}(d)$. Specifically, we give simple randomized constructions of non-adaptive measurement schemes, with $m = O(d \log n)$ measurements, that allow efficient reconstruction of $d$-sparse vectors up to $O(d)$ false positives even in the presence of $\delta m$ false positives and $O(m/d)$ false negatives within the measurement outcomes, for any constant $\delta < 1$. We show that, information theoretically, none of these parameters can be substantially improved without dramatically affecting the others. Furthermore, we obtain several explicit constructions, in particular one matching the randomized trade-off but using $m = O(d^{1+o(1)} \log n)$ measurements. We also obtain explicit constructions that allow fast reconstruction in time $\mathrm{poly}(m)$, which would be sublinear in $n$ for sufficiently sparse vectors. The main tool used in our construction is the list-decoding view of randomness condensers and extractors. Comment: Full version. A preliminary summary of this work appears (under the same title) in proceedings of the 17th International Symposium on Fundamentals of Computation Theory (FCT 2009).
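For readers unfamiliar with disjunctive measurements, the noise-free setting can be sketched as below. This is not the construction from the paper: the random design (each item joins each test with probability $1/d$) and the decoder that discards every item appearing in a negative test are standard textbook baselines, assumed here for illustration, and all names are hypothetical.

```python
import random

def random_design(n, m, d, rng):
    """m non-adaptive tests over n items; each item joins each test
    independently with probability 1/d (an assumed baseline design)."""
    return [[i for i in range(n) if rng.random() < 1.0 / d] for _ in range(m)]

def measure(tests, support):
    """Disjunctive measurement: a test is positive iff it contains
    at least one item of the sparse support."""
    return [any(i in support for i in t) for t in tests]

def decode_by_elimination(n, tests, outcomes):
    """Discard every item that appears in some negative test.
    Without noise, the result always contains the true support,
    possibly together with some false positives."""
    candidates = set(range(n))
    for t, positive in zip(tests, outcomes):
        if not positive:
            candidates.difference_update(t)
    return candidates
```

A single false negative among the outcomes can eliminate a true item under this decoder, which is why noise-resilient schemes of the kind studied in the abstract need a different approach.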