    Phase Transitions in the Pooled Data Problem

    In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a phase transition between complete success and complete failure. In addition, we present a novel noisy variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an approximate recovery setting, where a given number of errors is allowed in the decoded labels. Comment: Accepted to NIPS 201
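The noiseless measurement model described above can be sketched in a few lines. This is only an illustration, not the paper's method: the function names `pooled_test` and `run_tests` are hypothetical, and placing each item in a pool independently with probability 1/2 is an assumption made here, since the abstract does not state the pool design.

```python
import random
from collections import Counter


def pooled_test(labels, pool):
    """Return the count of each label among the items indexed by `pool`."""
    return Counter(labels[i] for i in pool)


def run_tests(labels, num_tests, rng):
    """Run `num_tests` noiseless pooled tests.

    Assumption (not from the abstract): each item joins each pool
    independently with probability 1/2.
    """
    n = len(labels)
    results = []
    for _ in range(num_tests):
        pool = [i for i in range(n) if rng.random() < 0.5]
        results.append((pool, pooled_test(labels, pool)))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [rng.randrange(3) for _ in range(10)]  # 10 items, 3 labels
    for pool, counts in run_tests(labels, num_tests=4, rng=rng):
        print(sorted(pool), dict(counts))
```

Each test reveals only the per-label counts within the pool, never which item carries which label; the decoder has to recover `labels` from these counts alone.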

    Fundamental limits of symmetric low-rank matrix estimation

    We consider the high-dimensional inference problem where the signal is a low-rank symmetric matrix which is corrupted by an additive Gaussian noise. Given a probabilistic model for the low-rank matrix, we compute the limit in the large dimension setting for the mutual information between the signal and the observations, as well as the matrix minimum mean square error, while the rank of the signal remains constant. We also show that our model extends beyond the particular case of additive Gaussian noise and we prove a universality result connecting the community detection problem to our Gaussian framework. We unify and generalize a number of recent works on PCA, sparse PCA, submatrix localization or community detection by computing the information-theoretic limits for these problems in the high noise regime. In addition, we show that the posterior distribution of the signal given the observations is characterized by a parameter of the same dimension as the square of the rank of the signal (i.e. scalar in the case of rank one). Finally, we connect our work with the hard but detectable conjecture in statistical physics.

    Testing Conditional Independence of Discrete Distributions

    We study the problem of testing conditional independence for discrete distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1]\times[\ell_2]\times[n]$, we want to distinguish, with probability at least $2/3$, between the case that $X$ and $Y$ are conditionally independent given $Z$ and the case that $(X, Y, Z)$ is $\epsilon$-far, in $\ell_1$-distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics with a range of applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics communities for nearly a century. Perhaps surprisingly, this problem has not been previously considered in the framework of distribution property testing and in particular no tester with sublinear sample complexity is known, even for the important special case that the domains of $X$ and $Y$ are binary. The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity for discrete distributions over $[\ell_1]\times[\ell_2]\times[n]$. To complement our upper bounds, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings. Specifically, for the prototypical setting when $\ell_1, \ell_2 = O(1)$, we show that the sample complexity of testing conditional independence (upper bound and matching lower bound) is \[ \Theta\left(\max\left(n^{1/2}/\epsilon^2,\min\left(n^{7/8}/\epsilon,n^{6/7}/\epsilon^{8/7}\right)\right)\right)\,. \]
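To get a feel for which term of the bound above dominates, the expression inside the $\Theta(\cdot)$ can be evaluated numerically. The sketch below is illustrative only: the function name `ci_rate` is hypothetical, and because the hidden constant in $\Theta(\cdot)$ is ignored, the values are growth rates and not actual sample counts.

```python
import math


def ci_rate(n, eps):
    """Evaluate max(n^{1/2}/eps^2, min(n^{7/8}/eps, n^{6/7}/eps^{8/7})).

    This is only the expression inside Theta(.); constant factors are
    ignored, so the result is a growth rate, not a sample count.
    """
    first = math.sqrt(n) / eps ** 2
    second = min(n ** (7 / 8) / eps, n ** (6 / 7) / eps ** (8 / 7))
    return max(first, second)


if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6):
        for eps in (0.5, 0.1, 0.01):
            print(f"n={n:>8} eps={eps:<5} rate={ci_rate(n, eps):.3e}")
```

For constant $\epsilon$ the $\min(\cdot)$ term dominates and grows strictly slower than $n$, which is the sublinear behavior the abstract highlights; for very small $\epsilon$ relative to $n$, the $n^{1/2}/\epsilon^2$ term takes over.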