Phase Transitions in the Pooled Data Problem
In this paper, we study the pooled data problem of identifying the labels
associated with a large collection of items, based on a sequence of pooled
tests revealing the counts of each label within the pool. In the noiseless
setting, we identify an exact asymptotic threshold on the required number of
tests with optimal decoding, and prove a phase transition between complete
success and complete failure. In addition, we present a novel noisy variation
of the problem, and provide an information-theoretic framework for
characterizing the required number of tests for general random noise models.
Our results reveal that noise can make the problem considerably more difficult,
with strict increases in the scaling laws even at low noise levels. Finally, we
demonstrate similar behavior in an approximate recovery setting, where a given
number of errors is allowed in the decoded labels.
Comment: Accepted to NIPS 201
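The noiseless observation model described above can be illustrated with a small simulation. The sketch below is our own toy setup (the item count, label count, and pool choice are illustrative, not from the paper): each item carries one of d labels, and a pooled test on a subset of items reveals only the count of each label within that pool.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: p items, each carrying one of d labels (illustrative sizes).
p, d = 100, 3
labels = rng.integers(0, d, size=p)

def pooled_test(pool_indices):
    """Noiseless pooled test: the count of each label within the pool."""
    return np.bincount(labels[pool_indices], minlength=d)

# One random pool of half the items; the test reveals d counts that
# sum to the pool size, but not which item has which label.
pool = rng.choice(p, size=p // 2, replace=False)
counts = pooled_test(pool)
print(counts)
```

The decoding problem the paper studies is the inverse task: recover the full label vector from a sequence of such count observations.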
Fundamental limits of symmetric low-rank matrix estimation
We consider the high-dimensional inference problem where the signal is a
low-rank symmetric matrix which is corrupted by an additive Gaussian noise.
Given a probabilistic model for the low-rank matrix, we compute the limit in
the large dimension setting for the mutual information between the signal and
the observations, as well as the matrix minimum mean square error, while the
rank of the signal remains constant. We also show that our model extends beyond
the particular case of additive Gaussian noise and we prove a universality
result connecting the community detection problem to our Gaussian framework. We
unify and generalize a number of recent works on PCA, sparse PCA, submatrix
localization or community detection by computing the information-theoretic
limits for these problems in the high noise regime. In addition, we show that
the posterior distribution of the signal given the observations is
characterized by a parameter of the same dimension as the square of the rank of
the signal (i.e. scalar in the case of rank one). Finally, we connect our work
with the hard but detectable conjecture in statistical physics.
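For intuition, the rank-one instance of this observation model can be simulated directly. The sketch below is our own illustration (the Rademacher prior, the variable names, and the `snr` parameter are assumptions for the example, not the paper's notation): a symmetric low-rank signal is observed through additive symmetric Gaussian noise, and PCA estimates the spike via the top eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rank-one spiked model (illustrative): signal x x^T, Rademacher x,
# observed at signal-to-noise ratio `snr` through symmetric Gaussian noise.
n, snr = 500, 2.0
x = rng.choice([-1.0, 1.0], size=n)
noise = rng.normal(size=(n, n))
noise = (noise + noise.T) / np.sqrt(2)          # symmetric Gaussian noise
Y = np.sqrt(snr / n) * np.outer(x, x) + noise   # low-rank signal + noise

# PCA estimate: squared overlap of the top eigenvector with the spike,
# normalized to [0, 1]; it is bounded away from 0 when snr > 1.
eigvals, eigvecs = np.linalg.eigh(Y)
v = eigvecs[:, -1]
overlap = (v @ x) ** 2 / n
print(round(overlap, 3))
```

The paper's information-theoretic results characterize, among other things, the best achievable mean square error in this kind of model, against which spectral estimates like the one above can be compared.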
Testing Conditional Independence of Discrete Distributions
We study the problem of testing \emph{conditional independence} for discrete
distributions. Specifically, given samples from a discrete random variable
$(X, Y, Z)$ on domain $[\ell_1] \times [\ell_2] \times [n]$, we want to distinguish,
with probability at least $2/3$, between the case that $X$ and $Y$ are
conditionally independent given $Z$ from the case that $(X, Y, Z)$ is
$\epsilon$-far, in $\ell_1$-distance, from every distribution that has this
property. Conditional independence is a concept of central importance in
probability and statistics with a range of applications in various scientific
domains. As such, the statistical task of testing conditional independence has
been extensively studied in various forms within the statistics and
econometrics communities for nearly a century. Perhaps surprisingly, this
problem has not been previously considered in the framework of distribution
property testing and in particular no tester with sublinear sample complexity
is known, even for the important special case that the domains of $X$ and $Y$
are binary.
The main algorithmic result of this work is the first conditional
independence tester with {\em sublinear} sample complexity for discrete
distributions over $[\ell_1] \times [\ell_2] \times [n]$. To complement our upper
bounds, we prove information-theoretic lower bounds establishing that the
sample complexity of our algorithm is optimal, up to constant factors, for a
number of settings. Specifically, for the prototypical setting when
$\ell_1 = \ell_2 = 2$, we show that the sample complexity of testing conditional
independence (upper bound and matching lower bound) is
\[
\Theta\left({\max\left(n^{1/2}/\epsilon^2,\min\left(n^{7/8}/\epsilon,n^{6/7}/\epsilon^{8/7}\right)\right)}\right)\,.
\]
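The displayed bound can be evaluated numerically to see which term dominates in different regimes. The helper below is our own illustration (the function name is hypothetical); it returns the argument of the $\Theta(\cdot)$ above.

```python
# Evaluate max(n^{1/2}/eps^2, min(n^{7/8}/eps, n^{6/7}/eps^{8/7})),
# the argument of the Theta(.) in the displayed sample-complexity bound.
def ci_sample_complexity(n, eps):
    term1 = n ** 0.5 / eps ** 2
    term2 = min(n ** (7 / 8) / eps, n ** (6 / 7) / eps ** (8 / 7))
    return max(term1, term2)

# For very small eps the eps^{-2} term dominates; for moderate eps
# the min(...) term does.
print(ci_sample_complexity(10**6, 0.1))
```

Note that all three exponents of $n$ are strictly below 1, so the tester is sublinear in the domain size $n$ for any fixed $\epsilon$.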