Fundamental limits of symmetric low-rank matrix estimation
We consider the high-dimensional inference problem where the signal is a
low-rank symmetric matrix which is corrupted by an additive Gaussian noise.
Given a probabilistic model for the low-rank matrix, we compute the limit in
the large dimension setting for the mutual information between the signal and
the observations, as well as the matrix minimum mean square error, while the
rank of the signal remains constant. We also show that our model extends beyond
the particular case of additive Gaussian noise and we prove a universality
result connecting the community detection problem to our Gaussian framework. We
unify and generalize a number of recent works on PCA, sparse PCA, submatrix
localization or community detection by computing the information-theoretic
limits for these problems in the high noise regime. In addition, we show that
the posterior distribution of the signal given the observations is
characterized by a parameter of the same dimension as the square of the rank of
the signal (i.e. scalar in the case of rank one). Finally, we connect our work
with the hard but detectable conjecture in statistical physics.
Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula
Factorizing low-rank matrices has many applications in machine learning and
statistics. For probabilistic models in the Bayes optimal setting, a general
expression for the mutual information has been proposed using heuristic
statistical physics computations, and proven in a few specific cases. Here, we
show how to rigorously prove the conjectured formula for the symmetric rank-one
case. This allows us to express the minimal mean-square-error and to characterize
the detectability phase transitions in a large set of estimation problems
ranging from community detection to sparse PCA. We also show that for a large
set of parameters, an iterative algorithm called approximate message-passing is
Bayes optimal. There exists, however, a gap between what currently known
polynomial algorithms can do and what is expected information theoretically.
Additionally, the proof technique has an interest of its own and exploits three
essential ingredients: the interpolation method introduced in statistical
physics by Guerra, the analysis of the approximate message-passing algorithm
and the theory of spatial coupling and threshold saturation in coding. Our
approach is generic and applicable to other open problems in statistical
estimation where heuristic statistical physics predictions are available.
The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models
In this contribution we give a pedagogic introduction to the newly introduced
adaptive interpolation method to prove in a simple and unified way replica
formulas for Bayesian optimal inference problems. Many aspects of this method
can already be explained at the level of the simple Curie-Weiss spin system.
This provides a new method of solution for this model which does not appear to
be known. We then generalize this analysis to a paradigmatic inference problem,
namely rank-one matrix estimation, also referred to as the Wigner spike model in
statistics. We give many pointers to the recent literature where the method has
been successfully applied.
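As an illustration of the rank-one matrix estimation problem mentioned in the abstracts above, the following minimal sketch generates an observation Y = sqrt(snr/n) x x^T + W with symmetric Gaussian noise W and estimates x from the top eigenvector of Y. The Rademacher prior on x, the parameter name `snr`, the helper names, and the choice of a plain spectral (PCA) estimator are all assumptions made for illustration; this is not the Bayes-optimal estimator or the approximate message-passing algorithm analyzed in these papers.

```python
import numpy as np

def spiked_wigner(n, snr, rng):
    """Sample a rank-one signal and its noisy symmetric observation.

    Assumption: the signal entries are +/-1 with equal probability.
    """
    x = rng.choice([-1.0, 1.0], size=n)
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0)  # symmetric noise, off-diagonal variance 1
    y = np.sqrt(snr / n) * np.outer(x, x) + w
    return x, y

def spectral_estimate(y):
    """Return the eigenvector of y with the largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(y)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]

def squared_overlap(x, v):
    """Squared cosine of the angle between x and v (sign-invariant)."""
    return float(np.dot(x, v) ** 2 / (np.dot(x, x) * np.dot(v, v)))

rng = np.random.default_rng(0)
x, y = spiked_wigner(n=400, snr=4.0, rng=rng)
v = spectral_estimate(y)
print("squared overlap:", squared_overlap(x, v))
```

With this normalization, the top eigenvector is known to correlate with the signal only when `snr` exceeds 1 in the large-n limit, so rerunning the sketch with a small `snr` should give an overlap close to zero.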
On the regularity of the covariance matrix of a discretized scalar field on the sphere
We present a comprehensive study of the regularity of the covariance matrix
of a discretized field on the sphere. In a particular situation, the rank of
the matrix depends on the number of pixels, the number of spherical harmonics,
the symmetries of the pixelization scheme and the presence of a mask. Taking
into account the above mentioned components, we provide analytical expressions
that constrain the rank of the matrix. They are obtained by expanding the
determinant of the covariance matrix as a sum of determinants of matrices made
up of spherical harmonics. We investigate these constraints for five different
pixelizations that have been used in the context of Cosmic Microwave Background
(CMB) data analysis: Cube, Icosahedron, Igloo, GLESP and HEALPix, finding that,
at least in the considered cases, the HEALPix pixelization tends to provide a
covariance matrix with a rank closer to the maximum expected theoretical value
than the other pixelizations. The effect of the propagation of numerical errors
in the regularity of the covariance matrix is also studied for different
computational precisions, as well as the effect of adding a certain level of
noise in order to regularize the matrix. In addition, we investigate the
application of the previous results to a particular example that requires the
inversion of the covariance matrix: the estimation of the CMB temperature power
spectrum through the Quadratic Maximum Likelihood algorithm. Finally, some
general considerations in order to achieve a regular covariance matrix are also
presented.
Comment: 36 pages, 12 figures; minor changes in the text, matches published version
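The dependence of the rank on the number of pixels and the number of spherical harmonics, described in the abstract above, can be sketched numerically. The example below builds the covariance of an isotropic field from the Legendre expansion C_ij = sum_l (2l+1)/(4 pi) C_l P_l(cos theta_ij), whose rank cannot exceed (l_max+1)^2. The random points standing in for pixel centres, the flat spectrum C_l = 1, the noise level, and all function names are assumptions for illustration; none of the five pixelizations studied in the paper is reproduced here.

```python
import numpy as np
from numpy.polynomial import legendre

def random_unit_vectors(n_pix, rng):
    """Random directions on the sphere (a stand-in for pixel centres)."""
    v = rng.standard_normal((n_pix, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def covariance_matrix(points, c_ell):
    """Pixel covariance of an isotropic field with power spectrum c_ell."""
    cos_theta = np.clip(points @ points.T, -1.0, 1.0)
    ell = np.arange(len(c_ell))
    coeffs = (2 * ell + 1) / (4 * np.pi) * np.asarray(c_ell)
    # legval evaluates sum_l coeffs[l] * P_l(cos_theta) elementwise
    return legendre.legval(cos_theta, coeffs)

rng = np.random.default_rng(0)
n_pix, ell_max = 50, 3
points = random_unit_vectors(n_pix, rng)
cov = covariance_matrix(points, np.ones(ell_max + 1))  # assumed flat spectrum

# With more pixels than harmonics the matrix is singular.
rank = np.linalg.matrix_rank(cov)

# Adding uncorrelated noise on the diagonal regularizes it.
noise_var = 1e-3  # hypothetical noise level
rank_reg = np.linalg.matrix_rank(cov + noise_var * np.eye(n_pix))

print("rank without noise:", rank, "upper bound:", (ell_max + 1) ** 2)
print("rank with noise:", rank_reg, "of", n_pix)
```

The diagonal noise term mirrors the regularization strategy mentioned in the abstract: it lifts the zero eigenvalues so that the matrix can be inverted.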
Learning Graphs from Linear Measurements: Fundamental Trade-offs and Applications
We consider a specific graph learning task: reconstructing a symmetric matrix that represents an underlying graph using linear measurements. We present a sparsity characterization for distributions of random graphs (that are allowed to contain high-degree nodes), based on which we study fundamental trade-offs between the number of measurements, the complexity of the graph class, and the probability of error. We first derive a necessary condition on the number of measurements. Then, by considering a three-stage recovery scheme, we give a sufficient condition for recovery. Furthermore, assuming the measurements are Gaussian IID, we prove upper and lower bounds on the (worst-case) sample complexity for both noisy and noiseless recovery. In the special cases of the uniform distribution on trees with n nodes and the Erdős-Rényi (n,p) class, the fundamental trade-offs are tight up to multiplicative factors with noiseless measurements. In addition, for practical applications, we design and implement a polynomial-time (in n) algorithm based on the three-stage recovery scheme. Experiments show that the heuristic algorithm outperforms basis pursuit on star graphs. We apply the heuristic algorithm to learn admittance matrices in electric grids. Simulations for several canonical graph classes and IEEE power system test cases demonstrate the effectiveness and robustness of the proposed algorithm for parameter reconstruction.