1,886 research outputs found
Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
We investigate the relationship between the structure of a discrete graphical
model and the support of the inverse of a generalized covariance matrix. We
show that for certain graph structures, the support of the inverse covariance
matrix of indicator variables on the vertices of a graph reflects the
conditional independence structure of the graph. Our work extends results that
have previously been established only in the context of multivariate Gaussian
graphical models, thereby addressing an open question about the significance of
the inverse covariance matrix of a non-Gaussian distribution. The proof
exploits a combination of ideas from the geometry of exponential families,
junction tree theory and convex analysis. These population-level results have
various consequences for graph selection methods, both known and novel,
including a novel method for structure estimation for missing or corrupted
observations. We provide nonasymptotic guarantees for such methods and
illustrate the sharpness of these predictions via simulations.Comment: Published in at http://dx.doi.org/10.1214/13-AOS1162 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much of recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity
however has received relatively less attention, especially in the setting when
the sample size is fixed, and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche but only the latter regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that are of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks
- …