Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive. We provide a survey of the
progress made in modeling covariance matrices from two relatively complementary
perspectives: (1) generalized linear models (GLM) or parsimony and use of
covariates in low dimensions, and (2) regularization or sparsity for
high-dimensional data. An emerging, unifying and powerful trend in both
perspectives is that of reducing a covariance estimation problem to that of
solving a sequence of regression problems. We point out several instances of
the regression-based formulation. A notable case is in sparse estimation of a
precision matrix or a Gaussian graphical model leading to the fast graphical
LASSO algorithm. Some advantages and limitations of the regression-based
Cholesky decomposition relative to the classical spectral (eigenvalue) and
variance-correlation decompositions are highlighted. The former provides an
unconstrained and statistically interpretable reparameterization, and
guarantees the positive-definiteness of the estimated covariance matrix. It
reduces the unintuitive task of covariance estimation to that of modeling a
sequence of regressions at the cost of imposing an a priori order among the
variables. Elementwise regularization of the sample covariance matrix such as
banding, tapering and thresholding has desirable asymptotic properties and the
sparse estimated covariance matrix is positive definite with probability
tending to one for large samples and dimensions.

Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
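The regression-based Cholesky reparameterization surveyed above can be sketched in a few lines (a minimal illustration, not code from the survey; the simulated data, dimensions, and variable names are our own): each variable is regressed on its predecessors, the negated coefficients fill a unit lower-triangular matrix T, and the residual variances fill a diagonal matrix D, so that positive-definiteness of the reconstruction is automatic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations of p ordered variables (the ordering matters
# for the Cholesky approach; data and dimensions here are illustrative).
n, p = 500, 5
A = rng.standard_normal((p, p))
X = rng.multivariate_normal(np.zeros(p), A @ A.T + p * np.eye(p), size=n)
Xc = X - X.mean(axis=0)            # center the data
S = Xc.T @ Xc / n                  # sample covariance

# Modified Cholesky decomposition via sequential regressions: regress
# each variable on its predecessors; the negated coefficients fill the
# unit lower-triangular T and the residual variances fill the diagonal
# d, so that T @ S @ T.T = diag(d).
T = np.eye(p)
d = np.empty(p)
d[0] = (Xc[:, 0] ** 2).mean()
for j in range(1, p):
    coef, *_ = np.linalg.lstsq(Xc[:, :j], Xc[:, j], rcond=None)
    T[j, :j] = -coef
    d[j] = ((Xc[:, j] - Xc[:, :j] @ coef) ** 2).mean()

# Reconstruction Sigma = T^{-1} D T^{-T} is positive definite whenever
# every residual variance d[j] > 0 -- no constraint must be enforced.
Tinv = np.linalg.inv(T)
Sigma = Tinv @ np.diag(d) @ Tinv.T
assert np.allclose(Sigma, S)       # unregularized case: recovers S
assert np.all(np.linalg.eigvalsh(Sigma) > 0)
```

In the unregularized case the reconstruction reproduces the sample covariance exactly; shrinking or sparsifying the regression coefficients in T is where the two perspectives of the survey meet.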
Sufficient Dimension Reduction and Modeling Responses Conditioned on Covariates: An Integrated Approach via Convex Optimization
Given observations of a collection of covariates X and responses Y, sufficient
dimension reduction (SDR) techniques aim to identify a mapping f of the
covariates, taking values in a space of much smaller dimension, such that Y is
independent of X conditioned on f(X). The image f(X) summarizes the relevant
information in a potentially large number of covariates that influence the
responses Y. In many contemporary settings, the number of responses is also
quite large, in addition to a large number of covariates. This leads to the
challenge of fitting a succinctly parameterized statistical model to the
conditional distribution of Y given f(X), which is a problem that is usually not addressed
in a traditional SDR framework. In this paper, we present a computationally
tractable convex relaxation based estimator for simultaneously (a) identifying
a linear dimension reduction of the covariates that is sufficient with
respect to the responses, and (b) fitting several types of structured
low-dimensional models -- factor models, graphical models, latent-variable
graphical models -- to the conditional distribution of the responses given the
dimension-reduced covariates. We analyze the
consistency properties of our estimator in a high-dimensional scaling regime.
We also illustrate the performance of our approach on a newsgroup dataset and
on a dataset consisting of financial asset prices.

Comment: 34 pages, 1 figure.
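The simplest instance of the linear-reduction-plus-structured-model idea is reduced-rank regression, shown below as a rough sketch (this is not the authors' convex relaxation estimator; the toy data, the names W, Z, Vk, and the rank k are our own illustration): an OLS fit is truncated to rank k via an SVD, giving a k-dimensional summary of the covariates together with a low-dimensional model for the responses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: q responses depend on the covariates only through a
# k-dimensional linear reduction of X (W, k are illustrative choices).
n, p, q, k = 300, 20, 10, 2
W = rng.standard_normal((p, k))
X = rng.standard_normal((n, p))
Y = np.tanh(X @ W) @ rng.standard_normal((k, q)) \
    + 0.1 * rng.standard_normal((n, q))

# Reduced-rank regression: fit OLS, then truncate the fitted values to
# rank k via an SVD.  The top-k right singular vectors Vk capture the
# response-side structure; X @ (B_ols @ Vk) is the k-dimensional
# reduction of the covariates under this working model.
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
Vk = Vt[:k].T                      # (q x k) response directions
Z = X @ (B_ols @ Vk)               # reduced covariates, shape (n, k)
Y_hat = Z @ Vk.T                   # rank-k prediction of the responses
```

The paper's estimator replaces this two-step truncation with a single convex program and allows richer structured models (factor, graphical, latent-variable graphical) for the conditional distribution.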
Brain covariance selection: better individual functional connectivity models using population prior
Spontaneous brain activity, as observed in functional neuroimaging, has been
shown to display reproducible structure that expresses brain architecture and
carries markers of brain pathologies. An important view of modern neuroscience
is that such large-scale structure of coherent activity reflects modularity
properties of brain connectivity graphs. However, to date, there has been no
demonstration that the limited and noisy data available in spontaneous activity
observations could be used to learn full-brain probabilistic models that
generalize to new data. Learning such models entails two main challenges: i)
modeling full brain connectivity is a difficult estimation problem that faces
the curse of dimensionality and ii) variability between subjects, coupled with
the variability of functional signals between experimental runs, makes the use
of multiple datasets challenging. We describe subject-level brain functional
connectivity structure as a multivariate Gaussian process and introduce a new
strategy to estimate it from group data, by imposing a common structure on the
graphical model in the population. We show that individual models learned from
functional Magnetic Resonance Imaging (fMRI) data using this population prior
generalize better to unseen data than models based on alternative
regularization schemes. To our knowledge, this is the first report of a
cross-validated model of spontaneous brain activity. Finally, we use the
estimated graphical model to explore the large-scale characteristics of
functional architecture and show for the first time that known cognitive
networks appear as the integrated communities of the functional connectivity graph.

Comment: in Advances in Neural Information Processing Systems, Vancouver, Canada (2010).
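A crude sketch of the common-structure idea conveys the flavor (this is not the paper's estimator, which imposes the shared graph inside a joint penalized likelihood; the simulated "subjects", the alpha values, and the masking step below are our own illustration using scikit-learn's GraphicalLasso): learn one sparse support from pooled group data, then restrict each subject's precision estimate to that support.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)

# Toy stand-in for fMRI time series: several subjects sharing one sparse
# conditional-independence graph (here, a chain).
p, n_subjects, n_time = 10, 4, 200
prec = np.eye(p)
for i in range(p - 1):
    prec[i, i + 1] = prec[i + 1, i] = 0.4
cov = np.linalg.inv(prec)
subjects = [rng.multivariate_normal(np.zeros(p), cov, size=n_time)
            for _ in range(n_subjects)]

# Population prior, crudely: estimate one sparse support from the
# concatenated group data ...
group_model = GraphicalLasso(alpha=0.1).fit(np.vstack(subjects))
support = np.abs(group_model.precision_) > 1e-6

# ... then force each subject-level precision estimate onto that
# common graph (the paper does this jointly, not by masking).
subject_precisions = []
for ts in subjects:
    K = GraphicalLasso(alpha=0.05).fit(ts).precision_
    subject_precisions.append(np.where(support, K, 0.0))
```

Sharing the support across subjects is what lets the limited per-subject data borrow strength from the group, which is the mechanism behind the improved generalization reported above.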