
    Covariance Estimation: The GLM and Regularization Perspectives

    Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is in sparse estimation of a precision matrix or a Gaussian graphical model, leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix such as banding, tapering and thresholding has desirable asymptotic properties, and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions. Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
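The regression-based Cholesky idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the survey's own code: the function name `cholesky_regression_cov`, the use of unpenalized least squares, and the 1/n normalization are all choices made here for the example. Each variable is regressed on its predecessors in the given order; the negated coefficients fill a unit lower-triangular matrix T, the residual variances fill a diagonal D, and the covariance is rebuilt as T^{-1} D T^{-T}.

```python
import numpy as np


def cholesky_regression_cov(X):
    """Covariance estimate from sequential regressions (hypothetical helper).

    X is an (n, p) data matrix whose column order is the assumed a priori
    variable order. Returns (sigma, T, d) with T unit lower triangular,
    d the residual variances, and sigma = T^{-1} diag(d) T^{-T}.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center each variable

    T = np.eye(p)                    # unit lower-triangular factor
    d = np.empty(p)                  # innovation (residual) variances
    d[0] = Xc[:, 0] @ Xc[:, 0] / n   # first variable has no predecessors

    for t in range(1, p):
        Z = Xc[:, :t]                # predecessors of variable t
        y = Xc[:, t]
        # Unconstrained least-squares coefficients of y on its predecessors.
        phi, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ phi
        T[t, :t] = -phi              # negated coefficients go below the diagonal
        d[t] = resid @ resid / n

    T_inv = np.linalg.inv(T)
    sigma = T_inv @ np.diag(d) @ T_inv.T
    return sigma, T, d


# Small synthetic demo (data generated here purely for illustration).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))
sigma_demo, T_demo, d_demo = cholesky_regression_cov(X_demo)
```

Because every residual variance is positive whenever the data have full column rank, the reconstructed matrix is positive definite by construction. With unrestricted least squares and more observations than variables, the result coincides with the 1/n sample covariance; a parsimonious or regularized estimator would instead restrict or penalize the regression coefficients.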

    Sufficient Dimension Reduction and Modeling Responses Conditioned on Covariates: An Integrated Approach via Convex Optimization

    Given observations of a collection of covariates and responses $(Y, X) \in \mathbb{R}^p \times \mathbb{R}^q$, sufficient dimension reduction (SDR) techniques aim to identify a mapping $f: \mathbb{R}^q \rightarrow \mathbb{R}^k$ with $k \ll q$ such that $Y|f(X)$ is independent of $X$. The image $f(X)$ summarizes the relevant information in a potentially large number of covariates $X$ that influence the responses $Y$. In many contemporary settings, the number of responses $p$ is also quite large, in addition to a large number $q$ of covariates. This leads to the challenge of fitting a succinctly parameterized statistical model to $Y|f(X)$, which is a problem that is usually not addressed in a traditional SDR framework. In this paper, we present a computationally tractable convex relaxation based estimator for simultaneously (a) identifying a linear dimension reduction $f(X)$ of the covariates that is sufficient with respect to the responses, and (b) fitting several types of structured low-dimensional models -- factor models, graphical models, latent-variable graphical models -- to the conditional distribution of $Y|f(X)$. We analyze the consistency properties of our estimator in a high-dimensional scaling regime. We also illustrate the performance of our approach on a newsgroup dataset and on a dataset consisting of financial asset prices. Comment: 34 pages, 1 figure

    Brain covariance selection: better individual functional connectivity models using population prior

    Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of the functional connectivity graph. Comment: in Advances in Neural Information Processing Systems, Vancouver : Canada (2010)