Search CORE

35,158 research outputs found

Restricted maximum likelihood estimation of covariances in sparse linear models

Author: Groeneveld Eildert
Neumaier Arnold
Publication venue: BioMed Central
Publication date: 01/01/1998
Field of study

This paper discusses the restricted maximum likelihood (REML) approach for the estimation of covariance matrices in linear stochastic models, as implemented in the current version of the VCE package for covariance component estimation in large animal breeding models. The main features are: 1) the representation of the equations in an augmented form that simplifies the implementation; 2) the parametrization of the covariance matrices by means of their Cholesky factors, thus automatically ensuring their positive definiteness; 3) explicit formulas for the gradients of the REML function for the case of large and sparse model equations with a large number of unknown covariance components and possibly incomplete data, using the sparse inverse to obtain the gradients cheaply; 4) use of model equations that make separate formation of the inverse of the numerator relationship matrix unnecessary. Many large scale breeding problems were solved with the new implementation, among them an example with more than 250 000 normal equations and 55 covariance components, taking 41 h CPU time on a Hewlett Packard 755

CiteSeerX

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Foundational principles for large scale inference: Illustrations through correlation mining

Author: Alfred O. Hero
Alfred O. Hero
Alfred O. Hero
Bala Rajaratnam
Bala Rajaratnam
Bala Rajaratnam
Publication venue
Publication date: 18/05/2015
Field of study

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number

n

of acquired samples (statistical replicates) is far fewer than the number

p

of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity however has received relatively less attention, especially in the setting when the sample size

n

is fixed, and the dimension

p

grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks

arXiv.org e-Print Archive

CiteSeerX

PubMed Central

eScholarship - University of California

Sparse Inverse Covariance Estimation for Chordal Structures

Author: agrawal
banerjee
coffrin
fattahi
hsieh
mazumder
sojoudi
zhang
Publication venue
Publication date: 24/11/2017
Field of study

In this paper, we consider the Graphical Lasso (GL), a popular optimization problem for learning the sparse representations of high-dimensional datasets, which is well-known to be computationally expensive for large-scale problems. Recently, we have shown that the sparsity pattern of the optimal solution of GL is equivalent to the one obtained from simply thresholding the sample covariance matrix, for sparse graphs under different conditions. We have also derived a closed-form solution that is optimal when the thresholded sample covariance matrix has an acyclic structure. As a major generalization of the previous result, in this paper we derive a closed-form solution for the GL for graphs with chordal structures. We show that the GL and thresholding equivalence conditions can significantly be simplified and are expected to hold for high-dimensional problems if the thresholded sample covariance matrix has a chordal structure. We then show that the GL and thresholding equivalence is enough to reduce the GL to a maximum determinant matrix completion problem and drive a recursive closed-form solution for the GL when the thresholded sample covariance matrix has a chordal structure. For large-scale problems with up to 450 million variables, the proposed method can solve the GL problem in less than 2 minutes, while the state-of-the-art methods converge in more than 2 hours

arXiv.org e-Print Archive

Crossref

Markov models for fMRI correlation structure: is brain functional connectivity small world, or decomposable into networks?

Author: A. Gramfort
Achard
Amaral
B. Thirion
Bassett
Beckmann
Bekinschtein
Bialonski
Bickel
Biswal
Bullmore
Calhoun
Chan
Chinn
Conturo
Cordes
Cuthill
Damoiseaux
Dawid
Dehaene
Dempster
Deuker
Eguiluz
Fair
Fair
Felleman
Finger
Fisher
Fox
Fox
Friedman
Friston
G. Varoquaux
Giudici
Goodale
Greicius
Gross
Hagmann
Humphries
Ioannides
J.B. Poline
Kiviniemi
Lauritzen
Le Bihan
Ledoit
Marlin
Marrelec
Papadimitriou
Raichle
Ramachandran
Robins
Rothman
Rütimann
Sadaghiani
Salvador
Seeley
Seeley
Shulman
Smith
Smith
Spirtes
Sporns
Sporns
Sporns
Stam
Stephan
Strogatz
Tononi
Tononi
Tzourio-Mazoyer
Van Den Heuvel
Varoquaux
Wang
Watts
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012
Field of study

Correlations in the signal observed via functional Magnetic Resonance Imaging (fMRI), are expected to reveal the interactions in the underlying neural populations through hemodynamic response. In particular, they highlight distributed set of mutually correlated regions that correspond to brain networks related to different cognitive functions. Yet graph-theoretical studies of neural connections give a different picture: that of a highly integrated system with small-world properties: local clustering but with short pathways across the complete structure. We examine the conditional independence properties of the fMRI signal, i.e. its Markov structure, to find realistic assumptions on the connectivity structure that are required to explain the observed functional connectivity. In particular we seek a decomposition of the Markov structure into segregated functional networks using decomposable graphs: a set of strongly-connected and partially overlapping cliques. We introduce a new method to efficiently extract such cliques on a large, strongly-connected graph. We compare methods learning different graph structures from functional connectivity by testing the goodness of fit of the model they learn on new data. We find that summarizing the structure as strongly-connected networks can give a good description only for very large and overlapping networks. These results highlight that Markov models are good tools to identify the structure of brain connectivity from fMRI signals, but for this purpose they must reflect the small-world properties of the underlying neural systems

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

HAL-Inserm

HAL-CEA

Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

Author: Ali Alnur
Azad Ariful
Buluc Aydin
Koanantakool Penporn
Morozov Dmitriy
Oh Sang-Yun
Oliker Leonid
Yelick Katherine
Publication venue
Publication date: 01/01/2018
Field of study

Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ~819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.Comment: Main paper: 15 pages, appendix: 24 page

arXiv.org e-Print Archive

eScholarship - University of California