Estimation and Inference for High-Dimensional Gaussian Graphical Models with Structural Constraints.

Author: Ma Jing

This work discusses several aspects of estimation and inference for high-dimensional Gaussian graphical models and consists of two main parts. The first part considers network-based pathway enrichment analysis using incomplete network information. Pathway enrichment analysis has become a key tool for biomedical researchers to gain insight into the underlying biology of differentially expressed genes, proteins and metabolites. We propose a constrained network estimation framework that combines network estimation based on cell- and condition-specific high-dimensional Omics data with interaction information from existing databases. The resulting pathway topology information is then used to provide a framework for simultaneously testing differences in the expression levels of pathway members as well as their interactions. We study the asymptotic properties of the proposed network estimator and the test for pathway enrichment, investigate their small-sample performance in simulation experiments, and illustrate them on two cancer data sets. The second part of the thesis is devoted to reconstructing multiple graphical models simultaneously from high-dimensional data. We develop methodology that jointly estimates multiple Gaussian graphical models, assuming that prior information exists on how they are structurally related. The proposed method consists of two steps: in the first, we employ neighborhood selection with a group lasso penalty to obtain estimated edge sets of the graphs; in the second, we estimate the nonzero entries of the inverse covariance matrices by maximizing the corresponding Gaussian likelihood. We establish the consistency of the proposed method for sparse high-dimensional Gaussian graphical models and illustrate its performance using simulation experiments. An application to a climate data set is also discussed.

PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113495/1/mjing_1.pd
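The first step of the joint estimator described above is neighborhood selection under a group lasso penalty. As a minimal sketch, assuming a squared-error loss and groups that collect the coefficients of the same candidate neighbor across the jointly estimated graphs (the thesis's actual grouping, which encodes the prior structural information, is not spelled out in the abstract), one node-wise regression can be solved by proximal gradient descent (ISTA) with block soft-thresholding. The names `group_lasso_ista`, `block_soft_threshold`, `lam`, `step` and `n_iter` are hypothetical and not taken from the thesis.

```python
import math
from typing import List, Sequence

Vector = List[float]


def block_soft_threshold(v: Sequence[float], t: float) -> Vector:
    """Proximal operator of t * ||v||_2: shrink the whole block toward zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)  # the entire group is set to zero
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def group_lasso_ista(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    groups: Sequence[Sequence[int]],
    lam: float,
    step: float,
    n_iter: int = 500,
) -> Vector:
    """Minimize 0.5*||y - X b||^2 + lam * sum_g ||b_g||_2 by proximal gradient.

    Hypothetical helper: `groups` must partition the column indices of X.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residual r = y - X beta.
        r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient of the squared-error term is -X^T r.
        grad = [-sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        z = [beta[j] - step * grad[j] for j in range(p)]
        # Proximal step: block soft-threshold each group separately.
        new_beta = [0.0] * p
        for g in groups:
            shrunk = block_soft_threshold([z[j] for j in g], step * lam)
            for j, value in zip(g, shrunk):
                new_beta[j] = value
        beta = new_beta
    return beta


if __name__ == "__main__":
    # Orthonormal toy design: group {0, 1} survives, group {2, 3} is zeroed out.
    X = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    y = [3.0, 4.0, 0.3, 0.4]
    print(group_lasso_ista(X, y, groups=[[0, 1], [2, 3]], lam=2.5, step=1.0))
    # prints [1.5, 2.0, 0.0, 0.0]
```

With an orthonormal design the problem decouples and ISTA reaches the block soft-thresholded solution after one iteration, which keeps the demo checkable by hand. For a general design, `step` should not exceed 1/L, where L is the largest eigenvalue of XᵀX; estimated edges would then be read off from the groups whose coefficients are nonzero.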

High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

Author: Anandkumar Animashree
Tan Vincent Y. F.
Willsky Alan S.
Publication date: 01/06/2011

We consider the problem of high-dimensional Gaussian graphical model selection. We identify a class of graphs for which an efficient estimation algorithm exists; this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm when the number of samples satisfies $n = \omega(J_{\min}^{-2} \log p)$, where $p$ is the number of variables and $J_{\min}$ is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency.

arXiv.org e-Print Archive

CiteSeerX
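The selection rule in the abstract above thresholds empirical conditional covariances. For a conditioning set $S$, the conditional covariance is $\Sigma(i,j \mid S) = \Sigma_{ij} - \Sigma_{iS}\Sigma_{SS}^{-1}\Sigma_{Sj}$. A minimal sketch of the idea, assuming candidate separators of size at most one (enough for the chain example below, not for general graphs) and a user-chosen threshold: keep the pair (i, j) as an edge only if the smallest |Σ(i,j | S)| over all candidate separators exceeds `xi`. The names `threshold_graph`, `conditional_covariance` and `xi` are hypothetical; in practice the input would be the sample covariance, and the paper's choice of separator size and threshold is not reproduced here.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Matrix = List[List[float]]


def conditional_covariance(S: Matrix, i: int, j: int, k: Optional[int] = None) -> float:
    """Sigma(i, j | {k}) = S_ij - S_ik * S_kj / S_kk; k=None conditions on nothing."""
    if k is None:
        return S[i][j]
    return S[i][j] - S[i][k] * S[k][j] / S[k][k]


def threshold_graph(S: Matrix, xi: float) -> Set[Tuple[int, int]]:
    """Keep (i, j) if |Sigma(i, j | sep)| > xi for every candidate separator of size <= 1."""
    p = len(S)
    edges: Set[Tuple[int, int]] = set()
    for i, j in combinations(range(p), 2):
        separators: List[Optional[int]] = [None] + [k for k in range(p) if k not in (i, j)]
        smallest = min(abs(conditional_covariance(S, i, j, k)) for k in separators)
        if smallest > xi:
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    # Covariance of a Gaussian chain X0 - X1 - X2 with correlation 0.5 per link:
    # X0 and X2 are independent given X1, so the pair (0, 2) is dropped.
    rho = 0.5
    S = [[1.0, rho, rho ** 2],
         [rho, 1.0, rho],
         [rho ** 2, rho, 1.0]]
    print(sorted(threshold_graph(S, xi=0.1)))
    # prints [(0, 1), (1, 2)]
```

Taking the minimum over candidate separators is what makes the rule local: a non-edge only needs one small separator that drives its conditional covariance near zero, which matches the abstract's emphasis on sparse local vertex separators.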

Foundational principles for large scale inference: Illustrations through correlation mining

Author: Alfred O. Hero
Bala Rajaratnam
Publication date: 18/05/2015

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far smaller than the number $p$ of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size $n$ is fixed and the dimension $p$ grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into three categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high-dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California
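One simple correlation-mining task in the sample-starved setting described above is correlation screening: compute the sample correlation matrix of the n × p data matrix and report the variable pairs whose absolute correlation exceeds a threshold ρ. The sketch below is a minimal version under stated assumptions: the threshold `rho` is supplied by the user (the paper's analysis of how such thresholds behave when n is fixed and p grows is not reproduced), no column is constant, and the names `sample_correlation` and `correlation_screen` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sample_correlation(data: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlation matrix of an n x p data matrix (rows are samples)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Euclidean norm of each centered column; assumes no column is constant.
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(p)]
    R = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            cross = sum(r[a] * r[b] for r in centered)
            R[a][b] = cross / (norms[a] * norms[b])
    return R


def correlation_screen(data: Sequence[Sequence[float]], rho: float) -> List[Tuple[int, int]]:
    """Return pairs (a, b), a < b, whose |sample correlation| exceeds rho."""
    R = sample_correlation(data)
    p = len(R)
    return [(a, b) for a, b in combinations(range(p), 2) if abs(R[a][b]) > rho]


if __name__ == "__main__":
    # n = 4 samples of p = 3 variables: column 1 is twice column 0,
    # column 2 is uncorrelated with both.
    data = [[1.0, 2.0, 1.0],
            [2.0, 4.0, -1.0],
            [3.0, 6.0, -1.0],
            [4.0, 8.0, 1.0]]
    print(correlation_screen(data, rho=0.9))
    # prints [(0, 1)]
```

The cost is O(p²n), so for large p one would only ever materialize the pairs that pass the screen; with n fixed and p growing, the number of spurious pairs passing a fixed threshold grows, which is the kind of behavior the abstract's sample-complexity framework addresses.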