69 research outputs found

Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model

Author: Caffo Brian
Han Fang
Liu Han
Publication date: 11/10/2013

In this manuscript, a unified framework for conducting inference on complex aggregated data in high dimensional settings is proposed. The data are assumed to be a collection of multiple non-Gaussian realizations with underlying undirected graphical structures. Using the concept of median graphs to summarize the commonality across these graphical structures, a novel semiparametric approach to modeling such complex aggregated data is provided, along with robust estimation of the median graph, which is assumed to be sparse. The estimator is proved to be consistent in graph recovery, and an upper bound on the rate of convergence is given. Experiments on both synthetic and real datasets are conducted to illustrate the empirical usefulness of the proposed models and methods.

arXiv.org e-Print Archive

Collection Of Biostatistics Research Archive

High Dimensional Semiparametric Gaussian Copula Graphical Models

Author: Han Fang
Lafferty John
Liu Han
Wasserman Larry
Yuan Ming
Publication date: 27/07/2012

In this paper, we propose a semiparametric approach, named nonparanormal skeptic, for efficiently and robustly estimating high dimensional undirected graphical models. To achieve modeling flexibility, we consider Gaussian copula graphical models (or the nonparanormal) as proposed by Liu et al. (2009). To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman's rho and Kendall's tau. In high dimensional settings, we prove that the nonparanormal skeptic achieves the optimal parametric rate of convergence in both graph and parameter estimation. This encouraging result suggests that Gaussian copula graphical models can be used as a safe replacement for the popular Gaussian graphical models, even when the data are truly Gaussian. Besides theoretical analysis, we also conduct thorough numerical simulations to compare different estimators for their graph recovery performance under both ideal and noisy settings. The proposed methods are then applied to a large-scale genomic dataset to illustrate their empirical usefulness. The R language software package huge implementing the proposed methods is available on the Comprehensive R Archive Network: http://cran.r-project.org/. Comment: 34 pages, 10 figures; the Annals of Statistics, 201
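The rank-based step described in this abstract can be sketched as follows. This is a minimal illustration, not the code of the huge package: it assumes the commonly stated transform S_jk = sin(pi/2 * tau_jk) for turning Kendall's tau into a correlation estimate, and the names kendall_tau and skeptic_correlation, as well as the simulated data, are hypothetical choices made for this example.

```python
import math
import random

def kendall_tau(x, y):
    """Kendall's tau (tau-a); assumes continuous data, so no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * score / (n * (n - 1))

def skeptic_correlation(columns):
    """Rank-based correlation matrix with entries sin(pi/2 * tau_jk)."""
    d = len(columns)
    S = [[1.0 if j == k else 0.0 for k in range(d)] for j in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            tau = kendall_tau(columns[j], columns[k])
            S[j][k] = S[k][j] = math.sin(0.5 * math.pi * tau)
    return S

# Hypothetical data: latent Gaussians with correlation 0.6 between the
# first two variables, observed through monotone marginal transforms.
rng = random.Random(0)
n = 500
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in z1]
z3 = [rng.gauss(0, 1) for _ in range(n)]
columns = [[math.exp(a) for a in z1], [b ** 3 for b in z2], z3]

S_hat = skeptic_correlation(columns)
```

Because Kendall's tau depends only on ranks, the estimate is unchanged by the monotone transforms, so S_hat[0][1] should land near the latent correlation of 0.6. The resulting matrix would then be passed to a sparse precision matrix estimator in place of the sample correlation matrix.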

arXiv.org e-Print Archive

Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions

Author: Weidong Liu
Xi Luo
Publication venue: 'Elsevier BV'
Publication date: 22/12/2016

This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset. Comment: Main text: 24 pages. Supplement: 13 pages. R package scio implementing the proposed method is available on CRAN at https://cran.r-project.org/package=scio. Published in J of Multivariate Analysis at http://www.sciencedirect.com/science/article/pii/S0047259X1400260
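The column-wise coordinate descent idea mentioned in this abstract can be sketched as below. The abstract does not spell out the objective, so this example assumes that column i of the precision matrix is estimated by minimizing 0.5 * b'Sb - b_i + lam * ||b||_1 over b, where S is a covariance estimate. The function names, the fixed number of sweeps, and the small matrix S are hypothetical, and this is not the code of the scio package.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exactly zero inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def scio_column(S, i, lam, sweeps=200):
    """Coordinate descent for one column under the assumed objective
    0.5 * b'Sb - b_i + lam * ||b||_1."""
    d = len(S)
    beta = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Partial residual: e_i[j] minus the contribution of the other coordinates.
            target = 1.0 if j == i else 0.0
            residual = target - sum(S[j][k] * beta[k] for k in range(d) if k != j)
            beta[j] = soft_threshold(residual, lam) / S[j][j]
    return beta

# Hypothetical 3 x 3 covariance estimate; the third variable is uncorrelated
# with the first two.
S = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

exact_col = scio_column(S, 0, lam=0.0)   # no penalty: column 0 of the inverse of S
sparse_col = scio_column(S, 0, lam=0.1)  # penalized: small entries are set to zero
```

With lam = 0 the updates reduce to Gauss-Seidel iterations for the linear system S b = e_i, so the result is the corresponding column of the inverse of S. Each column is solved independently, which is what makes a column-wise scheme easy to parallelize. In the paper the penalty level is chosen adaptively by cross validation, which this sketch leaves out.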

arXiv.org e-Print Archive

Crossref

De novo construction of polyploid linkage maps using discrete graphical models

Author: Behrouzi Pariya
Wit Ernst C.
Publication date: 02/04/2018

Linkage maps are used to identify the location of genes responsible for traits and diseases. New sequencing techniques have created opportunities to substantially increase the density of genetic markers. Such revolutionary advances in technology have given rise to new challenges, such as creating high-density linkage maps. Current multiple testing approaches based on pairwise recombination fractions are underpowered in the high-dimensional setting and do not extend easily to polyploid species. We propose to construct linkage maps using graphical models either via a sparse Gaussian copula or a nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes, and the order of markers in each LG are determined by inferring the conditional independence relationships among large numbers of markers in the genome. Through simulations, we illustrate the utility of our map construction method and compare its performance with other available methods, both when the data are clean and contain no missing observations and when the data contain genotyping errors and are incomplete. We apply the proposed method to two genotype datasets: barley and potato from diploid and polyploid populations, respectively. Our comprehensive map construction method makes full use of the dosage SNP data to reconstruct linkage maps for any bi-parental diploid and polyploid species. We have implemented the method in the R package netgwas. Comment: 25 pages, 7 figure

arXiv.org e-Print Archive

Post-Regularization Inference for Time-Varying Nonparanormal Graphical Models

Author: Kolar Mladen
Liu Han
Lu Junwei
Publication date: 01/01/2018

We propose a novel class of time-varying nonparanormal graphical models, which allows us to model high dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model, we develop statistical tests for the presence of edges both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes, and do not require the commonly assumed minimum signal strength. The testing procedures are based on a high dimensional, debiasing-free moment estimator, which uses a novel kernel smoothed Kendall's tau correlation matrix as an input statistic. The estimator consistently estimates the latent inverse Pearson correlation matrix uniformly in both the index variable and kernel bandwidth. Its rate of convergence is shown to be minimax optimal. Our method is supported by thorough numerical simulations and an application to a neural imaging data set.

arXiv.org e-Print Archive

Princeton University Open Access Repository