Sparse Median Graphs Estimation in a High Dimensional Semiparametric Model
In this manuscript, a unified framework for conducting inference on complex
aggregated data in high dimensional settings is proposed. The data are assumed
to be a collection of multiple non-Gaussian realizations with underlying
undirected graphical structures. Utilizing the concept of median graphs in
summarizing the commonality across these graphical structures, a novel
semiparametric approach to modeling such complex aggregated data is provided
along with robust estimation of the median graph, which is assumed to be
sparse. The estimator is proved to be consistent in graph recovery, and an upper
bound on the rate of convergence is given. Experiments on both synthetic and
real datasets are conducted to illustrate the empirical usefulness of the
proposed models and methods.
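To make the notion of a median graph concrete, here is a minimal sketch that assumes the median graph is defined as the graph minimizing the total Hamming (edge symmetric-difference) distance to all observed graphs. Under that assumption the minimizer is an edge-wise majority vote. The functions `hamming` and `median_graph` are hypothetical helpers for illustration; this is not the paper's estimator, which works from non-Gaussian data rather than from already-known graphs.

```python
from itertools import combinations


def hamming(a, b):
    # Number of edges in the symmetric difference of two edge sets.
    return len(a ^ b)


def median_graph(graphs, nodes):
    # Edge-wise majority vote over undirected graphs given as sets of
    # frozenset({u, v}) edges. Under the Hamming distance, keeping an edge
    # exactly when it appears in more than half of the graphs minimizes the
    # total distance to all inputs (ties, possible for an even count, may
    # be broken either way without losing optimality).
    m = len(graphs)
    median = set()
    for u, v in combinations(sorted(nodes), 2):
        edge = frozenset((u, v))
        if 2 * sum(edge in g for g in graphs) > m:
            median.add(edge)
    return median
```

For example, with three graphs on nodes 1 to 4 in which edge 1-2 appears three times, edge 2-3 twice, and edges 3-4 and 1-4 once each, the sketch returns the graph with edges 1-2 and 2-3.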
High Dimensional Semiparametric Gaussian Copula Graphical Models
In this paper, we propose a semiparametric approach, named nonparanormal
skeptic, for efficiently and robustly estimating high dimensional undirected
graphical models. To achieve modeling flexibility, we consider Gaussian copula
graphical models (or the nonparanormal) as proposed by Liu et al. (2009). To
achieve estimation robustness, we exploit nonparametric rank-based correlation
coefficient estimators, including Spearman's rho and Kendall's tau. In high
dimensional settings, we prove that the nonparanormal skeptic achieves the
optimal parametric rate of convergence in both graph and parameter estimation.
This result suggests that Gaussian copula graphical models can be used as a
safe replacement for the popular Gaussian graphical models, even when the data
are truly Gaussian. Besides the theoretical analysis, we also conduct thorough
numerical simulations to compare different estimators in terms of graph
recovery performance under both ideal and noisy settings. The proposed methods
are then applied to a large-scale genomic dataset to illustrate their empirical
usefulness. The R package huge implementing the proposed methods is available
on the Comprehensive R Archive Network: http://cran.r-project.org/.
Comment: 34 pages, 10 figures; the Annals of Statistics, 201
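The rank-based idea behind the nonparanormal skeptic can be sketched as follows: Kendall's tau and Spearman's rho are mapped to latent Pearson correlations through the sine transforms sin(pi/2 * tau) and 2 sin(pi/6 * rho). This is a minimal stdlib sketch, assuming continuous data without ties; the function names are hypothetical and it is not the huge package's implementation. In practice the resulting matrix would then be passed to a sparse precision matrix estimator such as the graphical lasso, a step omitted here.

```python
import math
from itertools import combinations


def _sign(v):
    # Returns +1, 0 or -1 according to the sign of v.
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    # Kendall's tau-a: mean of sign(x_i - x_j) * sign(y_i - y_j) over pairs i < j.
    n = len(x)
    s = sum(_sign(x[i] - x[j]) * _sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)


def _ranks(x):
    # Ranks 1..n; assumes no ties (continuous data).
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for pos, i in enumerate(order):
        r[i] = pos + 1.0
    return r


def spearman_rho(x, y):
    # Spearman's rho: Pearson correlation of the ranks.
    a, b = _ranks(x), _ranks(y)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def skeptic_matrix(columns, method="kendall"):
    # Rank-based estimate of the latent correlation matrix of a Gaussian
    # copula model; columns[j] holds the n observations of variable j.
    d = len(columns)
    S = [[1.0] * d for _ in range(d)]
    for j, k in combinations(range(d), 2):
        if method == "kendall":
            r = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
        elif method == "spearman":
            r = 2 * math.sin(math.pi / 6 * spearman_rho(columns[j], columns[k]))
        else:
            raise ValueError("method must be 'kendall' or 'spearman'")
        S[j][k] = S[k][j] = r
    return S
```

Because only ranks enter the estimate, applying any strictly increasing transformation (for instance `math.exp`) to a column leaves the output unchanged, which is the robustness property the abstract relies on.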
Fast and Adaptive Sparse Precision Matrix Estimation in High Dimensions
This paper proposes a new method for estimating sparse precision matrices in
the high dimensional setting. Fast computation and adaptive procedures for this
problem have attracted considerable attention. We propose a novel approach, called
Sparse Column-wise Inverse Operator, to address these two issues. We analyze an
adaptive procedure based on cross validation, and establish its convergence
rate under the Frobenius norm. The convergence rates under other matrix norms
are also established. This method also enjoys the advantage of fast computation
for large-scale problems via a coordinate descent algorithm. Its numerical merits
are illustrated using both simulated and real datasets. In particular, it
performs favorably on an HIV brain tissue dataset and an ADHD resting-state
fMRI dataset.
Comment: Main text: 24 pages. Supplement: 13 pages. The R package scio
implementing the proposed method is available on CRAN at
https://cran.r-project.org/package=scio. Published in the Journal of
Multivariate Analysis at
http://www.sciencedirect.com/science/article/pii/S0047259X1400260
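A column-wise inverse operator of this kind can be sketched as solving, for each column i, the lasso-type problem min over b of 0.5 * b' S b - b_i + lam * ||b||_1 by coordinate descent, and then symmetrizing by keeping the smaller-magnitude entry of each pair. This is a minimal stdlib sketch under those assumptions; the names `scio_column` and `scio` are hypothetical, it is not the scio package's code, and the cross-validated choice of `lam` is omitted.

```python
def soft_threshold(z, lam):
    # Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0).
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def scio_column(sigma, i, lam, max_iter=1000, tol=1e-10):
    # Coordinate descent for min_b 0.5 * b' sigma b - b[i] + lam * ||b||_1.
    p = len(sigma)
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: e_i[j] minus the contribution of the other coordinates.
            z = (1.0 if j == i else 0.0) - sum(
                sigma[j][k] * beta[k] for k in range(p) if k != j)
            new = soft_threshold(z, lam) / sigma[j][j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def scio(sigma, lam):
    # Estimate each column separately, then symmetrize by keeping, for every
    # pair (a, b), the candidate entry with the smaller absolute value.
    p = len(sigma)
    cols = [scio_column(sigma, i, lam) for i in range(p)]
    omega = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            from_col_b, from_col_a = cols[b][a], cols[a][b]
            omega[a][b] = from_col_b if abs(from_col_b) <= abs(from_col_a) else from_col_a
    return omega
```

With `lam = 0` and a well-conditioned covariance matrix, the sketch recovers the ordinary inverse; a positive `lam` shrinks small entries exactly to zero, which is where the sparsity comes from.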
De novo construction of polyploid linkage maps using discrete graphical models
Linkage maps are used to identify the location of genes responsible for
traits and diseases. New sequencing techniques have created opportunities to
substantially increase the density of genetic markers. Such revolutionary
advances in technology have given rise to new challenges, such as creating
high-density linkage maps. Current multiple testing approaches based on
pairwise recombination fractions are underpowered in the high-dimensional
setting and do not extend easily to polyploid species. We propose to construct
linkage maps using graphical models either via a sparse Gaussian copula or a
nonparanormal skeptic approach. Linkage groups (LGs), typically chromosomes,
and the order of markers in each LG are determined by inferring the conditional
independence relationships among large numbers of markers in the genome.
Through simulations, we illustrate the utility of our map construction method
and compare its performance with other available methods, both when the data
are clean and contain no missing observations and when the data contain
genotyping errors and are incomplete. We apply the proposed method to two
genotype datasets: barley and potato from diploid and polyploid populations,
respectively. Our comprehensive map construction method makes full use of the
dosage SNP data to reconstruct linkage maps for any bi-parental diploid and
polyploid species. We have implemented the method in the R package netgwas.
Comment: 25 pages, 7 figures
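One simplified way to read off linkage groups from an estimated conditional independence graph is to take its connected components: markers joined by a path of nonzero partial correlations fall in the same group. The sketch below assumes exactly that reading and takes an adjacency matrix (0/1 or nonzero precision entries) as input; `linkage_groups` is a hypothetical helper, it is not taken from the netgwas package, and ordering markers within each group is not shown.

```python
from collections import deque


def linkage_groups(adj):
    # Connected components of an undirected graph given as a p x p matrix;
    # any nonzero off-diagonal entry adj[u][v] counts as an edge.
    p = len(adj)
    seen = [False] * p
    groups = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        component = [start]
        queue = deque([start])
        # Breadth-first search from the current unvisited marker.
        while queue:
            u = queue.popleft()
            for v in range(p):
                if v != u and adj[u][v] and not seen[v]:
                    seen[v] = True
                    component.append(v)
                    queue.append(v)
        groups.append(sorted(component))
    return groups
```

For a six-marker graph with edges 0-1, 1-2 and 3-4, the sketch returns `[[0, 1, 2], [3, 4], [5]]`, with the isolated marker 5 forming its own group.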
Post-Regularization Inference for Time-Varying Nonparanormal Graphical Models
We propose a novel class of time-varying nonparanormal graphical models,
which allows us to model high dimensional heavy-tailed systems and the
evolution of their latent network structures. Under this model, we develop
statistical tests for the presence of edges both locally at a fixed index value
and globally over a range of values. The tests are developed for a
high-dimensional regime, are robust to model selection mistakes, and do not
require the commonly assumed minimum signal strength condition. The testing
procedures are based on a high
dimensional, debiasing-free moment estimator, which uses a novel kernel
smoothed Kendall's tau correlation matrix as an input statistic. The estimator
consistently estimates the latent inverse Pearson correlation matrix uniformly
in both the index variable and kernel bandwidth. Its rate of convergence is
shown to be minimax optimal. Our method is supported by thorough numerical
simulations and an application to a neural imaging dataset.
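A kernel-smoothed Kendall's tau at an index value t0 can be sketched by weighting each pair of observations (i, j) by K_h(t_i - t0) * K_h(t_j - t0), so that pairs observed near t0 dominate, and then applying the usual sine transform to obtain a latent correlation. The product-weight form and the Gaussian kernel are assumptions made for illustration; the paper's exact estimator, its debiasing-free moment estimator, and the tests built on top of it are not reproduced, and all names here are hypothetical.

```python
import math
from itertools import combinations


def _sign(v):
    # Returns +1, 0 or -1 according to the sign of v.
    return (v > 0) - (v < 0)


def gaussian_kernel(u):
    # Standard normal density, used here as the smoothing kernel.
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def smoothed_kendall_tau(x, y, t, t0, h):
    # Weighted Kendall's tau: pair (i, j) gets weight K_h(t_i - t0) * K_h(t_j - t0).
    w = [gaussian_kernel((ti - t0) / h) / h for ti in t]
    num = den = 0.0
    for i, j in combinations(range(len(x)), 2):
        wij = w[i] * w[j]
        num += wij * _sign(x[i] - x[j]) * _sign(y[i] - y[j])
        den += wij
    return num / den


def latent_correlation(x, y, t, t0, h):
    # Sine transform maps the smoothed tau to a latent Pearson correlation.
    return math.sin(math.pi / 2 * smoothed_kendall_tau(x, y, t, t0, h))
```

When all index values coincide, every pair receives the same weight and the estimate reduces to the ordinary Kendall's tau; the bandwidth `h` controls how local the estimate is around `t0`.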