TIGER: A Tuning-Insensitive Approach for Optimally Estimating Gaussian Graphical Models
We propose a new procedure for estimating high dimensional Gaussian graphical
models. Our approach is asymptotically tuning-free and non-asymptotically
tuning-insensitive: it requires very little effort to choose the tuning parameter
in finite sample settings. Computationally, our procedure is significantly
faster than existing methods due to its tuning-insensitive property.
Theoretically, the obtained estimator is simultaneously minimax optimal for
precision matrix estimation under different norms. Empirically, we illustrate
the advantages of our method using thorough simulations and real-data examples. The R
package bigmatrix implementing the proposed methods is available on the
Comprehensive R Archive Network: http://cran.r-project.org/
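Not the authors' TIGER implementation (their release is the R package noted above); the Python sketch below only illustrates the column-by-column construction such estimators build on, assuming a scaled-Lasso-style inner solver so that each column's penalty adapts to the unknown noise level. The helper names and the default penalty level lam0 are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0, n_iter=20, tol=1e-6):
    """Alternate a Lasso fit (penalty sigma * lam0) with a noise-level update,
    so the effective penalty adapts to the unknown residual scale."""
    sigma = np.std(y) + 1e-12
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        model = Lasso(alpha=sigma * lam0, fit_intercept=False)
        model.fit(X, y)
        beta = model.coef_
        resid = y - X @ beta
        sigma_new = max(np.sqrt(np.mean(resid ** 2)), 1e-8)
        if abs(sigma_new - sigma) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    return beta, sigma

def column_precision_estimate(X, lam0=None):
    """Regress each variable on all others and assemble a precision-matrix
    estimate from the coefficients and residual variances (sketch only)."""
    n, p = X.shape
    X = X - X.mean(axis=0)
    if lam0 is None:
        lam0 = np.sqrt(2.0 * np.log(p) / n)   # illustrative universal level
    Omega = np.zeros((p, p))
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        beta, sigma = scaled_lasso(X[:, idx], X[:, j], lam0)
        Omega[j, j] = 1.0 / sigma ** 2        # precision diagonal = 1 / noise variance
        Omega[idx, j] = -beta / sigma ** 2    # off-diagonals from regression coefficients
    return (Omega + Omega.T) / 2.0            # crude symmetrization
```

The identities used in the last loop follow from the Gaussian regression interpretation of the precision matrix: regressing X_j on the remaining variables gives coefficients -Omega_kj / Omega_jj and residual variance 1 / Omega_jj.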
Adaptive estimation of covariance matrices via Cholesky decomposition
This paper studies the estimation of a large covariance matrix. We introduce
a novel procedure called ChoSelect based on the Cholesky factor of the inverse
covariance. This method uses a dimension reduction strategy by selecting the
pattern of zeros of the Cholesky factor. Alternatively, ChoSelect can be
interpreted as a graph estimation procedure for directed Gaussian graphical
models. Our approach is particularly relevant when the variables under study
have a natural ordering (e.g. time series) or more generally when the Cholesky
factor is approximately sparse. ChoSelect achieves non-asymptotic oracle
inequalities with respect to the Kullback-Leibler entropy. Moreover, it
satisfies various adaptive properties from a minimax point of view. We also
introduce and study a two-stage procedure that combines ChoSelect with the
Lasso. This two-stage method enables the practitioner to choose their own trade-off
between statistical efficiency and computational complexity. Moreover, it is
consistent under weaker assumptions than the Lasso. The practical performance
of the different procedures is assessed on numerical examples.
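The ChoSelect selection criterion itself is not reproduced here; the sketch below, under the assumption of a known natural ordering of the variables, only shows the modified-Cholesky view the abstract refers to: regress each variable on its predecessors, sparsify the coefficients (an off-the-shelf Lasso stands in for the paper's selection step), and rebuild the inverse covariance as Omega = T' D^{-1} T.

```python
import numpy as np
from sklearn.linear_model import Lasso

def cholesky_precision_estimate(X, alpha=0.1):
    """Modified-Cholesky sketch for ordered variables: fit X_j on X_1..X_{j-1},
    store the (sparsified) coefficients in a unit lower-triangular factor T and
    the residual variances in D, and return Omega = T' D^{-1} T."""
    n, p = X.shape
    X = X - X.mean(axis=0)
    T = np.eye(p)                        # unit lower-triangular factor
    d = np.zeros(p)                      # innovation (residual) variances
    d[0] = np.var(X[:, 0]) + 1e-12
    for j in range(1, p):
        preds = X[:, :j]
        model = Lasso(alpha=alpha, fit_intercept=False)
        model.fit(preds, X[:, j])
        phi = np.atleast_1d(model.coef_)
        T[j, :j] = -phi                  # zeros here = missing parents in the DAG
        resid = X[:, j] - preds @ phi
        d[j] = max(np.mean(resid ** 2), 1e-12)
    return T.T @ np.diag(1.0 / d) @ T    # Omega = T' D^{-1} T
```

The zero pattern of T is exactly the edge set of the directed Gaussian graphical model mentioned in the abstract, which is why selecting that pattern amounts to graph estimation.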
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics,
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity
however has received relatively less attention, especially in the setting when
the sample size is fixed, and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last of these applies to exa-scale data
dimensions. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
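As a hedged illustration only (not the paper's framework or its phase-transition thresholds), the sketch below shows the basic correlation-mining operation in the sample-starved regime: with n far smaller than p, compute the sample correlation matrix and report variable pairs whose absolute correlation exceeds a threshold; in the fixed-n, growing-p regime that threshold has to sit close to 1 to keep spurious discoveries under control.

```python
import numpy as np

def correlation_mining(X, rho=0.9):
    """Screen for strongly correlated variable pairs in an n << p dataset:
    form the p x p sample correlation matrix and keep the pairs whose
    absolute correlation exceeds rho."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = (Z.T @ Z) / (n - 1)                   # sample correlation matrix
    iu = np.triu_indices(p, k=1)              # each pair counted once
    hits = np.abs(R[iu]) > rho
    return list(zip(iu[0][hits], iu[1][hits], R[iu][hits]))

# Hypothetical sample-starved example: 20 samples of 1000 variables.
# rng = np.random.default_rng(0)
# X = rng.standard_normal((20, 1000))
# edges = correlation_mining(X, rho=0.9)
```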
A Constrained L1 Minimization Approach to Sparse Precision Matrix Estimation
A constrained L1 minimization method is proposed for estimating a sparse
inverse covariance matrix based on a sample of $n$ iid $p$-variate random
variables. The resulting estimator is shown to enjoy a number of desirable
properties. In particular, it is shown that the rate of convergence between the
estimator and the true $s$-sparse precision matrix under the spectral norm is
$s\sqrt{\log p/n}$ when the population distribution has either exponential-type
tails or polynomial-type tails. Convergence rates under the elementwise $\ell_\infty$
norm and Frobenius norm are also presented. In addition, graphical
model selection is considered. The procedure is easily implementable by linear
programming. Numerical performance of the estimator is investigated using both
simulated and real data. In particular, the procedure is applied to analyze a
breast cancer dataset. The procedure performs favorably in comparison to
existing methods.
Comment: To appear in the Journal of the American Statistical Association.
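Since the abstract notes that the procedure is implementable by linear programming, here is a minimal column-by-column sketch (not the authors' released code): for each j, minimize ||beta||_1 subject to ||S beta - e_j||_inf <= lam, using the standard split beta = u - v with u, v >= 0. The tuning value lam in the usage comment is a hypothetical placeholder, and the symmetrization rule simply keeps the smaller-magnitude entry of each pair.

```python
import numpy as np
from scipy.optimize import linprog

def clime_estimate(X, lam=0.1):
    """Column-wise constrained L1 minimization for a sparse precision matrix:
    for each j, minimize ||beta||_1 subject to ||S beta - e_j||_inf <= lam,
    where S is the sample covariance, written as an LP in (u, v) with
    beta = u - v, u, v >= 0.  Assumes the LP is feasible (lam not too small)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                              # sample covariance
    Omega = np.zeros((p, p))
    c = np.ones(2 * p)                             # objective: sum(u) + sum(v)
    A = np.vstack([np.hstack([S, -S]),             #  S(u - v) - e_j <= lam
                   np.hstack([-S, S])])            # -S(u - v) + e_j <= lam
    for j in range(p):
        e = np.zeros(p); e[j] = 1.0
        b = np.concatenate([lam + e, lam - e])
        res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * (2 * p),
                      method="highs")
        Omega[:, j] = res.x[:p] - res.x[p:]
    # symmetrize: keep the smaller-magnitude entry of each (i, j) pair
    smaller = np.abs(Omega) <= np.abs(Omega.T)
    return np.where(smaller, Omega, Omega.T)

# Example with a hypothetical tuning value; in practice lam is tuned, e.g. by
# cross-validation.
# rng = np.random.default_rng(0)
# X = rng.standard_normal((100, 30))
# Omega_hat = clime_estimate(X, lam=0.2)
```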