Tensor Graphical Lasso (TeraLasso)
This paper introduces a multi-way tensor generalization of the Bigraphical
Lasso (BiGLasso), which uses a two-way sparse Kronecker-sum multivariate-normal
model for the precision matrix to parsimoniously model conditional dependence
relationships of matrix-variate data based on the Cartesian product of graphs.
We call this generalization the Tensor graphical Lasso (TeraLasso).
We demonstrate using theory and examples that the TeraLasso model can be
accurately and scalably estimated from very limited data samples of high
dimensional variables with multiway coordinates such as space, time and
replicates. Statistical consistency and statistical rates of convergence are
established for both the BiGLasso and TeraLasso estimators of the precision
matrix and estimators of its support (non-sparsity) set, respectively. We
propose a scalable composite gradient descent algorithm and analyze the
computational convergence rate, showing that the composite gradient descent
algorithm is guaranteed to converge at a geometric rate to the global minimizer
of the TeraLasso objective function. Finally, we illustrate the TeraLasso using
both simulation and experimental data from a meteorological dataset, showing
that we can accurately estimate precision matrices and recover meaningful
conditional dependency graphs from high dimensional complex datasets.
Comment: accepted to JRSS-
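To make the Kronecker-sum structure concrete, the following is a minimal NumPy sketch, not the authors' code. It builds a precision matrix as a Kronecker sum of three small factor matrices. The helper names `kronecker_sum` and `chain_precision`, the chain-graph factors, and all numeric values are assumptions chosen purely for illustration.

```python
import numpy as np

def kronecker_sum(factors):
    """Kronecker sum A1 (+) ... (+) AK = sum_k I (x) ... (x) Ak (x) ... (x) I."""
    dims = [f.shape[0] for f in factors]
    total = int(np.prod(dims))
    out = np.zeros((total, total))
    for k, f in enumerate(factors):
        # Identity blocks for the modes before and after mode k.
        left = np.eye(int(np.prod(dims[:k])))
        right = np.eye(int(np.prod(dims[k + 1:])))
        out += np.kron(np.kron(left, f), right)
    return out

def chain_precision(d, off=-0.4):
    """Tridiagonal factor whose graph is a chain (hypothetical values)."""
    return np.eye(d) + off * (np.eye(d, k=1) + np.eye(d, k=-1))

# Three modes (for example space, time, replicates) of sizes 3, 4 and 2.
psi = [chain_precision(3), chain_precision(4), chain_precision(2)]
omega = kronecker_sum(psi)

# Eigenvalues of a Kronecker sum are sums of the factor eigenvalues,
# so positive definite factors give a positive definite omega.
min_eig = np.linalg.eigvalsh(omega).min()
sum_of_factor_min_eigs = sum(np.linalg.eigvalsh(f).min() for f in psi)

# Off-diagonal nonzeros correspond to edges of the Cartesian product graph.
off_diag_nonzeros = np.count_nonzero(omega - np.diag(np.diag(omega)))
```

In this sketch the nonzero off-diagonal entries of `omega` link only index pairs that differ in a single coordinate, which is the Cartesian product of the factor graphs. The 24 x 24 matrix is described by three small factors rather than by all of its entries.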
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively less attention, especially in the setting where
the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where the matrices of pairwise and partial correlations
among the variables are of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
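The sample-starved regime can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the framework of the paper: the variables are independent Gaussians, the sample size is fixed at 10, and the function name `max_spurious_correlation` is hypothetical. It records the largest absolute sample correlation as the number of variables grows.

```python
import numpy as np

def max_spurious_correlation(n, p, rng):
    """Largest |sample correlation| among p independent variables, n samples."""
    x = rng.standard_normal((n, p))      # true correlations are all zero
    r = np.corrcoef(x, rowvar=False)     # p x p sample correlation matrix
    np.fill_diagonal(r, 0.0)             # ignore the trivial diagonal
    return np.abs(r).max()

rng = np.random.default_rng(0)
n = 10                                   # fixed sample size
results = {p: max_spurious_correlation(n, p, rng) for p in (10, 100, 1000)}

for p, value in results.items():
    print(f"p = {p:5d}: max |r| = {value:.3f}")
```

Although every true correlation is zero, the largest sample correlation tends to move toward 1 as the dimension grows with the sample size held fixed, which is why a fixed screening threshold cannot be trusted in this regime.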
Probabilistic Interpretation of Linear Solvers
This manuscript proposes a probabilistic framework for algorithms that
iteratively solve unconstrained linear problems Bx = b with positive definite
B for x. The goal is to replace the point estimates returned by existing
methods with a Gaussian posterior belief over the elements of the inverse of
B, which can be used to estimate errors. Recent probabilistic interpretations
of the secant family of quasi-Newton optimization algorithms are extended.
Combined with properties of the conjugate gradient algorithm, this leads to
uncertainty-calibrated methods with very limited cost overhead over conjugate
gradients, a self-contained novel interpretation of the quasi-Newton and
conjugate gradient algorithms, and a foundation for new nonlinear optimization
methods.
Comment: final version, in press at SIAM J Optimization
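For orientation, here is a minimal sketch of the classical conjugate gradient iteration that the abstract builds on. It is the textbook point-estimate solver, not the probabilistic method proposed in the manuscript. Writing the system as Bx = b, the function name `conjugate_gradient` and the randomly generated test problem are assumptions made for illustration.

```python
import numpy as np

def conjugate_gradient(B, b, tol=1e-10, max_iter=None):
    """Solve B x = b for symmetric positive definite B (textbook CG)."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - B @ x                 # residual
    d = r.copy()                  # first search direction
    rs = r @ r
    residuals = [np.sqrt(rs)]
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Bd = B @ d
        alpha = rs / (d @ Bd)     # exact line search along d
        x += alpha * d
        r -= alpha * Bd
        rs_new = r @ r
        residuals.append(np.sqrt(rs_new))
        d = r + (rs_new / rs) * d  # next B-conjugate direction
        rs = rs_new
    return x, residuals

# Hypothetical well-conditioned test problem.
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 20))
B = M @ M.T + 20.0 * np.eye(20)
b = rng.standard_normal(20)
x, residuals = conjugate_gradient(B, b, max_iter=100)
```

The solver returns a single estimate `x` and a residual history, with no statement about the remaining error in the inverse of B. That missing error estimate is the gap the probabilistic interpretation is meant to address.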
Covariance Estimation in High Dimensions via Kronecker Product Expansions
This paper presents a new method for estimating high dimensional covariance
matrices. The method, permuted rank-penalized least-squares (PRLS), is based on
a Kronecker product series expansion of the true covariance matrix. Assuming an
i.i.d. Gaussian random sample, we establish high dimensional rates of
convergence to the true covariance as both the number of samples and the number
of variables go to infinity. For covariance matrices of low separation rank,
our results establish that PRLS has significantly faster convergence than the
standard sample covariance matrix (SCM) estimator. The convergence rate
captures a fundamental tradeoff between estimation error and approximation
error, thus providing a scalable covariance estimation framework in terms of
separation rank, similar to low rank approximation of covariance matrices. The
MSE convergence rates generalize the high dimensional rates recently obtained
for the ML Flip-flop algorithm for Kronecker product covariance estimation. We
show that a class of block Toeplitz covariance matrices can be approximated by
matrices of low separation rank and give bounds on the minimal separation rank that
ensures a given level of bias. Simulations are presented to validate the
theoretical bounds. As a real world application, we illustrate the utility of
the proposed Kronecker covariance estimator for spatio-temporal linear least
squares prediction of multivariate wind speed measurements.
Comment: 47 pages, accepted to IEEE Transactions on Signal Processing
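The link between separation rank and ordinary matrix rank can be sketched in a few lines of NumPy. The rearrangement below is the standard block-to-row permutation under which a sum of r Kronecker products becomes a rank-r matrix. The singular value soft-thresholding step is the usual closed form for nuclear-norm penalized least squares, and treating it as the PRLS estimator is an assumption here, as are the function names and the test matrices.

```python
import numpy as np

def rearrange(sigma, p, q):
    """Map a (pq x pq) matrix to (p^2 x q^2); kron(A, B) -> vec(A) vec(B)^T."""
    return sigma.reshape(p, q, p, q).transpose(0, 2, 1, 3).reshape(p * p, q * q)

def inverse_rearrange(r, p, q):
    """Undo rearrange."""
    return r.reshape(p, p, q, q).transpose(0, 2, 1, 3).reshape(p * q, p * q)

def shrink_kronecker_spectrum(sigma, p, q, lam):
    """Soft-threshold the singular values of the rearranged matrix by lam / 2."""
    u, s, vt = np.linalg.svd(rearrange(sigma, p, q), full_matrices=False)
    s_kept = np.maximum(s - lam / 2.0, 0.0)
    return inverse_rearrange((u * s_kept) @ vt, p, q), s_kept

def random_spd(d, rng):
    """Hypothetical symmetric positive definite factor."""
    m = rng.standard_normal((d, d))
    return m @ m.T + np.eye(d)

rng = np.random.default_rng(3)
p, q = 3, 4
# Covariance with separation rank 2: a sum of two Kronecker products.
sigma = (np.kron(random_spd(p, rng), random_spd(q, rng))
         + np.kron(random_spd(p, rng), random_spd(q, rng)))

r = rearrange(sigma, p, q)
separation_rank = np.linalg.matrix_rank(r)
recon, s_kept = shrink_kronecker_spectrum(sigma, p, q, lam=0.0)
```

With `lam=0.0` the reconstruction reproduces `sigma` exactly. A positive `lam` shrinks the smaller singular values to zero and so lowers the separation rank of the estimate, trading approximation error against estimation error.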
Network inference in matrix-variate Gaussian models with non-independent noise
Inferring a graphical model or network from observational data from a large
number of variables is a well studied problem in machine learning and
computational statistics. In this paper we consider a version of this problem
that is relevant to the analysis of multiple phenotypes collected in genetic
studies. In such datasets we expect correlations between phenotypes and between
individuals. We model observations as a sum of two matrix normal variates such
that the joint covariance function is a sum of Kronecker products. This model,
which generalizes the Graphical Lasso, assumes observations are correlated due
to known genetic relationships and corrupted with non-independent noise. We
have developed a computationally efficient EM algorithm to fit this model. On
simulated datasets we illustrate substantially improved performance in network
reconstruction by allowing for a general noise distribution.
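The covariance structure described above can be checked numerically. The sketch below draws observations as the sum of two matrix normal variates and compares the empirical covariance of the row-major vectorized data with the sum of two Kronecker products. It does not implement the EM algorithm from the paper. The function name `sample_matrix_normal`, the variable names, and every covariance value are assumptions for illustration.

```python
import numpy as np

def sample_matrix_normal(row_cov, col_cov, size, rng):
    """Draw `size` matrices with Cov(vec_rowmajor) = kron(row_cov, col_cov)."""
    a = np.linalg.cholesky(row_cov)
    b = np.linalg.cholesky(col_cov)
    g = rng.standard_normal((size, row_cov.shape[0], col_cov.shape[0]))
    return a @ g @ b.T

rng = np.random.default_rng(2)

# Rows: 3 individuals. Columns: 2 phenotypes. All values are hypothetical.
kinship = np.array([[1.0, 0.5, 0.25],
                    [0.5, 1.0, 0.5],
                    [0.25, 0.5, 1.0]])      # known relatedness between individuals
signal_cov = np.array([[1.0, 0.3],
                       [0.3, 1.0]])          # phenotype covariance of the signal
noise_rows = np.array([[1.0, 0.2, 0.0],
                       [0.2, 1.0, 0.2],
                       [0.0, 0.2, 1.0]])     # non-independent noise across rows
noise_cov = np.array([[0.5, -0.1],
                      [-0.1, 0.5]])          # phenotype covariance of the noise

size = 50_000
y = (sample_matrix_normal(kinship, signal_cov, size, rng)
     + sample_matrix_normal(noise_rows, noise_cov, size, rng))

# Theoretical joint covariance: a sum of two Kronecker products.
total_cov = np.kron(kinship, signal_cov) + np.kron(noise_rows, noise_cov)
empirical_cov = np.cov(y.reshape(size, -1), rowvar=False)
max_abs_error = np.abs(empirical_cov - total_cov).max()
```

Because the two variates are drawn independently, their covariances add, so `empirical_cov` should approach `total_cov` as `size` grows.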
- …