Alternating Direction Methods for Latent Variable Gaussian Graphical Model Selection
Chandrasekaran, Parrilo and Willsky (2010) proposed a convex optimization
problem to characterize graphical model selection in the presence of unobserved
variables. This convex optimization problem aims to estimate an inverse
covariance matrix that can be decomposed into a sparse matrix minus a low-rank
matrix from sample data. Solving this convex optimization problem is very
challenging, especially for large problems. In this paper, we propose two
alternating direction methods for solving this problem. The first method is to
apply the classical alternating direction method of multipliers to solve the
problem as a consensus problem. The second method is a proximal gradient based
alternating direction method of multipliers. Our methods exploit and take
advantage of the special structure of the problem and thus can solve large
problems very efficiently. Global convergence results are established for the
proposed methods. Numerical results on both synthetic data and gene expression
data show that our methods usually solve problems with one million variables in
one to two minutes, and are usually five to thirty-five times faster than a
state-of-the-art Newton-CG proximal point algorithm.
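For orientation, the convex program referred to in this abstract is usually written as below. This is a sketch from the standard statement of the latent variable model, not copied from the listed paper; the symbols $\hat{\Sigma}$ (sample covariance), $S$ (sparse part), $L$ (low-rank part), and the weights $\alpha, \beta > 0$ are notational assumptions.

```latex
\min_{S,\,L}\;\; \operatorname{tr}\!\big(\hat{\Sigma}\,(S - L)\big) \;-\; \log\det(S - L)
  \;+\; \alpha\,\|S\|_{1} \;+\; \beta\,\operatorname{tr}(L)
\qquad \text{subject to} \quad S - L \succ 0, \;\; L \succeq 0 .
```

The estimated inverse covariance is $S - L$: the $\ell_1$ term encourages sparsity in $S$, and the trace term encourages low rank in the positive semidefinite matrix $L$.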
Sparse Inverse Covariance Selection via Alternating Linearization Methods
Gaussian graphical models are of great interest in statistical learning.
Because the conditional independencies between different nodes correspond to
zero entries in the inverse covariance matrix of the Gaussian distribution, one
can learn the structure of the graph by estimating a sparse inverse covariance
matrix from sample data, by solving a convex maximum likelihood problem with an
$\ell_1$-regularization term. In this paper, we propose a first-order method
based on an alternating linearization technique that exploits the problem's
special structure; in particular, the subproblems solved in each iteration have
closed-form solutions. Moreover, our algorithm obtains an $\epsilon$-optimal
solution in $O(1/\epsilon)$ iterations. Numerical experiments on both synthetic
and real data from gene association networks show that a practical version of
this algorithm outperforms other competitive algorithms.
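To illustrate what "closed-form solutions" can look like for this kind of problem, here is a minimal NumPy sketch of the two building blocks that commonly appear when the log-determinant likelihood and the $\ell_1$ penalty are handled separately. It is not the authors' algorithm; the function names, the step parameter `mu`, and the matrices `Y` and `S` are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise minimizer of tau*|x| + 0.5*(x - a)**2 (the l1 proximal step)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def logdet_prox(Y, S, mu):
    """Minimize -logdet(X) + <S, X> + ||X - Y||_F^2 / (2*mu) over X > 0.

    Setting the gradient to zero gives X - mu * inv(X) = Y - mu * S, which is
    solved eigenvalue by eigenvalue: x**2 - d*x - mu = 0, taking the positive root.
    """
    d, U = np.linalg.eigh(Y - mu * S)          # Y and S are assumed symmetric
    x = (d + np.sqrt(d**2 + 4.0 * mu)) / 2.0   # positive root, so X is positive definite
    return (U * x) @ U.T                       # U @ diag(x) @ U.T
```

Both steps cost at most one eigendecomposition, which is why first-order schemes built from them scale to large matrices.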
Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data
Subsequence clustering of multivariate time series is a useful tool for
discovering repeated patterns in temporal data. Once these patterns have been
discovered, seemingly complicated datasets can be interpreted as a temporal
sequence of only a small number of states, or clusters. For example, raw sensor
data from a fitness-tracking application can be expressed as a timeline of a
select few actions (e.g., walking, sitting, running). However, discovering
these patterns is challenging because it requires simultaneous segmentation and
clustering of the time series. Furthermore, interpreting the resulting clusters
is difficult, especially when the data is high-dimensional. Here we propose a
new method of model-based clustering, which we call Toeplitz Inverse
Covariance-based Clustering (TICC). Each cluster in the TICC method is defined
by a correlation network, or Markov random field (MRF), characterizing the
interdependencies between different observations in a typical subsequence of
that cluster. Based on this graphical representation, TICC simultaneously
segments and clusters the time series data. We solve the TICC problem through
alternating minimization, using a variation of the expectation maximization
(EM) algorithm. We derive closed-form solutions to efficiently solve the two
resulting subproblems in a scalable way, through dynamic programming and the
alternating direction method of multipliers (ADMM), respectively. We validate
our approach by comparing TICC to several state-of-the-art baselines in a
series of synthetic experiments, and we then demonstrate on an automobile
sensor dataset how TICC can be used to learn interpretable clusters in
real-world scenarios.
Comment: This revised version fixes two small typos in the published version.
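The dynamic-programming step mentioned in this abstract can be pictured as a shortest-path problem over time steps and clusters. The sketch below is a generic Viterbi-style assignment under assumed inputs, not the TICC reference implementation: `cost[t, k]` is a hypothetical per-point cost (for example a negative log-likelihood) of placing time step `t` in cluster `k`, and `beta` is a hypothetical penalty paid each time the cluster label changes.

```python
import numpy as np

def assign_clusters(cost, beta):
    """Return the label sequence minimizing sum_t cost[t, z_t] + beta * (#label switches)."""
    T, K = cost.shape
    best = cost[0].copy()                 # best[k]: cheapest path ending in cluster k
    back = np.zeros((T, K), dtype=int)    # back[t, k]: predecessor of k at time t
    switch = beta * (1.0 - np.eye(K))     # zero cost to stay, beta to switch
    for t in range(1, T):
        trans = best[:, None] + switch    # trans[j, k]: end in j, then move to k
        back[t] = np.argmin(trans, axis=0)
        best = trans[back[t], np.arange(K)] + cost[t]
    # Trace the optimal path backwards from the cheapest final state.
    path = [int(np.argmin(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With `beta = 0` each point simply takes its cheapest cluster; larger values of `beta` favor longer contiguous segments, which is what makes segmentation and clustering happen simultaneously.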
Positive Definite Penalized Estimation of Large Covariance Matrices
The thresholding covariance estimator has nice asymptotic properties for
estimating sparse large covariance matrices, but it often has negative
eigenvalues when used in real data analysis. To simultaneously achieve sparsity
and positive definiteness, we develop a positive definite $\ell_1$-penalized
covariance estimator for estimating sparse large covariance matrices. An
efficient alternating direction method is derived to solve the challenging
optimization problem and its convergence properties are established. Under weak
regularity conditions, non-asymptotic statistical theory is also established
for the proposed estimator. The competitive finite-sample performance of our
proposal is demonstrated by both simulation and real applications.
Comment: accepted by JASA, August 201
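The positive-definiteness issue described here is easy to reproduce numerically. The sketch below shows a symmetric matrix with unit diagonal that has a negative eigenvalue, and repairs it by clipping eigenvalues, which is the Frobenius-norm projection onto the set of matrices with smallest eigenvalue at least `eps`. The function name, the value of `eps`, and the example matrix are assumptions; this illustrates one ingredient such alternating direction schemes typically use, not the estimator from the listed paper.

```python
import numpy as np

def project_min_eig(M, eps):
    """Nearest symmetric matrix (Frobenius norm) whose eigenvalues are all >= eps."""
    M = (M + M.T) / 2.0                   # guard against slight asymmetry
    d, U = np.linalg.eigh(M)
    return (U * np.maximum(d, eps)) @ U.T

# A sparse, covariance-like matrix that is not positive definite:
# its eigenvalues are 1 and 1 +/- 0.8*sqrt(2), so the smallest is negative.
M = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.8],
              [0.0, 0.8, 1.0]])
P = project_min_eig(M, 1e-3)
```

Note that the projection alone generally destroys exact zeros, which is why it is alternated with a sparsity-inducing step rather than applied once at the end.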
Exact Hybrid Covariance Thresholding for Joint Graphical Lasso
This paper considers the problem of estimating multiple related Gaussian
graphical models from a $p$-dimensional dataset consisting of different
classes. Our work is based upon the formulation of this problem as group
graphical lasso. This paper proposes a novel hybrid covariance thresholding
algorithm that can effectively identify zero entries in the precision matrices
and split a large joint graphical lasso problem into small subproblems. Our
hybrid covariance thresholding method is superior to existing uniform
thresholding methods in that our method can split the precision matrix of each
individual class using different partition schemes and thus split group
graphical lasso into much smaller subproblems, each of which can be solved very
fast. In addition, this paper establishes necessary and sufficient conditions
for our hybrid covariance thresholding algorithm. The superior performance of
our thresholding method is thoroughly analyzed and illustrated by a few
experiments on simulated data and real gene expression data.
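As background for the "uniform thresholding" that this abstract contrasts itself with, the sketch below shows the simpler single-class screening idea: connect variables `i` and `j` whenever `|S[i, j]|` exceeds the penalty level, and treat each connected component as an independent subproblem. This is not the hybrid method proposed in the paper; the function name and the inputs `S` (a sample covariance matrix) and `lam` (the penalty level) are assumptions.

```python
import numpy as np

def screen_blocks(S, lam):
    """Label each variable with the connected component of the graph |S_ij| > lam."""
    p = S.shape[0]
    adj = np.abs(S) > lam                  # the diagonal is harmless here
    labels = -np.ones(p, dtype=int)
    n_blocks = 0
    for start in range(p):
        if labels[start] >= 0:
            continue                       # already assigned to a block
        labels[start] = n_blocks
        stack = [start]
        while stack:                       # depth-first search over the thresholded graph
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = n_blocks
                    stack.append(j)
        n_blocks += 1
    return labels.tolist()
```

Variables sharing a label would be estimated together, and each block can be solved separately, which is where the speedup from splitting comes from.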