Positive Definite ℓ1-Penalized Estimation of Large Covariance Matrices
The thresholding covariance estimator has nice asymptotic properties for
estimating sparse large covariance matrices, but it often has negative
eigenvalues when used in real data analysis. To simultaneously achieve sparsity
and positive definiteness, we develop a positive definite ℓ1-penalized
covariance estimator for estimating sparse large covariance matrices. An
efficient alternating direction method is derived to solve the challenging
optimization problem and its convergence properties are established. Under weak
regularity conditions, non-asymptotic statistical theory is also established
for the proposed estimator. The competitive finite-sample performance of our
proposal is demonstrated by both simulation and real applications.
Comment: accepted by JASA, August 2012
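The abstract does not spell out the iterations, but the estimator it describes fits a standard alternating direction template: split the penalized matrix from the positive definite one, then alternate a soft-thresholding step, an eigenvalue-clipping projection, and a dual update. Below is a minimal NumPy sketch under that reading, for the problem min over Sigma >= eps*I of 0.5*||Sigma - S||_F^2 + lam*||off-diagonal of Sigma||_1; the function name and the parameters rho, eps, and lam are illustrative assumptions, not the authors' code.

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise soft-thresholding operator."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def pd_l1_covariance(S, lam, eps=1e-4, rho=1.0, n_iter=500, tol=1e-6):
    """ADMM sketch for min_{Sigma >= eps*I}
    0.5*||Sigma - S||_F^2 + lam*||off-diagonal of Sigma||_1,
    using the splitting Theta = Sigma (all names illustrative)."""
    p = S.shape[0]
    Sigma = S.copy()
    Lam = np.zeros((p, p))  # dual variable for the constraint Theta = Sigma
    for _ in range(n_iter):
        # Theta-step: project (Sigma - Lam/rho) onto {Theta : Theta >= eps*I}
        # by clipping eigenvalues at eps
        w, V = np.linalg.eigh(Sigma - Lam / rho)
        Theta = (V * np.maximum(w, eps)) @ V.T
        # Sigma-step: closed-form prox of the loss plus l1 penalty;
        # the diagonal is left unpenalized
        B = (S + rho * Theta + Lam) / (1.0 + rho)
        Sigma_new = soft_threshold(B, lam / (1.0 + rho))
        np.fill_diagonal(Sigma_new, np.diag(B))
        # dual ascent on Theta - Sigma, plus a simple stopping rule
        Lam += rho * (Theta - Sigma_new)
        if np.linalg.norm(Sigma_new - Sigma, "fro") < tol:
            Sigma = Sigma_new
            break
        Sigma = Sigma_new
    return Sigma
```

The eigendecomposition in the projection step dominates the per-iteration cost, which is why such methods remain practical only up to moderately large dimensions.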
Alternating Direction Methods for Latent Variable Gaussian Graphical Model Selection
Chandrasekaran, Parrilo and Willsky (2010) proposed a convex optimization
problem to characterize graphical model selection in the presence of unobserved
variables. This convex optimization problem aims to estimate, from sample
data, an inverse covariance matrix that can be decomposed into a sparse matrix
minus a low-rank matrix. Solving this convex optimization problem is very
challenging, especially for large problems. In this paper, we propose two
alternating direction methods for solving this problem. The first method is to
apply the classical alternating direction method of multipliers to solve the
problem as a consensus problem. The second method is a proximal gradient based
alternating direction method of multipliers. Our methods exploit the special
structure of the problem and can thus solve large problems very efficiently. A
global convergence result is established for the
proposed methods. Numerical results on both synthetic data and gene expression
data show that our methods usually solve problems with one million variables in
one to two minutes, and are usually five to thirty-five times faster than a
state-of-the-art Newton-CG proximal point algorithm.
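As one concrete reading of the consensus formulation, here is a short NumPy sketch of an ADMM variant for the Chandrasekaran-Parrilo-Willsky objective, min over S and L >= 0 of -logdet(S - L) + <SigmaHat, S - L> + alpha*||S||_1 + beta*tr(L), introducing R = S - L as the consensus variable. The names alpha, beta, and rho and the particular update order are assumptions for illustration; the paper's own solvers differ in details.

```python
import numpy as np

def soft_threshold(A, tau):
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def latent_ggm_admm(SigmaHat, alpha, beta, rho=1.0, n_iter=200):
    """ADMM sketch for min -logdet(R) + <SigmaHat, R>
    + alpha*||S||_1 + beta*tr(L), s.t. R = S - L, L >= 0."""
    p = SigmaHat.shape[0]
    S, L, U = np.eye(p), np.zeros((p, p)), np.zeros((p, p))
    for _ in range(n_iter):
        # R-step: closed form; solve rho*r - 1/r = d on the eigenvalues
        # of rho*(S - L - U) - SigmaHat, so R stays positive definite
        d, Q = np.linalg.eigh(rho * (S - L - U) - SigmaHat)
        r = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)
        R = (Q * r) @ Q.T
        # S-step: entrywise soft-thresholding (prox of the l1 term)
        S = soft_threshold(R + L + U, alpha / rho)
        # L-step: shift by beta/rho for the trace penalty, then
        # project onto the positive semidefinite cone
        w, V = np.linalg.eigh(S - R - U - (beta / rho) * np.eye(p))
        L = (V * np.maximum(w, 0.0)) @ V.T
        # scaled dual update on the consensus constraint R - (S - L) = 0
        U += R - S + L
    return S, L
```

Every subproblem here has a closed-form solution built on one eigendecomposition, which is the structural property the abstract credits for the methods' speed.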
Data Mining Using Relational Database Management Systems
Software packages providing a whole set of data mining and machine learning algorithms are attractive because they allow experimentation with many kinds of algorithms in an easy setup. However, these packages are often based on main-memory data structures, limiting the amount of data they can handle. In this paper we use a relational database as secondary storage in order to eliminate this limitation. Unlike existing approaches, which often focus on optimizing a single algorithm to work with a database backend, we propose a general approach that provides a database interface for several algorithms at once. We have taken a popular machine learning software package, Weka, and added a relational storage manager as a back tier to the system. The extension is transparent to the algorithms implemented in Weka, since it is hidden behind Weka's standard main-memory data structure interface. Furthermore, some general mining tasks are transferred into the database system to speed up execution. We tested the extended system, referred to as WekaDB, and our results show that it achieves much higher scalability than Weka, while providing the same output and maintaining good computation times.
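WekaDB itself is a Java extension of Weka, so the following is only a language-neutral illustration of the design it describes: a dataset object that presents the usual in-memory access calls while actually streaming rows from a relational store. This Python/sqlite3 sketch is hypothetical (the class and method names are invented here) and merely shows the pattern of hiding the database behind the standard data-structure interface.

```python
import sqlite3

class DBBackedInstances:
    """Hypothetical sketch of the WekaDB idea: keep the familiar
    dataset interface, but fetch rows from a relational store on
    demand, so dataset size is bounded by disk rather than RAM."""

    def __init__(self, db_path, table):
        # `table` is assumed trusted and holds one instance per row
        self.conn = sqlite3.connect(db_path)
        self.table = table

    def num_instances(self):
        # COUNT(*) runs inside the database; nothing is materialized
        (n,) = self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()
        return n

    def instance(self, i):
        # random access to a single row instead of loading the dataset
        return self.conn.execute(
            f"SELECT * FROM {self.table} LIMIT 1 OFFSET ?", (i,)
        ).fetchone()

    def __iter__(self):
        # sequential scans stream through a cursor, row by row
        yield from self.conn.execute(f"SELECT * FROM {self.table}")
```

An algorithm written against this interface iterates over instances exactly as it would over an in-memory list, which is the transparency property the abstract claims for the algorithms already implemented in Weka.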