Fast ADMM Algorithm for Distributed Optimization with Adaptive Penalty
We propose new methods to speed up convergence of the Alternating Direction
Method of Multipliers (ADMM), a common optimization tool in the context of
large scale and distributed learning. The proposed method accelerates the speed
of convergence by automatically deciding the constraint penalty needed for
parameter consensus in each iteration. In addition, we also propose an
extension of the method that adaptively determines the maximum number of
iterations to update the penalty. We show that this approach effectively leads
to an adaptive, dynamic network topology underlying the distributed
optimization. The utility of the new penalty update schemes is demonstrated on
both synthetic and real data, including a computer vision application of
distributed structure from motion.
Comment: 8 pages manuscript, 2 pages appendix, 5 figures
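The general idea of choosing the consensus penalty automatically at each iteration can be illustrated with a minimal sketch. It is not the update scheme proposed in this paper: it uses the standard residual-balancing heuristic for the penalty on a consensus least-squares problem, and it stops adapting after a fixed number of iterations, whereas the abstract describes determining that limit adaptively. The function name `consensus_admm` and the parameters `mu`, `tau`, and `adapt_until` are assumptions made for this example.

```python
import numpy as np

def consensus_admm(A_blocks, b_blocks, rho=1.0, mu=10.0, tau=2.0,
                   adapt_until=50, n_iter=500, tol=1e-8):
    """Consensus ADMM (scaled form) for sum_i 0.5*||A_i x - b_i||^2.

    The penalty rho is adapted by residual balancing, an illustrative
    stand-in for the paper's scheme, for the first `adapt_until` iterations.
    """
    n_nodes = len(A_blocks)
    dim = A_blocks[0].shape[1]
    x = np.zeros((n_nodes, dim))   # local copies, one row per node
    u = np.zeros((n_nodes, dim))   # scaled dual variables
    z = np.zeros(dim)              # consensus variable
    AtA = [A.T @ A for A in A_blocks]
    Atb = [A.T @ b for A, b in zip(A_blocks, b_blocks)]
    eye = np.eye(dim)

    for k in range(n_iter):
        # Local updates: each node solves a small regularized least-squares problem.
        for i in range(n_nodes):
            x[i] = np.linalg.solve(AtA[i] + rho * eye,
                                   Atb[i] + rho * (z - u[i]))
        # Consensus update and scaled dual ascent.
        z_old = z
        z = (x + u).mean(axis=0)
        u += x - z

        # Primal and dual residual norms.
        r = np.linalg.norm(x - z)
        s = rho * np.sqrt(n_nodes) * np.linalg.norm(z - z_old)
        if r < tol and s < tol:
            break

        # Residual balancing: keep r and s within a factor mu of each other.
        # The scaled dual u must be rescaled whenever rho changes.
        if k < adapt_until:
            if r > mu * s:
                rho *= tau
                u /= tau
            elif s > mu * r:
                rho /= tau
                u *= tau
    return z, rho
```

Called as `z, rho = consensus_admm(A_blocks, b_blocks)`, with one `(A_i, b_i)` pair per node, the returned `z` should approach the least-squares solution of the stacked system, and `rho` is the penalty in effect when the loop ended.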
Grassmann Averages for Scalable Robust PCA
As the collection of large datasets becomes increasingly automated, the occurrence of outliers will increase – “big data” implies “big outliers”. While principal component analysis (PCA) is often used to reduce the size of data, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA do not scale beyond small-to-medium sized datasets. To address this, we introduce the Grassmann Average (GA), which expresses dimensionality reduction as an average of the subspaces spanned by the data. Because averages can be efficiently computed, we immediately gain scalability. GA is inherently more robust than PCA, but we show that they coincide for Gaussian data. We exploit the fact that averages can be made robust to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. Robustness can be with respect to vectors (subspaces) or elements of vectors; we focus on the latter and use a trimmed average. The resulting Trimmed Grassmann Average (TGA) is particularly appropriate for computer vision because it is robust to pixel outliers. The algorithm has low computational complexity and minimal memory requirements, making it scalable to “big noisy data.” We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie.
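The averaging idea in this abstract can be sketched for a single component. Each observation spans a one-dimensional subspace. The iteration flips every observation into the half-space of the current estimate, averages the flipped observations, and renormalizes. Replacing the mean with an element-wise trimmed mean gives a trimmed variant. This is a minimal sketch under assumptions, not the authors' implementation: the function names, the random initialization, the stopping rule, and the assumption that the data are already centered are choices made for this example, and the exact weighting used in the paper may differ.

```python
import numpy as np

def trimmed_mean(Y, trim):
    """Element-wise (per-column) trimmed mean: drop a fraction `trim`
    of the smallest and of the largest values in each column."""
    n = Y.shape[0]
    k = int(np.floor(trim * n))
    Y_sorted = np.sort(Y, axis=0)
    return Y_sorted[k:n - k].mean(axis=0)

def trimmed_grassmann_average(X, trim=0.0, n_iter=100, tol=1e-10, seed=0):
    """Estimate one leading direction of the rows of X (assumed centered).

    trim=0.0 uses a plain average; 0 < trim < 0.5 uses a trimmed average,
    which discounts element-wise outliers.
    """
    rng = np.random.default_rng(seed)
    q = rng.normal(size=X.shape[1])
    q /= np.linalg.norm(q)

    for _ in range(n_iter):
        # Flip each observation so it points into the half-space of q.
        signs = np.sign(X @ q)
        signs[signs == 0] = 1.0
        flipped = signs[:, None] * X

        # Average the flipped observations (optionally trimmed) and renormalize.
        q_new = trimmed_mean(flipped, trim)
        norm = np.linalg.norm(q_new)
        if norm == 0:
            break
        q_new /= norm

        converged = np.linalg.norm(q_new - q) < tol
        q = q_new
        if converged:
            break
    return q
```

For example, `trimmed_grassmann_average(X, trim=0.1)` discards the lowest and highest 10% of values in each coordinate before averaging, so a few grossly corrupted entries in a coordinate should have little influence on the estimated direction.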
Distributed Learning, Prediction and Detection in Probabilistic Graphs.
Critical to high-dimensional statistical estimation is to exploit the structure in the data distribution. Probabilistic graphical models provide an efficient framework for representing complex joint distributions of random variables through their conditional dependency graph, and can be adapted to many high-dimensional machine learning applications.
This dissertation develops probabilistic graphical modeling techniques for three statistical estimation problems arising in real-world applications: distributed and parallel learning in networks, missing-value prediction in recommender systems, and emerging topic detection in text corpora. The common theme behind all proposed methods is a combination of parsimonious representations of uncertainty in the data, optimization surrogates that lead to computationally efficient algorithms, and fundamental limits of estimation performance in high dimensions.
More specifically, the dissertation makes the following theoretical contributions:
(1) We propose a distributed and parallel framework for learning the parameters in Gaussian graphical models that is free of iterative global message passing. The proposed distributed estimator is shown to be asymptotically consistent, improve with increasing local neighborhood sizes, and have a high-dimensional error rate comparable to that of the centralized maximum likelihood estimator.
(2) We present a family of latent variable Gaussian graphical models whose marginal precision matrix has a “low-rank plus sparse” structure. Under mild conditions, we analyze the high-dimensional parameter error bounds for learning this family of models using regularized maximum likelihood estimation.
(3) We consider a hypothesis testing framework for detecting emerging topics in topic models, and propose a novel surrogate test statistic for the standard likelihood ratio. By leveraging the theory of empirical processes, we prove asymptotic consistency for the proposed test and provide guarantees of the detection performance.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/110499/1/mengzs_1.pd