
    Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes

    Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain as much of the variance in the data as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), which are used in two different ways (constraint, penalty). Three of our formulations, notably the one with L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation which we propose to solve via a natural alternating maximization (AM) method. We show that the AM method is nontrivially equivalent to GPower (Journée et al.; JMLR 11:517--553, 2010) for all our formulations. Besides this, we provide 24 efficient parallel SPCA implementations: 3 codes (multi-core, GPU and cluster) for each of the 8 problems. Parallelism in the methods is aimed at i) speeding up computations (our GPU code can be 100 times faster than an efficient serial code written in C++), ii) obtaining solutions explaining more variance, and iii) dealing with big data problems (our cluster code is able to solve a 357 GB problem in about a minute). Comment: 29 pages, 9 tables, 7 figures (the paper is accompanied by a release of the open-source code '24am').
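
    For intuition, a single AM step is easy to sketch for one of the eight formulations (L2 variance, L0 constraint on the loading vector): with the loading vector x fixed, the auxiliary vector y is Ax renormalized; with y fixed, x is the hard-thresholded, renormalized A^T y. The NumPy sketch below is a minimal illustration under these assumptions (random initialization, fixed iteration count), not the paper's parallel '24am' code.

```python
import numpy as np

def spca_am_l0(A, k, iters=100, seed=0):
    """Minimal sketch of alternating maximization for the (L2 variance,
    L0 constraint) single-vector formulation:
        maximize  y^T A x   s.t.  ||y||_2 = 1, ||x||_2 = 1, ||x||_0 <= k.
    Initialization and stopping rule here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        # y-step: with x fixed, the maximizer is y = Ax / ||Ax||_2.
        y = A @ x
        y /= np.linalg.norm(y)
        # x-step: with y fixed, keep the k largest-magnitude entries of
        # A^T y (hard thresholding) and renormalize.
        g = A.T @ y
        g[np.argsort(np.abs(g))[:-k]] = 0.0
        x = g / np.linalg.norm(g)
    return x

# Usage: a loading vector with at most 5 nonzeros for a random 100 x 20 data matrix.
A = np.random.default_rng(1).standard_normal((100, 20))
x = spca_am_l0(A, k=5)
print(np.count_nonzero(x), float(x @ x))
```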

    Iteration Complexity of Randomized Block-Coordinate Descent Methods for Minimizing a Composite Function

    In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function, and prove that it obtains an $\epsilon$-accurate solution with probability at least $1-\rho$ in at most $O(\tfrac{n}{\epsilon} \log \tfrac{1}{\rho})$ iterations, where $n$ is the number of blocks. For strongly convex functions the method converges linearly. This extends recent results of Nesterov [Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper #2010/2], which cover the smooth case, to composite minimization, while at the same time improving the complexity by a factor of 4 and removing $\epsilon$ from the logarithmic term. More importantly, in contrast with the aforementioned work, in which the author achieves the results by applying the method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving true iteration complexity bounds. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms. Finally, we demonstrate numerically that the algorithm is able to solve huge-scale $\ell_1$-regularized least squares and support vector machine problems with a billion variables. Comment: 33 pages, 7 figures, 10 tables.
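
    For intuition, a minimal single-coordinate instance of such a randomized coordinate descent method on an $\ell_1$-regularized least-squares problem might look as follows; block size one, uniform sampling, coordinate-wise Lipschitz step sizes, and the fixed iteration budget are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*|.| (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def rcdm_lasso(A, b, lam, iters=20000, seed=0):
    """Sketch of randomized coordinate descent on the composite problem
        min_x  (1/2)||Ax - b||_2^2 + lam * ||x||_1,
    with blocks of size one and uniform sampling."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                       # residual Ax - b, kept up to date
    L = (A ** 2).sum(axis=0)            # coordinate-wise Lipschitz constants
    for _ in range(iters):
        i = rng.integers(n)             # sample a coordinate uniformly
        g = A[:, i] @ r                 # partial gradient of the smooth part
        xi_new = soft_threshold(x[i] - g / L[i], lam / L[i])
        r += A[:, i] * (xi_new - x[i])  # O(m) residual update
        x[i] = xi_new
    return x

# Usage on a small random instance.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
x = rcdm_lasso(A, b, lam=0.1)
print(np.count_nonzero(x))
```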