176 research outputs found

    Distributed Big-Data Optimization via Block Communications

    Get PDF
    We study distributed multi-agent large-scale optimization problems, wherein the cost function is composed of a smooth possibly nonconvex sum-utility plus a DC (Difference-of-Convex) regularizer. We consider the scenario where the dimension of the optimization variables is so large that optimizing and/or transmitting the entire set of variables could cause unaffordable computation and communication overhead. To address this issue, we propose the first distributed algorithm whereby agents optimize and communicate only a portion of their local variables. The scheme hinges on successive convex approximation (SCA) to handle the nonconvexity of the objective function, coupled with a novel block-signal tracking scheme, aiming at locally estimating the average of the agents' gradients. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Numerical results on a sparse regression problem show the effectiveness of the proposed algorithm and the impact of the block size on its practical convergence speed and communication cost

    Distributed big-data optimization via block communications

    Get PDF
    We study distributed multi-agent large-scale optimization problems, wherein the cost function is composed of a smooth possibly nonconvex sum-utility plus a DC (Difference-of-Convex) regularizer. We consider the scenario where the dimension of the optimization variables is so large that optimizing and/or transmitting the entire set of variables could cause unaffordable computation and communication overhead. To address this issue, we propose the first distributed algorithm whereby agents optimize and communicate only a portion of their local variables. The scheme hinges on successive convex approximation (SCA) to handle the nonconvexity of the objective function, coupled with a novel block- signal tracking scheme, aiming at locally estimating the average of the agents\u2019 gradients. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Numerical results on a sparse regression problem show the effectiveness of the proposed algorithm and the impact of the block size on its practical convergence speed and communication cost

    Optimization with Sparsity-Inducing Penalties

    Get PDF
    Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted â„“2\ell_2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view

    Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning

    Full text link
    Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations. Here, we simplify such difficulties for a class of structured symmetric positive-definite matrices with the affine-invariant metric. We do so by proposing a generalized version of the Riemannian normal coordinates that dynamically orthonormalizes the metric and locally converts the problem into an unconstrained problem in the Euclidean space. We use our approach to simplify existing approaches for structured covariances and develop matrix-inverse-free 2nd2^\text{nd}-order optimizers for deep learning with low precision by using only matrix multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DLComment: An updated version of the ICML 2023 paper. Updated the main text and added more numerical results for DNNs including a new baseline method and improving existing baseline method

    Optimization of a Geman-McClure like criterion for sparse signal deconvolution

    Get PDF
    International audienceThis paper deals with the problem of recovering a sparse unknown signal from a set of observations. The latter are obtained by convolution of the original signal and corruption with additive noise. We tackle the problem by minimizing a least-squares fit criterion penalized by a Geman-McClure like potential. The resulting criterion is a rational function, which makes it possible to formulate its minimization as a generalized problem of moments for which a hierarchy of semidefinite programming relaxations can be proposed. These convex relaxations yield a monotone sequence of values which converges to the global optimum. To overcome the computational limitations due to the large number of involved variables, a stochastic block-coordinate descent method is proposed. The algorithm has been implemented and shows promising result

    Joint covariate selection and joint subspace selection for multiple classification problems

    Get PDF
    We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. By penalizing the sum of â„“2-norms of the blocks of coefficients associated with each covariate across different classification problems, similar sparsity patterns in all models are encouraged. To take computational advantage of the sparsity of solutions at high regularization levels, we propose a blockwise path-following scheme that approximately traces the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates concurrently a growing set of covariates that are simultaneously active for all problems. We also show how to use random projections to extend this approach to the problem of joint subspace selection, where multiple predictors are found in a common low-dimensional subspace. We present theoretical results showing that this random projection approach converges to the solution yielded by trace-norm regularization. Finally, we present a variety of experimental results exploring joint covariate selection and joint subspace selection, comparing the path-following approach to competing algorithms in terms of prediction accuracy and running time
    • …
    corecore