Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization
Consider the problem of minimizing the sum of a smooth (possibly non-convex)
and a convex (possibly nonsmooth) function involving a large number of
variables. A popular approach to solve this problem is the block coordinate
descent (BCD) method whereby at each iteration only one variable block is
updated while the remaining variables are held fixed. With recent advances in
multi-core parallel processing technology, it is desirable to parallelize the
BCD method by allowing multiple blocks to be updated simultaneously at each
iteration of the algorithm. In this work, we
propose an inexact parallel BCD approach where at each iteration, a subset of
the variables is updated in parallel by minimizing convex approximations of the
original objective function. We investigate the convergence of this parallel
BCD method for both randomized and cyclic variable selection rules. We analyze
the asymptotic and non-asymptotic convergence behavior of the algorithm for
both convex and non-convex objective functions. Numerical experiments suggest
that, for the special case of the Lasso minimization problem, the cyclic block
selection rule can outperform the randomized rule.
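
As a rough illustration of the parallel BCD idea described in this abstract, the Python sketch below applies it to a small Lasso problem, 0.5*||Ax - b||^2 + lam*||x||_1. It is not the authors' algorithm: the function names, the per-coordinate proximal-linear surrogate, and the step size 1/block_size are all assumptions chosen to keep the example short. Every coordinate in the selected block is updated from the same iterate, so the inner loop could in principle run in parallel.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.| (scalar soft-thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_objective(A, b, x, lam):
    """0.5 * ||A x - b||^2 + lam * ||x||_1, with A given as a list of rows."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def parallel_bcd_lasso(A, b, lam, block_size, iters, rule="cyclic", seed=0):
    """Hypothetical parallel BCD sketch; requires block_size <= number of columns."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    x = [0.0] * n
    # Curvature of the smooth part along each coordinate: ||A[:, j]||^2.
    curv = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    gamma = 1.0 / block_size  # assumed conservative step size
    start = 0
    for _ in range(iters):
        if rule == "cyclic":
            block = [(start + k) % n for k in range(block_size)]
            start = (start + block_size) % n
        else:  # randomized rule: sample a block without replacement
            block = rng.sample(range(n), block_size)
        residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        # Each coordinate minimizes its own convex surrogate at the same x.
        steps = {}
        for j in block:
            if curv[j] == 0.0:
                continue  # zero column: the smooth part does not depend on x[j]
            grad_j = sum(A[i][j] * residual[i] for i in range(m))
            target = soft_threshold(x[j] - grad_j / curv[j], lam / curv[j])
            steps[j] = target - x[j]
        # Apply all block updates together, damped by gamma.
        for j, d in steps.items():
            x[j] += gamma * d
    return x
```

Calling `parallel_bcd_lasso` on the same data with `rule="cyclic"` and `rule="randomized"`, then comparing `lasso_objective` for the two results, gives a small-scale way to experiment with the two selection rules. With the step size 1/block_size the update is an average of single-coordinate minimizations, so for this convex problem the objective does not increase from one iteration to the next.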
Structure-Aware Dynamic Scheduler for Parallel Machine Learning
Training large machine learning (ML) models with many variables or parameters
can take a long time if one employs sequential procedures even with stochastic
updates. A natural solution is to turn to distributed computing on a cluster;
however, naive, unstructured parallelization of ML algorithms does not usually
lead to a proportional speedup and can even result in divergence, because
dependencies between model elements can attenuate the computational gains from
parallelization and compromise correctness of inference. Recent efforts to
address this issue have benefited from exploiting the static, a priori block structures
residing in ML algorithms. In this paper, we take this path further by
exploring the dynamic block structures and workloads therein present during ML
program execution, which offers new opportunities for improving convergence,
correctness, and load balancing in distributed ML. We propose and showcase a
general-purpose scheduler, STRADS, for coordinating distributed updates in ML
algorithms, which harnesses the aforementioned opportunities in a systematic
way. We provide theoretical guarantees for our scheduler, and demonstrate its
efficacy versus static block structures on Lasso and Matrix Factorization.
Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization
We propose a decomposition framework for the parallel optimization of the sum
of a differentiable (possibly nonconvex) function and a nonsmooth (possibly
nonseparable), convex one. The latter term is usually employed to enforce
structure in the solution, typically sparsity. The main contribution of this
work is a novel parallel, hybrid random/deterministic decomposition
scheme wherein, at each iteration, a subset of (block) variables is updated at
the same time by minimizing local convex approximations of the original
nonconvex function. To tackle huge-scale problems, the (block) variables
to be updated are chosen according to a mixed random and deterministic
procedure, which captures the advantages of both pure deterministic and random
update-based schemes. Almost sure convergence of the proposed scheme is
established. Numerical results show that on huge-scale problems the proposed
hybrid random/deterministic algorithm outperforms both random and deterministic
schemes.
Comment: The order of the authors is alphabetical.
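
One way to picture a mixed random and deterministic selection step is sketched below in Python. This is an assumed illustration, not the procedure from the paper: the function name `hybrid_select`, the non-negative per-block `scores` (for example, a measure of how far each block is from its best response), and the threshold parameter `rho` are all hypothetical. A random draw first limits the candidates, and a deterministic greedy filter then keeps only the candidates whose score is within a factor `rho` of the best one in the draw.

```python
import random


def hybrid_select(scores, sample_size, rho, rng):
    """Pick block indices by a random draw followed by a greedy filter.

    scores      -- hypothetical non-negative per-block progress measures
    sample_size -- number of blocks drawn at random (<= len(scores))
    rho         -- assumed threshold in [0, 1]; 0 keeps the whole draw
    rng         -- a random.Random instance, passed in for reproducibility
    """
    # Random step: draw candidate blocks without replacement.
    candidates = rng.sample(range(len(scores)), sample_size)
    # Deterministic step: keep candidates close to the best score in the draw.
    best = max(scores[j] for j in candidates)
    return sorted(j for j in candidates if scores[j] >= rho * best)
```

Setting `sample_size` to the number of blocks reduces this sketch to a purely deterministic greedy rule, while `rho = 0` reduces it to purely random selection, which is the sense in which it mixes the two.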