1,813 research outputs found
Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
Asynchronous distributed algorithms are a popular way to reduce
synchronization costs in large-scale optimization, and in particular for neural
network training. However, for nonsmooth and nonconvex objectives, few
convergence guarantees exist beyond cases where closed-form proximal operator
solutions are available. As most popular contemporary deep neural networks lead
to nonsmooth and nonconvex objectives, there is now a pressing need for such
convergence guarantees. In this paper, we analyze for the first time the
convergence of stochastic asynchronous optimization for this general class of
objectives. In particular, we focus on stochastic subgradient methods allowing
for block variable partitioning, where the shared-memory-based model is
asynchronously updated by concurrent processes. To this end, we first introduce
a probabilistic model which captures key features of real asynchronous
scheduling between concurrent processes; under this model, we establish
convergence with probability one to an invariant set for stochastic subgradient
methods with momentum.
From the practical perspective, one issue with the family of methods we
consider is that it is not efficiently supported by machine learning
frameworks, as they mostly focus on distributed data-parallel strategies. To
address this, we propose a new implementation strategy for shared-memory based
training of deep neural networks, whereby concurrent parameter servers are
utilized to train a partitioned but shared model in single- and multi-GPU
settings. Based on this implementation, we achieve on average 1.2x speed-up in
comparison to state-of-the-art training methods for popular image
classification tasks without compromising accuracy
Stochastic Block Mirror Descent Methods for Nonsmooth and Stochastic Optimization
In this paper, we present a new stochastic algorithm, namely the stochastic
block mirror descent (SBMD) method for solving large-scale nonsmooth and
stochastic optimization problems. The basic idea of this algorithm is to
incorporate the block-coordinate decomposition and an incremental block
averaging scheme into the classic (stochastic) mirror-descent method, in order
to significantly reduce the cost per iteration of the latter algorithm. We
establish the rate of convergence of the SBMD method along with its associated
large-deviation results for solving general nonsmooth and stochastic
optimization problems. We also introduce different variants of this method and
establish their rate of convergence for solving strongly convex, smooth, and
composite optimization problems, as well as certain nonconvex optimization
problems. To the best of our knowledge, all these developments related to the
SBMD methods are new in the stochastic optimization literature. Moreover, some
of our results also seem to be new for block coordinate descent methods for
deterministic optimization
- …