Stochastic Training of Neural Networks via Successive Convex Approximations
This paper proposes a new family of algorithms for training neural networks
(NNs). These are based on recent developments in the field of non-convex
optimization, going under the general name of successive convex approximation
(SCA) techniques. The basic idea is to iteratively replace the original
(non-convex, high-dimensional) learning problem with a sequence of (strongly
convex) approximations, which are both accurate and simple to optimize.
Unlike related approaches (e.g., quasi-Newton algorithms), the
approximations can be constructed using only first-order information about the
neural network function, in a stochastic fashion, while exploiting the overall
structure of the learning problem for faster convergence. We discuss several
use cases, based on different choices for the loss function (e.g., squared loss
and cross-entropy loss), and for the regularization of the NN's weights. We
experiment on several medium-sized benchmark problems, and on a large-scale
dataset involving simulated physical data. The results show how the algorithm
outperforms state-of-the-art techniques, providing faster convergence to a
better minimum. Additionally, we show how the algorithm can be easily
parallelized over multiple computational units without hindering its
performance. In particular, each computational unit can optimize a tailored
surrogate function defined on a randomly assigned subset of the input
variables, whose dimension can be chosen based entirely on the available
computational power.
Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems
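The core update behind these SCA methods is easy to sketch: at each iteration, build a strongly convex surrogate of the learning problem around the current weights from (stochastic) first-order information, minimize it, and move part of the way toward the minimizer. The snippet below is a minimal illustrative sketch, not the paper's exact algorithm; the quadratic surrogate, its closed-form minimizer, and the parameter names (tau, lam, rho, gamma) are assumptions chosen so that an L2-regularized surrogate can be solved explicitly.

```python
import numpy as np

def sca_update(w, d, g, tau=1.0, lam=1e-4, rho=0.5, gamma=0.1):
    """One illustrative stochastic SCA step (hypothetical parameterization).

    w     : current weight vector
    d     : running average of the stochastic gradients seen so far
    g     : fresh stochastic (mini-batch) gradient of the loss at w
    tau   : proximal weight making the surrogate strongly convex
    lam   : L2 regularization coefficient
    rho   : averaging weight for the gradient estimate
    gamma : step size toward the surrogate minimizer
    """
    # Fold the new stochastic gradient into the running estimate.
    d = (1.0 - rho) * d + rho * g
    # Surrogate: d^T (w' - w) + (tau/2)||w' - w||^2 + (lam/2)||w'||^2,
    # which is strongly convex with the closed-form minimizer below.
    w_hat = (tau * w - d) / (tau + lam)
    # Convex combination of the current point and the surrogate minimizer.
    return w + gamma * (w_hat - w), d

# Toy usage with f(w) = 0.5 * ||w||^2, whose gradient is w (the added noise
# stands in for mini-batch sampling); in practice rho and gamma would decay.
w, d = np.ones(3), np.zeros(3)
for _ in range(100):
    g = w + 0.01 * np.random.randn(3)
    w, d = sca_update(w, d, g)
```

For a cross-entropy loss or a nonsmooth regularizer, the surrogate would typically keep the convex part of the objective instead of linearizing everything, at the cost of an inner solver rather than a closed form.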
Distributed Nonconvex Multiagent Optimization Over Time-Varying Networks
We study nonconvex distributed optimization in multiagent networks where the
communication between nodes is modeled as a time-varying sequence of arbitrary
digraphs. We introduce a novel broadcast-based distributed algorithmic
framework for the (constrained) minimization of the sum of a smooth (possibly
nonconvex and nonseparable) function, i.e., the agents' sum-utility, plus a
convex (possibly nonsmooth and nonseparable) regularizer. The latter is usually
employed to enforce some structure in the solution, typically sparsity. The
proposed method hinges on Successive Convex Approximation (SCA) techniques
coupled with i) a tracking mechanism instrumental to locally estimate the
gradients of agents' cost functions; and ii) a novel broadcast protocol to
disseminate information and distribute the computation among the agents.
Asymptotic convergence to stationary solutions is established. A key feature of
the proposed algorithm is that it requires neither double-stochasticity of
the consensus matrices (only column stochasticity) nor knowledge of the graph
sequence in order to be implemented. To the best of our knowledge, the proposed
framework is the first broadcast-based distributed algorithm for convex and
nonconvex constrained optimization over arbitrary, time-varying digraphs.
Numerical results show that our algorithm outperforms current schemes on both
convex and nonconvex problems.
Comment: Copyright 2001 SS&C. Published in the Proceedings of the 50th Annual Asilomar Conference on Signals, Systems, and Computers, Nov. 6-9, 2016, CA, US
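At a high level, each agent alternates a local strongly convex surrogate step, driven by a tracked estimate of the network-wide gradient, with a mixing step that only needs column-stochastic weights. The sketch below is a simplified, hypothetical single-agent update, not the paper's broadcast protocol: the names (x_i, y_i, recv_x, recv_y), the proximal surrogate, and the mixing interface are assumptions, and the convex regularizer is omitted for brevity.

```python
def agent_iteration(x_i, y_i, grad_new, grad_prev,
                    recv_x, recv_y, weights, tau=1.0, gamma=0.05):
    """One illustrative iteration mixing SCA with gradient tracking.

    x_i       : agent i's current estimate of the decision variable
    y_i       : agent i's tracked estimate of the average gradient
    grad_new  : gradient of the local cost at the current local point
    grad_prev : gradient of the local cost at the previous local point
    recv_x    : x-values received (broadcast) from in-neighbors
    recv_y    : y-values received (broadcast) from in-neighbors
    weights   : mixing weights for in-neighbors plus self, assumed to come
                from a column-stochastic (not doubly stochastic) matrix
    Values can be plain floats or NumPy arrays.
    """
    # Local surrogate: linearize the sum-utility through the tracked
    # gradient y_i and add a proximal term; its minimizer is explicit.
    x_hat = x_i - y_i / tau
    x_half = x_i + gamma * (x_hat - x_i)
    # Mixing over the received values (the last weight is the agent's own).
    x_next = sum(w * xj for w, xj in zip(weights, recv_x + [x_half]))
    # Gradient tracking: propagate the change in the local gradient so y
    # keeps estimating the average gradient over the whole network.
    y_next = sum(w * yj for w, yj in zip(weights, recv_y + [y_i])) \
             + grad_new - grad_prev
    return x_next, y_next
```

A nonsmooth regularizer would usually enter through a proximal operator in the surrogate step, and an actual scheme over arbitrary digraphs also has to correct for the lack of double stochasticity (e.g., with push-sum-style reweighting), which this sketch does not attempt.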