25,736 research outputs found
Adaptive Momentum for Neural Network Optimization
In this thesis, we develop a novel and efficient algorithm for optimizing neural networks inspired by a recently proposed geodesic optimization algorithm. Our algorithm, which we call Stochastic Geodesic Optimization (SGeO), utilizes an adaptive coefficient on top of Polyaks Heavy Ball method effectively controlling the amount of weight put on the previous update to the parameters based on the change of direction in the optimization path. Experimental results on strongly convex functions with Lipschitz gradients and deep Autoencoder benchmarks show that SGeO reaches lower errors than established first-order methods and competes well with lower or similar errors to a recent second-order method called K-FAC (Kronecker-Factored Approximate Curvature). We also incorporate Nesterov style lookahead gradient into our algorithm (SGeO-N) and observe notable improvements. We believe that our research will open up new directions for high-dimensional neural network optimization where combining the efficiency of first-order methods and the effectiveness of second-order methods proves a promising avenue to explore
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to
cover general non-strongly-convex regularizers, including L1-regularized
problems like lasso, sparse logistic regression, and elastic net
regularization, and show how earlier work can be derived as a special case. We
provide convergence guarantees for the class of convex regularized loss
minimization objectives, leveraging a novel approach in handling
non-strongly-convex regularizers and non-smooth loss functions. The resulting
framework has markedly improved performance over state-of-the-art methods, as
we illustrate with an extensive set of experiments on real distributed
datasets
Playing with Duality: An Overview of Recent Primal-Dual Approaches for Solving Large-Scale Optimization Problems
Optimization methods are at the core of many problems in signal/image
processing, computer vision, and machine learning. For a long time, it has been
recognized that looking at the dual of an optimization problem may drastically
simplify its solution. Deriving efficient strategies which jointly brings into
play the primal and the dual problems is however a more recent idea which has
generated many important new contributions in the last years. These novel
developments are grounded on recent advances in convex analysis, discrete
optimization, parallel processing, and non-smooth optimization with emphasis on
sparsity issues. In this paper, we aim at presenting the principles of
primal-dual approaches, while giving an overview of numerical methods which
have been proposed in different contexts. We show the benefits which can be
drawn from primal-dual algorithms both for solving large-scale convex
optimization problems and discrete ones, and we provide various application
examples to illustrate their usefulness
On Quasi-Newton Forward--Backward Splitting: Proximal Calculus and Convergence
We introduce a framework for quasi-Newton forward--backward splitting
algorithms (proximal quasi-Newton methods) with a metric induced by diagonal
rank- symmetric positive definite matrices. This special type of
metric allows for a highly efficient evaluation of the proximal mapping. The
key to this efficiency is a general proximal calculus in the new metric. By
using duality, formulas are derived that relate the proximal mapping in a
rank- modified metric to the original metric. We also describe efficient
implementations of the proximity calculation for a large class of functions;
the implementations exploit the piece-wise linear nature of the dual problem.
Then, we apply these results to acceleration of composite convex minimization
problems, which leads to elegant quasi-Newton methods for which we prove
convergence. The algorithm is tested on several numerical examples and compared
to a comprehensive list of alternatives in the literature. Our quasi-Newton
splitting algorithm with the prescribed metric compares favorably against
state-of-the-art. The algorithm has extensive applications including signal
processing, sparse recovery, machine learning and classification to name a few.Comment: arXiv admin note: text overlap with arXiv:1206.115
- …