2,233 research outputs found
Shampoo: Preconditioned Stochastic Tensor Optimization
Preconditioned gradient methods are among the most general and powerful tools
in optimization. However, preconditioning requires storing and manipulating
prohibitively large matrices. We describe and analyze a new structure-aware
preconditioning algorithm, called Shampoo, for stochastic optimization over
tensor spaces. Shampoo maintains a set of preconditioning matrices, each of
which operates on a single dimension, contracting over the remaining
dimensions. We establish convergence guarantees in the stochastic convex
setting, the proof of which builds upon matrix trace inequalities. Our
experiments with state-of-the-art deep learning models show that Shampoo is
capable of converging considerably faster than commonly used optimizers.
Although it involves a more complex update rule, Shampoo's runtime per step is
comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam
A Dynamic Parametrization Scheme for Shape Optimization Using Quasi-Newton Methods
Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/97144/1/AIAA2012-962.pd
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Shampoo is an online and stochastic optimization algorithm belonging to the
AdaGrad family of methods for training neural networks. It constructs a
block-diagonal preconditioner where each block consists of a coarse Kronecker
product approximation to full-matrix AdaGrad for each parameter of the neural
network. In this work, we provide a complete description of the algorithm as
well as the performance optimizations that our implementation leverages to
train deep networks at-scale in PyTorch. Our implementation enables fast
multi-GPU distributed data-parallel training by distributing the memory and
computation associated with blocks of each parameter via PyTorch's DTensor data
structure and performing an AllGather primitive on the computed search
directions at each iteration. This major performance enhancement enables us to
achieve at most a 10% performance reduction in per-step wall-clock time
compared against standard diagonal-scaling-based adaptive gradient methods. We
validate our implementation by performing an ablation study on training
ImageNet ResNet50, demonstrating Shampoo's superiority over standard training
recipes with minimal hyperparameter tuning.Comment: 38 pages, 8 figures, 5 table
Optimization Methods for Inverse Problems
Optimization plays an important role in solving many inverse problems.
Indeed, the task of inversion often either involves or is fully cast as a
solution of an optimization problem. In this light, the mere non-linear,
non-convex, and large-scale nature of many of these inversions gives rise to
some very challenging optimization problems. The inverse problem community has
long been developing various techniques for solving such optimization tasks.
However, other, seemingly disjoint communities, such as that of machine
learning, have developed, almost in parallel, interesting alternative methods
which might have stayed under the radar of the inverse problem community. In
this survey, we aim to change that. In doing so, we first discuss current
state-of-the-art optimization methods widely used in inverse problems. We then
survey recent related advances in addressing similar challenges in problems
faced by the machine learning community, and discuss their potential advantages
for solving inverse problems. By highlighting the similarities among the
optimization challenges faced by the inverse problem and the machine learning
communities, we hope that this survey can serve as a bridge in bringing
together these two communities and encourage cross fertilization of ideas.Comment: 13 page
Classical Optimizers for Noisy Intermediate-Scale Quantum Devices
We present a collection of optimizers tuned for usage on Noisy Intermediate-Scale Quantum (NISQ) devices. Optimizers have a range of applications in quantum computing, including the Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization (QAOA) algorithms. They are also used for calibration tasks, hyperparameter tuning, in machine learning, etc. We analyze the efficiency and effectiveness of different optimizers in a VQE case study. VQE is a hybrid algorithm, with a classical minimizer step driving the next evaluation on the quantum processor. While most results to date concentrated on tuning the quantum VQE circuit, we show that, in the presence of quantum noise, the classical minimizer step needs to be carefully chosen to obtain correct results. We explore state-of-the-art gradient-free optimizers capable of handling noisy, black-box, cost functions and stress-test them using a quantum circuit simulation environment with noise injection capabilities on individual gates. Our results indicate that specifically tuned optimizers are crucial to obtaining valid science results on NISQ hardware, and will likely remain necessary even for future fault tolerant circuits
- …