
2,233 research outputs found

Shampoo: Preconditioned Stochastic Tensor Optimization

Author: Gupta Vineet
Koren Tomer
Singer Yoram
Publication date: 01/01/2018

Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Although it involves a more complex update rule, Shampoo's runtime per step is comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
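As a rough illustration of the update this abstract describes, the sketch below covers only the matrix (order-2 tensor) case in NumPy. It keeps one preconditioner per dimension (`L` for rows, `R` for columns), each accumulated by contracting the gradient over the other dimension, and applies their inverse fourth roots as in the matrix-case update of the Shampoo paper. The helper names `inv_root` and `shampoo_step`, the toy quadratic objective, the learning rate, and the `eps` initialization are assumptions chosen for illustration and do not come from this listing.

```python
import numpy as np

def inv_root(mat, p):
    """Return mat^(-1/p) for a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    # Scale each eigenvector column by its eigenvalue raised to -1/p.
    return (vecs * vals ** (-1.0 / p)) @ vecs.T

def shampoo_step(W, G, L, R, lr=0.5):
    """One matrix-case Shampoo update (hypothetical helper, not a library API)."""
    L = L + G @ G.T   # row-dimension statistics: contracts over columns
    R = R + G.T @ G   # column-dimension statistics: contracts over rows
    W = W - lr * inv_root(L, 4) @ G @ inv_root(R, 4)
    return W, L, R

# Toy objective (an assumption): 0.5 * ||W - T||_F^2, whose gradient is W - T.
T = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
W = np.zeros_like(T)
eps = 1e-4
L, R = eps * np.eye(3), eps * np.eye(2)   # small multiples of the identity

initial_loss = 0.5 * np.sum((W - T) ** 2)
for _ in range(100):
    W, L, R = shampoo_step(W, W - T, L, R)
final_loss = 0.5 * np.sum((W - T) ** 2)
```

For a 3 x 2 parameter, the two preconditioners hold 9 and 4 entries, whereas a full preconditioner over all 6 coordinates would hold 36. That gap grows quickly with layer size, which is the storage problem the abstract refers to.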

arXiv.org e-Print Archive

Princeton University Open Access Repository

A Dynamic Parametrization Scheme for Shape Optimization Using Quasi-Newton Methods

Author: Hwang John
Martins Joaquim
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date: 01/01/2012

Peer Reviewed

http://deepblue.lib.umich.edu/bitstream/2027.42/97144/1/AIAA2012-962.pd

Deep Blue Documents at the University of Michigan

A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Author: Gallego-Posada Jose
Iwasaki Shintaro
Lee Tsung-Hsien
Li Zhijing
Mudigere Dheevatsa
Rabbat Michael
Rangadurai Kaushik
Shi Hao-Jun Michael
Publication date: 12/09/2023

Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.

Comment: 38 pages, 8 figures, 5 tables
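To make the Kronecker-product idea in this abstract concrete, the following NumPy sketch checks a standard identity: applying a Kronecker-structured matrix to a flattened gradient gives the same result as multiplying the gradient by two small factors. The matrices `A` and `B` are random stand-ins for the factors of one block, and the sizes `m` and `n` are arbitrary. None of these names come from the paper or from the PyTorch implementation it describes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
G = rng.standard_normal((m, n))   # gradient of one m x n parameter block
A = rng.standard_normal((m, m))   # stand-in for the left (row) factor
B = rng.standard_normal((n, n))   # stand-in for the right (column) factor

# Explicit route: build the (mn x mn) Kronecker product and apply it to the
# row-major flattening of G.
explicit = np.kron(A, B) @ G.reshape(-1)

# Factored route: only the small m x m and n x n factors are ever used.
factored = (A @ G @ B.T).reshape(-1)

full_entries = (m * n) ** 2        # entries in a dense mn x mn preconditioner
factored_entries = m * m + n * n   # entries in the two factors combined
```

Here `explicit` and `factored` agree up to floating-point error, while the factored form stores 25 numbers instead of 144. This kind of saving is what makes a per-parameter Kronecker approximation practical where a full-matrix preconditioner is not.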

arXiv.org e-Print Archive

Advances in Stochastic Medical Image Registration

Author: Sun W. (Wei)
Publication venue: urn:ISBN:978-94-6299-230-6
Publication date: 15/12/2015

EUR Research Repository

Erasmus University Digital Repository

Optimization Methods for Inverse Problems

Author: Cui Tiangang
Roosta-Khorasani Farbod
Ye Nan
Publication date: 30/11/2017

Optimization plays an important role in solving many inverse problems. Indeed, the task of inversion often either involves or is fully cast as a solution of an optimization problem. In this light, the mere non-linear, non-convex, and large-scale nature of many of these inversions gives rise to some very challenging optimization problems. The inverse problem community has long been developing various techniques for solving such optimization tasks. However, other, seemingly disjoint communities, such as that of machine learning, have developed, almost in parallel, interesting alternative methods which might have stayed under the radar of the inverse problem community. In this survey, we aim to change that. In doing so, we first discuss current state-of-the-art optimization methods widely used in inverse problems. We then survey recent related advances in addressing similar challenges in problems faced by the machine learning community, and discuss their potential advantages for solving inverse problems. By highlighting the similarities among the optimization challenges faced by the inverse problem and the machine learning communities, we hope that this survey can serve as a bridge in bringing together these two communities and encourage cross-fertilization of ideas.

Comment: 13 pages

arXiv.org e-Print Archive

University of Queensland eSpace

Classical Optimizers for Noisy Intermediate-Scale Quantum Devices

Author: De Jong W
Iancu C
Lavrijsen W
Muller J
Tudor A
Publication venue: eScholarship, University of California
Publication date: 01/10/2020

We present a collection of optimizers tuned for usage on Noisy Intermediate-Scale Quantum (NISQ) devices. Optimizers have a range of applications in quantum computing, including the Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization (QAOA) algorithms. They are also used for calibration tasks, hyperparameter tuning, in machine learning, etc. We analyze the efficiency and effectiveness of different optimizers in a VQE case study. VQE is a hybrid algorithm, with a classical minimizer step driving the next evaluation on the quantum processor. While most results to date concentrated on tuning the quantum VQE circuit, we show that, in the presence of quantum noise, the classical minimizer step needs to be carefully chosen to obtain correct results. We explore state-of-the-art gradient-free optimizers capable of handling noisy, black-box cost functions and stress-test them using a quantum circuit simulation environment with noise injection capabilities on individual gates. Our results indicate that specifically tuned optimizers are crucial to obtaining valid science results on NISQ hardware, and will likely remain necessary even for future fault-tolerant circuits.

Crossref

eScholarship - University of California