GIANT: Globally Improved Approximate Newton Method for Distributed Optimization
For a distributed computing environment, we consider the empirical risk
minimization problem and propose a distributed and communication-efficient
Newton-type optimization method. At every iteration, each worker locally finds
an Approximate NewTon (ANT) direction, which is sent to the main driver. The
main driver, then, averages all the ANT directions received from workers to
form a {\it Globally Improved ANT} (GIANT) direction. GIANT is highly
communication efficient and naturally exploits the trade-offs between local
computations and global communications in that more local computations result
in fewer overall rounds of communications. Theoretically, we show that GIANT
enjoys an improved convergence rate as compared with first-order methods and
existing distributed Newton-type methods. Further, and in sharp contrast with
many existing distributed Newton-type methods, as well as popular first-order
methods, a highly advantageous practical feature of GIANT is that it only
involves one tuning parameter. We conduct large-scale experiments on a computer
cluster and, empirically, demonstrate the superior performance of GIANT.
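The averaging step described above can be sketched in a few lines. The following is a minimal single-process illustration of one GIANT iteration on ridge-regularized least squares; the partition sizes, regularization value, and all function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def giant_direction(partitions, w, lam=1e-3):
    """One GIANT iteration for ridge-regularized least squares (illustrative).

    Each (X_i, y_i) partition plays the role of a worker: it builds a local
    Hessian from its own data but applies it to the *global* gradient.
    """
    d = w.shape[0]
    # The driver aggregates the exact global gradient (first communication).
    n_total = sum(X.shape[0] for X, _ in partitions)
    g = sum(X.T @ (X @ w - y) for X, y in partitions) / n_total + lam * w
    # Each worker solves its local Newton system against the global gradient,
    # yielding an Approximate NewTon (ANT) direction.
    directions = [np.linalg.solve(X.T @ X / X.shape[0] + lam * np.eye(d), g)
                  for X, _ in partitions]
    # The driver averages the ANT directions into the GIANT direction.
    return np.mean(directions, axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
parts = [(X[:100], y[:100]), (X[100:], y[100:])]

w = np.zeros(5)
for _ in range(20):
    w = w - giant_direction(parts, w)
```

Because each local Hessian concentrates around the global one, the averaged direction approximates the exact Newton step while each round communicates only gradients and directions, never Hessians.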
Optimization Methods for Inverse Problems
Optimization plays an important role in solving many inverse problems.
Indeed, the task of inversion often either involves or is fully cast as a
solution of an optimization problem. In this light, the sheer non-linear,
non-convex, and large-scale nature of many of these inversions gives rise to
some very challenging optimization problems. The inverse problem community has
long been developing various techniques for solving such optimization tasks.
However, other, seemingly disjoint communities, such as that of machine
learning, have developed, almost in parallel, interesting alternative methods
which might have stayed under the radar of the inverse problem community. In
this survey, we aim to change that. In doing so, we first discuss current
state-of-the-art optimization methods widely used in inverse problems. We then
survey recent related advances in addressing similar challenges in problems
faced by the machine learning community, and discuss their potential advantages
for solving inverse problems. By highlighting the similarities among the
optimization challenges faced by the inverse problem and the machine learning
communities, we hope that this survey can serve as a bridge in bringing
together these two communities and encourage cross-fertilization of ideas.
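To make the casting of inversion as optimization concrete, here is a toy linear inverse problem solved as a regularized least-squares fit (Tikhonov regularization); the operator, noise level, and regularization weight are illustrative assumptions:

```python
import numpy as np

# Toy linear inverse problem: recover x from noisy data d = A @ x + noise,
# by minimizing ||A x - d||^2 + mu * ||x||^2 (Tikhonov regularization).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 20))   # forward operator (illustrative)
x_true = rng.standard_normal(20)
d = A @ x_true + 0.01 * rng.standard_normal(50)

mu = 1e-3
# The regularized objective is quadratic, so its minimizer solves the
# normal equations: (A^T A + mu I) x = A^T d.
x_hat = np.linalg.solve(A.T @ A + mu * np.eye(20), A.T @ d)
```

Most inversions surveyed here are far harder than this: the forward map is non-linear, so the objective is non-convex and the normal-equations shortcut no longer applies, which is precisely where the iterative methods from both communities enter.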
A stochastic nonmonotone trust-region training algorithm for image classification
In this work, we consider the problem of solving the nonlinear and nonconvex optimization problems arising in the training of deep neural networks. To this aim, we propose a nonmonotone trust-region (NTR) approach in a stochastic setting under inexact function and gradient approximations. We use limited-memory SR1 (L-SR1) updates as Hessian approximations, with the curvature information obtained via several different strategies. We provide results showing the performance of the proposed optimizer in the training of residual networks for image classification. Our results show that the proposed algorithm provides comparable or better testing accuracy than a standard stochastic trust-region method, depending on the adopted curvature-computation strategy, and outperforms the well-known Adam optimizer.
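The SR1 update at the heart of the L-SR1 approximation admits a short sketch. This is the classical full-memory rank-one update with its standard skip safeguard; the limited-memory variant used in the paper stores (s, y) pairs instead of a dense matrix, and the tolerance value here is an illustrative assumption:

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """One SR1 Hessian-approximation update (full-memory sketch).

    s = step taken, y = gradient difference along that step. The update is
    skipped when the standard safeguard fails, keeping the denominator
    well defined.
    """
    v = y - B @ s
    denom = v @ s
    if abs(denom) < r * np.linalg.norm(s) * np.linalg.norm(v):
        return B  # skip: the update would be numerically unstable
    return B + np.outer(v, v) / denom

# On a quadratic f(w) = 0.5 w^T H w we have y = H s exactly, so SR1
# updates along linearly independent steps recover H after d steps.
rng = np.random.default_rng(2)
H = np.diag([1.0, 2.0, 3.0])
B = np.eye(3)
for _ in range(3):
    s = rng.standard_normal(3)
    B = sr1_update(B, s, H @ s)
```

Unlike BFGS, SR1 makes no attempt to keep B positive definite; that ability to model indefinite curvature is exactly what makes it attractive inside a trust-region method for nonconvex training losses.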
Recommended from our members
On Learning and Optimization in Inverse Problems with Group Structured Latent Variables
Inverse problems are ubiquitous in science and engineering, manifesting whenever we seek to determine the underlying causes or parameters that give rise to observed data. These problems often involve latent variables, which, in many cases, follow a group structure. In this class of inverse problems, we aim to estimate an unknown function after being distorted by a group action and observed via a known operator, with the observations typically being contaminated with a non-trivial level of noise. Two particular such problems of interest in this thesis are multireference alignment (MRA) and single-particle reconstruction (SPR) in cryo-electron microscopy (cryo-EM). SPR is a widely used technique for estimating the 3-D volume of a single macromolecule (often referred to as volume or signal) given several of its noisy 2-D projections taken at unknown viewing angles. In Chapter 1 we discuss the problem setting and mathematically formulate both MRA and cryo-EM. The method of moments (MoM) is a powerful technique used to suppress the noise and provide a low-resolution ab initio initialization for the 3-D structure in cryo-EM. Maximum likelihood estimation (MLE)-based approaches like Expectation Maximization (EM) or Empirical Risk Minimization (ERM) are widely used for iterative refinement of the ab initio structure to obtain high-resolution reconstructions. This thesis broadly deals with developing deep neural networks for solving inverse problems with group structured latent variables via MoM, and accelerating MLE-based methods using variance reduction techniques and second-order information. In Chapter 2 we suggest using the method of moments approach for both problems while introducing deep neural network priors. In particular, given a set of datasets, each containing observations corresponding to a single signal and distribution, our neural networks should output the signals and the distribution of group elements, with moment pairs of each dataset being the input.
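The method-of-moments idea for MRA can be illustrated with its simplest shift-invariant statistic. In the sketch below, each observation is a random cyclic shift of one fixed signal plus Gaussian noise; since the power spectrum is invariant to cyclic shifts, averaging it over many observations suppresses the noise (a second-moment estimate). The signal length, noise level, and sample count are illustrative assumptions:

```python
import numpy as np

# MRA toy model: y_j = roll(x, s_j) + noise, with unknown shifts s_j.
rng = np.random.default_rng(3)
N, sigma, n_obs = 16, 0.5, 20000
x = rng.standard_normal(N)

obs = np.stack([np.roll(x, rng.integers(N)) + sigma * rng.standard_normal(N)
                for _ in range(n_obs)])

# |fft(roll(x, s))|^2 = |fft(x)|^2 for any shift s, so the sample-averaged
# power spectrum estimates |fft(x)|^2 up to a known noise bias:
# E|fft(y)|^2 = |fft(x)|^2 + N * sigma^2.
power_est = np.mean(np.abs(np.fft.fft(obs, axis=1)) ** 2, axis=0) - N * sigma**2
power_true = np.abs(np.fft.fft(x)) ** 2
```

The power spectrum alone discards phase information, which is why recovering the signal itself requires higher-order moments (or, as in this thesis, a learned inversion of the moment map).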
For MRA, we demonstrate the advantage of using the trained network to accelerate the convergence of the reconstruction of signals from moments coming from an unknown dataset. Finally, we use our method to reconstruct simulated and biological volumes in the cryo-EM setting. Chapter 3 is a direct extension of Chapter 2, in which we introduce MoM-net, a deep neural network for learning the moment inversion map for a more generalized cryo-EM setting where we assume the presence of small shifts in the projections. Our neural network is trained to output the spherical harmonic coefficients of the volumes along with the distribution of rotations and shift variance, with moments from a set of datasets being the input. We also demonstrate the acceleration of convergence for the reconstruction using the trained neural network in this general cryo-EM setting, and use our method to reconstruct biological volumes. In Chapter 4 we study the same problems but using a different framework, i.e., maximum likelihood. Maximization of the likelihood function is usually carried out using first-order ERM and EM methods, which suffer from slow convergence rates, while their stochastic versions have high variance in parameter updates. Stochastic variance-reduced gradient (SVRG) methods have been proposed in the literature to improve convergence rates and stability by reducing the variance of the stochastic updates. This chapter thus explores the application of SVRG and stochastic variance-reduced EM (sEM-vr) methods, along with their second-order accelerated variants, in solving MRA and SPR. A second-order acceleration of sEM-vr is also proposed. We conduct extensive experiments on simulated datasets illustrating the applicability of variance-reduced methods for both of these problems. We end with Chapter 5, where we provide final thoughts on the overarching theme of this thesis, and discuss the strengths and drawbacks of our methods, along with potential future research steps.
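The SVRG scheme referred to above can be sketched on a generic finite-sum problem. This is the textbook SVRG loop applied to a toy least-squares objective, not the thesis's MRA/SPR likelihood; the step size, epoch counts, and all names are illustrative assumptions:

```python
import numpy as np

def svrg(grad_i, w0, n, step=0.05, epochs=10, m=None, seed=0):
    """SVRG sketch for min_w (1/n) sum_i f_i(w).

    Each inner step uses grad_i(w, i) - grad_i(w_snap, i) + full_grad_snap,
    an unbiased gradient estimate whose variance vanishes as both the
    iterate and the snapshot approach the optimum.
    """
    rng = np.random.default_rng(seed)
    m = m or 2 * n  # inner-loop length between snapshots
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot (the expensive, occasional pass).
        full = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            w = w - step * (grad_i(w, i) - grad_i(w_snap, i) + full)
    return w

# Toy finite sum: f_i(w) = 0.5 * (a_i . w - y_i)^2.
rng = np.random.default_rng(4)
A = rng.standard_normal((100, 5))
b_true = rng.standard_normal(5)
y = A @ b_true

def grad_i(w, i):
    return (A[i] @ w - y[i]) * A[i]

w_hat = svrg(grad_i, np.zeros(5), n=100)
```

Plain SGD on the same problem needs a decaying step size to beat down the gradient noise; the control variate built from the snapshot is what lets SVRG keep a constant step and converge linearly, and the same structure is what the thesis exploits and further accelerates with second-order information.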