GIANT: Globally Improved Approximate Newton Method for Distributed Optimization
For a distributed computing environment, we consider the empirical risk
minimization problem and propose a distributed and communication-efficient
Newton-type optimization method. At every iteration, each worker locally finds
an Approximate NewTon (ANT) direction, which is sent to the main driver. The
main driver, then, averages all the ANT directions received from workers to
form a {\it Globally Improved ANT} (GIANT) direction. GIANT is highly
communication efficient and naturally exploits the trade-offs between local
computations and global communications in that more local computations result
in fewer overall rounds of communications. Theoretically, we show that GIANT
enjoys an improved convergence rate as compared with first-order methods and
existing distributed Newton-type methods. Further, and in sharp contrast with
many existing distributed Newton-type methods, as well as popular first-order
methods, a highly advantageous practical feature of GIANT is that it only
involves one tuning parameter. We conduct large-scale experiments on a computer
cluster and, empirically, demonstrate the superior performance of GIANT.
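The averaging step described above can be sketched in a few lines. The following is a minimal single-process illustration of one GIANT iteration on ridge-regularized least squares; the partition sizes, regularization value, and all function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def giant_direction(partitions, w, lam=1e-3):
    """One GIANT iteration for ridge-regularized least squares (illustrative).

    Each (X_i, y_i) partition plays the role of a worker: it builds a local
    Hessian from its own data but applies it to the *global* gradient.
    """
    d = w.shape[0]
    # The driver aggregates the exact global gradient (first communication).
    n_total = sum(X.shape[0] for X, _ in partitions)
    g = sum(X.T @ (X @ w - y) for X, y in partitions) / n_total + lam * w
    # Each worker solves its local Newton system against the global gradient,
    # yielding an Approximate NewTon (ANT) direction.
    directions = [np.linalg.solve(X.T @ X / X.shape[0] + lam * np.eye(d), g)
                  for X, _ in partitions]
    # The driver averages the ANT directions into the GIANT direction.
    return np.mean(directions, axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
parts = [(X[:100], y[:100]), (X[100:], y[100:])]

w = np.zeros(5)
for _ in range(20):
    w = w - giant_direction(parts, w)
```

Because each local Hessian concentrates around the global one, the averaged direction approximates the exact Newton step while each round communicates only gradients and directions, never Hessians.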
Optimization Methods for Inverse Problems
Optimization plays an important role in solving many inverse problems.
Indeed, the task of inversion often either involves or is fully cast as a
solution of an optimization problem. In this light, the sheer non-linear,
non-convex, and large-scale nature of many of these inversions gives rise to
some very challenging optimization problems. The inverse problem community has
long been developing various techniques for solving such optimization tasks.
However, other, seemingly disjoint communities, such as that of machine
learning, have developed, almost in parallel, interesting alternative methods
which might have stayed under the radar of the inverse problem community. In
this survey, we aim to change that. In doing so, we first discuss current
state-of-the-art optimization methods widely used in inverse problems. We then
survey recent related advances in addressing similar challenges in problems
faced by the machine learning community, and discuss their potential advantages
for solving inverse problems. By highlighting the similarities among the
optimization challenges faced by the inverse problem and the machine learning
communities, we hope that this survey can serve as a bridge in bringing
together these two communities and encourage cross-fertilization of ideas.
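To make the casting of inversion as optimization concrete, here is a toy linear inverse problem solved as a regularized least-squares fit (Tikhonov regularization); the operator, noise level, and regularization weight are illustrative assumptions:

```python
import numpy as np

# Toy linear inverse problem: recover x from noisy data d = A @ x + noise,
# by minimizing ||A x - d||^2 + mu * ||x||^2 (Tikhonov regularization).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 20))   # forward operator (illustrative)
x_true = rng.standard_normal(20)
d = A @ x_true + 0.01 * rng.standard_normal(50)

mu = 1e-3
# The regularized objective is quadratic, so its minimizer solves the
# normal equations: (A^T A + mu I) x = A^T d.
x_hat = np.linalg.solve(A.T @ A + mu * np.eye(20), A.T @ d)
```

Most inversions surveyed here are far harder than this: the forward map is non-linear, so the objective is non-convex and the normal-equations shortcut no longer applies, which is precisely where the iterative methods from both communities enter.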
A stochastic nonmonotone trust-region training algorithm for image classification
In this work, we consider the problem of solving the nonlinear and nonconvex optimization problems arising in the training of deep neural networks. To this aim, we propose a nonmonotone trust-region (NTR) approach in a stochastic setting under inexact function and gradient approximations. We use limited-memory SR1 (L-SR1) updates as Hessian approximations, with the curvature information obtained via several different strategies. We provide results showing the performance of the proposed optimizer in the training of residual networks for image classification. Our results show that the proposed algorithm provides comparable or better testing accuracy than a standard stochastic trust-region method, depending on the adopted curvature-computation strategy, and outperforms the well-known Adam optimizer.
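The SR1 update at the heart of the L-SR1 approximation admits a short sketch. This is the classical full-memory rank-one update with its standard skip safeguard; the limited-memory variant used in the paper stores (s, y) pairs instead of a dense matrix, and the tolerance value here is an illustrative assumption:

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """One SR1 Hessian-approximation update (full-memory sketch).

    s = step taken, y = gradient difference along that step. The update is
    skipped when the standard safeguard fails, keeping the denominator
    well defined.
    """
    v = y - B @ s
    denom = v @ s
    if abs(denom) < r * np.linalg.norm(s) * np.linalg.norm(v):
        return B  # skip: the update would be numerically unstable
    return B + np.outer(v, v) / denom

# On a quadratic f(w) = 0.5 w^T H w we have y = H s exactly, so SR1
# updates along linearly independent steps recover H after d steps.
rng = np.random.default_rng(2)
H = np.diag([1.0, 2.0, 3.0])
B = np.eye(3)
for _ in range(3):
    s = rng.standard_normal(3)
    B = sr1_update(B, s, H @ s)
```

Unlike BFGS, SR1 makes no attempt to keep B positive definite; that ability to model indefinite curvature is exactly what makes it attractive inside a trust-region method for nonconvex training losses.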
Recommended from our members
On Learning and Optimization in Inverse Problems with Group Structured Latent Variables
Inverse problems are ubiquitous in science and engineering, manifesting whenever we seek to determine the underlying causes or parameters that give rise to observed data. These problems often involve latent variables, which, in many cases, follow a group structure. In this class of inverse problems, we aim to estimate an unknown function after being distorted by a group action and observed via a known operator, with the observations typically being contaminated with a non-trivial level of noise. Two particular such problems of interest in this thesis are multireference alignment (MRA) and single-particle reconstruction (SPR) in cryo-electron microscopy (cryo-EM). SPR is a widely used technique for estimating the 3-D volume of a single macromolecule (often referred to as volume or signal) given several of its noisy 2-D projections taken at unknown viewing angles. In Chapter 1 we discuss the problem setting and mathematically formulate both MRA and cryo-EM. The method of moments (MoM) is a powerful technique used to suppress the noise and provide a low-resolution ab initio initialization for the 3-D structure in cryo-EM. Maximum likelihood estimation (MLE)-based approaches like Expectation Maximization (EM) or Empirical Risk Minimization (ERM) are widely used for iterative refinement of the ab initio structure to obtain high-resolution reconstructions. This thesis broadly deals with developing deep neural networks for solving inverse problems with group structured latent variables via MoM, and accelerating MLE-based methods using variance reduction techniques and second-order information. In Chapter 2 we suggest using the method of moments approach for both problems while introducing deep neural network priors. In particular, given a set of datasets, each containing observations corresponding to a single signal and distribution, our neural networks should output the signals and the distribution of group elements, with moment pairs of each dataset being the input.
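The method-of-moments idea for MRA can be illustrated with its simplest shift-invariant statistic. In the sketch below, each observation is a random cyclic shift of one fixed signal plus Gaussian noise; since the power spectrum is invariant to cyclic shifts, averaging it over many observations suppresses the noise (a second-moment estimate). The signal length, noise level, and sample count are illustrative assumptions:

```python
import numpy as np

# MRA toy model: y_j = roll(x, s_j) + noise, with unknown shifts s_j.
rng = np.random.default_rng(3)
N, sigma, n_obs = 16, 0.5, 20000
x = rng.standard_normal(N)

obs = np.stack([np.roll(x, rng.integers(N)) + sigma * rng.standard_normal(N)
                for _ in range(n_obs)])

# |fft(roll(x, s))|^2 = |fft(x)|^2 for any shift s, so the sample-averaged
# power spectrum estimates |fft(x)|^2 up to a known noise bias:
# E|fft(y)|^2 = |fft(x)|^2 + N * sigma^2.
power_est = np.mean(np.abs(np.fft.fft(obs, axis=1)) ** 2, axis=0) - N * sigma**2
power_true = np.abs(np.fft.fft(x)) ** 2
```

The power spectrum alone discards phase information, which is why recovering the signal itself requires higher-order moments (or, as in this thesis, a learned inversion of the moment map).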
For MRA, we demonstrate the advantage of using the trained network to accelerate the convergence of the reconstruction of signals from moments coming from an unknown dataset. Finally, we use our method to reconstruct simulated and biological volumes in the cryo-EM setting. Chapter 3 is a direct extension of Chapter 2, in which we introduce MoM-net, a deep neural network for learning the moment inversion map for a more generalized cryo-EM setting where we assume the presence of small shifts in the projections. Our neural network is trained to output the spherical harmonic coefficients of the volumes along with the distribution of rotations and shift variance, with moments from a set of datasets being the input. We also demonstrate the acceleration of convergence for the reconstruction using the trained neural network in this general cryo-EM setting, and use our method to reconstruct biological volumes. In Chapter 4 we study the same problems but using a different framework, i.e., maximum likelihood. Maximization of the likelihood function is usually carried out using first-order ERM and EM methods, which suffer from slow convergence rates, while their stochastic versions have high variance in parameter updates. Stochastic variance-reduced gradient (SVRG) methods have been proposed in the literature to improve convergence rates and stability by reducing the variance of the stochastic updates. This chapter thus explores the application of SVRG and stochastic variance-reduced EM (sEM-vr) methods, along with their second-order accelerated variants, in solving MRA and SPR. A second-order acceleration of sEM-vr is also proposed. We conduct extensive experiments on simulated datasets illustrating the applicability of variance-reduced methods for both of these problems. We end with Chapter 5, where we provide final thoughts on the overarching theme of this thesis, and discuss the strengths and drawbacks of our methods, along with potential future research steps.
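The SVRG scheme referred to above can be sketched on a generic finite-sum problem. This is the textbook SVRG loop applied to a toy least-squares objective, not the thesis's MRA/SPR likelihood; the step size, epoch counts, and all names are illustrative assumptions:

```python
import numpy as np

def svrg(grad_i, w0, n, step=0.05, epochs=10, m=None, seed=0):
    """SVRG sketch for min_w (1/n) sum_i f_i(w).

    Each inner step uses grad_i(w, i) - grad_i(w_snap, i) + full_grad_snap,
    an unbiased gradient estimate whose variance vanishes as both the
    iterate and the snapshot approach the optimum.
    """
    rng = np.random.default_rng(seed)
    m = m or 2 * n  # inner-loop length between snapshots
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot (the expensive, occasional pass).
        full = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            w = w - step * (grad_i(w, i) - grad_i(w_snap, i) + full)
    return w

# Toy finite sum: f_i(w) = 0.5 * (a_i . w - y_i)^2.
rng = np.random.default_rng(4)
A = rng.standard_normal((100, 5))
b_true = rng.standard_normal(5)
y = A @ b_true

def grad_i(w, i):
    return (A[i] @ w - y[i]) * A[i]

w_hat = svrg(grad_i, np.zeros(5), n=100)
```

Plain SGD on the same problem needs a decaying step size to beat down the gradient noise; the control variate built from the snapshot is what lets SVRG keep a constant step and converge linearly, and the same structure is what the thesis exploits and further accelerates with second-order information.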