Non-Asymptotic Convergence Analysis of Inexact Gradient Methods for Machine Learning Without Strong Convexity
Many recent applications in machine learning and data fitting call for the
algorithmic solution of structured smooth convex optimization problems.
Although the gradient descent method is a natural choice for this task, it
requires exact gradient computations and hence can be inefficient when the
problem size is large or the gradient is difficult to evaluate. Therefore,
there has been much interest in inexact gradient methods (IGMs), in which an
efficiently computable approximate gradient is used to perform the update in
each iteration. Currently, non-asymptotic linear convergence results for IGMs
are typically established under the assumption that the objective function is
strongly convex, which is not satisfied in many applications of interest; while
linear convergence results that do not require the strong convexity assumption
are usually asymptotic in nature. In this paper, we combine the best of these
two types of results and establish---under the standard assumption that the
gradient approximation errors decrease linearly to zero---the non-asymptotic
linear convergence of IGMs when applied to a class of structured convex
optimization problems. Such a class covers settings where the objective
function is not necessarily strongly convex and includes the least squares and
logistic regression problems. We believe that our techniques will find further
applications in the non-asymptotic convergence analysis of other first-order
methods.
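As a minimal illustration of the setting this abstract studies, the following sketch runs gradient descent on a least-squares problem whose matrix is deliberately rank-deficient (so the objective is convex but not strongly convex), while perturbing each gradient by a synthetic error whose norm decreases linearly to zero. All names and parameter values here are our own illustrative choices, not the paper's.

```python
import numpy as np

# Toy inexact gradient method (IGM) on f(x) = 0.5 * ||A x - b||^2.
# The gradient is corrupted by a synthetic error whose norm shrinks
# linearly (geometrically), matching the assumption in the abstract.

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
A[:, -1] = A[:, 0]                  # duplicate a column: no strong convexity
b = rng.standard_normal(40)

L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
x = np.zeros(10)
err0, rho = 1.0, 0.9                # error norm err0 * rho**k -> 0 linearly

for k in range(500):
    grad = A.T @ (A @ x - b)        # exact gradient
    noise = rng.standard_normal(10)
    noise *= (err0 * rho ** k) / np.linalg.norm(noise)  # controlled error
    x -= (1.0 / L) * (grad + noise) # update with the inexact gradient

final_grad_norm = np.linalg.norm(A.T @ (A @ x - b))
print(final_grad_norm)              # small despite the inexact updates
```

Even without strong convexity, the least-squares objective satisfies a Polyak-Lojasiewicz-type inequality, so the exact-gradient norm still contracts linearly once the injected errors decay.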
Recent Advances in Randomized Methods for Big Data Optimization
In this thesis, we discuss and develop randomized algorithms for big data problems. In particular, we study finite-sum optimization with newly emerged variance-reduction methods (Chapter 2), explore the efficiency of second-order information applied to both convex and non-convex finite-sum objectives (Chapter 3), and employ fast first-order methods in power system problems (Chapter 4).

In Chapter 2, we propose two variance-reduced gradient algorithms, mS2GD and SARAH. mS2GD incorporates a mini-batching scheme to improve the theoretical complexity and practical performance of SVRG/S2GD, aiming to minimize a strongly convex function represented as the sum of the average of a large number of smooth convex functions and a simple non-smooth convex regularizer. SARAH, short for StochAstic Recursive grAdient algoritHm, uses a stochastic recursive gradient and targets minimizing the average of a large number of smooth functions in both the convex and non-convex cases. Both methods fall into the category of variance-reduction optimization and attain a total complexity of O((n+κ)log(1/ε)) to reach an ε-accurate solution for strongly convex objectives, while SARAH also maintains sub-linear convergence for non-convex problems. SARAH further admits a practical variant, SARAH+, motivated by the linear convergence of the expected stochastic gradients in its inner loops.

In Chapter 3, we show that randomized batches can be combined with second-order information to improve convergence in both theory and practice, using an L-BFGS framework as a novel approach to finite-sum optimization problems. We provide theoretical analyses for both convex and non-convex objectives. We also propose LBFGS-F, a variant in which the Fisher information matrix is used in place of the Hessian, and prove that it is applicable in a distributed environment for the popular least-squares and cross-entropy losses.

In Chapter 4, we develop fast randomized algorithms for solving polynomial optimization problems arising from alternating-current optimal power flow (ACOPF) in power systems. Traditional research on power system problems has focused on second-order solvers, and no randomized algorithms had been developed. First, we propose a coordinate-descent algorithm as an online solver for time-varying optimization problems in power systems. We bound from above the difference between the approximate optimal cost generated by our algorithm and the optimal cost of a relaxation using the most recent data, by a function of the properties of the instance and the rate at which the instance changes over time. Second, we focus on a steady-state problem in power systems and study means of switching from solving a convex relaxation to a Newton method applied to a non-convex (augmented) Lagrangian of the problem.
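As a toy illustration of the SARAH recursive gradient estimator described in this abstract, the following sketch minimizes a small finite-sum least-squares objective. The step size, loop lengths, and problem data are our own illustrative choices, not the thesis's tuned values.

```python
import numpy as np

# SARAH sketch on f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2.
# Outer loop: full gradient at a snapshot; inner loop: recursive updates.

rng = np.random.default_rng(1)
n, d = 100, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(w, i):
    # gradient of the i-th component function
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    return A.T @ (A @ w - b) / n

eta = 0.04
w = np.zeros(d)
for epoch in range(30):                  # outer loop
    v = full_grad(w)                     # full gradient at the snapshot
    w_prev, w = w, w - eta * v
    for _ in range(n):                   # inner loop of length n
        i = rng.integers(n)
        # SARAH recursion: reuse the previous estimator and two fresh
        # component gradients at consecutive iterates
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - eta * v

print(np.linalg.norm(full_grad(w)))      # far below its initial value
```

Unlike SVRG, the estimator v is updated recursively from the previous inner iterate rather than recentered on the snapshot, which is what gives the expected stochastic gradients their linear convergence within the inner loop.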
Conjugate gradient acceleration of iteratively re-weighted least squares methods
Iteratively Re-weighted Least Squares (IRLS) is a method for solving
minimization problems involving non-quadratic cost functions, perhaps
non-convex and non-smooth, which however can be described as the infimum over a
family of quadratic functions. This transformation suggests an algorithmic
scheme that solves a sequence of quadratic problems to be tackled efficiently
by tools of numerical linear algebra. Its general scope and its usually simple
implementation, transforming the initial non-convex and non-smooth minimization
problem into a more familiar and easily solvable quadratic optimization
problem, make it a versatile algorithm. However, despite its simplicity,
versatility, and elegant analysis, the complexity of IRLS strongly depends on
the way the solution of the successive quadratic optimizations is addressed.
For the important special case of sparse
recovery problems in signal processing, we investigate theoretically and
numerically how accurately one needs to solve the quadratic problems by means
of the conjugate gradient (CG) method in each iteration in order to
guarantee convergence. The use of the CG method may significantly speed-up the
numerical solution of the quadratic subproblems, in particular, when fast
matrix-vector multiplication (exploiting for instance the FFT) is available for
the matrix involved. In addition, we study convergence rates. Our modified IRLS
method outperforms state-of-the-art first-order methods such as Iterative Hard
Thresholding (IHT) or the Fast Iterative Soft-Thresholding Algorithm (FISTA) in
many situations, especially in large dimensions. Moreover, IRLS is often able
to recover sparse vectors from fewer measurements than required for IHT and
FISTA.
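The IRLS scheme described in this abstract can be sketched as follows on a toy sparse recovery instance, with each weighted least-squares subproblem solved by a hand-written CG routine. The smoothing schedule, tolerances, and problem sizes are illustrative assumptions, not the paper's tuned choices.

```python
import numpy as np

# IRLS for sparse recovery subject to A x = b: each iteration solves a
# weighted least-squares problem via CG on the small m x m system
# (A D A^T) y = b with D = diag(|x| + eps), then sets x = D A^T y.

def cg_solve(matvec, rhs, tol=1e-10, maxiter=200):
    """Plain CG for a symmetric positive definite operator given as matvec."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(2)
m, n, s = 40, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

x = A.T @ cg_solve(lambda v: A @ (A.T @ v), b)   # least-norm starting point
eps = 1.0
for _ in range(50):
    D = np.abs(x) + eps                          # re-weighting
    y = cg_solve(lambda v: A @ (D * (A.T @ v)), b)
    x = D * (A.T @ y)
    eps = max(0.7 * eps, 1e-6)                   # smoothing driven toward zero

print(np.linalg.norm(x - x_true))                # near-exact sparse recovery
```

Note that the CG routine only needs matrix-vector products with A and its transpose, which is exactly where fast multiplication (e.g. via the FFT) would plug in for structured matrices.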
Simba: A Scalable Bilevel Preconditioned Gradient Method for Fast Evasion of Flat Areas and Saddle Points
The convergence behaviour of first-order methods can be severely slowed down
when applied to high-dimensional non-convex functions due to the presence of
saddle points. If, additionally, the saddles are surrounded by large plateaus,
it is highly likely that the first-order methods will converge to sub-optimal
solutions. In machine learning applications, sub-optimal solutions mean poor
generalization performance. They are also tied to the issue of
hyper-parameter tuning, since, in the pursuit of solutions that yield lower
errors, a tremendous amount of time is spent selecting the
hyper-parameters appropriately. A natural way to tackle the limitations of
first-order methods is to employ the Hessian information. However, methods that
incorporate the Hessian do not scale or, if they do, they are very slow for
modern applications. Here, we propose Simba, a scalable preconditioned gradient
method, to address the main limitations of the first-order methods. The method
is very simple to implement. It maintains a single preconditioning matrix that
is constructed as the outer product of the moving average of the gradients. To
significantly reduce the computational cost of forming and inverting the
preconditioner, we draw links with multilevel optimization methods. These
links enable us to construct preconditioners in a randomized manner. Our
numerical experiments verify the scalability of Simba as well as its efficacy
near saddles and flat areas. Further, we demonstrate that Simba offers a
satisfactory generalization performance on standard benchmark residual
networks. We also analyze Simba and show its linear convergence rate for
strongly convex functions.
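Based only on the abstract's description, one can sketch a preconditioned gradient step whose single preconditioner is a damped outer product of a moving average of gradients, applied cheaply via the Sherman-Morrison formula. This is a hypothetical reconstruction, not the authors' Simba method: the damping lam, decay beta, and step size are our assumptions, and the randomized multilevel construction is omitted.

```python
import numpy as np

# Toy preconditioned gradient descent on an SPD quadratic, with
# preconditioner P = lam * I + m m^T, where m is a moving average of
# gradients. P^{-1} v is applied in O(d) time via Sherman-Morrison,
# so P is never formed or inverted explicitly.

def precond_solve(m, lam, v):
    # (lam * I + m m^T)^{-1} v  =  (v - m (m^T v) / (lam + m^T m)) / lam
    return (v - m * (m @ v) / (lam + m @ m)) / lam

rng = np.random.default_rng(3)
d = 20
Q = rng.standard_normal((d, d))
H = Q @ Q.T / d + 0.1 * np.eye(d)        # SPD quadratic f(x) = 0.5 x^T H x

x = rng.standard_normal(d)
m = np.zeros(d)
beta, lam, step = 0.9, 1.0, 0.3
for _ in range(300):
    g = H @ x                            # gradient of the quadratic
    m = beta * m + (1 - beta) * g        # moving average of the gradients
    x -= step * precond_solve(m, lam, g) # preconditioned gradient update

print(np.linalg.norm(x))                 # distance to the minimizer x* = 0
```

The rank-one structure is what keeps the per-iteration cost linear in the dimension, in contrast to full Hessian-based preconditioners.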