Affine-invariant contracting-point methods for Convex Optimization
In this paper, we develop new affine-invariant algorithms for solving
composite convex minimization problems with bounded domain. We present a
general framework of Contracting-Point methods, which solve at each iteration
an auxiliary subproblem restricting the smooth part of the objective function
onto contraction of the initial domain. This framework provides us with a
systematic way for developing optimization methods of different order, endowed
with the global complexity bounds. We show that using an appropriate
affine-invariant smoothness condition, it is possible to implement one
iteration of the Contracting-Point method by one step of the pure tensor method
of degree $p \geq 1$. The resulting global rate of convergence in functional
residual is then $\mathcal{O}(1 / k^p)$, where $k$ is the iteration counter. It is
important that all constants in our bounds are affine-invariant. For $p = 1$,
our scheme recovers the well-known Frank-Wolfe algorithm, providing it with a new
interpretation from the general perspective of tensor methods. Finally, within our
framework, we present an efficient implementation and total complexity analysis of
the inexact second-order scheme ($p = 2$), called the Contracting Newton method. It
can be seen as a proper implementation of the trust-region idea. Preliminary
numerical results confirm its good practical performance both in the number of
iterations and in computational time.
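The connection to Frank-Wolfe stated above can be made concrete with a small sketch: for $p = 1$ the contracting-point step amounts to minimizing a linear model of the smooth part over the (contracted) feasible set, which is a conditional-gradient step. The code below is a generic Frank-Wolfe loop on an $\ell_1$-ball-constrained least-squares toy problem; the objective, the radius, and the step rule $\gamma_k = 2/(k+2)$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius=1.0, iters=100):
    """Plain Frank-Wolfe over the l1-ball of a given radius.

    Illustrates the p = 1 case of the contracting-point scheme: each step
    minimizes a linear model of the smooth part over the feasible set and
    moves toward that minimizer.
    """
    x = x0.copy()
    for k in range(iters):
        g = grad(x)
        # Linear minimization oracle for the l1-ball: a signed vertex.
        i = np.argmax(np.abs(g))
        v = np.zeros_like(x)
        v[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (k + 2)          # standard open-loop step size
        x = (1 - gamma) * x + gamma * v
    return x

# Toy usage: minimize 0.5 * ||A x - b||^2 over the l1-ball.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
x_hat = frank_wolfe_l1(lambda x: A.T @ (A @ x - b), np.zeros(10))
```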
Convex optimization based on global lower second-order models
In this paper, we present new second-order algorithms for composite convex
optimization, called Contracting-domain Newton methods. These algorithms are
affine-invariant and based on global second-order lower approximation for the
smooth component of the objective. Our approach has an interpretation both as a
second-order generalization of the conditional gradient method and as a variant
of the trust-region scheme. Under the assumption that the problem domain is
bounded, we prove a global rate of convergence of the order $\mathcal{O}(1 / k^2)$ in
functional residual, where $k$ is the iteration counter, for minimizing convex
functions with Lipschitz continuous Hessian. This significantly improves the
previously known bound of $\mathcal{O}(1 / k)$ for this type of algorithms.
Additionally, we propose a stochastic extension of our method and present
computational results for solving the empirical risk minimization problem.
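To illustrate the kind of update described above, here is a schematic sketch (not the paper's exact method): at each outer iteration a quadratic model of the smooth part is minimized over a contracted copy of the feasible set, with the inner subproblem solved here by a few conditional-gradient steps. The Euclidean-ball domain, the inner solver, and the contraction schedule $\gamma_k = 3/(k+3)$ are assumptions made only for this example.

```python
import numpy as np

def contracted_newton_step(grad, hess, x, radius, gamma, inner_iters=50):
    """One schematic contracted-domain second-order step.

    Minimizes the quadratic model of f at x over the contracted set
    {(1 - gamma) * x + gamma * v : ||v|| <= radius}, using a few
    conditional-gradient steps as a simple inner solver (illustration only).
    """
    g, H = grad(x), hess(x)
    y = x.copy()
    for t in range(inner_iters):
        model_grad = g + H @ (y - x)
        # Linear minimization oracle for the Euclidean ball.
        v = -radius * model_grad / (np.linalg.norm(model_grad) + 1e-12)
        s = (1 - gamma) * x + gamma * v
        step = 2.0 / (t + 2)
        y = (1 - step) * y + step * s
    return y

def contracting_domain_newton(grad, hess, x0, radius=1.0, iters=30):
    x = x0.copy()
    for k in range(iters):
        gamma = 3.0 / (k + 3)          # illustrative contraction schedule
        x = contracted_newton_step(grad, hess, x, radius, gamma)
    return x
```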
First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians
In this work, we develop first-order (Hessian-free) and zeroth-order
(derivative-free) implementations of the Cubically regularized Newton method
for solving general non-convex optimization problems. For that, we employ
finite difference approximations of the derivatives. We use a special adaptive
search procedure in our algorithms, which simultaneously fits both the
regularization constant and the parameters of the finite difference
approximations. It makes our schemes free from the need to know the actual
Lipschitz constants. Additionally, we equip our algorithms with the lazy
Hessian update, which reuses a previously computed Hessian approximation matrix
for several iterations. Specifically, we prove a global complexity bound of
$\mathcal{O}(n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for
our new Hessian-free method, and a bound of $\mathcal{O}(n^{3/2} \epsilon^{-3/2})$
function evaluations for the derivative-free method, where $n$
is the dimension of the problem and $\epsilon$ is the desired accuracy for
the gradient norm. These complexity bounds significantly improve the previously
known ones in terms of the joint dependence on $n$ and $\epsilon$, for
first-order and zeroth-order non-convex optimization.
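A minimal sketch of the two ingredients named in the abstract, finite-difference Hessian approximation and lazy reuse, is given below. The step itself uses a simple gradient-regularized Newton update as a stand-in for the paper's adaptive cubic subproblem, and the reuse period, regularization constant, and difference step are illustrative assumptions.

```python
import numpy as np

def fd_hessian(grad, x, h=1e-5):
    """Finite-difference Hessian approximation from n extra gradient calls."""
    n = x.size
    g0 = grad(x)
    H = np.empty((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (grad(x + e) - g0) / h
    return 0.5 * (H + H.T)             # symmetrize

def lazy_regularized_newton(grad, x0, M=1.0, reuse=None, iters=100):
    """Hessian-free regularized Newton with lazy finite-difference Hessians.

    The Hessian approximation is recomputed only every `reuse` iterations
    (here reuse = n by default, mirroring the amortization idea); the step
    is a gradient-regularized Newton update, a simplified stand-in for the
    adaptive cubic step of the paper.
    """
    x = x0.copy()
    n = x.size
    reuse = n if reuse is None else reuse
    H = None
    for k in range(iters):
        if k % reuse == 0:
            H = fd_hessian(grad, x)    # lazy update of the approximation
        g = grad(x)
        lam = np.sqrt(M * np.linalg.norm(g))
        x = x - np.linalg.solve(H + lam * np.eye(n), g)
    return x
```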
Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods
We study stochastic Cubic Newton methods for solving general, possibly
non-convex minimization problems. We propose a new framework, which we call the
helper framework, that provides a unified view of the stochastic and
variance-reduced second-order algorithms equipped with global complexity
guarantees. It can also be applied to learning with auxiliary information. Our
helper framework offers the algorithm designer high flexibility for
constructing and analyzing the stochastic Cubic Newton methods, allowing
arbitrary size batches, and the use of noisy and possibly biased estimates of
the gradients and Hessians, incorporating both the variance reduction and the
lazy Hessian updates. We recover the best-known complexities for the stochastic
and variance-reduced Cubic Newton, under weak assumptions on the noise. A
direct consequence of our theory is the new lazy stochastic second-order
method, which significantly improves the arithmetic complexity for large
dimension problems. We also establish complexity bounds for the classes of
gradient-dominated objectives, which include convex and strongly convex
problems. For Auxiliary Learning, we show that using a helper (auxiliary
function) can outperform training alone if a given similarity measure is small.
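One way to picture the helper framework is through an SVRG-style correction, sketched below: minibatch derivatives of the objective are debiased by the exact derivatives of a helper function (for example, a snapshot of the objective or an auxiliary task). This is only one instance of the variance-reduction structure the abstract describes; the function names and interface are hypothetical.

```python
import numpy as np

def helper_estimates(grad_f_batch, hess_f_batch,
                     grad_h, hess_h,
                     grad_h_batch, hess_h_batch, x):
    """SVRG-style 'helper' estimates of the gradient and Hessian at x.

    Minibatch derivatives of the objective f are corrected by the exact
    derivatives of a helper function h:
        g = grad_f_B(x) - grad_h_B(x) + grad_h(x),
        H = hess_f_B(x) - hess_h_B(x) + hess_h(x),
    so the estimate is debiased whenever the batch terms are built on the
    same samples. Shown for illustration only.
    """
    g = grad_f_batch(x) - grad_h_batch(x) + grad_h(x)
    H = hess_f_batch(x) - hess_h_batch(x) + hess_h(x)
    return g, H
```

A stochastic Cubic Newton iteration would then build its cubic-regularized model from the returned pair $(g, H)$.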
Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders
Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing
neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being
popular choices for cycling through a fresh random permutation or a single fixed
permutation of the training data, respectively. However, the convergence properties of these algorithms in the
non-convex case are not fully understood. Existing results suggest that, in
realistic training scenarios where the number of epochs is smaller than the
training set size, RR may perform worse than SGD.
In this paper, we analyze a general SGD algorithm that allows for arbitrary
data orderings and show improved convergence rates for non-convex functions.
Specifically, our analysis reveals that SGD with random and single shuffling is
always faster than, or at least as good as, classical SGD with replacement,
regardless of the number of iterations. Overall, our study highlights the benefits of
using SGD with random/single shuffling and provides new insights into its
convergence properties for non-convex optimization.
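The three data orderings compared in the abstract are easy to state side by side; the sketch below shows Random Reshuffling, Single Shuffle, and with-replacement sampling in one loop. The learning rate, the per-sample gradient interface grad_i(x, i), and the epoch budget are illustrative assumptions.

```python
import numpy as np

def sgd_with_ordering(grad_i, x0, n, epochs, lr=0.01, ordering="rr", seed=0):
    """SGD with the three data orderings compared in the abstract.

    ordering = "rr":  a fresh random permutation each epoch (Random Reshuffling),
               "ss":  one random permutation reused every epoch (Single Shuffle),
               "iid": indices sampled with replacement (classical SGD).
    grad_i(x, i) is assumed to return the gradient of the i-th loss term.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    fixed_perm = rng.permutation(n)    # used only by "ss"
    for _ in range(epochs):
        if ordering == "rr":
            order = rng.permutation(n)
        elif ordering == "ss":
            order = fixed_perm
        else:
            order = rng.integers(0, n, size=n)
        for i in order:
            x -= lr * grad_i(x, i)
    return x
```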