23 research outputs found
Robust Block Coordinate Descent
In this paper we present a novel randomized block coordinate descent method
for the minimization of a convex composite objective function. The method uses
(approximate) partial second-order (curvature) information, so that the
algorithm's performance is more robust when applied to highly nonseparable or
ill-conditioned problems. We call the method Robust Coordinate Descent (RCD). At
each iteration of RCD, a block of coordinates is sampled randomly, a quadratic
model is formed about that block and the model is minimized
approximately/inexactly to determine the search direction. An inexpensive line
search is then employed to ensure a monotonic decrease in the objective
function and acceptance of large step sizes. We prove global convergence of the
RCD algorithm, and we also present several results on the local convergence of
RCD for strongly convex functions. Finally, we present numerical results on
large-scale problems to demonstrate the practical performance of the method.
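For illustration only, the sketch below shows what one RCD-style iteration might
look like for a smooth convex objective f: a random block of coordinates is
sampled, a quadratic model over that block (with curvature matrix H_SS) is
minimized inexactly by a few conjugate-gradient steps, and a cheap backtracking
line search enforces a monotone decrease. The function names, block size, CG
iteration count and line-search constants are assumptions, not the authors'
implementation, and the nonsmooth part of the composite objective is omitted
for brevity.

import numpy as np

def rcd(f, grad, block_hess, x0, block_size=10, iters=100,
        cg_iters=5, c1=1e-4, seed=0):
    # Sketch of a randomized block coordinate descent loop with partial
    # curvature information and a backtracking line search (illustrative).
    rng = np.random.default_rng(seed)
    x = x0.copy()
    n = x.size
    for _ in range(iters):
        # 1. Sample a block of coordinates uniformly at random.
        S = rng.choice(n, size=min(block_size, n), replace=False)

        # 2. Build the block quadratic model m(d) = g_S^T d + 0.5 d^T H_SS d
        #    and minimize it inexactly with a few CG iterations.
        g_S = grad(x)[S]
        H_SS = block_hess(x, S)              # |S| x |S| curvature block
        d = np.zeros_like(g_S, dtype=float)
        r = -g_S.astype(float)               # residual of H_SS d = -g_S at d = 0
        p = r.copy()
        for _ in range(cg_iters):
            Hp = H_SS @ p
            step = (r @ r) / (p @ Hp + 1e-12)
            d += step * p
            r_new = r - step * Hp
            beta = (r_new @ r_new) / (r @ r + 1e-12)
            p = r_new + beta * p
            r = r_new

        # 3. Inexpensive backtracking line search on the block direction,
        #    ensuring a monotone decrease in the objective.
        t, fx, slope = 1.0, f(x), g_S @ d
        while True:
            x_new = x.copy()
            x_new[S] += t * d
            if f(x_new) <= fx + c1 * t * slope or t < 1e-8:
                break
            t *= 0.5
        x = x_new
    return x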
Gradient Descent and the Power Method: Exploiting their connection to find the leftmost eigen-pair and escape saddle points
This work shows that applying Gradient Descent (GD) with a fixed step size to
minimize a (possibly nonconvex) quadratic function is equivalent to running the
Power Method (PM) on the gradients. The connection between GD with a fixed step
size and the PM, both with and without fixed momentum, is thus established.
Consequently, valuable eigen-information is available via GD.
Recent examples show that GD with a fixed step size, applied to locally
quadratic nonconvex functions, can take exponential time to escape saddle
points (Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Aarti Singh, and
Barnabas Poczos: "Gradient descent can take exponential time to escape saddle
points"; S. Paternain, A. Mokhtari, and A. Ribeiro: "A newton-based method for
nonconvex optimization with fast evasion of saddle points"). Here, those
examples are revisited and it is shown that eigenvalue information was missing,
so that the examples may not provide a complete picture of the potential
practical behaviour of GD. Thus, ongoing investigation of the behaviour of GD
on nonconvex functions, possibly with an adaptive or variable
step size, is warranted.
It is shown that, in the special case of a quadratic in two variables, if an
eigenvalue is known, then GD with a fixed step size will converge in two
iterations, and a complete eigen-decomposition is available.
By considering the dynamics of the gradients and iterates, new step size
strategies are proposed to improve the practical performance of GD. Several
numerical examples are presented, which demonstrate the advantages of
exploiting the GD–PM connection.
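The stated equivalence is easy to check numerically: for a quadratic
f(x) = 0.5 x^T A x - b^T x, the gradients produced by fixed-step GD satisfy
g_{k+1} = (I - alpha*A) g_k, so they are power-method iterates for I - alpha*A
and their Rayleigh quotient recovers an extreme eigenvalue of A. The sketch
below illustrates this; the matrix, step size and iteration count are arbitrary
choices for the demonstration, not taken from the paper.

import numpy as np

rng = np.random.default_rng(1)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigvals = np.linspace(-2.0, 10.0, n)            # indefinite spectrum: saddle directions
A = Q @ np.diag(eigvals) @ Q.T
b = rng.standard_normal(n)

alpha = 1.0 / np.max(np.abs(eigvals))           # fixed step size
x = rng.standard_normal(n)
for _ in range(500):
    g = A @ x - b                               # gradient of the quadratic
    x = x - alpha * g                           # fixed-step gradient descent

# The gradient direction converges to the dominant eigenvector of I - alpha*A,
# which here corresponds to the leftmost eigenvalue of A, so the Rayleigh
# quotient of the gradient recovers that eigenvalue.
g = A @ x - b
print("Rayleigh quotient of the gradient:", (g @ A @ g) / (g @ g))
print("Leftmost eigenvalue of A         :", eigvals.min())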
Inexact Coordinate Descent: Complexity and Preconditioning
In this paper we consider the problem of minimizing a convex function using a randomized block coordinate descent method. One of the key steps at each iteration of the algorithm is determining the update to a block of variables. Existing algorithms assume that in order to compute the update, a particular subproblem is solved exactly. In this work we relax this requirement, and allow for the subproblem to be solved inexactly, leading to an inexact block coordinate descent method. Our approach incorporates the best known results for exact updates as a special case. Moreover, these theoretical guarantees are complemented by practical considerations: the use of iterative techniques to determine the update as well as the use of preconditioning for further acceleration.
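A minimal sketch of what such an inexact, preconditioned block update could
look like is given below: the block subproblem H_SS d = -g_S is solved by
conjugate gradients with a Jacobi (diagonal) preconditioner, stopped once the
residual falls below a relative tolerance rather than solved exactly. The
preconditioner choice, the tolerance and all names are illustrative
assumptions, not the paper's prescription.

import numpy as np

def inexact_block_update(H_SS, g_S, tol=1e-2, max_iter=50):
    # Approximately minimize d -> g_S^T d + 0.5 d^T H_SS d by solving
    # H_SS d = -g_S with Jacobi-preconditioned CG, stopping once the
    # residual norm drops below tol * ||g_S|| (the inexactness criterion).
    diag = np.clip(np.diag(H_SS), 1e-8, None)   # Jacobi preconditioner M = diag(H_SS)
    r = -np.asarray(g_S, dtype=float)           # residual of H_SS d = -g_S at d = 0
    d = np.zeros_like(r)
    z = r / diag
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(g_S):
            break                               # inexact solve is "good enough"
        Hp = H_SS @ p
        step = rz / (p @ Hp)
        d += step * p
        r -= step * Hp
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return d

Looser tolerances make each block update cheaper at the price of a less
accurate direction; quantifying how much inexactness can be tolerated is the
kind of trade-off the paper's complexity analysis is concerned with.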
Domain specific transfer learning using image mixing and stochastic image selection
Can a gradual transition from the source to the target dataset improve knowledge transfer when fine-tuning a convolutional neural network to a new domain? Can we use training examples from general image datasets to improve classification on fine-grained datasets? We present two image similarity metrics and two methods for progressively transitioning from the source dataset to the target dataset when fine-tuning to a new domain. Preliminary results, using the Flowers 102 dataset, show that the first proposed method, stochastic domain subset training, gives an improvement in classification accuracy compared to standard fine-tuning, for one of the two similarity metrics. However, the second method, continuous domain subset training, results in a reduction in classification performance.
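One plausible reading of the "gradual transition" idea is sketched below: each
training example is drawn from the target dataset with a probability that grows
over the course of fine-tuning, and from the general source dataset otherwise.
The linear schedule and all names are hypothetical; the paper's stochastic
domain subset training additionally relies on the proposed image-similarity
metrics, which are not modelled here.

import numpy as np

def sample_batch(source_idx, target_idx, epoch, total_epochs,
                 batch_size=32, rng=np.random.default_rng(0)):
    # Return a batch of dataset indices and a mask marking which of them
    # come from the target (fine-grained) dataset.
    # Probability of drawing from the target grows from 0 to 1 during training
    # (hypothetical linear schedule).
    p_target = min(1.0, epoch / max(1, total_epochs - 1))
    from_target = rng.random(batch_size) < p_target
    idx = np.where(from_target,
                   rng.choice(target_idx, size=batch_size),
                   rng.choice(source_idx, size=batch_size))
    return idx, from_target

# Example: early epochs are dominated by source images, late epochs by target images.
src, tgt = np.arange(10_000), np.arange(2_000)
for epoch in (0, 5, 9):
    _, mask = sample_batch(src, tgt, epoch, total_epochs=10)
    print(epoch, mask.mean())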