Consistent Dynamic Mode Decomposition
We propose a new method for computing Dynamic Mode Decomposition (DMD)
evolution matrices, which we use to analyze dynamical systems. Unlike the
majority of existing methods, our approach is based on a variational
formulation consisting of data alignment penalty terms and constitutive
orthogonality constraints. Our method does not make any assumptions on the
structure of the data or their size, and thus it is applicable to a wide range
of problems including non-linear scenarios or extremely small observation sets.
In addition, our technique is robust to noise that is independent of the
dynamics and it does not require input data to be sequential. Our key idea is
to introduce a regularization term for the forward and backward dynamics. The
obtained minimization problem is solved efficiently using the Alternating
Direction Method of Multipliers (ADMM), which requires two Sylvester equation solves per
iteration. Our numerical scheme converges empirically and is similar to a
provably convergent ADMM scheme. We compare our approach to various
state-of-the-art methods on several benchmark dynamical systems.
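For context, the baseline exact-DMD computation that consistency-regularized variants build on can be sketched as follows (a minimal sketch of standard exact DMD, not the authors' variational method; function and variable names are our own):

```python
import numpy as np

def exact_dmd(X, Y, rank):
    """Standard exact DMD: fit a linear operator A with Y ~ A X.

    X, Y: snapshot matrices with Y[:, k] the successor of X[:, k].
    Returns the eigenvalues (DMD spectrum) and modes of the fitted operator.
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Project the evolution operator onto the leading POD subspace.
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
    eigvals, W = np.linalg.eig(A_tilde)
    # Lift the reduced eigenvectors back to the full state space.
    modes = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W
    return eigvals, modes
```

Roughly, the consistency idea above would additionally fit a backward operator from the reversed snapshot pairs and penalize the deviation of the forward-backward composition from the identity; handling that coupling is what leads to the ADMM iterations with Sylvester solves.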
On the Iteration Complexity of Smoothed Proximal ALM for Nonconvex Optimization Problem with Convex Constraints
It is well-known that the lower bound of iteration complexity for solving
nonconvex unconstrained optimization problems is Ω(1/ε²), which
can be achieved by standard gradient descent algorithm when the objective
function is smooth. This lower bound still holds for nonconvex constrained
problems, while it is still unknown whether a first-order method can achieve
this lower bound. In this paper, we show that a simple single-loop first-order
algorithm called smoothed proximal augmented Lagrangian method (ALM) can
achieve this iteration complexity lower bound. The key technical contribution
is a strong local error bound for a general convex constrained problem, which
is of independent interest.
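The single-loop structure (one primal gradient step, one dual step, and one proximal-center update per iteration) can be sketched on a toy equality-constrained problem; the parameter names and the convex toy objective below are our own, chosen only to show the mechanics, not the paper's algorithm verbatim:

```python
import numpy as np

def smoothed_prox_alm(grad_f, a, b, x0, eta=0.05, rho=2.0, p=1.0,
                      alpha=0.5, beta=0.5, iters=5000):
    """Single-loop smoothed proximal ALM sketch for min f(x) s.t. a @ x = b.

    Each iteration takes one gradient step on the proximal augmented
    Lagrangian, one dual ascent step, and one update of the proximal
    center z (the "smoothing" sequence).
    """
    x = x0.astype(float)
    z = x.copy()
    y = 0.0
    for _ in range(iters):
        r = a @ x - b                       # constraint residual
        g = grad_f(x) + (y + rho * r) * a + p * (x - z)
        x = x - eta * g                     # primal gradient step
        y = y + alpha * rho * (a @ x - b)   # dual update
        z = z + beta * (x - z)              # proximal-center (smoothing) update
    return x, y

# Toy problem: min 0.5*||x||^2  s.t.  x1 + x2 = 1  (solution x = (0.5, 0.5))
x, y = smoothed_prox_alm(lambda x: x, np.array([1.0, 1.0]), 1.0,
                         np.array([2.0, -1.0]))
```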
Exterior-point Optimization for Nonconvex Learning
In this paper we present the nonconvex exterior-point optimization solver
(NExOS) -- a novel first-order algorithm tailored to constrained nonconvex
learning problems. We consider the problem of minimizing a convex function over
nonconvex constraints, where the projection onto the constraint set is
single-valued around local minima. A wide range of nonconvex learning problems
have this structure including (but not limited to) sparse and low-rank
optimization problems. By exploiting the underlying geometry of the constraint
set, NExOS finds a locally optimal point by solving a sequence of penalized
problems with strictly decreasing penalty parameters. NExOS solves each
penalized problem by applying a first-order algorithm, which converges linearly
to a local minimum of the corresponding penalized formulation under regularity
conditions. Furthermore, the local minima of the penalized problems converge to
a local minimum of the original problem as the penalty parameter goes to zero.
We implement NExOS in the open-source Julia package NExOS.jl, which has been
extensively tested on many instances from a wide variety of learning problems.
We demonstrate that our algorithm, in spite of being general purpose,
outperforms specialized methods on several examples of well-known nonconvex
learning problems involving sparse and low-rank optimization. For sparse
regression problems, NExOS finds locally optimal solutions that dominate
glmnet in terms of support recovery, while achieving training loss that is
smaller by an order of magnitude. For low-rank optimization with real-world
data, NExOS recovers solutions with a 3-fold reduction in training loss and a
proportion of explained variance that is 2 times better than the nuclear norm
heuristic.
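The penalty-continuation idea can be illustrated on sparse least squares (a hand-rolled sketch in the spirit of the abstract, with our own penalty schedule and parameter choices, not the NExOS.jl implementation):

```python
import numpy as np

def sparse_exterior_point(A, b, k, mus=(1.0, 0.1, 0.01, 0.001), inner=300):
    """Exterior-point sketch for min 0.5*||Ax - b||^2 s.t. ||x||_0 <= k.

    Solves a sequence of penalized problems
        f(x) + (1/(2*mu)) * dist^2(x, {||x||_0 <= k})
    with strictly decreasing mu; dist^2 is differentiable wherever the
    projection (keep the k largest-magnitude entries) is single-valued.
    """
    def project(x):                      # projection onto the sparsity set
        z = np.zeros_like(x)
        idx = np.argsort(np.abs(x))[-k:]
        z[idx] = x[idx]
        return z

    x = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f
    for mu in mus:
        eta = 1.0 / (L + 1.0 / mu)       # step size for the penalized problem
        for _ in range(inner):
            g = A.T @ (A @ x - b) + (x - project(x)) / mu
            x = x - eta * g
        # shrinking mu drives x toward the (nonconvex) constraint set
    return project(x)                    # land exactly on the constraint set
```

On noiseless data with a truly sparse signal, this continuation typically recovers the correct support, which is the behavior the glmnet comparison above measures.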
Convex Optimization and Extensions, with a View Toward Large-Scale Problems
Machine learning is a major source of optimization problems of current interest. These problems tend to be challenging because of their enormous scale, which makes it difficult to apply traditional optimization algorithms. We explore three avenues to designing algorithms suited to handling these challenges, with a view toward large-scale ML tasks. The first is to develop better general methods for unconstrained minimization. The second is to tailor methods to the features of modern systems, namely the availability of distributed computing. The third is to use specialized algorithms to exploit specific problem structure.
Chapters 2 and 3 focus on improving quasi-Newton methods, a mainstay of unconstrained optimization. In Chapter 2, we analyze an extension of quasi-Newton methods wherein we use block updates, which add curvature information to the Hessian approximation on a higher-dimensional subspace. This defines a family of methods, Block BFGS, that form a spectrum between the classical BFGS method and Newton's method, in terms of the amount of curvature information used. We show that by adding a correction step, the Block BFGS method inherits the convergence guarantees of BFGS for deterministic problems, most notably a Q-superlinear convergence rate for strongly convex problems. To explore the tradeoff between reduced iterations and greater work per iteration of block methods, we present a set of numerical experiments.
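For reference, the classical BFGS recursion that the block methods of Chapter 2 generalize updates an inverse-Hessian approximation from a single secant pair per iteration. Below is a minimal sketch with a fixed unit step and no line search (names are ours; this is the classical rank-two update, not the block variant or its correction step):

```python
import numpy as np

def bfgs(grad, x0, iters=50):
    """Classical BFGS with unit steps on a well-scaled problem."""
    n = x0.size
    x, H = x0.astype(float), np.eye(n)   # H approximates the inverse Hessian
    g = grad(x)
    for _ in range(iters):
        p = -H @ g                       # quasi-Newton search direction
        x_new = x + p                    # unit step; fine for mild curvature
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g      # secant pair
        sy = s @ y
        if sy > 1e-12:                   # curvature condition
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```

A block update would instead incorporate several directions at once, enriching H with curvature on a higher-dimensional subspace at a higher per-iteration cost.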
In Chapter 3, we focus on the problem of step size determination. To obviate the need for line searches, and for pre-computing fixed step sizes, we derive an analytic step size, which we call curvature-adaptive, for self-concordant functions. This adaptive step size allows us to generalize the damped Newton method of Nesterov to other iterative methods, including gradient descent and quasi-Newton methods. We provide simple proofs of convergence, including superlinear convergence for adaptive BFGS, allowing us to obtain superlinear convergence without line searches.
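Nesterov's damped Newton step, the starting point that Chapter 3 generalizes, scales the Newton direction by 1/(1 + λ(x)), where λ(x) is the Newton decrement; this is the analytic, line-search-free step the chapter refers to. A minimal sketch on a toy self-concordant objective (the objective and names are our own illustration):

```python
import numpy as np

def damped_newton(grad, hess, x0, iters=50):
    """Damped Newton for self-concordant f: step t = 1 / (1 + lambda(x)),
    with lambda(x) = sqrt(g^T H^{-1} g) the Newton decrement."""
    x = x0.astype(float)
    for _ in range(iters):
        g = grad(x)
        d = np.linalg.solve(hess(x), g)  # Newton direction H^{-1} g
        lam = np.sqrt(g @ d)             # Newton decrement
        x = x - d / (1.0 + lam)          # curvature-adaptive step
    return x

# Toy self-concordant objective f(x) = sum(x_i - log x_i); minimizer x_i = 1.
grad = lambda x: 1.0 - 1.0 / x
hess = lambda x: np.diag(1.0 / x ** 2)
```

The damping keeps iterates inside the domain (here x > 0) without any line search, and the step approaches a full Newton step as λ(x) → 0.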
In Chapter 4, we move from general algorithms to hardware-influenced algorithms. We consider a form of distributed stochastic gradient descent that we call Leader SGD (LSGD), which is inspired by the Elastic Averaging SGD (EASGD) method. These methods are intended for distributed settings where communication between machines may be expensive, which makes the design of their consensus mechanism important. We show that LSGD avoids an issue with spurious stationary points that affects EASGD, and provide a convergence analysis of LSGD. In the stochastic strongly convex setting, LSGD converges at the rate O(1/k) with diminishing step sizes, matching other distributed methods. We also analyze the impact of varying communication delays and of stochasticity in the selection of the leader points, and investigate under what conditions LSGD may produce better search directions than the gradient alone.
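A schematic of the leader-pull consensus mechanism might look as follows (toy objective, parameters, and names are our own; the actual method runs across machines with minibatch gradients, whereas this single-process sketch only shows the update rule):

```python
import numpy as np

def leader_sgd(workers, grad, loss, lam=0.5, steps=500, noise=0.1, seed=0):
    """Schematic Leader SGD: each worker takes a noisy gradient step plus
    a pull toward the current leader (the lowest-loss worker).
    EASGD would instead pull every worker toward a running average."""
    rng = np.random.default_rng(seed)
    x = np.array(workers, dtype=float)
    for k in range(steps):
        eta = 2.0 / (k + 10)                          # diminishing step sizes
        leader = x[np.argmin([loss(w) for w in x])]   # current best worker
        for i in range(len(x)):
            g = grad(x[i]) + noise * rng.standard_normal()
            x[i] -= eta * (g + lam * (x[i] - leader)) # gradient + leader pull
    return x

# Toy objective f(x) = 0.5*(x - 3)^2, shared by all workers.
x = leader_sgd([0.0, 5.0, -2.0],
               grad=lambda x: x - 3.0,
               loss=lambda x: 0.5 * (x - 3.0) ** 2)
```

Pulling toward the best current point, rather than the average, is what avoids the spurious stationary points that can arise when the consensus target is itself a poor point.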
In Chapter 5, we switch again to focus on algorithms that exploit problem structure. Specifically, we consider problems where variables satisfy multiaffine constraints, which motivates us to apply the Alternating Direction Method of Multipliers (ADMM). Problems that can be formulated with such a structure include representation learning (e.g., with dictionaries) and deep learning. We show that ADMM can be applied directly to multiaffine problems. By extending the theory of nonconvex ADMM, we prove that ADMM is convergent on multiaffine problems satisfying certain assumptions and, more broadly, analyze the theoretical properties of ADMM for general problems, investigating the effect of different types of structure.
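A minimal instance of a multiaffine constraint is matrix factorization, Z = XY: the constraint is affine in each block separately but not jointly. The sketch below applies ADMM to a toy formulation of this kind (our own illustration, not the thesis's general framework; each block update is a closed-form least-squares solve):

```python
import numpy as np

def multiaffine_admm(M, r, rho=2.0, iters=1000, seed=0):
    """ADMM sketch for min 0.5*||Z - M||^2  s.t.  Z = X @ Y.

    The constraint Z - X @ Y = 0 is multiaffine: fixing any two of
    X, Y, Z leaves an affine constraint in the remaining block."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.standard_normal((m, r))
    Y = rng.standard_normal((r, n))
    Z = M.copy()
    Lam = np.zeros_like(M)                 # dual variable for Z - X @ Y = 0
    for _ in range(iters):
        W = Z + Lam / rho                  # target the X and Y blocks must fit
        X = np.linalg.lstsq(Y.T, W.T, rcond=None)[0].T   # min ||W - X @ Y||
        Y = np.linalg.lstsq(X, W, rcond=None)[0]         # min ||W - X @ Y||
        Z = (M + rho * (X @ Y) - Lam) / (1.0 + rho)      # proximal Z update
        Lam = Lam + rho * (Z - X @ Y)      # dual ascent on the constraint
    return X, Y, Z
```

Because each block subproblem is an affine least-squares solve, the alternating structure of ADMM fits multiaffine constraints naturally; the convergence guarantees discussed above concern exactly when such iterations are safe.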