Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem
We study sampling as optimization in the space of measures. We focus on
gradient flow-based optimization with the Langevin dynamics as a case study. We
investigate the source of the bias of the unadjusted Langevin algorithm (ULA)
in discrete time, and consider how to remove or reduce the bias. We point out
the difficulty: the heat flow is exactly solvable, but neither its
forward nor backward method is implementable in general, except for Gaussian
data. We propose the symmetrized Langevin algorithm (SLA), which should have a
smaller bias than ULA, at the price of implementing a proximal gradient step in
space. We show SLA is in fact consistent for Gaussian target measure, whereas
ULA is not. We also illustrate various algorithms explicitly for Gaussian
target measure, including gradient descent, proximal gradient, and
Forward-Backward, and show they are all consistent.
Comment: To appear at the Conference on Learning Theory (COLT), July 201
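The bias discussed in this abstract can be seen concretely for a one-dimensional Gaussian target, where the ULA iteration becomes a linear (AR(1)) recursion whose stationary variance is available in closed form. The sketch below is illustrative only (it does not implement the paper's SLA); the step size and target variance are arbitrary choices:

```python
import numpy as np

# ULA update: x_{k+1} = x_k - h * grad_f(x_k) + sqrt(2h) * xi_k,
# with target N(0, sigma2), i.e. f(x) = x^2 / (2 sigma2), grad_f(x) = x / sigma2.

def ula_stationary_variance(sigma2, h):
    """Exact stationary variance of ULA for the target N(0, sigma2).

    The update is the AR(1) recursion x' = (1 - h/sigma2) x + sqrt(2h) xi,
    whose stationary variance is 2h / (1 - (1 - h/sigma2)**2).
    """
    a = 1.0 - h / sigma2
    return 2.0 * h / (1.0 - a * a)

def run_ula(sigma2, h, n_steps, rng):
    x = 0.0
    xs = np.empty(n_steps)
    for k in range(n_steps):
        x = x - h * (x / sigma2) + np.sqrt(2.0 * h) * rng.standard_normal()
        xs[k] = x
    return xs

rng = np.random.default_rng(0)
sigma2, h = 1.0, 0.1
samples = run_ula(sigma2, h, n_steps=200_000, rng=rng)
emp_var = samples[1000:].var()                 # discard burn-in
exact = ula_stationary_variance(sigma2, h)     # 0.2 / 0.19, strictly above sigma2
```

For h = 0.1 and sigma2 = 1 the stationary variance is 2h/(1 - (1 - h)^2) ≈ 1.053, strictly larger than the target variance of 1: exactly the discrete-time bias the abstract discusses removing.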
Computational Bounds For Photonic Design
Physical design problems, such as photonic inverse design, are typically
solved using local optimization methods. These methods often produce what
appear to be good or very good designs when compared to classical design
methods, but it is not known how far from optimal such designs really are. We
address this issue by developing methods for computing a bound on the true
optimal value of a physical design problem; physical designs with objective
smaller than our bound are impossible to achieve. Our bound is based on
Lagrange duality and exploits the special mathematical structure of these
physical design problems. For a multi-mode 2D Helmholtz resonator, numerical
examples show that the bounds we compute are often close to the objective
values obtained using local optimization methods, which reveals that the
designs are not only good, but in fact nearly optimal. Our computational
bounding method also produces, as a by-product, a reasonable starting point for
local optimization methods.
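The principle underlying such bounds is weak duality: for any dual-feasible multiplier, the Lagrange dual function lower-bounds the true optimal value. A toy one-dimensional example (not the paper's Helmholtz formulation, where the dual must be computed numerically): for minimize x^2 subject to x >= 1, the optimum is p* = 1, and every lambda >= 0 certifies a bound.

```python
# Lagrangian: L(x, lam) = x^2 - lam * (x - 1); minimizing over x at x = lam/2
# gives the dual function g(lam) = lam - lam**2 / 4, valid for lam >= 0.

def dual_value(lam):
    """Lower bound on the optimal value of: minimize x^2 subject to x >= 1."""
    return lam - lam ** 2 / 4.0

p_star = 1.0  # true optimum, achieved at x = 1
bounds = [dual_value(lam) for lam in (0.0, 0.5, 1.0, 2.0, 3.0)]
# Every entry of `bounds` is <= p_star; lam = 2 attains p_star exactly,
# so here the duality gap is zero (strong duality holds for this convex toy).
```

In the photonic setting the same logic applies, except that the dual problem is solved numerically and the gap to the locally optimized design measures how close to optimal the design is.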
Inducing Uniform Asymptotic Stability in Non-Autonomous Accelerated Optimization Dynamics via Hybrid Regularization
There have been many recent efforts to study accelerated optimization
algorithms from the perspective of dynamical systems. In this paper, we focus
on the robustness properties of the time-varying continuous-time version of
these dynamics. These properties are critical for the implementation of
accelerated algorithms in feedback-based control and optimization
architectures. We show that a family of dynamics related to the continuous-time
limit of Nesterov's accelerated gradient method can be rendered unstable under
arbitrarily small bounded disturbances. Indeed, while solutions of these
dynamics may converge to the set of optimizers, in general, this set may not be
uniformly asymptotically stable. To induce uniformity, and robustness as a
byproduct, we propose a framework where we regularize the dynamics by using
resetting mechanisms that are modeled by well-posed hybrid dynamical systems.
For these hybrid dynamics, we establish uniform asymptotic stability and
robustness properties, as well as convergence rates that are similar to those
of the non-hybrid dynamics. We finish by characterizing a family of
discretization mechanisms that retain the main stability and robustness
properties of the hybrid algorithms.
Comment: To appear at the 2019 IEEE Conference on Decision and Control
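A discrete-time flavor of the resetting idea can be sketched with adaptive-restart accelerated gradient descent (in the style of gradient-based restart schemes), where the momentum state is reset whenever it opposes the descent direction. This is only an analogue, not the paper's hybrid dynamical-system model; the test problem and step size are arbitrary:

```python
import numpy as np

def nesterov_restart(grad, x0, step, n_iter):
    """Accelerated gradient descent with a gradient-based momentum reset."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        g = grad(y)
        x_new = y - step * g
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        if g @ (x_new - x) > 0.0:    # momentum opposes progress: reset it
            t_new = 1.0
            y = x_new
        else:
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

A = np.diag([1.0, 50.0])             # ill-conditioned quadratic f(x) = 0.5 x^T A x
grad = lambda x: A @ x
x_star = nesterov_restart(grad, np.array([5.0, 5.0]), step=1.0 / 50.0, n_iter=500)
```

The reset plays the role of the hybrid jump map: it restores uniformity of convergence that plain non-autonomous momentum can lose.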
Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE
We develop a distributed algorithm for convex Empirical Risk Minimization,
the problem of minimizing a large but finite sum of convex functions over
networks. The proposed algorithm is derived from directly discretizing the
second-order heavy-ball differential equation and results in an accelerated
convergence rate, i.e., faster than distributed gradient descent-based methods
for strongly convex objectives that may not be smooth. Notably, we achieve
acceleration without resorting to the well-known Nesterov's momentum approach.
We provide numerical experiments and contrast the proposed method with recently
proposed optimal distributed optimization algorithms.
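The centralized version of a direct heavy-ball discretization can be sketched in a few lines; the distributed algorithm in the paper additionally spreads the computation across a network. Here the parameters follow the classical tuning for strongly convex quadratics (an assumption for illustration, not the paper's choice):

```python
import numpy as np

# Direct discretization of the heavy-ball ODE  x'' + a x' + grad_f(x) = 0:
#   x_{k+1} = x_k + beta * (x_k - x_{k-1}) - alpha * grad_f(x_k)

def heavy_ball(grad, x0, alpha, beta, n_iter):
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        x, x_prev = x + beta * (x - x_prev) - alpha * grad(x), x
    return x

A = np.diag([1.0, 10.0])             # f(x) = 0.5 x^T A x, strongly convex
grad = lambda x: A @ x
L, mu = 10.0, 1.0                    # smoothness and strong-convexity constants
alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
x_star = heavy_ball(grad, np.array([3.0, -3.0]), alpha, beta, n_iter=200)
```

With this tuning the iteration contracts at roughly the accelerated rate (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)) per step, which is the acceleration the abstract refers to, obtained without Nesterov's momentum sequence.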
Learning sparse optimal rule fit by safe screening
In this paper, we consider linear prediction models in the form of a sparse
linear combination of rules, where a rule is an indicator function defined over
a hyperrectangle in the input space. Since the number of all possible rules
generated from the training dataset becomes extremely large, it has been
difficult to consider all of them when fitting a sparse model. In this paper,
we propose Safe Optimal Rule Fit (SORF) as an approach to resolve this problem,
which is formulated as a convex optimization problem with sparse
regularization. The proposed SORF method utilizes the fact that the set of all
possible rules can be represented as a tree. By extending a recently
popularized convex optimization technique called safe screening, we develop a
novel method for pruning the tree such that pruned nodes are guaranteed to be
irrelevant to the prediction model. This approach allows us to efficiently
learn a prediction model constructed from an exponentially large number of all
possible rules. We demonstrate the usefulness of the proposed method by
numerical experiments using several benchmark datasets.
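The basic building block of safe screening can be shown for the Lasso, min_w 0.5 ||y - X w||^2 + lam ||w||_1: given a ball of radius r around a dual-feasible point theta that is known to contain the dual optimum, a feature whose worst-case correlation stays below 1 is guaranteed inactive. This is the generic "sphere" test that SORF-style methods extend to prune an entire tree of rules; the numbers below are illustrative:

```python
import numpy as np

def screen_feature(xj, theta, r):
    """True if feature column xj is guaranteed inactive (w_j = 0) at the optimum.

    Safe because |xj . theta*| <= |xj . theta| + r ||xj|| for every theta*
    in the ball, and |xj . theta*| < 1 implies w_j = 0 by the KKT conditions.
    """
    return bool(abs(xj @ theta) + r * np.linalg.norm(xj) < 1.0)

xj = np.array([0.6, 0.8])       # unit-norm feature column
theta = np.array([0.5, 0.0])    # a dual-feasible point, |xj . theta| = 0.3
tight = screen_feature(xj, theta, 0.5)   # 0.3 + 0.5 < 1  -> safely discarded
loose = screen_feature(xj, theta, 0.8)   # 0.3 + 0.8 >= 1 -> must be kept
```

The tree-pruning step in the paper replaces the single feature xj with a bound over all rules in a subtree, so that one test can eliminate exponentially many candidate rules at once.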
Stochastic Particle Gradient Descent for Infinite Ensembles
The superior performance of ensemble methods with infinite models is well
known. Most of these methods are based on optimization problems in
infinite-dimensional spaces with some regularization; for instance, boosting
methods and convex neural networks use L1-regularization with the
non-negative constraint. However, due to the difficulty of handling
L1-regularization, these problems require early stopping or a rough
approximation to solve them inexactly. In this paper, we propose a new ensemble
learning method that performs in a space of probability measures, that is, our
method can handle the L1-constraint and the non-negative constraint in a
rigorous way. Such an optimization is realized by proposing a general purpose
stochastic optimization method for learning probability measures via
parameterization using transport maps on base models. As a result of running
the method, a transport map to output an infinite ensemble is obtained, which
forms a residual-type network. From the perspective of functional gradient
methods, we give a convergence rate as fast as that of a stochastic
optimization method for finite dimensional nonconvex problems. Moreover, we
show an interior optimality property of a local optimality condition used in
our analysis.
Comment: 33 pages, 1 figure
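A stripped-down, finite-particle version of the idea: represent the ensemble by a set of particles (parameters of base models) and run stochastic gradient descent on the particles themselves. The transport-map parameterization of the paper is omitted; the model (a mean of tanh units), target function, and constants are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=200)
y = np.sin(X)                               # target function to fit

m = 50                                      # number of particles
W = 0.5 * rng.normal(size=(m, 2))           # particle j = (weight a_j, slope b_j)

def loss_and_grad(W, x, t):
    """Squared loss of the ensemble mean and its gradient w.r.t. all particles."""
    a, b = W[:, 0], W[:, 1]
    z = np.tanh(np.outer(x, b))             # (n, m) base-model outputs
    err = z @ a / m - t                     # (n,) residuals of the ensemble mean
    ga = (err[:, None] * z).mean(axis=0) / m
    gb = (err[:, None] * (a * (1.0 - z ** 2)) * x[:, None]).mean(axis=0) / m
    return 0.5 * np.mean(err ** 2), np.stack([ga, gb], axis=1)

loss0, _ = loss_and_grad(W, X, y)
for _ in range(500):
    idx = rng.integers(0, len(X), size=32)  # minibatch -> stochastic gradient
    _, g = loss_and_grad(W, X[idx], y[idx])
    W -= 1.0 * g                            # gradient step on the particles
loss1, _ = loss_and_grad(W, X, y)
```

Each particle is a point mass in parameter space, so moving the particles is a crude finite-dimensional stand-in for moving a probability measure, which is the view the paper makes rigorous.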
Principled Deep Neural Network Training through Linear Programming
Deep Learning has received significant attention due to its impressive
performance in many state-of-the-art learning tasks. Unfortunately, while very
powerful, Deep Learning is not well understood theoretically and in particular
only recently results for the complexity of training deep neural networks have
been obtained. In this work we show that large classes of deep neural networks
with various architectures (e.g., DNNs, CNNs, Binary Neural Networks, and
ResNets), activation functions (e.g., ReLUs and leaky ReLUs), and loss
functions (e.g., hinge loss, Euclidean loss, etc.) can be trained to near
optimality with desired target accuracy using linear programming in time that
is exponential in the input data and parameter space dimension and polynomial
in the size of the data set; improvements of the dependence on the input
dimension are known to be unlikely assuming P ≠ NP, and improving the
dependence on the parameter space dimension remains open. In particular, we
obtain polynomial time algorithms for training for a given fixed network
architecture. Our work applies more broadly to empirical risk minimization
problems which allows us to generalize various previous results and obtain new
complexity results for previously unstudied architectures in the proper
learning setting.
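The ERM-to-LP connection is easiest to see in the simplest special case: for a *linear* classifier with hinge loss, empirical risk minimization is literally a linear program (the paper's contribution is extending this to deep, nonlinear networks, which is far more involved). A sketch using SciPy's `linprog` on a hypothetical toy dataset:

```python
import numpy as np
from scipy.optimize import linprog

#   minimize    sum_i xi_i
#   subject to  xi_i >= 0,   xi_i >= 1 - y_i * (w . x_i + b)

X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])         # linearly separable toy data
n, d = X.shape

# Decision variables: [w (d entries), b (1 entry), xi (n entries)].
c = np.concatenate([np.zeros(d + 1), np.ones(n)])      # minimize 1^T xi
# Rewrite xi_i >= 1 - y_i (w.x_i + b) as -y_i (w.x_i + b) - xi_i <= -1.
A_ub = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])
b_ub = -np.ones(n)
bounds = [(None, None)] * (d + 1) + [(0, None)] * n    # only xi is sign-constrained
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
# For separable data the optimal total hinge loss is 0.
```

Once the network is nonlinear, the feasible region is no longer polyhedral; the paper shows it can still be covered by polynomially many (in the data-set size) linear pieces, at exponential cost in the dimension.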
A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces
In this paper, a convex optimization-based method is proposed for numerically
solving dynamic programs in continuous state and action spaces. The key idea is
to approximate the output of the Bellman operator at a particular state by the
optimal value of a convex program. The approximate Bellman operator has a
computational advantage because it involves a convex optimization problem in
the case of control-affine systems and convex costs. Using this feature, we
propose a simple dynamic programming algorithm to evaluate the approximate
value function at pre-specified grid points by solving convex optimization
problems in each iteration. We show that the proposed method approximates the
optimal value function with a uniform convergence property in the case of
convex optimal value functions. We also propose an interpolation-free design
method for a control policy, of which performance converges uniformly to the
optimum as the grid resolution becomes finer. When a nonlinear control-affine
system is considered, the convex optimization approach provides an approximate
policy with a provable suboptimality bound. For general cases, the proposed
convex formulation of dynamic programming operators can be modified as a
nonconvex bi-level program, in which the inner problem is a linear program,
without losing uniform convergence properties.
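A one-dimensional sketch of the idea: evaluate an approximate Bellman operator on a grid, where each grid-point update solves a small optimization over the control that is convex for control-affine dynamics and convex costs. The system, cost, discount factor, and the use of plain linear interpolation are all illustrative simplifications (the paper uses convex programs and an interpolation-free policy design):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Control-affine system x+ = 0.9 x + u, stage cost x^2 + u^2, discount 0.95.
grid = np.linspace(-5.0, 5.0, 101)
V = np.zeros_like(grid)
gamma = 0.95

for _ in range(80):                          # value iteration on the grid
    V_new = np.empty_like(V)
    for i, x in enumerate(grid):
        # Inner problem: stage cost plus interpolated next value, convex in u
        # (piecewise-linear interpolation of a convex V stays convex).
        q = lambda u: x * x + u * u + gamma * np.interp(0.9 * x + u, grid, V)
        V_new[i] = minimize_scalar(q, bounds=(-5.0, 5.0), method="bounded").fun
    V = V_new
```

Because the Bellman update is a contraction, the grid values converge; the resulting V is symmetric with its minimum V(0) = 0, as the problem's structure dictates.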
Continuous-Flow Graph Transportation Distances
Optimal transportation distances are valuable for comparing and analyzing
probability distributions, but larger-scale computational techniques for the
theoretically favorable quadratic case are limited to smooth domains or
regularized approximations. Motivated by fluid flow-based transportation on
R^n, however, this paper introduces an alternative definition of
optimal transportation between distributions over graph vertices. This new
distance still satisfies the triangle inequality but has better scaling and a
connection to continuous theories of transportation. It is constructed by
adapting a Riemannian structure over probability distributions to the graph
case, providing transportation distances as shortest-paths in probability
space. After defining and analyzing theoretical properties of our new distance,
we provide a time discretization as well as experiments verifying its
effectiveness.
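For contrast with the dynamical, flow-based distance of this paper, here is the classical *static* optimal-transport linear program between two distributions on graph vertices, with shortest-path ground costs. The three-vertex path graph is a hypothetical example:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse.csgraph import shortest_path

# Path graph 0 - 1 - 2 with unit edge lengths (zeros mean "no edge").
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
C = shortest_path(adj)                      # pairwise vertex distances

mu = np.array([1.0, 0.0, 0.0])              # all mass at vertex 0
nu = np.array([0.0, 0.0, 1.0])              # all mass at vertex 2

n = 3
c = C.reshape(-1)                           # cost of coupling pi, row-major
A_eq = []
for i in range(n):                          # row marginals: sum_j pi_ij = mu_i
    row = np.zeros(n * n); row[i * n:(i + 1) * n] = 1.0; A_eq.append(row)
for j in range(n):                          # column marginals: sum_i pi_ij = nu_j
    col = np.zeros(n * n); col[j::n] = 1.0; A_eq.append(col)
res = linprog(c, A_eq=np.array(A_eq), b_eq=np.concatenate([mu, nu]),
              bounds=[(0, None)] * (n * n))
# Moving unit mass from vertex 0 to vertex 2 costs the path length, 2.
```

This static LP scales poorly with graph size; the paper's Riemannian, shortest-path-in-probability-space construction is aimed at exactly that scaling limitation while keeping the triangle inequality.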
Lifting Vectorial Variational Problems: A Natural Formulation based on Geometric Measure Theory and Discrete Exterior Calculus
Numerous tasks in imaging and vision can be formulated as variational
problems over vector-valued maps. We approach the relaxation and
convexification of such vectorial variational problems via a lifting to the
space of currents. To that end, we recall that functionals with polyconvex
Lagrangians can be reparametrized as convex one-homogeneous functionals on the
graph of the function. This leads to an equivalent shape optimization problem
over oriented surfaces in the product space of domain and codomain. A convex
formulation is then obtained by relaxing the search space from oriented
surfaces to more general currents. We propose a discretization of the resulting
infinite-dimensional optimization problem using Whitney forms, which also
generalizes recent "sublabel-accurate" multilabeling approaches.
Comment: Oral presentation at CVPR 201