11,346 research outputs found

    Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem

    Full text link
    We study sampling as optimization in the space of measures. We focus on gradient flow-based optimization with the Langevin dynamics as a case study. We investigate the source of the bias of the unadjusted Langevin algorithm (ULA) in discrete time, and consider how to remove or reduce the bias. We point out that the difficulty is that the heat flow is exactly solvable, but neither its forward nor its backward method is implementable in general, except for Gaussian data. We propose the symmetrized Langevin algorithm (SLA), which should have a smaller bias than ULA, at the price of implementing a proximal gradient step in space. We show SLA is in fact consistent for a Gaussian target measure, whereas ULA is not. We also illustrate various algorithms explicitly for a Gaussian target measure, including gradient descent, proximal gradient, and Forward-Backward, and show they are all consistent. Comment: To appear at the Conference on Learning Theory (COLT), July 2018
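    For concreteness, here is a minimal sketch of the ULA iteration discussed above, applied to a one-dimensional Gaussian target where the discrete-time bias can be checked in closed form; the step size and target variance are illustrative choices, not values from the paper.

        import numpy as np

        # Minimal ULA sketch for a 1D Gaussian target pi(x) ~ exp(-f(x)),
        # with f(x) = x^2 / (2 * sigma2), so grad f(x) = x / sigma2.
        rng = np.random.default_rng(0)
        sigma2 = 2.0            # target variance (illustrative)
        h = 0.1                 # step size (illustrative)
        n_steps, n_chains = 2000, 100000

        x = np.zeros(n_chains)
        for _ in range(n_steps):
            x = x - h * (x / sigma2) + np.sqrt(2.0 * h) * rng.standard_normal(n_chains)

        # For this target, ULA's stationary variance is sigma2 / (1 - h / (2 * sigma2)),
        # strictly larger than sigma2: the discrete-time bias the abstract refers to.
        print(x.var(), sigma2 / (1.0 - h / (2.0 * sigma2)), sigma2)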

    Computational Bounds For Photonic Design

    Full text link
    Physical design problems, such as photonic inverse design, are typically solved using local optimization methods. These methods often produce what appear to be good or very good designs when compared to classical design methods, but it is not known how far from optimal such designs really are. We address this issue by developing methods for computing a bound on the true optimal value of a physical design problem; physical designs with objective smaller than our bound are impossible to achieve. Our bound is based on Lagrange duality and exploits the special mathematical structure of these physical design problems. For a multi-mode 2D Helmholtz resonator, numerical examples show that the bounds we compute are often close to the objective values obtained using local optimization methods, which reveals that the designs are not only good, but in fact nearly optimal. Our computational bounding method also produces, as a by-product, a reasonable starting point for local optimization methods.
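    The underlying certificate is just weak duality: any dual-feasible point yields a value no design can beat. The toy sketch below illustrates this on an equality-constrained quadratic; the problem data are invented for illustration and have nothing to do with Helmholtz physics.

        import numpy as np

        # Weak duality demo: minimize 0.5 * ||x||^2 subject to A x = b.
        # Every dual point nu certifies a lower bound g(nu) <= p*.
        rng = np.random.default_rng(1)
        A = rng.standard_normal((3, 8))
        b = rng.standard_normal(3)

        # Primal optimum (least-norm solution), p* = 0.5 * ||x*||^2.
        x_star = A.T @ np.linalg.solve(A @ A.T, b)
        p_star = 0.5 * x_star @ x_star

        def dual(nu):
            # g(nu) = inf_x L(x, nu) = -0.5 * ||A^T nu||^2 - b^T nu
            return -0.5 * np.sum((A.T @ nu) ** 2) - b @ nu

        for _ in range(5):
            nu = rng.standard_normal(3)
            assert dual(nu) <= p_star + 1e-9    # any dual point lower-bounds p*

        nu_star = -np.linalg.solve(A @ A.T, b)  # dual optimum
        print(p_star, dual(nu_star))            # equal: strong duality holds here

    The paper's contribution is a dual that stays tractable and tight for the nonconvex physics-constrained problem; the mechanics of the certificate are as above.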

    Inducing Uniform Asymptotic Stability in Non-Autonomous Accelerated Optimization Dynamics via Hybrid Regularization

    Full text link
    There have been many recent efforts to study accelerated optimization algorithms from the perspective of dynamical systems. In this paper, we focus on the robustness properties of the time-varying continuous-time version of these dynamics. These properties are critical for the implementation of accelerated algorithms in feedback-based control and optimization architectures. We show that a family of dynamics related to the continuous-time limit of Nesterov's accelerated gradient method can be rendered unstable under arbitrarily small bounded disturbances. Indeed, while solutions of these dynamics may converge to the set of optimizers, in general, this set may not be uniformly asymptotically stable. To induce uniformity, and robustness as a byproduct, we propose a framework where we regularize the dynamics by using resetting mechanisms that are modeled by well-posed hybrid dynamical systems. For these hybrid dynamics, we establish uniform asymptotic stability and robustness properties, as well as convergence rates that are similar to those of the non-hybrid dynamics. We finish by characterizing a family of discretization mechanisms that retain the main stability and robustness properties of the hybrid algorithms. Comment: To appear at the 2019 IEEE Conference on Decision and Control
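    To make the resetting idea concrete, the sketch below simulates a forward-Euler discretization of the Nesterov ODE x'' + (3/t) x' + grad f(x) = 0 with a simple function-value restart (reset velocity and clock when f stops decreasing). This generic restart is only a stand-in; the paper's mechanism is a structured, well-posed hybrid system.

        import numpy as np

        # Euler simulation of x'' + (3/t) x' + grad f(x) = 0 with restarts.
        Q = np.diag([1.0, 100.0])          # ill-conditioned quadratic (illustrative)
        f = lambda x: 0.5 * x @ (Q @ x)
        grad = lambda x: Q @ x

        dt, t0 = 1e-3, 1.0
        x = np.array([1.0, 1.0])
        v = np.zeros(2)
        t = t0
        f_prev = f(x)
        for _ in range(100000):
            v += dt * (-(3.0 / t) * v - grad(x))   # velocity update
            x += dt * v                            # position update
            t += dt
            if f(x) > f_prev:                      # reset: zero velocity, rewind clock
                v[:] = 0.0
                t = t0
            f_prev = f(x)
        print(f(x))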

    Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

    Full text link
    We develop a distributed algorithm for convex Empirical Risk Minimization, the problem of minimizing a large but finite sum of convex functions over networks. The proposed algorithm is derived by directly discretizing the second-order heavy-ball differential equation, and it achieves an accelerated convergence rate, i.e., faster than distributed gradient descent-based methods, for strongly convex objectives that may not be smooth. Notably, we achieve acceleration without resorting to the well-known Nesterov's momentum approach. We provide numerical experiments and contrast the proposed method with recently proposed optimal distributed optimization algorithms.
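    As a rough picture of the recipe (discretized heavy-ball dynamics plus consensus averaging), here is a sketch with a doubly stochastic gossip matrix on a ring of agents; the mixing matrix, step size, and momentum values are illustrative, and this is not the paper's specific discretization or its rates.

        import numpy as np

        # Sketch: heavy-ball iteration combined with gossip averaging over a
        # network of agents, each holding a local least-squares objective.
        rng = np.random.default_rng(2)
        n_agents, dim = 4, 3
        A = [rng.standard_normal((10, dim)) for _ in range(n_agents)]
        b = [rng.standard_normal(10) for _ in range(n_agents)]

        def local_grad(i, x):              # gradient of 0.5 * ||A_i x - b_i||^2
            return A[i].T @ (A[i] @ x - b[i])

        # Ring mixing matrix (doubly stochastic, illustrative).
        W = np.zeros((n_agents, n_agents))
        for i in range(n_agents):
            W[i, i] = 0.5
            W[i, (i - 1) % n_agents] = 0.25
            W[i, (i + 1) % n_agents] = 0.25

        alpha, beta = 0.01, 0.9            # step size and momentum (illustrative)
        X = np.zeros((n_agents, dim))
        X_prev = X.copy()
        for _ in range(3000):
            G = np.stack([local_grad(i, X[i]) for i in range(n_agents)])
            X_new = W @ X - alpha * G + beta * (X - X_prev)   # gossip + heavy-ball
            X_prev, X = X, X_new
        print(X.std(axis=0))               # small spread = approximate consensus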

    Learning sparse optimal rule fit by safe screening

    Full text link
    In this paper, we consider linear prediction models in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyperrectangle in the input space. Since the number of all possible rules generated from the training dataset becomes extremely large, it has been difficult to consider all of them when fitting a sparse model. To resolve this problem, we propose Safe Optimal Rule Fit (SORF), which is formulated as a convex optimization problem with sparse regularization. The proposed SORF method utilizes the fact that the set of all possible rules can be represented as a tree. By extending a recently popularized convex optimization technique called safe screening, we develop a novel method for pruning the tree such that pruned nodes are guaranteed to be irrelevant to the prediction model. This approach allows us to efficiently learn a prediction model constructed from an exponentially large number of all possible rules. We demonstrate the usefulness of the proposed method by numerical experiments using several benchmark datasets.
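    For readers unfamiliar with safe screening, the sketch below shows the basic certificate in its simplest setting, gap-safe screening for a plain Lasso: any feature whose bound falls below 1 provably has zero weight at the optimum and can be discarded. SORF extends bounds of this flavor to prune entire subtrees of candidate rules; the data here are synthetic and purely illustrative.

        import numpy as np

        # Gap-safe screening sketch for min_w 0.5*||y - X w||^2 + lam*||w||_1.
        rng = np.random.default_rng(3)
        n, d = 50, 200
        X = rng.standard_normal((n, d))
        y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(n)
        lam = 0.5 * np.max(np.abs(X.T @ y))

        # A few ISTA steps to get a decent primal point (shrinks the duality gap).
        L = np.linalg.norm(X, 2) ** 2
        w = np.zeros(d)
        for _ in range(200):
            w = w - X.T @ (X @ w - y) / L
            w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)

        # Dual-feasible point from the residual, then the gap-safe radius.
        res = y - X @ w
        theta = res / max(lam, np.max(np.abs(X.T @ res)))
        primal = 0.5 * res @ res + lam * np.abs(w).sum()
        dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
        radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam

        # Certificate: |x_j^T theta| + ||x_j|| * radius < 1  implies  w*_j = 0.
        screened = np.abs(X.T @ theta) + np.linalg.norm(X, axis=0) * radius < 1.0
        print(f"safely discarded {screened.sum()} of {d} features")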

    Stochastic Particle Gradient Descent for Infinite Ensembles

    Full text link
    The superior performance of ensemble methods with infinite models is well known. Most of these methods are based on optimization problems in infinite-dimensional spaces with some regularization; for instance, boosting methods and convex neural networks use $L^1$-regularization with a non-negativity constraint. However, due to the difficulty of handling $L^1$-regularization, these problems are usually solved only inexactly, by early stopping or a rough approximation. In this paper, we propose a new ensemble learning method that operates in a space of probability measures, that is, our method can handle the $L^1$-constraint and the non-negativity constraint in a rigorous way. Such an optimization is realized by proposing a general-purpose stochastic optimization method for learning probability measures via parameterization using transport maps on base models. As a result of running the method, a transport map to output an infinite ensemble is obtained, which forms a residual-type network. From the perspective of functional gradient methods, we give a convergence rate as fast as that of a stochastic optimization method for finite-dimensional nonconvex problems. Moreover, we show an interior optimality property of a local optimality condition used in our analysis. Comment: 33 pages, 1 figure
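    The particle picture behind "optimization in a space of probability measures" can be sketched as follows: approximate the ensemble measure by m particles and move the particles, i.e., transport the empirical measure, with SGD. This is only the generic particle view on a made-up toy model; the paper's method instead parameterizes the measure with transport maps on base models.

        import numpy as np

        # Particle sketch: an infinite ensemble is approximated by m particles
        # theta_1..theta_m; the predictor averages base models over particles.
        rng = np.random.default_rng(4)
        m = 64
        theta = rng.standard_normal((m, 2))      # particle i: (weight, bias)

        def predict(x):                          # ensemble mean of tanh units
            return np.tanh(theta[:, 0:1] * x + theta[:, 1:2]).mean(axis=0)

        x_train = np.linspace(-2.0, 2.0, 128)
        y_train = np.sin(2.0 * x_train)
        lr = 1.0                                 # step size (illustrative)
        for step in range(20000):
            i = rng.integers(len(x_train))
            x, y = x_train[i], y_train[i]
            err = predict(x) - y                 # residual of the ensemble
            s = np.tanh(theta[:, 0] * x + theta[:, 1])
            dsdz = 1.0 - s ** 2                  # tanh'(z)
            theta[:, 0] -= lr * err * dsdz * x / m   # move the particles =
            theta[:, 1] -= lr * err * dsdz / m       # transport the measure
        print(np.mean((predict(x_train) - y_train) ** 2))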

    Principled Deep Neural Network Training through Linear Programming

    Full text link
    Deep Learning has received significant attention due to its impressive performance in many state-of-the-art learning tasks. Unfortunately, while very powerful, Deep Learning is not well understood theoretically; in particular, results on the complexity of training deep neural networks have been obtained only recently. In this work we show that large classes of deep neural networks with various architectures (e.g., DNNs, CNNs, Binary Neural Networks, and ResNets), activation functions (e.g., ReLUs and leaky ReLUs), and loss functions (e.g., hinge loss, Euclidean loss, etc.) can be trained to near optimality with desired target accuracy using linear programming, in time that is exponential in the input data and parameter space dimension and polynomial in the size of the data set; improvements of the dependence on the input dimension are known to be unlikely assuming $P \neq NP$, and improving the dependence on the parameter space dimension remains open. In particular, we obtain polynomial-time algorithms for training for a given fixed network architecture. Our work applies more broadly to empirical risk minimization problems, which allows us to generalize various previous results and obtain new complexity results for previously unstudied architectures in the proper learning setting.
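    The LP construction for deep architectures is exponential-size and not meant to be run, but the formulation idea is easy to see in the degenerate one-layer case: hinge-loss ERM for a linear model is literally a small linear program. The sketch below (with synthetic data) shows only that special case, not the paper's construction.

        import numpy as np
        from scipy.optimize import linprog

        # Hinge-loss ERM for a linear classifier as an LP:
        # variables z = (w in R^d, xi in R^n), minimize sum(xi)
        # subject to xi_i >= 1 - y_i (w . x_i) and xi_i >= 0.
        rng = np.random.default_rng(5)
        n, d = 40, 3
        Xd = rng.standard_normal((n, d))
        yd = np.sign(Xd @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n))

        c = np.concatenate([np.zeros(d), np.ones(n)])
        A_ub = np.hstack([-yd[:, None] * Xd, -np.eye(n)])   # -y_i w.x_i - xi_i <= -1
        b_ub = -np.ones(n)
        bounds = [(-10, 10)] * d + [(0, None)] * n          # box on w, xi >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        w = res.x[:d]
        print("total hinge loss:", res.fun)
        print("train accuracy:", np.mean(np.sign(Xd @ w) == yd))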

    A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces

    Full text link
    In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the optimal value of a convex program. The approximate Bellman operator has a computational advantage because it involves a convex optimization problem in the case of control-affine systems and convex costs. Using this feature, we propose a simple dynamic programming algorithm that evaluates the approximate value function at pre-specified grid points by solving convex optimization problems in each iteration. We show that the proposed method approximates the optimal value function with a uniform convergence property in the case of convex optimal value functions. We also propose an interpolation-free design method for a control policy, whose performance converges uniformly to the optimum as the grid resolution becomes finer. When a nonlinear control-affine system is considered, the convex optimization approach provides an approximate policy with a provable suboptimality bound. For general cases, the proposed convex formulation of dynamic programming operators can be recast as a nonconvex bi-level program, in which the inner problem is a linear program, without losing the uniform convergence properties.
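    The grid-based scheme can be sketched in a few lines for a toy 1D control-affine system with convex stage cost: at each grid state, the approximate Bellman operator reduces to a small convex minimization over the action. The dynamics, costs, grid, and solver below are illustrative choices, not the paper's setup.

        import numpy as np
        from scipy.optimize import minimize_scalar

        # Toy control-affine system x_next = a*x + b*u with convex stage cost.
        a, b, gamma = 0.9, 0.5, 0.95
        grid = np.linspace(-2.0, 2.0, 41)

        def interp_value(V, x):              # piecewise-linear value on the grid
            return np.interp(np.clip(x, grid[0], grid[-1]), grid, V)

        def bellman(V):
            V_new = np.empty_like(V)
            for k, x in enumerate(grid):
                # For convex grid values, the interpolant is convex, and the
                # composition with the affine dynamics keeps the inner problem
                # convex in the action u.
                obj = lambda u: x**2 + u**2 + gamma * interp_value(V, a*x + b*u)
                V_new[k] = minimize_scalar(obj, bounds=(-3, 3), method="bounded").fun
            return V_new

        V = np.zeros_like(grid)
        for _ in range(200):                 # value iteration toward a fixed point
            V = bellman(V)
        print(V[::10])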

    Continuous-Flow Graph Transportation Distances

    Full text link
    Optimal transportation distances are valuable for comparing and analyzing probability distributions, but larger-scale computational techniques for the theoretically favorable quadratic case are limited to smooth domains or regularized approximations. Motivated by fluid flow-based transportation on $\mathbb{R}^n$, this paper instead introduces an alternative definition of optimal transportation between distributions over graph vertices. This new distance still satisfies the triangle inequality but has better scaling and a connection to continuous theories of transportation. It is constructed by adapting a Riemannian structure over probability distributions to the graph case, providing transportation distances as shortest paths in probability space. After defining and analyzing theoretical properties of our new distance, we provide a time discretization as well as experiments verifying its effectiveness.
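    For contrast, the classical (static) optimal transportation distance between two distributions on graph vertices is an LP with shortest-path ground costs; the sketch below computes it on a tiny path graph. The paper's distance is the dynamical, Riemannian alternative to this kind of baseline; the graph and distributions here are invented for illustration.

        import numpy as np
        from scipy.optimize import linprog
        from scipy.sparse.csgraph import shortest_path

        W = np.array([[0, 1, 0, 0],        # weighted adjacency of a path graph
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
        C = shortest_path(W) ** 2          # quadratic ground cost d(i, j)^2

        mu = np.array([0.7, 0.3, 0.0, 0.0])    # source distribution
        nu = np.array([0.0, 0.0, 0.4, 0.6])    # target distribution
        n = len(mu)

        # LP over couplings P >= 0 with row sums mu and column sums nu.
        c = C.ravel()
        A_eq = np.zeros((2 * n, n * n))
        for i in range(n):
            A_eq[i, i*n:(i+1)*n] = 1.0     # row-sum constraints  (= mu)
            A_eq[n + i, i::n] = 1.0        # column-sum constraints (= nu)
        b_eq = np.concatenate([mu, nu])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        print("squared transport distance:", res.fun)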

    Lifting Vectorial Variational Problems: A Natural Formulation based on Geometric Measure Theory and Discrete Exterior Calculus

    Full text link
    Numerous tasks in imaging and vision can be formulated as variational problems over vector-valued maps. We approach the relaxation and convexification of such vectorial variational problems via a lifting to the space of currents. To that end, we recall that functionals with polyconvex Lagrangians can be reparametrized as convex one-homogeneous functionals on the graph of the function. This leads to an equivalent shape optimization problem over oriented surfaces in the product space of domain and codomain. A convex formulation is then obtained by relaxing the search space from oriented surfaces to more general currents. We propose a discretization of the resulting infinite-dimensional optimization problem using Whitney forms, which also generalizes recent "sublabel-accurate" multilabeling approaches. Comment: Oral presentation at CVPR 2019