747 research outputs found

    Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions

    We consider the problem of optimizing an approximately convex function over a bounded convex set in $\mathbb{R}^n$ using only function evaluations. The problem is reduced to sampling from an approximately log-concave distribution using the Hit-and-Run method, which is shown to have the same $\mathcal{O}^*$ complexity as sampling from log-concave distributions. In addition to extending the analysis for log-concave distributions to approximately log-concave distributions, the implementation of the 1-dimensional sampler of the Hit-and-Run walk requires new methods and analysis. The algorithm is then based on simulated annealing, which does not rely on first-order conditions and is therefore essentially immune to local minima. We then apply the method to different motivating problems. In the context of zeroth-order stochastic convex optimization, the proposed method produces an $\epsilon$-minimizer after $\mathcal{O}^*(n^{7.5}\epsilon^{-2})$ noisy function evaluations by inducing an $\mathcal{O}(\epsilon/n)$-approximately log-concave distribution. We also consider in detail the case when the "amount of non-convexity" decays towards the optimum of the function. Other applications of the method discussed in this work include private computation of empirical risk minimizers, two-stage stochastic programming, and approximate dynamic programming for online learning. Comment: 27 pages
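To make the combination of Hit-and-Run and simulated annealing concrete, here is a minimal sketch, not the paper's algorithm. It assumes the feasible set is a Euclidean ball and replaces the carefully analysed 1-dimensional sampler with a crude grid over each chord. The names `chord`, `hit_and_run_step`, `anneal`, the grid size, and the temperature schedule are all illustrative assumptions.

```python
import math
import random

def chord(x, d, radius):
    """Return (t_lo, t_hi) such that |x + t*d| <= radius, for a unit direction d."""
    xd = sum(xi * di for xi, di in zip(x, d))
    xx = sum(xi * xi for xi in x)
    disc = math.sqrt(max(xd * xd - (xx - radius * radius), 0.0))
    return -xd - disc, -xd + disc

def hit_and_run_step(f, x, temp, radius, rng, grid=64):
    """One Hit-and-Run move targeting exp(-f/temp) on the ball (grid-based 1-D sampler)."""
    d = [rng.gauss(0.0, 1.0) for _ in x]          # random direction
    norm = math.sqrt(sum(v * v for v in d))
    d = [v / norm for v in d]
    t_lo, t_hi = chord(x, d, radius)
    # Midpoints of equal cells along the chord: a crude stand-in for an exact 1-D sampler.
    ts = [t_lo + (t_hi - t_lo) * (k + 0.5) / grid for k in range(grid)]
    pts = [[xi + t * di for xi, di in zip(x, d)] for t in ts]
    vals = [f(p) for p in pts]                    # zeroth-order information only
    vmin = min(vals)
    weights = [math.exp(-(v - vmin) / temp) for v in vals]
    return rng.choices(pts, weights=weights, k=1)[0]

def anneal(f, x0, radius, temps, steps_per_temp=50, seed=0):
    """Lower the temperature in stages and track the best point seen."""
    rng = random.Random(seed)
    x, best = list(x0), list(x0)
    for temp in temps:
        for _ in range(steps_per_temp):
            x = hit_and_run_step(f, x, temp, radius, rng)
            if f(x) < f(best):
                best = x
    return best
```

For instance, `anneal(lambda p: p[0]**2 + p[1]**2 + 0.05 * math.sin(20 * p[0]), [0.9, -0.9], 2.0, [1.0, 0.3, 0.1, 0.03, 0.01])` runs the walk on a convex bowl with a small oscillating perturbation. No gradient is ever evaluated, which is the property the abstract relies on.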

    Complexity of randomized algorithms for underdamped Langevin dynamics

    We establish an information complexity lower bound of randomized algorithms for simulating underdamped Langevin dynamics. More specifically, we prove that the worst $L^2$ strong error is of order $\Omega(\sqrt{d}\, N^{-3/2})$, for solving a family of $d$-dimensional underdamped Langevin dynamics, by any randomized algorithm with only $N$ queries to $\nabla U$, the driving Brownian motion and its weighted integration, respectively. The lower bound we establish matches the upper bound for the randomized midpoint method recently proposed by Shen and Lee [NIPS 2019], in terms of both parameters $N$ and $d$. Comment: 27 pages; some revision (e.g., Sec 2.1), and new supplementary materials in Appendices
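For orientation, one common way to write underdamped Langevin dynamics is shown below. The unit mass, unit temperature, and the friction parameter $\gamma$ are assumptions of this sketch; the paper's own scaling may differ.

```latex
\begin{aligned}
  \mathrm{d}X_t &= V_t \,\mathrm{d}t, \\
  \mathrm{d}V_t &= -\gamma V_t \,\mathrm{d}t - \nabla U(X_t)\,\mathrm{d}t
                   + \sqrt{2\gamma}\,\mathrm{d}B_t ,
\end{aligned}
\qquad
\pi(x, v) \propto \exp\!\Big(-U(x) - \tfrac{1}{2}\lVert v \rVert^2\Big).
```

Under this convention $\pi$ is the stationary distribution, and the quantities an algorithm may query are $\nabla U$, the Brownian motion $B_t$, and weighted integrals of $B_t$, as listed in the abstract.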

    Algorithms for the continuous nonlinear resource allocation problem---new implementations and numerical studies

    Patriksson (2008) provided a then up-to-date survey on the continuous, separable, differentiable and convex resource allocation problem with a single resource constraint. Since the publication of that paper the interest in the problem has grown: several new applications have arisen where the problem at hand constitutes a subproblem, and several new algorithms have been developed for its efficient solution. This paper therefore serves three purposes. First, it provides an up-to-date extension of the survey of the literature of the field, complementing the survey in Patriksson (2008) with more than 20 books and articles. Second, it contributes improvements of some of these algorithms, in particular with an improvement of the pegging (that is, variable fixing) process in the relaxation algorithm, and an improved means to evaluate subsolutions. Third, it numerically evaluates several relaxation (primal) and breakpoint (dual) algorithms, incorporating a variety of pegging strategies, as well as a quasi-Newton method. Our conclusion is that our modification of the relaxation algorithm performs the best. At least for problem sizes up to 30 million variables the practical time complexity for the breakpoint and relaxation algorithms is linear.
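As a rough illustration of the dual viewpoint, the sketch below solves a separable quadratic instance by plain bisection on the multiplier of the resource constraint. This is a simpler stand-in, not the breakpoint or relaxation algorithms evaluated in the paper. The function name `allocate` and the quadratic cost form are assumptions made for the example.

```python
def allocate(a, c, lower, upper, b, tol=1e-10, max_iter=200):
    """Minimize sum(0.5*a[j]*x[j]**2 - c[j]*x[j]) subject to
    sum(x) == b and lower[j] <= x[j] <= upper[j], with every a[j] > 0."""
    if not sum(lower) <= b <= sum(upper):
        raise ValueError("resource level b is infeasible for the given bounds")

    def x_of(lam):
        # Stationarity gives x_j = (c_j - lam) / a_j, then project onto the bounds.
        return [min(max((cj - lam) / aj, lj), uj)
                for aj, cj, lj, uj in zip(a, c, lower, upper)]

    # At lam_lo every variable sits at its upper bound; at lam_hi, at its lower bound.
    lam_lo = min(cj - aj * uj for aj, cj, uj in zip(a, c, upper))
    lam_hi = max(cj - aj * lj for aj, cj, lj in zip(a, c, lower))
    lam = 0.5 * (lam_lo + lam_hi)
    for _ in range(max_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        total = sum(x_of(lam))
        if abs(total - b) <= tol:
            break
        if total > b:
            lam_lo = lam   # allocation too large: raise the multiplier
        else:
            lam_hi = lam   # allocation too small: lower the multiplier
    return x_of(lam), lam
```

With `a=[1, 2, 4]`, `c=[0, 0, 0]`, bounds `[0, 10]` on each variable and `b=7`, the allocation comes out close to `[4, 2, 1]` with a multiplier near `-4`. Variables clipped at a bound play the role of the "pegged" variables mentioned in the abstract.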

    Nesterov smoothing for sampling without smoothness

    We study the problem of sampling from a target distribution in $\mathbb{R}^d$ whose potential is not smooth. Compared with the sampling problem with smooth potentials, this problem is much less well understood due to the lack of smoothness. In this paper, we propose a novel sampling algorithm for a class of non-smooth potentials by first approximating them by smooth potentials using a technique that is akin to Nesterov smoothing. We then utilize sampling algorithms on the smooth potentials to generate approximate samples from the original non-smooth potentials. We select an appropriate smoothing intensity to ensure that the distance between the smoothed and un-smoothed distributions is minimal, thereby guaranteeing the algorithm's accuracy. Hence we obtain non-asymptotic convergence results based on existing analysis of smooth sampling. We verify our convergence result on a synthetic example and apply our method to improve the worst-case performance of Bayesian inference on a real-world example.
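A standard textbook instance of Nesterov smoothing, given here only as an illustration and not taken from the paper, is the absolute value $f(x) = |x| = \max_{|u| \le 1} ux$. Subtracting a quadratic penalty with smoothing parameter $\mu > 0$ (a symbol introduced for this example) yields the Huber function:

```latex
f_\mu(x) \;=\; \max_{|u| \le 1} \Big( u x - \tfrac{\mu}{2} u^2 \Big)
\;=\;
\begin{cases}
  \dfrac{x^2}{2\mu}, & |x| \le \mu, \\[6pt]
  |x| - \dfrac{\mu}{2}, & |x| > \mu,
\end{cases}
\qquad
f_\mu(x) \;\le\; f(x) \;\le\; f_\mu(x) + \tfrac{\mu}{2}.
```

The smoothed potential $f_\mu$ has a $1/\mu$-Lipschitz gradient, so a smaller $\mu$ gives a closer approximation but a less smooth potential. This is the trade-off behind choosing the smoothing intensity.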

    Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations

    We present a framework that allows for the non-asymptotic study of the 2-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyze a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $\kappa$, the algorithm is shown to produce, with $\mathcal{O}(\kappa^{5/4} d^{1/4} \epsilon^{-1/2})$ complexity, samples from a distribution that, in Wasserstein distance, is at most $\epsilon > 0$ away from the target distribution.
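The kind of quantity studied here can be worked out by hand in the simplest case. The sketch below is an illustration only, not the paper's splitting method: it takes the Euler discretization of overdamped Langevin dynamics for the one-dimensional standard Gaussian target, $U(x) = x^2/2$, where the invariant law of the numerical scheme and its 2-Wasserstein distance to the target are available in closed form. The function names and the step size `h` are assumptions of this example.

```python
import math
import random

def ula_stationary_std(h):
    """Stationary standard deviation of x_{k+1} = (1 - h) x_k + sqrt(2h) * xi
    with xi ~ N(0, 1), i.e. Euler for dX = -X dt + sqrt(2) dB."""
    if not 0.0 < h < 2.0:
        raise ValueError("the recursion is stable only for 0 < h < 2")
    # Fixed point of s^2 = (1 - h)^2 s^2 + 2h  gives  s^2 = 1 / (1 - h/2).
    return math.sqrt(1.0 / (1.0 - h / 2.0))

def w2_bias(h):
    """2-Wasserstein distance between N(0, 1) and N(0, s^2), which equals |s - 1|."""
    return abs(ula_stationary_std(h) - 1.0)

def run_ula(h, n_steps, n_chains, seed=0):
    """Simulate independent chains started at 0 and return their final states."""
    rng = random.Random(seed)
    xs = [0.0] * n_chains
    noise_scale = math.sqrt(2.0 * h)
    for _ in range(n_steps):
        xs = [(1.0 - h) * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs
```

For `h = 0.1` the bias `w2_bias(0.1)` is about 0.026 and shrinks roughly linearly in `h`. Comparing the spread of the states returned by `run_ula(0.1, 200, 2000)` with `ula_stationary_std(0.1)` gives an empirical check of the closed form.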

    Chain of Log-Concave Markov Chains

    We introduce a theoretical framework for sampling from unnormalized densities based on a smoothing scheme that uses an isotropic Gaussian kernel with a single fixed noise scale. We prove one can decompose sampling from a density (minimal assumptions made on the density) into a sequence of sampling from log-concave conditional densities via accumulation of noisy measurements with equal noise levels. Our construction is unique in that it keeps track of a history of samples, making it non-Markovian as a whole, but it is lightweight algorithmically as the history only shows up in the form of a running empirical mean of samples. Our sampling algorithm generalizes walk-jump sampling (Saremi & Hyvärinen, 2019). The "walk" phase becomes a (non-Markovian) chain of (log-concave) Markov chains. The "jump" from the accumulated measurements is obtained by empirical Bayes. We study our sampling algorithm quantitatively using the 2-Wasserstein metric and compare it with various Langevin MCMC algorithms. We also report a remarkable capacity of our algorithm to "tunnel" between modes of a distribution.
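The empirical Bayes "jump" can be illustrated with the classical single-measurement case. The notation below ($\sigma$ for the noise scale, $p_Y$ for the smoothed density, $\bar{y}_m$ for the running mean) is assumed for this sketch, and the multi-measurement line is a reading of the abstract under the additional assumption that the noises are independent given $x$.

```latex
y = x + \sigma \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I)
\;\;\Longrightarrow\;\;
\hat{x}(y) \;=\; \mathbb{E}[X \mid Y = y] \;=\; y + \sigma^2 \nabla_y \log p_Y(y),
\\[6pt]
\bar{y}_m \;=\; \frac{1}{m} \sum_{k=1}^{m} y_k
\;=\; x + \frac{\sigma}{\sqrt{m}}\, \varepsilon', \quad \varepsilon' \sim \mathcal{N}(0, I).
```

Averaging $m$ measurements at the same noise level therefore behaves like a single measurement at the lower level $\sigma/\sqrt{m}$, which is consistent with the history entering only through a running empirical mean.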

    The shifted ODE method for underdamped Langevin MCMC

    In this paper, we consider the underdamped Langevin diffusion (ULD) and propose a numerical approximation using its associated ordinary differential equation (ODE). When used as a Markov Chain Monte Carlo (MCMC) algorithm, we show that the ODE approximation achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{1/3}/\varepsilon^{2/3}\big)$ steps under the standard smoothness and strong convexity assumptions on the target distribution. This matches the complexity of the randomized midpoint method proposed by Shen and Lee [NeurIPS 2019] which was shown to be order optimal by Cao, Lu and Wang. However, the main feature of the proposed numerical method is that it can utilize additional smoothness of the target log-density $f$. More concretely, we show that the ODE approximation achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}\big(d^{2/5}/\varepsilon^{2/5}\big)$ and $\mathcal{O}\big(\sqrt{d}/\varepsilon^{1/3}\big)$ steps when Lipschitz continuity is assumed for the Hessian and third derivative of $f$. By discretizing this ODE using a third order Runge-Kutta method, we can obtain a practical MCMC method that uses just two additional gradient evaluations per step. In our experiment, where the target comes from a logistic regression, this method shows faster convergence compared to other unadjusted Langevin MCMC algorithms.