Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions
We consider the problem of optimizing an approximately convex function over a
bounded convex set using only function evaluations. The problem is reduced to
sampling from an \emph{approximately} log-concave distribution using the
Hit-and-Run method, which is shown to have the same complexity as sampling
from log-concave distributions. Beyond extending the analysis for log-concave
distributions to approximately log-concave distributions, the implementation
of the one-dimensional sampler of the Hit-and-Run walk requires new methods
and analysis. The resulting algorithm is based on simulated annealing and does
not rely on first-order conditions, which makes it essentially immune to local
minima.
We then apply the method to several motivating problems. In the context of
zeroth-order stochastic convex optimization, the proposed method produces an
approximate minimizer from noisy function evaluations by inducing an
approximately log-concave distribution. We also consider in detail the case
where the "amount of non-convexity" decays towards the optimum of the
function. Other applications of the method discussed in this work include
private computation of empirical risk minimizers, two-stage stochastic
programming, and approximate dynamic programming for online learning.
Complexity of randomized algorithms for underdamped Langevin dynamics
We establish an information-based complexity lower bound for randomized
algorithms simulating underdamped Langevin dynamics. More specifically, we
prove a lower bound on the worst-case strong error incurred in solving a
family of d-dimensional underdamped Langevin dynamics by any randomized
algorithm with a limited number of queries to the gradient, the driving
Brownian motion, and its weighted integration. The lower bound we establish
matches the upper bound for the randomized midpoint method recently proposed
by Shen and Lee [NeurIPS 2019], in terms of both relevant parameters.
Algorithms for the continuous nonlinear resource allocation problem---new implementations and numerical studies
Patriksson (2008) provided a then up-to-date survey on the
continuous, separable, differentiable, and convex resource allocation problem
with a single resource constraint. Since the publication of that paper the
interest in the problem has grown: several new applications have arisen where
the problem at hand constitutes a subproblem, and several new algorithms have
been developed for its efficient solution. This paper therefore serves three
purposes. First, it provides an up-to-date extension of the survey of the
literature of the field, complementing the survey in Patriksson (2008) with
more than 20 books and articles. Second, it contributes improvements of some of
these algorithms, in particular with an improvement of the pegging (that is,
variable fixing) process in the relaxation algorithm, and an improved means to
evaluate subsolutions. Third, it numerically evaluates several relaxation
(primal) and breakpoint (dual) algorithms, incorporating a variety of pegging
strategies, as well as a quasi-Newton method. Our conclusion is that our
modification of the relaxation algorithm performs the best. At least for
problem sizes up to 30 million variables, the practical time complexity of
both the breakpoint and relaxation algorithms is linear.
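For concreteness, a simple multiplier-search (dual) method for the separable problem class surveyed above can be sketched as follows. It is a generic stand-in for the breakpoint and pegging algorithms the paper evaluates, with an illustrative quadratic objective, not the paper's implementation:

```python
import numpy as np

def quadratic_knapsack(a, w, lo, hi, b, tol=1e-10):
    """Solve the separable resource allocation problem

        min  sum_i (w_i / 2) * (x_i - a_i)^2
        s.t. sum_i x_i = b,   lo_i <= x_i <= hi_i,

    assuming sum(lo) <= b <= sum(hi). The KKT conditions give
    x_i(lam) = clip(a_i + lam / w_i, lo_i, hi_i), with sum_i x_i(lam)
    monotone in the multiplier lam, so bisection on lam suffices."""
    a, w, lo, hi = map(np.asarray, (a, w, lo, hi))
    x = lambda lam: np.clip(a + lam / w, lo, hi)
    lam_lo, lam_hi = -1.0, 1.0
    while x(lam_lo).sum() > b: lam_lo *= 2    # bracket the multiplier
    while x(lam_hi).sum() < b: lam_hi *= 2
    while lam_hi - lam_lo > tol:              # bisection on the dual variable
        mid = 0.5 * (lam_lo + lam_hi)
        if x(mid).sum() < b: lam_lo = mid
        else: lam_hi = mid
    return x(0.5 * (lam_lo + lam_hi))
```

Pegging methods accelerate this scheme by permanently fixing ("pegging") variables whose bounds are known to be active, shrinking the problem at each iteration; the bisection above re-evaluates every variable instead.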
Nesterov smoothing for sampling without smoothness
We study the problem of sampling from a target distribution whose potential is
not smooth. Compared with the sampling problem with smooth
potentials, this problem is much less well-understood due to the lack of
smoothness. In this paper, we propose a novel sampling algorithm for a class of
non-smooth potentials by first approximating them by smooth potentials using a
technique that is akin to Nesterov smoothing. We then utilize sampling
algorithms on the smooth potentials to generate approximate samples from the
original non-smooth potentials. We select an appropriate smoothing intensity to
ensure that the distance between the smoothed and un-smoothed distributions is
minimal, thereby guaranteeing the algorithm's accuracy. Hence we obtain
non-asymptotic convergence results based on existing analysis of smooth
sampling. We verify our convergence result on a synthetic example and apply our
method to improve the worst-case performance of Bayesian inference on a
real-world example.
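A toy version of the smoothing-then-sampling idea can use the Huber (Moreau-envelope) smoothing of the non-smooth potential |x|, whose target is the Laplace distribution, followed by unadjusted Langevin; the one-dimensional target, step size, and smoothing level below are illustrative choices, not the paper's tuned parameters:

```python
import numpy as np

def huber_grad(x, mu):
    """Gradient of the smoothed potential: |x| is replaced by a function
    that is quadratic (curvature 1/mu) near 0 and linear outside, so its
    gradient is x/mu clipped to [-1, 1]."""
    return np.clip(x / mu, -1.0, 1.0)

def ula_smoothed(n_steps, step=1e-2, mu=0.1, rng=None):
    """Unadjusted Langevin on the smoothed potential: a stand-in for the
    paper's scheme of smoothing a non-smooth potential and then running
    an off-the-shelf smooth sampler."""
    rng = np.random.default_rng(rng)
    x = 0.0
    out = np.empty(n_steps)
    for k in range(n_steps):
        x = x - step * huber_grad(x, mu) + np.sqrt(2 * step) * rng.normal()
        out[k] = x
    return out
```

Shrinking mu brings the smoothed distribution closer to the Laplace target but worsens the smoothness constant, which is exactly the trade-off the smoothing intensity must balance.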
Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations
We present a framework that allows for the non-asymptotic study of the
2-Wasserstein distance between the invariant distribution of an ergodic
stochastic differential equation and the distribution of its numerical
approximation in the strongly log-concave case. This allows us to study in a
unified way a number of different integrators proposed in the literature for
the overdamped and underdamped Langevin dynamics. In addition, we analyze a
novel splitting method for the underdamped Langevin dynamics which requires
only one gradient evaluation per time step. Under an additional smoothness
assumption on a d-dimensional strongly log-concave distribution with condition
number κ, the algorithm is shown to produce, with O(κ^{5/4} d^{1/4} ϵ^{-1/2})
complexity, samples from a distribution that is at most ϵ > 0 away from the
target distribution in Wasserstein distance.
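For reference, a generic one-gradient-per-step discretization of the underdamped Langevin dynamics can be sketched as below; this is a plain Euler-type baseline on an assumed test problem, not the splitting method analyzed in the paper:

```python
import numpy as np

def underdamped_langevin(grad_f, x0, v0, step, gamma, n_steps, rng=None):
    """Euler-type discretization of the underdamped Langevin dynamics

        dx = v dt,   dv = -grad_f(x) dt - gamma * v dt + sqrt(2 * gamma) dW,

    using exactly one gradient evaluation per step."""
    rng = np.random.default_rng(rng)
    x, v = np.array(x0, dtype=float), np.array(v0, dtype=float)
    xs = []
    for _ in range(n_steps):
        g = grad_f(x)                            # the single gradient per step
        v = (v - step * (g + gamma * v)
               + np.sqrt(2.0 * gamma * step) * rng.normal(size=x.shape))
        x = x + step * v
        xs.append(x.copy())
    return np.array(xs)
```

Splitting schemes of the kind analyzed in the paper integrate the free-transport, friction, and noise parts separately (some of them exactly), which is how they improve the dependence on the step size without extra gradient evaluations.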
Chain of Log-Concave Markov Chains
We introduce a theoretical framework for sampling from unnormalized densities
based on a smoothing scheme that uses an isotropic Gaussian kernel with a
single fixed noise scale. We prove one can decompose sampling from a density
(minimal assumptions made on the density) into a sequence of sampling from
log-concave conditional densities via accumulation of noisy measurements with
equal noise levels. Our construction is unique in that it keeps track of a
history of samples, making it non-Markovian as a whole, but it is lightweight
algorithmically as the history only shows up in the form of a running empirical
mean of samples. Our sampling algorithm generalizes walk-jump sampling (Saremi
& Hyv\"arinen, 2019). The "walk" phase becomes a (non-Markovian) chain of
(log-concave) Markov chains. The "jump" from the accumulated measurements is
obtained by empirical Bayes. We study our sampling algorithm quantitatively
using the 2-Wasserstein metric and compare it with various Langevin MCMC
algorithms. We also report a remarkable capacity of our algorithm to "tunnel"
between modes of a distribution.
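The underlying walk-jump scheme that the paper generalizes can be illustrated on a toy Gaussian target, where the smoothed score is available in closed form (the actual method accumulates noisy measurements and works with learned or estimated scores; everything below is a simplifying assumption):

```python
import numpy as np

def walk_jump_gaussian(sigma, n_steps, step=0.05, rng=None):
    """Toy walk-jump sampler for a standard Gaussian target.

    Walk: Langevin MCMC in the noisy variable y = x + sigma * eps, whose
    density is the smoothed (log-concave) N(0, 1 + sigma^2).
    Jump: empirical-Bayes denoising x_hat = y + sigma^2 * score(y)."""
    rng = np.random.default_rng(rng)
    score = lambda y: -y / (1.0 + sigma**2)   # grad log of the smoothed density
    y = 0.0
    ys = np.empty(n_steps)
    for k in range(n_steps):
        y = y + step * score(y) + np.sqrt(2.0 * step) * rng.normal()
        ys[k] = y
    return ys + sigma**2 * score(ys)          # jump applied to every walk sample
```

The "tunneling" behavior reported in the paper comes from the walk living at the single smoothed noise level, where modes are connected, while the jump recovers clean samples.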
The shifted ODE method for underdamped Langevin MCMC
In this paper, we consider the underdamped Langevin diffusion (ULD) and
propose a numerical approximation using its associated ordinary differential
equation (ODE). When used as a Markov chain Monte Carlo (MCMC) algorithm, the
ODE approximation is shown to achieve a 2-Wasserstein sampling error under the
standard smoothness and strong convexity assumptions on the target
distribution, with a step complexity that matches the randomized midpoint
method proposed by Shen and Lee [NeurIPS 2019], which was shown to be order
optimal by Cao, Lu and Wang. However, the main feature of the proposed
numerical method is that it can utilize additional smoothness of the target
log-density. More concretely, we show that the ODE approximation achieves an
improved 2-Wasserstein error and step complexity when Lipschitz continuity is
assumed for the Hessian and third derivative of the log-density. By
discretizing this ODE using a third-order Runge-Kutta method, we obtain a
practical MCMC method that uses just two additional gradient evaluations per
step. In our experiment, where the target comes from a logistic regression,
this method shows faster convergence compared to other unadjusted Langevin
MCMC algorithms.
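To make the per-step cost concrete, a classical third-order Runge-Kutta step uses one vector-field evaluation plus two additional ones, matching the gradient budget quoted above; the scheme below is generic Kutta RK3 on an ordinary ODE, not the shifted ODE discretization itself:

```python
import numpy as np

def rk3_step(f, y, h):
    """One step of Kutta's third-order Runge-Kutta method: three
    evaluations of the vector field f per step (one base evaluation
    plus two additional ones)."""
    k1 = f(y)
    k2 = f(y + 0.5 * h * k1)
    k3 = f(y - h * k1 + 2.0 * h * k2)
    return y + (h / 6.0) * (k1 + 4.0 * k2 + k3)
```

On the linear test problem y' = -y this reproduces exp(-t) with third-order accuracy, which is the order needed to exploit Lipschitz third derivatives of the log-density.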