    Sampling Can Be Faster Than Optimization

    Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years. There is, however, limited theoretical understanding of the relationships between these two kinds of methodology, and limited understanding of their relative strengths and weaknesses. Moreover, existing results have been obtained primarily in the setting of convex functions (for optimization) and log-concave functions (for sampling). In this setting, where local properties determine global properties, optimization algorithms are unsurprisingly more efficient computationally than sampling algorithms. We instead examine a class of nonconvex objective functions that arise in mixture modeling and multi-stable systems. In this nonconvex setting, we find that the computational complexity of sampling algorithms scales linearly with the model dimension while that of optimization algorithms scales exponentially.
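
    A minimal sketch (an illustration, not the paper's experiments) of the contrast the abstract describes: plain gradient descent stays in whichever basin of a nonconvex double-well potential it starts in, while an unadjusted Langevin sampler, which adds noise to the same gradient step, can visit both modes. The potential, step size, and iteration counts below are arbitrary assumptions.

        # Gradient descent vs. unadjusted Langevin algorithm on a toy double-well potential.
        import numpy as np

        def grad_V(x):
            # Gradient of the double-well potential V(x) = sum((x**2 - 1)**2)
            return 4 * x * (x**2 - 1)

        rng = np.random.default_rng(0)
        d, step, n_iters = 10, 1e-2, 5000

        # Gradient descent: converges to whichever local minimum the start lies in.
        x_opt = rng.normal(size=d)
        for _ in range(n_iters):
            x_opt -= step * grad_V(x_opt)

        # Unadjusted Langevin algorithm: injected noise lets iterates hop between modes,
        # so the long-run samples cover both wells rather than a single basin.
        x_smp = rng.normal(size=d)
        samples = []
        for _ in range(n_iters):
            x_smp += -step * grad_V(x_smp) + np.sqrt(2 * step) * rng.normal(size=d)
            samples.append(x_smp.copy())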

    Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems

    Recent advances in optimization theory have shown that smooth strongly convex finite sums can be minimized faster than by treating them as a black-box "batch" problem. In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms. This method is also amenable to a sampling-without-replacement scheme that in practice gives further speed-ups. We give empirical results showing state-of-the-art performance.
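
    A rough sketch of a Finito-style incremental gradient update as described in the abstract: a table of per-term points and their gradients is maintained, each new iterate is formed from the table averages, and one table entry is refreshed per step, visiting indices by sampling without replacement. The step-size rule below (the constant s and strong-convexity parameter mu) is an assumption here, not necessarily the paper's exact choice; see the paper for the precise update and constants.

        import numpy as np

        def finito_sketch(grads, x0, mu, n_epochs=10, s=2.0):
            """grads: list of per-term gradient functions for f(x) = (1/n) * sum_i f_i(x)."""
            n = len(grads)
            phi = np.tile(x0, (n, 1))                        # stored point per term
            g = np.array([grads[i](x0) for i in range(n)])   # stored gradient per term
            rng = np.random.default_rng(0)
            for _ in range(n_epochs):
                for i in rng.permutation(n):                 # sampling without replacement
                    w = phi.mean(axis=0) - g.mean(axis=0) / (s * mu)
                    phi[i] = w                               # refresh one table entry
                    g[i] = grads[i](w)
            return phi.mean(axis=0)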

    Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

    Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and "kernel herding" was shown to be a special case of this procedure). In this paper, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. By optimizing the position of the particles, we can obtain better accuracy than random or quasi-Monte Carlo sampling. In applications where the evaluation of the emission probabilities is expensive (such as in robot localization), the additional computational cost to generate the particles through optimization can be justified. Experiments on standard synthetic examples as well as on a robot localization task indeed indicate an improvement in accuracy over random and quasi-Monte Carlo sampling.
    Comment: In the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2015, San Diego, United States; JMLR Workshop and Conference Proceedings, vol. 38.
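
    A toy sketch of kernel herding viewed as greedy Frank-Wolfe steps over a finite candidate set: each new point maximizes the target mean embedding minus the average kernel evaluation against the points already chosen. The Gaussian kernel and the Monte Carlo estimate of the mean embedding are assumptions, and the paper's sequential particle-filter variant is more involved than this static version.

        import numpy as np

        def gaussian_kernel(a, b, sigma=1.0):
            d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * sigma**2))

        def herd(candidates, mu_p, n_points):
            """Greedy selection: x_{t+1} = argmax_x mu_p(x) - (1/(t+1)) * sum_i k(x_i, x)."""
            K = gaussian_kernel(candidates, candidates)
            chosen = []
            running = np.zeros(len(candidates))   # sum_i k(x_i, x) over chosen points
            for t in range(n_points):
                scores = mu_p - running / (t + 1)
                j = int(np.argmax(scores))
                chosen.append(j)
                running += K[j]
            return candidates[chosen]

        rng = np.random.default_rng(0)
        cands = rng.normal(size=(500, 2))                    # candidates drawn from the target p
        mu_p = gaussian_kernel(cands, cands).mean(axis=1)    # Monte Carlo mean embedding
        points = herd(cands, mu_p, n_points=20)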

    Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

    Sampling from Gibbs distributions p(x) ∝ exp(−V(x)/ε) and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials V, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions V allow faster convergence rates. Specifically, for m-times differentiable functions in d dimensions, the optimal rate for algorithms with n function evaluations is known to be O(n^{-m/d}), where the constant can potentially depend on m, d, and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime O(n^{3.5}), where the exponent 3.5 is independent of m or d. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of m and d. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.
    Comment: Changes in v2: minor corrections and formatting changes. Plots can be reproduced using the code at https://github.com/dholzmueller/sampling_experiment
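
    A small self-contained illustration (not from the paper) of the low-temperature limit mentioned in the abstract: for a one-dimensional non-convex potential, ε log Z_ε with Z_ε = ∫ exp(−V(x)/ε) dx, here estimated by grid quadrature, approaches −min V as ε → 0, which is the sense in which optimization is a limit of sampling and log-partition computation. The specific potential and grid are arbitrary assumptions.

        import numpy as np

        def V(x):
            # Non-convex double-well potential (an arbitrary illustrative choice).
            return (x**2 - 1) ** 2 + 0.3 * x

        xs = np.linspace(-3, 3, 20001)
        dx = xs[1] - xs[0]
        for eps in [1.0, 0.3, 0.1, 0.03, 0.01]:
            # Grid-quadrature estimate of the log-partition function log Z_eps.
            log_Z = np.log(np.sum(np.exp(-V(xs) / eps)) * dx)
            print(f"eps={eps:5.2f}  eps*log_Z={eps * log_Z: .4f}  -min V={-V(xs).min(): .4f}")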