    Sampling Can Be Faster Than Optimization

    Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years. There is, however, limited theoretical understanding of the relationships between these two kinds of methodology, and limited understanding of their relative strengths and weaknesses. Moreover, existing results have been obtained primarily in the setting of convex functions (for optimization) and log-concave functions (for sampling). In this setting, where local properties determine global properties, optimization algorithms are unsurprisingly more efficient computationally than sampling algorithms. We instead examine a class of nonconvex objective functions that arise in mixture modeling and multi-stable systems. In this nonconvex setting, we find that the computational complexity of sampling algorithms scales linearly with the model dimension while that of optimization algorithms scales exponentially.
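
    A minimal sketch (an illustration, not the paper's experiments) of the contrast the abstract describes: plain gradient descent stays in whichever basin of a nonconvex double-well potential it starts in, while an unadjusted Langevin sampler, which adds noise to the same gradient step, can visit both modes. The potential, step size, and iteration counts below are arbitrary assumptions.

        # Gradient descent vs. unadjusted Langevin algorithm on a toy double-well potential.
        import numpy as np

        def grad_V(x):
            # Gradient of the double-well potential V(x) = sum((x**2 - 1)**2)
            return 4 * x * (x**2 - 1)

        rng = np.random.default_rng(0)
        d, step, n_iters = 10, 1e-2, 5000

        # Gradient descent: converges to whichever local minimum the start lies in.
        x_opt = rng.normal(size=d)
        for _ in range(n_iters):
            x_opt -= step * grad_V(x_opt)

        # Unadjusted Langevin algorithm: injected noise lets iterates hop between modes,
        # so the long-run samples cover both wells rather than a single basin.
        x_smp = rng.normal(size=d)
        samples = []
        for _ in range(n_iters):
            x_smp += -step * grad_V(x_smp) + np.sqrt(2 * step) * rng.normal(size=d)
            samples.append(x_smp.copy())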

    Finito: A Faster, Permutable Incremental Gradient Method for Big Data Problems

    Recent advances in optimization theory have shown that smooth strongly convex finite sums can be minimized faster than by treating them as a black-box "batch" problem. In this work we introduce a new method in this class with a theoretical convergence rate four times faster than existing methods, for sums with sufficiently many terms. This method is also amenable to a sampling-without-replacement scheme that in practice gives further speed-ups. We give empirical results showing state-of-the-art performance.
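
    A rough sketch of a Finito-style incremental gradient update as described in the abstract: a table of per-term points and their gradients is maintained, each new iterate is formed from the table averages, and one table entry is refreshed per step, visiting indices by sampling without replacement. The step-size rule below (the constant s and strong-convexity parameter mu) is an assumption here, not necessarily the paper's exact choice; see the paper for the precise update and constants.

        import numpy as np

        def finito_sketch(grads, x0, mu, n_epochs=10, s=2.0):
            """grads: list of per-term gradient functions for f(x) = (1/n) * sum_i f_i(x)."""
            n = len(grads)
            phi = np.tile(x0, (n, 1))                        # stored point per term
            g = np.array([grads[i](x0) for i in range(n)])   # stored gradient per term
            rng = np.random.default_rng(0)
            for _ in range(n_epochs):
                for i in rng.permutation(n):                 # sampling without replacement
                    w = phi.mean(axis=0) - g.mean(axis=0) / (s * mu)
                    phi[i] = w                               # refresh one table entry
                    g[i] = grads[i](w)
            return phi.mean(axis=0)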

    Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

    Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and "kernel herding" was shown to be a special case of this procedure). In this paper, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. By optimizing the position of the particles, we can obtain better accuracy than random or quasi-Monte Carlo sampling. In applications where the evaluation of the emission probabilities is expensive (such as in robot localization), the additional computational cost to generate the particles through optimization can be justified. Experiments on standard synthetic examples as well as on a robot localization task indeed indicate an improvement in accuracy over random and quasi-Monte Carlo sampling.
    Comment: In the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2015, San Diego, United States; JMLR Workshop and Conference Proceedings, vol. 38.
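
    A toy sketch of kernel herding viewed as greedy Frank-Wolfe steps over a finite candidate set: each new point maximizes the target mean embedding minus the average kernel evaluation against the points already chosen. The Gaussian kernel and the Monte Carlo estimate of the mean embedding are assumptions, and the paper's sequential particle-filter variant is more involved than this static version.

        import numpy as np

        def gaussian_kernel(a, b, sigma=1.0):
            d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * sigma**2))

        def herd(candidates, mu_p, n_points):
            """Greedy selection: x_{t+1} = argmax_x mu_p(x) - (1/(t+1)) * sum_i k(x_i, x)."""
            K = gaussian_kernel(candidates, candidates)
            chosen = []
            running = np.zeros(len(candidates))   # sum_i k(x_i, x) over chosen points
            for t in range(n_points):
                scores = mu_p - running / (t + 1)
                j = int(np.argmax(scores))
                chosen.append(j)
                running += K[j]
            return candidates[chosen]

        rng = np.random.default_rng(0)
        cands = rng.normal(size=(500, 2))                    # candidates drawn from the target p
        mu_p = gaussian_kernel(cands, cands).mean(axis=1)    # Monte Carlo mean embedding
        points = herd(cands, mu_p, n_points=20)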

    Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

    Sampling from Gibbs distributions p(x) ∝ exp(−V(x)/ε) and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. However, while efficient algorithms are known for convex potentials V, the situation is much more difficult in the non-convex case, where algorithms necessarily suffer from the curse of dimensionality in the worst case. For optimization, which can be seen as a low-temperature limit of sampling, it is known that smooth functions V allow faster convergence rates. Specifically, for m-times differentiable functions in d dimensions, the optimal rate for algorithms with n function evaluations is known to be O(n^{-m/d}), where the constant can potentially depend on m, d, and the function to be optimized. Hence, the curse of dimensionality can be alleviated for smooth functions at least in terms of the convergence rate. Recently, it has been shown that similarly fast rates can also be achieved with polynomial runtime O(n^{3.5}), where the exponent 3.5 is independent of m or d. Hence, it is natural to ask whether similar rates for sampling and log-partition computation are possible, and whether they can be realized in polynomial time with an exponent independent of m and d. We show that the optimal rates for sampling and log-partition computation are sometimes equal and sometimes faster than for optimization. We then analyze various polynomial-time sampling algorithms, including an extension of a recent promising optimization approach, and find that they sometimes exhibit interesting behavior but no near-optimal rates. Our results also give further insights on the relation between sampling, log-partition, and optimization problems.
    Comment: Changes in v2: minor corrections and formatting changes. Plots can be reproduced using the code at https://github.com/dholzmueller/sampling_experiment
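
    A small self-contained illustration (not from the paper) of the low-temperature limit mentioned in the abstract: for a one-dimensional non-convex potential, ε log Z_ε with Z_ε = ∫ exp(−V(x)/ε) dx, here estimated by grid quadrature, approaches −min V as ε → 0, which is the sense in which optimization is a limit of sampling and log-partition computation. The specific potential and grid are arbitrary assumptions.

        import numpy as np

        def V(x):
            # Non-convex double-well potential (an arbitrary illustrative choice).
            return (x**2 - 1) ** 2 + 0.3 * x

        xs = np.linspace(-3, 3, 20001)
        dx = xs[1] - xs[0]
        for eps in [1.0, 0.3, 0.1, 0.03, 0.01]:
            # Grid-quadrature estimate of the log-partition function log Z_eps.
            log_Z = np.log(np.sum(np.exp(-V(xs) / eps)) * dx)
            print(f"eps={eps:5.2f}  eps*log_Z={eps * log_Z: .4f}  -min V={-V(xs).min(): .4f}")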