Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
We consider the stochastic approximation problem where a convex function has
to be minimized, given only the knowledge of unbiased estimates of its
gradients at certain points, a framework which includes machine learning
methods based on the minimization of the empirical risk. We focus on problems
without strong convexity, for which all previously known algorithms achieve a
convergence rate for function values of O(1/n^{1/2}). We consider and analyze
two algorithms that achieve a rate of O(1/n) for classical supervised learning
problems. For least-squares regression, we show that averaged stochastic
gradient descent with constant step-size achieves the desired rate. For
logistic regression, this is achieved by a simple novel stochastic gradient
algorithm that (a) constructs successive local quadratic approximations of the
loss functions, while (b) preserving the same running time complexity as
stochastic gradient descent. For these algorithms, we provide a non-asymptotic
analysis of the generalization error (in expectation, and also in high
probability for least-squares), and run extensive experiments on standard
machine learning benchmarks showing that they often outperform existing
approaches.
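For the least-squares case, the averaged constant-step-size scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the streaming single-pass loop, and the step-size value in the usage note are all assumptions made for the sketch.

```python
import numpy as np

def averaged_sgd_least_squares(X, y, step):
    """Constant-step-size SGD with Polyak-Ruppert averaging for least squares.

    Processes one (x_t, y_t) pair per iteration, so the running time matches
    plain stochastic gradient descent; only the averaged iterate is returned.
    """
    n, d = X.shape
    w = np.zeros(d)       # current SGD iterate
    w_bar = np.zeros(d)   # running average of the iterates
    for t in range(n):
        x_t, y_t = X[t], y[t]
        grad = (x_t @ w - y_t) * x_t      # unbiased gradient of 0.5*(x.w - y)^2
        w -= step * grad                  # constant step size, no decay
        w_bar += (w - w_bar) / (t + 1)    # online average: w_bar_t = mean(w_1..w_t)
    return w_bar
```

On synthetic data `y = X @ w_true + noise`, a single pass with a hand-picked constant step (e.g. `step=0.05` for unit-variance features) recovers `w_true` closely; the averaging is what removes the oscillation that a constant step would otherwise leave.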
On the ergodicity properties of some adaptive MCMC algorithms
In this paper we study the ergodicity properties of some adaptive Markov
chain Monte Carlo algorithms (MCMC) that have been recently proposed in the
literature. We prove that under a set of verifiable conditions, ergodic
averages calculated from the output of a so-called adaptive MCMC sampler
converge to the required value and can even, under more stringent assumptions,
satisfy a central limit theorem. We prove that the conditions required are
satisfied for the independent Metropolis--Hastings algorithm and the random
walk Metropolis algorithm with symmetric increments. Finally, we propose an
application of these results to the case where the proposal distribution of the
Metropolis--Hastings update is a mixture of distributions from a curved
exponential family.

Comment: Published at http://dx.doi.org/10.1214/105051606000000286 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
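A common concrete instance of such an adaptive sampler is a random-walk Metropolis algorithm whose proposal scale is tuned on the fly toward a target acceptance rate. The sketch below is illustrative rather than the paper's construction: the function name, the 0.234 acceptance target, and the particular diminishing adaptation step are assumptions; the diminishing step is included because conditions of that kind are what ergodicity results for adaptive MCMC typically require.

```python
import numpy as np

def adaptive_rwm(logpi, x0, n_iter, seed=0):
    """Random-walk Metropolis with on-line adaptation of the proposal scale.

    After each step, the log proposal scale is nudged toward an acceptance
    rate of 0.234, with an adaptation step that shrinks like t^{-0.6} so the
    adaptation diminishes over time.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    log_sigma = 0.0                       # log of the proposal standard deviation
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + np.exp(log_sigma) * rng.standard_normal(x.size)
        log_alpha = min(0.0, logpi(prop) - logpi(x))   # Metropolis log-acceptance
        if np.log(rng.random()) < log_alpha:
            x = prop
        # Robbins-Monro update of the scale; exp(log_alpha) is the acceptance prob.
        log_sigma += (np.exp(log_alpha) - 0.234) / (t + 1) ** 0.6
        chain[t] = x
    return chain
```

Running this on a standard Gaussian target (`logpi = lambda x: -0.5 * float(x @ x)`) and discarding a burn-in, the ergodic averages of the chain settle near the target's moments, which is exactly the kind of convergence the paper's conditions are designed to guarantee.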
High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm
We consider in this paper the problem of sampling a high-dimensional probability distribution having a density with respect to the Lebesgue measure on R^d, known up to a normalization constant. Such a problem arises naturally, for example, in Bayesian inference and machine learning. Under the assumption that the potential (the negative log-density) is continuously differentiable, has a globally Lipschitz gradient, and is strongly convex, we obtain non-asymptotic bounds, in both Wasserstein and total variation distance, for the convergence to stationarity of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both
constant and decreasing step sizes. The dependence on the dimension of the
state space of these bounds is explicit. The convergence of an appropriately
weighted empirical measure is also investigated and bounds for the mean square
error and exponential deviation inequality are reported for functions which are
measurable and bounded. An illustration to Bayesian inference for binary
regression is presented to support our claims.

Comment: Supplementary material available at https://hal.inria.fr/hal-01176084/. arXiv admin note: substantial text overlap with arXiv:1507.0502
Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm
In this paper, we study a method to sample from a target distribution over R^d having a positive density with respect to the Lebesgue measure, known up to a normalisation factor. This method is based on the Euler
discretization of the overdamped Langevin stochastic differential equation
associated with the target distribution. For both constant and decreasing step sizes in the Euler discretization, we obtain non-asymptotic bounds for the convergence to the target distribution in total variation distance. Particular attention is paid to the dependency on the dimension, to demonstrate the applicability of this method in the high-dimensional setting. These bounds improve and extend the results of (Dalalyan 2014).
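The Euler discretization referred to here is the unadjusted Langevin algorithm (ULA): each step follows the drift -grad U plus Gaussian noise scaled to the step size, with no Metropolis correction. A minimal constant-step-size sketch, with names and the toy Gaussian target in the test chosen for illustration only:

```python
import numpy as np

def ula(grad_U, x0, step, n_iter, seed=0):
    """Unadjusted Langevin Algorithm: Euler discretization of
    dX_t = -grad U(X_t) dt + sqrt(2) dB_t, targeting pi proportional to exp(-U).

    No accept/reject step is performed, so the chain targets a distribution
    that is only close to pi, with a bias controlled by the step size.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    out = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        out[k] = x
    return out
```

For a standard Gaussian target, U(x) = ||x||^2 / 2 and grad_U(x) = x; with a small constant step the empirical moments of the chain land close to those of the target, and the residual discretization bias shrinks as the step size decreases, consistent with the constant- versus decreasing-step-size distinction in the abstract.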
Testing for Homogeneity with Kernel Fisher Discriminant Analysis
We propose to investigate test statistics for testing homogeneity in reproducing kernel Hilbert spaces. Their asymptotic distributions under the null hypothesis are derived, and consistency against fixed and local alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial data and a speaker verification task is provided.
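As a rough illustration of the underlying idea, the sketch below computes a regularized Fisher discriminant ratio between two samples using explicit (linear) features; the kernel version replaces these explicit features with evaluations of a kernel in an RKHS. The function name and the regularization value are arbitrary choices for this sketch, not quantities from the paper.

```python
import numpy as np

def fisher_discriminant_stat(X, Y, reg=1e-3):
    """Regularized Fisher discriminant ratio between two samples.

    Measures the squared distance between the sample means, rescaled by the
    (ridge-regularized) pooled within-class covariance. Large values suggest
    the two samples were not drawn from the same distribution.
    """
    n1, n2 = len(X), len(Y)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    # pooled within-class covariance, regularized to guarantee invertibility
    Sw = (np.cov(X.T, bias=True) * n1 + np.cov(Y.T, bias=True) * n2) / (n1 + n2)
    Sw += reg * np.eye(X.shape[1])
    return float(diff @ np.linalg.solve(Sw, diff))
```

On two samples from the same distribution the statistic stays near zero, while a mean shift between the samples inflates it, which is the behaviour a homogeneity test statistic needs before one can calibrate it with its null distribution.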