Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)
We consider the stochastic approximation problem where a convex function has
to be minimized, given only the knowledge of unbiased estimates of its
gradients at certain points, a framework which includes machine learning
methods based on the minimization of the empirical risk. We focus on problems
without strong convexity, for which all previously known algorithms achieve a
convergence rate for function values of O(1/n^{1/2}). We consider and analyze
two algorithms that achieve a rate of O(1/n) for classical supervised learning
problems. For least-squares regression, we show that averaged stochastic
gradient descent with constant step-size achieves the desired rate. For
logistic regression, this is achieved by a simple novel stochastic gradient
algorithm that (a) constructs successive local quadratic approximations of the
loss functions, while (b) preserving the same running time complexity as
stochastic gradient descent. For these algorithms, we provide a non-asymptotic
analysis of the generalization error (in expectation, and also in high
probability for least-squares), and run extensive experiments on standard
machine learning benchmarks showing that they often outperform existing
approaches.
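For the least-squares case, the averaged constant-step-size scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the streaming single-pass loop, and the step-size value in the usage note are all assumptions made for the sketch.

```python
import numpy as np

def averaged_sgd_least_squares(X, y, step):
    """Constant-step-size SGD with Polyak-Ruppert averaging for least squares.

    Processes one (x_t, y_t) pair per iteration, so the running time matches
    plain stochastic gradient descent; only the averaged iterate is returned.
    """
    n, d = X.shape
    w = np.zeros(d)       # current SGD iterate
    w_bar = np.zeros(d)   # running average of the iterates
    for t in range(n):
        x_t, y_t = X[t], y[t]
        grad = (x_t @ w - y_t) * x_t      # unbiased gradient of 0.5*(x.w - y)^2
        w -= step * grad                  # constant step size, no decay
        w_bar += (w - w_bar) / (t + 1)    # online average: w_bar_t = mean(w_1..w_t)
    return w_bar
```

On synthetic data `y = X @ w_true + noise`, a single pass with a hand-picked constant step (e.g. `step=0.05` for unit-variance features) recovers `w_true` closely; the averaging is what removes the oscillation that a constant step would otherwise leave.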
On the ergodicity properties of some adaptive MCMC algorithms
In this paper we study the ergodicity properties of some adaptive Markov
chain Monte Carlo algorithms (MCMC) that have been recently proposed in the
literature. We prove that under a set of verifiable conditions, ergodic
averages calculated from the output of a so-called adaptive MCMC sampler
converge to the required value and can even, under more stringent assumptions,
satisfy a central limit theorem. We prove that the conditions required are
satisfied for the independent Metropolis--Hastings algorithm and the random
walk Metropolis algorithm with symmetric increments. Finally, we propose an
application of these results to the case where the proposal distribution of the
Metropolis--Hastings update is a mixture of distributions from a curved
exponential family.

Comment: Published at http://dx.doi.org/10.1214/105051606000000286 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
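A common concrete instance of such an adaptive sampler is a random-walk Metropolis algorithm whose proposal scale is tuned on the fly toward a target acceptance rate. The sketch below is illustrative rather than the paper's construction: the function name, the 0.234 acceptance target, and the particular diminishing adaptation step are assumptions; the diminishing step is included because conditions of that kind are what ergodicity results for adaptive MCMC typically require.

```python
import numpy as np

def adaptive_rwm(logpi, x0, n_iter, seed=0):
    """Random-walk Metropolis with on-line adaptation of the proposal scale.

    After each step, the log proposal scale is nudged toward an acceptance
    rate of 0.234, with an adaptation step that shrinks like t^{-0.6} so the
    adaptation diminishes over time.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    log_sigma = 0.0                       # log of the proposal standard deviation
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + np.exp(log_sigma) * rng.standard_normal(x.size)
        log_alpha = min(0.0, logpi(prop) - logpi(x))   # Metropolis log-acceptance
        if np.log(rng.random()) < log_alpha:
            x = prop
        # Robbins-Monro update of the scale; exp(log_alpha) is the acceptance prob.
        log_sigma += (np.exp(log_alpha) - 0.234) / (t + 1) ** 0.6
        chain[t] = x
    return chain
```

Running this on a standard Gaussian target (`logpi = lambda x: -0.5 * float(x @ x)`) and discarding a burn-in, the ergodic averages of the chain settle near the target's moments, which is exactly the kind of convergence the paper's conditions are designed to guarantee.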
High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm
We consider in this paper the problem of sampling a high-dimensional probability distribution having a density with respect to the Lebesgue measure on R^d, known up to a normalization constant. Such a problem arises naturally, for example, in Bayesian inference and machine learning. Under the assumption that the potential (the negative log-density) is continuously differentiable, has a globally Lipschitz gradient, and is strongly convex, we obtain non-asymptotic bounds, in both Wasserstein and total variation distance, for the convergence to stationarity of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both
constant and decreasing step sizes. The dependence on the dimension of the
state space of these bounds is explicit. The convergence of an appropriately
weighted empirical measure is also investigated and bounds for the mean square
error and exponential deviation inequality are reported for functions which are
measurable and bounded. An illustration to Bayesian inference for binary
regression is presented to support our claims.

Comment: Supplementary material available at https://hal.inria.fr/hal-01176084/. arXiv admin note: substantial text overlap with arXiv:1507.0502
Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm
In this paper, we study a method to sample from a target distribution over R^d having a positive density with respect to the Lebesgue measure, known up to a normalisation factor. This method is based on the Euler
discretization of the overdamped Langevin stochastic differential equation
associated with the target distribution. For both constant and decreasing step sizes in the Euler discretization, we obtain non-asymptotic bounds for the convergence to the target distribution in total variation distance. Particular attention is paid to the dependency on the dimension, to demonstrate the applicability of this method in the high-dimensional setting. These bounds improve and extend the results of (Dalalyan 2014).
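The Euler discretization referred to here is the unadjusted Langevin algorithm (ULA): each step follows the drift -grad U plus Gaussian noise scaled to the step size, with no Metropolis correction. A minimal constant-step-size sketch, with names and the toy Gaussian target in the test chosen for illustration only:

```python
import numpy as np

def ula(grad_U, x0, step, n_iter, seed=0):
    """Unadjusted Langevin Algorithm: Euler discretization of
    dX_t = -grad U(X_t) dt + sqrt(2) dB_t, targeting pi proportional to exp(-U).

    No accept/reject step is performed, so the chain targets a distribution
    that is only close to pi, with a bias controlled by the step size.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    out = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        out[k] = x
    return out
```

For a standard Gaussian target, U(x) = ||x||^2 / 2 and grad_U(x) = x; with a small constant step the empirical moments of the chain land close to those of the target, and the residual discretization bias shrinks as the step size decreases, consistent with the constant- versus decreasing-step-size distinction in the abstract.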
Testing for Homogeneity with Kernel Fisher Discriminant Analysis
We propose to investigate test statistics for testing homogeneity in reproducing kernel Hilbert spaces. Their asymptotic distributions under the null hypothesis are derived, and consistency against fixed and local alternatives is assessed. Finally, experimental evidence of the performance of the proposed approach on both artificial data and a speaker verification task is provided.
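As a rough illustration of the underlying idea, the sketch below computes a regularized Fisher discriminant ratio between two samples using explicit (linear) features; the kernel version replaces these explicit features with evaluations of a kernel in an RKHS. The function name and the regularization value are arbitrary choices for this sketch, not quantities from the paper.

```python
import numpy as np

def fisher_discriminant_stat(X, Y, reg=1e-3):
    """Regularized Fisher discriminant ratio between two samples.

    Measures the squared distance between the sample means, rescaled by the
    (ridge-regularized) pooled within-class covariance. Large values suggest
    the two samples were not drawn from the same distribution.
    """
    n1, n2 = len(X), len(Y)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    # pooled within-class covariance, regularized to guarantee invertibility
    Sw = (np.cov(X.T, bias=True) * n1 + np.cov(Y.T, bias=True) * n2) / (n1 + n2)
    Sw += reg * np.eye(X.shape[1])
    return float(diff @ np.linalg.solve(Sw, diff))
```

On two samples from the same distribution the statistic stays near zero, while a mean shift between the samples inflates it, which is the behaviour a homogeneity test statistic needs before one can calibrate it with its null distribution.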