Stochastic Gradient Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for
defining distant proposals with high acceptance probabilities in a
Metropolis-Hastings framework, enabling more efficient exploration of the state
space than standard random-walk proposals. The popularity of such methods has
grown significantly in recent years. However, a limitation of HMC methods is
the required gradient computation for simulation of the Hamiltonian dynamical
system; such computation is infeasible in problems involving a large sample size
or streaming data. Instead, we must rely on a noisy gradient estimate computed
from a subset of the data. In this paper, we explore the properties of such a
stochastic gradient HMC approach. Surprisingly, the natural implementation of
the stochastic approximation can be arbitrarily bad. To address this problem we
introduce a variant that uses second-order Langevin dynamics with a friction
term that counteracts the effects of the noisy gradient, maintaining the
desired target distribution as the invariant distribution. Results on simulated
data validate our theory. We also provide an application of our methods to a
classification task using neural networks and to online Bayesian matrix
factorization.
Comment: ICML 2014 version
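The friction-corrected update described in this abstract can be sketched in a few lines. The sketch assumes the momentum-style parameterisation Δθ = v, Δv = −η∇Ũ(θ) − αv + N(0, 2(α − β̂)η), where η is a learning rate, α a friction coefficient and β̂ an estimate of the gradient-noise term. The function and parameter names are illustrative assumptions. The toy target is also an assumption, not one of the paper's experiments: a standard normal whose gradient is corrupted by artificial Gaussian noise standing in for minibatch noise.

```python
import numpy as np


def sghmc(grad_u_hat, theta0, n_steps, eta=0.01, alpha=0.1, beta_hat=0.0, rng=None):
    """Illustrative SGHMC sampler (momentum-style parameterisation).

    grad_u_hat: callable returning a noisy estimate of grad U(theta),
        where U(theta) = -log p(theta | data), e.g. computed on a minibatch.
    eta: learning rate (step size squared over mass).
    alpha: friction coefficient applied to the momentum v.
    beta_hat: estimate of the gradient-noise term; 0 ignores it.
    """
    if beta_hat > alpha:
        raise ValueError("beta_hat must not exceed alpha")
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)
    noise_std = np.sqrt(2.0 * (alpha - beta_hat) * eta)
    samples = np.empty((n_steps,) + theta.shape)
    for t in range(n_steps):
        # The friction term (-alpha * v) removes the energy the noisy gradient
        # injects; the injected Gaussian noise keeps the target invariant.
        v = v - eta * grad_u_hat(theta) - alpha * v + rng.normal(0.0, noise_std, size=theta.shape)
        theta = theta + v
        samples[t] = theta
    return samples


# Usage on a toy problem: standard normal target, U(theta) = theta**2 / 2.
rng = np.random.default_rng(0)


def noisy_grad(theta):
    # Exact gradient (theta) plus artificial noise mimicking minibatch noise.
    return theta + rng.normal(0.0, 1.0, size=theta.shape)


draws = sghmc(noisy_grad, theta0=[0.0], n_steps=50_000, rng=rng)[5_000:, 0]
print(draws.mean(), draws.var())
```

After discarding burn-in, the printed mean and variance should land near the target's 0 and 1. With `beta_hat = 0` the sketch ignores the gradient-noise correction, so it is only approximately correct, with an error that shrinks as `eta` becomes small relative to `alpha`. Setting `alpha = 0` removes the friction entirely, and the energy injected by the noisy gradient then accumulates. That is the failure mode the friction term is meant to counteract.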
Trajectory-Based Off-Policy Deep Reinforcement Learning
Policy gradient methods are powerful reinforcement learning algorithms and
have been demonstrated to solve many complex tasks. However, these methods are
also data-inefficient, afflicted with high-variance gradient estimates, and
frequently get stuck in local optima. This work addresses these weaknesses by
combining recent improvements in the reuse of off-policy data and exploration
in parameter space with deterministic behavioral policies. The resulting
objective is amenable to standard neural network optimization strategies like
stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo.
Incorporation of previous rollouts via importance sampling greatly improves
data-efficiency, whilst stochastic optimization schemes facilitate the escape
from local optima. We evaluate the proposed approach on a series of continuous
control benchmark tasks. The results show that the proposed algorithm is able
to successfully and reliably learn solutions using fewer system interactions
than standard policy gradient methods.
Comment: Includes appendix. Accepted for ICML 201
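Reusing earlier rollouts through importance sampling can be illustrated with a generic estimator. The sketch below assumes Gaussian parameter-space exploration: policy parameters θ are drawn from N(μ, σ²I) and executed by a deterministic policy. Every stored rollout from earlier search distributions is reweighted with a self-normalised importance weight whose proposal is the mixture of all past search distributions. This is a standard parameter-based importance sampling estimator, not the paper's trajectory-based objective. The function names, the toy return and all numbers are hypothetical.

```python
import numpy as np


def log_gauss(theta, mu, sigma):
    """Row-wise log density of the isotropic Gaussian N(mu, sigma^2 I)."""
    d = theta.shape[-1]
    sq = np.sum((theta - mu) ** 2, axis=-1)
    return -0.5 * sq / sigma**2 - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi)


def reused_return_estimate(mu, sigma, thetas, returns, behavior_mus):
    """Hypothetical self-normalised IS estimate of J(mu) = E_{theta ~ N(mu, sigma^2 I)}[R(theta)].

    thetas[i] was drawn from N(behavior_mus[i], sigma^2 I). The proposal is the
    mixture over all samples' behaviour distributions (multiple importance
    sampling), so each past distribution is weighted by how many samples it produced.
    """
    # log q(theta_i) = log( (1/N) * sum_j N(theta_i; behavior_mus[j], sigma^2 I) )
    log_components = np.stack([log_gauss(thetas, m, sigma) for m in behavior_mus])
    log_q = np.logaddexp.reduce(log_components, axis=0) - np.log(len(behavior_mus))
    log_w = log_gauss(thetas, mu, sigma) - log_q
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()  # self-normalise the weights
    return float(np.dot(w, returns))


# Toy usage: a deterministic "rollout" whose return depends only on theta.
rng = np.random.default_rng(1)
sigma, target = 0.5, np.array([1.0, 1.0])


def rollout_return(theta):
    return -np.sum((theta - target) ** 2, axis=-1)


past = [np.zeros(2), np.full(2, 0.25), np.full(2, 0.5)]  # earlier search means
thetas = np.concatenate([rng.normal(m, sigma, size=(200, 2)) for m in past])
behavior_mus = np.repeat(np.stack(past), 200, axis=0)  # one behaviour mean per sample
returns = rollout_return(thetas)

mu_new = np.full(2, 0.5)
estimate = reused_return_estimate(mu_new, sigma, thetas, returns, behavior_mus)
exact = -(np.sum((mu_new - target) ** 2) + 2 * sigma**2)  # closed form for this toy return
print(estimate, exact)
```

The proposal is the equal-count mixture of all behaviour distributions rather than each rollout's own behaviour density. When the new search distribution coincides with one of the K past ones, this choice keeps every weight at most K, which tames the variance that would otherwise make reusing old rollouts counterproductive.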