Stochastic Gradient Hamiltonian Monte Carlo

Author: Carlos Guestrin
Emily B. Fox
Tianqi Chen
Publication date: 12/05/2014

Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system; such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem, we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.

Comment: ICML 2014 version
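The friction-corrected update described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' code: the function names, the unit mass, the step size `eps`, the friction value, the noise estimate `noise_est`, and the toy target (a standard normal, so the exact gradient of the potential is `theta`) are all assumptions chosen for illustration. Gaussian noise added to the gradient stands in for minibatch subsampling.

```python
import math
import random


def sghmc(noisy_grad, theta0, n_steps, eps=0.05, friction=1.0,
          noise_est=0.0, seed=0):
    """Sketch of stochastic gradient HMC with a friction term (unit mass).

    noisy_grad(theta, rng) returns a noisy estimate of the gradient of the
    potential U(theta) = -log p(theta). All parameter names are hypothetical.
    """
    if noise_est > friction:
        raise ValueError("friction must be at least the noise estimate")
    rng = random.Random(seed)
    theta, r = theta0, 0.0
    # Injected noise has variance 2 * (friction - noise_est) * eps.
    inject_std = math.sqrt(2.0 * (friction - noise_est) * eps)
    samples = []
    for _ in range(n_steps):
        # Position update driven by the momentum.
        theta += eps * r
        # Momentum update: noisy gradient step, friction, injected noise.
        r += (-eps * noisy_grad(theta, rng)
              - eps * friction * r
              + rng.gauss(0.0, inject_std))
        samples.append(theta)
    return samples


def noisy_grad_std_normal(theta, rng, noise_std=1.0):
    """Gradient of U(theta) = theta**2 / 2 plus assumed Gaussian noise."""
    return theta + rng.gauss(0.0, noise_std)


# Draw samples from the toy target and discard a short burn-in.
samples = sghmc(noisy_grad_std_normal, theta0=0.0, n_steps=40000)[2000:]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

With a small step size, the sample mean and variance should land near 0 and 1, the moments of the toy target. Setting `noise_est` to an estimate of the gradient-noise contribution reduces the injected noise accordingly; leaving it at zero, as here, slightly inflates the sampled variance.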

arXiv.org e-Print Archive

CiteSeerX

Rainbow: Combining Improvements in Deep Reinforcement Learning

Author: Mohammad Azar
Will Dabney
Matteo Hessel
Dan Horgan
Joseph Modayil
Georg Ostrovski
Bilal Piot
Tom Schaul
David Silver
Hado van Hasselt
Publication date: 06/10/2017

The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.

Comment: Under review as a conference paper at AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications