2 research outputs found

Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

Authors: Luo Rui; Wang Jianhong; Wang Jun; Yang Yaodong; Zhu Zhanxing
Publication date: 01/01/2018

We propose a new sampling method, the thermostat-assisted continuously-tempered Hamiltonian Monte Carlo, for Bayesian learning on large datasets and multimodal distributions. It simulates the Nosé-Hoover dynamics of a continuously-tempered Hamiltonian system built on the distribution of interest. A significant advantage of this method is that it is not only able to efficiently draw representative i.i.d. samples when the distribution contains multiple isolated modes, but is also capable of adaptively neutralising the noise arising from mini-batches while maintaining accurate sampling. While the properties of this method have been studied using synthetic distributions, experiments on three real datasets also demonstrated a performance gain over several strong baselines with various types of neural networks plugged in.

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

UCL Discovery

A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning

Authors: Abdulsamad Hany; Belousov Boris; D'Eramo Carlo; Klink Pascal; Pajarinen Joni; Peters Jan
Publication date: 01/07/2021

Across machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives. For reinforcement learning (RL), curricula are especially interesting, as the underlying optimization has a strong tendency to get stuck in local optima due to the exploration-exploitation trade-off. Recently, a number of approaches for the automatic generation of curricula for RL have been shown to increase performance while requiring less expert knowledge than manually designed curricula. However, these approaches are seldom investigated from a theoretical perspective, which prevents a deeper understanding of their mechanics. In this paper, we present an approach for automated curriculum generation in RL with a clear theoretical underpinning. More precisely, we formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks, which trades off between task complexity and the objective to match a desired task distribution. Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms in different tasks with uninformative rewards and challenging exploration requirements.

arXiv.org e-Print Archive

Aaltodoc Publication Archive