Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning
We propose a new sampling method, the thermostat-assisted
continuously-tempered Hamiltonian Monte Carlo, for Bayesian learning on large
datasets and multimodal distributions. It simulates the Nosé-Hoover dynamics
of a continuously-tempered Hamiltonian system built on the distribution of
interest. A significant advantage of this method is that it is not only able to
efficiently draw representative i.i.d. samples when the distribution contains
multiple isolated modes, but also capable of adaptively neutralising the noise
arising from mini-batches and maintaining accurate sampling. While the
properties of this method have been studied using synthetic distributions,
experiments on three real datasets also demonstrate performance gains
over several strong baselines with various types of neural networks plugged in.
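For intuition about the Nosé-Hoover mechanism the abstract describes, here is a minimal Python sketch of a thermostatted sampler. The function name, step size dt, and thermostat mass Q are illustrative choices rather than the paper's implementation, and the continuous-tempering component is omitted.

```python
import numpy as np

def nose_hoover_sample(grad_u, theta0, n_steps=10_000, dt=1e-2, Q=1.0, rng=None):
    """Sketch of Nose-Hoover dynamics targeting exp(-U(theta)) at unit temperature.

    grad_u(theta) may be a noisy mini-batch estimate of the gradient of U; the
    thermostat variable xi acts as an adaptive friction that grows when the
    kinetic energy exceeds its target and shrinks otherwise, which is how the
    dynamics can absorb gradient noise. The continuous-tempering variable from
    the paper is left out for brevity.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = theta0.size
    theta = theta0.astype(float).copy()
    p = rng.standard_normal(d)  # momentum
    xi = 0.0                    # thermostat (adaptive friction)
    samples = np.empty((n_steps, d))
    for t in range(n_steps):
        p -= dt * (grad_u(theta) + xi * p)  # force plus thermostat friction
        theta += dt * p
        xi += dt * (p @ p - d) / Q          # pushes mean kinetic energy toward d/2
        samples[t] = theta
    return samples

# Example: draw from a standard 2-D Gaussian, where grad U(theta) = theta.
draws = nose_hoover_sample(lambda th: th, np.zeros(2))
```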
A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning
Across machine learning, the use of curricula has shown strong empirical
potential to improve learning from data by avoiding local optima of training
objectives. For reinforcement learning (RL), curricula are especially
interesting, as the underlying optimization has a strong tendency to get stuck
in local optima due to the exploration-exploitation trade-off. Recently, a
number of approaches for the automatic generation of curricula for RL have been
shown to increase performance while requiring less expert knowledge than
manually designed curricula. However, these approaches are seldom
investigated from a theoretical perspective, preventing a deeper understanding
of their mechanics. In this paper, we present an approach for automated
curriculum generation in RL with a clear theoretical underpinning. More
precisely, we formalize the well-known self-paced learning paradigm as inducing
a distribution over training tasks that trades off task complexity against the
objective of matching a desired task distribution. Experiments show that
training on this induced distribution helps to avoid poor local optima across
RL algorithms in different tasks with uninformative rewards and challenging
exploration requirements.
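To make the trade-off concrete, here is a minimal Python sketch assuming the common KL-regularized formalization over a discrete task set; the function name and arguments are hypothetical, not the paper's API, and the paper's continuous-task treatment and scheduling differ from this simplification.

```python
import numpy as np

def induced_curriculum(returns, target_probs, alpha):
    """Sketch of the KL-regularized trade-off behind self-paced curricula.

    For a discrete set of tasks c, maximizing E_p[J(c)] - alpha * KL(p || mu)
    over distributions p has the closed-form solution
        p(c) proportional to mu(c) * exp(J(c) / alpha),
    where J(c) is the current policy's return on task c and mu is the desired
    task distribution. Small alpha concentrates p on tasks the policy already
    solves (an easy curriculum); large alpha pulls p toward mu.
    """
    logits = np.log(target_probs) + np.asarray(returns) / alpha
    logits -= logits.max()            # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Example: three tasks with a uniform desired distribution; at small alpha the
# curriculum favours the task where the current policy earns the highest return.
p = induced_curriculum(returns=[1.0, 0.2, -0.5], target_probs=np.ones(3) / 3, alpha=0.5)
```

One plausible use of such a rule is to anneal alpha upward during training so the induced curriculum gradually shifts from easy tasks toward the desired task distribution.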