31,753 research outputs found
On the construction of probabilistic Newton-type algorithms
It has recently been shown that many of the existing quasi-Newton algorithms
can be formulated as learning algorithms, capable of learning local models of
the cost functions. Importantly, this understanding allows us to safely start
assembling probabilistic Newton-type algorithms, applicable in situations where
we only have access to noisy observations of the cost function and its
derivatives. This is where our interest lies.
We make contributions to the use of the non-parametric and probabilistic
Gaussian process models in solving these stochastic optimisation problems.
Specifically, we present a new algorithm that unites these approximations
together with recent probabilistic line search routines to deliver a
probabilistic quasi-Newton approach.
We also show that the probabilistic optimisation algorithms deliver promising
results on challenging nonlinear system identification problems where the very
nature of the problem is such that we can only access the cost function and its
derivative via noisy observations, since there are no closed-form expressions
available
Trajectory-Based Off-Policy Deep Reinforcement Learning
Policy gradient methods are powerful reinforcement learning algorithms and
have been demonstrated to solve many complex tasks. However, these methods are
also data-inefficient, afflicted with high variance gradient estimates, and
frequently get stuck in local optima. This work addresses these weaknesses by
combining recent improvements in the reuse of off-policy data and exploration
in parameter space with deterministic behavioral policies. The resulting
objective is amenable to standard neural network optimization strategies like
stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo.
Incorporation of previous rollouts via importance sampling greatly improves
data-efficiency, whilst stochastic optimization schemes facilitate the escape
from local optima. We evaluate the proposed approach on a series of continuous
control benchmark tasks. The results show that the proposed algorithm is able
to successfully and reliably learn solutions using fewer system interactions
than standard policy gradient methods.Comment: Includes appendix. Accepted for ICML 201
- …