31,753 research outputs found

    On the construction of probabilistic Newton-type algorithms

    Full text link
    It has recently been shown that many of the existing quasi-Newton algorithms can be formulated as learning algorithms, capable of learning local models of the cost functions. Importantly, this understanding allows us to safely start assembling probabilistic Newton-type algorithms, applicable in situations where we only have access to noisy observations of the cost function and its derivatives. This is where our interest lies. We make contributions to the use of the non-parametric and probabilistic Gaussian process models in solving these stochastic optimisation problems. Specifically, we present a new algorithm that unites these approximations together with recent probabilistic line search routines to deliver a probabilistic quasi-Newton approach. We also show that the probabilistic optimisation algorithms deliver promising results on challenging nonlinear system identification problems where the very nature of the problem is such that we can only access the cost function and its derivative via noisy observations, since there are no closed-form expressions available

    Trajectory-Based Off-Policy Deep Reinforcement Learning

    Full text link
    Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.Comment: Includes appendix. Accepted for ICML 201
    • …
    corecore