Analytic Long-Term Forecasting with Periodic Gaussian Processes
Gaussian processes are a state-of-the-art method for learning models from data. Data with an underlying periodic structure appears in many areas, e.g., in climatology or robotics. It is often important to predict the long-term evolution of such a time series, and to take the inherent periodicity explicitly into account. In a Gaussian process, periodicity can be accounted for by an appropriate kernel choice. However, the standard periodic kernel does not allow for analytic long-term forecasting. To address this shortcoming, we re-parametrize the periodic kernel, which, in combination with a double approximation, allows for analytic long-term forecasting of a periodic state evolution with Gaussian processes. Our model allows for probabilistic long-term forecasting of periodic processes, which can be valuable in Bayesian decision making, optimal control, reinforcement learning, and robotics.
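For reference, the standard periodic kernel can be written k(x, x') = s^2 exp(-2 sin^2(pi |x - x'| / p) / l^2), with variance s^2, lengthscale l, and period p. A well-known identity rewrites it as a squared-exponential kernel on the warped input u(x) = (sin(2 pi x / p), cos(2 pi x / p)), because ||u(x) - u(x')||^2 = 4 sin^2(pi (x - x') / p). The Python sketch below checks numerically that the two forms agree. The function names and hyperparameter values are illustrative assumptions, and this is not claimed to be the exact re-parametrization or the double approximation used in the paper.

```python
import math

def periodic_kernel(x1, x2, variance=1.0, lengthscale=1.0, period=1.0):
    """Standard periodic (exp-sine-squared) kernel on scalar inputs."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return variance * math.exp(-2.0 * s * s / lengthscale ** 2)

def warp(x, period=1.0):
    """Map a scalar onto the unit circle: u(x) = (sin(2*pi*x/p), cos(2*pi*x/p))."""
    angle = 2.0 * math.pi * x / period
    return (math.sin(angle), math.cos(angle))

def se_kernel_on_circle(x1, x2, variance=1.0, lengthscale=1.0, period=1.0):
    """Squared-exponential kernel applied to the warped inputs u(x1), u(x2)."""
    u1, u2 = warp(x1, period), warp(x2, period)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u1, u2))
    return variance * math.exp(-sq_dist / (2.0 * lengthscale ** 2))

# The two forms agree because ||u(x) - u(x')||^2 = 4 sin^2(pi (x - x') / p).
for x1, x2 in [(0.0, 0.3), (1.2, -0.7), (2.5, 2.5)]:
    a = periodic_kernel(x1, x2, lengthscale=0.8, period=2.0)
    b = se_kernel_on_circle(x1, x2, lengthscale=0.8, period=2.0)
    print(f"{a:.6f} {b:.6f}")
```

Writing the kernel on the warped inputs turns a periodic kernel into a squared-exponential one on a 2-D input, a form for which many Gaussian-process computations are already available.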
Safe Trajectory Sampling in Model-Based Reinforcement Learning
Model-based reinforcement learning aims to learn a policy to solve a target task by leveraging a learned dynamics model. This approach, paired with principled handling of uncertainty, allows for data-efficient policy learning in robotics. However, the physical environment has feasibility and safety constraints that need to be incorporated into the policy before it is safe to execute on a real robot. In this work, we study how to enforce the aforementioned constraints in the context of model-based reinforcement learning with probabilistic dynamics models. In particular, we investigate how trajectories sampled from the learned dynamics model can be used on a real robot, while fulfilling user-specified safety requirements. We present a model-based reinforcement learning approach using Gaussian processes where safety constraints are taken into account without simplifying Gaussian assumptions on the predictive state distributions. We evaluate the proposed approach on different continuous control tasks with varying complexity and demonstrate how our safe trajectory-sampling approach can be directly used on a real robot without violating safety constraints.
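A minimal sketch of the general idea of checking safety constraints on sampled trajectories, rather than on a Gaussian approximation of the predictive state distribution. It assumes a hypothetical toy 1-D linear model with additive Gaussian noise in place of a learned Gaussian process, a linear feedback policy u = -k x, and a box constraint |x| <= x_max. It estimates by Monte Carlo the probability that a whole trajectory stays within the constraint and accepts the policy only if that probability is at least 1 - delta. All names and numbers are illustrative; this is not the paper's algorithm.

```python
import random

def sample_trajectory(x0, policy_gain, horizon, noise_std, rng):
    """Roll out a toy 1-D stochastic model x_{t+1} = x_t + u_t + eps with u_t = -k * x_t."""
    xs = [x0]
    x = x0
    for _ in range(horizon):
        u = -policy_gain * x
        x = x + u + rng.gauss(0.0, noise_std)
        xs.append(x)
    return xs

def estimate_safety(x0, policy_gain, horizon=20, noise_std=0.1,
                    x_max=1.0, n_samples=2000, seed=0):
    """Fraction of sampled trajectories whose states all stay in [-x_max, x_max]."""
    rng = random.Random(seed)
    safe = 0
    for _ in range(n_samples):
        traj = sample_trajectory(x0, policy_gain, horizon, noise_std, rng)
        if all(abs(x) <= x_max for x in traj):
            safe += 1
    return safe / n_samples

def is_policy_safe(x0, policy_gain, delta=0.05, **kwargs):
    """Accept the policy only if the estimated constraint-satisfaction probability is >= 1 - delta."""
    return estimate_safety(x0, policy_gain, **kwargs) >= 1.0 - delta

print(estimate_safety(0.5, policy_gain=0.5))   # damped closed loop: x_{t+1} = 0.5 x_t + eps
print(estimate_safety(0.5, policy_gain=-0.2))  # unstable closed loop: x_{t+1} = 1.2 x_t + eps
```

The damped gain should pass the check, while the destabilizing gain should fail it. In a real setting, the rollouts would come from the learned probabilistic dynamics model instead of this toy model.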
Dynamics of Temporal Difference Reinforcement Learning
Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study the typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages, and we validate our assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools that open a new direction toward developing a theory of learning dynamics in reinforcement learning.
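The learning rule analyzed above is semi-gradient TD(0) with a linear value function v(s) = w . phi(s): the bootstrapped target r + gamma v(s') is treated as a constant, so only v(s) is differentiated. A minimal Python sketch, assuming a hypothetical 5-state random walk (reward 1 on exiting to the right, gamma = 1) and features phi(s) = [1, s], under which the true values (s + 1)/6 are exactly representable. It only illustrates the update rule and the recorded value-error curve; it is not the paper's statistical-physics theory.

```python
import random

N_STATES = 5   # non-terminal states 0..4; -1 and 5 are terminal (hypothetical toy MDP)
GAMMA = 1.0

def features(s):
    """Linear features phi(s) = [1, s] (assumed for illustration)."""
    return [1.0, float(s)]

def value(w, s):
    if s < 0 or s >= N_STATES:
        return 0.0  # terminal states have value 0
    return sum(wi * fi for wi, fi in zip(w, features(s)))

def true_values():
    # Symmetric random walk, reward 1 only on exiting right: v(s) = (s + 1) / (N + 1)
    return [(s + 1) / (N_STATES + 1) for s in range(N_STATES)]

def value_error(w):
    """Mean squared error between the linear estimate and the true values."""
    return sum((value(w, s) - v) ** 2 for s, v in enumerate(true_values())) / N_STATES

def td0_linear(n_episodes=2000, alpha=0.01, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    errors = []
    for _ in range(n_episodes):
        s = N_STATES // 2  # each episode starts in the middle state
        while 0 <= s < N_STATES:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == N_STATES else 0.0
            td_error = r + GAMMA * value(w, s_next) - value(w, s)
            # Semi-gradient step: gradient of v(s) only, target held fixed
            w = [wi + alpha * td_error * fi for wi, fi in zip(w, features(s))]
            s = s_next
        errors.append(value_error(w))  # one point of the learning curve per episode
    return w, errors

w, errors = td0_linear()
print(w, errors[-1])
```

Varying alpha, GAMMA, or features() changes the learning curve stored in errors. With a constant step size, the sampled episodes keep injecting noise, so the error settles at a noise floor rather than decaying to exactly zero.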
Optimal Reinforcement Learning for Gaussian Systems
The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first-order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finite-dimensional projection gives an impression of how this result may be helpful.
Comment: Final pre-conference version of this NIPS 2011 paper. Once again, please note some nontrivial changes to the exposition and interpretation of the results, in particular in Equation (9) and Eqs. 11-14. The algorithm and results have remained the same, but their theoretical interpretation has changed.
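To make "beliefs are Gaussian processes" concrete, the sketch below performs a standard GP regression update in plain Python: given noisy observations, it returns the posterior mean and variance at a test input. The squared-exponential kernel, fixed hyperparameters, noise level, and sin-generated data are assumptions for illustration; the paper's first-order approximation and its partial differential equation are not reproduced here.

```python
import math

def se_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return variance * math.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)

def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def backward_sub(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, x_star, noise_var=1e-2, **kern):
    """Posterior mean and variance of f(x_star) given noisy observations y at inputs X."""
    n = len(X)
    K = [[se_kernel(X[i], X[j], **kern) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y))  # alpha = K^{-1} y
    k_star = [se_kernel(x, x_star, **kern) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)                  # v = L^{-1} k_star
    var = se_kernel(x_star, x_star, **kern) - sum(vi * vi for vi in v)
    return mean, var

X = [-2.0, -1.0, 0.0, 1.5]
y = [math.sin(x) for x in X]
print(gp_posterior(X, y, 0.5))  # inside the data: small posterior variance
print(gp_posterior(X, y, 6.0))  # far from the data: variance close to the prior variance 1.0
```

The posterior variance is small near the observations and returns to the prior variance far from them; this is the kind of uncertainty an exploration-exploitation trade-off has to reason about.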