1,601 research outputs found

    PILCO: A Model-Based and Data-Efficient Approach to Policy Search

    No full text
    In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks. Copyright 2011 by the author(s)/owner(s)

    State-Space Inference and Learning with Gaussian Processes

    No full text
    State-space inference and learning with Gaussian processes (GPs) is an unsolved problem. We propose a new, general methodology for inference and learning in nonlinear state-space models that are described probabilistically by non-parametric GP models. We apply the expectation maximization algorithm to iterate between inference in the latent state-space and learning the parameters of the underlying GP dynamics model. Copyright 2010 by the authors

    Model-Based Reinforcement Learning with Continuous States and Actions

    No full text
    Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment

    Approximate Dynamic Programming with Gaussian Processes

    Get PDF
    In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and strongly depends on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal statefeedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP is able to yield an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing up, a complex nonlinear control problem

    Multi-view Regularized Gaussian Processes

    Full text link
    Gaussian processes (GPs) have been proven to be powerful tools in various areas of machine learning. However, there are very few applications of GPs in the scenario of multi-view learning. In this paper, we present a new GP model for multi-view learning. Unlike existing methods, it combines multiple views by regularizing marginal likelihood with the consistency among the posterior distributions of latent functions from different views. Moreover, we give a general point selection scheme for multi-view learning and improve the proposed model by this criterion. Experimental results on multiple real world data sets have verified the effectiveness of the proposed model and witnessed the performance improvement through employing this novel point selection scheme

    Gaussian process model based predictive control

    Get PDF
    Gaussian process models provide a probabilistic non-parametric modelling approach for black-box identification of non-linear dynamic systems. The Gaussian processes can highlight areas of the input space where prediction quality is poor, due to the lack of data or its complexity, by indicating the higher variance around the predicted mean. Gaussian process models contain noticeably less coefficients to be optimized. This paper illustrates possible application of Gaussian process models within model-based predictive control. The extra information provided within Gaussian process model is used in predictive control, where optimization of control signal takes the variance information into account. The predictive control principle is demonstrated on control of pH process benchmark

    Suggesting Cooking Recipes Through Simulation and Bayesian Optimization

    Full text link
    Cooking typically involves a plethora of decisions about ingredients and tools that need to be chosen in order to write a good cooking recipe. Cooking can be modelled in an optimization framework, as it involves a search space of ingredients, kitchen tools, cooking times or temperatures. If we model as an objective function the quality of the recipe, several problems arise. No analytical expression can model all the recipes, so no gradients are available. The objective function is subjective, in other words, it contains noise. Moreover, evaluations are expensive both in time and human resources. Bayesian Optimization (BO) emerges as an ideal methodology to tackle problems with these characteristics. In this paper, we propose a methodology to suggest recipe recommendations based on a Machine Learning (ML) model that fits real and simulated data and BO. We provide empirical evidence with two experiments that support the adequacy of the methodology

    Gaussian Process priors with uncertain inputs? Application to multiple-step ahead time series forecasting

    Get PDF
    We consider the problem of multi-step ahead prediction in time series analysis using the non-parametric Gaussian process model. k-step ahead forecasting of a discrete-time non-linear dynamic system can be performed by doing repeated one-step ahead predictions. For a state-space model of the form y t = f(Yt-1 ,..., Yt-L ), the prediction of y at time t + k is based on the point estimates of the previous outputs. In this paper, we show how, using an analytical Gaussian approximation, we can formally incorporate the uncertainty about intermediate regressor values, thus updating the uncertainty on the current prediction

    Robust Filtering and Smoothing with Gaussian Processes

    Full text link
    We propose a principled algorithm for robust Bayesian filtering and smoothing in nonlinear stochastic dynamic systems when both the transition function and the measurement function are described by non-parametric Gaussian process (GP) models. GPs are gaining increasing importance in signal processing, machine learning, robotics, and control for representing unknown system functions by posterior probability distributions. This modern way of "system identification" is more robust than finding point estimates of a parametric function representation. In this article, we present a principled algorithm for robust analytic smoothing in GP dynamic systems, which are increasingly used in robotics and control. Our numerical evaluations demonstrate the robustness of the proposed approach in situations where other state-of-the-art Gaussian filters and smoothers can fail.Comment: 7 pages, 1 figure, draft version of paper accepted at IEEE Transactions on Automatic Contro

    A Practical and Conceptual Framework for Learning in Control

    No full text
    We propose a fully Bayesian approach for efficient reinforcement learning (RL) in Markov decision processes with continuous-valued state and action spaces when no expert knowledge is available. Our framework is based on well-established ideas from statistics and machine learning and learns fast since it carefully models, quantifies, and incorporates available knowledge when making decisions. The key ingredient of our framework is a probabilistic model, which is implemented using a Gaussian process (GP), a distribution over functions. In the context of dynamic systems, the GP models the transition function. By considering all plausible transition functions simultaneously, we reduce model bias, a problem that frequently occurs when deterministic models are used. Due to its generality and efficiency, our RL framework can be considered a conceptual and practical approach to learning models and controllers whe
    corecore