552 research outputs found

    Empowerment for Continuous Agent-Environment Systems

    Full text link
    This paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also by considerations stemming from curiosity-driven learning. Empowerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence that can be sensed by the agent's sensors. It is an information-theoretic generalization of joint controllability (influence on the environment) and observability (measurement by sensors) of the environment by the agent, both controllability and observability usually being defined in control theory as the dimensionality of the control/observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties: e.g., it allows us to identify salient states using only the dynamics, and it can act as an intrinsic reward without requiring an external reward. However, in this previous work empowerment was limited to small-scale and discrete domains, and furthermore the state transition probabilities were assumed to be known. The goal of this paper is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte-Carlo approximation; the unknown transitions are addressed by model learning and prediction, for which we apply Gaussian process regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment and include an application to exploration and online model learning.
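
    As a rough illustration of the quantity involved: empowerment is the Shannon channel capacity from a set of candidate action sequences to the resulting sensor state. The sketch below estimates it for a finite set of action sequences whose successor-state distributions are one-dimensional Gaussians (as a learned predictive model might provide), combining the Blahut-Arimoto iteration with Monte-Carlo integration. It is a minimal stand-in for the paper's method; the Gaussian predictive model and all names are assumptions made here for brevity.

```python
import numpy as np

def empowerment_blahut_arimoto(mu, var, n_samples=500, n_iters=60, rng=None):
    """Monte-Carlo Blahut-Arimoto estimate of empowerment (in nats).

    mu, var: length-K arrays with the mean and variance of the 1-D successor-state
    density p(s' | a_k) for each of K candidate action sequences (assumed Gaussian,
    e.g. predicted by a learned model). Not the paper's exact algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu, var = np.asarray(mu, float), np.asarray(var, float)
    K = len(mu)
    p = np.full(K, 1.0 / K)                          # distribution over action sequences
    # Monte-Carlo samples of s' drawn from each action sequence's predictive density
    s = rng.normal(mu[:, None], np.sqrt(var)[:, None], size=(K, n_samples))
    # ll[j, k, i] = log p(s[k, i] | a_j)
    ll = (-0.5 * (s[None, :, :] - mu[:, None, None]) ** 2 / var[:, None, None]
          - 0.5 * np.log(2 * np.pi * var[:, None, None]))
    for _ in range(n_iters):
        log_marg = np.logaddexp.reduce(np.log(p)[:, None, None] + ll, axis=0)
        # Monte-Carlo estimate of KL(p(s'|a_k) || p(s')) for each action sequence
        d = np.array([np.mean(ll[k, k, :] - log_marg[k, :]) for k in range(K)])
        p *= np.exp(d)
        p /= p.sum()
    log_marg = np.logaddexp.reduce(np.log(p)[:, None, None] + ll, axis=0)
    return float(sum(p[k] * np.mean(ll[k, k, :] - log_marg[k, :]) for k in range(K)))

# e.g. three action sequences leading to well-separated states -> close to log(3) nats:
# empowerment_blahut_arimoto([0.0, 5.0, 10.0], [0.1, 0.1, 0.1])
```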

    Markov chain monte carlo algorithm for bayesian policy search

    Get PDF
    The fundamental intention in Reinforcement Learning (RL) is to seek optimal parameters of a given parameterized policy. Policy search algorithms have paved the way for making RL suitable for application to complex dynamical systems, such as the robotics domain, where the environment comprises high-dimensional state and action spaces. Although many policy search techniques are based on the widespread policy gradient methods, thanks to their appropriateness to such complex environments, their performance might be affected by slow convergence or local optima complications. The reason for this is the need to compute the gradient components of the parameterized policy. In this study, we take a Bayesian approach to the policy search problem pertinent to the RL framework. The problem of interest is to control a discrete-time Markov decision process (MDP) with continuous state and action spaces. We contribute to the field by proposing a Particle Markov Chain Monte Carlo (P-MCMC) algorithm as a method of generating samples for the policy parameters from a posterior distribution, instead of performing gradient approximations. To do so, we adopt a prior density over policy parameters and aim for the posterior distribution, where the 'likelihood' is assumed to be the expected total reward. In risk-sensitive scenarios, where a multiplicative expected total reward is employed to measure the performance of the policy rather than its cumulative counterpart, our methodology is fit for purpose owing to the fact that, by utilizing a reward function in a multiplicative form, one can fully exploit sequential Monte Carlo (SMC), known as the particle filter, within the iterations of the P-MCMC. It is worth mentioning that these methods have been widely used in statistical and engineering applications in recent years. Furthermore, in order to deal with the challenging problem of policy search in high-dimensional state spaces, an Adaptive MCMC algorithm is proposed. This research is organized as follows: In Chapter 1, we commence with a general introduction and motivation for the current work and highlight the topics that are going to be covered. In Chapter 2, a literature review pursuant to the context of the thesis is conducted. In Chapter 3, a brief review of some popular policy-gradient-based RL methods is provided. We proceed with the Bayesian inference notion and present Markov Chain Monte Carlo methods in Chapter 4. The original work of the thesis is formulated in this chapter, where a novel SMC algorithm for policy search in the RL setting is advocated. In order to exhibit the fruitfulness of the proposed algorithm in learning a parameterized policy, numerical simulations are incorporated in Chapter 5. To validate the applicability of the proposed method in real time, it is implemented on a control problem on a physical setup of a two-degree-of-freedom (2-DoF) robotic manipulator, whose corresponding results appear in Chapter 6. Finally, concluding remarks and future work are presented in the closing chapter.
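
    To make the gradient-free idea concrete, here is a deliberately simplified sketch: a random-walk Metropolis-Hastings sampler whose target is a Gaussian prior over policy parameters multiplied by a positive Monte-Carlo estimate of the return, standing in for the thesis's Particle MCMC machinery. The rollout_return hook and all hyperparameters are hypothetical.

```python
import numpy as np

def mh_policy_search(theta0, rollout_return, n_iters=2000, step=0.1, prior_std=1.0,
                     rng=None):
    """Simplified random-walk Metropolis-Hastings sketch of Bayesian policy search.

    The target is proportional to a Gaussian prior N(theta; 0, prior_std^2 I) times a
    positive Monte-Carlo estimate of the expected return; rollout_return(theta) is a
    hypothetical simulator hook returning a positive scalar return.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)

    def log_target(th):
        log_prior = -0.5 * np.sum(th ** 2) / prior_std ** 2
        return log_prior + np.log(max(rollout_return(th), 1e-12))

    log_t = log_target(theta)
    samples = []
    for _ in range(n_iters):
        proposal = theta + step * rng.standard_normal(theta.shape)  # random-walk move
        log_t_prop = log_target(proposal)
        if np.log(rng.uniform()) < log_t_prop - log_t:              # accept / reject
            theta, log_t = proposal, log_t_prop
        samples.append(theta.copy())
    return np.array(samples)
```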

    Reinforcement Learning and Planning for Preference Balancing Tasks

    Get PDF
    Robots are often highly non-linear dynamical systems with many degrees of freedom, making solving motion problems computationally challenging. One solution has been reinforcement learning (RL), which learns through experimentation to automatically perform the near-optimal motions that complete a task. However, high-dimensional problems and task formulation often prove challenging for RL. We address these problems with PrEference Appraisal Reinforcement Learning (PEARL), which solves Preference Balancing Tasks (PBTs). PBTs define a problem as a set of preferences that the system must balance to achieve a goal. The method is appropriate for acceleration-controlled systems with continuous state spaces and either discrete or continuous action spaces with unknown system dynamics. We show that PEARL learns a sub-optimal policy on a subset of states and actions, and transfers the policy to the expanded domain to produce a more refined plan on a class of robotic problems. We establish convergence to task goal conditions, and even when preconditions are not verifiable, we show that this is a valuable method to use before other, more expensive approaches. Evaluation is done on several robotic problems, such as Aerial Cargo Delivery, Multi-Agent Pursuit, Rendezvous, and Inverted Flying Pendulum, both in simulation and experimentally. Additionally, PEARL is leveraged outside of robotics as an array-sorting agent. The results demonstrate high accuracy and fast learning times on a large set of practical applications.
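
    A minimal sketch of the preference-balancing idea, assuming a value function that is a fixed weighted sum of preference features and a known one-step model of a toy double integrator; the actual method learns and transfers such a policy via RL, which is omitted here. All names, the toy dynamics, and the feature choices are illustrative, not the paper's implementation.

```python
import numpy as np

def pearl_greedy_action(state, actions, step_model, features, weights):
    """Preference-balancing action selection sketch.

    V(s) is approximated as a weighted sum of preference features phi_i(s)
    (e.g. negative distance to goal, negative speed); the agent greedily picks the
    candidate acceleration whose predicted next state maximizes V.
    step_model(state, a) -> next state; features(state) -> feature vector.
    """
    values = [weights @ features(step_model(state, a)) for a in actions]
    return actions[int(np.argmax(values))]

# Toy double-integrator example: state = [position, velocity], action = acceleration.
dt = 0.1
step_model = lambda s, a: np.array([s[0] + dt * s[1], s[1] + dt * a])
features = lambda s: np.array([-abs(s[0]), -abs(s[1])])   # prefer goal at 0, low speed
actions = np.linspace(-1.0, 1.0, 21)
weights = np.array([1.0, 0.3])                            # balance between preferences
state = np.array([2.0, 0.0])
for _ in range(100):
    a = pearl_greedy_action(state, actions, step_model, features, weights)
    state = step_model(state, a)
```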

    A survey on policy search algorithms for learning robot controllers in a handful of trials

    Get PDF
    Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time. Comment: 21 pages, 3 figures, 4 algorithms, accepted at IEEE Transactions on Robotics
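
    The surrogate-model strategy can be illustrated with a single Bayesian-optimization step over policy parameters: fit a Gaussian-process model of episodic return from the few trials gathered so far, then pick the next policy to try on the robot by expected improvement. A sketch assuming scikit-learn and a box-constrained parameter space; the helper name and hyperparameters are invented here.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def propose_next_policy(thetas, returns, bounds, n_candidates=2000, rng=None):
    """One Bayesian-optimization step for micro-data policy search (sketch).

    thetas: list of policy-parameter vectors tried so far; returns: their episodic
    returns; bounds: (d, 2) array of box constraints. Fits a GP surrogate of the
    return and proposes the candidate with the highest expected improvement, so the
    physical robot is only queried for promising policies.
    """
    rng = np.random.default_rng() if rng is None else rng
    bounds = np.asarray(bounds, dtype=float)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), normalize_y=True)
    gp.fit(np.asarray(thetas), np.asarray(returns))
    cand = rng.uniform(bounds[:, 0], bounds[:, 1],
                       size=(n_candidates, bounds.shape[0]))
    mu, std = gp.predict(cand, return_std=True)
    best = float(np.max(returns))
    z = (mu - best) / np.maximum(std, 1e-9)
    ei = (mu - best) * norm.cdf(z) + std * norm.pdf(z)   # expected improvement
    return cand[int(np.argmax(ei))]
```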

    Probabilistic Inference for Model Based Control

    Get PDF
    Robotic systems are essential for enhancing productivity, automation, and performing hazardous tasks. Addressing the unpredictability of physical systems, this thesis advances robotic planning and control under uncertainty, introducing learning-based methods for managing uncertain parameters and adapting to changing environments in real-time. Our first contribution is a framework using Bayesian statistics for likelihood-free inference of model parameters. This allows the use of complex simulators for designing efficient, robust controllers. The method, integrating the unscented transform with a variant of information-theoretic model predictive control, shows better performance in trajectory evaluation compared to Monte Carlo sampling, easing the computational load in various control and robotics tasks. Next, we reframe robotic planning and control as a Bayesian inference problem, focusing on the posterior distribution of actions and model parameters. An implicit variational inference algorithm, performing Stein Variational Gradient Descent, estimates distributions over model parameters and control inputs in real-time. This Bayesian approach effectively handles complex multi-modal posterior distributions, vital for dynamic and realistic robot navigation. Finally, we tackle diversity in high-dimensional spaces. Our approach mitigates underestimation of uncertainty in posterior distributions, which leads to locally optimal solutions. Using the theory of rough paths, we develop an algorithm for parallel trajectory optimisation, enhancing solution diversity and avoiding mode collapse. This method extends our variational inference approach for trajectory estimation, employing diversity-enhancing kernels and leveraging path signature representation of trajectories. Empirical tests, ranging from 2-D navigation to robotic manipulators in cluttered environments, affirm our method's efficiency, outperforming existing alternatives.
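
    The core of the Stein Variational Gradient Descent step mentioned above can be sketched in a few lines: a set of particles (e.g. candidate control sequences or model parameters) is nudged along a kernelized gradient of the log posterior, with a repulsion term that keeps the particles diverse. This is the generic SVGD update with an RBF kernel and median-heuristic bandwidth, not the thesis's full trajectory-optimisation pipeline.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step_size=1e-2):
    """One Stein Variational Gradient Descent update (generic sketch).

    particles: (n, d) array of samples; grad_log_p: callable mapping (n, d) particles
    to (n, d) gradients of the log posterior.
    """
    n, _ = particles.shape
    diffs = particles[:, None, :] - particles[None, :, :]     # diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)                    # pairwise squared distances
    h = np.median(sq_dists) / max(np.log(n + 1), 1e-8)        # median-heuristic bandwidth
    k = np.exp(-sq_dists / (h + 1e-8))                        # RBF kernel matrix
    grads = grad_log_p(particles)
    # Attraction towards high-probability regions plus kernel-gradient repulsion term
    repulsion = (2.0 / (h + 1e-8)) * np.einsum('ij,ijd->id', k, diffs)
    phi = (k.T @ grads + repulsion) / n
    return particles + step_size * phi

# Example: particles approximating a 2-D standard normal posterior.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2)) * 3.0
for _ in range(500):
    x = svgd_step(x, lambda p: -p, step_size=5e-2)            # grad log N(0, I) is -x
```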

    Combining reinforcement learning and optimal control for the control of nonlinear dynamical systems

    No full text
    This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control, for controlling nonlinear dynamical systems with continuous states and actions. The adopted approach mimics the neural computations that allow our brain to bridge the divide between symbolic action selection and low-level actuation control by operating at two levels of abstraction. First, current findings demonstrate that at the level of limb coordination, human behaviour is explained by linear optimal feedback control theory, where cost functions match the energy and timing constraints of tasks. Second, humans learn cognitive tasks involving symbolic-level action selection in terms of both model-free and model-based reinforcement learning algorithms. We postulate that the ease with which humans learn complex nonlinear tasks arises from combining these two levels of abstraction. The Reinforcement Learning Optimal Control framework learns the local task dynamics from naive experience using an expectation-maximization algorithm for estimation of linear dynamical systems, and forms locally optimal Linear Quadratic Regulators, producing continuous low-level control. A high-level reinforcement learning agent uses these available controllers as actions and learns how to combine them in state space while maximizing a long-term reward. The optimal control costs form training signals for the high-level symbolic learner. The algorithm demonstrates that a small number of locally optimal linear controllers can be combined in a smart way to solve global nonlinear control problems, and forms a proof of principle for how the brain may bridge the divide between low-level continuous control and high-level symbolic action selection. It competes in terms of computational cost and solution quality with state-of-the-art control methods, which is illustrated with solutions to benchmark problems.
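
    The two levels of the framework can be sketched as follows: each local linear model (A, B) yields an LQR gain used as a low-level controller, and a tabular Q-learning agent chooses which controller to run in each (discretized) region of state space. The rollout hook and the region discretization are hypothetical placeholders; this is a proof-of-concept sketch, not the thesis's algorithm.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR gain K; the low-level control is u = -K x."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def q_learning_over_controllers(n_regions, n_controllers, rollout, n_episodes=200,
                                alpha=0.1, gamma=0.95, eps=0.1, rng=None):
    """Tabular Q-learning where each high-level action selects one local LQR controller.

    rollout(region, k) is a hypothetical hook that applies the k-th pre-computed
    controller (e.g. from lqr_gain) for a fixed interval starting in the given state
    region and returns (next_region, reward, done).
    """
    rng = np.random.default_rng() if rng is None else rng
    Q = np.zeros((n_regions, n_controllers))
    for _ in range(n_episodes):
        region, done = 0, False
        while not done:
            if rng.uniform() < eps:                        # epsilon-greedy exploration
                k = int(rng.integers(n_controllers))
            else:
                k = int(np.argmax(Q[region]))
            next_region, reward, done = rollout(region, k)
            target = reward + gamma * np.max(Q[next_region]) * (not done)
            Q[region, k] += alpha * (target - Q[region, k])
            region = next_region
    return Q
```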

    Languages and Tools for Optimization of Large-Scale Systems

    Get PDF
    Modeling and simulation are established techniques for solving design problems in a wide range of engineering disciplines today. Dedicated computer languages, such as Modelica, and efficient software tools are available. In this thesis, an extension of Modelica, Optimica, targeted at dynamic optimization of Modelica models, is proposed. In order to demonstrate the Optimica extension, supporting software has been developed. This includes a modularly extensible Modelica compiler, the JModelica compiler, and an extension that also supports Optimica. A Modelica library for paper machine dryer section modeling, DryLib, has been developed. The classes in the library enable structured and hierarchical modeling of dryer sections at the application-user level, while offering extensibility for the expert user. Based on DryLib, a parameter optimization problem, a model reduction problem, and an optimization-based control problem have been formulated and solved. A start-up optimization problem for a plate reactor has been formulated in Optimica and solved by means of the Optimica compiler. In addition, the robustness properties of the start-up trajectories have been evaluated by means of Monte-Carlo simulation. In many control systems, it is necessary to consider interaction with a user. In this thesis, a manual control scheme for an unstable inverted pendulum system, where the inputs are bounded, is presented. The proposed controller is based on the notion of reachability sets and guarantees semi-global stability for all references. An inverted pendulum on a two-wheeled robot has been developed. A distributed control system, including sensor processing algorithms and a stabilizing control scheme, has been implemented on three on-board embedded processors.
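
    The kind of dynamic optimization problem such a tool compiles from a declarative model can be illustrated with a tiny direct single-shooting transcription in Python: discretize the input, simulate a toy model forward, and hand the resulting cost to a numerical optimizer. The model, setpoint, and bounds are invented for illustration; an Optimica-based toolchain would generate a (far better) transcription automatically from the Modelica model.

```python
import numpy as np
from scipy.optimize import minimize

def single_shooting_cost(u, x_init, dt, n_steps):
    """Cost of a toy dynamic optimization problem under direct single shooting.

    The plant is a hypothetical first-order model x' = -x + u, integrated with forward
    Euler; the cost penalizes deviation from a setpoint of 1.0 and control effort.
    """
    x, cost = x_init, 0.0
    for k in range(n_steps):
        x = x + dt * (-x + u[k])                      # simulate the model one step
        cost += dt * ((x - 1.0) ** 2 + 0.01 * u[k] ** 2)
    return cost

n_steps, dt = 50, 0.1
result = minimize(single_shooting_cost, np.zeros(n_steps), args=(0.0, dt, n_steps),
                  method='L-BFGS-B', bounds=[(-2.0, 2.0)] * n_steps)
u_opt = result.x                                      # optimized input trajectory
```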

    Online Learning and Planning of Dynamical Systems Using Gaussian Processes

    Get PDF
    Master's thesis (Master of Science)

    Reinforcement Learning Control with Approximation of Time-Dependent Agent Dynamics

    Get PDF
    Reinforcement Learning has received a lot of attention over the years for systems ranging from static game playing to dynamic system control. Using Reinforcement Learning for control of dynamical systems provides the benefit of learning a control policy without needing a model of the dynamics. This opens the possibility of controlling systems for which the dynamics are unknown, but Reinforcement Learning methods like Q-learning do not explicitly account for time. In dynamical systems, time-dependent characteristics can have a significant effect on the control of the system, so it is necessary to account for system time dynamics while not having to rely on a predetermined model for the system. In this dissertation, algorithms are investigated for expanding the Q-learning algorithm to account for the learning of sampling rates and dynamics approximations. For determining a proper sampling rate, it is desired to find the largest sample time that still allows the learning agent to control the system to goal achievement. An algorithm called Sampled-Data Q-learning is introduced for determining both this sample time and the control policy associated with that sampling rate. Results show that the algorithm is capable of achieving a desired sampling rate that allows for system control while not sampling “as fast as possible”. Determining an approximation of an agent’s dynamics can be beneficial for the control of hierarchical multiagent systems by allowing a high-level supervisor to use the dynamics approximations for task allocation decisions. To this end, algorithms are investigated for learning first- and second-order dynamics approximations. These algorithms are respectively called First-Order Dynamics Learning and Second-Order Dynamics Learning. The dynamics learning algorithms are evaluated on several examples that show their capability to learn accurate approximations of state dynamics. All of these algorithms are then evaluated on hierarchical multiagent systems for determining task allocation. The results show that the algorithms successfully determine appropriate sample times and accurate dynamics approximations for the agents investigated.
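
    The first-order approximation idea can be sketched as a simple least-squares fit of x[k+1] ≈ a·x[k] + b·u[k] from logged transitions, which is the kind of compact dynamics summary a high-level supervisor could use for task allocation. The model structure and the toy data below are illustrative, not the dissertation's exact algorithm.

```python
import numpy as np

def fit_first_order_dynamics(x, u):
    """Least-squares fit of a first-order approximation x[k+1] ≈ a*x[k] + b*u[k].

    x: (T+1,) observed state trajectory; u: (T,) applied inputs. Returns (a, b).
    """
    X = np.column_stack([x[:-1], u])                  # regressors [x_k, u_k]
    coeffs, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return coeffs

# Toy check: data generated from x[k+1] = 0.9 x[k] + 0.2 u[k] plus small noise.
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 200)
x = np.zeros(201)
for k in range(200):
    x[k + 1] = 0.9 * x[k] + 0.2 * u[k] + 0.01 * rng.standard_normal()
a_hat, b_hat = fit_first_order_dynamics(x, u)         # should be close to (0.9, 0.2)
```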