    Data-efficient learning of feedback policies from image pixels using deep dynamical models

    Data-efficient reinforcement learning (RL) in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. We consider a particularly important instance of this challenge, the pixels-to-torques problem, where an RL agent learns a closed-loop control policy (torques) from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that learns a low-dimensional feature embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning is crucial for long-term predictions, which lie at the core of the adaptive nonlinear model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art RL methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces, is lightweight, and is an important step toward fully autonomous end-to-end learning from pixels to torques.
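
    The joint-training idea above can be summarized in a short sketch: an encoder maps pixels to a low-dimensional feature, a latent transition model predicts the next feature from the current feature and action, and both are trained under a shared objective so the embedding stays predictive for long-term rollouts. This is only a minimal illustration under assumed module names and sizes (a fully connected encoder, an 8-dimensional latent space), not the paper's architecture.

```python
# Minimal PyTorch sketch (illustrative assumptions, not the paper's code): an
# encoder maps images to a low-dimensional feature z, a latent transition model
# predicts z_{t+1} from (z_t, u_t), and both are trained under one objective.
import torch
import torch.nn as nn

class DeepDynamicalModel(nn.Module):
    def __init__(self, img_dim=64 * 64, latent_dim=8, action_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, img_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 64),
                                      nn.ReLU(), nn.Linear(64, latent_dim))

    def loss(self, img_t, action_t, img_next):
        z_t = self.encoder(img_t)
        z_next_pred = self.dynamics(torch.cat([z_t, action_t], dim=-1))
        recon = ((self.decoder(z_t) - img_t) ** 2).mean()            # keep z informative
        pred = ((z_next_pred - self.encoder(img_next)) ** 2).mean()  # keep z predictive
        return recon + pred                                          # joint objective

model = DeepDynamicalModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
img_t, img_next = torch.rand(32, 64 * 64), torch.rand(32, 64 * 64)
action_t = torch.rand(32, 2)
loss = model.loss(img_t, action_t, img_next)
opt.zero_grad()
loss.backward()
opt.step()
```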

    Event-triggered near optimal adaptive control of interconnected systems

    Increased interest in complex interconnected systems such as the smart grid and cyber manufacturing has attracted researchers to develop optimal adaptive control schemes that elicit a desired performance when the complex system dynamics are uncertain. In this dissertation, motivated by the fact that aperiodic event sampling saves network resources while ensuring system stability, a suite of novel event-sampled distributed near-optimal adaptive control schemes is introduced for uncertain linear and affine nonlinear interconnected systems in a forward-in-time and online manner. First, a novel stochastic hybrid Q-learning scheme is proposed to generate the optimal adaptive control law and to accelerate the learning process in the presence of random delays and packet losses resulting from the communication network for an uncertain linear interconnected system. Subsequently, a novel online reinforcement learning (RL) approach is proposed to solve the Hamilton-Jacobi-Bellman (HJB) equation by using neural networks (NNs) for generating distributed optimal control of nonlinear interconnected systems using state and output feedback. To relax the need for full state vector measurements, distributed observers are introduced. Next, using RL, an improved NN learning rule is derived to solve the HJB equation for uncertain nonlinear interconnected systems with event-triggered feedback. Distributed NN identifiers are introduced both to approximate the uncertain nonlinear dynamics and to serve as a model for online exploration. Next, the control policy and the event-sampling errors are considered as non-cooperative players, and a min-max optimization problem is formulated for linear and affine nonlinear systems by using a zero-sum game approach for simultaneous optimization of both the control policy and the event-based sampling instants. The net result is the development of optimal adaptive event-triggered control of uncertain dynamic systems --Abstract, page iv
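
    As a minimal illustration of the event-sampled feedback idea (not the dissertation's algorithm), the sketch below simulates a linear plant whose controller receives a new state sample only when a state-dependent event threshold is violated, which is what saves network transmissions. The plant matrices, feedback gain, and trigger rule are assumptions chosen to give a simple stable example.

```python
# Event-triggered feedback sketch: control uses the last transmitted state,
# and a new sample is sent only when the gap exceeds a state-dependent bound.
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # assumed open-loop plant
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 1.0]])                  # assumed stabilizing feedback gain
dt, sigma = 0.01, 0.1                       # step size and trigger sensitivity

x = np.array([[1.0], [0.0]])
x_sampled = x.copy()                        # last state sent over the network
events = 0
for step in range(1000):
    # event condition: transmit only when the sampling error is too large
    if np.linalg.norm(x - x_sampled) > sigma * np.linalg.norm(x):
        x_sampled = x.copy()
        events += 1
    u = -K @ x_sampled                      # control uses the sampled state only
    x = x + dt * (A @ x + B @ u)            # Euler step of the continuous plant
print(f"events: {events} / 1000 steps, final |x|: {np.linalg.norm(x):.4f}")
```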

    Fault Tolerant Deep Reinforcement Learning for Aerospace Applications

    With the growing use of Unmanned Aerial Systems, a new need has arisen for intelligent algorithms that not only stabilize or control the system but also account for factors such as optimality, robustness, adaptability, tracking, and decision making. In this thesis, a deep-learning-based control system is designed with fault-tolerant and disturbance rejection capabilities and applied to a high-order nonlinear dynamic system. The approach uses a Reinforcement Learning architecture that combines concepts from optimal control, robust control, and game theory to create an optimally adaptive control for disturbance rejection. Additionally, a cascaded Observer-based Kalman Filter is formulated for estimating adverse inputs to the system. Numerical simulations are presented using different nonlinear model dynamics and scenarios. The Deep Reinforcement Learning and Observer architecture is demonstrated to be a promising control system alternative for fault-tolerant applications.
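
    The adverse-input estimation step can be illustrated with a basic augmented-state Kalman filter: the unknown disturbance is appended to the state and treated as a slowly varying bias. This is a hedged scalar stand-in for the cascaded observer described above; all matrices, noise levels, and the placeholder feedback law are illustrative assumptions.

```python
# Augmented-state Kalman filter sketch: estimate an unknown input d alongside x.
import numpy as np

dt = 0.1
# augmented state [x, d]: x_{k+1} = x_k + dt*(-x_k + u_k + d_k), d_{k+1} = d_k
F = np.array([[1 - dt, dt], [0.0, 1.0]])
B = np.array([[dt], [0.0]])
H = np.array([[1.0, 0.0]])                 # only x is measured
Q = np.diag([1e-4, 1e-3])                  # process noise (d allowed to drift)
R = np.array([[1e-2]])                     # measurement noise

x_hat, P = np.zeros((2, 1)), np.eye(2)
true_x, true_d = 0.0, 0.5                  # constant unmodeled disturbance
rng = np.random.default_rng(0)
for k in range(200):
    u = -1.0 * true_x                      # placeholder feedback control
    true_x += dt * (-true_x + u + true_d)
    y = true_x + rng.normal(scale=0.1)
    # predict
    x_hat = F @ x_hat + B * u
    P = F @ P @ F.T + Q
    # update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (np.array([[y]]) - H @ x_hat)
    P = (np.eye(2) - K @ H) @ P
print(f"estimated disturbance: {x_hat[1, 0]:.3f} (true: {true_d})")
```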

    Online Reinforcement Learning Neural Network Controller Design for Nanomanipulation

    In this paper, a novel reinforcement learning neural network (NN)-based controller, referred to as an adaptive critic controller, is proposed for affine nonlinear discrete-time systems with applications to nanomanipulation. In the online NN reinforcement learning method, one NN is designated as the critic NN, which approximates the long-term cost function under the assumption that the states of the nonlinear system are available for measurement. An action NN is employed to derive an optimal control signal to track a desired system trajectory while minimizing the cost function. Online weight tuning schemes for these two NNs are also derived. By using the Lyapunov approach, the uniform ultimate boundedness (UUB) of the tracking error and weight estimates is shown. Nanomanipulation refers to manipulating objects of nanometer scale, where even a simple task can take several hours to perform. To accomplish such tasks automatically, the proposed online learning control design is evaluated for the task of nanomanipulation and verified in the simulation environment.
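
    A hedged sketch of the adaptive-critic structure (not the paper's exact tuning laws): a critic estimates the long-term cost from tracking-error features, an action network produces the control, and both weight vectors are tuned online. The scalar plant, basis functions, and gains are illustrative assumptions.

```python
# Adaptive-critic sketch: critic weights Wc track the Bellman target, action
# weights Wa are nudged down a numerical gradient of the estimated cost-to-go.
import numpy as np

def basis(e):
    return np.array([e, e**2, np.tanh(e)])   # features of the tracking error

def plant(x, u):
    return 0.9 * x + 0.1 * u                  # assumed discrete-time plant

Wc, Wa = np.zeros(3), np.zeros(3)             # critic and action NN weights
alpha_c, alpha_a, gamma = 0.05, 0.05, 0.95
x, x_des = 1.0, 0.0
for k in range(500):
    e = x - x_des
    phi = basis(e)
    u = float(Wa @ phi)                       # action NN output
    x_next = plant(x, u)
    cost = e**2 + 0.1 * u**2
    # critic: temporal-difference update toward the Bellman target
    td = cost + gamma * (Wc @ basis(x_next - x_des)) - Wc @ phi
    Wc += alpha_c * td * phi
    # action NN: numerical gradient of the estimated long-term cost w.r.t. u,
    # chained with du/dWa = phi
    eps = 1e-3
    J = lambda uv: (e**2 + 0.1 * uv**2
                    + gamma * (Wc @ basis(plant(x, uv) - x_des)))
    Wa -= alpha_a * (J(u + eps) - J(u - eps)) / (2 * eps) * phi
    x = x_next
print(f"final tracking error: {x - x_des:.4f}")
```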

    Ball and Beam Control using Adaptive PID based on Q-Learning

    The ball and beam system is one of the most widely used systems for benchmarking controller response because of its nonlinear and unstable characteristics. Furthermore, with the increasing availability of computational power and the growing intensity of artificial intelligence research, especially in the field of reinforcement learning, many researchers are now working on learning-based approaches to control. Accordingly, in this paper, an adaptive PID controller based on Q-learning (Q-PID) is used to control the ball position in the ball and beam system. The simulation results show that Q-PID outperforms the conventional PID and heuristic PID techniques, with faster settling time and lower overshoot.
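
    As an illustration of gain tuning by Q-learning (a simplified stand-in for the paper's Q-PID scheme), the sketch below lets a tabular Q-learning agent adjust the proportional and derivative gains on an assumed double-integrator plant rather than the full ball-and-beam dynamics; the discretization, reward, and hyperparameters are assumptions.

```python
# Tabular Q-learning over PID gain adjustments on an assumed double integrator.
import numpy as np

rng = np.random.default_rng(1)
actions = [(dkp, dkd) for dkp in (-0.1, 0.0, 0.1) for dkd in (-0.1, 0.0, 0.1)]
Q = np.zeros((10, len(actions)))               # state: discretized final error
alpha, gamma, eps = 0.1, 0.9, 0.2
kp, ki, kd = 1.0, 0.0, 0.5                     # integral gain held fixed for brevity

def run_episode(kp, ki, kd, T=200, dt=0.02):
    x = v = integ = 0.0
    ref, cost = 0.2, 0.0
    e_prev = ref - x
    for _ in range(T):
        e = ref - x
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - e_prev) / dt
        v += u * dt                            # double-integrator plant
        x += v * dt
        cost += abs(e)
        e_prev = e
    return cost / T, abs(e)

for episode in range(300):
    mean_err, final_err = run_episode(kp, ki, kd)
    s = min(int(final_err * 50), 9)
    a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[s]))
    dkp, dkd = actions[a]
    kp, kd = max(kp + dkp, 0.0), max(kd + dkd, 0.0)
    new_mean, new_final = run_episode(kp, ki, kd)
    reward = mean_err - new_mean               # improvement in tracking cost
    s_next = min(int(new_final * 50), 9)
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
print(f"tuned gains: kp={kp:.2f}, kd={kd:.2f}")
```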

    Adapt-to-learn policy transfer in reinforcement learning and deep model reference adaptive control

    Adaptation and learning from exploration are key to biological learning: humans and animals do not learn every task in isolation; rather, they are able to quickly adapt learned behaviors between similar tasks and learn new skills when presented with new situations. Inspired by this, adaptation has been an important direction of research in control, in the form of adaptive controllers. However, adaptive controllers such as the Model Reference Adaptive Controller are mainly model-based: they do not rely on exploration but instead make informed decisions by exploiting the model's structure. Such controllers are therefore characterized by high sample efficiency and stability conditions, which makes them suitable for safety-critical systems. On the other hand, learning-based optimal control algorithms such as Reinforcement Learning follow a trial-and-error approach, in which an agent explores the environment by taking random actions and increasing the likelihood of those actions that result in a higher return. Such exploration is expected to fail many times before an optimal policy is found; these methods are therefore highly sample-expensive, lack stability guarantees, and are hence unsuitable for safety-critical systems. This thesis presents control algorithms for robotics in which the best of both worlds, ``adaptation'' and ``learning from exploration'', are brought together to propose new algorithms that can perform better than their conventional counterparts. In this effort, we first present an Adapt-to-Learn policy transfer algorithm, in which control-theoretic ideas of adaptation are used to transfer a policy between two related but different tasks using the policy gradient method of reinforcement learning. Efficient and robust policy transfer remains a key challenge in reinforcement learning. Policy transfer through warm initialization, imitation, or interaction over a large set of agents with randomized instances has been commonly applied to solve a variety of Reinforcement Learning (RL) tasks. However, this is far from how behavior transfer happens in the biological world. Here, we seek to answer the question: will learning to combine an adaptation reward with the environmental reward lead to a more efficient transfer of policies between domains? We introduce a principled mechanism that can ``Adapt-to-Learn'', that is, adapt the source policy to learn to solve a target task with significant transition differences and uncertainties. Through theory and experiments, we show that our method leads to a significantly reduced sample complexity of transferring policies between tasks. In the second part of this thesis, information-enabled learning-based adaptive controllers such as the ``Gaussian Process adaptive controller using Model Reference Generative Network'' (GP-MRGeN) and the ``Deep Model Reference Adaptive Controller'' (DMRAC) are presented. Model reference adaptive control (MRAC) is a widely studied adaptive control methodology that aims to ensure that a nonlinear plant with significant model uncertainty behaves like a chosen reference model. MRAC methods adapt the system to changes by representing the system uncertainties as weighted combinations of known nonlinear functions and using a weight update law that moves the network weights in the direction of minimizing the instantaneous tracking error.
However, most MRAC adaptive controllers use a shallow network and only the instantaneous data for adaptation, restricting their representation capability and limiting their performance under fast-changing uncertainties and faults in the system. In this thesis, we propose a Gaussian process based adaptive controller called GP-MRGeN. We present a new approach to the online supervised training of GP models using a new architecture termed the Model Reference Generative Network (MRGeN). Our architecture is very loosely inspired by the recent success of generative neural network models. Nevertheless, our contributions ensure that the inclusion of such a model in closed-loop control does not affect the stability properties. By using a generative network, the GP-MRGeN controller is capable of achieving higher adaptation rates without losing the robustness properties of the controller, making it suitable for mitigating faults in fast-evolving systems. Further, in this thesis, we present a new neuroadaptive architecture: Deep Neural Network-based Model Reference Adaptive Control. This architecture utilizes deep neural network representations for modeling significant nonlinearities while marrying them with the boundedness guarantees that characterize MRAC-based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning-based MRAC methods, such as concurrent learning and GP-MRAC. This makes DMRAC a powerful architecture for high-performance control of nonlinear systems with long-term learning properties. Theoretical proofs of the controller's generalization capability over unseen data points and the boundedness properties of the tracking error are also presented. Experiments with a quadrotor vehicle demonstrate the controller's performance in achieving reference model tracking in the presence of significant matched uncertainties. To achieve these results, a software and communication architecture is designed to ensure online real-time inference of the deep network on a high-bandwidth, computation-limited platform. These results demonstrate the efficacy of deep networks for high-bandwidth closed-loop attitude control of unstable and nonlinear robots operating in adverse situations. We expect that this work will benefit other closed-loop deep-learning control architectures for robotics.
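
    The classical MRAC weight update law that GP-MRGeN and DMRAC build on can be sketched in a few lines: the uncertainty is represented as a weighted combination of known basis functions, and the weights are driven by the tracking error so the plant follows the reference model. The scalar plant, reference model, basis functions, and gains below are illustrative assumptions, and the generative-network and deep-network extensions are omitted.

```python
# Classical MRAC sketch: adapt W so that W^T phi(x) cancels the matched
# uncertainty and the plant tracks the reference model.
import numpy as np

dt = 0.01
a_ref, b_ref = -2.0, 2.0                     # reference model: xr' = a_ref*xr + b_ref*r
a, b = -1.0, 1.0                             # nominal plant: x' = a*x + b*(u + delta(x))
delta = lambda x: 0.5 * np.sin(x) + 0.3 * x  # unknown matched uncertainty

def phi(x):
    return np.array([x, np.sin(x), np.abs(x) * x])   # known basis functions

W = np.zeros(3)                              # adaptive weights
Gamma = 5.0                                  # adaptation gain
kx, kr = (a_ref - a) / b, b_ref / b          # model-matching feedback/feedforward
x = xr = 0.0
for k in range(2000):
    r = 1.0 if (k * dt) % 4 < 2 else -1.0    # square-wave reference command
    e = x - xr
    u = kx * x + kr * r - W @ phi(x)         # cancel the estimated uncertainty
    W += dt * Gamma * b * e * phi(x)         # Lyapunov-based weight update law
    x += dt * (a * x + b * (u + delta(x)))
    xr += dt * (a_ref * xr + b_ref * r)
print(f"final tracking error: {x - xr:.4f}, weights: {np.round(W, 3)}")
```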

    Design and evaluation of advanced intelligent flight controllers

    Reinforcement-learning-based methods may be feasible for solving adaptive optimal control problems for nonlinear dynamical systems. This work presents a proof of concept for applying reinforcement learning based methods to robust and adaptive flight control tasks. A framework for designing and examining these methods is introduced by means of the open research civil aircraft model (RCAM) and optimality criteria. A state-of-the-art robust flight controller - the incremental nonlinear dynamic inversion (INDI) controller - serves as a reference controller. Two intelligent control methods are introduced and examined. The deep deterministic policy gradient (DDPG) controller is selected as a promising actor-critic reinforcement learning method that is currently attracting much attention in the field of robotics. In addition, an adaptive version of a proportional-integral-derivative (PID) controller, the PID neural network (PIDNN) controller, is selected as the second method. The results show that all controllers are able to control the aircraft model. Moreover, the PIDNN controller exhibits improved reference tracking if a good initial guess of its weights is available. In turn, the DDPG algorithm is able to control the nonlinear aircraft model while minimizing a multi-objective value function. This work provides insight into the usability of the selected intelligent controllers as flight control functions, as well as a comparison with state-of-the-art flight control functions.
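
    The PIDNN idea can be illustrated with a minimal sketch (not the thesis implementation): three neurons carry the proportional, integral, and derivative terms of the tracking error, and the output weights, which play the role of PID gains, are updated by gradient descent on the squared error under an assumed positive plant gain. The scalar plant and learning rate are illustrative assumptions.

```python
# PIDNN-style online gain adaptation on an assumed first-order plant.
import numpy as np

dt = 0.02
ref = 1.0
w = np.array([2.0, 0.1, 0.5])              # initial [P, I, D] output weights
lr = 0.01
x, integ = 0.0, 0.0
e_prev = ref - x
for k in range(2000):
    e = ref - x
    integ += e * dt
    h = np.array([e, integ, (e - e_prev) / dt])   # P, I, D neuron outputs
    u = w @ h                                      # network output = control
    x += dt * (-x + u)                             # assumed first-order plant
    # gradient descent on e^2 through the assumed positive plant gain
    # reduces to moving each weight along e * (its neuron output)
    w += lr * e * h
    e_prev = e
print(f"gains after adaptation: {np.round(w, 3)}, final error: {ref - x:.4f}")
```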

    Optimal adaptive control of time-delay dynamical systems with known and uncertain dynamics

    Delays are found in many industrial pneumatic and hydraulic systems, and as a result, the performance of the overall closed-loop system deteriorates unless they are explicitly accounted for. It is also possible that the dynamics of such systems are uncertain. Optimal control of time-delay systems in the presence of known and uncertain dynamics, using state and output feedback, is therefore of paramount importance. In this research, a suite of novel optimal adaptive control (OAC) techniques is developed for linear and nonlinear continuous time-delay systems in the presence of uncertain system dynamics using state and/or output feedback. First, the optimal regulation of linear continuous-time systems with state and input delays is addressed using state and output feedback, by utilizing a quadratic cost function over an infinite horizon. Next, the optimal adaptive regulation is extended to uncertain linear continuous-time systems under the mild assumption that the bounds on the system matrices are known. Subsequently, the event-triggered optimal adaptive regulation of partially unknown linear continuous-time systems with state delay is addressed by using integral reinforcement learning (IRL). It is demonstrated that the optimal control policy renders asymptotic stability of the closed-loop system provided the linear time-delayed system is controllable and observable. The proposed event-triggered approach relaxes the need for continuous availability of the state vector and is proven to be Zeno-free. Finally, OAC of uncertain nonlinear time-delay systems with input and state delays is investigated using IRL-based neural network control. An identifier is proposed for nonlinear time-delay systems to approximate the system dynamics and relax the need for the control coefficient matrix in generating the control policy. Lyapunov analysis is utilized to design the optimal adaptive controller, derive the parameter/weight tuning laws, and verify stability of the closed-loop system --Abstract, page iv
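
    The integral reinforcement learning (IRL) step used above for partially unknown linear systems can be sketched as follows: the value function is identified from measured trajectory integrals without using the unknown drift matrix, and the gain is then improved using the known input matrix. The system, cost weights, horizons, and initial stabilizing gain below are illustrative assumptions, and the time delays are omitted for brevity.

```python
# IRL policy iteration sketch for a linear system: identify V(x) = x'Px from
# trajectory data, then improve the policy with K = R^{-1} B' P.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -1.0]])   # used only to simulate, not to learn
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
dt, T_int = 0.001, 0.05                     # integration step and IRL interval
K = np.array([[0.5, 0.5]])                  # initial stabilizing gain (assumed)

def features(x):
    return np.array([x[0]**2, x[0] * x[1], x[1]**2])

rng = np.random.default_rng(0)
for it in range(6):                         # IRL policy iterations
    Phi, targets = [], []
    for _ in range(20):                     # data-collection intervals
        x = rng.uniform(-1, 1, size=2)
        x0, integral = x.copy(), 0.0
        for _ in range(int(T_int / dt)):
            u = -K @ x
            integral += (x @ Q @ x + u @ R @ u) * dt
            x = x + dt * (A @ x + B @ u)
        # V(x0) - V(x_end) equals the measured integral of the running cost
        Phi.append(features(x0) - features(x))
        targets.append(integral)
    p, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    P = np.array([[p[0], p[1] / 2], [p[1] / 2, p[2]]])
    K = np.linalg.inv(R) @ B.T @ P          # policy improvement
print("learned gain K:", np.round(K, 3))
```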