
476 research outputs found

Probabilistic models for data efficient reinforcement learning

Author: Kamthe Sanket
Publication venue: Computing, Imperial College London
Publication date: 01/11/2021

Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, standard deep learning methods often overlook the progress made in control theory by treating systems as black boxes. We propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way to do RL in constrained environments. When the true state of the dynamical system cannot be fully observed, the standard model-based methods cannot be directly applied. For these systems an additional step of state estimation is needed. We propose distributed message passing for state estimation in non-linear dynamical systems. In particular, we propose to use expectation propagation (EP) to iteratively refine the state estimate, i.e., the Gaussian posterior distribution on the latent state. We show two things: (a) classical Rauch-Tung-Striebel (RTS) smoothers, such as the extended Kalman smoother (EKS) or the unscented Kalman smoother (UKS), are special cases of our message passing scheme; (b) running the message passing scheme more than once can lead to significant improvements over the classical RTS smoothers. We show the explicit connection between message passing with EP and well-known RTS smoothers and provide a practical implementation of the suggested algorithm. Furthermore, we address convergence issues of EP by generalising this framework to damped updates and the consideration of general α-divergences.
Probabilistic models can also be used to generate synthetic data. In model-based RL we use 'synthetic' data as a proxy for real environments in order to achieve high data efficiency. The ability to generate high-fidelity synthetic data is crucial when available (real) data is limited, as in RL, or where privacy and data protection standards allow only limited use of the given data, e.g., in medical and financial data sets. Current state-of-the-art methods for synthetic data generation are based on generative models, such as Generative Adversarial Networks (GANs). Even though GANs have achieved remarkable results in synthetic data generation, they are often challenging to interpret. Furthermore, GAN-based methods can suffer when used with mixed real and categorical variables. Moreover, the design of the loss function (discriminator loss) is itself problem specific, i.e., the generative model may not be useful for tasks it was not explicitly trained for. In this paper, we propose to use a probabilistic model as a synthetic data generator. Learning the probabilistic model for the data is equivalent to estimating the density of the data. Based on copula theory, we divide the density estimation task into two parts, i.e., estimating univariate marginals and estimating the multivariate copula density over the univariate marginals. We use normalising flows to learn both the copula density and the univariate marginals. We benchmark our method on both simulated and real data sets in terms of density estimation as well as the ability to generate high-fidelity synthetic data.
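The two-part split described in the abstract above follows from Sklar's theorem. As a brief illustration (the symbols p, p_i, F_i and c are notation assumed here, not taken from the record), a joint density with continuous marginals factorises as:

```latex
% Joint density = copula density evaluated at the marginal CDFs
% times the product of the univariate marginal densities.
p(x_1, \dots, x_d) = c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) \prod_{i=1}^{d} p_i(x_i)

% Taking logs separates the two estimation tasks:
\log p(x_1, \dots, x_d) = \log c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr) + \sum_{i=1}^{d} \log p_i(x_i)
```

Here F_i is the cumulative distribution function of the i-th marginal with density p_i, and c is the copula density on the unit hypercube. Under this reading, one model can be fitted to each univariate term and another to the copula term.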

Spiral - Imperial College Digital Repository

Data-driven nonlinear aeroelastic models of morphing wings for control

Authors: Brunton Steven L.; Fasel Urban; Fonzi Nicola
Publication date: 08/02/2020

Accurate and efficient aeroelastic models are critically important for enabling the optimization and control of highly flexible aerospace structures, which are expected to become pervasive in future transportation and energy systems. Advanced materials and morphing wing technologies are resulting in next-generation aeroelastic systems that are characterized by highly coupled and nonlinear interactions between the aerodynamic and structural dynamics. In this work, we leverage emerging data-driven modeling techniques to develop highly accurate and tractable reduced-order aeroelastic models that are valid over a wide range of operating conditions and are suitable for control. In particular, we develop two extensions to the recent dynamic mode decomposition with control (DMDc) algorithm to make it suitable for flexible aeroelastic systems: 1) we introduce a formulation to handle algebraic equations, and 2) we develop an interpolation scheme to smoothly connect several linear DMDc models developed in different operating regimes. Thus, the innovation lies in accurately modeling the nonlinearities of the coupled aerostructural dynamics over multiple operating regimes, rather than restricting the validity of the model to a narrow region around a linearization point. We demonstrate this approach on a high-fidelity, three-dimensional numerical model of an airborne wind energy (AWE) system, although the methods are generally applicable to any highly coupled aeroelastic system or dynamical system operating over multiple operating regimes. Our proposed modeling framework results in real-time prediction of nonlinear unsteady aeroelastic responses of flexible aerospace structures, and we demonstrate the enhanced model performance for model predictive control. Thus, the proposed architecture may help enable the widespread adoption of next-generation morphing wing technologies.
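For orientation, the baseline DMDc regression that the abstract above extends can be written compactly. This is a sketch of the standard formulation only; the symbols X, X', Υ, Ω and G are notation assumed here, and neither the algebraic-equation formulation nor the interpolation scheme from the abstract is shown:

```latex
% Linear model with control, fitted from snapshot data
x_{k+1} \approx A x_k + B u_k

% Stack m snapshot pairs and inputs column-wise
X  = [x_1, \dots, x_m], \quad
X' = [x_2, \dots, x_{m+1}], \quad
\Upsilon = [u_1, \dots, u_m]

% Least-squares estimate via the pseudoinverse of the stacked data
X' \approx \begin{bmatrix} A & B \end{bmatrix}
           \begin{bmatrix} X \\ \Upsilon \end{bmatrix}
   = G \, \Omega,
\qquad
G = X' \, \Omega^{\dagger}
```

In practice the pseudoinverse is usually computed from a truncated singular value decomposition of Ω, which also provides the reduced-order basis.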

arXiv.org e-Print Archive

PubMed Central

Spiral - Imperial College Digital Repository


Bayesian Learning for Data-Efficient Control

Author: McAllister Rowan
Publication venue: University of Cambridge
Publication date: 28/04/2017

Applications to learn control of unfamiliar dynamical systems with increasing autonomy are ubiquitous. From robotics, to finance, to industrial processing, autonomous learning helps obviate a heavy reliance on experts for system identification and controller design. Often real-world systems are nonlinear, stochastic, and expensive to operate (e.g. slow, energy intensive, prone to wear and tear). Ideally, therefore, nonlinear systems can be identified with minimal system interaction. This thesis considers data-efficient autonomous learning of control of nonlinear, stochastic systems. Data-efficient learning critically requires probabilistic modelling of dynamics. Traditional control approaches use deterministic models, which easily overfit data, especially small datasets. We use probabilistic Bayesian modelling to learn systems from scratch, similar to the PILCO algorithm, which achieved unprecedented data efficiency in learning control of several benchmarks. We extend PILCO in three principal ways. First, we learn control under significant observation noise by simulating a filtered control process using a tractably analytic framework of Gaussian distributions. In addition, we develop the 'latent variable belief Markov decision process' when filters must predict under real-time constraints. Second, we improve PILCO's data efficiency by directing exploration with predictive loss uncertainty and Bayesian optimisation, including a novel approximation to the Gittins index. Third, we take a step towards data-efficient learning of high-dimensional control using Bayesian neural networks (BNN). Experimentally, we show that although filtering mitigates adverse effects of observation noise, much greater performance is achieved when optimising controllers with evaluations faithful to reality: by simulating closed-loop filtered control if executing closed-loop filtered control. Thus, controllers are optimised w.r.t. how they are used, outperforming filters applied to systems optimised by unfiltered simulations. We show directed exploration improves data efficiency. Lastly, we show BNN dynamics models are almost as data efficient as Gaussian process models. Results show data-efficient learning of high-dimensional control is possible as BNNs scale to high-dimensional state inputs.

Apollo (Cambridge)

Guidance, navigation and control of multirotors

Author: Rubí Perelló Bertomeu
Publication venue: Universitat Politècnica de Catalunya
Publication date: 11/12/2020

Embargo applied from the defence date until 31 December 2021.
This thesis presents contributions to the Guidance, Navigation and Control (GNC) systems for multirotor vehicles by applying and developing diverse control techniques and machine learning theory with innovative results. The aim of the thesis is to obtain a GNC system able to make the vehicle follow predefined paths while avoiding obstacles in the vehicle's route. The system must be adaptable to different paths, situations and missions, reducing the tuning effort and parametrisation of the proposed approaches. The multirotor platform, formed by the Asctec Hummingbird quadrotor vehicle, is studied and described in detail. A complete mathematical model is obtained and a freely available and open simulation platform is built. Furthermore, an autopilot controller is designed and implemented in the real platform. The control part is focused on the path following problem, that is, following a predefined path in space without any time constraint. Diverse control-oriented and geometrical algorithms are studied, implemented and compared. Then, the geometrical algorithms are improved by obtaining adaptive approaches that do not need any parameter tuning. The adaptive geometrical approaches are developed by means of neural networks. Finally, a deep reinforcement learning approach is developed to solve the path following problem. This approach implements the Deep Deterministic Policy Gradient algorithm. The resulting agent is trained in a realistic multirotor simulator and tested successfully in real experiments. The proposed approach is able to accurately follow a path while adapting the vehicle's velocity depending on the path's shape. In the navigation part, an obstacle detection system based on a LIDAR sensor is implemented. A model of the sensor is derived and included in the simulator. Moreover, an approach for treating the sensor data to eliminate possible ground detections is developed. The guidance part is focused on the reactive path planning problem, that is, a path planning algorithm that is able to re-plan the trajectory online if an unexpected event occurs, such as detecting an obstacle in the vehicle's route. A deep reinforcement learning approach for the reactive obstacle avoidance problem is developed. This approach also implements the Deep Deterministic Policy Gradient algorithm. The developed deep reinforcement learning agent is trained and tested in the realistic simulation platform. This agent is combined with the path following agent and the rest of the elements developed in the thesis, obtaining a GNC system that is able to follow different types of paths while avoiding obstacles in the vehicle's route.
Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Piecewise constant model predictive control for autonomous helicopters

Authors: Cunjia Liu; J.D. Andrews; Wen-Hua Chen
Publication date: 01/01/2011

This paper introduces an optimisation-based control framework for autonomous helicopters. The framework contains a high-level model predictive controller (MPC) and a low-level linear controller. The proposed MPC works in a piecewise constant fashion to reduce the computational burden and to increase the time available for performing online optimisation. The linear feedback controller responds to the fast dynamics of the helicopter and compensates for the low bandwidth of the high-level controller. This configuration allows the computationally intensive algorithm to be applied to systems with fast dynamics. The stability issues of the high-level MPC and the overall control scheme are discussed. Simulations and flight tests on a small-scale helicopter are carried out to verify the proposed control scheme.

Loughborough University Institutional Repository

A hybrid model predictive control scheme for energy and cost savings in commercial buildings: simulation and experiment

Authors: Chen L.; Hu E.; Huang H.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

This paper presents a hybrid model predictive control (MPC) scheme for energy-saving control in commercial buildings. The proposed method combines a linear MPC with a neural network feedback linearisation (NNFL) method. The control model for the linear MPC is developed using a simplified physical model, while nonlinearities associated with the building system are handled by an affine recurrent neural network (ARNN) model through system feedback. The proposed MPC integrates several advanced air-conditioning control strategies, such as economizer control, optimal start-stop control, and pre-cooling control. The developed MPC has been tested in the check-in hall of the T-1 building, Adelaide Airport, through both simulation and field experiment. The results show that the proposed control scheme can achieve a considerable amount of savings without violating occupants' thermal comfort.

Crossref

Adelaide Research & Scholarship

Iterative learning control of crystallisation systems

Author: Nahid Sanzida
Publication date: 01/01/2014

Under increasing pressure to reduce time to market, lower production costs, and improve flexibility of operation, batch process industries strive towards the production of high-value-added commodities, i.e. specialty chemicals, pharmaceuticals, agricultural, and biotechnology-enabled products. For better design, consistent operation and improved control of batch chemical processes, one cannot ignore the sensing and computational benefits provided by modern sensors, computers, algorithms, and software. In addition, there is a growing demand for modelling and control tools based on process operating data. This study is focused on developing process operation data-based iterative learning control (ILC) strategies for batch processes, more specifically for batch crystallisation systems. To this end, the research first explored the existing control strategies, fundamentals, mechanisms, and various process analytical technology (PAT) tools used in batch crystallisation control. Building on this background study, an operating data-driven ILC approach was developed to improve the product quality from batch to batch. The concept of ILC is to exploit the repetitive nature of batch processes to automate recipe updating using process knowledge obtained from previous runs. The methodology was based on a linear time-varying (LTV) perturbation model in an ILC framework to provide a convergent batch-to-batch improvement of the process performance indicator. As a novel contribution, a hierarchical ILC (HILC) scheme was proposed for the systematic design of the supersaturation control (SSC) of a seeded batch cooling crystalliser. This model-free control approach is implemented in a hierarchical structure by assigning a data-driven supersaturation controller to the upper level and a simple temperature controller to the lower level.
To cover other data-based control approaches for crystallisation processes, the study revisited the existing direct nucleation control (DNC) approach. This part was devoted to a detailed investigation of different possible structures of DNC and, for the first time, to comparing the results with those of a first-principles model-based optimisation. The DNC results in fact outperformed the model-based optimisation approach and established a guideline for selecting the preferable DNC structure. Batch chemical processes are distributed as well as nonlinear in nature, and need to be operated over a wide range of operating conditions and often near the boundary of the admissible region. As linear lumped model predictive controllers (MPCs) are often subject to severe performance limitations, there is a growing demand for simple data-driven nonlinear control strategies for batch crystallisers that consider the spatio-temporal aspects. In this study, an operating data-driven polynomial chaos expansion (PCE) based nonlinear surrogate modelling and optimisation strategy was presented for batch crystallisation processes. Model validation and optimisation results confirmed this approach as a promising route to nonlinear control. The proposed data-based methodologies were evaluated through simulation case studies, laboratory experiments and industrial pilot plant experiments. For all the simulation case studies, detailed mathematical models covering reaction kinetics and heat and mass balances were developed for a batch cooling crystallisation system of Paracetamol in water. Based on these models, rigorous simulation programs were developed in MATLAB®, which were then treated as the real batch cooling crystallisation system. The laboratory experimental work was carried out using a lab-scale system of Paracetamol and iso-Propyl alcohol (IPA).
All the experimental work, including the qualitative and quantitative monitoring of the crystallisation experiments and products, made use of various in situ process analytical technology (PAT) tools, such as focused beam reflectance measurement (FBRM), UV/Vis spectroscopy and particle vision measurement (PVM). The industrial pilot-scale study was carried out at GlaxoSmithKline Bangladesh Limited, Bangladesh, and the experimental system was Paracetamol and other powdered excipients used to make paracetamol tablets. The methodologies presented in this thesis provide a comprehensive framework for data-based dynamic optimisation and control of crystallisation processes. All the simulation and experimental evaluations of the proposed approaches emphasised the potential of data-driven techniques to provide considerable advances in the current state of the art in crystallisation control.
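The batch-to-batch idea described above (reuse the tracking error from the previous run to update the next run's input profile) can be sketched with a generic P-type ILC update. This is a minimal illustration only, not the LTV perturbation-model or hierarchical scheme of the thesis: the first-order plant in `run_batch`, its parameters `a` and `b`, the learning `gain`, and the function names are all hypothetical.

```python
def run_batch(u, a=0.5, b=1.0):
    """Simulate one batch of a hypothetical plant.

    Assumed dynamics: x[t+1] = a * x[t] + b * u[t], with x[0] = 0,
    and the measured output at step t taken as x[t+1].
    """
    x, y = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        y.append(x)
    return y


def ilc(reference, n_batches=40, gain=0.5):
    """P-type iterative learning control over repeated batches.

    After each batch, every input sample is corrected by the tracking
    error observed at the same time step: u_next[t] = u[t] + gain * e[t].
    Returns the final input profile and the per-batch maximum error.
    """
    u = [0.0] * len(reference)
    errors = []
    for _ in range(n_batches):
        y = run_batch(u)
        e = [r - y_t for r, y_t in zip(reference, y)]
        errors.append(max(abs(e_t) for e_t in e))
        # Recipe update for the next batch, using only data from this batch.
        u = [u_t + gain * e_t for u_t, e_t in zip(u, e)]
    return u, errors


if __name__ == "__main__":
    ramp = [0.1 * (t + 1) for t in range(10)]  # illustrative reference profile
    _, history = ilc(ramp)
    print("first-batch error:", history[0])
    print("last-batch error:", history[-1])
```

The update uses no plant model, only the measured error, which is why the assumed gain has to be chosen conservatively; for this particular plant a gain of 0.5 makes the error shrink from batch to batch.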

Loughborough University Institutional Repository

Development of U-model enhanced nonlinear systems

Author: Liu Xin

Nonlinear control system design has been widely recognised as a challenging issue where the key objective is to develop a general model prototype with conciseness, flexibility and manipulability, so that the designed control system can best match the required performance or specifications. As a generic systematic approach, the U-model concept appeared in Prof. Quanmin Zhu's doctoral thesis, and the U-model approach was first published in the journal paper titled 'U-model based pole placement for nonlinear plants' in 2002. The U-model polynomial prototype precisely describes a wide range of smooth nonlinear polynomial models, defined as controller output u(t-1) based time-varying polynomial models converted from the original nonlinear model. Within this equivalent U-model expression, the first study of U-model based pole placement controller design for nonlinear plants is a simple mapping exercise from ordinary linear and nonlinear difference equations to time-varying polynomials in terms of the plant input u(t-1). The U-model framework enables concise and applicable design of nonlinear control systems by using such linear polynomial control system design approaches. Since the first publication, the U-model methodology has progressed and evolved over the course of a decade. By using the U-model technique, researchers have proposed many different linear algorithms for the design of control systems for the nonlinear polynomial model, including adaptive control, internal control, sliding mode control, predictive control and neural network control. However, limited research has been concerned with the design and analysis of robust stability and performance of U-model based control systems. This project first proposes a suitable method to analyse the robust stability of the developed U-model based pole placement control systems against uncertainty.
The parameter variation is bounded, thus the robust stability margin of the closed-loop system can be determined by using an LMI (Linear Matrix Inequality) based robust stability analysis procedure. The U-block model is defined as an input-output linear closed-loop model with pole assignor, converted from the U-model based control system. Through the bridge of the U-model approach, it connects the linear state space design approach with the nonlinear polynomial model. Therefore, LMI-based linear robust controller design approaches can be used to design enhanced robust control systems within the U-block model structure. With such development, the first-stage U-model methodology provides concise and flexible solutions for complex problems, where linear controller design methodologies are directly applied to nonlinear polynomial plant-based control system design. The next milestone work expands the U-model technique into state space control systems to establish a new framework, defined as the U-state space model, providing a generic prototype for the simplification of nonlinear state space design approaches. The U-state space model is first described as controller output u(t-1) based time-varying state equations, which are equivalent to the original linear/nonlinear state space models after conversion. Then, a basic idea for a corresponding U-state feedback control system design method is proposed based on the U-model principle. The linear state space feedback control design approach is applied to nonlinear plants described in state space realisation under the U-state space structure. The desired state vectors, defined as xd(t), are determined by closed-loop performance (such as pole placement) or designer specifications (such as LQR). The desired state vectors are then substituted into the original state space equations (regarded as the next-time state variable, xd(t) = x(t)).
Therefore, the controller output u(t-1) can be obtained as one of the roots found by an iterative root-solving algorithm. A quad-rotor rotorcraft dynamic model and an inverted pendulum system are introduced to verify the U-state space control system design approach for MIMO/SIMO systems. The linear design approach is used to determine the closed-loop state equation, then the controller output can be obtained from the root solver. Numerical examples and case studies are employed in this study to demonstrate the effectiveness of the proposed methods.
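The root-solving step mentioned above can be illustrated with a small sketch. It assumes, for illustration only, that at a given time step the plant output is a polynomial in the input u(t-1) with known coefficients, and that Newton-Raphson is the iterative solver; the function names, the example coefficients and the starting guess are hypothetical and not taken from the thesis.

```python
def poly_eval(coeffs, u):
    """Evaluate sum_j coeffs[j] * u**j using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * u + c
    return result


def poly_derivative(coeffs):
    """Coefficients of the derivative polynomial with respect to u."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def solve_control(coeffs, target, u0=0.0, tol=1e-10, max_iter=50):
    """Find u such that poly(u) = target by Newton-Raphson iteration.

    coeffs plays the role of the time-varying U-model coefficients at
    one time step; target is the desired plant output for that step.
    Only one root is returned, the one reached from the guess u0.
    """
    d_coeffs = poly_derivative(coeffs)
    u = u0
    for _ in range(max_iter):
        residual = poly_eval(coeffs, u) - target
        if abs(residual) < tol:
            return u
        slope = poly_eval(d_coeffs, u)
        if slope == 0.0:
            raise ZeroDivisionError("zero slope at current iterate")
        u -= residual / slope
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Hypothetical step: y = 0.5 + 1.0 * u + 0.2 * u**2, desired output 1.7.
    u = solve_control([0.5, 1.0, 0.2], target=1.7)
    print("control input:", u)
```

In a control loop the coefficients would be recomputed from past inputs and outputs at every step, and the previous input is a natural starting guess for the next solve.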

UWE Bristol Research Repository