133 research outputs found

    Model-based Reinforcement Learning of Nonlinear Dynamical Systems

    Model-based Reinforcement Learning (MBRL) techniques accelerate the learning task by employing a transition model to make predictions. In this dissertation, we present novel techniques for online learning of unknown dynamics by iteratively computing a feedback controller based on the most recent update of the model. Assuming a structured continuous-time model of the system in terms of a set of bases, we formulate an infinite horizon optimal control problem addressing a given control objective. The structure of the system along with a value function parameterized in the quadratic form provides flexibility in analytically calculating an update rule for the parameters. Hence, a matrix differential equation of the parameters is obtained, where the solution is used to characterize the optimal feedback control in terms of the bases, at any time step. Moreover, the quadratic form of the value function suggests a compact way of updating the parameters that considerably decreases the computational complexity. In the convergence analysis, we demonstrate asymptotic stability and optimality of the obtained learning algorithm around the equilibrium by revealing its connections with the analogous Linear Quadratic Regulator (LQR). Moreover, the results are extended to the trajectory tracking problem. Assuming a structured unknown nonlinear system augmented with the dynamics of a commander system, we obtain a control rule minimizing a given quadratic tracking objective function. Furthermore, in an alternative technique for learning, a piecewise nonlinear affine framework is developed for controlling nonlinear systems with unknown dynamics. Therefore, we extend the results to obtain a general piecewise nonlinear framework where each piece is responsible for locally learning and controlling over some partition of the domain. 
Then, we consider the Piecewise Affine (PWA) system with a bounded uncertainty as a special case, for which we suggest an optimization-based verification technique. Accordingly, given a discretization of the learned PWA system, we iteratively search for a common piecewise Lyapunov function in a set of positive definite functions, where non-monotonic convergence is allowed. Then, this Lyapunov candidate is verified for the uncertain system. To demonstrate the applicability of the approaches presented in this dissertation, simulation results on benchmark nonlinear systems, such as a quadrotor and a vehicle, are included. Moreover, as another detailed application, we investigate the Maximum Power Point Tracking (MPPT) problem of solar Photovoltaic (PV) systems. To this end, we develop an analytical nonlinear optimal control approach that assumes a known model. Then, we apply the obtained nonlinear optimal controller together with the piecewise MBRL technique presented previously.
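The abstract above relates its learning algorithm to the analogous Linear Quadratic Regulator (LQR) and describes a matrix differential equation for the value-function parameters. As a rough illustration only, the sketch below integrates the textbook scalar Riccati differential equation for a known linear system and compares its steady state with the closed-form algebraic Riccati solution. It is not the dissertation's update rule: the system `a`, `b`, the weights `q`, `r`, the function name `riccati_flow`, and the forward-Euler integration are all assumptions chosen for the example.

```python
import math


def riccati_flow(a, b, q, r, p0=0.0, dt=1e-3, steps=20000):
    """Integrate dp/dt = 2*a*p - (b**2 / r) * p**2 + q with forward Euler.

    This is the scalar Riccati differential equation for dx/dt = a*x + b*u
    with running cost q*x**2 + r*u**2 (an assumed, fully known model).
    """
    p = p0
    for _ in range(steps):
        p += dt * (2.0 * a * p - (b * b / r) * p * p + q)
    return p


# Hypothetical unstable scalar plant and unit weights.
a, b, q, r = 1.0, 1.0, 1.0, 1.0

p = riccati_flow(a, b, q, r)

# Positive root of the algebraic Riccati equation, for comparison.
p_are = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)

# Feedback gain u = -k * x and the resulting closed-loop pole a - b*k.
k = b * p / r
closed_loop_pole = a - b * k

print(p, p_are, closed_loop_pole)
```

In this scalar case the value function is V(x) = p x², so the quadratic parameterization reduces to a single number; the closed-loop pole is negative even though the open-loop plant is unstable.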

    Algorithm for Optimal Mode Scheduling in Switched Systems

    This paper considers the problem of computing the schedule of modes in a switched dynamical system that minimizes a cost functional defined on the trajectory of the system's continuous state variable. A recent approach to such optimal control problems consists of algorithms that alternate between computing the optimal switching times between modes in a given sequence, and updating the mode-sequence by inserting into it a finite number of new modes. These algorithms have an inherent inefficiency due to their sparse update of the mode-sequences, while spending most of the computing time on optimizing with respect to the switching times for a given mode-sequence. This paper proposes an algorithm that operates directly in the schedule space without resorting to the timing optimization problem. It is based on the Armijo step size along certain Gateaux derivatives of the performance functional, thereby avoiding some of the computational difficulties associated with discrete scheduling parameters. Its convergence to local minima as well as its rate of convergence are proved, and a simulation example on a nonlinear system exhibits fast convergence.
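The Armijo step size mentioned above can be illustrated in its most familiar setting, gradient descent on a smooth function of a finite-dimensional vector. The sketch below is only that generic rule; it does not operate in the schedule space or use Gateaux derivatives as the paper does. The function names, the parameters `alpha` and `beta`, and the quadratic test objective are assumptions made for the example.

```python
def armijo_step(f, grad, x, alpha=0.5, beta=0.5, max_halvings=50):
    """Return (step, new_point) for a steepest-descent move from x.

    The step is the largest beta**k, k = 0, 1, ..., that satisfies the
    Armijo sufficient-decrease test
        f(x - step * g) - f(x) <= -alpha * step * ||g||**2.
    """
    g = grad(x)
    slope = -sum(gi * gi for gi in g)  # directional derivative along -g
    fx = f(x)
    step = 1.0
    for _ in range(max_halvings):
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        if f(candidate) - fx <= alpha * step * slope:
            return step, candidate
        step *= beta
    return 0.0, list(x)  # no acceptable step found; stay put


def minimize(f, grad, x0, iterations=1000):
    """Repeat Armijo-scaled gradient steps a fixed number of times."""
    x = list(x0)
    for _ in range(iterations):
        _, x = armijo_step(f, grad, x)
    return x


# Hypothetical test objective with its minimum at (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_min = minimize(f, grad, [0.0, 0.0])
print(x_min)
```

The backtracking loop needs only function evaluations and one derivative per iteration, which is part of why the rule carries over to settings where the derivative is taken in a more general sense.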

    Dynamic Programming and Time-Varying Delay Systems

    This thesis is divided into two separate parts. The first part is about Dynamic Programming for non-trivial optimal control problems. The second part introduces some useful tools for analysis of stability and performance of systems with time-varying delays. The two papers presented in the first part attack optimal control problems with finite but rapidly increasing search space. In the first paper we try to reduce the complexity of the optimization by exploiting the structure of a certain problem. The result, if found, is an optimal solution. The second paper introduces a new general approach of relaxing the optimality constraint. The main contribution of the paper is an extension of the Bellman equality to a double inequality. This inequality is a sufficient condition for a suboptimal solution to be within a certain distance of the optimal solution. The main approach to solving the inequality in the paper is value iteration, which is shown to work well in many different applications. In the second part of the thesis, two analysis methods for systems with time-varying delays are presented in two papers. The first paper presents a set of simple graphical stability (and performance) criteria for the case where the delays are bounded but otherwise unknown. All that is needed to verify stability is a Bode diagram of the closed-loop system. For more exact computations, the last paper presents a toolbox for Matlab called Jitterbug. It calculates quadratic costs and power spectral densities of interconnected continuous-time and discrete-time linear systems. The main contribution of the toolbox is to make well-known theory easily applicable to the analysis of real-time systems.
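Value iteration, named above as the main solution approach, can be sketched for the exact Bellman equality on a tiny deterministic shortest-path problem. The relaxed double inequality proposed in the thesis is not reproduced here; the graph, its edge costs, and the function name `value_iteration` are assumptions invented for the illustration.

```python
def value_iteration(edges, goal, sweeps=100):
    """Iterate V(x) = min over successors of (cost + V(next)) to a fixed point.

    edges maps each node to a dict {successor: step_cost}; the goal node has
    cost-to-go zero and every other node starts at infinity.
    """
    nodes = set(edges) | {goal}
    for successors in edges.values():
        nodes |= set(successors)
    V = {n: (0.0 if n == goal else float("inf")) for n in nodes}
    for _ in range(sweeps):
        updated = dict(V)
        for node, successors in edges.items():
            if node == goal:
                continue
            # Bellman backup using the values from the previous sweep.
            updated[node] = min(cost + V[nxt] for nxt, cost in successors.items())
        if updated == V:  # fixed point reached
            break
        V = updated
    return V


# Hypothetical four-node graph with goal "D".
edges = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 2.0, "D": 6.0},
    "C": {"D": 1.0},
}
V = value_iteration(edges, "D")
print(V)
```

Each sweep costs time proportional to the number of edges, so the exact method becomes expensive as the state space grows; that growth is the motivation the abstract gives for relaxing the optimality requirement.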

    1-Bit processing based model predictive control for fractionated satellite missions

    In this thesis, a 1-bit processing based Model Predictive Control (OBMPC) structure is proposed for a fractionated satellite attitude control mission. Despite the appealing advantages of the MPC algorithm for constrained MIMO control applications, implementing the MPC algorithm onboard a small satellite is challenging due to the limited onboard resources. The proposed design is based on the 1-bit processing concept, which takes advantage of the affine relation between the 1-bit state feedback and multi-bit parameters to implement a multiplier-free MPC controller. As multipliers are the major power consumer in online optimization, the OBMPC structure is proven to be more efficient than the conventional MPC implementation in terms of power and circuit complexity. The system is digital in nature and is affected by quantization noise introduced by ΔΣ modulators. The stability issues and practical design criteria are also discussed in this work. Some other aspects are considered in this work to complete the control system. Firstly, the implementation of the OBMPC system relies on 1-bit state feedback. Hence, 1-bit sensing components are needed to implement the OBMPC system. While the ΔΣ modulator based Microelectromechanical Systems (MEMS) gyroscope is considered in this work, it is possible to apply this concept to other sensing components. Secondly, as the proposed attitude mission is based on the wireless inter-satellite link (ISL), a state estimator is required. However, conventional state estimators would once again introduce multi-bit signals and compromise the simple, direct implementation of the OBMPC controller. Therefore, a 1-bit state estimator is also designed in this work to satisfy the requirements of the proposed fractionated attitude control mission.
The simulation for the OBMPC is based on a 2U CubeSat model in a fractionated satellite structure, in which the payload and actuators are separated from the controller and controlled via the ISL. Performance analysis based on Matlab simulations and an FPGA implementation shows that the OBMPC is feasible for fractionated satellite missions and is advantageous over conventional MPC controllers.

    H∞ Suboptimal Tracking Control for Bilinear Power Converter Systems with Dynamic Feedback - Theory and Experiment

    In this thesis, bilinear power converters are considered that arise for state-averaged models in continuous conduction mode. Since such power converters are often not feedback linearizable with respect to the output to be controlled, they are an interesting and demanding class of control systems. One control objective for the considered system class is to include trajectory tracking in the system equations. With a state and input transformation into the so-called error system representation, where the error between real variables and reference variables is considered, the error system equations turn out to be time-varying. Another objective is to cope with disturbances, noise, parameter uncertainties, etc. Therefore, integral feedback is included in the feedback strategy, which leads to input-affine systems with a special structure due to the originally bilinear system equations. A slightly different strategy is a disturbance feedback approach. It addresses the same control objectives, is structurally similar to integral feedback and allows for more freedom in the choice of feedback design parameters. However, it is less general and requires online replanning of the reference trajectory. For state feedback design, we choose H∞ control with a quadratic performance functional since we want low control effort and want to keep the error of the output to be controlled small when disturbances appear. Finally, to address stability properties of the closed loop, integral Input-to-State Stability (iISS) theory is a good choice to cope with nonzero disturbances. In order to guarantee stability for the closed-loop system in the presence of disturbances, we link the solution of the nonlinear H∞ control problem with iISS. It is possible to derive conditions under which the suboptimal state feedback H∞ control problem for the bilinear power converter systems with integral feedback / disturbance feedback and trajectory tracking can be solved.
At the same time, it can be shown that the closed-loop system is iISS. To underline the generality of the approach, the obtained theory for bilinear power converter systems is extended to general bilinear systems and it is even possible to discuss the more demanding multiple-input case. Equipped with the required theory to solve the posed control problem, we address the experimental setup of a boost converter / DC motor system. Here, the control task is to track the angular velocity of the motor shaft and attenuate appearing load disturbances. Therefore, we implement disturbance feedback and prove boundedness of trajectories for the online replanning of the approximate trajectory generation method. Various experiments are presented in order to investigate the applicability of the approach.
This dissertation investigates bilinear power converter systems as they arise for state-averaged model equations in continuous conduction mode. Since a large number of these power converters are not input-state linearizable with respect to the controlled output, and are then often even non-minimum-phase, they belong to the class of systems that are difficult to control. One control objective for the system class under consideration is to account for reference trajectories for a desired output of the system model. To this end, a so-called error system is introduced that reflects the difference between actual quantities and reference quantities. Owing to the bilinearity of the original model equation, this error system is then time-varying. A further objective is the rejection of disturbances, measurement noise, model uncertainties, etc., which is usually handled by an integral component (I component for short) in the control law. An I component is a dynamic extension of the state equations and leads to an additional state.
So that the additional differential equation is not decoupled, a suitable input transformation must ensure that the integrator state appears in the control law. This, however, destroys the original bilinearity of the equations, so that in the end an input-affine system results, which nevertheless has a specific structure because of the bilinearity of the original system equations. An approach similar to the I component is the estimation and feedback of the disturbance, which pursues the same control objectives as the variant with the I component. Here, in contrast to the I component, the dynamic extension with the estimator again leads to a bilinear system equation. However, this approach is less general and requires replanning of the reference trajectories in real time, but offers more freedom in the choice of controller parameters for the closed loop. H∞ state feedback is chosen as the feedback strategy in order to reject disturbances with as little control effort as possible. At the same time, the error of the controlled output is to be kept small. Finally, to be able to study the stability of the closed loop for non-vanishing disturbances, so-called integral Input-to-State Stability (iISS) is used. As a result of the work, conditions can be formulated under which a suboptimal H∞ state feedback can be found. Under these conditions, the iISS property of the closed loop then follows immediately. The generality of the method is shown by the fact that the presented approach can even be extended to general bilinear systems with multiple inputs. The experimental example of a boost converter combined with a DC motor is then used to test the control design method.
The control task is to make the angular velocity of the motor shaft follow a given reference trajectory and to reject load disturbances. For this purpose, the variant of the dynamic extension based on disturbance feedback with trajectory replanning was used. The loop is closed with a suboptimal H∞ state feedback so that iISS can be guaranteed. In addition, boundedness is shown for the real-time generation of the trajectory replanning, which is made possible by an approximation method. A large number of experiments serve to examine the method more closely.

    Numerical Methods for Nonlinear Optimal Control Problems and Their Applications in Indoor Climate Control

    Efficiency, comfort, and convenience are three major aspects in the design of control systems for residential Heating, Ventilation, and Air Conditioning (HVAC) units. In this dissertation, we study optimization-based algorithms for HVAC control that minimize energy consumption while maintaining a desired temperature, or even human comfort, in a room. Our algorithm uses a Computational Fluid Dynamics (CFD) model, mathematically formulated using Partial Differential Equations (PDEs), to describe the interactions between temperature, pressure, and air flow. Our model allows us to naturally formulate problems such as controlling the temperature of a small region of interest within a room, or controlling the speed of the air flow at the vents, which are hard to describe using finite-dimensional Ordinary Differential Equation (ODE) models. Our results show that our HVAC control algorithms produce significant energy savings without a decrease in comfort. Also, we formulate a gradient-based estimation algorithm capable of reconstructing the states of doors in a building, as well as its temperature distribution, based on a floor plan and a set of thermostats. The estimation algorithm solves in real time a convection-diffusion CFD model for the air flow in the building as a function of its geometric configuration. We formulate the estimation algorithm as an optimization problem, and we solve it by computing the adjoint equations of our CFD model, which we then use to obtain the gradients of the cost function with respect to the flow's temperature and door states. We evaluate the performance of our method using simulations of a real apartment in the St. Louis area. Our results show that the estimation method is both efficient and accurate, establishing its potential for the design of smarter control schemes in the operation of high-performance buildings. The optimization problems we generate for HVAC system control and estimation are large-scale optimal control problems.
While some optimal control problems can be efficiently solved using algebraic or convex methods, most general forms of optimal control must be solved using memory-expensive numerical methods. In this dissertation we present theoretical formulations and corresponding numerical algorithms that can find optimal inputs for general dynamical systems by using direct methods. The results show these algorithms' performance and their potential for solving large-scale nonlinear optimal control problems in real time.