46,504 research outputs found

    Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems

    This paper presents a novel method of global adaptive dynamic programming (ADP) for the adaptive optimal control of nonlinear polynomial systems. The strategy consists of relaxing the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is solved via a new policy iteration method. The proposed method is distinguished from previously known nonlinear ADP methods in that neural network approximation is avoided, giving rise to a significant computational improvement. Moreover, instead of being only semiglobally or locally stabilizing, the resultant control policy is globally stabilizing for a general class of nonlinear polynomial systems. Furthermore, in the absence of a priori knowledge of the system dynamics, an online learning method is devised to implement the proposed policy iteration technique by generalizing current ADP theory. Finally, three numerical examples are provided to validate the effectiveness of the proposed method.
    Comment: This is an updated version of the publication "Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems," IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2917-2929, Nov. 2015. A few typos have been fixed in this version.
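    The policy iteration at the heart of such ADP schemes is easiest to see in the linear-quadratic special case, where it reduces to Kleinman's algorithm: repeatedly evaluate the current gain by solving a Lyapunov equation, then improve the gain from the resulting cost matrix. The sketch below is a minimal illustration of that idea; the plant matrices and initial stabilizing gain are hypothetical, and the paper itself treats general polynomial systems without this linear structure.

```python
import numpy as np

def lyap(Ac, M):
    """Solve Ac.T @ P + P @ Ac + M = 0 by Kronecker vectorization."""
    n = Ac.shape[0]
    L = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    return np.linalg.solve(L, -M.flatten(order="F")).reshape((n, n), order="F")

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Kleinman iteration: policy evaluation (Lyapunov) + policy improvement."""
    K = K0
    for _ in range(iters):
        P = lyap(A - B @ K, Q + K.T @ R @ K)   # evaluate cost of current gain K
        K = np.linalg.solve(R, B.T @ P)        # improve: K = R^{-1} B^T P
    return K, P

# Hypothetical unstable second-order plant with a known stabilizing K0.
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[0.0, 4.0]])                    # A - B K0 has eigenvalues -1, -1
K, P = policy_iteration(A, B, Q, R, K0)
```

    At convergence, P solves the algebraic Riccati equation and K is the optimal gain; the model-free variants in the paper estimate the same quantities from trajectory data instead of (A, B).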

    Continuous-Time Robust Dynamic Programming

    This paper presents a new theory, known as robust dynamic programming, for a class of continuous-time dynamical systems. Different from traditional dynamic programming (DP) methods, this new theory serves as a fundamental tool to analyze the robustness of DP algorithms, and in particular, to develop novel adaptive optimal control and reinforcement learning methods. In order to demonstrate the potential of this new framework, four illustrative applications in the fields of stochastic optimal control and adaptive DP are presented. Three numerical examples arising from both the finance and engineering industries are also given, along with several possible extensions of the proposed framework.

    Off-policy reinforcement learning for H∞ control design

    The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used to approximately solve the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the policy being evaluated, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN) based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.
    Comment: Accepted by IEEE Transactions on Cybernetics; online available.

    Leader-Based Optimal Coordination Control for the Consensus Problem of Multiagent Differential Games via Fuzzy Adaptive Dynamic Programming

    In this paper, a new online scheme is presented to design the optimal coordination control for the consensus problem of multi-agent differential games by fuzzy adaptive dynamic programming (FADP), which brings together game theory, the generalized fuzzy hyperbolic model (GFHM), and adaptive dynamic programming. In general, the optimal coordination control for multi-agent differential games is the solution of the coupled Hamilton-Jacobi (HJ) equations. Here, for the first time, GFHMs are used to approximate the solutions (value functions) of the coupled HJ equations, based on the policy iteration (PI) algorithm. Namely, for each agent, a GFHM is used to capture the mapping between the local consensus error and the local value function. Since our scheme uses a single-network architecture for each agent (which eliminates the action network of the dual-network architecture), it is a more reasonable architecture for multi-agent systems. Furthermore, the approximate solution is utilized to obtain the optimal coordination controls. Finally, we give the stability analysis for our scheme, and prove that the weight estimation error and the local consensus error are uniformly ultimately bounded. Further, the control node trajectory is proven to be cooperative uniformly ultimately bounded.
    Comment: 10 pages, 4 figures.

    Data-based approximate policy iteration for nonlinear continuous-time optimal control design

    This paper addresses the model-free nonlinear optimal control problem with a generalized cost functional, and a data-based reinforcement learning technique is developed. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated for an accurate mathematical model to be established. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method that uses real system data rather than a system model. First, a model-free policy iteration algorithm is derived for the constrained optimal control problem and its convergence is proved; the algorithm can learn the solution of the HJB equation and the optimal control policy without requiring any knowledge of the system's mathematical model. The implementation of the algorithm is based on an actor-critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The whole data-based API method includes two parts: the first part is implemented online to collect real system information, and the second part conducts offline policy iteration to learn the solution of the HJB equation and the control policy. Then, the data-based API algorithm is simplified for solving the unconstrained optimal control problem of nonlinear and linear systems. Finally, we test the efficiency of the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.
    Comment: 22 pages, 21 figures, submitted for peer review.
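    The least-squares critic update based on weighted residuals can be illustrated in its simplest batch form: collect state samples, evaluate a basis, and solve for the critic weights that minimize the squared residual over the data. The snippet below is a toy sketch under assumed data; the value function, quadratic basis, and sample set are hypothetical, and it fits targets directly rather than the paper's Bellman-type residual.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 2))              # collected state samples
V = lambda S: S[:, 0] ** 2 + 0.5 * S[:, 1] ** 2   # hypothetical cost-to-go targets
y = V(X)

# Quadratic critic basis phi(x) = [x1^2, x1*x2, x2^2]; fit weights by
# least squares, i.e. minimize || Phi @ w - y ||^2 over the data set.
Phi = np.stack([X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2], axis=1)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # recovers w = [1.0, 0.0, 0.5]
```

    Because the target lies in the span of the basis, the fit is exact here; in the paper's setting the same normal-equations machinery is applied to residuals formed from measured trajectories.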

    State Following (StaF) Kernel Functions for Function Approximation

    A function approximation method is developed that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that there is a bound on the number of kernel functions required to maintain an accurate function approximation as a state moves through a compact set. Additionally, a weight update law based on gradient descent is introduced, under which arbitrarily close accuracy can be achieved provided the update law is iterated at a sufficient frequency, as detailed in Theorem 6.1. A key practical advantage of the StaF method is that, for some applications, the number of basis functions can be reduced. The StaF method is applied to an adaptive dynamic programming (ADP) application to demonstrate that stability is maintained with a reduced number of basis functions. Simulation results demonstrate the utility of the StaF methodology for maintaining an accurate function approximation as well as for solving an infinite-horizon optimal regulation problem through ADP. The results indicate that fewer basis functions are required to guarantee stability and approximate optimality than when a global approximation approach is used.
    Comment: 24 pages.
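    The core StaF idea — kernel centers that travel with the state, with weights adapted online by gradient descent — can be sketched in a few lines. Everything below is a hypothetical illustration (the target function, the circular state trajectory, the three kernel offsets, and the learning rate are all invented for the demo), not the paper's construction or its Theorem 6.1 conditions.

```python
import numpy as np

def staf_phi(x, centers, width=1.0):
    """Gaussian kernels evaluated at x; centers are re-placed around x each step."""
    return np.exp(-(np.linalg.norm(centers - x, axis=1) / width) ** 2)

f = lambda x: np.sin(x[0]) + 0.5 * x[1] ** 2       # hypothetical target function
offsets = np.array([[0.3, 0.0], [-0.15, 0.26], [-0.15, -0.26]])  # 3 local centers

w, lr, errs = np.zeros(3), 0.5, []
for t in np.linspace(0.0, 2.0 * np.pi, 500):
    x = np.array([np.cos(t), np.sin(t)])           # state moving on a circle
    phi = staf_phi(x, x + offsets)                 # StaF: centers follow the state
    e = w @ phi - f(x)                             # local approximation error
    w -= lr * e * phi                              # gradient-descent weight update
    errs.append(abs(e))
```

    Only three moving kernels track the function along the whole trajectory, which is the motivation for StaF: local accuracy near the current state with far fewer basis functions than a global approximation would need.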

    Verification for Machine Learning, Autonomy, and Neural Networks Survey

    This survey presents an overview of verification techniques for autonomous systems, with a focus on safety-critical autonomous cyber-physical systems (CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances in artificial intelligence (AI) and machine learning (ML), through approaches such as deep neural networks (DNNs) embedded in so-called learning-enabled components (LECs) that accomplish tasks from classification to control. Recently, the formal methods and formal verification community has developed methods to characterize behaviors in these LECs, with the eventual goal of formally verifying specifications for LECs, and this article presents a survey of many of these recent approaches.

    Transcription Methods for Trajectory Optimization: a beginners tutorial

    This report is an introduction to transcription methods for trajectory optimization. The first few sections describe the two classes of transcription methods (shooting & simultaneous) that are used to convert the trajectory optimization problem into a general constrained optimization form. The middle of the report discusses a few extensions to the basic methods, including how to deal with hybrid systems (such as walking robots). The final section goes over a variety of implementation details.
    Comment: 14 pages, 9 figures.
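    As a minimal illustration of the shooting class of transcriptions, in the sketch below the decision variables are the sampled controls of a double integrator; the state trajectory is recovered by forward (Euler) simulation, and reaching the target state becomes an equality constraint of a standard NLP. The dynamics, horizon, and solver choice are illustrative assumptions, not the report's examples.

```python
import numpy as np
from scipy.optimize import minimize

N, dt = 20, 0.1
x0, xf = np.array([0.0, 0.0]), np.array([1.0, 0.0])   # start / target (pos, vel)

def simulate(u):
    """Single shooting: roll the double integrator forward under controls u."""
    x = x0.copy()
    for uk in u:
        x = x + dt * np.array([x[1], uk])             # pos' = vel, vel' = u
    return x

# Transcribed NLP: minimize control effort subject to hitting the target state.
res = minimize(lambda u: dt * np.sum(u ** 2),
               np.zeros(N),
               constraints={"type": "eq", "fun": lambda u: simulate(u) - xf})
```

    Simultaneous (collocation) methods instead keep the states as decision variables and impose the dynamics as defect constraints at each interval, trading a larger but sparser NLP for better conditioning.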

    Particle Swarm Optimization: A survey of historical and recent developments with hybridization perspectives

    Particle Swarm Optimization (PSO) is a metaheuristic global optimization paradigm that has gained prominence in the last two decades due to its ease of application to unsupervised, complex multidimensional problems that cannot be solved using traditional deterministic algorithms. The canonical particle swarm optimizer is based on the flocking behavior and social cooperation of bird flocks and fish schools, and draws heavily from the evolutionary behavior of these organisms. This paper provides a thorough survey of the PSO algorithm with special emphasis on the development, deployment, and improvements of its most basic as well as some of the state-of-the-art implementations. Concepts and directions on choosing the inertia weight, constriction factor, and cognition and social weights, and perspectives on convergence, parallelization, elitism, niching, and discrete optimization, as well as neighborhood topologies, are outlined. Hybridization attempts with other evolutionary and swarm paradigms in selected applications are covered, and an up-to-date review is put forward for the interested reader.
    Comment: 34 pages, 7 tables.
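    A canonical inertia-weight PSO is only a few lines: each particle's velocity mixes its momentum with attractions toward its personal best and the swarm's global best, scaled by the cognition and social weights. The sketch below minimizes a sphere function; all parameter values are typical defaults from the literature, not prescriptions.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Canonical PSO: inertia weight w, cognition weight c1, social weight c2."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))              # particle velocities
    pbest = x.copy()                              # personal best positions
    pbest_val = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()      # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best_x, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```

    The constriction-factor variant surveyed in the paper replaces the fixed inertia weight with a coefficient derived from c1 + c2 to guarantee convergence of the velocity update.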

    An Extremum-Seeking Co-Simulation Based Framework for Passivation Theory and its Application in Adaptive Cruise Control Systems

    In this report, we apply an input-output transformation passivation method, described in our previous works, to an adaptive cruise control system. We analyze the system's performance under a co-simulation framework that uses an online optimization method called extremum seeking to achieve the optimized behavior. The passivation matrix encompasses the commonly used series, feedback, and feed-forward interconnection methods for passivating a system. We have previously shown that passivity levels can be guaranteed for a system using our passivation method. In this work, an extremum-seeking algorithm is used to determine the passivation parameters. It is known that systems with input-output time delays are not passive. On the other hand, time delays are unavoidable in automotive systems and commonly arise in software implementations and communication units, as well as in driver behavior. We show that by using our passivation method, we can passivate the system and improve its overall performance. Our simulation examples in CarSim and Simulink show that the passive system has a considerably better performance.
    Comment: 39 pages, 18 figures; Technical Report at the University of Notre Dame; American Control Conference (ACC).
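    Classical perturbation-based extremum seeking — the kind of online optimization referred to above — injects a sinusoidal dither, demodulates the measured cost to estimate the gradient, and integrates that estimate. The static cost map and tuning constants below are hypothetical, chosen only so the loop converges in a short simulation; the report tunes passivation parameters rather than this toy scalar.

```python
import numpy as np

J = lambda th: (th - 2.0) ** 2 + 1.0        # hypothetical cost, minimum at th = 2

dt, a, omega, k = 1e-3, 0.2, 50.0, 0.5      # step, dither amp/freq, adaptation gain
theta = 0.0
for i in range(20000):                       # 20 s of simulated time
    t = i * dt
    y = J(theta + a * np.sin(omega * t))     # measured (perturbed) cost
    grad = (2.0 / a) * y * np.sin(omega * t) # demodulated gradient estimate
    theta -= k * grad * dt                   # averaged dynamics: theta' ≈ -k J'(theta)
```

    Averaging theory says the slow dynamics of theta follow gradient descent on J, up to a small ripple of order a and k/omega, which is why the dither frequency is chosen well above the adaptation bandwidth.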