46,504 research outputs found

    Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems

    This paper presents a novel method of global adaptive dynamic programming (ADP) for the adaptive optimal control of nonlinear polynomial systems. The strategy consists of relaxing the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is solved via a new policy iteration method. The proposed method is distinguished from previously known nonlinear ADP methods in that neural network approximation is avoided, giving rise to a significant computational improvement. Moreover, instead of being only semiglobally or locally stabilizing, the resultant control policy is globally stabilizing for a general class of nonlinear polynomial systems. Furthermore, in the absence of a priori knowledge of the system dynamics, an online learning method is devised to implement the proposed policy iteration technique by generalizing current ADP theory. Finally, three numerical examples are provided to validate the effectiveness of the proposed method.
    Comment: This is an updated version of the publication "Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems," IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2917-2929, Nov. 2015. A few typos have been fixed in this version.
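    The policy iteration at the heart of such ADP schemes is easiest to see in the linear-quadratic special case, where it reduces to Kleinman's algorithm: repeatedly evaluate the current gain by solving a Lyapunov equation, then improve the gain from the resulting cost matrix. The sketch below is a minimal illustration of that idea; the plant matrices and initial stabilizing gain are hypothetical, and the paper itself treats general polynomial systems without this linear structure.

```python
import numpy as np

def lyap(Ac, M):
    """Solve Ac.T @ P + P @ Ac + M = 0 by Kronecker vectorization."""
    n = Ac.shape[0]
    L = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    return np.linalg.solve(L, -M.flatten(order="F")).reshape((n, n), order="F")

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Kleinman iteration: policy evaluation (Lyapunov) + policy improvement."""
    K = K0
    for _ in range(iters):
        P = lyap(A - B @ K, Q + K.T @ R @ K)   # evaluate cost of current gain K
        K = np.linalg.solve(R, B.T @ P)        # improve: K = R^{-1} B^T P
    return K, P

# Hypothetical unstable second-order plant with a known stabilizing K0.
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[0.0, 4.0]])                    # A - B K0 has eigenvalues -1, -1
K, P = policy_iteration(A, B, Q, R, K0)
```

    At convergence, P solves the algebraic Riccati equation and K is the optimal gain; the model-free variants in the paper estimate the same quantities from trajectory data instead of (A, B).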

    Continuous-Time Robust Dynamic Programming

    This paper presents a new theory, known as robust dynamic programming, for a class of continuous-time dynamical systems. Different from traditional dynamic programming (DP) methods, this new theory serves as a fundamental tool to analyze the robustness of DP algorithms, and in particular, to develop novel adaptive optimal control and reinforcement learning methods. In order to demonstrate the potential of this new framework, four illustrative applications in the fields of stochastic optimal control and adaptive DP are presented. Three numerical examples arising from both the finance and engineering industries are also given, along with several possible extensions of the proposed framework.

    Off-policy reinforcement learning for H∞ control design

    The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used to approximately solve the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the policy being evaluated, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN) based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant, and further applied to a rotational/translational actuator system.
    Comment: Accepted by IEEE Transactions on Cybernetics; online available.

    Leader-Based Optimal Coordination Control for the Consensus Problem of Multiagent Differential Games via Fuzzy Adaptive Dynamic Programming

    In this paper, a new online scheme is presented to design the optimal coordination control for the consensus problem of multi-agent differential games by fuzzy adaptive dynamic programming (FADP), which brings together game theory, the generalized fuzzy hyperbolic model (GFHM), and adaptive dynamic programming. In general, the optimal coordination control for multi-agent differential games is the solution of the coupled Hamilton-Jacobi (HJ) equations. Here, for the first time, GFHMs are used to approximate the solutions (value functions) of the coupled HJ equations, based on the policy iteration (PI) algorithm. Namely, for each agent, a GFHM is used to capture the mapping between the local consensus error and the local value function. Since our scheme uses a single-network architecture for each agent (which eliminates the action network of the dual-network architecture), it is a more reasonable architecture for multi-agent systems. Furthermore, the approximate solution is utilized to obtain the optimal coordination controls. Finally, we give the stability analysis for our scheme, and prove that the weight estimation error and the local consensus error are uniformly ultimately bounded. Further, the control node trajectory is proven to be cooperative uniformly ultimately bounded.
    Comment: 10 pages, 4 figures.

    Data-based approximate policy iteration for nonlinear continuous-time optimal control design

    This paper addresses the model-free nonlinear optimal control problem with a generalized cost functional, and a data-based reinforcement learning technique is developed. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated for an accurate mathematical model to be established. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method that uses real system data rather than a system model. First, a model-free policy iteration algorithm is derived for the constrained optimal control problem and its convergence is proved; the algorithm can learn the solution of the HJB equation and the optimal control policy without requiring any knowledge of the system's mathematical model. The implementation of the algorithm is based on an actor-critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The whole data-based API method includes two parts: the first part is implemented online to collect real system information, and the second part conducts offline policy iteration to learn the solution of the HJB equation and the control policy. Then, the data-based API algorithm is simplified for solving the unconstrained optimal control problem of nonlinear and linear systems. Finally, we test the efficiency of the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.
    Comment: 22 pages, 21 figures, submitted for peer review.
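    The least-squares critic update based on weighted residuals can be illustrated in its simplest batch form: collect state samples, evaluate a basis, and solve for the critic weights that minimize the squared residual over the data. The snippet below is a toy sketch under assumed data; the value function, quadratic basis, and sample set are hypothetical, and it fits targets directly rather than the paper's Bellman-type residual.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (200, 2))              # collected state samples
V = lambda S: S[:, 0] ** 2 + 0.5 * S[:, 1] ** 2   # hypothetical cost-to-go targets
y = V(X)

# Quadratic critic basis phi(x) = [x1^2, x1*x2, x2^2]; fit weights by
# least squares, i.e. minimize || Phi @ w - y ||^2 over the data set.
Phi = np.stack([X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2], axis=1)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # recovers w = [1.0, 0.0, 0.5]
```

    Because the target lies in the span of the basis, the fit is exact here; in the paper's setting the same normal-equations machinery is applied to residuals formed from measured trajectories.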

    State Following (StaF) Kernel Functions for Function Approximation

    A function approximation method is developed that aims to approximate a function in a small neighborhood of a state that travels within a compact set. The development is based on the theory of universal reproducing kernel Hilbert spaces over n-dimensional Euclidean space. Several theorems are introduced that support the development of this State Following (StaF) method. In particular, it is shown that there is a bound on the number of kernel functions required to maintain an accurate function approximation as a state moves through a compact set. Additionally, a weight update law based on gradient descent is introduced, under which arbitrarily close accuracy can be achieved provided the update law is iterated at a sufficient frequency, as detailed in Theorem 6.1. A key practical advantage of the StaF method is that, for some applications, the number of basis functions can be reduced. The StaF method is applied to an adaptive dynamic programming (ADP) application to demonstrate that stability is maintained with a reduced number of basis functions. Simulation results demonstrate the utility of the StaF methodology for maintaining an accurate function approximation as well as for solving an infinite-horizon optimal regulation problem through ADP. The results indicate that fewer basis functions are required to guarantee stability and approximate optimality than when a global approximation approach is used.
    Comment: 24 pages.
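    The core StaF idea — kernel centers that travel with the state, with weights adapted online by gradient descent — can be sketched in a few lines. Everything below is a hypothetical illustration (the target function, the circular state trajectory, the three kernel offsets, and the learning rate are all invented for the demo), not the paper's construction or its Theorem 6.1 conditions.

```python
import numpy as np

def staf_phi(x, centers, width=1.0):
    """Gaussian kernels evaluated at x; centers are re-placed around x each step."""
    return np.exp(-(np.linalg.norm(centers - x, axis=1) / width) ** 2)

f = lambda x: np.sin(x[0]) + 0.5 * x[1] ** 2       # hypothetical target function
offsets = np.array([[0.3, 0.0], [-0.15, 0.26], [-0.15, -0.26]])  # 3 local centers

w, lr, errs = np.zeros(3), 0.5, []
for t in np.linspace(0.0, 2.0 * np.pi, 500):
    x = np.array([np.cos(t), np.sin(t)])           # state moving on a circle
    phi = staf_phi(x, x + offsets)                 # StaF: centers follow the state
    e = w @ phi - f(x)                             # local approximation error
    w -= lr * e * phi                              # gradient-descent weight update
    errs.append(abs(e))
```

    Only three moving kernels track the function along the whole trajectory, which is the motivation for StaF: local accuracy near the current state with far fewer basis functions than a global approximation would need.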

    Verification for Machine Learning, Autonomy, and Neural Networks Survey

    This survey presents an overview of verification techniques for autonomous systems, with a focus on safety-critical autonomous cyber-physical systems (CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances in artificial intelligence (AI) and machine learning (ML), through approaches such as deep neural networks (DNNs) embedded in so-called learning-enabled components (LECs) that accomplish tasks from classification to control. Recently, the formal methods and formal verification community has developed methods to characterize behaviors in these LECs, with the eventual goal of formally verifying specifications for LECs, and this article presents a survey of many of these recent approaches.

    Transcription Methods for Trajectory Optimization: a beginners tutorial

    This report is an introduction to transcription methods for trajectory optimization. The first few sections describe the two classes of transcription methods (shooting & simultaneous) that are used to convert the trajectory optimization problem into a general constrained optimization form. The middle of the report discusses a few extensions to the basic methods, including how to deal with hybrid systems (such as walking robots). The final section goes over a variety of implementation details.
    Comment: 14 pages, 9 figures.
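    As a minimal illustration of the shooting class of transcriptions, in the sketch below the decision variables are the sampled controls of a double integrator; the state trajectory is recovered by forward (Euler) simulation, and reaching the target state becomes an equality constraint of a standard NLP. The dynamics, horizon, and solver choice are illustrative assumptions, not the report's examples.

```python
import numpy as np
from scipy.optimize import minimize

N, dt = 20, 0.1
x0, xf = np.array([0.0, 0.0]), np.array([1.0, 0.0])   # start / target (pos, vel)

def simulate(u):
    """Single shooting: roll the double integrator forward under controls u."""
    x = x0.copy()
    for uk in u:
        x = x + dt * np.array([x[1], uk])             # pos' = vel, vel' = u
    return x

# Transcribed NLP: minimize control effort subject to hitting the target state.
res = minimize(lambda u: dt * np.sum(u ** 2),
               np.zeros(N),
               constraints={"type": "eq", "fun": lambda u: simulate(u) - xf})
```

    Simultaneous (collocation) methods instead keep the states as decision variables and impose the dynamics as defect constraints at each interval, trading a larger but sparser NLP for better conditioning.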

    Particle Swarm Optimization: A survey of historical and recent developments with hybridization perspectives

    Particle Swarm Optimization (PSO) is a metaheuristic global optimization paradigm that has gained prominence in the last two decades due to its ease of application to unsupervised, complex multidimensional problems that cannot be solved using traditional deterministic algorithms. The canonical particle swarm optimizer is based on the flocking behavior and social cooperation of bird flocks and fish schools, and draws heavily from the evolutionary behavior of these organisms. This paper provides a thorough survey of the PSO algorithm with special emphasis on the development, deployment, and improvements of its most basic as well as some of the state-of-the-art implementations. Concepts and directions on choosing the inertia weight, constriction factor, and cognition and social weights, and perspectives on convergence, parallelization, elitism, niching, and discrete optimization, as well as neighborhood topologies, are outlined. Hybridization attempts with other evolutionary and swarm paradigms in selected applications are covered, and an up-to-date review is put forward for the interested reader.
    Comment: 34 pages, 7 tables.
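    A canonical inertia-weight PSO is only a few lines: each particle's velocity mixes its momentum with attractions toward its personal best and the swarm's global best, scaled by the cognition and social weights. The sketch below minimizes a sphere function; all parameter values are typical defaults from the literature, not prescriptions.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0), seed=0):
    """Canonical PSO: inertia weight w, cognition weight c1, social weight c2."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))              # particle velocities
    pbest = x.copy()                              # personal best positions
    pbest_val = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()      # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best_x, best_f = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```

    The constriction-factor variant surveyed in the paper replaces the fixed inertia weight with a coefficient derived from c1 + c2 to guarantee convergence of the velocity update.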

    An Extremum-Seeking Co-Simulation Based Framework for Passivation Theory and its Application in Adaptive Cruise Control Systems

    In this report, we apply an input-output transformation passivation method, described in our previous works, to an adaptive cruise control system. We analyze the system's performance under a co-simulation framework that uses an online optimization method called extremum seeking to achieve the optimized behavior. The passivation matrix encompasses the commonly used series, feedback, and feed-forward interconnection methods for passivating a system. We have previously shown that passivity levels can be guaranteed for a system using our passivation method. In this work, an extremum-seeking algorithm is used to determine the passivation parameters. It is known that systems with input-output time delays are not passive. On the other hand, time delays are unavoidable in automotive systems and commonly arise in software implementations and communication units, as well as in driver behavior. We show that by using our passivation method, we can passivate the system and improve its overall performance. Our simulation examples in CarSim and Simulink show that the passive system has a considerably better performance.
    Comment: 39 pages, 18 figures; Technical Report at the University of Notre Dame; American Control Conference (ACC).
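    Classical perturbation-based extremum seeking — the kind of online optimization referred to above — injects a sinusoidal dither, demodulates the measured cost to estimate the gradient, and integrates that estimate. The static cost map and tuning constants below are hypothetical, chosen only so the loop converges in a short simulation; the report tunes passivation parameters rather than this toy scalar.

```python
import numpy as np

J = lambda th: (th - 2.0) ** 2 + 1.0        # hypothetical cost, minimum at th = 2

dt, a, omega, k = 1e-3, 0.2, 50.0, 0.5      # step, dither amp/freq, adaptation gain
theta = 0.0
for i in range(20000):                       # 20 s of simulated time
    t = i * dt
    y = J(theta + a * np.sin(omega * t))     # measured (perturbed) cost
    grad = (2.0 / a) * y * np.sin(omega * t) # demodulated gradient estimate
    theta -= k * grad * dt                   # averaged dynamics: theta' ≈ -k J'(theta)
```

    Averaging theory says the slow dynamics of theta follow gradient descent on J, up to a small ripple of order a and k/omega, which is why the dither frequency is chosen well above the adaptation bandwidth.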