
    Approximate dynamic programming based solutions for fixed-final-time optimal control and optimal switching

    Optimal solutions with neural networks (NN) based on an approximate dynamic programming (ADP) framework are investigated in this dissertation for new classes of engineering and non-engineering problems, along with their associated difficulties and challenges. In the enclosed eight papers, the ADP framework is utilized for solving fixed-final-time problems (also called terminal control problems) and problems of a switching nature. An ADP-based algorithm is proposed in Paper 1 for solving fixed-final-time problems with a soft terminal constraint, in which a single neural network with a single set of weights is utilized. Paper 2 investigates fixed-final-time problems with hard terminal constraints. The optimality analysis of the ADP-based algorithm for fixed-final-time problems is the subject of Paper 3, in which it is shown that the proposed algorithm leads to the globally optimal solution provided certain conditions hold. Afterwards, the developments in Papers 1 to 3 are used to tackle a more challenging class of problems, namely, optimal control of switching systems. This class of problems is divided into problems with a fixed mode sequence (Papers 4 and 5) and problems with a free mode sequence (Papers 6 and 7). Each of these two classes is further divided into problems with autonomous subsystems (Papers 4 and 6) and problems with controlled subsystems (Papers 5 and 7). Different ADP-based algorithms are developed, and proofs of convergence of the proposed iterative algorithms are presented. Moreover, in Paper 8, the developments are extended to online learning of the optimal switching solution for problems with modeling uncertainty. Each of the theoretical developments is numerically analyzed using different real-world or benchmark problems. --Abstract, page v
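The backward-in-time structure that these fixed-final-time ADP methods approximate can be illustrated with a minimal tabular sketch of the Bellman recursion with a soft terminal cost. The scalar dynamics, cost weights, grid, and horizon below are illustrative assumptions, not the dissertation's actual setup (which uses a neural network in place of the grid):

```python
import numpy as np

# Fixed-final-time optimal control via backward dynamic programming on a
# grid -- an illustrative stand-in for the NN-based value approximation.
# Assumed scalar system x_{k+1} = x_k + u_k, running cost x^2 + u^2,
# and soft terminal cost 10 * x_N^2.

N = 20                                  # horizon length
xs = np.linspace(-2.0, 2.0, 81)         # state grid
us = np.linspace(-1.0, 1.0, 41)         # control grid

V = 10.0 * xs**2                        # terminal (soft-constraint) cost
for k in range(N):                      # backward Bellman recursion
    x_next = xs[:, None] + us[None, :]          # successor for every (x, u)
    stage = xs[:, None]**2 + us[None, :]**2     # stage cost for every (x, u)
    V_next = np.interp(x_next, xs, V)           # interpolate V at successors
    V = np.min(stage + V_next, axis=1)          # minimize over controls

print(float(np.interp(1.0, xs, V)))     # cost-to-go from x0 = 1.0
```

Note that `np.interp` clamps successor states outside the grid to the boundary values; a production implementation would enlarge the grid (or, as in the dissertation, replace it with a function approximator) to avoid this boundary effect.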

    Model-Free δ-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time H∞ Tracking Control

    This paper presents a δ-PI algorithm based on the damped Newton method for the H∞ tracking control problem of unknown continuous-time nonlinear systems. A discounted performance function and an augmented system are used to derive the tracking Hamilton-Jacobi-Isaacs (HJI) equation. The tracking HJI equation is a nonlinear partial differential equation; traditional reinforcement learning methods for solving it are mostly based on the Newton method, which usually guarantees only local convergence and needs a good initial guess. Based upon the damped Newton iteration operator equation, a generalized tracking Bellman equation is first derived. The δ-PI algorithm seeks the optimal solution of the tracking HJI equation by iteratively solving the generalized tracking Bellman equation. On-policy and off-policy δ-PI reinforcement learning methods are provided, respectively. The off-policy δ-PI algorithm is model-free and can be performed without a priori knowledge of the system dynamics. An NN-based implementation scheme for the off-policy δ-PI algorithm is shown. The suitability of the model-free δ-PI algorithm is illustrated with a nonlinear system simulation. Comment: 10 pages, 8 figures
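The damping idea behind the δ-PI operator can be seen in a scalar analogy: each iteration takes only a fraction δ of the full Newton step, trading per-step convergence speed for a larger region of attraction. The equation, damping factor, and initial guess below are illustrative assumptions, not the paper's HJI setting:

```python
# Damped Newton iteration on a scalar equation f(x) = 0, as a toy analogy
# for the delta-PI operator: each step moves only a fraction delta of the
# full Newton step, which tolerates a poorer initial guess than pure Newton.

def damped_newton(f, df, x0, delta=0.5, tol=1e-10, max_iter=200):
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)          # full Newton step
        x = x - delta * step         # damped update, delta in (0, 1]
        if abs(f(x)) < tol:
            break
    return x

# solve x^3 - 2 = 0 (root = 2**(1/3)) from a distant initial guess
root = damped_newton(lambda x: x**3 - 2, lambda x: 3 * x**2, x0=10.0)
print(round(root, 6))                # → 1.259921
```

With δ = 1 this reduces to the classical Newton method the paper contrasts against; δ < 1 slows local convergence to linear rate but is more forgiving globally.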

    Optimal Adaptive Tracking Control Of Partially Uncertain Nonlinear Discrete-Time Systems Using Lifelong Hybrid Learning

    This article addresses a multilayer neural network (MNN)-based optimal adaptive tracking of partially uncertain nonlinear discrete-time (DT) systems in affine form. By employing an actor–critic neural network (NN) to approximate the value function and optimal control policy, the critic NN is updated via a novel hybrid learning scheme, where its weights are adjusted once at a sampling instant and also in a finite iterative manner within the instants to enhance the convergence rate. Moreover, to deal with the persistency of excitation (PE) condition, a replay buffer is incorporated into the critic update law through concurrent learning. To address the vanishing gradient issue, the actor and critic MNN weights are tuned using control input and temporal difference errors (TDEs), respectively. In addition, a weight consolidation scheme is incorporated into the critic MNN update law to attain lifelong learning and overcome catastrophic forgetting, thus lowering the cumulative cost. The tracking error, and the actor and critic weight estimation errors are shown to be bounded using the Lyapunov analysis. Simulation results using the proposed approach on a two-link robot manipulator show a significant reduction in tracking error by 44% and cumulative cost by 31% in a multitask environment.
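The hybrid critic update described above — one weight adjustment at each sampling instant plus a finite number of inner replay iterations over stored samples (concurrent learning) — can be sketched with a linear-in-features critic. The feature map, dynamics, gains, and iteration counts below are all illustrative assumptions, not the article's MNN design:

```python
import numpy as np

# Sketch of a hybrid critic update: one TD gradient step on the fresh
# sample at each sampling instant, followed by a small, finite number of
# replay iterations over a buffer (concurrent learning, which relaxes the
# persistency-of-excitation requirement).

rng = np.random.default_rng(0)

def phi(x):                                   # assumed quadratic features
    return np.array([x[0]**2, x[0] * x[1], x[1]**2])

gamma, alpha, inner_iters = 0.95, 0.05, 5
W = np.zeros(3)                               # critic weights
buffer = []                                   # replay buffer

def td_step(W, x, x_next, cost):
    td_error = cost + gamma * W @ phi(x_next) - W @ phi(x)
    return W + alpha * td_error * phi(x)      # gradient step on the TD error

A = np.array([[0.9, 0.1], [0.0, 0.8]])        # assumed stable linear dynamics
x = np.array([1.0, -1.0])
for _ in range(300):
    x_next = A @ x
    cost = x @ x                              # running cost x^T x
    buffer.append((x, x_next, cost))
    W = td_step(W, x, x_next, cost)           # update at the sampling instant
    for _ in range(inner_iters):              # finite inner replay iterations
        xb, xnb, cb = buffer[rng.integers(len(buffer))]
        W = td_step(W, xb, xnb, cb)
    # re-excite when the state has decayed, to keep the buffer informative
    x = x_next if x_next @ x_next > 1e-6 else rng.standard_normal(2)

print(np.round(W, 2))
```

The inner replay loop is what accelerates convergence relative to a single update per instant; the article additionally adds weight consolidation on top of this scheme to prevent catastrophic forgetting across tasks.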

    A brief review of neural networks based learning and control and their applications for robots

    As an imitation of biological nervous systems, neural networks (NNs), which are characterized by powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article aims to provide a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress of NNs in both theoretical developments and practical applications is investigated and surveyed. Specifically, NN-based robot learning and control applications are further reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation

    Optimal Tracking Of Nonlinear Discrete-time Systems Using Zero-Sum Game Formulation And Hybrid Learning

    This paper presents a novel hybrid learning-based optimal tracking method to address zero-sum game problems for partially uncertain nonlinear discrete-time systems. An augmented system and its associated discounted cost function are defined to address optimal tracking. Three multi-layer neural networks (NNs) are utilized to approximate the optimal control input, the worst-case disturbance input, and the value function. The critic weights are tuned using the hybrid technique, in which the weights are updated once at each sampling instant and iteratively a finite number of times within the sampling intervals. The proposed hybrid technique accelerates the convergence of the approximated value function to its actual value, allowing the optimal policy to be attained more quickly. A two-layer NN-based actor generates the optimal control input, and its weights are adjusted based on control input errors. Moreover, the concurrent learning method is utilized to relax the requirement of persistent excitation. Further, the stability of the closed-loop system is investigated via the Lyapunov method. Finally, the proposed method is evaluated on a two-link robot arm and demonstrates promising results.
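The min-max structure of the zero-sum game formulation — control input minimizing and worst-case disturbance maximizing a discounted quadratic cost — can be illustrated with a scalar value iteration. The system coefficients, weights, and attenuation level below are illustrative assumptions, and closed-form saddle-point solutions replace the paper's NN approximations:

```python
import numpy as np

# Scalar zero-sum game value iteration: u minimizes and w maximizes a
# discounted quadratic cost for x_{k+1} = a x + b u + d w with stage cost
# q x^2 + r u^2 - g2 w^2, where g2 is the squared attenuation level.
# All coefficients are assumed for illustration.

a, b, d = 0.9, 1.0, 0.5
q, r, g2 = 1.0, 1.0, 4.0
gamma = 0.95

p = 0.0                              # quadratic value function V(x) = p x^2
for _ in range(500):
    # Saddle-point conditions for linear policies u = k_u x, w = k_w x:
    #   r u + gamma p b (a x + b u + d w) = 0   (minimizer stationarity)
    #  -g2 w + gamma p d (a x + b u + d w) = 0  (maximizer stationarity)
    M = np.array([[r + gamma * p * b * b, gamma * p * b * d],
                  [gamma * p * d * b, -g2 + gamma * p * d * d]])
    rhs = -gamma * p * a * np.array([b, d])
    k_u, k_w = np.linalg.solve(M, rhs)
    acl = a + b * k_u + d * k_w                  # closed-loop coefficient
    p = q + r * k_u**2 - g2 * k_w**2 + gamma * p * acl**2

print(round(p, 4))                   # converged saddle-point value coefficient
```

In the paper this saddle point is not available in closed form: the two policy NNs play the roles of k_u and k_w, and the critic NN plays the role of p, with the hybrid update driving the same fixed-point iteration from data.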