Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems
This paper presents a novel method of global adaptive dynamic programming
(ADP) for the adaptive optimal control of nonlinear polynomial systems. The
strategy consists of relaxing the problem of solving the
Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is
solved via a new policy iteration method. The proposed method is distinguished
from previously known nonlinear ADP methods in that neural network
approximation is avoided, yielding a significant computational improvement.
Moreover, rather than being merely semiglobally or locally stabilizing, the
resultant control policy is globally stabilizing for a general class of
nonlinear polynomial systems.
Furthermore, in the absence of a priori knowledge of the system dynamics,
an online learning method is devised to implement the proposed policy iteration
technique by generalizing the current ADP theory. Finally, three numerical
examples are provided to validate the effectiveness of the proposed method.
Comment: This is an updated version of the publication "Global Adaptive
Dynamic Programming for Continuous-Time Nonlinear Systems," in IEEE
Transactions on Automatic Control, vol. 60, no. 11, pp. 2917-2929, Nov. 2015.
A few typos have been fixed in this version.
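The abstract above rests on the classical policy iteration loop (evaluate the current policy, then improve it). As a minimal sketch of that loop, and not of the paper's sum-of-squares relaxation, the snippet below runs Kleinman-style policy iteration on a hypothetical scalar linear-quadratic problem, where policy evaluation reduces to a closed-form Lyapunov equation:

```python
# Minimal sketch of policy iteration for adaptive optimal control,
# illustrated on a scalar linear-quadratic problem. (The paper treats
# polynomial systems via a relaxed optimization; this toy example only
# shows the policy evaluation / policy improvement loop itself.)
# Dynamics: x' = a*x + b*u, cost: integral of (q*x^2 + r*u^2).

def policy_iteration(a, b, q, r, k0, iters=20):
    """Kleinman-style policy iteration; k0 must be stabilizing (a - b*k0 < 0)."""
    k = k0
    for _ in range(iters):
        # Policy evaluation: solve the scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k**2 = 0 for the cost of policy u = -k*x.
        p = -(q + r * k**2) / (2.0 * (a - b * k))
        # Policy improvement: new gain k = b*p/r.
        k = b * p / r
    return k, p

k_star, p_star = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
# For these values the Riccati equation 2*p + 1 - p^2 = 0 gives
# p = 1 + sqrt(2), so the optimal gain is k* = 1 + sqrt(2) ~= 2.4142.
```

Each iteration is a Newton step on the Riccati equation, so convergence is quadratic; the model-free methods surveyed in these abstracts replace the explicit Lyapunov solve with estimates built from measured data.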
Continuous-Time Robust Dynamic Programming
This paper presents a new theory, known as robust dynamic programming, for
a class of continuous-time dynamical systems. Different from traditional
dynamic programming (DP) methods, this new theory serves as a fundamental tool
to analyze the robustness of DP algorithms, and in particular, to develop
novel adaptive optimal control and reinforcement learning methods. In order to
demonstrate the potential of this new framework, four illustrative applications
in the fields of stochastic optimal control and adaptive DP are presented.
Three numerical examples arising from both finance and engineering industries
are also given, along with several possible extensions of the proposed
framework.
Off-policy reinforcement learning for control design
The control design problem is considered for nonlinear systems
with unknown internal system model. It is known that the nonlinear
control problem can be transformed into solving the so-called
Hamilton-Jacobi-Isaacs (HJI) equation, which is a nonlinear partial
differential equation that is generally impossible to solve analytically.
Even worse, model-based approaches cannot be used to approximately solve the
HJI equation when an accurate system model is unavailable or costly to obtain
in practice. To overcome these difficulties, an off-policy reinforcement
learning (RL) method is introduced to learn the solution of the HJI equation
from real system data instead of a mathematical system model, and its
convergence is proved. In the off-policy RL method, the system data can be generated with
arbitrary policies rather than the evaluating policy, which is extremely
important and promising for practical systems. For implementation purpose, a
neural network (NN) based actor-critic structure is employed and a least-squares
NN weight update algorithm is derived based on the method of weighted
residuals. Finally, the developed NN-based off-policy RL method is tested on a
linear F16 aircraft plant, and further applied to a rotational/translational
actuator system.
Comment: Accepted by IEEE Transactions on Cybernetics, online available, 201
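The distinctive claim above is that the learning data may come from arbitrary behavior policies rather than the policy being evaluated. The continuous-time HJI machinery is beyond a short sketch, but the off-policy principle itself can be shown with tabular Q-learning on a hypothetical 5-state chain: the optimal policy is recovered from purely random exploration data. All details here (chain MDP, rewards, learning rate) are illustrative stand-ins, not the paper's algorithm:

```python
import random

# Off-policy learning in miniature: Q-learning recovers the optimal
# policy of a 5-state chain MDP from data generated by a purely random
# behavior policy (a toy stand-in for the continuous-time HJI setting).
random.seed(0)

N_STATES, GOAL = 5, 4          # states 0..4; reaching state 4 pays +1
ACTIONS = (-1, +1)             # move left / move right
GAMMA, ALPHA = 0.9, 0.5

Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(300):                               # episodes of random data
    s = 0
    while s != GOAL:
        a = random.randrange(2)                    # arbitrary behavior policy
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        target = r + (0.0 if s2 == GOAL else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])      # off-policy max backup
        s = s2

greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
# greedy should be [1, 1, 1, 1]: always move right toward the goal
```

The max over next-state actions in the backup is what makes this off-policy: the learned values describe the greedy policy even though the data-generating policy never follows it.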
Leader-Based Optimal Coordination Control for the Consensus Problem of Multiagent Differential Games via Fuzzy Adaptive Dynamic Programming
In this paper, a new on-line scheme is presented to design the optimal
coordination control for the consensus problem of multi-agent differential
games by fuzzy adaptive dynamic programming (FADP), which brings together game
theory, generalized fuzzy hyperbolic model (GFHM) and adaptive dynamic
programming. In general, the optimal coordination control for multi-agent
differential games is the solution of the coupled Hamilton-Jacobi (HJ)
equations. Here, for the first time, GFHMs are used to approximate the solution
(value functions) of the coupled HJ equations, based on policy iteration (PI)
algorithm. Namely, for each agent, GFHM is used to capture the mapping between
the local consensus error and the local value function. Since our scheme uses a
single-network architecture for each agent (which eliminates the action network
required by a dual-network architecture), it is a more suitable
architecture for multi-agent systems. Furthermore, the approximate solution
is utilized to obtain the optimal coordination controls. Finally, we give the
stability analysis for our scheme, and prove that the weight estimation error
and the local consensus error are uniformly ultimately bounded. Further, the
control node trajectory is proven to be cooperative uniformly ultimately
bounded.
Comment: 10 pages, 4 figures
Data-based approximate policy iteration for nonlinear continuous-time optimal control design
This paper addresses the model-free nonlinear optimal control problem with a
generalized cost functional, and a data-based reinforcement learning technique
is developed. It is known that the nonlinear optimal control problem relies on
the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a
nonlinear partial differential equation that is generally impossible to
solve analytically. Even worse, most practical systems are too complicated
for accurate mathematical models to be established. To overcome these
difficulties, we propose a data-based approximate policy iteration (API)
method that uses real system data rather than a system model. Firstly, a
model-free policy iteration algorithm is derived for the constrained optimal
control problem and its convergence is proved; the algorithm can learn the
solution of the HJB equation and the optimal control policy without requiring
any knowledge of the system's mathematical model.
The implementation of the algorithm is based on an actor-critic
structure, where actor and critic neural networks (NNs) are employed to
approximate the control policy and cost function, respectively. To update the
weights of the actor and critic NNs, a least-squares approach is developed
based on the method of weighted residuals. The whole data-based API method
consists of two parts: the first part is implemented online to collect real
system information, and the second part conducts offline policy iteration to
learn the solution of the HJB equation and the control policy. Then, the data-based
API algorithm is simplified for solving the unconstrained optimal control problem
of nonlinear and linear systems. Finally, we test the efficiency of the
data-based API control design method on a simple nonlinear system, and further
apply it to a rotational/translational actuator system. The simulation results
demonstrate the effectiveness of the proposed method.
Comment: 22 pages, 21 figures, submitted for peer review
State Following (StaF) Kernel Functions for Function Approximation
A function approximation method is developed that aims to approximate a
function in a small neighborhood of a state that travels within a compact set.
The development is based on the theory of universal reproducing kernel Hilbert
spaces over n-dimensional Euclidean space. Several theorems are
introduced that support the development of this State Following (StaF) method.
In particular, it is shown that there is a bound on the number of kernel
functions required for the maintenance of an accurate function approximation as
a state moves through a compact set. Additionally, a weight update law, based
on gradient descent, is introduced where arbitrarily close accuracy can be
achieved provided the weight update law is iterated at a sufficient frequency,
as detailed in Theorem 6.1.
A key advantage of the StaF method is that for some
applications the number of basis functions can be reduced. The StaF method is
applied to an adaptive dynamic programming (ADP) application to demonstrate
that stability is maintained with a reduced number of basis functions.
Simulation results demonstrate the utility of the StaF methodology for the
maintenance of accurate function approximation as well as solving an infinite
horizon optimal regulation problem through ADP. The results of the simulation
indicate that fewer basis functions are required to guarantee stability and
approximate optimality than are required when a global approximation approach
is used.
Comment: 24 pages
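The core idea above is to keep only a few kernels centered near the current state and adapt their weights by gradient descent. The sketch below illustrates this with three Gaussian kernels placed around a (here fixed) state and a batch gradient-descent weight update; the kernel widths, step size, and target function are illustrative choices, not the paper's StaF construction or its update law:

```python
import math

# Sketch of the StaF idea: a handful of kernels centered in a small
# neighborhood of the current state, with weights adapted by gradient
# descent, maintain a local function approximation. All parameters
# here are illustrative, not the paper's.

def gaussian(x, c, sigma=0.4):
    return math.exp(-(x - c) ** 2 / (2.0 * sigma ** 2))

state = 1.0                                  # current state of the system
centers = [state - 0.5, state, state + 0.5]  # kernels following the state
samples = [state - 0.5 + 0.1 * i for i in range(11)]  # local sample grid
target = math.sin                            # "unknown" function to track

w = [0.0, 0.0, 0.0]
alpha = 0.05
for _ in range(3000):                        # batch gradient descent
    grad = [0.0, 0.0, 0.0]
    for x in samples:
        phi = [gaussian(x, c) for c in centers]
        err = sum(wi * pi for wi, pi in zip(w, phi)) - target(x)
        for i in range(3):
            grad[i] += err * phi[i]
    for i in range(3):
        w[i] -= alpha * grad[i]

approx = sum(wi * gaussian(state, c) for wi, c in zip(w, centers))
err_at_state = abs(approx - math.sin(state))   # small local error
```

Only three basis functions suffice because accuracy is demanded only near the current state; as the state moves, the centers would move with it and the weights would be re-adapted, which is the trade the StaF method exploits.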
Verification for Machine Learning, Autonomy, and Neural Networks Survey
This survey presents an overview of verification techniques for autonomous
systems, with a focus on safety-critical autonomous cyber-physical systems
(CPS) and subcomponents thereof. Autonomy in CPS is enabled by recent advances
in artificial intelligence (AI) and machine learning (ML) through approaches
such as deep neural networks (DNNs), embedded in so-called learning-enabled
components (LECs) that accomplish tasks from classification to control.
Recently, the formal methods and formal verification community has developed
methods to characterize behaviors in these LECs with eventual goals of formally
verifying specifications for LECs, and this article presents a survey of many
of these recent approaches.
Transcription Methods for Trajectory Optimization: a beginner's tutorial
This report is an introduction to transcription methods for trajectory
optimization. The first few sections describe the two classes of
transcription methods (shooting & simultaneous) that are used to convert the
trajectory optimization problem into a general constrained optimization form.
The middle of the report discusses a few extensions to the basic methods,
including how to deal with hybrid systems (such as walking robots). The final
section goes over a variety of implementation details.
Comment: 14 pages, 9 figures
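Transcription replaces the continuous dynamics with a finite set of "defect" constraints between grid points. As a minimal sketch of the simultaneous (collocation) class, the snippet below builds trapezoidal defects for a double integrator and checks them against the known analytic minimum-effort trajectory; it performs no NLP solve, only the transcription step, and the example problem is an illustrative choice:

```python
# Sketch of simultaneous transcription: the continuous dynamics
# x' = v, v' = u are replaced by trapezoidal "defect" constraints
# between grid points. We check them against the known analytic
# minimum-effort double-integrator trajectory x(t) = 3t^2 - 2t^3
# (rest-to-rest from 0 to 1 over one second, with u(t) = 6 - 12t).

N = 10                      # grid intervals on [0, 1]
h = 1.0 / N
t = [k * h for k in range(N + 1)]

x = [3 * s**2 - 2 * s**3 for s in t]   # analytic optimal position
v = [6 * s - 6 * s**2 for s in t]      # analytic optimal velocity
u = [6 - 12 * s for s in t]            # analytic optimal control

# Trapezoidal defects: state_{k+1} - state_k - h/2 * (f_k + f_{k+1})
pos_defects = [x[k+1] - x[k] - 0.5 * h * (v[k] + v[k+1]) for k in range(N)]
vel_defects = [v[k+1] - v[k] - 0.5 * h * (u[k] + u[k+1]) for k in range(N)]

max_pos = max(abs(d) for d in pos_defects)   # O(h^3): here exactly h**3
max_vel = max(abs(d) for d in vel_defects)   # zero: trapezoid is exact
                                             # for the linear control u
```

A transcription-based solver would hand these defects, along with the boundary conditions and the discretized cost, to a general constrained optimizer and drive them to zero; the velocity defects vanish here because the trapezoid rule integrates the linear control exactly, while the position defects show the O(h^3) discretization error of the rule.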
Particle Swarm Optimization: A survey of historical and recent developments with hybridization perspectives
Particle Swarm Optimization (PSO) is a metaheuristic global optimization
paradigm that has gained prominence in the last two decades due to its ease of
application in unsupervised, complex multidimensional problems which cannot be
solved using traditional deterministic algorithms. The canonical particle swarm
optimizer is based on the flocking behavior and social co-operation of birds
and fish schools and draws heavily from the evolutionary behavior of these
organisms. This paper serves to provide a thorough survey of the PSO algorithm
with special emphasis on the development, deployment and improvements of its
most basic as well as some of the state-of-the-art implementations. Concepts
and directions on choosing the inertia weight, constriction factor, cognition
and social weights and perspectives on convergence, parallelization, elitism,
niching and discrete optimization as well as neighborhood topologies are
outlined. Hybridization attempts with other evolutionary and swarm paradigms in
selected applications are covered and an up-to-date review is put forward for
the interested reader.
Comment: 34 pages, 7 tables
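The canonical optimizer surveyed above can be stated in a few lines: each particle's velocity blends inertia, attraction to its own best position, and attraction to the swarm's best. The sketch below minimizes the 2-D sphere function; the inertia weight and cognitive/social coefficients are typical textbook values, not prescriptions from the survey:

```python
import random

# Canonical (inertia-weight) particle swarm optimizer, sketched on the
# 2-D sphere function. W is the inertia weight; C1 and C2 weight the
# cognitive (own-best) and social (swarm-best) attraction terms.
random.seed(42)

def sphere(p):
    return sum(xi * xi for xi in p)

DIM, SWARM, ITERS = 2, 30, 200
W, C1, C2 = 0.7, 1.5, 1.5

pos = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(SWARM)]
vel = [[0.0] * DIM for _ in range(SWARM)]
pbest = [p[:] for p in pos]               # each particle's best position
gbest = min(pbest, key=sphere)[:]         # swarm's best position

for _ in range(ITERS):
    for i in range(SWARM):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])
                         + C2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        if sphere(pos[i]) < sphere(pbest[i]):
            pbest[i] = pos[i][:]
            if sphere(pbest[i]) < sphere(gbest):
                gbest = pbest[i][:]

best_value = sphere(gbest)   # close to the global minimum 0
```

The hybridization attempts the survey covers typically modify exactly these ingredients: the inertia schedule, the neighborhood that defines gbest, or the velocity update itself.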
An Extremum-Seeking Co-Simulation Based Framework for Passivation Theory and its Application in Adaptive Cruise Control Systems
In this report, we apply an input-output transformation passivation method,
described in our previous works, to an Adaptive Cruise Control system. We
analyze the system's performance under a co-simulation framework that makes use
of an online optimization method called extremum-seeking to achieve the
optimized behavior. The matrix-based passivation method encompasses the
commonly used series, feedback, and feed-forward interconnections for
passivating the system. We have previously shown that passivity levels can be guaranteed
for a system using our passivation method. In this work, an extremum-seeking
algorithm was used to determine the passivation parameters. It is known that
systems with input-output time-delays are not passive. On the other hand,
time-delays are unavoidable in automotive systems and commonly emerge in
software implementations and communication units as well as driver's behavior.
We show that by using our passivation method, we can passivate the system and
improve its overall performance. Our simulation examples in CarSim and Simulink
will show that the passive system has considerably better performance.
Comment: 39 pages, 18 figures. Technical report at the University of Notre
Dame; American Control Conference (ACC), 201
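The report's parameter tuning relies on extremum seeking: perturb a parameter sinusoidally, demodulate the measured performance with the same sinusoid, and integrate the result, which performs averaged gradient ascent without a model of the map. The sketch below applies this to a hypothetical static quadratic map; the dither amplitude, frequency, and gain are illustrative choices, not the report's co-simulation setup:

```python
import math

# Sketch of perturbation-based extremum seeking on an unknown static
# map J(theta) = -(theta - 2)^2, whose maximum is at theta = 2. The
# dither amplitude a, frequency omega, and gain k are illustrative.

a, omega, k = 0.5, 20.0, 1.0
dt, T = 0.005, 30.0
theta_hat = 0.0                      # initial parameter estimate

t = 0.0
while t < T:
    dither = a * math.sin(omega * t)
    J = -(theta_hat + dither - 2.0) ** 2      # measured performance only
    # Demodulate: sin(omega*t) * J averages to (a/2) * dJ/dtheta, so
    # integrating it performs averaged gradient ascent on J.
    theta_hat += dt * k * math.sin(omega * t) * J
    t += dt

# theta_hat settles near the maximizer theta = 2
```

Convergence requires the dither frequency to be fast relative to the adaptation (here omega = 20 against an averaged time constant of about two seconds), which is the same time-scale separation the co-simulation framework must respect.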