12,626 research outputs found
Data-based approximate policy iteration for nonlinear continuous-time optimal control design
This paper addresses the model-free nonlinear optimal control problem with
a generalized cost functional, and develops a data-based reinforcement
learning technique. It is known that the nonlinear optimal control problem
relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, a
nonlinear partial differential equation that is generally impossible to
solve analytically. Even worse, most practical systems are too complicated
to model accurately. To overcome these difficulties, we propose a
data-based approximate policy iteration (API) method that uses real system
data rather than a system model. First, a model-free policy iteration
algorithm is derived for the constrained optimal control problem and its
convergence is proved; the algorithm can learn the solution of the HJB
equation and the optimal control policy without requiring any knowledge of
the system's mathematical model.
The implementation of the algorithm is based on an actor-critic structure,
where actor and critic neural networks (NNs) are employed to approximate
the control policy and cost function, respectively. To update the weights
of the actor and critic NNs, a least-squares approach is developed based on
the method of weighted residuals. The whole data-based API method consists
of two parts: the first is implemented online to collect real system
information, and the second conducts offline policy iteration to learn the
solution of the HJB equation and the control policy. The data-based API
algorithm is then simplified for solving the unconstrained optimal control
problem for nonlinear and linear systems. Finally, we test the efficiency
of the data-based API control design method on a simple nonlinear system,
and further apply it to a rotational/translational actuator system. The
simulation results demonstrate the effectiveness of the proposed method.
Comment: 22 pages, 21 figures, submitted for Peer Review
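In its simplest form, the least-squares weight update derived from the method of weighted residuals reduces to solving normal equations over basis-function evaluations at sampled states. A minimal numerical sketch (the quadratic basis, sampled data, and target weights below are hypothetical illustrations, not taken from the paper):

```python
import numpy as np

def least_squares_weights(phi, targets):
    """Least-squares weight update: solve min_w ||Phi w - y||^2, the
    normal equations that the method of weighted residuals reduces to."""
    Phi = np.asarray(phi)        # (N, m) basis functions evaluated at sampled states
    y = np.asarray(targets)      # (N,) target values built from collected data
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

# toy check: recover the weights of V(x) = 2*x1^2 + 0.5*x1*x2 + x2^2
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
basis = np.stack([X[:, 0]**2, X[:, 0] * X[:, 1], X[:, 1]**2], axis=1)
true_w = np.array([2.0, 0.5, 1.0])
w_hat = least_squares_weights(basis, basis @ true_w)
```

In the paper's setting the targets would come from measured trajectory data rather than a known value function.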
Off-policy reinforcement learning for control design
The control design problem is considered for nonlinear systems
with an unknown internal system model. It is known that the nonlinear
control problem can be transformed into solving the so-called
Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential
equation that is generally impossible to solve analytically. Even worse,
model-based approaches cannot be used to approximately solve the HJI
equation when an accurate system model is unavailable or costly to obtain
in practice. To overcome these difficulties, an off-policy reinforcement
learning (RL) method is introduced to learn the solution of the HJI
equation from real system data instead of a mathematical system model, and
its convergence is proved. In the off-policy RL method, the system data can
be generated with arbitrary policies rather than the policy being
evaluated, which is extremely important and promising for practical
systems. For implementation purposes, a neural network (NN) based
actor-critic structure is employed and a least-squares NN weight update
algorithm is derived based on the method of weighted residuals. Finally,
the developed NN-based off-policy RL method is tested on a linear F16
aircraft plant, and further applied to a rotational/translational actuator
system.
Comment: Accepted by IEEE Transactions on Cybernetics. IEEE Transactions on
Cybernetics, Online Available, 201
Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems
This paper presents a novel method of global adaptive dynamic programming
(ADP) for the adaptive optimal control of nonlinear polynomial systems. The
strategy consists of relaxing the problem of solving the
Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is
solved via a new policy iteration method. The proposed method differs
from previously known nonlinear ADP methods in that neural network
approximation is avoided, giving rise to significant computational
improvement. Rather than being semiglobally or locally stabilizing, the
resultant control policy is globally stabilizing for a general class of
nonlinear polynomial systems. Furthermore, in the absence of a priori
knowledge of the system dynamics, an online learning method is devised to
implement the proposed policy iteration technique by generalizing the
current ADP theory. Finally, three numerical examples are provided to
validate the effectiveness of the proposed method.
Comment: This is an updated version of the publication "Global Adaptive
Dynamic Programming for Continuous-Time Nonlinear Systems," in IEEE
Transactions on Automatic Control, vol. 60, no. 11, pp. 2917-2929, Nov. 2015.
A few typos have been fixed in this version.
A Separation-based Approach to Data-based Control for Large-Scale Partially Observed Systems
This paper studies the partially observed stochastic optimal control
problem for systems whose state dynamics are governed by partial
differential equations (PDEs), which leads to an extremely large-scale
problem. First, an open-loop deterministic trajectory optimization problem
is solved using a black-box simulation model of the dynamical system. Next,
a Linear Quadratic Gaussian (LQG) controller is designed for the nominal
trajectory-dependent linearized system, which is identified using
input-output experimental data consisting of the impulse responses of the
optimized nominal system. A computational nonlinear heat example is used to
illustrate the performance of the proposed approach.
Comment: arXiv admin note: text overlap with arXiv:1705.09761,
arXiv:1707.0309
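The identification step can be illustrated, in a deliberately simplified full-state-measurement setting (the paper's setting is partially observed and uses impulse responses of a PDE system), as a least-squares fit of a discrete-time linear model from trajectory data:

```python
import numpy as np

def identify_linear_model(X, U):
    """Fit x_{k+1} ~ A x_k + B u_k by least squares from state/input data.
    X: (N+1, n) state trajectory, U: (N, m) applied inputs."""
    Z = np.hstack([X[:-1], U])                   # regressors [x_k, u_k]
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    n = X.shape[1]
    return Theta[:n].T, Theta[n:].T              # A_hat, B_hat

# toy data from a known linear system (stands in for simulator rollouts)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
rng = np.random.default_rng(1)
U = rng.normal(size=(50, 1))
X = np.zeros((51, 2))
for k in range(50):
    X[k + 1] = A @ X[k] + B @ U[k]               # noiseless rollout
A_hat, B_hat = identify_linear_model(X, U)
```

With noiseless data and a persistently exciting input, the fit recovers (A, B) exactly; real impulse-response data would require an output-only (e.g. subspace) identification instead.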
GP-ILQG: Data-driven Robust Optimal Control for Uncertain Nonlinear Dynamical Systems
As we aim to control complex systems, the use of a simulator in model-based
reinforcement learning is becoming more common. However, it has been
challenging to overcome the Reality Gap, which stems from nonlinear model
bias and susceptibility to disturbance. To address these problems, we
propose a novel algorithm that combines a data-driven system identification
approach (Gaussian processes) with a Differential-Dynamic-Programming-based
robust optimal control method (Iterative Linear Quadratic Control). Our
algorithm uses the simulator's model as the mean function for a Gaussian
process and learns only the difference between the simulator's prediction
and actual observations, making it a natural hybrid of simulation and
real-world observation. We show that our approach quickly corrects
incorrect models, produces robust optimal controllers, and transfers its
acquired model knowledge to new tasks efficiently.
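The core idea, using the simulator as the GP prior mean so that only the simulator's residual error has to be learned from data, can be sketched with a small numpy-only GP regression (the RBF kernel, 1-D dynamics, and bias term below are illustrative assumptions, not the paper's models):

```python
import numpy as np

def rbf(X1, X2, ell=0.5):
    """Squared-exponential kernel between two sets of points."""
    d = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * np.sum(d**2, axis=-1) / ell**2)

def gp_with_sim_mean(Xtr, ytr, sim, Xte, noise=1e-4):
    """GP regression whose prior mean is the simulator's prediction:
    the GP only models the residual y - sim(x)."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    alpha = np.linalg.solve(K, ytr - sim(Xtr))
    return sim(Xte) + rbf(Xte, Xtr) @ alpha

# hypothetical 1-D example: the "real" system differs from the
# simulator by a smooth bias term that the GP must pick up
sim = lambda X: np.sin(X[:, 0])                    # simulator model (prior mean)
true = lambda X: np.sin(X[:, 0]) + 0.3 * X[:, 0]   # "real" system
Xtr = np.linspace(-2.0, 2.0, 30)[:, None]
Xte = np.array([[0.5]])
pred = gp_with_sim_mean(Xtr, true(Xtr), sim, Xte)
```

Far from the training data the posterior falls back to the simulator's prediction, which is what makes the hybrid graceful.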
Robust Policy Iteration for Continuous-time Linear Quadratic Regulation
This paper studies the robustness of policy iteration in the context of
the continuous-time infinite-horizon linear quadratic regulation (LQR)
problem. It is shown that Kleinman's policy iteration algorithm is
inherently robust to small disturbances and enjoys local input-to-state
stability in the sense of Sontag. More precisely, whenever the
disturbance-induced input term in each iteration is bounded and small, the
solutions of the policy iteration algorithm are also bounded and enter a
small neighborhood of the optimal solution of the LQR problem. Based on
this result, an off-policy data-driven policy iteration algorithm for the
LQR problem is shown to be robust when the system dynamics are subject to
small additive unknown bounded disturbances. The theoretical results are
validated by a numerical example.
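Kleinman's algorithm itself alternates a Lyapunov-equation policy evaluation with a policy-improvement step. A disturbance-free sketch on a double-integrator example (the specific system matrices and initial stabilizing gain are hypothetical, chosen so the optimal solution is known in closed form):

```python
import numpy as np

def lyap(Acl, Qk):
    """Solve Acl^T P + P Acl + Qk = 0 via Kronecker vectorization."""
    n = Acl.shape[0]
    I = np.eye(n)
    M = np.kron(I, Acl.T) + np.kron(Acl.T, I)
    P = np.linalg.solve(M, -Qk.reshape(-1, order='F')).reshape(n, n, order='F')
    return (P + P.T) / 2          # symmetrize against round-off

def kleinman(A, B, Q, R, K0, iters=20):
    """Kleinman's policy iteration for continuous-time LQR:
    evaluate the current gain (Lyapunov eq.), then improve it."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        P = lyap(Acl, Q + K.T @ R @ K)      # policy evaluation
        K = np.linalg.solve(R, B.T @ P)     # policy improvement
    return P, K

# double integrator with Q = I, R = 1: optimal gain is [1, sqrt(3)]
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[1.0, 2.0]])                 # any stabilizing initial gain
P_opt, K_opt = kleinman(A, B, Q, R, K0)
```

The paper's robustness result concerns exactly this recursion when each evaluation step is corrupted by a small bounded error.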
Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems
This paper addresses the problem of learning the optimal control policy for
a nonlinear stochastic dynamical system with continuous state space,
continuous action space, and unknown dynamics. This class of problems is
typically addressed in the stochastic adaptive control and reinforcement
learning literatures using model-based and model-free approaches,
respectively. Both methods rely on solving a dynamic programming problem,
either directly or indirectly, to find the optimal closed-loop control
policy. The inherent `curse of dimensionality' associated with dynamic
programming also makes these approaches computationally difficult.
This paper proposes a novel decoupled data-based control (D2C) algorithm
that addresses this problem using a decoupled `open loop - closed loop'
approach. First, an open-loop deterministic trajectory optimization problem
is solved using a black-box simulation model of the dynamical system. Then,
a closed-loop control is developed around this open-loop trajectory by
linearizing the dynamics about the nominal trajectory. By virtue of the
linearization, a linear quadratic regulator based algorithm can be used for
this closed-loop control. We show that the performance of the D2C algorithm
is approximately optimal. Moreover, simulation performance suggests a
significant reduction in training time compared to other state-of-the-art
algorithms.
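The closed-loop half of such a scheme, a time-varying LQR wrapped around the linearized nominal trajectory, can be sketched as a backward Riccati recursion (the linearizations are taken as given here; in D2C they would come from linearizing the identified model along the optimized trajectory):

```python
import numpy as np

def tv_lqr_gains(A_list, B_list, Q, R, Qf):
    """Backward Riccati recursion producing time-varying feedback gains
    for the deviation dynamics around a nominal trajectory."""
    P = Qf
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)       # Riccati update
        gains.append(K)
    return gains[::-1]                       # gains in forward time order

# illustrative time-invariant linearization: discretized double integrator
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
gains = tv_lqr_gains([A] * 200, [B] * 200, Q, R, Q)
```

Applying u_k = u_nominal_k - K_k (x_k - x_nominal_k) then rejects stochastic deviations from the open-loop plan.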
Reinforcement Learning for Batch Bioprocess Optimization
Bioprocesses have received a lot of attention as a means to produce clean
and sustainable alternatives to fossil-based materials. However, they are
generally difficult to optimize due to their unsteady-state operation modes
and stochastic behaviours. Furthermore, biological systems are highly
complex, so plant-model mismatch is often present. To address these
challenges we propose a reinforcement learning based optimization strategy
for batch processes.
In this work, we apply the policy gradient method from batch to batch to
update a control policy parametrized by a recurrent neural network. We
assume that a preliminary process model is available, which is exploited to
obtain a preliminary optimal control policy. Subsequently, this policy is
updated based on measurements from the true plant. The capabilities of our
proposed approach were tested on three case studies (one of which is
nonsmooth) using a more complex process model for the true system, embedded
with adequate process disturbance. Lastly, we discuss the advantages and
disadvantages of this strategy compared against existing approaches such as
nonlinear model predictive control.
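The batch-to-batch policy gradient idea can be sketched with a score-function (REINFORCE-style) update of a Gaussian policy on a toy objective; the paper's recurrent-network parametrization and true-plant measurements are replaced here by a single scalar parameter and a hypothetical reward function:

```python
import numpy as np

def policy_gradient(reward, theta0=0.0, sigma=0.5, lr=0.05,
                    batches=300, n=32, seed=0):
    """Batch-to-batch REINFORCE on a Gaussian policy u ~ N(theta, sigma^2):
    each 'batch run' samples controls, observes rewards, and updates theta
    with a baseline-corrected score-function gradient estimate."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(batches):
        u = theta + sigma * rng.normal(size=n)   # controls tried this batch
        r = reward(u)                            # rewards observed from the plant
        b = r.mean()                             # baseline reduces variance
        grad = np.mean((r - b) * (u - theta) / sigma**2)
        theta += lr * grad
    return theta

# hypothetical batch process: the best constant control is u* = 1.2
theta = policy_gradient(lambda u: -(u - 1.2)**2)
```

The same update applies unchanged when the model-based pretraining provides theta0 and the reward comes from true-plant measurements.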
Sample Efficient Path Integral Control under Uncertainty
We present a data-driven optimal control framework that can be viewed as a
generalization of the path integral (PI) control approach. We find
iterative feedback control laws without parameterization, based on a
probabilistic representation of the learned dynamics model. The proposed
algorithm operates in a forward-backward manner, which differentiates it
from other PI-related methods that perform forward sampling to find optimal
controls. Our method uses significantly fewer samples to find optimal
controls than other approaches within the PI control family, which rely on
extensive sampling from given dynamics models or on trials on physical
systems in a model-free fashion. In addition, the learned controllers can
be generalized to new tasks without re-sampling, based on the
compositionality theory for the linearly-solvable optimal control
framework. We provide experimental results on three different systems and
comparisons with state-of-the-art model-based methods to demonstrate the
efficiency and generalizability of the proposed framework.
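For contrast, the forward-sampling baseline within the PI control family, whose sample hunger this framework aims to reduce, weights sampled control perturbations by the exponentiated negative trajectory cost. A one-step toy sketch (the scalar dynamics, cost, and tuning constants are assumptions for illustration only):

```python
import numpy as np

def pi_control_step(x, dynamics, cost, u_nom, n_samples=500,
                    sigma=0.3, lam=1.0, seed=0):
    """One step of sampling-based path-integral control: perturb the
    nominal control, roll out, and average the perturbations weighted
    by exp(-cost / lambda)."""
    rng = np.random.default_rng(seed)
    du = sigma * rng.normal(size=n_samples)
    costs = np.array([cost(dynamics(x, u_nom + d)) for d in du])
    w = np.exp(-(costs - costs.min()) / lam)   # shift for numerical stability
    return u_nom + np.sum(w * du) / np.sum(w)

# toy scalar system x' = x + u with quadratic cost: the update should
# push the control negative to move the state toward zero
u = pi_control_step(x=1.0, dynamics=lambda x, u: x + u,
                    cost=lambda x: x**2, u_nom=0.0)
```

The abstract's forward-backward scheme replaces this pure forward sampling with updates on a learned probabilistic model, which is where the sample savings come from.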
Approximate Optimal Trajectory Tracking for Continuous Time Nonlinear Systems
Approximate dynamic programming has been investigated and used as a method
to approximately solve optimal regulation problems. However, the extension
of this technique to optimal tracking problems for continuous-time
nonlinear systems has remained a non-trivial open problem. The control
development in this paper guarantees ultimately bounded tracking of a
desired trajectory, while also ensuring that the controller converges to an
approximate optimal policy
- …