Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees
Continuous-time nonlinear optimal control problems hold great promise in
real-world applications. After decades of development, reinforcement learning
(RL) has achieved some of the greatest successes as a general nonlinear control
design method. However, a recent comprehensive analysis of state-of-the-art
continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming
(ADP)-based CT-RL algorithms, reveals they face significant design challenges
due to their complexity, numerical conditioning, and dimensional scaling
issues. Despite advanced theoretical results, existing ADP CT-RL synthesis
methods are inadequate in solving even small, academic problems. The goal of
this work is thus to introduce a suite of new CT-RL algorithms for control of
affine nonlinear systems. Our design approach relies on two important factors.
First, our methods are applicable to physical systems that can be partitioned
into smaller subproblems. This constructive consideration results in reduced
dimensionality and greatly improved intuitiveness of design. Second, we
introduce a new excitation framework to improve persistence of excitation (PE)
and numerical conditioning performance via classical input/output insights.
Such a design-centric approach is the first of its kind in the ADP CT-RL
community. In this paper, we progressively introduce a suite of (decentralized)
excitable integral reinforcement learning (EIRL) algorithms. We provide
convergence and closed-loop stability guarantees, and we demonstrate these
guarantees on a significant application problem of controlling an unstable,
nonminimum phase hypersonic vehicle (HSV).
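The integral reinforcement learning (IRL) policy-evaluation step that the EIRL algorithms build on can be illustrated in the linear-quadratic special case. A minimal sketch, assuming illustrative system matrices, cost weights, initial stabilizing gain, and sample interval that are not taken from the paper:

```python
import numpy as np

# Illustrative plant and cost weights (assumed values, not the paper's HSV model).
A = np.array([[0.0, 1.0], [-1.0, 2.0]])   # open-loop unstable plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def feat(x):
    # Monomial basis for a symmetric 2x2 value matrix P: x' P x = feat(x) . p
    return np.array([x[0] ** 2, 2.0 * x[0] * x[1], x[1] ** 2])

def rollout(K, x0, T, dt=1e-3):
    # Euler-integrate x' = (A - B K) x and the running cost x'(Q + K'RK)x over [0, T].
    x, cost, M = x0.copy(), 0.0, Q + K.T @ R @ K
    for _ in range(int(T / dt)):
        cost += float(x @ M @ x) * dt
        x = x + dt * (A - B @ K) @ x
    return x, cost

def irl_policy_iteration(K, n_iter=8, T=0.05, n_samples=12, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        # Policy evaluation from trajectory data only (no model in this step):
        # feat(x0) - feat(xT) = integral of the stage cost over the interval.
        Phi, c = [], []
        for _ in range(n_samples):
            x0 = rng.uniform(-1.0, 1.0, size=2)
            xT, cost = rollout(K, x0, T)
            Phi.append(feat(x0) - feat(xT))
            c.append(cost)
        p, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
        P = np.array([[p[0], p[1]], [p[1], p[2]]])
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return P, K

P, K = irl_policy_iteration(np.array([[0.0, 3.0]]))  # initial gain stabilizes A - BK
```

In the linear case this iteration recovers the LQR solution from simulated data; the paper's contribution is the decentralized partitioning and the excitation framework layered on top of this basic step, neither of which is shown here.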
A brief review of neural networks based learning and control and their applications for robots
As an imitation of biological nervous systems, neural networks (NNs), which are characterized by a powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article aims to give a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress in NNs, in both theoretical developments and practical applications, is investigated and surveyed. Specifically, NN-based robot learning and control applications are further reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation.
Linear Hamilton Jacobi Bellman Equations in High Dimensions
The Hamilton Jacobi Bellman Equation (HJB) provides the globally optimal
solution to large classes of control problems. Unfortunately, this generality
comes at a price: the calculation of such solutions is typically intractable
for systems with more than moderate state space size due to the curse of
dimensionality. This work combines recent results in the structure of the HJB,
and its reduction to a linear Partial Differential Equation (PDE), with methods
based on low-rank tensor representations, known as separated representations,
to address the curse of dimensionality. The result is an algorithm to solve
optimal control problems which scales linearly with the number of states in a
system, and is applicable to systems that are nonlinear with stochastic forcing
in finite-horizon, average cost, and first-exit settings. The method is
demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with
system dimensions two, six, and twelve, respectively.
Comment: 8 pages. Accepted to CDC 201
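The linearization the abstract refers to is the standard logarithmic (Cole-Hopf) transformation of the stochastic HJB equation; a sketch in generic notation (the symbols below are not necessarily the paper's):

```latex
\text{Dynamics and cost: } \quad
dx = \big(f(x) + G(x)u\big)\,dt + B(x)\,d\omega, \qquad
J = \mathbb{E}\Big[\phi(x_T) + \int_0^T q(x) + \tfrac{1}{2}u^\top R\,u \; dt\Big].

\text{Substituting the optimal control } u^\ast = -R^{-1}G^\top \nabla V
\text{ into the HJB equation gives a PDE nonlinear in } V:
-\partial_t V = q + (\nabla V)^\top f
  - \tfrac{1}{2}(\nabla V)^\top G R^{-1} G^\top \nabla V
  + \tfrac{1}{2}\operatorname{tr}\!\big(B\Sigma B^\top \nabla^2 V\big).

\text{Under the compatibility condition } \lambda\, G R^{-1} G^\top = B \Sigma B^\top,
\text{ the transformation } V = -\lambda \log \Psi
\text{ cancels the quadratic term, leaving a PDE linear in the desirability } \Psi:
\partial_t \Psi = \tfrac{q}{\lambda}\,\Psi - f^\top \nabla \Psi
  - \tfrac{1}{2}\operatorname{tr}\!\big(B\Sigma B^\top \nabla^2 \Psi\big).
```

The separated (low-rank tensor) representation is then applied to this linear PDE, which is what allows the solver to scale linearly with the number of states.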
Special Issue on Deep Reinforcement Learning and Adaptive Dynamic Programming
The sixteen papers in this special section focus on deep reinforcement learning and adaptive dynamic programming (deep RL/ADP). Deep RL is able to output control signals directly based on input images, combining the perceptual advantages of deep learning (DL) with the decision making of RL or adaptive dynamic programming (ADP). This mechanism brings artificial intelligence much closer to human thinking modes. Deep RL/ADP has achieved remarkable success in both theory and applications since it was proposed. Successful applications cover video games, Go, robotics, smart driving, healthcare, and so on. However, theoretical analysis of deep RL/ADP, e.g., convergence, stability, and optimality analyses, remains an open problem. Learning efficiency needs to be improved by proposing new algorithms or by combining deep RL/ADP with other methods. More practical demonstrations are also encouraged. The aim of this special issue is therefore to call for the most advanced research and state-of-the-art work in the field of deep RL/ADP.
Zhao, D.; Liu, D.; Lewis, F. L.; Principe, J. C.; Squartini, S.
Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including in a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.
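The meta-algorithm's alternation between a batch-RL best-response oracle and an online learner over the dual variables can be sketched on a toy instance. The candidate policies, threshold, and step size below are made-up illustrative numbers, and the argmin stands in for a batch RL subroutine (e.g., fitted Q-iteration):

```python
# Toy instance: candidate policies summarized by (objective cost, constraint value)
# pairs -- purely illustrative numbers, not from the paper's experiments.
policies = [(0.0, 1.0), (1.0, 0.0), (0.8, 0.6)]
tau = 0.1          # constraint threshold: want mixture constraint <= tau
eta = 0.05         # online-learning step size for the dual variable
T = 2000

lam, picks = 0.0, []
for _ in range(T):
    # Best-response oracle (stand-in for a batch RL subroutine): pick the
    # policy minimizing the Lagrangian  cost + lam * (constraint - tau).
    i = min(range(len(policies)),
            key=lambda j: policies[j][0] + lam * (policies[j][1] - tau))
    picks.append(i)
    # Online learner: projected gradient ascent on the dual variable.
    lam = min(max(lam + eta * (policies[i][1] - tau), 0.0), 10.0)

# The returned policy is the uniform mixture over the iterates.
mix_cost = sum(policies[i][0] for i in picks) / len(picks)
mix_g = sum(policies[i][1] for i in picks) / len(picks)
```

On this instance the mixture converges to roughly 10% of the cheap-but-violating policy and 90% of the expensive-but-feasible one, meeting the constraint at near-minimal cost; the paper's OPE machinery is what certifies such constraint values when they must be estimated from off-policy data.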
Approximate Dynamic Programming for Constrained Piecewise Affine Systems with Stability and Safety Guarantees
Infinite-horizon optimal control of constrained piecewise affine (PWA)
systems has been approximately addressed by hybrid model predictive control
(MPC), which, however, has computational limitations, both in offline design
and online implementation. In this paper, we consider an alternative approach
based on approximate dynamic programming (ADP), an important class of methods
in reinforcement learning. We accommodate non-convex union-of-polyhedra state
constraints and linear input constraints into ADP by designing PWA penalty
functions. PWA function approximation is used, which allows for a mixed-integer
encoding to implement ADP. The main advantage of the proposed ADP method is its
online computational efficiency. In particular, we propose two control policies,
which lead to solving a smaller-scale mixed-integer linear program than
conventional hybrid MPC, or a single convex quadratic program, depending on
whether the policy is implicitly determined online or explicitly computed
offline. We characterize the stability and safety properties of the closed-loop
systems, as well as the sub-optimality of the proposed policies, by quantifying
the approximation errors of value functions and policies. We also develop an
offline mixed-integer linear programming-based method to certify the
reliability of the proposed method. Simulation results on an inverted pendulum
with elastic walls and on an adaptive cruise control problem validate the
control performance in terms of constraint satisfaction and CPU time.
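The penalty-based ADP idea can be sketched on a small hypothetical example. The one-dimensional PWA dynamics, cost weights, and grids below are illustrative assumptions, and grid-based value iteration stands in for the paper's PWA function approximation and mixed-integer encoding:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 101)     # state grid over the constraint set |x| <= 1
us = np.linspace(-1.0, 1.0, 21)      # input grid over the constraint |u| <= 1
rho, gamma = 100.0, 0.95             # constraint-penalty weight, discount factor

def step(x, u):
    # Illustrative PWA dynamics: one affine law per region of the state space.
    return 1.2 * x + u if x >= 0.0 else 0.5 * x + u

def q_value(x, u, V):
    xn = step(x, u)
    # PWA penalty function: zero inside the state constraint, linear in the violation.
    penalty = rho * max(abs(xn) - 1.0, 0.0)
    return x * x + 0.1 * u * u + penalty + gamma * np.interp(xn, xs, V)

V = np.zeros_like(xs)
for _ in range(100):                 # approximate value iteration on the grid
    V = np.array([min(q_value(x, u, V) for u in us) for x in xs])

def policy(x):
    # Greedy one-step-lookahead policy from the approximate value function.
    return min(us, key=lambda u: q_value(x, u, V))
```

Simulating the greedy policy from a state near the constraint boundary keeps the trajectory inside the constraint set while driving it toward the origin; the penalty term is what makes constraint-violating successors dominated, mirroring the role of the paper's PWA penalty functions.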