Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees
Continuous-time nonlinear optimal control problems hold great promise in
real-world applications. After decades of development, reinforcement learning
(RL) has achieved some of the greatest successes as a general nonlinear control
design method. However, a recent comprehensive analysis of state-of-the-art
continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming
(ADP)-based CT-RL algorithms, reveals they face significant design challenges
due to their complexity, numerical conditioning, and dimensional scaling
issues. Despite advanced theoretical results, existing ADP CT-RL synthesis
methods are inadequate in solving even small, academic problems. The goal of
this work is thus to introduce a suite of new CT-RL algorithms for control of
affine nonlinear systems. Our design approach relies on two important factors.
First, our methods are applicable to physical systems that can be partitioned
into smaller subproblems. This constructive consideration results in reduced
dimensionality and greatly improved intuitiveness of design. Second, we
introduce a new excitation framework to improve persistence of excitation (PE)
and numerical conditioning performance via classical input/output insights.
Such a design-centric approach is the first of its kind in the ADP CT-RL
community. In this paper, we progressively introduce a suite of (decentralized)
excitable integral reinforcement learning (EIRL) algorithms. We provide
convergence and closed-loop stability guarantees, and we demonstrate these
guarantees on a significant application problem of controlling an unstable,
nonminimum phase hypersonic vehicle (HSV).
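The integral reinforcement learning (IRL) policy-evaluation step that the EIRL algorithms build on can be illustrated in the linear-quadratic special case. A minimal sketch, assuming illustrative system matrices, cost weights, initial stabilizing gain, and sample interval that are not taken from the paper:

```python
import numpy as np

# Illustrative plant and cost weights (assumed values, not the paper's HSV model).
A = np.array([[0.0, 1.0], [-1.0, 2.0]])   # open-loop unstable plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def feat(x):
    # Monomial basis for a symmetric 2x2 value matrix P: x' P x = feat(x) . p
    return np.array([x[0] ** 2, 2.0 * x[0] * x[1], x[1] ** 2])

def rollout(K, x0, T, dt=1e-3):
    # Euler-integrate x' = (A - B K) x and the running cost x'(Q + K'RK)x over [0, T].
    x, cost, M = x0.copy(), 0.0, Q + K.T @ R @ K
    for _ in range(int(T / dt)):
        cost += float(x @ M @ x) * dt
        x = x + dt * (A - B @ K) @ x
    return x, cost

def irl_policy_iteration(K, n_iter=8, T=0.05, n_samples=12, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        # Policy evaluation from trajectory data only (no model in this step):
        # feat(x0) - feat(xT) = integral of the stage cost over the interval.
        Phi, c = [], []
        for _ in range(n_samples):
            x0 = rng.uniform(-1.0, 1.0, size=2)
            xT, cost = rollout(K, x0, T)
            Phi.append(feat(x0) - feat(xT))
            c.append(cost)
        p, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
        P = np.array([[p[0], p[1]], [p[1], p[2]]])
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return P, K

P, K = irl_policy_iteration(np.array([[0.0, 3.0]]))  # initial gain stabilizes A - BK
```

In the linear case this iteration recovers the LQR solution from simulated data; the paper's contribution is the decentralized partitioning and the excitation framework layered on top of this basic step, neither of which is shown here.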
A brief review of neural networks based learning and control and their applications for robots
As an imitation of biological nervous systems, neural networks (NNs), which are characterized by a powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article aims to give a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress in NNs, in both theoretical developments and practical applications, is investigated and surveyed. Specifically, NN-based robot learning and control applications are further reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation.
Linear Hamilton Jacobi Bellman Equations in High Dimensions
The Hamilton Jacobi Bellman Equation (HJB) provides the globally optimal
solution to large classes of control problems. Unfortunately, this generality
comes at a price: the calculation of such solutions is typically intractable
for systems with more than moderate state space size due to the curse of
dimensionality. This work combines recent results in the structure of the HJB,
and its reduction to a linear Partial Differential Equation (PDE), with methods
based on low-rank tensor representations, known as separated representations,
to address the curse of dimensionality. The result is an algorithm to solve
optimal control problems which scales linearly with the number of states in a
system, and is applicable to systems that are nonlinear with stochastic forcing
in finite-horizon, average cost, and first-exit settings. The method is
demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with
system dimensions two, six, and twelve, respectively.
Comment: 8 pages. Accepted to CDC 201
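The linearization the abstract refers to is the standard logarithmic (Cole-Hopf) transformation of the stochastic HJB equation; a sketch in generic notation (the symbols below are not necessarily the paper's):

```latex
\text{Dynamics and cost: } \quad
dx = \big(f(x) + G(x)u\big)\,dt + B(x)\,d\omega, \qquad
J = \mathbb{E}\Big[\phi(x_T) + \int_0^T q(x) + \tfrac{1}{2}u^\top R\,u \; dt\Big].

\text{Substituting the optimal control } u^\ast = -R^{-1}G^\top \nabla V
\text{ into the HJB equation gives a PDE nonlinear in } V:
-\partial_t V = q + (\nabla V)^\top f
  - \tfrac{1}{2}(\nabla V)^\top G R^{-1} G^\top \nabla V
  + \tfrac{1}{2}\operatorname{tr}\!\big(B\Sigma B^\top \nabla^2 V\big).

\text{Under the compatibility condition } \lambda\, G R^{-1} G^\top = B \Sigma B^\top,
\text{ the transformation } V = -\lambda \log \Psi
\text{ cancels the quadratic term, leaving a PDE linear in the desirability } \Psi:
\partial_t \Psi = \tfrac{q}{\lambda}\,\Psi - f^\top \nabla \Psi
  - \tfrac{1}{2}\operatorname{tr}\!\big(B\Sigma B^\top \nabla^2 \Psi\big).
```

The separated (low-rank tensor) representation is then applied to this linear PDE, which is what allows the solver to scale linearly with the number of states.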
Special Issue on Deep Reinforcement Learning and Adaptive Dynamic Programming
The sixteen papers in this special section focus on deep reinforcement learning and adaptive dynamic programming (deep RL/ADP). Deep RL is able to output control signals directly based on input images, combining the perceptual advantages of deep learning (DL) with the decision making of RL or adaptive dynamic programming (ADP). This mechanism brings artificial intelligence much closer to human thinking modes. Deep RL/ADP has achieved remarkable success in both theory and applications since it was proposed. Successful applications cover video games, Go, robotics, smart driving, healthcare, and so on. However, theoretical analysis of deep RL/ADP, e.g., convergence, stability, and optimality analyses, remains an open problem. Learning efficiency needs to be improved by proposing new algorithms or by combining deep RL/ADP with other methods. More practical demonstrations are also encouraged. The aim of this special issue is therefore to call for the most advanced research and state-of-the-art work in the field of deep RL/ADP.
Zhao, D.; Liu, D.; Lewis, F. L.; Principe, J. C.; Squartini, S.
Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including in a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.
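The meta-algorithm's alternation between a batch-RL best-response oracle and an online learner over the dual variables can be sketched on a toy instance. The candidate policies, threshold, and step size below are made-up illustrative numbers, and the argmin stands in for a batch RL subroutine (e.g., fitted Q-iteration):

```python
# Toy instance: candidate policies summarized by (objective cost, constraint value)
# pairs -- purely illustrative numbers, not from the paper's experiments.
policies = [(0.0, 1.0), (1.0, 0.0), (0.8, 0.6)]
tau = 0.1          # constraint threshold: want mixture constraint <= tau
eta = 0.05         # online-learning step size for the dual variable
T = 2000

lam, picks = 0.0, []
for _ in range(T):
    # Best-response oracle (stand-in for a batch RL subroutine): pick the
    # policy minimizing the Lagrangian  cost + lam * (constraint - tau).
    i = min(range(len(policies)),
            key=lambda j: policies[j][0] + lam * (policies[j][1] - tau))
    picks.append(i)
    # Online learner: projected gradient ascent on the dual variable.
    lam = min(max(lam + eta * (policies[i][1] - tau), 0.0), 10.0)

# The returned policy is the uniform mixture over the iterates.
mix_cost = sum(policies[i][0] for i in picks) / len(picks)
mix_g = sum(policies[i][1] for i in picks) / len(picks)
```

On this instance the mixture converges to roughly 10% of the cheap-but-violating policy and 90% of the expensive-but-feasible one, meeting the constraint at near-minimal cost; the paper's OPE machinery is what certifies such constraint values when they must be estimated from off-policy data.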
Approximate Dynamic Programming for Constrained Piecewise Affine Systems with Stability and Safety Guarantees
Infinite-horizon optimal control of constrained piecewise affine (PWA)
systems has been approximately addressed by hybrid model predictive control
(MPC), which, however, has computational limitations, both in offline design
and online implementation. In this paper, we consider an alternative approach
based on approximate dynamic programming (ADP), an important class of methods
in reinforcement learning. We accommodate non-convex union-of-polyhedra state
constraints and linear input constraints into ADP by designing PWA penalty
functions. PWA function approximation is used, which allows for a mixed-integer
encoding to implement ADP. The main advantage of the proposed ADP method is its
online computational efficiency. In particular, we propose two control policies,
which lead to solving a smaller-scale mixed-integer linear program than
conventional hybrid MPC, or a single convex quadratic program, depending on
whether the policy is implicitly determined online or explicitly computed
offline. We characterize the stability and safety properties of the closed-loop
systems, as well as the sub-optimality of the proposed policies, by quantifying
the approximation errors of value functions and policies. We also develop an
offline mixed-integer linear programming-based method to certify the
reliability of the proposed method. Simulation results on an inverted pendulum
with elastic walls and on an adaptive cruise control problem validate the
control performance in terms of constraint satisfaction and CPU time.
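The penalty-based ADP idea can be sketched on a small hypothetical example. The one-dimensional PWA dynamics, cost weights, and grids below are illustrative assumptions, and grid-based value iteration stands in for the paper's PWA function approximation and mixed-integer encoding:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 101)     # state grid over the constraint set |x| <= 1
us = np.linspace(-1.0, 1.0, 21)      # input grid over the constraint |u| <= 1
rho, gamma = 100.0, 0.95             # constraint-penalty weight, discount factor

def step(x, u):
    # Illustrative PWA dynamics: one affine law per region of the state space.
    return 1.2 * x + u if x >= 0.0 else 0.5 * x + u

def q_value(x, u, V):
    xn = step(x, u)
    # PWA penalty function: zero inside the state constraint, linear in the violation.
    penalty = rho * max(abs(xn) - 1.0, 0.0)
    return x * x + 0.1 * u * u + penalty + gamma * np.interp(xn, xs, V)

V = np.zeros_like(xs)
for _ in range(100):                 # approximate value iteration on the grid
    V = np.array([min(q_value(x, u, V) for u in us) for x in xs])

def policy(x):
    # Greedy one-step-lookahead policy from the approximate value function.
    return min(us, key=lambda u: q_value(x, u, V))
```

Simulating the greedy policy from a state near the constraint boundary keeps the trajectory inside the constraint set while driving it toward the origin; the penalty term is what makes constraint-violating successors dominated, mirroring the role of the paper's PWA penalty functions.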