
    Reinforcement learning-based approximate optimal control for attitude reorientation under state constraints

    This paper addresses the attitude reorientation problem of rigid bodies under multiple state constraints. A novel reinforcement learning (RL)-based approximate optimal control method is proposed to trade off control cost against performance. The novelty lies in its guaranteed constraint-handling ability for attitude forbidden zones and angular-velocity limits. To achieve this, barrier functions are employed to encode the constraint information into the cost function. An RL-based learning strategy is then developed to approximate the optimal cost function and control policy. A simplified critic-only neural network (NN) replaces the conventional actor-critic structure once adequate data is collected online. This design guarantees the uniform boundedness of the reorientation errors and the NN weight-estimation errors under a finite excitation condition, a relaxation of the persistent excitation condition typically required for this class of problems. More importantly, all underlying state constraints are strictly obeyed during the online learning process. The effectiveness and advantages of the proposed controller are verified by both numerical simulations and experimental tests on a comprehensive hardware-in-the-loop testbed.
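    The abstract does not spell out the barrier terms, so the following is a minimal sketch, assuming a quadratic base cost, a logarithmic barrier on the angular-velocity limit, and a logarithmic barrier on a conical attitude forbidden zone; the constants OMEGA_MAX, FORBIDDEN_AXIS, and THETA_MIN are illustrative assumptions, not values from the paper.

        import numpy as np

        # Assumed constraint data (not from the paper): a per-axis angular-velocity
        # limit and one conical forbidden zone with half-angle THETA_MIN around
        # FORBIDDEN_AXIS.
        OMEGA_MAX = 0.5                                # rad/s
        FORBIDDEN_AXIS = np.array([0.0, 0.0, 1.0])     # unit vector
        THETA_MIN = np.deg2rad(20.0)

        def velocity_barrier(omega):
            """Log barrier: zero at rest, unbounded as |omega_i| -> OMEGA_MAX."""
            margin = OMEGA_MAX**2 - omega**2
            assert np.all(margin > 0), "angular-velocity limit violated"
            return np.sum(np.log(OMEGA_MAX**2 / margin))

        def attitude_barrier(boresight):
            """Log barrier: zero opposite the forbidden axis, unbounded at the cone."""
            margin = np.cos(THETA_MIN) - boresight @ FORBIDDEN_AXIS
            assert margin > 0, "forbidden-zone constraint violated"
            return np.log((1.0 + np.cos(THETA_MIN)) / margin)

        def stage_cost(attitude_err, omega, u, boresight):
            """Quadratic cost augmented with the two barrier terms."""
            base = attitude_err @ attitude_err + omega @ omega + 0.1 * (u @ u)
            return base + velocity_barrier(omega) + attitude_barrier(boresight)

    Because both barriers grow without bound at the constraint boundaries, any policy with finite learned cost keeps the state strictly inside the constraint set, which is the mechanism behind the strict constraint satisfaction claimed during online learning.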

    Learning-based Predictive Control for Nonlinear Systems with Unknown Dynamics Subject to Safety Constraints

    Model predictive control (MPC) has been widely employed as an effective method for model-based constrained control. For systems with unknown dynamics, reinforcement learning (RL) and adaptive dynamic programming (ADP) have received notable attention for solving adaptive optimal control problems. Recently, works using RL within the MPC framework have emerged, which can enhance the ability of MPC for data-driven control. However, safety under state constraints and closed-loop robustness are difficult to verify due to the approximation errors of RL with function-approximation structures. To address this problem, we propose a data-driven robust MPC solution based on incremental RL, called data-driven robust learning-based predictive control (dr-LPC), for perturbed unknown nonlinear systems subject to safety constraints. A data-driven robust MPC (dr-MPC) is first formulated with a learned predictor. The incremental dual heuristic programming (DHP) algorithm, with an actor-critic architecture, is then utilized to solve the online optimization problem of the dr-MPC. In each prediction horizon, the actor and critic learn time-varying laws that approximate the optimal control policy and costate, respectively, which differs from classical MPC. The state and control constraints are enforced during learning by building a Hamilton-Jacobi-Bellman (HJB) equation and a regularized actor-critic learning structure using logarithmic barrier functions. The closed-loop robustness and safety of the dr-LPC are proven under function-approximation errors. Simulation results on two control examples show that the dr-LPC outperforms the DHP and dr-MPC in terms of state regulation, and its average computational time is much smaller than that of the dr-MPC in both examples.
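    The abstract indicates that constraints enter the actor-critic learning through logarithmic barrier functions, in the spirit of interior-point methods. A minimal sketch of that kind of regularization, assuming generic constraints written as g_i(x, u) <= 0 and an assumed barrier weight MU (the paper's exact formulation is not reproduced here):

        import numpy as np

        MU = 0.05   # assumed barrier weight

        def barrier_stage_cost(x, u, Q, R, constraints):
            """Quadratic stage cost plus -MU*log(-g_i) for each constraint g_i <= 0."""
            cost = x @ Q @ x + u @ R @ u
            for g in constraints:
                slack = -g(x, u)        # strictly positive inside the feasible set
                assert slack > 0, "state/control constraint violated during learning"
                cost -= MU * np.log(slack)
            return cost

        # Example: box constraints on the control and on the first state.
        constraints = [lambda x, u: abs(u[0]) - 1.0,   # |u| <= 1
                       lambda x, u: abs(x[0]) - 2.0]   # |x_1| <= 2
        print(barrier_stage_cost(np.array([0.5, -0.3]), np.array([0.2]),
                                 np.eye(2), np.eye(1), constraints))

    Minimizing this regularized cost through the HJB equation pushes the learned policy away from the constraint boundaries, which is how barrier terms let an actor-critic enforce safety inside each prediction horizon.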

    Continual Learning-Based Optimal Output Tracking of Nonlinear Discrete-Time Systems with Constraints: Application to Safe Cargo Transfer

    This paper addresses a novel lifelong learning (LL)-based optimal output tracking control of uncertain nonlinear affine discrete-time (DT) systems with state constraints. First, to deal with optimal tracking and reduce the steady-state error, a novel augmented system, including the tracking error, its integral value, and the desired trajectory, is proposed. To guarantee safety, an asymmetric barrier function (BF) is incorporated into the utility function to keep the tracking error in a safe region. Then, an adaptive neural network (NN) observer is employed to estimate the state vector and the control input matrix of the uncertain nonlinear system. Next, an NN-based actor-critic framework is utilized to estimate the optimal control input and the value function by using the estimated state vector and control coefficient matrix. To achieve LL in a multitask environment and avoid the catastrophic forgetting issue, the exponential weight velocity attenuation (EWVA) scheme is integrated into the critic update law. Finally, the proposed tracker is applied to a safe cargo/crew transfer from a large cargo ship to a lighter surface effect ship (SES) in severe sea conditions.
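    The abstract names an asymmetric barrier function but does not give its form. A minimal sketch, assuming a piecewise logarithmic barrier with different lower and upper error bounds (B_LOW and B_UP are illustrative values, not from the paper):

        import numpy as np

        B_LOW, B_UP = 0.2, 0.5   # assumed asymmetric safe region: e in (-B_LOW, B_UP)

        def asymmetric_barrier(e):
            """Zero at e = 0, unbounded as e -> B_UP or e -> -B_LOW."""
            assert -B_LOW < e < B_UP, "tracking error left the safe region"
            bound = B_UP if e >= 0 else B_LOW
            return np.log(bound**2 / (bound**2 - e**2))

        def utility(e, u, r_weight=0.1):
            """Barrier-augmented utility, replacing a plain quadratic error term."""
            return asymmetric_barrier(e) + r_weight * u**2

    Penalizing the error through this barrier rather than a plain quadratic makes the learned value function blow up at the safe-region boundary, so the optimal tracker has no incentive to approach it, and the two bounds can be set independently when the tolerable error differs by direction.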

    Continual Reinforcement Learning Formulation For Zero-Sum Game-Based Constrained Optimal Tracking

    This study provides a novel reinforcement learning-based optimal tracking control of partially uncertain nonlinear discrete-time (DT) systems with state constraints using a zero-sum game (ZSG) formulation. To address optimal tracking, a novel augmented system consisting of the tracking error, its integral value, and an uncertain desired trajectory is constructed. A barrier function (BF) with a tradeoff factor is incorporated into the cost function to keep the state trajectories within a compact set and to balance safety with optimality. Next, using the modified value functional, the ZSG formulation is introduced, wherein an actor-critic neural network (NN) framework is employed to approximate the value functional, the optimal control input, and the worst-case disturbance. The critic NN weights are tuned once at each sampling instant and then iteratively within sampling instants. Using control input errors, the actor NN weights are adjusted once per sampling instant. A concurrent learning term in the critic weight tuning law overcomes the need for the persistence of excitation (PE) condition. Further, a weight consolidation scheme is incorporated into the critic update law to attain lifelong learning by overcoming catastrophic forgetting. Finally, a numerical example supports the analytical claims.
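    The concurrent learning term mentioned above replays a stack of recorded transitions alongside the current measurement, so that a finite, sufficiently rich data history replaces persistent excitation. A minimal sketch of such a critic update, assuming a quadratic feature map and an undiscounted Bellman error (features, gain, and stack handling are illustrative assumptions):

        import numpy as np

        ALPHA = 0.05   # assumed critic learning rate

        def phi(x):
            """Assumed critic feature map: quadratic basis on a 2-D state."""
            return np.array([x[0]**2, x[0] * x[1], x[1]**2])

        def bellman_error(W, x, x_next, r):
            """delta = r + V(x') - V(x) with V(x) = W^T phi(x)."""
            return r + W @ phi(x_next) - W @ phi(x)

        def cl_critic_step(W, current, stack):
            """Gradient step on the current sample plus the recorded stack."""
            grad = np.zeros_like(W)
            for (x, x_next, r) in [current] + stack:
                delta = bellman_error(W, x, x_next, r)
                grad += delta * (phi(x_next) - phi(x))   # d(delta^2/2)/dW
            return W - ALPHA * grad

    The stacked term keeps the weight error contracting whenever the recorded regressors span the feature space, so excitation needs to hold only over the recorded history (finite excitation) rather than for all time (PE).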

    A brief review of neural networks based learning and control and their applications for robots

    As an imitation of biological nervous systems, neural networks (NNs), which are characterized by powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article aims to provide a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress of NNs in both theoretical developments and practical applications is investigated and surveyed. Specifically, NN-based robot learning and control applications are further reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation.

    Safety-aware model-based reinforcement learning using barrier transformation

    The ability to learn and execute optimal control policies safely is critical to the realization of complex autonomy, especially where task restarts are unavailable and/or the system is safety-critical. Safety requirements are often expressed in terms of state and/or control constraints. Methods such as barrier transformations and control barrier functions have been used successfully, in conjunction with model-based reinforcement learning, for safe learning of the optimal control policy in systems under state and/or control constraints. However, existing barrier-based safe learning methods rely on fully known models and full state feedback. In this thesis, two safe model-based reinforcement learning techniques are developed. The first utilizes a novel filtered concurrent learning method to realize simultaneous learning and control in the presence of model uncertainties for safety-critical systems; the second utilizes a novel dynamic state estimator to realize simultaneous learning and control for safety-critical systems with a partially observable state. The applicability of the developed techniques is demonstrated through simulations, and, to illustrate their effectiveness, comparative simulations are presented wherever alternative methods exist for the problem under consideration. The thesis concludes with a discussion of the limitations of the developed techniques and proposes extensions along with possible approaches to achieve them.
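    The barrier transformation referenced above maps a constrained state interval bijectively onto the whole real line, so that learning and control in the transformed coordinate can never violate the original constraint. A minimal sketch of a log-ratio transformation commonly used for this purpose (the interval bounds are illustrative; the thesis's exact construction is not reproduced here):

        import numpy as np

        A_LOW, A_UP = -1.0, 2.0   # assumed constraint: x in (A_LOW, A_UP), A_LOW < 0 < A_UP

        def to_unconstrained(x):
            """Barrier transform b: (A_LOW, A_UP) -> R, with b(0) = 0."""
            assert A_LOW < x < A_UP, "state constraint violated"
            return np.log((A_UP * (A_LOW - x)) / (A_LOW * (A_UP - x)))

        def to_constrained(s):
            """Inverse transform b^{-1}: R -> (A_LOW, A_UP)."""
            return A_LOW * A_UP * (np.exp(s) - 1.0) / (A_LOW * np.exp(s) - A_UP)

        # Round trip: the transform inverts exactly, and any real s maps back
        # into the open interval.
        x = 1.0
        assert np.isclose(to_constrained(to_unconstrained(x)), x)

    Rewriting the dynamics in the transformed coordinate s and learning the optimal policy there guarantees that the reconstructed state trajectory stays strictly inside (A_LOW, A_UP), which is the mechanism barrier-transformation-based safe learning relies on.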