
    Near-Optimal Control of a Quadcopter Using Reinforcement Learning

    This paper presents a novel control method for quadcopters that achieves near-optimal tracking control for input-affine nonlinear quadcopter dynamics. The method uses a reinforcement learning algorithm called Single Network Adaptive Critics (SNAC), which approximates a solution to the discrete-time Hamilton-Jacobi-Bellman (DT-HJB) equation using a single neural network trained offline. The control method involves two SNAC controllers: the outer loop controls the linear positions and velocities (position control), and the inner loop controls the angular positions and velocities (attitude control). The resulting quadcopter controller provides optimal feedback control and tracks a trajectory over an infinite horizon, and it is compared with commercial optimal control software. Furthermore, the closed-loop controller can control the system from any initial condition within the domain of training. Overall, this research demonstrates the benefits of using SNAC for nonlinear control, showing its ability to achieve near-optimal tracking control while reducing computational complexity. This paper provides insights into a new approach for controlling quadcopters, with potential applications in fields such as aerial surveillance, delivery, and search and rescue.
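    As an illustration of the SNAC idea (a sketch, not the paper's quadcopter design), the code below trains a single linear critic that maps the current state to the next costate for an assumed scalar linear system x[k+1] = a*x[k] + b*u[k] with quadratic cost. The plant parameters, damped fixed-point update, and learning rate are all illustrative assumptions chosen so the result can be checked against the scalar discrete Riccati solution.

```python
# Minimal SNAC sketch on a scalar linear toy system (hypothetical example,
# not the paper's quadcopter model).  Dynamics: x[k+1] = a*x[k] + b*u[k],
# cost 0.5 * sum(q*x^2 + r*u^2).  The "single network" here is a linear
# critic lam[k+1] = w * x[k]; the optimal control is u[k] = -(b/r)*lam[k+1].
a, b, q, r = 1.1, 1.0, 1.0, 1.0

def snac_train(iters=200, lr=0.5):
    w = 0.0                      # critic weight: lam[k+1] = w * x[k]
    for _ in range(iters):
        # For a linear critic, the costate equation
        #   lam[k+1] = q*x[k+1] + a*lam[k+2]
        # yields the target weight (q + a*w) * (a - b*b*w/r) on x[k].
        target = (q + a * w) * (a - b * b * w / r)
        w += lr * (target - w)   # damped fixed-point update (offline training)
    return w

w = snac_train()
K = b * w / r                    # resulting feedback gain: u = -K * x
print(round(K, 4))               # ≈ 0.7034, matches the scalar DARE gain
```

For this toy problem the converged gain matches the LQR gain from the discrete algebraic Riccati equation, which is the sense in which the trained critic is "near-optimal".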

    Issues on Stability of ADP Feedback Controllers for Dynamical Systems

    This paper traces the development of neural-network (NN)-based feedback controllers that are derived from the principle of adaptive/approximate dynamic programming (ADP) and discusses their closed-loop stability. Different versions of NN structures in the literature, which embed mathematical mappings related to solutions of the ADP-formulated problems called “adaptive critics” or “action-critic” networks, are discussed. The distinction between the two classes of ADP applications is pointed out. Furthermore, papers on “model-free” development and model-based neurocontrollers are reviewed in terms of their contributions to stability issues. Recent literature suggests that work on ADP-based feedback controllers with assured stability is growing in diverse forms.

    A brief review of neural networks based learning and control and their applications for robots

    As imitations of biological nervous systems, neural networks (NNs), which are characterized by powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article provides a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress on NNs in both theoretical development and practical applications is investigated and surveyed. Specifically, NN-based robot learning and control applications are further reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation.

    Data-Driven Model-Free Sliding Mode and Fuzzy Control with Experimental Validation

    The paper presents the combination of the model-free control technique with two popular nonlinear control techniques: sliding mode control and fuzzy control. Two data-driven model-free sliding mode control structures and one data-driven model-free fuzzy control structure are given. The data-driven model-free sliding mode control structures are built upon a model-free intelligent Proportional-Integral (iPI) control system structure, where an augmented control signal is inserted in the iPI control law to deal with the error dynamics in terms of sliding mode control. The data-driven model-free fuzzy control structure is developed by fuzzifying the PI component of the continuous-time iPI control law. The design approaches for the data-driven model-free control algorithms are presented, and the algorithms are validated as controllers in real-time experiments conducted on 3D crane laboratory equipment.
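    The iPI structure mentioned above can be sketched in a few lines; the plant, gains, and local-model parameter alpha below are illustrative assumptions, not the paper's 3D-crane setup. The plant is replaced by an ultra-local model ydot = F + alpha*u, the unknown term F is re-estimated from measurements at every step, and the iPI law cancels it.

```python
# Sketch of a continuous-time intelligent PI (iPI) loop in the spirit of
# model-free control.  The first-order plant, the disturbance, and the
# gains Kp, Ki are illustrative choices; only the ultra-local model
# ydot = F + alpha*u is assumed known to the controller.
dt, alpha, Kp, Ki = 0.01, 1.0, 2.0, 1.0
y = y_prev = u = ie = 0.0
r = 1.0                               # step reference

for _ in range(1000):                 # 10 s of simulated time
    ydot_est = (y - y_prev) / dt      # numerical derivative of the output
    F_hat = ydot_est - alpha * u      # data-driven estimate of F
    e = r - y                         # tracking error
    ie += e * dt                      # integral of the error
    u = (-F_hat + Kp * e + Ki * ie) / alpha   # iPI control law (rdot* = 0)
    y_prev = y
    y += dt * (-0.5 * y + u + 0.3)    # "unknown" plant + constant disturbance

print(round(y, 2))                    # settles near the reference 1.0
```

Because F_hat absorbs both the unmodeled plant term and the disturbance, the closed loop behaves approximately like the designed second-order error dynamics regardless of the plant details, which is the core appeal of the data-driven model-free approach.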

    Adaptive dynamic programming with eligibility traces and complexity reduction of high-dimensional systems

    This dissertation investigates the application of a variety of computational intelligence techniques, particularly clustering and adaptive dynamic programming (ADP) designs, especially heuristic dynamic programming (HDP) and dual heuristic programming (DHP). Moreover, one-step temporal-difference (TD(0)) and n-step TD (TD(λ)) learning, along with their gradients, are utilized as learning algorithms to train and online-adapt the families of ADP. The dissertation is organized into seven papers. The first paper demonstrates the robustness of model order reduction (MOR) for simulating complex dynamical systems. Agglomerative hierarchical clustering based on performance evaluation is introduced for MOR. This method computes the reduced-order denominator of the transfer function by clustering system poles in a hierarchical dendrogram. Several numerical examples of reduction techniques are taken from the literature for comparison with our work. In the second paper, HDP is combined with the Dyna algorithm for path planning. The third paper uses DHP with an eligibility trace parameter (λ) to track a reference trajectory under uncertainties for a nonholonomic mobile robot, using a first-order Sugeno fuzzy neural network structure for the critic and actor networks. In the fourth and fifth papers, a stability analysis for a model-free action-dependent HDP(λ) is demonstrated with batch- and online-implementation learning, respectively. The sixth paper combines two different gradient prediction levels of critic networks; in this work, we provide convergence proofs. The seventh paper develops two hybrid recurrent fuzzy neural network structures for both critic and actor networks. They use a novel n-step gradient temporal-difference method (the gradient of TD(λ)) within an advanced ADP algorithm called value-gradient learning (VGL(λ)), and convergence proofs are given. Furthermore, the seventh paper is the first to combine the single network adaptive critic with VGL(λ). --Abstract, page iv
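    The TD(λ) learning rule with eligibility traces that recurs throughout the dissertation can be illustrated on the classic tabular random-walk prediction task; the task, step size, and λ value below are illustrative assumptions, not the dissertation's critic networks.

```python
# Tabular TD(lambda) with accumulating eligibility traces on the 5-state
# random-walk task (an illustrative stand-in for TD(lambda)-trained
# critics).  The true value of state k is (k+1)/6.
import random

random.seed(0)
n, alpha, lam, gamma = 5, 0.05, 0.8, 1.0
V = [0.0] * n                         # state-value estimates

for _ in range(5000):                 # episodes
    z = [0.0] * n                     # eligibility traces, reset per episode
    s = 2                             # start in the middle state
    while True:
        s2 = s + random.choice((-1, 1))
        reward = 1.0 if s2 == n else 0.0          # right terminal pays 1
        v_next = 0.0 if s2 in (-1, n) else V[s2]  # terminals have value 0
        delta = reward + gamma * v_next - V[s]    # TD error
        z[s] += 1.0                               # accumulate trace
        for i in range(n):
            V[i] += alpha * delta * z[i]          # credit along the traces
            z[i] *= gamma * lam                   # decay the traces
        if s2 in (-1, n):
            break
        s = s2

print([round(v, 2) for v in V])       # near [0.17, 0.33, 0.5, 0.67, 0.83]
```

Setting λ = 0 recovers one-step TD(0), while λ = 1 approaches Monte Carlo prediction; intermediate λ trades off bias against variance, which is why the dissertation treats it as a tunable parameter.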

    Power regulation and load mitigation of floating wind turbines via reinforcement learning

    Floating offshore wind turbines (FOWTs) are often subjected to heavy structural loads due to challenging operating conditions, which can negatively impact power generation and lead to structural fatigue. This paper proposes a novel reinforcement learning (RL)-based control scheme to address this issue. It combines individual pitch control (IPC) and collective pitch control (CPC) to balance two key objectives: load reduction and power regulation. Specifically, a novel incremental model-based dual heuristic programming (IDHP) strategy is developed as the IPC solution to reduce structural loads. It integrates the online-learned FOWT dynamics into the dual heuristic programming process, making the entire control scheme data-driven and free from dependence on analytical models. Furthermore, the proposed method differs from existing IDHP methods in that only partial system dynamics need to be learned, resulting in a simplified design structure and improved training efficiency. Tests using a high-fidelity FOWT simulator demonstrate the effectiveness of the proposed method.
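    The "online-learned dynamics" ingredient of IDHP-style designs is typically an incremental model dx[k+1] = F*dx[k] + G*du[k] identified by recursive least squares; the sketch below identifies such a model for an assumed scalar plant, which stands in for the FOWT dynamics and is not the paper's simulator.

```python
# Recursive-least-squares identification of an incremental model
#   dx[k+1] = F*dx[k] + G*du[k].
# The scalar plant x[k+1] = 0.9*x[k] + 0.2*u[k] is an illustrative
# stand-in, so the true parameters are F = 0.9 and G = 0.2.
import random

random.seed(0)
theta = [0.0, 0.0]                    # estimates of [F, G]
P = [[1e4, 0.0], [0.0, 1e4]]          # RLS covariance (large = weak prior)
x = x_prev = u_prev = 0.0

for _ in range(500):
    u = random.uniform(-1.0, 1.0)     # persistently exciting input
    x_next = 0.9 * x + 0.2 * u        # "unknown" plant
    phi = [x - x_prev, u - u_prev]    # regressor: [dx[k], du[k]]
    y = x_next - x                    # target: dx[k+1]
    # Standard RLS update with unit forgetting factor
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = 1.0 + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    K = [Pphi[0] / denom, Pphi[1] / denom]
    err = y - (theta[0] * phi[0] + theta[1] * phi[1])
    theta = [theta[0] + K[0] * err, theta[1] + K[1] * err]
    phiTP = [phi[0] * P[0][0] + phi[1] * P[1][0],
             phi[0] * P[0][1] + phi[1] * P[1][1]]
    P = [[P[i][j] - K[i] * phiTP[j] for j in range(2)] for i in range(2)]
    x_prev, u_prev, x = x, u, x_next

print([round(t, 3) for t in theta])   # ≈ [0.9, 0.2]
```

Because only the local increments are regressed, no analytical model of the plant is required, which is what makes the overall scheme data-driven.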

    Reinforcement learning control of a flexible two-link manipulator: an experimental investigation

    This article discusses the control design and experimental validation of a flexible two-link manipulator (FTLM) system represented by ordinary differential equations (ODEs). A reinforcement learning (RL) control strategy based on an actor-critic structure is developed to enable vibration suppression while retaining trajectory tracking. The closed-loop system with the proposed RL control algorithm is proved to be semi-globally uniformly ultimately bounded (SGUUB) by Lyapunov's direct method. In simulations, the presented control approach is tested on the discretized ODE dynamic model, and the analytical claims are justified in the presence of uncertainty. Finally, a series of experiments on a Quanser laboratory platform demonstrates the effectiveness of the presented control, and its performance is compared with PD control.
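    The actor-critic structure underlying such RL controllers can be sketched on a deliberately tiny problem; the two-action task, reward values, and learning rates below are illustrative assumptions, not the FTLM controller. The critic maintains a baseline value estimate, and the actor's preferences are adjusted in proportion to the critic's TD error.

```python
# Minimal actor-critic sketch on a two-action toy task (illustrative only).
# Critic: scalar baseline V.  Actor: softmax over preferences h, updated by
# a policy-gradient step scaled by the critic's TD error.
import math, random

random.seed(1)
h = [0.0, 0.0]                # actor preferences
V = 0.0                       # critic: baseline value estimate
a_actor, a_critic = 0.1, 0.1
rewards = (1.0, 0.2)          # deterministic rewards for the two actions

def policy():
    m = max(h)
    e = [math.exp(x - m) for x in h]
    s = sum(e)
    return [x / s for x in e]

for _ in range(2000):
    pi = policy()
    a = 0 if random.random() < pi[0] else 1
    delta = rewards[a] - V        # TD error: critic's evaluation signal
    V += a_critic * delta         # critic update
    for i in range(2):            # actor update, scaled by the TD error
        grad = (1.0 - pi[i]) if i == a else -pi[i]
        h[i] += a_actor * delta * grad

print(round(policy()[0], 2))      # the actor strongly prefers action 0
```

The same division of labor — a critic evaluating the current policy and an actor improving it — carries over to the continuous-state manipulator setting, where both components become neural networks.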