11 research outputs found

    Adaptive Critic Designs

    We discuss a variety of adaptive critic designs (ACDs) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: heuristic dynamic programming (HDP), dual heuristic programming (DHP), and globalized dual heuristic programming (GDHP). The main emphasis is on DHP and GDHP as advanced ACDs. We suggest two new modifications of the original GDHP design that are currently the only working implementations of GDHP. They promise to be useful for many engineering applications in the areas of optimization and optimal control. Based on one of these modifications, we present a unified approach to all ACDs. This leads to a generalized training procedure for ACDs.
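
    For reference, the quantity that all three critic families are trained toward, and what each critic outputs, can be summarised as follows (standard ACD notation, not quoted from the paper; U is the utility or instantaneous cost, γ the discount factor, x the state vector):

```latex
J\bigl(x(t)\bigr) \;=\; \sum_{k=0}^{\infty} \gamma^{k}\, U\bigl(x(t+k)\bigr), \qquad 0 < \gamma \le 1,
\\[4pt]
\text{HDP critic: } \hat{J}(x) \approx J(x), \qquad
\text{DHP critic: } \hat{\lambda}(x) \approx \frac{\partial J}{\partial x}, \qquad
\text{GDHP critic: both } \hat{J}(x) \text{ and } \frac{\partial \hat{J}}{\partial x}.
```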

    Discrete Globalised Dual Heuristic Dynamic Programming in Control of the Two-Wheeled Mobile Robot

    Network-based control systems have been an emerging technology in the control of nonlinear systems over the past few years. This paper focuses on the implementation of the approximate dynamic programming algorithm in the network-based tracking control system of the two-wheeled mobile robot Pioneer 2-DX. The proposed discrete tracking control system consists of the globalised dual heuristic dynamic programming algorithm, the PD controller, the supervisory term, and an additional control signal. The structure of the supervisory term derives from a stability analysis carried out using the Lyapunov stability theorem. The globalised dual heuristic dynamic programming algorithm consists of two structures, the actor and the critic, realised in the form of neural networks. The actor generates the suboptimal control law, while the critic evaluates the realised control strategy by approximating the value function from the Bellman equation. The presented discrete tracking control system works online: the neural networks' weight-adaptation process is carried out at every iteration step, and no preliminary neural network learning procedure is required. The performance of the proposed control system was verified by a series of computer simulations and experiments realised using the wheeled mobile robot Pioneer 2-DX.
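
    To make the actor-critic structure concrete, the following is a minimal sketch (not the paper's implementation) of one online weight-adaptation step of an HDP-type actor-critic: the critic approximates the value function of the Bellman equation, and the actor generates the control and is improved by descending the critic's estimate of the cost-to-go through an assumed one-step model. All dimensions, gains, the model f, and the quadratic utility are illustrative assumptions; the paper's GDHP critic, PD controller, and supervisory term are omitted here.

```python
# Hedged sketch (not the paper's implementation) of one online actor-critic
# adaptation step of the HDP type; all names and values below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, n_h = 3, 2, 8                         # assumed state/control/hidden sizes
Wc = rng.normal(scale=0.1, size=(n_h, n_x))     # critic hidden-layer weights
vc = rng.normal(scale=0.1, size=n_h)            # critic output weights
Wa = rng.normal(scale=0.1, size=(n_u, n_x))     # actor weights (kept simple for brevity)
gamma, eta_c, eta_a, eps = 0.95, 0.02, 0.01, 1e-4

def f(x, u):                                    # assumed discrete-time plant x_{k+1} = f(x_k, u_k)
    return 0.9 * x + 0.1 * np.concatenate([u, [0.0]])

def utility(x, u):                              # assumed quadratic stage cost
    return x @ x + 0.1 * (u @ u)

def critic(x):                                  # value-function approximation V(x)
    return vc @ np.tanh(Wc @ x)

def actor(x):                                   # suboptimal control law u(x)
    return np.tanh(Wa @ x)

def adapt_step(x):
    """One online adaptation step; no preliminary (offline) learning assumed."""
    global vc
    u = actor(x)
    x_next = f(x, u)
    delta = utility(x, u) + gamma * critic(x_next) - critic(x)  # Bellman (TD) residual
    vc = vc - eta_c * delta * np.tanh(Wc @ x)                   # critic update
    for i in range(n_u):                        # actor update: numeric gradient w.r.t. each input
        du = np.zeros(n_u); du[i] = eps
        dJ = (utility(x, u + du) + gamma * critic(f(x, u + du))
              - utility(x, u) - gamma * critic(x_next)) / eps
        Wa[i] -= eta_a * dJ * x                 # chain rule through tanh(.) omitted for brevity
    return x_next, delta

x = np.array([1.0, -0.5, 0.2])
for _ in range(5):
    x, td_error = adapt_step(x)
```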

    Issues on Stability of ADP Feedback Controllers for Dynamical Systems

    This paper traces the development of neural-network (NN)-based feedback controllers that are derived from the principle of adaptive/approximate dynamic programming (ADP) and discusses their closed-loop stability. Different versions of NN structures in the literature, which embed mathematical mappings related to solutions of the ADP-formulated problems and are called “adaptive critic” or “actor-critic” networks, are discussed. The distinction between the two classes of ADP applications is pointed out. Furthermore, papers on “model-free” development and model-based neurocontrollers are reviewed in terms of their contributions to stability issues. Recent literature suggests that work on ADP-based feedback controllers with assured stability is growing in diverse forms.

    Adaptive dynamic programming with eligibility traces and complexity reduction of high-dimensional systems

    This dissertation investigates the application of a variety of computational intelligence techniques, particularly clustering and adaptive dynamic programming (ADP) designs, especially heuristic dynamic programming (HDP) and dual heuristic programming (DHP). Moreover, one-step temporal difference (TD(0)) and n-step TD (TD(λ)) methods, together with their gradients, are utilized as learning algorithms to train and online-adapt the ADP families. The dissertation is organized into seven papers. The first paper demonstrates the robustness of model order reduction (MOR) for simulating complex dynamical systems. Agglomerative hierarchical clustering based on performance evaluation is introduced for MOR. This method computes the reduced-order denominator of the transfer function by clustering system poles in a hierarchical dendrogram. Several numerical examples of reduction techniques are taken from the literature for comparison with our work. In the second paper, HDP is combined with the Dyna algorithm for path planning. The third paper uses DHP with an eligibility-trace parameter (λ) to track a reference trajectory under uncertainties for a nonholonomic mobile robot, using a first-order Sugeno fuzzy neural network structure for the critic and actor networks. In the fourth and fifth papers, a stability analysis for a model-free action-dependent HDP(λ) is demonstrated with batch and online implementations of learning, respectively. The sixth work combines two different gradient-prediction levels of critic networks, and convergence proofs are provided. The seventh paper develops two hybrid recurrent fuzzy neural network structures for both the critic and actor networks. They use a novel n-step gradient temporal difference (the gradient of TD(λ)) from an advanced ADP algorithm called value-gradient learning (VGL(λ)), and convergence proofs are given. Furthermore, the seventh paper is the first to combine the single network adaptive critic with VGL(λ). --Abstract, page iv
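
    For readers unfamiliar with the TD(0)/TD(λ) terminology used above, the following is a minimal, textbook-style sketch of the tabular TD(λ) update with accumulating eligibility traces; it is not code from the dissertation, and the toy random-walk chain and all parameters are assumptions. Setting lam = 0 recovers TD(0).

```python
# Hedged sketch of tabular TD(lambda) with accumulating eligibility traces.
import numpy as np

n_states, alpha, gamma, lam = 5, 0.1, 0.99, 0.8
V = np.zeros(n_states)                      # state-value estimates
rng = np.random.default_rng(0)

def step(s):
    """Assumed random-walk chain: move left/right; reward 1 at the right end."""
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next in (0, n_states - 1)      # both ends are terminal
    return s_next, reward, done

for _ in range(200):                        # episodes
    s, e = n_states // 2, np.zeros(n_states)   # start in the middle, reset traces
    done = False
    while not done:
        s_next, r, done = step(s)
        target = r + (0.0 if done else gamma * V[s_next])
        delta = target - V[s]               # one-step TD error
        e[s] += 1.0                         # accumulate the trace of the visited state
        V += alpha * delta * e              # every state updated in proportion to its trace
        e *= gamma * lam                    # traces decay between steps
        s = s_next
```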

    Stable Adaptive Control Using New Critic Designs

    Classical adaptive control proves total-system stability for control of linear plants, but only for plants meeting very restrictive assumptions. Approximate Dynamic Programming (ADP) has the potential, in principle, to ensure stability without such tight restrictions. It also offers nonlinear and neural extensions for optimal control, with empirically supported links to what is seen in the brain. However, the relevant ADP methods in use today -- TD, HDP, DHP, GDHP -- and the Galerkin-based versions of these all have serious limitations when used here as parallel distributed real-time learning systems; either they do not possess quadratic unconditional stability (to be defined) or they lead to incorrect results in the stochastic case. (ADAC or Q-learning designs do not help.) After explaining these conclusions, this paper describes new ADP designs which overcome these limitations. It also addresses the Generalized Moving Target problem, a common family of static optimization problems, and describes a way to stabilize large-scale economic equilibrium models, such as the old long-term energy model of DOE.
    Comment: Includes general reviews of alternative control technologies and reinforcement learning. 4 figs, >70 pp., >200 eqs. Implementation details, stability analysis. Included in 9/24/98 patent disclosure. PDF version uploaded 2012, based on direct conversion of the original Word/HTML file, because of issues of format compatibility.

    Stochastic optimal adaptive controller and communication protocol design for networked control systems

    A Networked Control System (NCS) is a recent topic of research in which the feedback control loops are closed through a real-time communication network. Many design challenges surface in such systems due to network imperfections such as random delays, packet losses, quantization effects, and so on. Since existing control techniques are unsuitable for such systems, in this dissertation a suite of novel stochastic optimal adaptive design methodologies is developed for both linear and nonlinear NCS in the presence of uncertain system dynamics and unknown network imperfections such as network-induced delays and packet losses. The design is introduced in five papers. In Paper 1, a stochastic optimal adaptive control design is developed for unknown linear NCS with uncertain system dynamics and unknown network imperfections. A value function is adjusted forward-in-time and online, and a novel update law is proposed for tuning the value function estimator parameters. Additionally, by using the estimated value function, the optimal adaptive control law is derived based on the adaptive dynamic programming technique. Subsequently, this design methodology is extended to solve stochastic optimal strategies of linear NCS zero-sum games in Paper 2. Since most systems are inherently nonlinear, a novel stochastic optimal adaptive control scheme is then developed in Paper 3 for nonlinear NCS with unknown network imperfections. On the other hand, in Paper 4, the network protocol behavior (e.g. TCP and UDP) is considered and the optimal adaptive control design is revisited using output feedback for linear NCS. Finally, Paper 5 explores a co-design framework in which the controller and the network scheduling protocol are designed jointly so that the proposed scheme can be implemented in next-generation Cyber Physical Systems. --Abstract, page iv
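
    As an illustration of the value-function-estimation idea in Paper 1 (forward-in-time, online tuning of value-function parameters), the following is a hedged sketch in the spirit of ADP for linear systems; it is not the dissertation's update law and ignores network-induced delays and packet losses. The plant matrices, quadratic cost, quadratic basis, and normalised-gradient update are assumptions.

```python
# Hedged sketch: forward-in-time, online tuning of a quadratic value-function
# estimate V(x) = theta^T phi(x) from measured transitions (assumed setup).
import numpy as np

def phi(x):
    """Quadratic regressor: upper-triangular entries of the outer product x x^T."""
    outer = np.outer(x, x)
    return outer[np.triu_indices_from(outer)]

rng = np.random.default_rng(1)
A = np.array([[0.95, 0.10], [0.00, 0.90]])   # assumed stable discrete-time plant
B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), np.eye(1), 0.98
theta = np.zeros(phi(np.ones(2)).size)       # value-function parameters
alpha = 0.5                                  # adaptation gain (assumed)

x = np.array([1.0, -1.0])
for k in range(500):
    u = 0.1 * rng.normal(size=1)             # exploratory input (assumed)
    x_next = A @ x + B @ u
    cost = x @ Q @ x + u @ R @ u             # quadratic stage cost
    # Bellman residual of the current parameter estimate
    residual = cost + gamma * theta @ phi(x_next) - theta @ phi(x)
    reg = phi(x) - gamma * phi(x_next)       # temporal-difference regressor
    theta += alpha * residual * reg / (1.0 + reg @ reg)   # normalised-gradient update
    x = x_next
```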

    Distributed Control of Autonomous Microgrids


    Machine Learning

    Machine learning can be defined in various ways, all relating to a scientific domain concerned with the design and development of theoretical and implementation tools that allow systems to be built with some human-like intelligent behavior. More specifically, machine learning addresses the ability of such systems to improve automatically through experience.