
    Off-policy Q-learning: set-point design for optimizing dual-rate rougher flotation operational processes

    Rougher flotation, composed of unit processes operating at a fast time scale and economic performance measurements, known as operational indices, measured at a slower time scale, is the first and most fundamental concentration stage in flotation plants. Optimizing the operational process of rougher flotation circuits is extremely important because of the high economic profit that optimal operational indices bring. This paper presents a novel off-policy Q-learning method to learn the optimal solution to rougher flotation operational processes without knowledge of the dynamics of the unit processes and operational indices. To this end, the optimal operational control (OOC) problem for dual-rate rougher flotation processes is first formulated. Second, an H∞ tracking control problem is developed to optimally prescribe the set-points for the rougher flotation processes. Then, a zero-sum game off-policy Q-learning algorithm is proposed to find the optimal set-points using measured data. Finally, simulation experiments demonstrate the effectiveness of the proposed method.
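    As a minimal illustration of the data-driven ingredient, the sketch below runs off-policy Q-learning on a plain single-rate linear-quadratic problem: a quadratic Q-function kernel H is fit to measured transitions by least squares, and the feedback gain is improved from it. This is an analogue, not the paper's dual-rate zero-sum game design; the system matrices, noise level, and helper names (`collect`, `policy_eval`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.95]])  # plant matrices: used only to
B = np.array([[0.0], [0.1]])             # simulate data, never by the learner
Qc, Rc = np.eye(2), np.eye(1)            # stage cost x'Qx + u'Ru

def collect(K, n_steps=400):
    """Roll out a behavior policy u = -Kx + probing noise and log transitions."""
    x = rng.standard_normal(2)
    data = []
    for _ in range(n_steps):
        u = -K @ x + 0.5 * rng.standard_normal(1)
        xn = A @ x + B @ u
        data.append((x, u, xn))
        x = xn
    return data

def policy_eval(K, data):
    """Least-squares solve of the Q-learning Bellman equation for the kernel H."""
    Phi, y = [], []
    for x, u, xn in data:
        z = np.concatenate([x, u])           # z = [x; u]
        zn = np.concatenate([xn, -K @ xn])   # next action from the target policy
        # z'Hz - zn'Hzn = r  =>  (kron(z,z) - kron(zn,zn)) @ vec(H) = r
        Phi.append(np.kron(z, z) - np.kron(zn, zn))
        y.append(x @ Qc @ x + u @ Rc @ u)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = h.reshape(3, 3)
    return 0.5 * (H + H.T)                   # symmetrize

K = np.zeros((1, 2))                         # initial stabilizing gain
for _ in range(8):
    H = policy_eval(K, collect(K))
    K = np.linalg.solve(H[2:, 2:], H[2:, :2])  # greedy improvement: u = -Kx
print("learned gain:", K)
```

    Because the update uses only logged transitions and evaluates the target policy's action at the next state, the behavior policy's probing noise does not bias the learned kernel, which is the point of the off-policy formulation.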

    Model-Free δ-Policy Iteration Based on Damped Newton Method for Nonlinear Continuous-Time H∞ Tracking Control

    Full text link
    This paper presents a δ-PI algorithm based on the damped Newton method for the H∞ tracking control problem of unknown continuous-time nonlinear systems. A discounted performance function and an augmented system are used to obtain the tracking Hamilton-Jacobi-Isaacs (HJI) equation. The tracking HJI equation is a nonlinear partial differential equation; traditional reinforcement learning methods for solving it are mostly based on the Newton method, which usually guarantees only local convergence and needs a good initial guess. Based on the damped Newton iteration operator equation, a generalized tracking Bellman equation is first derived. The δ-PI algorithm seeks the optimal solution of the tracking HJI equation by iteratively solving the generalized tracking Bellman equation. On-policy and off-policy δ-PI reinforcement learning methods are provided. The off-policy δ-PI algorithm is model-free and can be performed without a priori knowledge of the system dynamics. A neural-network-based implementation scheme for the off-policy δ-PI algorithm is shown, and the suitability of the model-free δ-PI algorithm is illustrated on a nonlinear system simulation.
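    A toy scalar analogue of the damping idea, under the assumption that the linear-quadratic case is a fair stand-in: the tracking HJI equation reduces to an algebraic Riccati equation, each undamped Newton step is the classical Kleinman iteration, and δ in (0, 1] blends the current value estimate with the Newton update. All numbers here are illustrative, not from the paper.

```python
# Scalar damped-Newton (delta-PI-style) iteration on the algebraic Riccati
# equation 2*a*p - (b**2/r)*p**2 + q = 0, the linear-quadratic stand-in
# for the tracking HJI equation. delta = 1 recovers plain Newton/Kleinman
# policy iteration; smaller delta damps each step.
a, b, q, r = 0.5, 1.0, 1.0, 1.0    # open-loop unstable scalar plant (a > 0)

def newton_step(p):
    """One Kleinman step: solve the linearized (Lyapunov) equation for p."""
    k = b * p / r                   # feedback gain implied by the current p
    return -(q + r * k**2) / (2.0 * (a - b * k))

delta, p = 0.3, 2.0                 # damping factor and initial guess
for _ in range(60):
    p += delta * (newton_step(p) - p)

p_star = (a + (a**2 + b**2 * q / r) ** 0.5) * r / b**2   # exact positive root
print(f"delta-PI value: {p:.6f}   exact: {p_star:.6f}")
```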

    Decentralized Optimal Control With Application In Power System

    An output-feedback decentralized optimal controller is proposed for power systems with renewable energy penetration. The renewable energy source is modeled similarly to the classical generator model and is equipped with a unified power flow controller (UPFC). The transient performance of the power system is considered and the stability of the dynamical states is investigated. An offline decentralized optimal controller is designed that utilizes only the local states. The network comprises conventional synchronous generators as well as renewable sources with inverters equipped with UPFCs. Subsequently, the optimal decentralized controller is compared to the initial stabilizing controller used to obtain it. An online decentralized optimal controller is designed for the discrete-time system. Two neural networks are utilized to estimate the value function and the optimal control strategy. Furthermore, a novel observer-based decentralized optimal controller is developed for a small-scale discrete-time power system; the networks are trained with least-squares rules and successive approximation. Simulation results on the IEEE 14-, 30-, and 118-bus power system benchmarks show satisfactory performance of the online decentralized controller, and demonstrate the strong performance of the observer and the optimal controller compared to the centralized optimal controller.
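    A minimal sketch of the decentralized principle (not the paper's observer-based learning design): two weakly coupled subsystems are each stabilized by a purely local LQR gain computed from local model data only, with the interconnection treated as a disturbance. The coupling strength and all matrices are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A1 = np.array([[1.01, 0.05], [0.0, 0.98]])   # subsystem 1 (locally known)
A2 = np.array([[0.99, 0.08], [0.0, 1.02]])   # subsystem 2 (locally known)
B1 = np.array([[0.0], [0.1]])
B2 = np.array([[0.0], [0.12]])
A12 = A21 = 0.02 * np.ones((2, 2))           # weak coupling, unknown to both

def local_lqr(A, B, Q=np.eye(2), R=np.eye(1)):
    """Local-state feedback gain from the discrete algebraic Riccati equation."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K1, K2 = local_lqr(A1, B1), local_lqr(A2, B2)
x1, x2 = np.array([1.0, -0.5]), np.array([-0.8, 0.3])
for _ in range(100):
    u1, u2 = -K1 @ x1, -K2 @ x2                          # local states only
    x1, x2 = (A1 @ x1 + B1 @ u1 + A12 @ x2,              # coupling acts as an
              A2 @ x2 + B2 @ u2 + A21 @ x1)              # unmodeled disturbance
print("final state norms:", np.linalg.norm(x1), np.linalg.norm(x2))
```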

    Reinforcement Learning, Intelligent Control and their Applications in Connected and Autonomous Vehicles

    Reinforcement learning (RL) has attracted wide attention over the past few years. Recently, we developed a data-driven algorithm to solve predictive cruise control (PCC) and game-theoretic output regulation problems. This work integrates our recent contributions to the application of RL in game theory, output regulation problems, robust control, small-gain theory, and PCC. The algorithm was developed for H∞ adaptive optimal output regulation of uncertain linear systems and uncertain partially linear systems, to reject disturbances and force the output of the systems to asymptotically track a reference. In the PCC problem, we determined the reference velocity for each autonomous vehicle in the platoon using traffic information broadcast from the traffic lights to reduce the vehicles' trip time. We then employed the algorithm to design an approximately optimal controller for the vehicles, which regulates the headway, velocity, and acceleration of each vehicle to the desired values. Simulation results validate the effectiveness of the algorithms.
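    The platoon-regulation objective can be sketched with a simple linear feedback on local measurements: each follower drives its headway and relative-velocity errors to zero while the leader tracks a reference velocity. The gains `kp` and `kv` and all numbers are hypothetical stand-ins; the controller in the work above is learned, not hand-tuned.

```python
import numpy as np

dt, n_veh, d_star, v_ref = 0.1, 4, 10.0, 15.0   # step, size, headway, speed
kp, kv = 0.5, 1.0                               # hypothetical feedback gains
pos = np.array([30.0, 18.0, 9.0, 0.0])          # leader first, followers behind
vel = np.array([15.0, 14.0, 16.0, 15.0])

for _ in range(600):                            # 60 s of simulated driving
    acc = np.zeros(n_veh)
    acc[0] = kv * (v_ref - vel[0])              # leader tracks the reference
    for i in range(1, n_veh):                   # followers regulate headway
        acc[i] = (kp * ((pos[i-1] - pos[i]) - d_star)
                  + kv * (vel[i-1] - vel[i]))
    vel += dt * acc
    pos += dt * vel
print("headways:", np.round(pos[:-1] - pos[1:], 2))
print("velocities:", np.round(vel, 2))
```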

    Model-based Reinforcement Learning of Nonlinear Dynamical Systems

    Model-based Reinforcement Learning (MBRL) techniques accelerate the learning task by employing a transition model to make predictions. In this dissertation, we present novel techniques for online learning of unknown dynamics by iteratively computing a feedback controller based on the most recent update of the model. Assuming a structured continuous-time model of the system in terms of a set of bases, we formulate an infinite-horizon optimal control problem addressing a given control objective. The structure of the system, along with a value function parameterized in quadratic form, provides the flexibility to analytically calculate an update rule for the parameters. Hence, a matrix differential equation of the parameters is obtained, whose solution is used to characterize the optimal feedback control in terms of the bases at any time step. Moreover, the quadratic form of the value function suggests a compact way of updating the parameters that considerably decreases the computational complexity. In the convergence analysis, we demonstrate asymptotic stability and optimality of the obtained learning algorithm around the equilibrium by revealing its connections with the analogous Linear Quadratic Regulator (LQR). The results are then extended to the trajectory tracking problem: assuming a structured unknown nonlinear system augmented with the dynamics of a commander system, we obtain a control rule minimizing a given quadratic tracking objective function. Furthermore, as an alternative learning technique, a piecewise nonlinear affine framework is developed for controlling nonlinear systems with unknown dynamics. We extend the results to obtain a general piecewise nonlinear framework in which each piece is responsible for locally learning and controlling over some partition of the domain. We then consider the Piecewise Affine (PWA) system with bounded uncertainty as a special case, for which we suggest an optimization-based verification technique. Accordingly, given a discretization of the learned PWA system, we iteratively search for a common piecewise Lyapunov function in a set of positive definite functions, where non-monotonic convergence is allowed; this Lyapunov candidate is then verified for the uncertain system. To demonstrate the applicability of the approaches presented in this dissertation, simulation results on benchmark nonlinear systems, such as a quadrotor and a vehicle, are included. Moreover, as another detailed application, we investigate the Maximum Power Point Tracking (MPPT) problem of solar photovoltaic (PV) systems: we develop an analytical nonlinear optimal control approach that assumes a known model, and then apply the obtained nonlinear optimal controller together with the piecewise MBRL technique presented previously.
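    A certainty-equivalence caricature of the MBRL loop described above, assuming a linear basis for brevity (the dissertation uses nonlinear bases and a matrix differential equation for the value parameters): recursive least squares updates the model after every transition, and the feedback gain is re-derived from the most recent model estimate.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(1)
A_true = np.array([[1.0, 0.1], [-0.2, 0.9]])   # "plant", unknown to the learner
B_true = np.array([[0.0], [0.1]])
theta = np.zeros((2, 3))                        # model estimate [A | B]
P = 100.0 * np.eye(3)                           # RLS covariance
K = np.zeros((1, 2))                            # current feedback gain
x = np.array([1.0, 0.0])

for k in range(300):
    u = -K @ x + 0.1 * rng.standard_normal(1)   # act + explore
    xn = A_true @ x + B_true @ u                # one plant step
    z = np.concatenate([x, u])                  # regressor [x; u]
    g = P @ z / (1.0 + z @ P @ z)               # recursive least-squares update
    theta += np.outer(xn - theta @ z, g)
    P -= np.outer(g, z @ P)
    if k % 20 == 19:                            # re-derive the controller from
        A_hat, B_hat = theta[:, :2], theta[:, 2:]   # the most recent model
        try:
            S = solve_discrete_are(A_hat, B_hat, np.eye(2), np.eye(1))
            K = np.linalg.solve(np.eye(1) + B_hat.T @ S @ B_hat,
                                B_hat.T @ S @ A_hat)
        except (ValueError, np.linalg.LinAlgError):
            pass                                # keep the old gain if DARE fails
    x = xn
print("model error:", np.linalg.norm(theta - np.hstack([A_true, B_true])))
```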

    Cooperative Strategies for Management of Power Quality Problems in Voltage-Source Converter-based Microgrids

    The development of cooperative control strategies for microgrids has become an area of increasing research interest in recent years, often as a result of advances in other areas of control theory, such as multi-agent systems, and enabled by emerging wireless communications technology, machine learning techniques, and power electronics. While some possible applications of cooperative control theory to microgrids have been described in the research literature, a comprehensive survey of this approach with respect to its limitations and wide-ranging potential applications has not yet been provided. In this regard, an important area of research into microgrids is developing intelligent cooperative operating strategies within and between microgrids that implement and allocate tasks at the local level and do not rely on centralized command and control structures. Multi-agent techniques are one focus of this research but have not been applied to the full range of power quality problems in microgrids; the ability of microgrid control systems to manage harmonics, unbalance, flicker, and black-start capability are some examples of applications yet to be fully exploited. During islanded operation, the normal buffer against disturbances and power imbalances provided by the main grid coupling is removed; this, together with the reduced inertia of the microgrid (MG), makes power quality (PQ) management a critical control function. This research investigates new cooperative control techniques for solving power quality problems in voltage source converter (VSC)-based AC microgrids. A set of specific power quality problems has been selected as the application focus, based on a survey of relevant published literature, international standards, and electricity utility regulations. The control problems addressed are voltage regulation, unbalanced load sharing, and flicker mitigation. The thesis introduces novel approaches based on multi-agent consensus problems and differential games. It was decided to exclude the management of harmonics, which is a more challenging issue and the focus of future research. Rather than using model-based engineering design for optimization of controller parameters, the thesis describes a novel technique for controller synthesis using off-policy reinforcement learning. The thesis also addresses the topic of communication and control system co-design: the stability of secondary voltage control considering communication time-delays is addressed, and a performance-oriented approach to rate allocation is described, using a novel solution method based on convex optimization.
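    The consensus ingredient can be sketched as distributed secondary voltage restoration: each inverter agent nudges its voltage toward its neighbors' values, and one pinned agent also sees the nominal reference, dragging the whole network back to 1.0 pu. The graph, gain, and voltages below are illustrative assumptions; the thesis's RL-synthesized VSC controllers are far richer.

```python
import numpy as np

adj = np.array([[0, 1, 0, 0],     # line communication graph between
                [1, 0, 1, 0],     # four inverter agents
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
pin = np.array([1, 0, 0, 0])      # agent 0 also measures the reference
v_ref, gain, dt = 1.0, 2.0, 0.01
v = np.array([0.95, 0.97, 0.93, 0.96])    # post-disturbance voltages (pu)

for _ in range(1000):
    # consensus term (neighbors) plus pinning term (leader only)
    dv = adj @ v - adj.sum(axis=1) * v + pin * (v_ref - v)
    v = v + dt * gain * dv
print("restored voltages (pu):", np.round(v, 4))
```

    Pinning only one agent is enough here because the communication graph is connected, so the reference propagates to every inverter through the consensus coupling.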