1,215 research outputs found

    Approximate dynamic programming based solutions for fixed-final-time optimal control and optimal switching

    Get PDF
    Optimal solutions with neural networks (NN), based on an approximate dynamic programming (ADP) framework, for new classes of engineering and non-engineering problems, together with the associated difficulties and challenges, are investigated in this dissertation. In the enclosed eight papers, the ADP framework is utilized for solving fixed-final-time problems (also called terminal control problems) and problems of a switching nature. An ADP-based algorithm is proposed in Paper 1 for solving fixed-final-time problems with a soft terminal constraint, in which a single neural network with a single set of weights is utilized. Paper 2 investigates fixed-final-time problems with hard terminal constraints. The optimality analysis of the ADP-based algorithm for fixed-final-time problems is the subject of Paper 3, in which it is shown that the proposed algorithm leads to the globally optimal solution provided certain conditions hold. Afterwards, the developments in Papers 1 to 3 are used to tackle a more challenging class of problems, namely, optimal control of switching systems. This class of problems is divided into problems with a fixed mode sequence (Papers 4 and 5) and problems with a free mode sequence (Papers 6 and 7). Each of these two classes is further divided into problems with autonomous subsystems (Papers 4 and 6) and problems with controlled subsystems (Papers 5 and 7). Different ADP-based algorithms are developed, and proofs of convergence of the proposed iterative algorithms are presented. Moreover, in Paper 8, the developments are extended to online learning of the optimal switching solution for problems with modeling uncertainty. Each of the theoretical developments is numerically analyzed using different real-world or benchmark problems --Abstract, page v
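As a minimal sketch of the fixed-final-time ADP idea (an illustration, not the dissertation's algorithm), the snippet below fits a quadratic value model V_k(x) = p_k x^2 backward in time for a scalar linear-quadratic problem with a soft terminal cost; for this LQ case the fit should reproduce the exact backward Riccati recursion. All constants (a, b, q, r, N) are assumed toy values.

```python
import numpy as np

# Minimal ADP sketch (illustrative, not the dissertation's method): fit a
# quadratic value model V_k(x) = p[k] * x^2 backward in time from sampled
# states for a scalar fixed-final-time LQ problem with a soft terminal cost.
a, b, q, r, N = 0.9, 0.5, 1.0, 0.5, 20   # assumed toy constants
rng = np.random.default_rng(0)

p = np.zeros(N + 1)
p[N] = q                                  # soft terminal cost weight
for k in range(N - 1, -1, -1):
    xs = rng.uniform(-2.0, 2.0, 200)      # sampled states at step k
    # one-step optimal control given the next-step value model p[k+1]
    us = -(a * b * p[k + 1] * xs) / (r + b**2 * p[k + 1])
    xn = a * xs + b * us
    cost = q * xs**2 + r * us**2 + p[k + 1] * xn**2
    p[k] = np.sum(cost * xs**2) / np.sum(xs**4)   # least-squares fit of p[k]

# sanity check: the fit matches the exact backward Riccati recursion
pe = q
for _ in range(N):
    pe = q + a**2 * pe - (a * b * pe)**2 / (r + b**2 * pe)
```

Replacing the quadratic model with a neural network that takes (x, k) as input would be one way to recover the single-network, single-weight-set flavor described for Paper 1.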

    Optimal Controller Synthesis of Variable-Time Impulsive Problems using Single Network Adaptive Critics

    Get PDF
    This paper presents a systematic approach to solving for the optimal control of a variable-time impulsive system. First, the optimality conditions for a variable-time impulsive system are derived using the calculus of variations. Next, a single network adaptive critic technique is proposed to numerically solve for the optimal control, and the detailed algorithm is presented. Finally, two examples, one linear and one nonlinear, are solved by applying the derived conditions and the proposed algorithm. Numerical results demonstrate the power of the neural network based adaptive critic method in solving this class of problems
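A hedged sketch of the single network adaptive critic idea (here on a plain scalar infinite-horizon LQ problem, not the impulsive setting of the paper): a linear critic w·x stands in for the network that maps the state to the next costate, trained on the discrete-time costate equation. All constants are assumptions.

```python
import numpy as np

# Hedged SNAC-style sketch (a plain scalar infinite-horizon LQ problem, not
# the impulsive setting of the paper): a linear "critic" w * x stands in for
# the network mapping x_k to the next costate lambda_{k+1}.  For dynamics
# x_{k+1} = a x_k + b u_k and stage cost (1/2)(q x^2 + r u^2), the costate
# equation is lambda_k = q x_k + a lambda_{k+1}, with u_k = -(b / r) lambda_{k+1}.
a, b, q, r = 0.9, 0.5, 1.0, 0.5          # assumed toy constants
xs = np.linspace(-2.0, 2.0, 50)          # fixed set of training states
w = 0.0
for _ in range(200):
    us = -(b / r) * w * xs               # control implied by the critic
    xn = a * xs + b * us                 # next states
    lam_target = q * xn + a * (w * xn)   # costate equation, critic reused at xn
    w = np.sum(lam_target * xs) / np.sum(xs**2)   # least-squares refit of w
```

At convergence the implied feedback u = -(b / r) w x matches the discrete-time LQR gain, which the critic never computes explicitly.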

    Finite-horizon optimal control of linear and a class of nonlinear systems

    Get PDF
    Traditionally, optimal control of dynamical systems with known system dynamics is obtained in a backward-in-time and offline manner by using either the Riccati or the Hamilton-Jacobi-Bellman (HJB) equation. In contrast, in this dissertation, finite-horizon optimal regulation is investigated for both linear and nonlinear systems in a forward-in-time manner when the system dynamics are uncertain. Value and policy iterations are not used; instead, the value function (or Q-function for linear systems) and the control input are updated once per sampling interval, consistent with standard adaptive control. First, the optimal adaptive control of linear discrete-time systems with unknown system dynamics is presented in Paper I by using Q-learning and the Bellman equation while satisfying the terminal constraint. A novel update law that uses the history of the cost-to-go is derived. Paper II considers the design of the linear quadratic regulator in the presence of state and input quantization. Quantization errors are eliminated via a dynamic quantizer design, and the parameter update law of Paper I is redesigned accordingly. Furthermore, an optimal adaptive state feedback controller is developed in Paper III for general nonlinear discrete-time systems in affine form without knowledge of the system dynamics. In Paper IV, an NN-based observer is proposed to reconstruct the state vector and identify the dynamics so that the control scheme of Paper III is extended to output feedback. Finally, the optimal regulation of quantized nonlinear systems with input constraints is considered in Paper V by introducing a non-quadratic cost functional. Closed-loop stability is demonstrated for all the controller designs developed in this dissertation by using Lyapunov analysis, while all the proposed schemes function in an online and forward-in-time manner so that they are practically viable --Abstract, page iv
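As a minimal, hedged illustration of the Q-learning theme (not Paper I's actual update law), the snippet below identifies a time-varying quadratic Q-function for a scalar finite-horizon LQR from sampled transitions; the dynamics (a, b) only generate data and never enter the update. For simplicity the sketch sweeps backward in time, unlike the paper's online forward-in-time scheme, and all constants are assumptions.

```python
import numpy as np

# Hedged sketch (not Paper I's update law): model-free identification of a
# time-varying quadratic Q-function, Q_k(x, u) = hxx x^2 + 2 hxu x u + huu u^2,
# for a scalar finite-horizon LQR.  The dynamics (a, b) generate transition
# data but never appear inside the least-squares update.
a, b, q, r, N = 0.9, 0.5, 1.0, 0.5, 20   # assumed toy constants
rng = np.random.default_rng(1)

H = np.zeros((N, 3))                      # (hxx, hxu, huu) for each step k
vcoef = q                                 # terminal value V_N(x) = q x^2
for k in range(N - 1, -1, -1):
    xs = rng.uniform(-2.0, 2.0, 300)      # sampled states
    us = rng.uniform(-2.0, 2.0, 300)      # exploratory inputs
    xn = a * xs + b * us                  # observed next states (data only)
    target = q * xs**2 + r * us**2 + vcoef * xn**2
    Phi = np.column_stack([xs**2, 2.0 * xs * us, us**2])
    hxx, hxu, huu = np.linalg.lstsq(Phi, target, rcond=None)[0]
    H[k] = hxx, hxu, huu
    vcoef = hxx - hxu**2 / huu            # V_k(x) = min_u Q_k(x, u)

gain0 = -H[0, 1] / H[0, 2]                # greedy feedback u = gain0 * x at k = 0
```

The greedy gain extracted from the fitted Q-function should coincide with the time-varying Riccati gain, even though (a, b) were never used in the update itself.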

    Robust Optimal Control of Wave Energy Converters Based on Adaptive Dynamic Programming

    Get PDF

    Optimal Neuro-Controller Synthesis for Impulse-Driven System

    Get PDF
    This paper presents a new controller design technique for systems driven by impulse inputs. Necessary conditions for optimal impulse control are derived. A neural network structure to solve the resulting equations is presented. The solution concepts are illustrated with a few example problems that exhibit increasing levels of difficulty. Two linear problems, one scalar and one vector, and a benchmark nonlinear problem, the Van der Pol oscillator, are used as case studies. Numerical results show the efficacy of the new solution process for impulse-driven systems. Since the theoretical development and the design technique are free from restrictive assumptions, this technique is applicable to many problems in engineering and science

    Learning-based Predictive Control for Nonlinear Systems with Unknown Dynamics Subject to Safety Constraints

    Full text link
    Model predictive control (MPC) has been widely employed as an effective method for model-based constrained control. For systems with unknown dynamics, reinforcement learning (RL) and adaptive dynamic programming (ADP) have received notable attention for solving adaptive optimal control problems. Recently, works on the use of RL in the framework of MPC have emerged, which can enhance the ability of MPC for data-driven control. However, safety under state constraints and closed-loop robustness are difficult to verify due to the approximation errors of RL with function approximation structures. Aiming at this problem, we propose a data-driven robust MPC solution based on incremental RL, called data-driven robust learning-based predictive control (dr-LPC), for perturbed unknown nonlinear systems subject to safety constraints. A data-driven robust MPC (dr-MPC) is first formulated with a learned predictor. The incremental Dual Heuristic Programming (DHP) algorithm, using an actor-critic architecture, is then utilized to solve the online optimization problem of the dr-MPC. In each prediction horizon, the actor and critic learn time-varying laws for approximating the optimal control policy and costate, respectively, which differs from classical MPCs. The state and control constraints are enforced in the learning process by building a Hamilton-Jacobi-Bellman (HJB) equation and a regularized actor-critic learning structure using logarithmic barrier functions. The closed-loop robustness and safety of the dr-LPC are proven under function approximation errors. Simulation results on two control examples are reported, which show that the dr-LPC outperforms the DHP and dr-MPC in terms of state regulation, and its average computational time is much smaller than that of the dr-MPC in both examples. Comment: The paper has been submitted to an IEEE journal for possible publication
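The logarithmic-barrier mechanism mentioned above can be sketched as follows (a toy stage cost, not the dr-LPC formulation; the limits, weights, and barrier coefficient mu are assumptions): the barrier grows without bound as the state or input approaches its constraint, so minimizing the learned cost keeps trajectories strictly inside the safe set.

```python
import math

# Hedged sketch of the logarithmic-barrier idea (a toy stage cost, not the
# dr-LPC formulation): the barrier grows without bound as |x| -> x_max or
# |u| -> u_max, so minimizing the augmented cost keeps trajectories strictly
# inside the constraint set.  Limits, weights, and mu are assumptions.
def barrier_cost(x, u, x_max=1.0, u_max=2.0, mu=0.1):
    if abs(x) >= x_max or abs(u) >= u_max:
        return float("inf")               # on or outside the constraint boundary
    stage = x**2 + 0.5 * u**2             # nominal quadratic stage cost
    bx = -math.log(1.0 - (x / x_max)**2)  # recentred barrier, zero at x = 0
    bu = -math.log(1.0 - (u / u_max)**2)
    return stage + mu * (bx + bu)
```

Near the origin the barrier contribution is negligible, so the nominal optimal behavior is approximately preserved; near the limits it dominates any finite stage cost.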

    ๋ชจ๋ธ๊ธฐ๋ฐ˜๊ฐ•ํ™”ํ•™์Šต์„์ด์šฉํ•œ๊ณต์ •์ œ์–ด๋ฐ์ตœ์ ํ™”

    Get PDF
    Doctoral thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Chemical and Biological Engineering, February 2020. Jong Min Lee. The sequential decision-making problem is a crucial technology for plant-wide process optimization. While the dominant numerical method is forward-in-time direct optimization, it is limited to open-loop solutions and has difficulty accounting for uncertainty. Dynamic programming complements these limitations; nonetheless, the associated functional optimization suffers from the curse of dimensionality. The sample-based approach to approximating dynamic programming, referred to as reinforcement learning (RL), can resolve this issue and is investigated throughout this thesis. Methods that account for the system model explicitly are of particular interest.
Model-based RL is exploited to solve three representative sequential decision-making problems: scheduling, supervisory optimization, and regulatory control. The problems are formulated as a partially observable Markov decision process, a control-affine state space model, and a general state space model, respectively, and the associated model-based RL algorithms are point based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP). The contributions for each problem are as follows: First, for the scheduling problem, we developed a closed-loop feedback scheme, a form of solution unobtainable with the existing direct optimization method and one that highlights the strength of RL. In the second case, the regulatory control problem is tackled by a function approximation method that relaxes the functional optimization to a finite-dimensional vector space optimization. Deep neural networks (DNNs) are utilized as the approximator, and their advantages as well as a convergence analysis are presented in the thesis. Finally, for the supervisory optimization problem, we developed a novel constrained-RL framework that uses a primal-dual DDP method. Various illustrative examples are presented to validate the developed model-based RL algorithms and to support the thesis statement that dynamic programming can be considered a complementary method to direct optimization.

Contents:
1. Introduction
   1.1 Motivation and previous work
   1.2 Statement of contributions
   1.3 Outline of the thesis
2. Background and preliminaries
   2.1 Optimization problem formulation and the principle of optimality
       2.1.1 Markov decision process
       2.1.2 State space model
   2.2 Overview of the developed RL algorithms
       2.2.1 Point based value iteration
       2.2.2 Globalized dual heuristic programming
       2.2.3 Differential dynamic programming
3. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection
   3.1 Introduction
   3.2 POMDP solution algorithm
       3.2.1 General point based value iteration
       3.2.2 GapMin algorithm
       3.2.3 Receding horizon POMDP
   3.3 Problem formulation for infrastructure scheduling
       3.3.1 State
       3.3.2 Maintenance and inspection actions
       3.3.3 State transition function
       3.3.4 Cost function
       3.3.5 Observation set and observation function
       3.3.6 State augmentation
   3.4 Illustrative example and simulation result
       3.4.1 Structural point for the analysis of a high dimensional belief space
       3.4.2 Infinite horizon policy under the natural deterioration process
       3.4.3 Receding horizon POMDP
       3.4.4 Validation of POMDP policy via Monte Carlo simulation
4. A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
   4.1 Introduction
   4.2 Function approximation and learning with deep neural networks
       4.2.1 GDHP with a function approximator
       4.2.2 Stable learning of DNNs
       4.2.3 Overall algorithm
   4.3 Results and discussions
       4.3.1 Example 1: Semi-batch reactor
       4.3.2 Example 2: Diffusion-Convection-Reaction (DCR) process
5. Convergence analysis of the model-based deep reinforcement learning for optimal control of nonlinear control-affine system
   5.1 Introduction
   5.2 Convergence proof of globalized dual heuristic programming (GDHP)
   5.3 Function approximation with deep neural networks
       5.3.1 Function approximation and gradient descent learning
       5.3.2 Forward and backward propagations of DNNs
   5.4 Convergence analysis in the deep neural networks space
       5.4.1 Lyapunov analysis of the neural network parameter errors
       5.4.2 Lyapunov analysis of the closed-loop stability
       5.4.3 Overall Lyapunov function
   5.5 Simulation results and discussions
       5.5.1 System description
       5.5.2 Algorithmic settings
       5.5.3 Control result
6. Primal-dual differential dynamic programming for constrained dynamic optimization of continuous system
   6.1 Introduction
   6.2 Primal-dual differential dynamic programming for constrained dynamic optimization
       6.2.1 Augmented Lagrangian method
       6.2.2 Primal-dual differential dynamic programming algorithm
       6.2.3 Overall algorithm
   6.3 Results and discussions
7. Concluding remarks
   7.1 Summary of the contributions
   7.2 Future works
Bibliography
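The augmented-Lagrangian multiplier update at the heart of the primal-dual approach can be sketched on a toy scalar problem (an assumption-laden illustration, not the thesis' DDP): minimize (x - 2)^2 subject to x <= 1, whose KKT solution is x* = 1 with multiplier lam* = 2. The penalty weight rho and iteration count are assumptions.

```python
# Hedged sketch of the augmented-Lagrangian multiplier update used in
# primal-dual methods (a toy scalar problem, not the thesis' DDP):
# minimize (x - 2)^2 subject to x <= 1; the KKT solution is x* = 1, lam* = 2.
rho, lam, x = 10.0, 0.0, 2.0
for _ in range(50):
    # primal step: closed-form minimizer of the augmented Lagrangian
    xc = (4.0 + rho - lam) / (2.0 + rho)  # candidate with the penalty active
    x = xc if lam / rho + xc - 1.0 > 0 else 2.0
    # dual ascent on the multiplier, projected to stay nonnegative
    lam = max(0.0, lam + rho * (x - 1.0))
print(round(x, 4), round(lam, 4))         # prints: 1.0 2.0
```

In the thesis setting, the inner primal minimization is carried out by a DDP sweep rather than the closed form used here; the multiplier update has the same projected dual-ascent shape.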
    • โ€ฆ