Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems
In this paper, we establish that for a wide class of controlled stochastic
differential equations (SDEs) with stiff coefficients, the value functions of
the corresponding zero-sum games can be represented by a deep artificial neural
network (DNN), whose complexity grows at most polynomially in both the
dimension of the state equation and the reciprocal of the required accuracy.
Such nonlinear stiff systems may arise, for example, from Galerkin
approximations of controlled stochastic partial differential equations (SPDEs),
or controlled PDEs with uncertain initial conditions and source terms. This
implies that DNNs can break the curse of dimensionality in numerical
approximations and optimal control of PDEs and SPDEs. The main ingredient of
our proof is to construct a suitable discrete-time system to effectively
approximate the evolution of the underlying stochastic dynamics. Similar ideas
can also be applied to obtain expression rates of DNNs for value functions
induced by stiff systems with regime switching coefficients and driven by
general Lévy noise.
Comment: This revised version has been accepted for publication in Analysis
and Applications.
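The abstract's main proof ingredient is a discrete-time system that approximates the stiff stochastic dynamics. The paper's actual construction is not given in this listing, so the sketch below is only a generic illustration of why stiffness matters for such a discretization: a linearly implicit Euler-Maruyama step that treats an assumed stiff linear part A implicitly, so the step size is not limited by the fastest mode of A. The function name `semi_implicit_em`, the drift `f`, and the noise matrix `sigma` are hypothetical, and a fixed feedback control is assumed to be folded into `f`.

```python
import numpy as np

def semi_implicit_em(A, f, sigma, x0, T, n_steps, rng):
    """Linearly implicit Euler-Maruyama scheme (illustrative sketch) for
        dX = (A X + f(X)) dt + sigma dW,
    where the linear part A may be stiff. A is treated implicitly, while the
    nonlinear drift f and the additive noise are treated explicitly:
        (I - dt*A) X_{n+1} = X_n + dt*f(X_n) + sigma @ dW_n.
    """
    x = np.asarray(x0, dtype=float).copy()
    d = x.shape[0]
    dt = T / n_steps
    # The implicit matrix is the same at every step, so invert it once
    # (acceptable for a small example; use a factorization for large d).
    M_inv = np.linalg.inv(np.eye(d) - dt * A)
    for _ in range(n_steps):
        # Brownian increments with variance dt, one per noise channel
        dW = rng.normal(0.0, np.sqrt(dt), size=sigma.shape[1])
        x = M_inv @ (x + dt * f(x) + sigma @ dW)
    return x

# Usage: a stiff two-dimensional system with a hypothetical cubic damping term
A = np.diag([-1000.0, -1.0])
x_T = semi_implicit_em(A, lambda x: -x**3, 0.1 * np.eye(2), np.ones(2),
                       T=1.0, n_steps=100, rng=np.random.default_rng(0))
```

With A = diag(-1000, -1) and 100 steps on [0, 1], dt*1000 = 10, so the explicit Euler-Maruyama amplification factor 1 - 10 = -9 would make the fast component blow up, while the implicit factor 1/11 keeps it decaying.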
Boundary control of parabolic PDE using adaptive dynamic programming
In this dissertation, novel adaptive/approximate dynamic programming (ADP)-based state and output feedback control methods are presented for distributed parameter systems (DPS) expressed as uncertain parabolic partial differential equations (PDEs) in one- and two-dimensional domains. First, an output feedback control design using an early lumping method, in which the model is reduced before the controller is designed, is introduced. Controllers were then developed in four stages. Unlike current approaches in the literature, the state and output feedback approaches were designed without model reduction for uncertain linear, coupled nonlinear, and two-dimensional parabolic PDEs. In all of these techniques, an infinite-horizon cost function was considered, and the controllers were obtained in a forward-in-time, online manner without solving the algebraic Riccati equation (ARE) or using value or policy iteration techniques.
Providing the stability analysis in the original infinite-dimensional domain was a major challenge. Using Lyapunov analysis, ultimate boundedness (UB) of the regulated closed-loop system was demonstrated for every technique developed herein. Moreover, because the state space of a DPS is distributed and large scale, pure state feedback control design is impractical. Therefore, output feedback designs using a limited number of point sensors in the domain or at the boundaries are introduced. In the final two papers, the developed state feedback ADP control method is extended to regulate multi-dimensional and more complex nonlinear parabolic PDE dynamics.
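The dissertation's ADP controllers are not reproduced here. To make the setting concrete, the following sketch (an assumed example, not taken from the dissertation) simulates a one-dimensional unstable reaction-diffusion equation, a typical parabolic DPS, with actuation through a Neumann boundary condition and a single point sensor at the actuated boundary. The parameters `nu` and `lam`, the gain `k`, and the static output feedback u = -k z(1, t) are illustrative assumptions standing in for a learned controller.

```python
import numpy as np

def simulate_heat(k, nu=1.0, lam=3.0, n=50, T=1.0, dt=1e-4):
    """Explicit finite-difference simulation of z_t = nu*z_xx + lam*z on (0, 1)
    with z(0, t) = 0 and boundary actuation z_x(1, t) = u(t).

    The input is u = -k * z(1, t): static output feedback from one point
    sensor at x = 1 (k = 0 gives the open-loop system).
    """
    dx = 1.0 / n
    assert nu * dt / dx**2 <= 0.5, "explicit scheme stability limit violated"
    x = np.linspace(0.0, 1.0, n + 1)
    z = np.sin(np.pi * x / 2)          # slowest open-loop mode as initial profile
    for _ in range(int(round(T / dt))):
        u = -k * z[-1]                 # boundary measurement -> boundary input
        lap = np.zeros_like(z)         # lap[0] stays 0 at the Dirichlet node
        lap[1:-1] = (z[2:] - 2 * z[1:-1] + z[:-2]) / dx**2
        # Ghost node z[n+1] = z[n-1] + 2*dx*u enforces z_x(1) = u
        lap[-1] = (2 * z[-2] - 2 * z[-1] + 2 * dx * u) / dx**2
        z = z + dt * (nu * lap + lam * z)
        z[0] = 0.0
    return x, z

_, z_open = simulate_heat(k=0.0)      # open loop: the profile grows
_, z_closed = simulate_heat(k=10.0)   # boundary feedback: the profile decays
```

With the assumed values nu = 1 and lam = 3, the slowest open-loop mode grows at rate 3 - (pi/2)^2 > 0, whereas the boundary feedback with k = 10 turns the boundary condition into a stabilizing Robin condition. The explicit scheme is used only for brevity; its time step must satisfy nu*dt/dx^2 <= 1/2.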
Process control and optimization using model-based reinforcement learning
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, School of Chemical and Biological Engineering, February 2020. Advisor: Jong Min Lee.
Sequential decision-making problems are one of the core areas of process optimization. The most widely used numerical approach to these problems is direct optimization, which operates forward in time, but it has several limitations: the optimal solution takes an open-loop form, and the numerical complexity of the method grows when uncertainty is present. Dynamic programming can fundamentally resolve these limitations. It has nevertheless not been actively considered for process optimization, because the partial differential equation that dynamic programming yields must be handled in an infinite-dimensional function space rather than a finite-dimensional vector space. As one way to address this problem, the so-called curse of dimensionality, reinforcement learning methods focused on sample-based approximate solutions have been studied. This thesis studies model-based reinforcement learning, the branch of reinforcement learning suited to process optimization, and aims to apply it to three representative sequential decision-making problems in process optimization: scheduling, lower-level control, and upper-level optimization. These problems are modeled as a partially observable Markov decision process, a control-affine state space model, and a general state space model, respectively. To solve each of these numerical models, the methods known as point based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP) were introduced.
The features of these three problems and methods can be summarized as follows. First, for the scheduling problem, a solution in closed-loop feedback form was obtained. This form cannot be obtained with conventional direct methods and highlights a strength of reinforcement learning. Second, for the lower-level control problem, a method was introduced that relaxes the infinite-dimensional function-space optimization problem of dynamic programming into a finite-dimensional vector-space optimization problem through function approximation. In particular, deep neural networks were used for function approximation, and the resulting advantages and a convergence analysis are presented in this thesis. The last problem is upper-level dynamic optimization. To perform reinforcement learning under the constraints that arise in dynamic optimization problems, a primal-dual differential dynamic programming (primal-dual DDP) method was newly proposed. Several process examples are included to validate the methods applied to the three problems and to substantiate the claim that dynamic programming is comparable to direct optimization.
Sequential decision making is a crucial technology for plant-wide process optimization. While the dominant numerical method is forward-in-time direct optimization, it is limited to open-loop solutions and has difficulty accounting for uncertainty. Dynamic programming complements these limitations; nonetheless, the associated functional optimization suffers from the curse of dimensionality. A sample-based approach to approximating dynamic programming, referred to as reinforcement learning (RL), can resolve this issue and is investigated throughout this thesis. Methods that account for the system model explicitly are of particular interest. Model-based RL is exploited to solve three representative sequential decision-making problems: scheduling, regulatory control, and supervisory optimization. The problems are formulated as a partially observable Markov decision process, a control-affine state space model, and a general state space model, and the associated model-based RL algorithms are point based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP), respectively.
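The scheduling problem above is formulated as a POMDP, and point-based value iteration (PBVI) plans over beliefs, that is, probability distributions over the hidden state. Each belief is propagated by Bayes' rule after an action and an observation. A minimal sketch of that update follows, assuming tabular arrays `T` and `O` and a toy two-state deterioration model (intact/damaged); these names and numbers are illustrative, not the thesis's infrastructure model.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter for a discrete POMDP.

    b: belief over states, shape (S,)
    T: transition probabilities, T[a, s, s_next] = P(s_next | s, a)
    O: observation probabilities, O[a, s_next, o] = P(o | s_next, a)
    Returns b_next(s_next) proportional to
        O[a, s_next, o] * sum_s T[a, s, s_next] * b(s).
    """
    predicted = b @ T[a]                  # prior over next states
    unnormalized = O[a][:, o] * predicted  # weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total

# Toy model: state 0 = intact, state 1 = damaged; a single "inspect" action
T = np.array([[[0.9, 0.1], [0.0, 1.0]]])  # damage is absorbing
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])  # observation 1 = "looks damaged"
b1 = belief_update(np.array([1.0, 0.0]), a=0, o=1, T=T, O=O)
```

Starting from a surely intact component, one deterioration step gives the prior [0.9, 0.1]; an observation that is 3.5 times likelier when the component is damaged (0.7 vs. 0.2) raises the damage probability to 0.28.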
The contributions for each problem are as follows. First, for the scheduling problem, we developed a closed-loop feedback scheme, which highlights the strength of RL compared with the direct optimization method. Second, the regulatory control problem is tackled by a function approximation method that relaxes the functional optimization to an optimization over a finite-dimensional vector space. Deep neural networks (DNNs) are used as the approximator, and their advantages as well as a convergence analysis are presented in the thesis. Finally, for the supervisory optimization problem, we developed a novel constrained RL framework that uses the primal-dual DDP method. Various illustrative examples are presented to validate the developed model-based RL algorithms and to support the thesis statement that dynamic programming can be considered a complementary method to direct optimization.
1. Introduction
1.1 Motivation and previous work
1.2 Statement of contributions
1.3 Outline of the thesis
2. Background and preliminaries
2.1 Optimization problem formulation and the principle of optimality
2.1.1 Markov decision process
2.1.2 State space model
2.2 Overview of the developed RL algorithms
2.2.1 Point based value iteration
2.2.2 Globalized dual heuristic programming
2.2.3 Differential dynamic programming
3. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection
3.1 Introduction
3.2 POMDP solution algorithm
3.2.1 General point based value iteration
3.2.2 GapMin algorithm
3.2.3 Receding horizon POMDP
3.3 Problem formulation for infrastructure scheduling
3.3.1 State
3.3.2 Maintenance and inspection actions
3.3.3 State transition function
3.3.4 Cost function
3.3.5 Observation set and observation function
3.3.6 State augmentation
3.4 Illustrative example and simulation result
3.4.1 Structural point for the analysis of a high dimensional belief space
3.4.2 Infinite horizon policy under the natural deterioration process
3.4.3 Receding horizon POMDP
3.4.4 Validation of POMDP policy via Monte Carlo simulation
4. A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
4.1 Introduction
4.2 Function approximation and learning with deep neural networks
4.2.1 GDHP with a function approximator
4.2.2 Stable learning of DNNs
4.2.3 Overall algorithm
4.3 Results and discussions
4.3.1 Example 1: Semi-batch reactor
4.3.2 Example 2: Diffusion-Convection-Reaction (DCR) process
5. Convergence analysis of the model-based deep reinforcement learning for optimal control of nonlinear control-affine system
5.1 Introduction
5.2 Convergence proof of globalized dual heuristic programming (GDHP)
5.3 Function approximation with deep neural networks
5.3.1 Function approximation and gradient descent learning
5.3.2 Forward and backward propagations of DNNs
5.4 Convergence analysis in the deep neural networks space
5.4.1 Lyapunov analysis of the neural network parameter errors
5.4.2 Lyapunov analysis of the closed-loop stability
5.4.3 Overall Lyapunov function
5.5 Simulation results and discussions
5.5.1 System description
5.5.2 Algorithmic settings
5.5.3 Control result
6. Primal-dual differential dynamic programming for constrained dynamic optimization of continuous system
6.1 Introduction
6.2 Primal-dual differential dynamic programming for constrained dynamic optimization
6.2.1 Augmented Lagrangian method
6.2.2 Primal-dual differential dynamic programming algorithm
6.2.3 Overall algorithm
6.3 Results and discussions
7. Concluding remarks
7.1 Summary of the contributions
7.2 Future works
Bibliography
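Section 6.2.1 builds the primal-dual DDP on the augmented Lagrangian method. The thesis's DDP-based algorithm is not reproduced here; the sketch below only illustrates the underlying method of multipliers on an assumed static toy problem (a quadratic cost with one linear equality constraint), where the inner minimization is solved exactly rather than by DDP sweeps. The function name `augmented_lagrangian` and its parameters `rho` and `iters` are hypothetical.

```python
import numpy as np

def augmented_lagrangian(c, a, b, rho=10.0, iters=20):
    """Method of multipliers for  min ||z - c||^2  s.t.  a @ z = b.

    Each outer iteration minimizes the augmented Lagrangian
        L(z, lam) = ||z - c||^2 + lam*(a @ z - b) + (rho/2)*(a @ z - b)**2
    exactly (it is quadratic in z), then applies the dual ascent step
        lam <- lam + rho*(a @ z - b).
    """
    n = c.shape[0]
    lam = 0.0
    H = 2.0 * np.eye(n) + rho * np.outer(a, a)  # Hessian of L in z
    for _ in range(iters):
        # Primal step: stationarity of L gives H z = 2c - lam*a + rho*b*a
        z = np.linalg.solve(H, 2.0 * c - lam * a + rho * b * a)
        # Dual step: move the multiplier along the constraint violation
        lam += rho * (a @ z - b)
    return z, lam

z_opt, lam_opt = augmented_lagrangian(np.array([2.0, 1.0]), np.array([1.0, 1.0]), 1.0)
```

For c = (2, 1), a = (1, 1), b = 1 the iterates converge to z = (1, 0) with multiplier lam = 2, which satisfies the stationarity condition 2(z - c) + lam*a = 0. The quadratic penalty lets a moderate, fixed rho give fast multiplier convergence without driving rho to infinity.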
Model based fault diagnosis and prognosis of class of linear and nonlinear distributed parameter systems modeled by partial differential equations
With the rapid development of modern control systems, a significant number of industrial systems may suffer from component failures. Accurate and fast fault prognosis and resilient control can improve system availability and reduce unscheduled downtime. Therefore, in this dissertation, model-based prognosis and resilient control schemes have been developed for online prediction and accommodation of faults in distributed parameter systems (DPS). First, a novel fault detection, estimation, and prediction framework is introduced for a class of linear DPS with bounded disturbance; it models the DPS as a set of partial differential equations and uses a novel observer.
To relax the state measurability requirement in DPS, filters are introduced to redesign the detection observer. Once a fault is detected, an adaptive term is activated to estimate the multiplicative fault, and a tuning law is derived to tune the fault parameter magnitude. Then, based on this estimated fault parameter together with its failure limit, the time-to-failure (TTF) is derived for prognosis. A novel fault accommodation scheme is developed to handle actuator and sensor faults with boundary measurements. Next, a fault isolation scheme is presented to differentiate actuator, sensor, and state faults with a limited number of measurements for a class of linear and nonlinear DPS.
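The abstract derives time-to-failure from the estimated fault parameter and its failure limit. The dissertation's derivation is not reproduced here; the following is a minimal sketch under an explicitly assumed linear-growth model of the fault magnitude, with a hypothetical function name and inputs (a time grid `t`, estimates `theta_hat`, and the limit `theta_limit`).

```python
import numpy as np

def time_to_failure(t, theta_hat, theta_limit):
    """Estimate the remaining time until the fault reaches its failure limit.

    Assumes (for illustration only) that the fault magnitude grows linearly
    in time: fit theta_hat ~ alpha + beta*t by least squares and solve
    alpha + beta*t_fail = theta_limit.
    Returns t_fail - t[-1], or inf if the fitted trend is not increasing.
    """
    beta, alpha = np.polyfit(t, theta_hat, 1)  # coefficients: slope, intercept
    if beta <= 0.0:
        return np.inf
    t_fail = (theta_limit - alpha) / beta
    return max(t_fail - t[-1], 0.0)

t = np.linspace(0.0, 10.0, 11)
remaining = time_to_failure(t, 0.1 + 0.05 * t, theta_limit=1.0)
```

Here the fitted fault reaches the limit at t = (1.0 - 0.1)/0.05 = 18, so 8 time units remain after the last estimate at t = 10. A real prognosis scheme would also account for the estimation error of the fault parameter, which this sketch ignores.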
Subsequently, actuator and sensor fault detection and prediction are considered for a class of nonlinear DPS with bounded disturbance by using a Luenberger observer. Finally, a novel resilient control scheme is proposed for nonlinear DPS that uses an additional boundary measurement once an actuator fault is detected. In all the above methods, Lyapunov analysis is used to show the boundedness of the closed-loop signals during fault detection, prediction, and resilient operation under mild assumptions.
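The detection schemes above rely on an observer residual. The dissertation works with the PDE model directly; the sketch below only illustrates the residual-and-threshold idea with a Luenberger observer on an assumed two-state discrete-time linear system (for instance, a low-order spatial discretization), with a hypothetical observer gain `L` chosen so that A - LC is stable, a bounded disturbance, and an actuator bias fault. All matrices and the threshold are illustrative values.

```python
import numpy as np

def detect_fault(A, B, C, L, u, fault, dist, threshold):
    """Simulate plant and Luenberger observer; return the first alarm step or None.

    Plant:    x[k+1]  = A x[k] + B (u[k] + fault[k]) + dist[k]
    Observer: xh[k+1] = A xh[k] + B u[k] + L (y[k] - C xh[k])
    Residual: r[k] = y[k] - C xh[k]; an alarm is raised when |r[k]| > threshold.
    """
    n = A.shape[0]
    x = np.zeros(n)
    xh = np.zeros(n)
    for k in range(len(u)):
        y = C @ x
        r = y - C @ xh                        # output estimation error
        if np.max(np.abs(r)) > threshold:
            return k
        x = A @ x + B @ np.atleast_1d(u[k] + fault[k]) + dist[k]
        xh = A @ xh + B @ np.atleast_1d(u[k]) + L @ r
    return None

A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.2]])                  # eigenvalues of A - L C: about 0.74, 0.46
N = 100
k = np.arange(N)
u = np.sin(0.1 * k)
dist = 0.001 * np.column_stack([np.sin(0.3 * k), np.zeros(N)])  # bounded disturbance
fault = np.where(k >= 60, 0.5, 0.0)           # actuator bias appears at step 60
alarm = detect_fault(A, B, C, L, u, fault, dist, threshold=0.05)
```

Because the known input enters plant and observer identically, the estimation error obeys e[k+1] = (A - LC) e[k] + B fault[k] + dist[k], so the residual stays near the disturbance level until the bias appears and then crosses the threshold within a few steps. Choosing the threshold trades detection delay against false alarms caused by the disturbance bound.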