511 research outputs found

    Safe and Fast Tracking on a Robot Manipulator: Robust MPC and Neural Network Control

    Full text link
    Fast feedback control and safety guarantees are essential in modern robotics. We present an approach that achieves both by combining novel robust model predictive control (MPC) with function approximation via (deep) neural networks (NNs). The result is a new approach for complex tasks with nonlinear, uncertain, and constrained dynamics, as are common in robotics. Specifically, we leverage recent results in MPC research to propose a new robust setpoint tracking MPC algorithm, which achieves reliable and safe tracking of a dynamic setpoint while guaranteeing stability and constraint satisfaction. The presented robust MPC scheme constitutes a one-layer approach that unifies the often separated planning and control layers by directly computing the control command based on a reference and possibly obstacle positions. As a separate contribution, we show how the computation time of the MPC can be drastically reduced by approximating the MPC law with a NN controller. The NN is trained and validated from offline samples of the MPC, yielding statistical guarantees, and used in lieu thereof at run time. Our experiments on a state-of-the-art robot manipulator are the first to show that both the proposed robust and approximate MPC schemes scale to real-world robotic systems. (Comment: 8 pages, 4 figures)
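The offline approximation pipeline sketched in the abstract (sample the MPC offline, fit a cheap surrogate, then validate it statistically) can be illustrated as follows. This is a minimal sketch, not the paper's method: a saturated linear feedback stands in for the robust MPC law, and piecewise-linear interpolation stands in for the neural network.

```python
import numpy as np

def mpc_law(x):
    # Stand-in for the robust MPC law: assumed expensive to evaluate online.
    return np.clip(-2.0 * x, -1.0, 1.0)

# 1) Offline sampling of the control law over the state domain.
x_train = np.linspace(-2.0, 2.0, 200)
u_train = mpc_law(x_train)

def approx_controller(x):
    # Cheap surrogate used at run time in lieu of the MPC
    # (piecewise-linear interpolation standing in for the NN).
    return np.interp(x, x_train, u_train)

# 2) Statistical validation on fresh i.i.d. samples: if all validation
# errors stay below the tolerance, a Hoeffding-style argument bounds the
# probability of seeing larger errors at run time.
rng = np.random.default_rng(0)
x_val = rng.uniform(-2.0, 2.0, 1000)
errors = np.abs(approx_controller(x_val) - mpc_law(x_val))
assert errors.max() < 0.05  # surrogate validated to the chosen tolerance
```

The design point is that all expensive evaluations of the control law happen offline; the run-time controller is only a table lookup plus interpolation.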

    Reliably-stabilizing piecewise-affine neural network controllers

    Full text link
    A common problem affecting neural network (NN) approximations of model predictive control (MPC) policies is the lack of analytical tools to assess the stability of the closed-loop system under the action of the NN-based controller. We present a general procedure to quantify the performance of such a controller, or to design minimum-complexity NNs with rectified linear units (ReLUs) that preserve the desirable properties of a given MPC scheme. By quantifying the approximation error between NN-based and MPC-based state-to-input mappings, we first establish suitable conditions involving two key quantities, the worst-case error and the Lipschitz constant, that guarantee the stability of the closed-loop system. We then develop an offline, mixed-integer optimization-based method to compute those quantities exactly. Together, these techniques provide conditions sufficient to certify the stability and performance of a ReLU-based approximation of an MPC control law.
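A minimal illustration of the two certification quantities named above. The paper computes the worst-case error and the Lipschitz constant exactly via mixed-integer optimization; the sketch below instead uses the cheap spectral-norm product as an upper bound on a ReLU network's Lipschitz constant (the weights are arbitrary assumptions, not from the paper), which any exact MILP value can only tighten.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = [rng.standard_normal((8, 2)),   # input layer (2 states)
           rng.standard_normal((8, 8)),   # hidden layer
           rng.standard_normal((1, 8))]   # linear output layer (1 input)

def relu_net(x):
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)        # ReLU hidden layers
    return weights[-1] @ x                # linear output layer

# Upper bound on the Lipschitz constant: product of layer spectral norms
# (valid because ReLU itself is 1-Lipschitz).
lip_bound = float(np.prod([np.linalg.norm(W, 2) for W in weights]))

# Empirical sanity check: sampled difference quotients never exceed it.
xs = rng.standard_normal((500, 2))
ys = xs + 1e-3 * rng.standard_normal((500, 2))
quotients = [np.linalg.norm(relu_net(a) - relu_net(b)) / np.linalg.norm(a - b)
             for a, b in zip(xs, ys)]
assert max(quotients) <= lip_bound
```

In the paper's framework, such a constant is combined with the worst-case approximation error to check a stability condition for the closed loop.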

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    Get PDF
    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program: the scientific program both in overview and in full detail, together with information on the social program, the venue, special meetings, and more.

    Euclidean distance geometry and applications

    Full text link
    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. It is useful in several applications where the input data consist of an incomplete set of distances and the output is a set of points in Euclidean space realizing those distances. We survey some of the theory of Euclidean distance geometry and some of its most important applications: molecular conformation, localization of sensor networks, and statics. (Comment: 64 pages, 21 figures)
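The realization problem described above can be illustrated in its simplest complete-distance form with classical multidimensional scaling, one standard tool in distance geometry (this toy sketch does not cover the survey's harder incomplete-distance case):

```python
import numpy as np

# Ground-truth points in the plane, used only to build a distance matrix.
rng = np.random.default_rng(2)
P = rng.standard_normal((6, 2))
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # squared distances

# Classical MDS: recover a configuration realizing the given distances.
n = len(P)
J = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
B = -0.5 * J @ D2 @ J                                 # Gram matrix of centered points
vals, vecs = np.linalg.eigh(B)                        # ascending eigenvalues
vals, vecs = vals[::-1], vecs[:, ::-1]                # sort descending
X = vecs[:, :2] * np.sqrt(np.maximum(vals[:2], 0.0))  # rank-2 realization

# The realization matches every given distance (up to a rigid motion).
D2_hat = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
assert np.allclose(D2_hat, D2, atol=1e-8)
```

The recovered configuration is determined only up to rotation, reflection, and translation, which is exactly the ambiguity inherent in distance data.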

    Zhang Neural Networks for Online Solution of Time-Varying Linear Inequalities

    Get PDF
    In this chapter, a special type of recurrent neural network termed the "Zhang neural network" (ZNN) is presented and studied for the online solution of time-varying linear (matrix-vector and matrix) inequalities. Specifically, focusing on the time-varying linear matrix-vector inequality (LMVI), we develop and investigate two different ZNN models based on two different Zhang functions (ZFs). Then, as an extension, by defining another two ZFs, two further ZNN models are developed and investigated to solve the time-varying linear matrix inequality (LMI). For these ZNN models, theoretical results and analyses are presented to show their computational performance. Simulation results with two illustrative examples further substantiate the efficacy of the presented ZNN models for solving time-varying LMVIs and LMIs.
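A hedged sketch of the ZNN recipe, illustrated on a scalar time-varying inequality a(t)x(t) <= b(t) rather than the chapter's matrix-vector and matrix cases; the squared-slack Zhang function, the gain, and the trajectories below are illustrative assumptions, not the chapter's exact models:

```python
import numpy as np

# Time-varying coefficients of the inequality a(t) * x(t) <= b(t).
a = lambda t: 2.0 + np.sin(t)
b = lambda t: np.cos(t)

gamma, dt, T = 10.0, 1e-3, 2.0
y = np.array([1.0, 0.1])   # y = [x, lam]; initially violates the inequality

def residual(y, t):
    # Zhang function: squared slack converts the inequality to an equality,
    # E(t) = a(t) x - b(t) + lam^2, which should be driven to zero.
    x, lam = y
    return a(t) * x - b(t) + lam ** 2

for k in range(int(T / dt)):
    t = k * dt
    e = residual(y, t)
    # ZNN design formula: dE/dt = -gamma * E, so the residual decays
    # exponentially even while a(t), b(t) keep changing.
    de_dt = (residual(y, t + 1e-6) - e) / 1e-6      # explicit time drift
    Jac = np.array([[a(t), 2.0 * y[1]]])            # dE/dy
    ydot = np.linalg.pinv(Jac) @ np.array([-gamma * e - de_dt])
    y = y + dt * ydot                               # forward-Euler step

assert a(T) * y[0] - b(T) <= 1e-3   # inequality satisfied at the final time
```

The squared slack guarantees that a vanishing residual implies a(t)x - b(t) = -lam^2 <= 0, so tracking the equality enforces the inequality online.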

    ๋ชจ๋ธ๊ธฐ๋ฐ˜๊ฐ•ํ™”ํ•™์Šต์„์ด์šฉํ•œ๊ณต์ •์ œ์–ด๋ฐ์ตœ์ ํ™”

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, School of Chemical and Biological Engineering, February 2020. Sequential decision making is a crucial problem in plant-wide process optimization. The dominant numerical approach is forward-in-time direct optimization, but it yields only open-loop solutions and its numerical complexity grows quickly in the presence of uncertainty. Dynamic programming resolves these limitations at their root, yet the associated functional optimization is posed over an infinite-dimensional function space rather than a finite-dimensional vector space and thus suffers from the so-called curse of dimensionality. The sample-based approach to approximating dynamic programming, known as reinforcement learning (RL), can resolve this issue and is investigated throughout this thesis. Methods that account for the system model explicitly are of particular interest.
    Model-based RL is exploited to solve three representative sequential decision-making problems: scheduling, supervisory optimization, and regulatory control. These are formulated as a partially observable Markov decision process (POMDP), a control-affine state space model, and a general state space model, and the associated model-based RL algorithms are point-based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP), respectively. The contributions are as follows. First, for the scheduling problem, a closed-loop feedback policy is obtained, a solution form unavailable from direct optimization that highlights the strength of RL. Second, the regulatory control problem is tackled by function approximation, which relaxes the infinite-dimensional functional optimization of dynamic programming to optimization over a finite-dimensional vector space; deep neural networks (DNNs) are used as the approximator, and their advantages as well as a convergence analysis are presented. Finally, for the supervisory optimization problem, a novel constrained-RL framework using a primal-dual DDP method is proposed. Various illustrative examples validate the developed model-based RL algorithms and support the thesis statement that dynamic programming can be considered a complementary method to direct optimization.
    Contents:
    1. Introduction
       1.1 Motivation and previous work
       1.2 Statement of contributions
       1.3 Outline of the thesis
    2. Background and preliminaries
       2.1 Optimization problem formulation and the principle of optimality
           2.1.1 Markov decision process
           2.1.2 State space model
       2.2 Overview of the developed RL algorithms
           2.2.1 Point based value iteration
           2.2.2 Globalized dual heuristic programming
           2.2.3 Differential dynamic programming
    3. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection
       3.1 Introduction
       3.2 POMDP solution algorithm
           3.2.1 General point based value iteration
           3.2.2 GapMin algorithm
           3.2.3 Receding horizon POMDP
       3.3 Problem formulation for infrastructure scheduling
           3.3.1 State
           3.3.2 Maintenance and inspection actions
           3.3.3 State transition function
           3.3.4 Cost function
           3.3.5 Observation set and observation function
           3.3.6 State augmentation
       3.4 Illustrative example and simulation result
           3.4.1 Structural point for the analysis of a high dimensional belief space
           3.4.2 Infinite horizon policy under the natural deterioration process
           3.4.3 Receding horizon POMDP
           3.4.4 Validation of POMDP policy via Monte Carlo simulation
    4. A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
       4.1 Introduction
       4.2 Function approximation and learning with deep neural networks
           4.2.1 GDHP with a function approximator
           4.2.2 Stable learning of DNNs
           4.2.3 Overall algorithm
       4.3 Results and discussions
           4.3.1 Example 1: Semi-batch reactor
           4.3.2 Example 2: Diffusion-Convection-Reaction (DCR) process
    5. Convergence analysis of the model-based deep reinforcement learning for optimal control of nonlinear control-affine system
       5.1 Introduction
       5.2 Convergence proof of globalized dual heuristic programming (GDHP)
       5.3 Function approximation with deep neural networks
           5.3.1 Function approximation and gradient descent learning
           5.3.2 Forward and backward propagations of DNNs
       5.4 Convergence analysis in the deep neural networks space
           5.4.1 Lyapunov analysis of the neural network parameter errors
           5.4.2 Lyapunov analysis of the closed-loop stability
           5.4.3 Overall Lyapunov function
       5.5 Simulation results and discussions
           5.5.1 System description
           5.5.2 Algorithmic settings
           5.5.3 Control result
    6. Primal-dual differential dynamic programming for constrained dynamic optimization of continuous system
       6.1 Introduction
       6.2 Primal-dual differential dynamic programming for constrained dynamic optimization
           6.2.1 Augmented Lagrangian method
           6.2.2 Primal-dual differential dynamic programming algorithm
           6.2.3 Overall algorithm
       6.3 Results and discussions
    7. Concluding remarks
       7.1 Summary of the contributions
       7.2 Future works
    Bibliography
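The closed-loop, backward-in-time character of dynamic programming emphasized in this abstract can be sketched with its simplest instance, a finite-horizon LQR solved by backward Riccati recursion (the scalar plant numbers below are illustrative assumptions, not from the thesis):

```python
import numpy as np

# Scalar unstable plant x+ = A x + B u, stage cost Q x^2 + R u^2, horizon N.
A, B, Q, R, N = 1.2, 1.0, 1.0, 1.0, 50

# Backward Riccati recursion: value function V_k(x) = P_k x^2.
P = Q
gains = []
for _ in range(N):
    K = A * P * B / (R + B * P * B)      # optimal feedback gain at this stage
    P = Q + A * P * A - A * P * B * K    # Bellman backup of the value function
    gains.append(K)
gains.reverse()                          # gains[k] now applies at time k

# Unlike a direct (open-loop) solution, the result is a feedback *policy*
# u_k = -K_k x_k, valid from any state: simulate the closed loop from x0 = 1.
x = 1.0
for K in gains:
    x = A * x + B * (-K * x)
assert abs(x) < 1e-6                     # the unstable plant is regulated
```

Because the recursion returns gains rather than a single input trajectory, the same offline computation handles disturbances and changed initial states, which is the complementarity to direct optimization argued in the thesis.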