Efficiently Solving Repeated Integer Linear Programming Problems by Learning Solutions of Similar Linear Programming Problems using Boosting Trees
It is challenging to obtain online solutions of large-scale integer linear programming (ILP) problems that occur frequently in slightly different forms during planning for autonomous systems. We refer to such ILP problems as repeated ILP problems. The branch-and-bound (BAB) algorithm is commonly used to solve ILP problems, and a significant amount of computation time is expended in solving numerous relaxed linear programming (LP) problems at the nodes of the BAB trees. We observe that the relaxed LP problems, both within a particular BAB tree and across multiple trees for repeated ILP problems, are similar to each other in the sense that they contain almost the same number of constraints, similar objective function and constraint coefficients, and an identical number of decision variables. We present a boosting tree-based regression technique for learning a set of functions that map the objective function and the constraints to the decision variables of such a system of similar LP problems; this enables us to efficiently infer approximately optimal solutions of the repeated ILP problems. We provide theoretical performance guarantees on the predicted values and demonstrate the effectiveness of the algorithm in four representative domains: a library of benchmark ILP problems, aircraft carrier deck scheduling, vehicle routing, and vehicle control.
Tailored Presolve Techniques in Branch-and-Bound Method for Fast Mixed-Integer Optimal Control Applications
Mixed-integer model predictive control (MI-MPC) can be a powerful tool for
modeling hybrid control systems. In case of a linear-quadratic objective in
combination with linear or piecewise-linear system dynamics and inequality
constraints, MI-MPC needs to solve a mixed-integer quadratic program (MIQP) at
each sampling time step. This paper presents a collection of block-sparse
presolve techniques to efficiently remove decision variables, and to remove or
tighten inequality constraints, tailored to mixed-integer optimal control
problems (MIOCP). In addition, we describe a novel heuristic approach based on
an iterative presolve algorithm to compute a feasible but possibly suboptimal
MIQP solution. We present benchmarking results for a C code implementation of
the proposed BB-ASIPM solver, including a branch-and-bound (B&B) method with
the proposed tailored presolve techniques and an active-set based interior
point method (ASIPM), compared against multiple state-of-the-art MIQP solvers
on a case study of motion planning with obstacle avoidance constraints.
Finally, we demonstrate the computational performance of the BB-ASIPM solver on
the dSPACE Scalexio real-time embedded hardware using a second case study of
stabilization for an underactuated cart-pole with soft contacts.
Comment: 27 pages, 7 figures, 2 tables, submitted to the journal Optimal Control Applications and Methods
Neural Networks for Fast Optimisation in Model Predictive Control: A Review
Model Predictive Control (MPC) is an optimal control algorithm with strong
stability and robustness guarantees. Despite its popularity in robotics and
industrial applications, the main challenge in deploying MPC is its high
computation cost, stemming from the need to solve an optimisation problem at
each control interval. There are several methods to reduce this cost. This
survey focusses on approaches where a neural network is used to approximate an
existing controller. Herein, relevant and unique neural approximation methods
for linear, nonlinear, and robust MPC are presented and compared. Comparisons
are based on the theoretical guarantees that are preserved, the factor by which
the original controller is sped up, and the size of problem that a framework is
applicable to. Research contributions include: a taxonomy that organises
existing knowledge, a summary of literary gaps, discussion on promising
research directions, and simple guidelines for choosing an approximation
framework. The main conclusions are that (1) new benchmarking tools are needed
to help prove the generalisability and scalability of approximation frameworks,
(2) future breakthroughs most likely lie in the development of ties between
control and learning, and (3) the potential and applicability of recently
developed neural architectures and tools remain unexplored in this field.
Comment: 34 pages, 6 figures, 3 tables. Submitted to ACM Computing Surveys
Warm Start of Mixed-Integer Programs for Model Predictive Control of Hybrid Systems
In hybrid Model Predictive Control (MPC), a Mixed-Integer Quadratic Program
(MIQP) is solved at each sampling time to compute the optimal control action.
Although these optimizations are generally very demanding, in MPC we expect
consecutive problem instances to be nearly identical. This paper addresses the
question of how computations performed at one time step can be reused to
accelerate (warm start) the solution of subsequent MIQPs.
Reoptimization is not a rare practice in integer programming: for small
variations of certain problem data, the branch-and-bound algorithm allows an
efficient reuse of its search tree and the dual bounds of its leaf nodes. In
this paper we extend these ideas to the receding-horizon settings of MPC. The
warm-start algorithm we propose copes naturally with arbitrary model errors,
has a negligible computational cost, and frequently enables an a-priori pruning
of most of the search space. Theoretical considerations and experimental
evidence show that the proposed method tends to reduce the combinatorial
complexity of the hybrid MPC problem to that of a one-step look-ahead
optimization, greatly easing the online computation burden.
Advances in Polynomial Optimization
Polynomial optimization has a wide range of practical applications in fields
such as optimal control, energy and water networks, facility location, management science, and finance. It also
generalizes relevant optimization problems thoroughly studied in the literature, such as mixed-binary linear
optimization, quadratic optimization, and complementarity problems. As finding globally optimal solutions is an
extremely challenging task, the development of efficient techniques for solving polynomial optimization problems is
of particular relevance. In this thesis we provide a detailed study of different techniques for solving this kind of
problem and we introduce some novel approaches in this field, including the use of statistical learning techniques.
Furthermore, we also present a practical application of polynomial optimization to finance and, more specifically,
portfolio design.
Improving the Optimality of Move Blocking and Offset-Free Model Predictive Control
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, School of Chemical and Biological Engineering, 2020. 2. Jong Min Lee.
Model predictive control (MPC) is a receding-horizon control method that derives a finite-horizon optimal solution for the current state on-line by solving an optimal control problem. MPC has had a tremendous impact on both industry and control research. Several issues remain outstanding. MPC has to solve the optimization problem within a sampling period, so reducing on-line computational complexity is one of the main research subjects in MPC. Another major issue is model-plant mismatch, which arises from the model-based predictive approach; offset-free tracking schemes that compensate for model-plant mismatch or unmeasured disturbances have been developed to address it. In this thesis, we focus on the optimality performance of two techniques: move blocking, which fixes the decision variables over arbitrary time intervals to reduce the computational load of on-line optimization in MPC, and offset-free MPC based on the disturbance estimator approach, which is the most commonly used method for achieving offset-free tracking in MPC. We improve the optimality performance of move-blocked MPC in two ways. The first scheme provides a superior base sequence by linearly interpolating complementary base sequences, and the second scheme provides a proper time-varying blocking structure with a semi-explicit approach. Moreover, we improve the optimality performance of offset-free MPC by exploiting a model-plant mismatch compensating signal learned from estimated disturbance data. With the proposed schemes, we efficiently improve the optimality performance while guaranteeing recursive feasibility and closed-loop stability.
1. Introduction
2. Move-blocked model predictive control with linear interpolation of base sequences
2.1 Introduction
2.2 Preliminaries
2.2.1 MPC formulation
2.2.2 Move blocking
2.2.3 Move blocked MPC (MBMPC)
2.3 Move blocking schemes
2.3.1 Previous solution based offset blocking
2.3.2 LQR solution based offset blocking
2.4 Interpolated solution based move blocking
2.4.1 Interpolated solution based MBMPC
2.4.2 QP formulation
2.5 Numerical examples
2.5.1 Example 1 (Feasible region)
2.5.2 Example 2 (Performance in regulation problem)
2.5.3 Example 3 (Performance in tracking problem)
3. Move-blocked model predictive control with time-varying blocking structure by semi-explicit approach
3.1 Introduction
3.2 Problem formulation
3.3 Move blocked MPC
3.3.1 Move blocking scheme
3.3.2 Implementation of move blocking
3.4 Semi-explicit approach for move blocked MPC
3.4.1 Off-line generation of critical region
3.4.2 On-line MPC scheme with critical region search
3.4.3 Property of semi-explicit move blocked MPC
3.5 Numerical examples
3.5.1 Example 1 (Regulation problem)
3.5.2 Example 2 (Tracking problem)
4. Model-plant mismatch learning offset-free model predictive control
4.1 Introduction
4.2 Offset-free MPC: Disturbance estimator approach
4.2.1 Preliminaries
4.2.2 Disturbance estimator and controller design
4.2.3 Offset-free tracking condition
4.3 Model-plant mismatch learning offset-free MPC
4.3.1 Model-plant mismatch learning
4.3.2 Application of learned model-plant mismatch
4.3.3 Robust asymptotic stability of model-plant mismatch learning offset-free MPC
4.4 Numerical example
4.4.1 System with random set-point
4.4.2 Transformed system
4.4.3 System with multiple random set-points
5. Concluding remarks
5.1 Move-blocked model predictive control with linear interpolation of base sequences
5.2 Move-blocked model predictive control with time-varying blocking structure by semi-explicit approach
5.3 Model-plant mismatch learning offset-free model predictive control
5.4 Conclusions
5.5 Future work
Bibliography
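As an illustration of the move blocking idea named in the abstract above (fixing the decision variables over time intervals so that fewer free inputs remain), the following is a minimal sketch for a scalar input. The function names `blocking_matrix` and `expand`, the block lengths, and the numeric values are assumptions made for this example; they are not taken from the thesis, and the interpolated or time-varying schemes it proposes are not reproduced here.

```python
def blocking_matrix(block_lengths):
    """Build the N x M 0/1 matrix T such that u_full = T * u_blocked.

    block_lengths[j] is the number of consecutive time steps over which
    the j-th reduced decision variable is held constant (hypothetical input).
    """
    n_blocks = len(block_lengths)
    rows = []
    for j, length in enumerate(block_lengths):
        for _ in range(length):
            # Each time step in block j copies the j-th reduced variable.
            rows.append([1 if k == j else 0 for k in range(n_blocks)])
    return rows


def expand(block_lengths, u_blocked):
    """Map the reduced input sequence to the full-horizon input sequence."""
    T = blocking_matrix(block_lengths)
    return [sum(t * u for t, u in zip(row, u_blocked)) for row in T]


# Horizon of 6 steps described by only 3 free variables.
u_full = expand([1, 2, 3], [0.5, -1.0, 2.0])
print(u_full)
```

With block lengths 1, 2, and 3, the optimizer would only search over three values instead of six, which is the source of the computational saving; the price is that the blocked solution is generally suboptimal relative to the full-horizon problem.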
Advances in the Optimization of Energy Systems and Machine Learning Hyperparameters
Intensifying public concern about climate change risks has accelerated the push for more tangible action in the transition toward low-carbon or carbon-neutral energy. Concurrently, the energy industry is also undergoing a digital transformation with the explosion in available data and computational power. To address these challenges, systematic decision-making strategies are necessary to analyze the vast array of technology options and information sources while navigating this energy transition. In this work, mathematical optimization is utilized to answer some of the outstanding issues around designing cleaner processes from resources such as natural gas and renewables, operating the logistics of these energy systems, and statistical modeling from data.
First, exploiting natural gas to produce lower emission liquid transportation fuels is investigated through an optimization-based process synthesis. This extends previous studies by incorporating chemical looping as an alternative syngas production method for the first time. Second, a similar process synthesis approach is implemented for the optimal design of a novel biomass-based process that coproduces ammonia and methanol, improving their production flexibility and profit margins.
Next, operational difficulties with solar and wind energies due to their temporal intermittency and uneven geographical distribution are tackled with a supply chain optimization model and a clustering decomposition algorithm. The former describes power generation through energy carriers (hydrogen-rich chemicals) connecting resource-dense rural areas to resource-deficient urban centers. Results show the potential of energy carriers for long-term storage. The latter is developed to identify the appropriate number of representative time periods for approximating an optimization problem with time series data, instead of using a full time horizon. This algorithm is applied to the simultaneous design and scheduling of a renewable power system with battery storage.
Finally, building machine learning models from data is commonly performed through k-fold cross-validation. By recasting this as a bilevel optimization problem, the exact solution to hyperparameter optimization can be obtained through parametric programming for machine learning models that are LP/QP. This extends previous results in statistics to a broader class of machine learning models.
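The bilevel structure of k-fold cross-validation mentioned above can be sketched as follows: the inner level fits a model on the training folds for a fixed hyperparameter, and the outer level chooses the hyperparameter that minimizes validation error. This sketch uses a one-dimensional ridge regression and a plain grid search over candidate values, which is an assumption for illustration only; the abstract describes an exact solution via parametric programming, which is not implemented here. All names and data are hypothetical.

```python
def ridge_fit(xs, ys, lam):
    """Inner problem: closed-form ridge slope w for the model y ~ w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def kfold_cv_error(xs, ys, lam, k):
    """Mean squared validation error of ridge_fit over k interleaved folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        val_idx = set(range(fold, n, k))  # every k-th sample is held out
        train_x = [xs[i] for i in range(n) if i not in val_idx]
        train_y = [ys[i] for i in range(n) if i not in val_idx]
        w = ridge_fit(train_x, train_y, lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in val_idx)
    return total / n


def select_lambda(xs, ys, candidates, k=3):
    """Outer problem: pick the candidate with the smallest CV error."""
    return min(candidates, key=lambda lam: kfold_cv_error(xs, ys, lam, k))


xs = [1, 2, 3, 4, 5, 6]
ys = [2 * x for x in xs]  # noise-free data, so no regularization is needed
print(select_lambda(xs, ys, [0.0, 1.0, 10.0]))
```

A grid search only evaluates the outer objective at the listed candidates, so it can miss the true minimizer between grid points; that gap is what an exact parametric treatment would aim to close.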
Design of multi-parametric NCO-tracking controllers for linear continuous-time systems
Process optimization for industrial applications aims to achieve performance enhancements while satisfying system constraints. A major challenge for any such method lies in the problem of uncertainty stemming from model mismatch and process disturbances. Classical approaches such as model predictive control usually handle the uncertainty by repeatedly solving the optimization problem on-line, which may prove a rather computationally demanding task nonetheless and cause serious delays for fast dynamic systems. Existing approaches for mitigating the on-line computational burden via off-line optimization include multi-parametric programming and NCO-tracking. Multi-parametric programming aims to generate a mapping of control strategies as a function of given parameters; whereas NCO-tracking involves tracking the necessary conditions of optimality (NCOs) based on a precomputed control switching structure, which enables a dynamic real-time optimization problem to be transferred into an on-line tracking problem using a feedback controller. A methodology, called multi-parametric (mp-)NCO-tracking is developed in this thesis, whereby multi-parametric dynamic optimization and NCO-tracking methods are combined into a unified framework.
An algorithm for the design of mp-NCO-tracking controllers for continuous-time, linear-quadratic optimal control problems is presented in Chapter 2. The off-line step defines the multi-parametric control structure mapped to given uncertain (measurable) parameters in terms of so-called critical regions and feedback laws. Specifically, each critical region corresponds to a unique control switching structure in terms of the sequence of active constraints. The on-line step involves determining the current critical region once the parameter value has been revealed, and then applying the corresponding feedback control laws in a receding horizon manner. The mp-NCO-tracking approach provides a means for relaxing the invariant switching structure assumption in NCO-tracking by constructing critical regions for various switching structures. Moreover, addressing the problem directly in continuous-time can potentially reduce the number of critical regions compared with standard multi-parametric programming based on a time discretization and a control vector parameterization. The methodology and its benefits are illustrated for a number of simple case studies.
To obtain the mathematical representation of the generally nonlinear critical regions, Chapter 3 investigates a machine learning model as a classifier, based on a deep neural network. This feed-forward network is selected for its representational power as a universal approximator for arbitrary continuous functions. Here, the classifier takes the unknown parameter as input and maps the corresponding critical regions in terms of their switching structures. An algorithm for training the classifier is presented, which involves generating the training data set, setting up a neural network architecture, and applying optimization-based training. By using a Softmax classifier in the output layer of the network, a normalized probability distribution is obtained, which consists of a vector with as many elements as the total number of critical regions, with each element representing the likelihood for a region to be the correct one. The classifier is conveniently embedded into the multi-parametric NCO-tracking controller for choosing the real-time switching structure in on-line control.
Lastly, a robustification of the mp-NCO-tracking methodology is developed in Chapter 4, where constraints are guaranteed to be satisfied under all possible uncertainty scenarios, which leads to a min-max formulation. A robust counterpart formulation of the multi-parametric dynamic optimization problem is presented, which considers both additive and multiplicative time-varying disturbances. The approach involves backing off the path and terminal constraints of the linear-quadratic optimal control problem based on a worst-case uncertainty propagation computed using either interval or ellipsoidal reachability tubes. The uncertain system state is decomposed into a nominal reference and a perturbed component, and a convex enclosure of the reachable set for the perturbed component is precomputed via some auxiliary differential equations. Conservative constraint back-offs are obtained from the precomputed reachability tubes, which enables the controller design procedure in the nominal case to be directly applied for the robust control problem, and to retain the same computational effort as in the nominal case. These developments are demonstrated by numerical case studies, and ways of extending this approach to more general, nonlinear optimal control problems are discussed in Chapter 5.
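The Softmax output layer described for Chapter 3 above turns one raw score per critical region into a normalized probability vector, from which the most likely region can be chosen. A minimal sketch of that final step is given below; the function names and the example scores are assumptions for illustration, and the network that would produce the scores is not modelled.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_region(scores):
    """Return the index of the most likely critical region and all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical scores for three critical regions.
best, probs = pick_region([1.0, 3.0, 0.5])
print(best, probs)
```

The vector has one entry per critical region, so its length grows with the number of switching structures identified off-line.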
Process Control and Optimization Using Model-Based Reinforcement Learning
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, School of Chemical and Biological Engineering, 2020. 2. Jong Min Lee.
Sequential decision making is a crucial technology for plant-wide process optimization. While the dominant numerical method is forward-in-time direct optimization, it is limited to open-loop solutions and has difficulty accounting for uncertainty. Dynamic programming complements these limitations; nonetheless, the associated functional optimization suffers from the curse of dimensionality. The sample-based approach for approximating dynamic programming, referred to as reinforcement learning (RL), can resolve this issue and is investigated throughout this thesis. Methods that account for the system model explicitly are of particular interest. Model-based RL is exploited to solve three representative sequential decision making problems: scheduling, supervisory optimization, and regulatory control. The problems are formulated as a partially observable Markov decision process, a control-affine state space model, and a general state space model, and the associated model-based RL algorithms are point based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP), respectively.
The contributions for each problem are as follows. First, for the scheduling problem, we developed a closed-loop feedback scheme, which highlights the strength of the approach compared to the direct optimization method. Second, the regulatory control problem is tackled with a function approximation method that relaxes the functional optimization to a finite-dimensional vector space optimization. Deep neural networks (DNNs) are used as the approximator, and the advantages as well as a convergence analysis are presented in the thesis. Finally, for the supervisory optimization problem, we developed a novel constrained RL framework that uses the primal-dual DDP method. Various illustrative examples are presented to validate the developed model-based RL algorithms and to support the thesis statement that dynamic programming can be considered a complementary method to direct optimization.
1. Introduction
1.1 Motivation and previous work
1.2 Statement of contributions
1.3 Outline of the thesis
2. Background and preliminaries
2.1 Optimization problem formulation and the principle of optimality
2.1.1 Markov decision process
2.1.2 State space model
2.2 Overview of the developed RL algorithms
2.2.1 Point based value iteration
2.2.2 Globalized dual heuristic programming
2.2.3 Differential dynamic programming
3. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection
3.1 Introduction
3.2 POMDP solution algorithm
3.2.1 General point based value iteration
3.2.2 GapMin algorithm
3.2.3 Receding horizon POMDP
3.3 Problem formulation for infrastructure scheduling
3.3.1 State
3.3.2 Maintenance and inspection actions
3.3.3 State transition function
3.3.4 Cost function
3.3.5 Observation set and observation function
3.3.6 State augmentation
3.4 Illustrative example and simulation result
3.4.1 Structural point for the analysis of a high dimensional belief space
3.4.2 Infinite horizon policy under the natural deterioration process
3.4.3 Receding horizon POMDP
3.4.4 Validation of POMDP policy via Monte Carlo simulation
4. A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
4.1 Introduction
4.2 Function approximation and learning with deep neural networks
4.2.1 GDHP with a function approximator
4.2.2 Stable learning of DNNs
4.2.3 Overall algorithm
4.3 Results and discussions
4.3.1 Example 1: Semi-batch reactor
4.3.2 Example 2: Diffusion-Convection-Reaction (DCR) process
5. Convergence analysis of the model-based deep reinforcement learning for optimal control of nonlinear control-affine system
5.1 Introduction
5.2 Convergence proof of globalized dual heuristic programming (GDHP)
5.3 Function approximation with deep neural networks
5.3.1 Function approximation and gradient descent learning
5.3.2 Forward and backward propagations of DNNs
5.4 Convergence analysis in the deep neural networks space
5.4.1 Lyapunov analysis of the neural network parameter errors
5.4.2 Lyapunov analysis of the closed-loop stability
5.4.3 Overall Lyapunov function
5.5 Simulation results and discussions
5.5.1 System description
5.5.2 Algorithmic settings
5.5.3 Control result
6. Primal-dual differential dynamic programming for constrained dynamic optimization of continuous system
6.1 Introduction
6.2 Primal-dual differential dynamic programming for constrained dynamic optimization
6.2.1 Augmented Lagrangian method
6.2.2 Primal-dual differential dynamic programming algorithm
6.2.3 Overall algorithm
6.3 Results and discussions
7. Concluding remarks
7.1 Summary of the contributions
7.2 Future works
Bibliography