303 research outputs found
Natural Language Syntax Complies with the Free-Energy Principle
Natural language syntax yields an unbounded array of hierarchically
structured expressions. We claim that these are used in the service of active
inference in accord with the free-energy principle (FEP). While conceptual
advances alongside modelling and simulation work have attempted to connect
speech segmentation and linguistic communication with the FEP, we extend this
program to the underlying computations responsible for generating syntactic
objects. We argue that recently proposed principles of economy in language
design - such as "minimal search" criteria from theoretical syntax - adhere to
the FEP. This affords a greater degree of explanatory power to the FEP - with
respect to higher language functions - and offers linguistics a grounding in
first principles with respect to computability. We show how both tree-geometric
depth and a Kolmogorov complexity estimate (recruiting a Lempel-Ziv compression
algorithm) can be used to accurately predict legal operations on syntactic
workspaces, directly in line with formulations of variational free energy
minimization. This is used to motivate a general principle of language design
that we term Turing-Chomsky Compression (TCC). We use TCC to align concerns of
linguists with the normative account of self-organization furnished by the FEP,
by marshalling evidence from theoretical linguistics and psycholinguistics to
ground core principles of efficient syntactic computation within active
inference.
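The compression-based complexity estimate mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes Python's standard `zlib` module (DEFLATE, which combines LZ77 with Huffman coding) as a stand-in for a Lempel-Ziv compressor, and the bracketed strings are hypothetical encodings of syntactic objects chosen only for illustration.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Bytes needed for the zlib-compressed UTF-8 encoding of `text`.

    The compressed length is a computable upper-bound proxy for
    Kolmogorov complexity, which itself is uncomputable.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


# Hypothetical bracketed encoding of a highly regular structure.
repetitive = "[a [b c]]" * 50

# A string of the same length over the same symbols, but with no regularity.
rng = random.Random(0)
scrambled = "".join(rng.choice("abc[] ") for _ in range(len(repetitive)))

size_regular = compressed_size(repetitive)
size_scrambled = compressed_size(scrambled)
```

Under this proxy, the regular string compresses to far fewer bytes than the scrambled one, which is the sense in which a lower compressed length is read as lower complexity.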
Online Optimization with Lookahead
The main contributions of this thesis are a systematic groundwork for comprehensive performance evaluation of algorithms in online optimization with lookahead, and the validation of the presented approaches through theoretical analysis and computational experiments.
Proceedings of MathSport International 2017 Conference
Proceedings of MathSport International 2017 Conference, held in the Botanical Garden of the University of Padua, June 26-28, 2017.
MathSport International organizes biennial conferences dedicated to all topics where mathematics and sport meet.
Topics include: performance measures, optimization of sports performance, statistics and probability models, mathematical and physical models in sports, competitive strategies, statistics and probability match outcome models, optimal tournament design and scheduling, decision support systems, analysis of rules and adjudication, econometrics in sport, analysis of sporting technologies, financial valuation in sport, e-sports (gaming), betting and sports
Operational Research: Methods and Applications
Throughout its history, Operational Research has evolved to include a variety of methods, models and algorithms that have been applied to a diverse and wide range of contexts. This encyclopedic article consists of two main sections: methods and applications. The first aims to summarise the up-to-date knowledge and provide an overview of the state-of-the-art methods and key developments in the various subdomains of the field. The second offers a wide-ranging list of areas where Operational Research has been applied. The article is meant to be read in a nonlinear fashion. It should be used as a point of reference or first-port-of-call for a diverse pool of readers: academics, researchers, students, and practitioners. The entries within the methods and applications sections are presented in alphabetical order. The authors dedicate this paper to the 2023 Turkey/Syria earthquake victims. We sincerely hope that advances in OR will play a role towards minimising the pain and suffering caused by this and future catastrophes
Process control and optimization using model-based reinforcement learning
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, School of Chemical and Biological Engineering, 2020. 2. Lee Jong Min.
Sequential decision making is a crucial part of plant-wide process optimization. While the dominant numerical method is forward-in-time direct optimization, it is limited to open-loop solutions and has difficulty accounting for uncertainty. Dynamic programming complements these limitations, but the associated functional optimization suffers from the curse of dimensionality. The sample-based approach to approximating dynamic programming, referred to as reinforcement learning (RL), can resolve this issue and is investigated throughout this thesis. Methods that account for the system model explicitly are of particular interest. Model-based RL is used to solve three representative sequential decision making problems: scheduling, regulatory control, and supervisory optimization. The problems are formulated as a partially observable Markov decision process, a control-affine state space model, and a general state space model, and the associated model-based RL algorithms are point based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP), respectively.
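The contrast between open-loop direct optimization and the closed-loop solution produced by dynamic programming can be made concrete with a small sketch. This is plain value iteration on a fully observed two-state Markov decision process, not PBVI, GDHP, or DDP as developed in the thesis; the states, actions, transition probabilities, and rewards are hypothetical values chosen for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve a finite MDP by repeated Bellman backups.

    P[s][a] is a list of (probability, next_state) pairs and
    R[s][a] is the immediate reward for taking action a in state s.
    Returns the value function and a greedy state-feedback policy.
    """
    n_states = len(P)

    def q(V, s, a):
        # One-step lookahead: reward plus discounted expected value.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * n_states
    while True:
        V_new = [max(q(V, s, a) for a in range(len(P[s])))
                 for s in range(n_states)]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    policy = [max(range(len(P[s])), key=lambda a: q(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical maintenance problem: state 0 = working, state 1 = failed;
# action 0 = do nothing, action 1 = maintain or repair.
P = [
    [[(0.8, 0), (0.2, 1)], [(1.0, 0)]],  # from "working"
    [[(1.0, 1)], [(1.0, 0)]],            # from "failed"
]
R = [
    [1.0, 0.5],   # working: full output, or reduced output while maintaining
    [0.0, -1.0],  # failed: no output, or pay a repair cost
]

V, policy = value_iteration(P, R)
```

The result is a policy indexed by state rather than a fixed action sequence, so the decision adapts to whichever state is actually reached. That state-feedback form is what the abstract refers to as a closed-loop solution.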
The contributions for each problem are as follows. First, for the scheduling problem, we developed a closed-loop feedback scheme, which highlights the strength of the approach compared with direct optimization. Second, the regulatory control problem is tackled by a function approximation method that relaxes the functional optimization to a finite-dimensional vector space optimization. Deep neural networks (DNNs) are used as the approximator, and their advantages as well as a convergence analysis are presented in the thesis. Finally, for the supervisory optimization problem, we developed a novel constrained RL framework that uses the primal-dual DDP method. Various illustrative examples are presented to validate the developed model-based RL algorithms and to support the thesis statement that dynamic programming can be considered a complementary method to direct optimization.
1. Introduction
1.1 Motivation and previous work
1.2 Statement of contributions
1.3 Outline of the thesis
2. Background and preliminaries
2.1 Optimization problem formulation and the principle of optimality
2.1.1 Markov decision process
2.1.2 State space model
2.2 Overview of the developed RL algorithms
2.2.1 Point based value iteration
2.2.2 Globalized dual heuristic programming
2.2.3 Differential dynamic programming
3. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection
3.1 Introduction
3.2 POMDP solution algorithm
3.2.1 General point based value iteration
3.2.2 GapMin algorithm
3.2.3 Receding horizon POMDP
3.3 Problem formulation for infrastructure scheduling
3.3.1 State
3.3.2 Maintenance and inspection actions
3.3.3 State transition function
3.3.4 Cost function
3.3.5 Observation set and observation function
3.3.6 State augmentation
3.4 Illustrative example and simulation result
3.4.1 Structural point for the analysis of a high dimensional belief space
3.4.2 Infinite horizon policy under the natural deterioration process
3.4.3 Receding horizon POMDP
3.4.4 Validation of POMDP policy via Monte Carlo simulation
4. A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
4.1 Introduction
4.2 Function approximation and learning with deep neural networks
4.2.1 GDHP with a function approximator
4.2.2 Stable learning of DNNs
4.2.3 Overall algorithm
4.3 Results and discussions
4.3.1 Example 1: Semi-batch reactor
4.3.2 Example 2: Diffusion-Convection-Reaction (DCR) process
5. Convergence analysis of the model-based deep reinforcement learning for optimal control of nonlinear control-affine system
5.1 Introduction
5.2 Convergence proof of globalized dual heuristic programming (GDHP)
5.3 Function approximation with deep neural networks
5.3.1 Function approximation and gradient descent learning
5.3.2 Forward and backward propagations of DNNs
5.4 Convergence analysis in the deep neural networks space
5.4.1 Lyapunov analysis of the neural network parameter errors
5.4.2 Lyapunov analysis of the closed-loop stability
5.4.3 Overall Lyapunov function
5.5 Simulation results and discussions
5.5.1 System description
5.5.2 Algorithmic settings
5.5.3 Control result
6. Primal-dual differential dynamic programming for constrained dynamic optimization of continuous system
6.1 Introduction
6.2 Primal-dual differential dynamic programming for constrained dynamic optimization
6.2.1 Augmented Lagrangian method
6.2.2 Primal-dual differential dynamic programming algorithm
6.2.3 Overall algorithm
6.3 Results and discussions
7. Concluding remarks
7.1 Summary of the contributions
7.2 Future works
Bibliography
Docto
- …