
    Markov chains and optimality of the Hamiltonian cycle

    We consider the Hamiltonian cycle problem (HCP) embedded in a controlled Markov decision process. In this setting, the HCP reduces to an optimization problem over a set of Markov chains corresponding to a given graph. We prove that Hamiltonian cycles are minimizers of the trace of the fundamental matrix over the set of all stochastic transition matrices. In the case of doubly stochastic matrices with symmetric linear perturbation, we show that Hamiltonian cycles minimize a diagonal element of the fundamental matrix for all admissible values of the perturbation parameter. In contrast to previous work on this topic, our arguments are primarily probabilistic rather than algebraic.
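    The trace-minimization claim can be checked numerically on a small complete graph. The sketch below (our own construction, not the paper's proof) uses the fundamental matrix Z = (I - P + Pi)^(-1), with Pi the uniform limiting matrix, and compares the Hamiltonian cycle policy on K_4 against the uniform random walk; the cycle attains the known minimum (n + 1)/2.

    ```python
    from fractions import Fraction

    def mat_inv(a):
        # Gauss-Jordan inversion with exact rational arithmetic
        n = len(a)
        aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
               for i, row in enumerate(a)]
        for col in range(n):
            piv = next(r for r in range(col, n) if aug[r][col] != 0)
            aug[col], aug[piv] = aug[piv], aug[col]
            p = aug[col][col]
            aug[col] = [x / p for x in aug[col]]
            for r in range(n):
                if r != col and aug[r][col] != 0:
                    f = aug[r][col]
                    aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
        return [row[n:] for row in aug]

    def fundamental_trace(P):
        # trace of Z = (I - P + Pi)^(-1), Pi the uniform limiting matrix
        # (valid here: both chains below are doubly stochastic)
        n = len(P)
        u = Fraction(1, n)
        M = [[Fraction(int(i == j)) - P[i][j] + u for j in range(n)]
             for i in range(n)]
        Z = mat_inv(M)
        return sum(Z[i][i] for i in range(n))

    n = 4
    # Hamiltonian cycle policy on K_4: deterministic move i -> i+1 (mod n)
    cycle = [[Fraction(int(j == (i + 1) % n)) for j in range(n)] for i in range(n)]
    # uniform random walk on K_4
    walk = [[Fraction(0) if i == j else Fraction(1, n - 1) for j in range(n)]
            for i in range(n)]

    print(fundamental_trace(cycle))  # 5/2  = (n + 1)/2, the minimum
    print(fundamental_trace(walk))   # 13/4, strictly larger
    ```

    The exact rational arithmetic makes the comparison free of floating-point doubt; for larger graphs one would of course switch to a numerical linear algebra library.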

    Hamiltonian cycles and subsets of discounted occupational measures

    We study a polytope arising from embedding the Hamiltonian cycle problem in a discounted Markov decision process. The Hamiltonian cycle problem can be reduced to finding particular extreme points of a certain polytope associated with the input graph; this polytope is a subset of the space of discounted occupational measures. We characterize the feasible bases of the polytope for a general input graph G, and determine the expected numbers of different types of feasible bases when the underlying graph is random. We use these results to demonstrate that augmenting certain additional constraints to reduce the polyhedral domain eliminates a large number of feasible bases that do not correspond to Hamiltonian cycles. Finally, we develop a random walk algorithm on the feasible bases of the reduced polytope and present some numerical results. We conclude with a conjecture on the feasible bases of the reduced polytope.
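    A minimal sketch of what membership in such a polytope means (our own notation, discount value, and normalization; conventions differ across papers): the discounted occupational measure induced by a deterministic policy tracing a Hamiltonian cycle satisfies the defining balance constraints exactly.

    ```python
    from fractions import Fraction

    # Discounted occupational measure induced by the deterministic policy
    # tracing the cycle 0 -> 1 -> 2 -> 3 -> 0, started at node 0.  At time t
    # the chain sits at node t mod n, so x[i] = sum of beta^t over t = i mod n,
    # i.e. beta^i / (1 - beta^n).
    n = 4
    beta = Fraction(1, 2)
    x = [beta ** i / (1 - beta ** n) for i in range(n)]

    # Defining constraints of the occupational-measure polytope:
    # sum_a x_{ia} - beta * sum over arcs (j, a) into i of x_{ja} = nu_i,
    # with nu the initial distribution (here the unit mass at node 0).
    # Under the cycle policy, node i is entered only from node (i - 1) mod n.
    residual = [x[i] - beta * x[(i - 1) % n] for i in range(n)]
    assert residual == [1, 0, 0, 0]  # equals nu = e_0, so x is feasible
    ```

    Extreme points of the polytope that are not of this cyclic form are exactly the feasible bases the abstract's additional constraints aim to eliminate.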

    Advances in Branch-and-Fix methods to solve the Hamiltonian cycle problem in manufacturing optimization

    159 p. This thesis starts from the tool-path optimization problem, contributing a decision-support system that generates optimal paths in Additive Manufacturing technology. This contribution serves as a starting point, and as inspiration, for analysing the Hamiltonian cycle problem (HCP). The HCP consists of visiting every vertex of a given graph exactly once, or determining that no such cycle exists. Many of the methods proposed in the literature apply to undirected graphs, and those that focus on directed graphs have not been implemented or tested. One method for solving the problem is Branch-and-Fix (BF), an exact method that uses the transformation of the HCP into a continuous problem. BF is a branching algorithm that builds a decision tree in which two linear programs are solved at each vertex. This method has only been tested on small graphs, so the limitations it may present have not been studied in depth. Therefore, this thesis proposes four methodological contributions related to the HCP and BF: 1) improving the efficiency of BF in several respects, 2) proposing a global branching method, 3) proposing a collapsed BF method, and 4) extending the HCP to a multi-objective setting and proposing a method to solve it.

    Computational methods for finding long simple cycles in complex networks

    © 2017 Elsevier B.V. Detection of long simple cycles in real-world complex networks finds many applications in layout algorithms, information flow modelling, as well as in bioinformatics. In this paper, we propose two computational methods for finding long cycles in real-world networks. The first method is an exact approach based on our own integer linear programming formulation of the problem and a data mining pipeline. This pipeline ensures that the problem is solved as a sequence of integer linear programs. The second method is a multi-start local search heuristic, which combines an initial construction of a long cycle using depth-first search with four different perturbation operators. Our experimental results are presented for social network samples, graphs studied in the network science field, graphs from DIMACS series, and protein-protein interaction networks. These results show that our formulation leads to a significantly more efficient exact approach to solve the problem than a previous formulation. For 14 out of 22 networks, we have found the optimal solutions. The potential of heuristics in this problem is also demonstrated, especially in the context of large-scale problem instances.
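    As a point of reference for the problem itself (not the paper's ILP formulation or its multi-start heuristic), a longest simple cycle can be found by exhaustive depth-first backtracking; this is exponential and only usable on toy graphs, which is precisely why the scalable methods above are needed.

    ```python
    # Exhaustive backtracking for the longest simple cycle in an undirected
    # graph given as an adjacency list.  Exponential time: a toy baseline only.
    def longest_cycle(adj):
        best = []

        def dfs(start, v, path, on_path):
            nonlocal best
            for w in adj[v]:
                if w == start and len(path) >= 3 and len(path) > len(best):
                    best = path[:]  # found a longer cycle closing at start
                elif w > start and not on_path[w]:
                    # restricting to w > start enumerates each cycle once,
                    # rooted at its minimum vertex
                    on_path[w] = True
                    path.append(w)
                    dfs(start, w, path, on_path)
                    path.pop()
                    on_path[w] = False

        for s in range(len(adj)):
            on = [False] * len(adj)
            on[s] = True
            dfs(s, s, [s], on)
        return best

    # 6-cycle with a chord 0-3: the longest simple cycle uses all 6 vertices.
    c6 = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4, 0]}
    print(len(longest_cycle(c6)))  # 6
    ```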

    ๋ชจ๋ธ๊ธฐ๋ฐ˜๊ฐ•ํ™”ํ•™์Šต์„์ด์šฉํ•œ๊ณต์ •์ œ์–ด๋ฐ์ตœ์ ํ™”

    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, School of Chemical and Biological Engineering, February 2020. Advisor: Jong Min Lee. Sequential decision making is a crucial technology for plant-wide process optimization. While the dominant numerical method is forward-in-time direct optimization, it yields only open-loop solutions and has difficulty accounting for uncertainty. Dynamic programming overcomes these limitations at the root, but the associated functional optimization, posed over an infinite-dimensional function space rather than a finite-dimensional vector space, suffers from the curse of dimensionality. The sample-based approach to approximating dynamic programming, referred to as reinforcement learning (RL), can resolve this issue and is investigated throughout this thesis, with particular interest in methods that account for the system model explicitly.
    Model-based RL is exploited to solve three representative sequential decision-making problems: scheduling, supervisory optimization, and regulatory control. The problems are formulated as a partially observable Markov decision process, a control-affine state space model, and a general state space model, and the associated model-based RL algorithms are point based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP), respectively. The contribution for each problem can be summarized as follows. First, for the scheduling problem, we developed a closed-loop feedback scheme, a form unobtainable from the direct optimization method, which highlights the strength of RL. Second, the regulatory control problem is tackled by a function approximation method that relaxes the functional optimization to a finite-dimensional vector space optimization; deep neural networks (DNNs) are used as the approximator, and the resulting advantages as well as a convergence analysis are presented in the thesis. Finally, for the supervisory optimization problem, we developed a novel constrained RL framework that uses the primal-dual DDP method. Various illustrative examples are presented to validate the developed model-based RL algorithms and to support the thesis statement that dynamic programming can be considered a complementary method to direct optimization.

    Contents:
    1. Introduction
      1.1 Motivation and previous work
      1.2 Statement of contributions
      1.3 Outline of the thesis
    2. Background and preliminaries
      2.1 Optimization problem formulation and the principle of optimality
        2.1.1 Markov decision process
        2.1.2 State space model
      2.2 Overview of the developed RL algorithms
        2.2.1 Point based value iteration
        2.2.2 Globalized dual heuristic programming
        2.2.3 Differential dynamic programming
    3. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection
      3.1 Introduction
      3.2 POMDP solution algorithm
        3.2.1 General point based value iteration
        3.2.2 GapMin algorithm
        3.2.3 Receding horizon POMDP
      3.3 Problem formulation for infrastructure scheduling
        3.3.1 State
        3.3.2 Maintenance and inspection actions
        3.3.3 State transition function
        3.3.4 Cost function
        3.3.5 Observation set and observation function
        3.3.6 State augmentation
      3.4 Illustrative example and simulation result
        3.4.1 Structural point for the analysis of a high dimensional belief space
        3.4.2 Infinite horizon policy under the natural deterioration process
        3.4.3 Receding horizon POMDP
        3.4.4 Validation of POMDP policy via Monte Carlo simulation
    4. A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
      4.1 Introduction
      4.2 Function approximation and learning with deep neural networks
        4.2.1 GDHP with a function approximator
        4.2.2 Stable learning of DNNs
        4.2.3 Overall algorithm
      4.3 Results and discussions
        4.3.1 Example 1: Semi-batch reactor
        4.3.2 Example 2: Diffusion-Convection-Reaction (DCR) process
    5. Convergence analysis of the model-based deep reinforcement learning for optimal control of nonlinear control-affine system
      5.1 Introduction
      5.2 Convergence proof of globalized dual heuristic programming (GDHP)
      5.3 Function approximation with deep neural networks
        5.3.1 Function approximation and gradient descent learning
        5.3.2 Forward and backward propagations of DNNs
      5.4 Convergence analysis in the deep neural networks space
        5.4.1 Lyapunov analysis of the neural network parameter errors
        5.4.2 Lyapunov analysis of the closed-loop stability
        5.4.3 Overall Lyapunov function
      5.5 Simulation results and discussions
        5.5.1 System description
        5.5.2 Algorithmic settings
        5.5.3 Control result
    6. Primal-dual differential dynamic programming for constrained dynamic optimization of continuous system
      6.1 Introduction
      6.2 Primal-dual differential dynamic programming for constrained dynamic optimization
        6.2.1 Augmented Lagrangian method
        6.2.2 Primal-dual differential dynamic programming algorithm
        6.2.3 Overall algorithm
      6.3 Results and discussions
    7. Concluding remarks
      7.1 Summary of the contributions
      7.2 Future works
    Bibliography
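    The abstract's central point, that dynamic programming returns a closed-loop feedback law where direct optimization returns only an open-loop sequence, can be illustrated on the simplest possible case, a scalar linear-quadratic problem. The backward Riccati recursion below is a generic textbook sketch, not the GDHP or DDP algorithms developed in the thesis:

    ```python
    # Scalar LQ problem: x_{t+1} = a*x_t + b*u_t, cost = sum of q*x^2 + r*u^2.
    # Backward dynamic programming yields a feedback law u_t = -K_t * x_t,
    # whereas direct optimization would return only a fixed sequence u_0..u_T.
    a, b, q, r = 1.0, 1.0, 1.0, 1.0

    P = q  # terminal value-function weight
    for _ in range(50):  # backward Riccati recursion
        K = a * b * P / (r + b * b * P)    # feedback gain at this stage
        P = q + a * a * P - a * b * P * K  # cost-to-go weight one step earlier

    # For a = b = q = r = 1 the stationary P solves P^2 = P + 1, the golden
    # ratio, and the stationary gain is K = P / (1 + P) = 1/P.
    print(P, K)  # P -> 1.618..., K -> 0.618...
    ```

    Because K multiplies the measured state at run time, the same computed policy remains valid under disturbances, which is exactly the closed-loop advantage the thesis exploits in its process-control settings.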