
    Adaptive dynamic programming with eligibility traces and complexity reduction of high-dimensional systems

    This dissertation investigates the application of a variety of computational intelligence techniques, particularly clustering and adaptive dynamic programming (ADP) designs, especially heuristic dynamic programming (HDP) and dual heuristic programming (DHP). Moreover, one-step temporal-difference (TD(0)) and n-step TD (TD(λ)) learning, together with their gradients, are used as learning algorithms to train and adapt online the families of ADP. The dissertation is organized into seven papers. The first paper demonstrates the robustness of model order reduction (MOR) for simulating complex dynamical systems. Agglomerative hierarchical clustering based on performance evaluation is introduced for MOR. This method computes the reduced-order denominator of the transfer function by clustering system poles in a hierarchical dendrogram. Several numerical examples of reduction techniques are taken from the literature to compare with our work. In the second paper, HDP is combined with the Dyna algorithm for path planning. The third paper uses DHP with an eligibility trace parameter (λ) to track a reference trajectory under uncertainties for a nonholonomic mobile robot, using a first-order Sugeno fuzzy neural network structure for the critic and actor networks. In the fourth and fifth papers, a stability analysis for a model-free action-dependent HDP(λ) is demonstrated with batch and online learning implementations, respectively. The sixth work combines two different gradient prediction levels of critic networks and provides convergence proofs. The seventh paper develops two hybrid recurrent fuzzy neural network structures for both critic and actor networks. They use a novel n-step gradient temporal-difference (gradient of TD(λ)) of an advanced ADP algorithm called value-gradient learning (VGL(λ)), and convergence proofs are given. Furthermore, the seventh paper is the first to combine the single network adaptive critic with VGL(λ). --Abstract, page iv
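
    The eligibility-trace mechanism behind TD(λ) mentioned in the abstract above can be illustrated with a minimal tabular sketch. This is not the dissertation's implementation: the random-walk task, the function name `td_lambda_random_walk`, and all parameter values are assumptions chosen only for illustration, and the ADP designs in the abstract use neural-network critics rather than a table.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=200, alpha=0.1,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Toy task (an assumption for illustration): a symmetric random walk over
    states 0..n_states-1. Leaving on the left ends the episode with reward 0,
    leaving on the right ends it with reward 1.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states        # eligibility traces reset per episode
        state = n_states // 2            # start in the middle state
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # One-step TD error for the transition just observed.
            delta = reward + gamma * next_value - values[state]
            # Mark the current state as eligible for credit.
            traces[state] += 1.0
            # Every state is updated in proportion to its trace, then decays.
            for s in range(n_states):
                values[s] += alpha * delta * traces[s]
                traces[s] *= gamma * lam
            if done:
                break
            state = next_state
    return values
```

    Setting `lam=0.0` reduces the update to one-step TD(0), because each trace is zeroed immediately after use. Larger values of `lam` spread each TD error over states visited earlier in the episode.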

    A brief review of neural networks based learning and control and their applications for robots

    As an imitation of biological nervous systems, neural networks (NNs), which are characterized by a powerful learning ability, have been employed in a wide range of applications, such as control of complex nonlinear systems, optimization, system identification, and pattern recognition. This article gives a brief review of state-of-the-art NNs for complex nonlinear systems. Recent progress in NNs, in both theoretical developments and practical applications, is investigated and surveyed. Specifically, NN-based robot learning and control applications are reviewed, including NN-based robot manipulator control, NN-based human-robot interaction, and NN-based behavior recognition and generation.

    Model-based reinforcement learning for process control and optimization

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, School of Chemical and Biological Engineering, 2020. 2. Jong Min Lee.

    Sequential decision making is a crucial problem in plant-wide process optimization. The dominant numerical method, forward-in-time direct optimization, is limited to open-loop solutions and has difficulty accounting for uncertainty. Dynamic programming overcomes these limitations, but the associated functional optimization, which is posed in an infinite-dimensional function space rather than a finite-dimensional vector space, suffers from the curse of dimensionality. Reinforcement learning (RL), a sample-based approach to approximating dynamic programming, can resolve this issue and is investigated throughout this thesis. Methods that account for the system model explicitly are of particular interest. Model-based RL is applied to three representative sequential decision making problems: scheduling, regulatory control, and supervisory optimization. The problems are formulated as a partially observable Markov decision process (POMDP), a control-affine state space model, and a general state space model, and the associated model-based RL algorithms are point based value iteration (PBVI), globalized dual heuristic programming (GDHP), and differential dynamic programming (DDP), respectively.

    The contributions for each problem are as follows. First, for the scheduling problem, we developed a closed-loop feedback scheme, which direct optimization cannot provide and which highlights the strength of RL. Second, the regulatory control problem is tackled with a function approximation method that relaxes the functional optimization to an optimization over a finite-dimensional vector space. Deep neural networks (DNNs) are used as the approximator, and their advantages as well as a convergence analysis are presented in the thesis. Finally, for the supervisory optimization problem, we developed a novel constrained RL framework that uses the primal-dual DDP method. Various illustrative process examples are presented to validate the developed model-based RL algorithms and to support the thesis statement that dynamic programming can be considered a complementary method to direct optimization.

    Table of contents:
    1. Introduction
        1.1 Motivation and previous work
        1.2 Statement of contributions
        1.3 Outline of the thesis
    2. Background and preliminaries
        2.1 Optimization problem formulation and the principle of optimality
            2.1.1 Markov decision process
            2.1.2 State space model
        2.2 Overview of the developed RL algorithms
            2.2.1 Point based value iteration
            2.2.2 Globalized dual heuristic programming
            2.2.3 Differential dynamic programming
    3. A POMDP framework for integrated scheduling of infrastructure maintenance and inspection
        3.1 Introduction
        3.2 POMDP solution algorithm
            3.2.1 General point based value iteration
            3.2.2 GapMin algorithm
            3.2.3 Receding horizon POMDP
        3.3 Problem formulation for infrastructure scheduling
            3.3.1 State
            3.3.2 Maintenance and inspection actions
            3.3.3 State transition function
            3.3.4 Cost function
            3.3.5 Observation set and observation function
            3.3.6 State augmentation
        3.4 Illustrative example and simulation result
            3.4.1 Structural point for the analysis of a high dimensional belief space
            3.4.2 Infinite horizon policy under the natural deterioration process
            3.4.3 Receding horizon POMDP
            3.4.4 Validation of POMDP policy via Monte Carlo simulation
    4. A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system
        4.1 Introduction
        4.2 Function approximation and learning with deep neural networks
            4.2.1 GDHP with a function approximator
            4.2.2 Stable learning of DNNs
            4.2.3 Overall algorithm
        4.3 Results and discussions
            4.3.1 Example 1: Semi-batch reactor
            4.3.2 Example 2: Diffusion-Convection-Reaction (DCR) process
    5. Convergence analysis of the model-based deep reinforcement learning for optimal control of nonlinear control-affine system
        5.1 Introduction
        5.2 Convergence proof of globalized dual heuristic programming (GDHP)
        5.3 Function approximation with deep neural networks
            5.3.1 Function approximation and gradient descent learning
            5.3.2 Forward and backward propagations of DNNs
        5.4 Convergence analysis in the deep neural networks space
            5.4.1 Lyapunov analysis of the neural network parameter errors
            5.4.2 Lyapunov analysis of the closed-loop stability
            5.4.3 Overall Lyapunov function
        5.5 Simulation results and discussions
            5.5.1 System description
            5.5.2 Algorithmic settings
            5.5.3 Control result
    6. Primal-dual differential dynamic programming for constrained dynamic optimization of continuous system
        6.1 Introduction
        6.2 Primal-dual differential dynamic programming for constrained dynamic optimization
            6.2.1 Augmented Lagrangian method
            6.2.2 Primal-dual differential dynamic programming algorithm
            6.2.3 Overall algorithm
        6.3 Results and discussions
    7. Concluding remarks
        7.1 Summary of the contributions
        7.2 Future works
    Bibliography

    Docto

    Discrete Globalised Dual Heuristic Dynamic Programming in Control of the Two-Wheeled Mobile Robot

    Network-based control systems have emerged over the past few years as a technology for the control of nonlinear systems. This paper focuses on the implementation of an approximate dynamic programming algorithm in the network-based tracking control system of the two-wheeled mobile robot Pioneer 2-DX. The proposed discrete tracking control system consists of the globalised dual heuristic dynamic programming algorithm, a PD controller, a supervisory term, and an additional control signal. The structure of the supervisory term derives from a stability analysis based on the Lyapunov stability theorem. The globalised dual heuristic dynamic programming algorithm consists of two structures, the actor and the critic, both realised as neural networks. The actor generates the suboptimal control law, while the critic evaluates the realised control strategy by approximating the value function from Bellman's equation. The presented discrete tracking control system works online: the neural network weights are adapted in every iteration step, and no preliminary learning procedure for the neural networks is required. The performance of the proposed control system was verified by a series of computer simulations and experiments using the wheeled mobile robot Pioneer 2-DX.
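
    For orientation, the role of the critic described above can be written in generic discrete-time form. This is a textbook-style sketch, not the paper's formulation: the symbols $x_k$ (state), $u_k$ (control), $L$ (local cost), $\gamma$ (discount factor), and $w$ (critic weights) are assumed notation.

```latex
% Bellman's equation for the optimal value function (assumed notation)
V^{*}(x_k) = \min_{u_k} \left[ L(x_k, u_k) + \gamma \, V^{*}(x_{k+1}) \right]

% Temporal-difference error that a critic \hat{V}(\cdot\,; w) can be trained to reduce
e_k = L(x_k, u_k) + \gamma \, \hat{V}(x_{k+1}; w) - \hat{V}(x_k; w)
```

    In globalised dual heuristic programming, the critic is generally trained on the value function and on its derivative with respect to the state, combining the HDP and DHP error terms. The exact weighting used in the paper is not given in the abstract.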