Development of a model-free flight control system using deep deterministic policy gradient (DDPG)
Developing a complete six-degree-of-freedom flight control system for an air
vehicle remains a substantial task that demands considerable time and effort to
gather all the necessary data. This thesis proposes the use of reinforcement
learning to develop a policy for the flight control system of an air vehicle.
The method is designed to be independent of a model, but it does require a set
of samples for the reinforcement learning agent to learn from.
A novel reinforcement learning method called Deep Deterministic Policy Gradient
(DDPG) is applied to address the large, continuous state and action spaces of
flight control. However, applying DDPG to multiple actions is often difficult:
too many possibilities can prevent the reinforcement learning agent's learning
process from converging.
This thesis proposes a learning strategy that shapes how the agent learns with
multiple actions. It also shows that the final policy can be extracted and
applied directly in a flight control system.
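The core idea DDPG builds on, a deterministic actor improved by following the gradient of a learned critic, can be sketched in a few lines. The snippet below is a heavily simplified illustration on a hypothetical one-dimensional toy problem, not the thesis's aircraft model: the actor and critic are linear-in-features rather than deep networks, the critic is fitted in one batch least-squares step, and DDPG's replay buffer, target networks, and exploration schedule are all omitted. The reward, features, and gain K are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2.0  # hypothetical unknown optimal gain: the best action is a = K * s


def reward(s, a):
    # Toy one-step reward: penalize deviation from the optimal action.
    return -(a - K * s) ** 2


# --- Critic: fit Q(s, a) = theta . phi(s, a) from exploratory samples ---
s = rng.uniform(-1.0, 1.0, size=500)
a = rng.uniform(-2.0, 2.0, size=500)          # random exploratory actions
Phi = np.stack([a * a, s * a, s * s], axis=1)  # quadratic features
theta, *_ = np.linalg.lstsq(Phi, reward(s, a), rcond=None)

# --- Actor: deterministic policy mu(s) = w * s, improved by the
# deterministic policy gradient  d mu / d w  *  dQ/da |_{a = mu(s)} ---
w = 0.0
for _ in range(500):
    a_mu = w * s
    dq_da = 2.0 * theta[0] * a_mu + theta[1] * s  # gradient of Q w.r.t. action
    w += 0.05 * np.mean(dq_da * s)                # chain rule through the actor

print(round(w, 3))  # -> 2.0, i.e. the actor recovers a = K * s
```

In full DDPG the same two pieces appear, but both are neural networks, the critic is trained by temporal-difference learning from a replay buffer, and slowly updated target copies of both networks stabilize the bootstrapped targets.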
Actor-critic algorithms for hierarchical Markov decision processes
We consider the problem of control of hierarchical Markov decision processes and develop a simulation-based two-timescale actor-critic algorithm in a general framework. We also develop certain approximation algorithms that require less computation and satisfy a performance bound. One of the approximation algorithms is a three-timescale actor-critic algorithm; the other is a two-timescale algorithm that operates in two separate stages. All our algorithms recursively update randomized policies using the simultaneous perturbation stochastic approximation (SPSA) methodology. We briefly present the convergence analysis of our algorithms. We then present numerical experiments on a problem of production planning in semiconductor fabs, comparing the performance of all algorithms together with policy iteration. Algorithms using deterministic perturbations derived from Hadamard matrices are found to show the best results.
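The SPSA update that all of these algorithms rely on estimates a full gradient from only two function evaluations per step, by perturbing every parameter simultaneously with a random sign vector. Below is a generic sketch on a hypothetical quadratic objective, not the production-planning simulation from the paper; the fixed gains `a` and `c` are a simplification (SPSA normally uses decaying gain sequences), and the Bernoulli perturbations shown are the classical choice, whereas the paper's best-performing variant replaces them with deterministic perturbations derived from Hadamard matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
target = np.array([1.0, -2.0, 3.0])  # made-up optimum for the toy objective


def loss(theta):
    # Stand-in for the simulated long-run cost being minimized.
    return float(np.sum((theta - target) ** 2))


theta = np.zeros(3)
a, c = 0.1, 0.1  # fixed step size and perturbation width (simplified)
for _ in range(2000):
    delta = rng.choice([-1.0, 1.0], size=3)  # simultaneous +/-1 perturbation
    # Two evaluations give a gradient estimate for ALL coordinates at once:
    # g_i = (f(theta + c*delta) - f(theta - c*delta)) / (2 * c * delta_i)
    diff = loss(theta + c * delta) - loss(theta - c * delta)
    g = diff / (2.0 * c) / delta
    theta -= a * g

print(theta)  # close to [1, -2, 3]
```

The appeal in the hierarchical MDP setting is that the per-iteration cost is two simulations regardless of the number of policy parameters, which is what makes the recursive two- and three-timescale actor-critic schemes tractable.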