Development of a model-free flight control system using deep deterministic policy gradient (DDPG)
Developing a complete six-degree-of-freedom flight control system for an air
vehicle remains a substantial task that demands considerable time and effort to
gather all the necessary data. This thesis proposes the use of reinforcement
learning to develop a policy for the flight control system of an air vehicle.
The method is designed to be independent of a model, but it does require a set
of samples for the reinforcement learning agent to learn from.
A novel reinforcement learning method called Deep Deterministic Policy Gradient
(DDPG) is applied to address the large, continuous state and action spaces of
flight control. However, applying DDPG to multiple actions is often difficult:
too many possibilities can prevent the reinforcement learning agent's learning
process from converging.
This thesis proposes a learning strategy that shapes how the agent learns with
multiple actions. It also shows that the final policy can be extracted and
applied directly in a flight control system.
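The core idea DDPG builds on, a deterministic actor improved by following the gradient of a learned critic, can be sketched in a few lines. The snippet below is a heavily simplified illustration on a hypothetical one-dimensional toy problem, not the thesis's aircraft model: the actor and critic are linear-in-features rather than deep networks, the critic is fitted in one batch least-squares step, and DDPG's replay buffer, target networks, and exploration schedule are all omitted. The reward, features, and gain K are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 2.0  # hypothetical unknown optimal gain: the best action is a = K * s


def reward(s, a):
    # Toy one-step reward: penalize deviation from the optimal action.
    return -(a - K * s) ** 2


# --- Critic: fit Q(s, a) = theta . phi(s, a) from exploratory samples ---
s = rng.uniform(-1.0, 1.0, size=500)
a = rng.uniform(-2.0, 2.0, size=500)          # random exploratory actions
Phi = np.stack([a * a, s * a, s * s], axis=1)  # quadratic features
theta, *_ = np.linalg.lstsq(Phi, reward(s, a), rcond=None)

# --- Actor: deterministic policy mu(s) = w * s, improved by the
# deterministic policy gradient  d mu / d w  *  dQ/da |_{a = mu(s)} ---
w = 0.0
for _ in range(500):
    a_mu = w * s
    dq_da = 2.0 * theta[0] * a_mu + theta[1] * s  # gradient of Q w.r.t. action
    w += 0.05 * np.mean(dq_da * s)                # chain rule through the actor

print(round(w, 3))  # -> 2.0, i.e. the actor recovers a = K * s
```

In full DDPG the same two pieces appear, but both are neural networks, the critic is trained by temporal-difference learning from a replay buffer, and slowly updated target copies of both networks stabilize the bootstrapped targets.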
Actor-critic algorithms for hierarchical Markov decision processes
We consider the problem of control of hierarchical Markov decision processes and develop a simulation-based two-timescale actor-critic algorithm in a general framework. We also develop certain approximation algorithms that require less computation and satisfy a performance bound. One of the approximation algorithms is a three-timescale actor-critic algorithm; the other is a two-timescale algorithm that operates in two separate stages. All our algorithms recursively update randomized policies using the simultaneous perturbation stochastic approximation (SPSA) methodology. We briefly present the convergence analysis of our algorithms. We then present numerical experiments on a problem of production planning in semiconductor fabs, comparing the performance of all algorithms together with policy iteration. Algorithms using deterministic perturbations derived from Hadamard matrices are found to show the best results.
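The SPSA update that all of these algorithms rely on estimates a full gradient from only two function evaluations per step, by perturbing every parameter simultaneously with a random sign vector. Below is a generic sketch on a hypothetical quadratic objective, not the production-planning simulation from the paper; the fixed gains `a` and `c` are a simplification (SPSA normally uses decaying gain sequences), and the Bernoulli perturbations shown are the classical choice, whereas the paper's best-performing variant replaces them with deterministic perturbations derived from Hadamard matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
target = np.array([1.0, -2.0, 3.0])  # made-up optimum for the toy objective


def loss(theta):
    # Stand-in for the simulated long-run cost being minimized.
    return float(np.sum((theta - target) ** 2))


theta = np.zeros(3)
a, c = 0.1, 0.1  # fixed step size and perturbation width (simplified)
for _ in range(2000):
    delta = rng.choice([-1.0, 1.0], size=3)  # simultaneous +/-1 perturbation
    # Two evaluations give a gradient estimate for ALL coordinates at once:
    # g_i = (f(theta + c*delta) - f(theta - c*delta)) / (2 * c * delta_i)
    diff = loss(theta + c * delta) - loss(theta - c * delta)
    g = diff / (2.0 * c) / delta
    theta -= a * g

print(theta)  # close to [1, -2, 3]
```

The appeal in the hierarchical MDP setting is that the per-iteration cost is two simulations regardless of the number of policy parameters, which is what makes the recursive two- and three-timescale actor-critic schemes tractable.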