3 research outputs found

    Development of model free flight control system using deep deterministic policy gradient (DDPG)

    Developing a flight control system for a full six-degree-of-freedom air vehicle remains a demanding task that requires considerable time and effort to gather all the necessary data. This thesis proposes using reinforcement learning to develop a policy for an air vehicle's flight control system. The method is model-free, but it does require a set of samples for the reinforcement learning agent to learn from. A reinforcement learning method called Deep Deterministic Policy Gradient (DDPG) is applied to handle the large, continuous spaces that arise in flight control. However, applying DDPG to multiple simultaneous actions is often difficult: too many possibilities can prevent the learning agent from converging. This thesis proposes a learning strategy that shapes how the agent learns with multiple actions, and shows that the final policy can be extracted and applied directly as a flight control system.
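The abstract names the core pieces of DDPG: a deterministic actor, a critic trained on TD targets from slowly-tracking target networks, and an actor update that follows the critic's gradient with respect to the action. A minimal sketch of those updates, using linear function approximators and a made-up single transition (all dimensions, learning rates, and the transition itself are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 1
gamma, tau, lr = 0.99, 0.01, 1e-2   # discount, soft-update rate, step size

# Online and target networks (linear instead of deep, for brevity).
theta_mu = rng.normal(size=(action_dim, state_dim))   # actor weights
w_q = rng.normal(size=(state_dim + action_dim,))      # critic weights
theta_mu_t = theta_mu.copy()                          # target actor
w_q_t = w_q.copy()                                    # target critic

def mu(theta, s):                 # deterministic policy: a = theta @ s
    return theta @ s

def q(w, s, a):                   # critic: Q(s, a) = w . [s; a]
    return w @ np.concatenate([s, a])

# One sampled transition (s, a, r, s'); action includes exploration noise.
s = rng.normal(size=state_dim)
a = mu(theta_mu, s) + 0.1 * rng.normal(size=action_dim)
r, s_next = 1.0, rng.normal(size=state_dim)

# Critic update: the TD target y uses the *target* networks.
y = r + gamma * q(w_q_t, s_next, mu(theta_mu_t, s_next))
td_error = q(w_q, s, a) - y
w_q -= lr * td_error * np.concatenate([s, a])   # gradient of 0.5 * td_error^2

# Actor update via the chain rule: dQ/dtheta = (dQ/da) (da/dtheta).
dq_da = w_q[state_dim:]                         # dQ/da for the linear critic
theta_mu += lr * np.outer(dq_da, s)             # gradient ascent on Q

# Soft target updates: targets track the online networks slowly.
theta_mu_t = tau * theta_mu + (1 - tau) * theta_mu_t
w_q_t = tau * w_q + (1 - tau) * w_q_t
```

In practice the actor and critic are deep networks and transitions are sampled in minibatches from a replay buffer, but the update structure is the one shown.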

    Actor-critic algorithms for hierarchical Markov decision processes

    No full text
    We consider the problem of controlling hierarchical Markov decision processes and develop a simulation-based two-timescale actor-critic algorithm in a general framework. We also develop approximation algorithms that require less computation and satisfy a performance bound. One approximation algorithm is a three-timescale actor-critic algorithm; the other is a two-timescale algorithm that operates in two separate stages. All our algorithms recursively update randomized policies using the simultaneous perturbation stochastic approximation (SPSA) methodology. We briefly present the convergence analysis of our algorithms, then report numerical experiments on a production-planning problem in semiconductor fabs, comparing the performance of all the algorithms with policy iteration. Algorithms based on deterministic perturbations derived from Hadamard matrices are found to show the best results.
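The SPSA building block these algorithms share estimates a gradient from just two noisy function evaluations by perturbing every parameter at once with a random ±1 vector. A minimal sketch on an illustrative quadratic objective (the objective, step sizes, and iteration count are assumptions for demonstration; the paper's simulated performance measure would take the objective's place):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(theta):
    # Stand-in for a simulated performance measure; minimized at theta = 2.
    return float(np.sum((theta - 2.0) ** 2))

def spsa_step(theta, c=0.1, a=0.05):
    # Simultaneous perturbation: one random +/-1 vector perturbs ALL
    # coordinates, so only two evaluations are needed in any dimension.
    # (The abstract's best-performing variant replaces this random vector
    # with deterministic perturbations from a Hadamard matrix.)
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    g_hat = (f(theta + c * delta) - f(theta - c * delta)) / (2 * c * delta)
    return theta - a * g_hat      # stochastic gradient descent step

theta = np.zeros(4)
for _ in range(500):
    theta = spsa_step(theta)
# theta approaches the minimizer (2.0 in every coordinate)
```

The actor-critic algorithms run such updates on two (or three) timescales: the critic's value estimates are updated with larger step sizes so they track the current policy, while the SPSA-driven policy update moves on a slower timescale.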

    TOK'07 National Automatic Control Meeting (otomatik kontrol ulusal toplantısı): 5-7 September 2007, Sabancı University, Tuzla, Istanbul
