    Applications of Probabilistic Inference to Planning & Reinforcement Learning

    Optimal control is a profound and fascinating subject that regularly attracts interest from numerous scientific disciplines, including both pure and applied Mathematics, Computer Science, Artificial Intelligence, Psychology, Neuroscience and Economics. In 1960 Rudolf Kalman discovered that there exists a duality between the problems of filtering and optimal control in linear systems [84]. This is now regarded as a seminal piece of work and it has since motivated a large amount of research into the discovery of similar dualities between optimal control and statistical inference. This is especially true of recent years, in which there has been much research into recasting problems of optimal control as problems of statistical/approximate inference. Broadly speaking, this is the perspective that we take in this work, and in particular we present various applications of methods from the fields of statistical/approximate inference to optimal control, planning and Reinforcement Learning. Some of the methods would be more accurately described as originating from other fields of research, such as the dual decomposition techniques used in Chapter 5, which originate from convex optimisation. However, the original motivation for the application of these techniques was from the field of approximate inference. The study of dualities between optimal control and statistical inference has been a subject of research for over 50 years and we do not claim to encompass the entire subject. Instead, we present what we consider to be a range of interesting and novel applications from this field of research.

    A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

    Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being considered the current state of the art in the field. In this article we provide a unifying perspective of these two algorithms by showing that their step-directions in the parameter space are closely related to the search direction of an approximate Newton method. This analysis leads naturally to the consideration of this approximate Newton method as an alternative gradient-based method for Markov Decision Processes. We are able to show that the algorithm has numerous desirable properties, absent in the naive application of Newton's method, that make it a viable alternative to either Expectation Maximisation or natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties, performing strongly in comparison to both Expectation Maximisation and natural gradient ascent.
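
To make the notion of a step-direction in parameter space concrete, the sketch below compares the plain gradient with the natural gradient on a toy problem. Everything in it is an assumption made for illustration: a single-state, two-action problem with a Bernoulli policy parameterised by its logit `theta`, and hypothetical rewards `r0` and `r1`. It is not the approximate Newton method of the article.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def step_directions(theta: float, r0: float, r1: float):
    """Return (plain gradient, natural gradient) of the expected reward.

    The policy picks action 1 with probability p = sigmoid(theta), so the
    objective is J(theta) = p * r1 + (1 - p) * r0.
    """
    p = sigmoid(theta)
    grad = p * (1.0 - p) * (r1 - r0)   # dJ/dtheta
    fisher = p * (1.0 - p)             # Fisher information of the policy in theta
    return grad, grad / fisher         # natural gradient = grad / Fisher


if __name__ == "__main__":
    for theta in (-4.0, 0.0, 4.0):
        grad, nat = step_directions(theta, r0=1.0, r1=3.0)
        print(f"theta={theta:+.1f}  gradient={grad:.4f}  natural gradient={nat:.4f}")
```

In this toy setting the plain gradient shrinks as the policy saturates, while the natural gradient stays equal to the reward difference `r1 - r0`, which is one way to see why rescaling by curvature-like information changes the search direction.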

    Probabilistic inverse reinforcement learning in unknown environments

    We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Return and abort trajectory optimisation for reusable launch vehicles

    Among the future space access vehicles, the lifting body spaceplane is the most promising approach to prevent damage to both the launcher and the payload in case of loss of thrust. The glide performance of the vehicle allows recovery in both nominal and abort cases. The approach presented is used to investigate the unpowered descent paths of a sample vehicle through trajectory optimisation. The vehicle's downrange and crossrange limits are obtained for aborts in multiple flight conditions.

    Multi-objective optimisation of aircraft flight trajectories in the ATM and avionics context

    The continuous increase of air transport demand worldwide and the push for a more economically viable and environmentally sustainable aviation are driving significant evolutions of aircraft, airspace and airport systems design and operations. Although extensive research has been performed on the optimisation of aircraft trajectories and very efficient algorithms were widely adopted for the optimisation of vertical flight profiles, it is only in the last few years that higher levels of automation were proposed for integrated flight planning and re-routing functionalities of innovative Communication Navigation and Surveillance/Air Traffic Management (CNS/ATM) and Avionics (CNS+A) systems. In this context, the implementation of additional environmental targets and of multiple operational constraints introduces the need to efficiently deal with multiple objectives as part of the trajectory optimisation algorithm. This article provides a comprehensive review of Multi-Objective Trajectory Optimisation (MOTO) techniques for transport aircraft flight operations, with a special focus on the recent advances introduced in the CNS+A research context. In the first section, a brief introduction is given, together with an overview of the main international research initiatives where this topic has been studied, and the problem statement is provided. The second section introduces the mathematical formulation and the third section reviews the numerical solution techniques, including discretisation and optimisation methods for the specific problem formulated. The fourth section summarises the strategies to articulate the preferences and to select optimal trajectories when multiple conflicting objectives are introduced. The fifth section introduces a number of models defining the optimality criteria and constraints typically adopted in MOTO studies, including fuel consumption, air pollutant and noise emissions, operational costs, condensation trails, airspace and airport operations.
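
As a minimal sketch of how conflicting objectives can be handled, the example below filters a set of candidate trajectories down to the non-dominated (Pareto) set and then selects one by a weighted sum of normalised objectives. The candidate names, the two objectives (fuel burn and a noise index) and all numbers are hypothetical, chosen only for illustration; the review itself covers a much wider range of techniques.

```python
from typing import List, Tuple

# Hypothetical candidates: (name, fuel burn in kg, noise index). Both are minimised.
Candidate = Tuple[str, float, float]

CANDIDATES: List[Candidate] = [
    ("A", 5200.0, 62.0),
    ("B", 5000.0, 70.0),
    ("C", 5400.0, 60.0),
    ("D", 5300.0, 66.0),
    ("E", 5100.0, 71.0),
]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def pick_weighted(front: List[Candidate], w_fuel: float, w_noise: float) -> Candidate:
    """Select from the front by a weighted sum of objectives scaled to [0, 1]."""
    fuels = [c[1] for c in front]
    noises = [c[2] for c in front]

    def scale(value: float, lo: float, hi: float) -> float:
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    def score(c: Candidate) -> float:
        return (w_fuel * scale(c[1], min(fuels), max(fuels))
                + w_noise * scale(c[2], min(noises), max(noises)))

    return min(front, key=score)


if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    print("Pareto front:", [c[0] for c in front])
    print("Equal weights:", pick_weighted(front, 0.5, 0.5)[0])
    print("Fuel-heavy weights:", pick_weighted(front, 0.9, 0.1)[0])
```

Computing the front first and applying weights afterwards corresponds to articulating preferences after the optimisation; fixing the weights in advance and optimising a single scalar cost would be the alternative ordering.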

    Pilot3 D2.1 - Trade-off report on multi criteria decision making techniques

    This deliverable describes the decision making approach that will be followed in Pilot3. It presents a domain-driven analysis of the characteristics of the Pilot3 objective function and optimisation framework. This has been done considering inputs from deliverable D1.1 - Technical Resources and Problem definition, from interaction with the Topic Manager, but most importantly from a dedicated Advisory Board workshop and follow-up consultation. The Advisory Board is formed by relevant stakeholders including airlines, flight operation experts, pilots, and other relevant ATM experts. A review of the different multi-criteria decision making techniques available in the literature is presented. Then, considering the domain-driven characteristics of Pilot3 and inputs on how the tool could be used by airlines and crew, the most suitable methods for multi-criteria optimisation are selected for each of the phases of the optimisation framework.

    Applications of the homotopy analysis method to optimal control problems

    Traditionally, trajectory optimization for aerospace applications has been performed using either direct or indirect methods. Indirect methods produce highly accurate solutions but suffer from a small convergence region, requiring initial guesses close to the optimal solution. In the past two decades, a new series of analytical approximation methods has been used for solving systems of differential equations and boundary value problems. The Homotopy Analysis Method (HAM) is one such method, which has been used to solve typical boundary value problems in finance, science, and engineering. In this investigation, a methodology is created to solve indirect trajectory optimization problems using the Homotopy Analysis Method. Use of the auxiliary convergence control parameter to widen the convergence region and increase the rate of convergence has been demonstrated on multiple optimal control problems. The guaranteed convergence and the ease of selecting the initial guess for trajectory optimization problems make the method highly significant. It has been demonstrated that initial guesses for the optimal control problem can be generated using a simple approach based on only the initial boundary conditions. The approach has been demonstrated on Zermelo's problem and two cases of a 2D ascent problem. It has been established that for free final-time boundary value problems, finding the convergence region is much harder than for fixed final-time cases. To validate the approach, results are compared with those obtained using MATLAB's bvp4c function. A number of new challenges discovered during the process are listed.
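
To illustrate what an indirect method solves, the sketch below applies single shooting to a toy optimal control problem. This is not the Homotopy Analysis Method and not one of the problems treated in the work; the problem, the function names and the solver settings are all assumptions chosen for illustration. The toy problem is: minimise the integral of (x^2 + u^2)/2 over t in [0, 1] with x' = u, x(0) = 1 and x(1) free. The necessary conditions give u = -lam, x' = -lam, lam' = -x and lam(1) = 0, so the unknown is the initial costate lam(0), whose exact value is tanh(1).

```python
import math


def rhs(x: float, lam: float):
    """State and costate dynamics from the necessary conditions."""
    return -lam, -x


def terminal_costate(lam0: float, steps: int = 200) -> float:
    """Integrate from t=0 to t=1 with classical RK4 and return lam(1)."""
    h = 1.0 / steps
    x, lam = 1.0, lam0
    for _ in range(steps):
        k1 = rhs(x, lam)
        k2 = rhs(x + 0.5 * h * k1[0], lam + 0.5 * h * k1[1])
        k3 = rhs(x + 0.5 * h * k2[0], lam + 0.5 * h * k2[1])
        k4 = rhs(x + h * k3[0], lam + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        lam += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return lam


def shoot(guess_a: float = 0.0, guess_b: float = 1.0,
          tol: float = 1e-12, max_iter: int = 20) -> float:
    """Secant iteration on lam(0) so that the terminal condition lam(1) = 0 holds."""
    fa, fb = terminal_costate(guess_a), terminal_costate(guess_b)
    for _ in range(max_iter):
        if abs(fb) < tol or fb == fa:
            break
        # Secant update; the right-hand side uses the values from before the update.
        guess_a, guess_b, fa = (guess_b,
                                guess_b - fb * (guess_b - guess_a) / (fb - fa),
                                fb)
        fb = terminal_costate(guess_b)
    return guess_b


if __name__ == "__main__":
    lam0 = shoot()
    print(f"shooting: lam(0) = {lam0:.8f}, exact tanh(1) = {math.tanh(1.0):.8f}")
```

This toy problem is linear, so the secant iteration succeeds from almost any pair of guesses. For nonlinear trajectory problems the same iteration can fail unless the guessed costates are close to the solution, which is the small convergence region described above.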