
967 research outputs found

Applications of Probabilistic Inference to Planning & Reinforcement Learning

Author: Furmston TJ
Publication venue: UCL (University College London)
Publication date: 28/04/2013
Field of study

Optimal control is a profound and fascinating subject that regularly attracts interest from numerous scientific disciplines, including both pure and applied Mathematics, Computer Science, Artificial Intelligence, Psychology, Neuroscience and Economics. In 1960 Rudolf Kalman discovered that there exists a duality between the problems of filtering and optimal control in linear systems [84]. This is now regarded as a seminal piece of work and it has since motivated a large amount of research into the discovery of similar dualities between optimal control and statistical inference. This is especially true of recent years, in which there has been much research into recasting problems of optimal control as problems of statistical/approximate inference. Broadly speaking, this is the perspective that we take in this work, and in particular we present various applications of methods from the fields of statistical/approximate inference to optimal control, planning and Reinforcement Learning. Some of the methods would be more accurately described as originating from other fields of research, such as the dual decomposition techniques used in chapter (5), which originate from convex optimisation. However, the original motivation for the application of these techniques was from the field of approximate inference. The study of dualities between optimal control and statistical inference has been a subject of research for over 50 years and we do not claim to encompass the entire subject. Instead, we present what we consider to be a range of interesting and novel applications from this field of research.

UCL Discovery

A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

Author: Barber D
Furmston T
Publication venue: Neural Information Processing Systems Foundation
Publication date: 01/01/2012
Field of study

Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being considered the current state of the art in the field. In this article we provide a unifying perspective of these two algorithms by showing that their step-directions in the parameter space are closely related to the search direction of an approximate Newton method. This analysis leads naturally to the consideration of this approximate Newton method as an alternative gradient-based method for Markov Decision Processes. We are able to show that the algorithm has numerous desirable properties, absent in the naive application of Newton's method, that make it a viable alternative to either Expectation Maximisation or natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties, performing strongly in comparison to both Expectation Maximisation and natural gradient ascent.

CiteSeerX

UCL Discovery

Probabilistic inverse reinforcement learning in unknown environments

Author: Dimitrakakis Christos
Tossou Aristide
Publication venue
Publication date: 01/01/2013
Field of study

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013).

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Chalmers Research

Chalmers Publication Library

Return and abort trajectory optimisation for reusable launch vehicles

Author: Hempsell M.
Toso F.
Young D. A.
Publication venue: American Institute of Aeronautics and Astronautics (AIAA)
Publication date: 02/03/2017
Field of study

Among future space access vehicles, the lifting body spaceplane is the most promising approach to prevent damage to both the launcher and the payload in case of loss of thrust. The glide performance of the vehicle allows recovery in both nominal and abort cases. The approach presented is used to investigate the unpowered descent paths of a sample vehicle through trajectory optimisation. The vehicle's downrange and crossrange limits are obtained for aborts in multiple flight conditions.

Crossref

University of Strathclyde Institutional Repository

Multi-objective optimisation of aircraft flight trajectories in the ATM and avionics context

Author: Gardi A
Ramasamy S
Sabatini R
Publication venue: Elsevier (United Kingdom)
Publication date: 01/01/2016
Field of study

The continuous increase of air transport demand worldwide and the push for a more economically viable and environmentally sustainable aviation are driving significant evolutions of aircraft, airspace and airport systems design and operations. Although extensive research has been performed on the optimisation of aircraft trajectories and very efficient algorithms were widely adopted for the optimisation of vertical flight profiles, it is only in the last few years that higher levels of automation were proposed for integrated flight planning and re-routing functionalities of innovative Communication Navigation and Surveillance/Air Traffic Management (CNS/ATM) and Avionics (CNS+A) systems. In this context, the implementation of additional environmental targets and of multiple operational constraints introduces the need to efficiently deal with multiple objectives as part of the trajectory optimisation algorithm. This article provides a comprehensive review of Multi-Objective Trajectory Optimisation (MOTO) techniques for transport aircraft flight operations, with a special focus on the recent advances introduced in the CNS+A research context. In the first section, a brief introduction is given, together with an overview of the main international research initiatives where this topic has been studied, and the problem statement is provided. The second section introduces the mathematical formulation and the third section reviews the numerical solution techniques, including discretisation and optimisation methods for the specific problem formulated. The fourth section summarises the strategies to articulate the preferences and to select optimal trajectories when multiple conflicting objectives are introduced. The fifth section introduces a number of models defining the optimality criteria and constraints typically adopted in MOTO studies, including fuel consumption, air pollutant and noise emissions, operational costs, condensation trails, airspace and airport operations

RMIT Research Repository


Analysis of Flight Variability: a Systematic Approach

Author: Andrienko G.
Andrienko N.
Garcia J. M. C.
Scarlatti D.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019
Field of study

In movement data analysis, there exists a problem of comparing multiple trajectories of moving objects to common or distinct reference trajectories. We introduce a general conceptual framework for comparative analysis of trajectories and an analytical procedure, which consists of (1) finding corresponding points in pairs of trajectories, (2) computation of pairwise difference measures, and (3) interactive visual analysis of the distributions of the differences with respect to space, time, set of moving objects, trajectory structures, and spatio-temporal context. We propose a combination of visualisation, interaction, and data transformation techniques supporting the analysis and demonstrate the use of our approach for solving a challenging problem from the aviation domain
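The first two steps of the procedure described in this abstract can be illustrated with a minimal Python sketch. It assumes nearest-neighbour matching in the plane as the correspondence rule and Euclidean distance as the difference measure; the abstract does not specify either choice, and the function names and sample coordinates below are hypothetical.

```python
import math

def match_points(traj, ref):
    """Step (1): for each point in traj, index of the nearest point in ref.

    Nearest-neighbour matching is an assumption made for illustration only.
    """
    return [min(range(len(ref)), key=lambda j: math.dist(p, ref[j])) for p in traj]

def pairwise_differences(traj, ref):
    """Step (2): Euclidean distance between each point and its matched reference point."""
    matches = match_points(traj, ref)
    return [math.dist(p, ref[j]) for p, j in zip(traj, matches)]

# Hypothetical 2D data: a straight reference path and a flown path that deviates from it.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
flown = [(0.0, 0.5), (1.1, 0.0), (2.0, -1.0)]

diffs = pairwise_differences(flown, reference)
print(diffs)
```

The resulting list of differences is the kind of input that step (3), the interactive visual analysis, would then explore with respect to space, time and context.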

City Research Online

Crossref

Fraunhofer-ePrints

Pilot3 D2.1 - Trade-off report on multi criteria decision making techniques

Author: de Villardi de Montlaur A.
Delgado L.
Gurtner G.
Kuljanin J.
Prats X.
Publication venue
Publication date: 01/01/2020
Field of study

This deliverable describes the decision making approach that will be followed in Pilot3. It presents a domain-driven analysis of the characteristics of the Pilot3 objective function and optimisation framework. This has been done considering inputs from deliverable D1.1 - Technical Resources and Problem definition, from interaction with the Topic Manager, but most importantly from a dedicated Advisory Board workshop and follow-up consultation. The Advisory Board is formed by relevant stakeholders including airlines, flight operation experts, pilots, and other relevant ATM experts. A review of the different multi-criteria decision making techniques available in the literature is presented, considering the domain-driven characteristics of Pilot3 and inputs on how the tool could be used by airlines and crew. Then, the most suitable methods for multi-criteria optimisation are selected for each of the phases of the optimisation framework.

WestminsterResearch

Guidance and Control Elements for Improved Access to Space: from Planetary Landers to Reusable Launchers

Author: Simplicio Pedro V M
Publication venue
Publication date: 28/11/2019
Field of study

Explore Bristol Research

Applications of the homotopy analysis method to optimal control problems

Author: Singh Shubham
Publication venue: Purdue University (bepress)
Publication date: 01/01/2016
Field of study

Traditionally, trajectory optimization for aerospace applications has been performed using either direct or indirect methods. Indirect methods produce highly accurate solutions but suffer from a small convergence region, requiring initial guesses close to the optimal solution. In the past two decades, a new series of analytical approximation methods have been used for solving systems of differential equations and boundary value problems. The Homotopy Analysis Method (HAM) is one such method which has been used to solve typical boundary value problems in finance, science, and engineering. In this investigation, a methodology is created to solve indirect trajectory optimization problems using the Homotopy Analysis Method. Use of the auxiliary convergence control parameter to widen the convergence region and increase the rate of convergence has been demonstrated on multiple optimal control problems. The guaranteed convergence and the ease of selecting the initial guess for trajectory optimization problems make the method of high significance. It has been demonstrated that initial guesses for the optimal control problem can be generated using a simple approach based on only the initial boundary conditions. The approach has been demonstrated on Zermelo's problem and two cases of a 2D ascent problem. It has been established that for free final-time boundary value problems, finding the convergence region is much harder than for fixed final-time cases. To validate the approach, results are compared with those obtained using MATLAB's bvp4c function. A number of new challenges are discovered and listed during the process.

Purdue E-Pubs