Applications of Probabilistic Inference to Planning & Reinforcement Learning
Optimal control is a profound and fascinating subject that regularly attracts interest from numerous scientific disciplines, including both pure and applied Mathematics, Computer Science, Artificial Intelligence, Psychology, Neuroscience and Economics. In 1960 Rudolf Kalman discovered that there exists a duality between the problems of filtering and optimal control in linear systems [84]. This is now regarded as a seminal piece of work and it has since motivated a large amount of research into the discovery of similar dualities between optimal control and statistical inference. This is especially true of recent years, in which there has been much research into recasting problems of optimal control as problems of statistical/approximate inference. Broadly speaking, this is the perspective that we take in this work, and in particular we present various applications of methods from the fields of statistical/approximate inference to optimal control, planning and Reinforcement Learning. Some of the methods would be more accurately described as originating from other fields of research, such as the dual decomposition techniques used in chapter 5, which originate from convex optimisation. However, the original motivation for the application of these techniques was from the field of approximate inference. The study of dualities between optimal control and statistical inference has been a subject of research for over 50 years and we do not claim to encompass the entire subject. Instead, we present what we consider to be a range of interesting and novel applications from this field of research.
A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being considered the current state of the art in the field. In this article we provide a unifying perspective of these two algorithms by showing that their step-directions in the parameter space are closely related to the search direction of an approximate Newton method. This analysis leads naturally to the consideration of this approximate Newton method as an alternative gradient-based method for Markov Decision Processes. We are able to show that the algorithm has numerous desirable properties, absent in the naive application of Newton's method, that make it a viable alternative to either Expectation Maximisation or natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties, performing strongly in comparison to both Expectation Maximisation and natural gradient ascent.
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in
unknown stochastic Markov environments or games. Our aim is to estimate agent
preferences in order to construct improved policies for the same task that the
agents are trying to solve. To do so, we extend previous probabilistic
approaches for inverse reinforcement learning in known MDPs to the case of
unknown dynamics or opponents. We do this by deriving two simplified
probabilistic models of the demonstrator's policy and utility. For
tractability, we use maximum a posteriori estimation rather than full Bayesian
inference. Under a flat prior, this results in a convex optimisation problem.
We find that the resulting algorithms are highly competitive against a variety
of other methods for inverse reinforcement learning that do have knowledge of
the dynamics.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013)
Return and abort trajectory optimisation for reusable launch vehicles
Among future space access vehicles, the lifting body spaceplane is the most promising approach to prevent damage to both the launcher and the payload in case of loss of thrust. The glide performance of the vehicle allows recovery in both nominal and abort cases. The approach presented is used to investigate the unpowered descent paths of a sample vehicle through trajectory optimisation. The vehicle's downrange and crossrange limits are obtained for aborts in multiple flight conditions.
Multi-objective optimisation of aircraft flight trajectories in the ATM and avionics context
The continuous increase of air transport demand worldwide and the push for a more economically viable and environmentally sustainable aviation are driving significant evolutions of aircraft, airspace and airport systems design and operations. Although extensive research has been performed on the optimisation of aircraft trajectories and very efficient algorithms have been widely adopted for the optimisation of vertical flight profiles, it is only in the last few years that higher levels of automation have been proposed for integrated flight planning and re-routing functionalities of innovative Communication Navigation and Surveillance/Air Traffic Management (CNS/ATM) and Avionics (CNS+A) systems. In this context, the implementation of additional environmental targets and of multiple operational constraints introduces the need to efficiently deal with multiple objectives as part of the trajectory optimisation algorithm. This article provides a comprehensive review of Multi-Objective Trajectory Optimisation (MOTO) techniques for transport aircraft flight operations, with a special focus on the recent advances introduced in the CNS+A research context. In the first section, a brief introduction is given, together with an overview of the main international research initiatives where this topic has been studied, and the problem statement is provided. The second section introduces the mathematical formulation and the third section reviews the numerical solution techniques, including discretisation and optimisation methods for the specific problem formulated. The fourth section summarises the strategies to articulate the preferences and to select optimal trajectories when multiple conflicting objectives are introduced. The fifth section introduces a number of models defining the optimality criteria and constraints typically adopted in MOTO studies, including fuel consumption, air pollutant and noise emissions, operational costs, condensation trails, airspace and airport operations.
Analysis of Flight Variability: a Systematic Approach
In movement data analysis, there exists a problem of comparing multiple trajectories of moving objects to common or distinct reference trajectories. We introduce a general conceptual framework for comparative analysis of trajectories and an analytical procedure, which consists of (1) finding corresponding points in pairs of trajectories, (2) computing pairwise difference measures, and (3) interactive visual analysis of the distributions of the differences with respect to space, time, set of moving objects, trajectory structures, and spatio-temporal context. We propose a combination of visualisation, interaction, and data transformation techniques supporting the analysis and demonstrate the use of our approach for solving a challenging problem from the aviation domain.
Pilot3 D2.1 - Trade-off report on multi criteria decision making techniques
This deliverable describes the decision making approach that will be followed in Pilot3.
It presents a domain-driven analysis of the characteristics of the Pilot3 objective function and optimisation framework. This has been done considering inputs from deliverable D1.1 - Technical Resources and Problem definition, from interaction with the Topic Manager, but most importantly from a dedicated Advisory Board workshop and follow-up consultation. The Advisory Board is formed by relevant stakeholders including airlines, flight operation experts, pilots, and other relevant ATM experts.
A review of the different multi-criteria decision making techniques available in the literature is presented. Considering the domain-driven characteristics of Pilot3 and inputs on how the tool could be used by airlines and crew, the most suitable methods for multi-criteria optimisation are then selected for each of the phases of the optimisation framework.
Applications of the homotopy analysis method to optimal control problems
Traditionally, trajectory optimization for aerospace applications has been performed using either direct or indirect methods. Indirect methods produce highly accurate solutions but suffer from a small convergence region, requiring initial guesses close to the optimal solution. In the past two decades, a new series of analytical approximation methods has been used for solving systems of differential equations and boundary value problems.
The Homotopy Analysis Method (HAM) is one such method, which has been used to solve typical boundary value problems in finance, science, and engineering. In this investigation, a methodology is created to solve indirect trajectory optimization problems using the Homotopy Analysis Method. Use of the auxiliary convergence control parameter to widen the convergence region and increase the rate of convergence has been demonstrated on multiple optimal control problems. The guaranteed convergence and the ease of selecting the initial guess for trajectory optimization problems make the method highly significant. It has been demonstrated that initial guesses for the optimal control problem can be generated using a simple approach based on only the initial boundary conditions. The approach has been demonstrated on Zermelo's problem and two cases of a 2D ascent problem. It has been established that for free final-time boundary value problems, finding the convergence region is much harder than for fixed final-time cases. To validate the approach, results are compared with those obtained using MATLAB's bvp4c function. A number of new challenges are discovered and listed during the process.