Methods for evaluating Decision Problems with Limited Information
Limited Memory Influence Diagrams (LIMIDs) are general models of decision problems for representing limited-memory policies (Lauritzen and Nilsson, 2001). LIMIDs can be evaluated by Single Policy Updating, which produces a local-maximum strategy in which no single policy modification can increase the expected utility. This paper examines the quality of the obtained local-maximum strategy and proposes three different methods for evaluating LIMIDs. The first algorithm, Temporal Policy Updating, resembles Single Policy Updating. The second algorithm, Greedy Search, successively updates the policy that gives the greatest improvement in expected utility. The final algorithm, Simulated Annealing, differs from the two preceding ones by allowing the search to take occasional downhill steps to escape a local maximum. A careful comparison of the algorithms is provided, both in terms of the quality of the strategies obtained and in terms of the implementation of the algorithms, including some considerations of computational complexity.
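The annealing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the policy representation (a dict of decision nodes to actions), the `expected_utility` black box, and all parameter values are assumptions made for illustration. It shows the key contrast with Single Policy Updating: single-policy modifications are still the proposal move, but downhill moves are sometimes accepted so the search can escape a local maximum.

```python
import math
import random

def anneal_policies(policies, actions, expected_utility,
                    T0=1.0, cooling=0.95, steps=200, seed=0):
    """Simulated-annealing search over local policies (illustrative sketch).

    policies: dict mapping each decision node to its current action
    actions: dict mapping each decision node to its candidate actions
    expected_utility: callable scoring a full policy dict (assumed black box)
    """
    rng = random.Random(seed)
    current = dict(policies)
    best = dict(current)
    cur_u = best_u = expected_utility(current)
    T = T0
    for _ in range(steps):
        # Propose a single-policy modification, as in Single Policy Updating
        node = rng.choice(list(current))
        proposal = dict(current)
        proposal[node] = rng.choice(actions[node])
        new_u = expected_utility(proposal)
        # Accept uphill moves always; downhill moves with Boltzmann probability
        if new_u >= cur_u or rng.random() < math.exp((new_u - cur_u) / T):
            current, cur_u = proposal, new_u
            if cur_u > best_u:
                best, best_u = dict(current), cur_u
        T *= cooling  # geometric cooling schedule
    return best, best_u
```

Greedy Search would replace the random proposal with an exhaustive scan for the single modification yielding the largest utility gain; Temporal Policy Updating would fix the order in which decision nodes are visited.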
Learning to Race through Coordinate Descent Bayesian Optimisation
In the automation of many kinds of processes, the observable outcome can
often be described as the combined effect of an entire sequence of actions, or
controls, applied throughout its execution. In these cases, strategies to
optimise control policies for individual stages of the process might not be
applicable, and instead the whole policy might have to be optimised at once. On
the other hand, the cost to evaluate the policy's performance might also be
high, making it desirable that a solution be found with as few interactions
with the real system as possible. We consider the problem of optimising control
policies to allow a robot to complete a given race track within a minimum
amount of time. We assume that the robot has no prior information about the
track or its own dynamical model, just an initial valid driving example.
Localisation is used only to monitor the robot and to provide an indication
of its position along the track's centre axis. We propose a method for finding
a policy that minimises the time per lap while keeping the vehicle on the track
using a Bayesian optimisation (BO) approach over a reproducing kernel Hilbert
space. We apply an algorithm to search more efficiently over high-dimensional
policy-parameter spaces with BO, by iterating over each dimension individually,
in a sequential coordinate descent-like scheme. Experiments demonstrate the
performance of the algorithm against other methods in a simulated car racing
environment.
Comment: Accepted as a conference paper for the 2018 IEEE International Conference on Robotics and Automation (ICRA).
Sample Efficient Policy Search for Optimal Stopping Domains
Optimal stopping problems consider the question of deciding when to stop an
observation-generating process in order to maximize a return. We examine the
problem of simultaneously learning and planning in such domains, when data is
collected directly from the environment. We propose GFSE, a simple and flexible
model-free policy search method that reuses data for sample efficiency by
leveraging problem structure. We bound the sample complexity of our approach to
guarantee uniform convergence of policy value estimates, tightening existing
PAC bounds to achieve logarithmic dependence on horizon length for our setting.
We also examine the benefit of our method against prevalent model-based and
model-free approaches on three domains taken from diverse fields.
Comment: To appear in IJCAI-201
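The data-reuse idea behind the sample-efficiency claim can be illustrated with a small sketch. This is not GFSE itself: the threshold-policy class, the offer-sequence domain, and the helper names are assumptions chosen for illustration. The structural point it shows is that, because stopping is the only decision, a single batch of trajectories collected from the environment can be replayed to score every candidate policy, rather than gathering fresh samples per policy.

```python
def evaluate_threshold_policies(trajectories, thresholds, discount=1.0):
    """Score stop-when-offer-exceeds-threshold policies on shared data.

    trajectories: offer sequences collected once from the environment
    thresholds: candidate policy parameters; every policy is evaluated on
    the SAME trajectories (the kind of data reuse enabled by the problem
    structure of optimal stopping).
    Returns the best threshold and the mean discounted return per policy.
    """
    def ret(traj, th):
        for t, offer in enumerate(traj):
            if offer >= th:
                return (discount ** t) * offer           # stop, take offer
        return (discount ** (len(traj) - 1)) * traj[-1]  # forced stop at horizon
    scores = {th: sum(ret(tr, th) for tr in trajectories) / len(trajectories)
              for th in thresholds}
    return max(scores, key=scores.get), scores
```

Because each trajectory is replayed under every threshold, the number of environment interactions is independent of the number of policies evaluated; uniform-convergence bounds of the kind the abstract mentions then control how many trajectories are needed for the empirical scores to be reliable.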