    Methods for Evaluating Decision Problems with Limited Information

    Limited Memory Influence Diagrams (LIMIDs) are general models of decision problems for representing limited-memory policies (Lauritzen and Nilsson, 2001). LIMIDs can be evaluated by Single Policy Updating, which produces a local maximum strategy in which no single policy modification can increase the expected utility. This paper examines the quality of the obtained local maximum strategy and proposes three different methods for evaluating LIMIDs. The first algorithm, Temporal Policy Updating, resembles Single Policy Updating. The second algorithm, Greedy Search, successively updates the policy that gives the highest expected-utility improvement. The final algorithm, Simulated Annealing, differs from the two preceding algorithms by allowing the search to take downhill steps to escape a local maximum. A careful comparison of the algorithms is provided, both in terms of the quality of the obtained strategies and in terms of their implementation, including some considerations of computational complexity.
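    The downhill-step idea behind the third algorithm can be illustrated with a generic simulated-annealing loop. The sketch below is not the paper's implementation: `evaluate` (a hypothetical strategy-to-expected-utility function for a given LIMID) and `neighbour` (a move that modifies a single decision node's policy) are assumed stand-ins.

```python
# Minimal simulated-annealing sketch over strategies, assuming hypothetical
# evaluate(strategy) -> expected utility and neighbour(strategy) -> strategy
# that changes one decision node's policy. Illustrates accepting occasional
# downhill steps to escape local maxima; not the paper's implementation.
import math
import random

def anneal(strategy, evaluate, neighbour, t0=1.0, cooling=0.95, steps=1000):
    """Search over strategies, occasionally accepting worse ones."""
    best = current = strategy
    best_u = current_u = evaluate(current)
    t = t0
    for _ in range(steps):
        candidate = neighbour(current)   # modify one decision node's policy
        u = evaluate(candidate)          # expected utility of the candidate
        # Always accept improvements; accept downhill moves with
        # Boltzmann probability exp((u - current_u) / t).
        if u >= current_u or random.random() < math.exp((u - current_u) / t):
            current, current_u = candidate, u
            if u > best_u:
                best, best_u = candidate, u
        t *= cooling                     # geometric cooling schedule
    return best, best_u
```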

    Learning to Race through Coordinate Descent Bayesian Optimisation

    In the automation of many kinds of processes, the observable outcome can often be described as the combined effect of an entire sequence of actions, or controls, applied throughout its execution. In these cases, strategies that optimise control policies for individual stages of the process might not be applicable, and instead the whole policy might have to be optimised at once. Moreover, the cost of evaluating the policy's performance might be high, making it desirable to find a solution with as few interactions with the real system as possible. We consider the problem of optimising control policies to allow a robot to complete a given race track in a minimum amount of time. We assume that the robot has no prior information about the track or its own dynamical model, just an initial valid driving example. Localisation is applied only to monitor the robot and to provide an indication of its position along the track's centre axis. We propose a method for finding a policy that minimises the time per lap while keeping the vehicle on the track, using a Bayesian optimisation (BO) approach over a reproducing kernel Hilbert space. We apply an algorithm to search more efficiently over high-dimensional policy-parameter spaces with BO by iterating over each dimension individually, in a sequential coordinate-descent-like scheme.  Experiments demonstrate the performance of the algorithm against other methods in a simulated car racing environment.
    Comment: Accepted as a conference paper for the 2018 IEEE International Conference on Robotics and Automation (ICRA)
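    The coordinate-descent structure of the search can be sketched generically: the full policy-parameter vector is optimised one dimension at a time, with each one-dimensional subproblem handed to a BO routine. In the sketch below, `lap_time` and `optimise_1d` are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of sequential coordinate descent with BO, assuming a
# hypothetical lap_time(theta) rollout cost and a hypothetical
# optimise_1d(f) -> float that runs 1-D Bayesian optimisation on f.
import numpy as np

def coordinate_descent_bo(theta, lap_time, optimise_1d, sweeps=10):
    """Minimise lap_time(theta) by cycling through coordinates of theta."""
    theta = np.asarray(theta, dtype=float).copy()
    for _ in range(sweeps):
        for d in range(theta.size):
            # Freeze every coordinate except d and let the 1-D BO routine
            # propose the best value for that single parameter.
            def objective(x, d=d):
                trial = theta.copy()
                trial[d] = x
                return lap_time(trial)   # expensive: one rollout on the track
            theta[d] = optimise_1d(objective)
    return theta
```

    The appeal of this scheme is that each BO subproblem is one-dimensional, so the surrogate model stays cheap even when the full policy has many parameters; the trade-off is that each sweep still requires one expensive rollout per coordinate update.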

    Sample Efficient Policy Search for Optimal Stopping Domains

    Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also examine the benefit of our method against prevalent model-based and model-free approaches on three domains taken from diverse fields.
    Comment: To appear in IJCAI-201
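    The data-reuse idea the abstract alludes to can be illustrated for stopping problems: because a stopping policy only decides when to halt, a single logged trajectory of observations and returns can score every candidate policy. The threshold policy class and trajectory layout below are illustrative assumptions, not GFSE itself.

```python
# Minimal sketch of trajectory reuse for policy evaluation in optimal
# stopping, assuming a hypothetical threshold policy class and trajectories
# stored as (observations, returns) pairs of equal length. Not GFSE itself.
def first_stop(observations, threshold):
    """Index at which a threshold policy stops (last step if it never fires)."""
    for t, obs in enumerate(observations):
        if obs >= threshold:
            return t
    return len(observations) - 1

def evaluate_policies(trajectories, thresholds):
    """Average return of each threshold policy, reusing every trajectory."""
    scores = {}
    for th in thresholds:
        total = 0.0
        for observations, returns in trajectories:
            total += returns[first_stop(observations, th)]  # return where stopped
        scores[th] = total / len(trajectories)
    return scores
```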