
    Near-Optimal BRL using Optimistic Local Transitions

    Model-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., avoiding the exploration-exploitation dilemma. However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is optimistic about the transition function. We analyze BOLT's sample complexity, and show that under certain parameters, the algorithm is near-optimal in the Bayesian sense with high probability. Then, experimental results highlight the key differences between this method and previous work. Comment: ICML 2012
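
    BOLT's defining trait is optimism about the transition function. As a loose illustration of that flavor of optimism (our own construction, not the paper's exact mechanism), a Dirichlet posterior over next states can be biased by pretending that eta extra transitions toward the most valuable successor were observed:

```python
import numpy as np

def optimistic_transition(counts, values, eta=2.0):
    """Hedged sketch of transition-optimism in the BOLT spirit.

    counts: Dirichlet pseudo-counts over next states for one (s, a) pair.
    values: current value estimates V(s') for each next state.
    eta:    optimism parameter -- the number of imagined extra observations.
    """
    best = int(np.argmax(values))          # most desirable successor state
    boosted = counts.astype(float).copy()
    boosted[best] += eta                   # imagined optimistic observations
    return boosted / boosted.sum()         # mean of the boosted Dirichlet

# Toy usage: state 0 is the most-observed successor, but state 2 looks valuable.
print(optimistic_transition(np.array([4, 1, 1]), np.array([0.0, 0.2, 1.0])))
# -> probability mass shifted toward state 2 relative to the raw posterior mean
```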

    Near-Optimal BRL using Optimistic Local Transitions (Extended Version)

    Model-based Bayesian Reinforcement Learning (BRL) allows a sound formalization of the problem of acting optimally while facing an unknown environment, i.e., avoiding the exploration-exploitation dilemma. However, algorithms explicitly addressing BRL suffer from such a combinatorial explosion that a large body of work relies on heuristic algorithms. This paper introduces BOLT, a simple and (almost) deterministic heuristic algorithm for BRL which is optimistic about the transition function. We analyze BOLT's sample complexity, and show that under certain parameters, the algorithm is near-optimal in the Bayesian sense with high probability. Then, experimental results highlight the key differences between this method and previous work.

    Optimistic Planning in Belief-Augmented Markov Decision Processes

    This paper describes BOP (for "Bayesian Optimistic Planning"), a new indirect (i.e., model-based) Bayesian reinforcement learning algorithm. BOP extends the approach of the OP-MDP algorithm (for "Optimistic Planning for Markov Decision Processes"; see [Busoniu2011, Busoniu2012]) to the case where the transition probabilities of the underlying MDP are initially unknown and must be learned through interactions with the environment. Knowledge about the underlying MDP is represented by a probability distribution over the set of all transition models, using Dirichlet distributions. BOP plans in the augmented state-belief space obtained by concatenating the state vector with the posterior distribution over transition models. We show that BOP achieves Bayesian optimality as the budget parameter tends to infinity. Preliminary experiments show encouraging results.
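
    The belief bookkeeping underlying BOP is simple even though the planning tree is not: with Dirichlet distributions, the posterior over transition models is just a count table, and the Bayes update after each observed transition is a single increment. A minimal sketch (our naming, not the authors' code):

```python
import numpy as np

class DirichletBelief:
    """Belief over the transition models of a finite MDP."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # One Dirichlet per (s, a) pair, initialized with a uniform prior.
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        """Bayes update after observing s --a--> s_next: one count increment."""
        self.counts[s, a, s_next] += 1.0

    def mean_model(self, s, a):
        """Posterior-mean transition probabilities P(. | s, a)."""
        c = self.counts[s, a]
        return c / c.sum()

belief = DirichletBelief(n_states=3, n_actions=2)
belief.update(s=0, a=1, s_next=2)
print(belief.mean_model(0, 1))  # mass nudged toward state 2
```

    The augmented state-belief point that BOP plans from is then simply the pair (current state, current count table).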

    Bayesian Reinforcement Learning via Deep, Sparse Sampling

    We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm that induces deeper and sparser exploration, with a theoretical bound on its performance relative to the Bayes-optimal policy and a lower computational complexity. The main novelty is the use of a candidate policy generator to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that, compared to the state of the art, our algorithm is both computationally more efficient and obtains significantly higher reward in discrete environments. Comment: Published in AISTATS 2020
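
    The sparser-and-deeper idea can be sketched in a few lines: instead of branching over primitive actions at every belief node, score a handful of long-term candidate policies by rolling them out in MDPs sampled from the posterior. The interfaces below (belief.sample_mdp, mdp.rollout) are assumptions for illustration, not the authors' API:

```python
def choose_policy(belief, candidate_policies, n_models=4, horizon=20, gamma=0.99):
    """Hedged sketch of planning with a candidate policy generator.

    belief.sample_mdp() is assumed to draw an MDP from the posterior;
    mdp.rollout(policy, horizon, gamma) is assumed to return a discounted return.
    Branching over a few long-term policies rather than primitive actions is
    what makes the planning tree sparser and deeper.
    """
    def score(policy):
        returns = [belief.sample_mdp().rollout(policy, horizon, gamma)
                   for _ in range(n_models)]
        return sum(returns) / len(returns)

    return max(candidate_policies, key=score)
```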

    Benchmarking for Bayesian Reinforcement Learning

    In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the rewards collected while interacting with their environment, making use of prior knowledge available beforehand. Many BRL algorithms have already been proposed, but even though a few toy examples exist in the literature, there are still no extensive or rigorous benchmarks with which to compare them. This paper addresses that problem and provides a new BRL comparison methodology along with a corresponding open-source library. The methodology defines a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from given probability distributions. To enable the comparison of non-anytime algorithms, the methodology also includes a detailed analysis of each algorithm's computation-time requirements. The library is released with all source code and documentation: it includes three test problems, each with two different prior distributions, and seven state-of-the-art RL algorithms. Finally, the library is illustrated by comparing all the available algorithms, and the results are discussed. Comment: 37 pages
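
    The comparison criterion can be pictured as a simple protocol: draw many MDPs from a test distribution, run a fresh agent on each, and average the discounted returns. A hedged sketch (the hooks sample/reset/step/act/observe are our assumed interfaces, not the library's actual API):

```python
def benchmark_score(agent_factory, test_distribution, n_mdps=100,
                    n_steps=200, gamma=0.95):
    """Average discounted return of an agent over MDPs drawn from a distribution."""
    total = 0.0
    for _ in range(n_mdps):
        mdp, agent = test_distribution.sample(), agent_factory()
        s, ret, discount = mdp.reset(), 0.0, 1.0
        for _ in range(n_steps):
            a = agent.act(s)
            s_next, r = mdp.step(a)
            agent.observe(s, a, r, s_next)   # online Bayesian learning step
            ret += discount * r
            discount *= gamma
            s = s_next
        total += ret
    return total / n_mdps
```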

    Efficient Bayesian Planning

    Artificial Intelligence (AI) is a long-studied and yet very active field of research. The list of things differentiating humans from AI grows thinner, but the dream of an artificial general intelligence remains elusive. Sequential Decision Making is a subfield of AI that poses a seemingly benign question: "How to act optimally in an unknown environment?" This requires the AI agent to learn about its environment as well as plan an action sequence given its current knowledge about it. The two common problem settings are partial observability and unknown environment dynamics. Bayesian planning deals with these issues by defining a single planning problem that considers the simultaneous effects of an action on both learning and goal-seeking. The technique involves dealing with infinite tree data structures, which are hard to store but essential for computing the optimal plan. Finally, we consider the minimax setting, where the Bayesian prior is chosen by an adversary and a worst-case policy therefore needs to be found.

    In this thesis, we present novel Bayesian planning algorithms. First, we propose DSS (Deeper, Sparser Sampling) for the case of unknown environment dynamics. It is a meta-algorithm derived from a simple insight about the Bayes rule, and it beats the state of the art across the board, from discrete to continuous state settings. A theoretical analysis provides a high-probability bound on its performance. Our analysis differs from previous approaches in the literature in terms of problem formulation and formal guarantees. The results also contrast with those of previous comparable BRL algorithms, which typically provide asymptotic convergence guarantees. Suitable Bayesian models and their corresponding planners are proposed for implementing the discrete and continuous versions of DSS. We then address the issue of partial observability via our second algorithm, FMP (Finite Memory Planner), which uses depth-dependent partitioning of the infinite planning tree. Experimental results demonstrate performance comparable to the current state of the art in both discrete and continuous settings. Finally, we propose algorithms for finding the best policy for the worst-case belief in the minimax Bayesian setting.
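
    The minimax setting mentioned above can be illustrated with a toy computation: if rows index candidate policies, columns index candidate priors, and each entry is the Bayesian value of a policy under a prior, the adversary picks the worst column for us, so we pick the row whose worst-case value is largest. This restricts to deterministic choices over finite sets; it illustrates the setting, not the thesis's algorithm:

```python
import numpy as np

def maximin_policy(value_matrix):
    """Pick the policy with the best worst-case Bayesian value.

    value_matrix[i, j] = value of policy i under prior j.
    """
    worst_case = value_matrix.min(axis=1)       # adversary's best response
    return int(worst_case.argmax()), float(worst_case.max())

V = np.array([[1.0, 0.2],    # policy 0: great under prior 0, poor under prior 1
              [0.6, 0.5]])   # policy 1: moderate under both priors
print(maximin_policy(V))     # -> (1, 0.5): the robust choice
```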

    Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration

    In this paper, we present a robotic model-based reinforcement learning method that combines ideas from model identification and model predictive control. We use a feature-based representation of the dynamics that allows the dynamics model to be fitted with a simple least-squares procedure, and the features are identified from a high-level specification of the robot's morphology, consisting of the number and connectivity structure of its links. Model predictive control is then used to choose actions under an optimistic model of the dynamics, which produces an efficient and goal-directed exploration strategy. We present real-time experimental results on standard benchmark problems involving the pendulum, cartpole, and double-pendulum systems. Experiments indicate that our method is able to learn a range of benchmark tasks substantially faster than the previous best methods. To evaluate our approach on a realistic robotic control task, we also demonstrate real-time control of a simulated 7-degree-of-freedom arm. Comment: 8 pages
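
    The two ingredients named in the abstract, least-squares model identification and model predictive control, compose naturally. Below is a hedged sketch of that pipeline under our own simplifications: a linear-in-features dynamics model fitted by ordinary least squares, and a random-shooting MPC step standing in for the paper's optimizer:

```python
import numpy as np

def fit_dynamics(features, next_states):
    """Least-squares fit of s' ~ W @ phi(s, a).

    features: (N, d) matrix of phi(s, a) rows; next_states: (N, n).
    Returns W with shape (n, d).
    """
    W, *_ = np.linalg.lstsq(features, next_states, rcond=None)
    return W.T

def mpc_step(W, phi, s, sample_action, cost, horizon=10, n_candidates=256):
    """Random-shooting MPC: simulate candidate action sequences under the
    fitted model and return the first action of the cheapest sequence."""
    best_action, best_cost = None, np.inf
    for _ in range(n_candidates):
        state, total = s, 0.0
        actions = [sample_action() for _ in range(horizon)]
        for a in actions:
            state = W @ phi(state, a)    # predicted next state
            total += cost(state, a)
        if total < best_cost:
            best_action, best_cost = actions[0], total
    return best_action
```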

    HRA*: Hybrid randomized path planning for complex 3D environments

    Work presented at IROS, held in Tokyo, November 3-7, 2013. We propose HRA*, a new randomized path planner for complex 3D environments. The method is a modified A* algorithm that uses a hybrid node expansion technique, combining a random exploration of the action space that meets vehicle kinematic constraints with a cost-to-goal metric that considers only kinematically feasible paths to the goal. The method also includes a series of heuristics to accelerate the search: a cost penalty near obstacles, and a filter to prevent revisiting configurations. The performance of the method is compared against A*, RRT and RRT* on a series of challenging 3D outdoor datasets. HRA* is shown to outperform all of them in computation time, and to deliver shorter paths than A* and RRT. This work has been partially supported by the Mexican Council of Science and Technology with a PhD scholarship to Ernesto Teniente, by the Spanish Ministry of Science and Innovation under project DPI-2011-27510, and by the EU project ARCAS FP7-287617.
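
    The hybrid expansion loop described above can be summarized as an A* search whose successors come from randomly sampled, kinematically feasible actions, with an obstacle-proximity cost penalty and a visited-configuration filter. The sketch below is our simplification of that structure, not the authors' implementation; all callables passed in are assumed interfaces:

```python
import heapq, itertools

def hybrid_randomized_astar(start, is_goal, sample_actions, step, heuristic,
                            obstacle_penalty, round_cfg, max_iters=100_000):
    """A*-style loop with randomized, kinematics-respecting node expansion."""
    tie = itertools.count()                     # tie-breaker for the heap
    open_set = [(heuristic(start), next(tie), 0.0, start, [start])]
    visited = set()
    for _ in range(max_iters):
        if not open_set:
            break
        _, _, g, cfg, path = heapq.heappop(open_set)
        if is_goal(cfg):
            return path
        key = round_cfg(cfg)                    # discretize to filter revisits
        if key in visited:
            continue
        visited.add(key)
        for action in sample_actions(cfg):      # random feasible actions
            nxt, move_cost = step(cfg, action)  # respects vehicle kinematics
            g_next = g + move_cost + obstacle_penalty(nxt)
            heapq.heappush(open_set, (g_next + heuristic(nxt), next(tie),
                                      g_next, nxt, path + [nxt]))
    return None                                 # no path found within budget
```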
