22 research outputs found

    Optimistic Planning in Markov Decision Processes with Belief

    This article describes the BOP algorithm (for "Bayesian Optimistic Planning"), a new indirect (i.e., model-based) Bayesian reinforcement learning algorithm. BOP extends the approach of the OP-MDP algorithm ("Optimistic Planning for Markov Decision Processes", see [Busoniu2011,Busoniu2012]) to the case where the transition probabilities of the underlying MDP are initially unknown and must be learned through interaction with the environment. Knowledge about the underlying MDP is represented by a probability distribution over the set of all transition models, using Dirichlet distributions. BOP plans in the augmented state-belief space obtained by concatenating the state vector with the posterior distribution over transition models. We show that BOP becomes Bayes-optimal as the budget parameter tends to infinity. Preliminary experiments show encouraging results.
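    The belief representation described above can be sketched in a few lines: one Dirichlet distribution per state-action pair over next states, updated from observed transitions. This is a minimal illustration assuming a small tabular MDP; the class name DirichletBelief and its methods are not from the paper.

```python
import numpy as np

class DirichletBelief:
    """Illustrative Dirichlet posterior over the transition model of a
    small discrete MDP (an assumption for this sketch, not the paper's code)."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # One Dirichlet per (state, action) pair over next states.
        self.alpha = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        # Observing a transition increments the matching Dirichlet count.
        self.alpha[s, a, s_next] += 1.0

    def expected_model(self):
        # Posterior mean transition probabilities P(s' | s, a).
        return self.alpha / self.alpha.sum(axis=2, keepdims=True)

    def sample_model(self):
        # One plausible transition model drawn from the posterior.
        return np.apply_along_axis(np.random.dirichlet, 2, self.alpha)

# The augmented state the abstract refers to pairs the physical state
# with this posterior, e.g. (s, belief.alpha.copy()).
belief = DirichletBelief(n_states=3, n_actions=2)
belief.update(s=0, a=1, s_next=2)
print(belief.expected_model()[0, 1])
```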

    Optimistic Planning for Markov Decision Processes

    The reinforcement learning community has recently intensified its interest in online planning methods, due to their relative independence from the size of the state space. However, tight near-optimality guarantees are not yet available for the general case of stochastic Markov decision processes and closed-loop, state-dependent planning policies. We therefore consider an algorithm related to AO* that optimistically explores a tree representation of the space of closed-loop policies, and we analyze the near-optimality of the action it returns after n tree node expansions. While this optimistic planning requires a finite number of actions and possible next states for each transition, its asymptotic performance does not depend directly on these numbers, but only on the subset of nodes that significantly impact near-optimal policies. We characterize this set by introducing a novel measure of problem complexity, called the near-optimality exponent. Specializing the exponent and performance bound for some interesting classes of MDPs illustrates that the algorithm works better when there are fewer near-optimal policies and when transition probabilities are less uniform.
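    The optimistic-expansion principle behind OP-MDP can be illustrated in a much simpler deterministic, open-loop setting: repeatedly expand the leaf whose upper bound on the discounted return is largest. The sketch below assumes rewards in [0, 1] and a user-supplied deterministic model step(s, a) -> (s', r); it is only an illustration of the principle, not the closed-loop OP-MDP algorithm analyzed in the paper.

```python
import heapq

def optimistic_plan(step, actions, s0, gamma=0.9, budget=50):
    """Sketch of optimistic planning in a simplified deterministic,
    open-loop setting (an assumption of this sketch): expand the leaf
    with the largest upper bound on the discounted return."""
    tail = lambda d: gamma ** d / (1.0 - gamma)   # max reward still reachable
    # Max-heap of leaves: (-upper_bound, tie-breaker, state, depth, value so far).
    heap = [(-tail(0), 0, s0, 0, 0.0)]
    best, tie = 0.0, 0
    for _ in range(budget):                        # budget = node expansions
        _, _, s, d, v = heapq.heappop(heap)
        for a in actions:
            s2, r = step(s, a)
            v2 = v + (gamma ** d) * r              # lower bound of the new leaf
            best = max(best, v2)
            tie += 1
            heapq.heappush(heap, (-(v2 + tail(d + 1)), tie, s2, d + 1, v2))
    return best

# Toy usage: a walk on the integers where landing on 3 gives reward 1.
print(optimistic_plan(step=lambda s, a: (s + a, 1.0 if s + a == 3 else 0.0),
                      actions=(-1, 1), s0=0))
```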

    Optimistic planning for continuous-action deterministic systems

    We consider the optimal control of systems with deterministic dynamics, continuous, possibly large-scale state spaces, and continuous, low-dimensional action spaces. We describe an online planning algorithm called SOOP, which, like other algorithms in its class, has no direct dependence on the state space structure. Unlike previous algorithms, SOOP explores the true solution space, consisting of infinite sequences of continuous actions, without requiring knowledge about the smoothness of the system. To this end, it borrows the principle of the simultaneous optimistic optimization method and develops a nontrivial adaptation of this principle to the planning problem. Experiments on four problems show that SOOP reliably ranks among the best algorithms, fully dominating competing methods when the problem requires both long horizons and fine discretization.
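    The simultaneous optimistic optimization (SOO) principle that SOOP borrows can be sketched on a toy one-dimensional maximization problem: sweep the cells from shallow to deep and expand a cell only if no shallower cell has a better value, with no smoothness constant required. The function soo_maximize and the ternary-split, unit-interval setting are assumptions made for this sketch; the SOOP planner itself searches over sequences of continuous actions.

```python
import math

def soo_maximize(f, budget=60, n_splits=3):
    """Sketch of Simultaneous Optimistic Optimization on [0, 1]
    (a toy stand-in for the action-sequence space used by SOOP)."""
    leaves = {0: [(0.0, 1.0, f(0.5))]}   # depth -> list of (lo, hi, f(centre))
    evals = 1
    while evals < budget:
        v_max = -math.inf
        for h in sorted(leaves):                    # shallow-to-deep sweep
            if not leaves[h] or evals >= budget:
                continue
            i = max(range(len(leaves[h])), key=lambda j: leaves[h][j][2])
            lo, hi, val = leaves[h][i]
            if val < v_max:
                continue                            # dominated by a shallower cell
            v_max = val
            leaves[h].pop(i)                        # expand: split into children
            width = (hi - lo) / n_splits
            kids = leaves.setdefault(h + 1, [])
            for j in range(n_splits):
                c_lo, c_hi = lo + j * width, lo + (j + 1) * width
                kids.append((c_lo, c_hi, f((c_lo + c_hi) / 2.0)))
                evals += 1
    return max(v for grp in leaves.values() for (_, _, v) in grp)

# Illustrative run: maximum of a smooth bump near x = 0.3.
print(soo_maximize(lambda x: -(x - 0.3) ** 2))
```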

    Combining Interaction Models to Improve Coordination in Multiagent Systems

    The main contribution of this article is the implementation of a hybrid coordination method built by combining previously developed interaction models. The interaction models are based on reward sharing for multi-agent learning, with the aim of interactively discovering good-quality policies. Exchanging rewards between agents during interaction is a complex task and, if done inappropriately, can delay learning or even cause unexpected behaviors, making cooperation inefficient and leading to convergence on an unsatisfactory policy. Building on these concepts, the hybrid method exploits the particular strengths of each model, reducing potential conflicts between actions rewarded under different policies and improving agent coordination in reinforcement learning problems. Experimental results show that the hybrid method is able to accelerate convergence, quickly reaching optimal policies even in large state spaces and outperforming classical reinforcement learning approaches.
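    A minimal sketch of the reward-sharing idea, assuming independent tabular Q-learners whose update blends each agent's own reward with the mean reward of its teammates; the blending rule and the share parameter are hypothetical stand-ins for the interaction models combined in the paper.

```python
from collections import defaultdict

def shared_reward_q_update(q_tables, state, actions, rewards, next_state,
                           n_actions, share=0.3, alpha=0.1, gamma=0.95):
    """Illustrative reward-sharing update for independent Q-learners;
    the blending rule and `share` are assumptions of this sketch."""
    n = len(q_tables)
    for i, q in enumerate(q_tables):
        others = [rewards[j] for j in range(n) if j != i]
        blended = (1 - share) * rewards[i] + share * sum(others) / max(len(others), 1)
        best_next = max(q[(next_state, a)] for a in range(n_actions))
        key = (state, actions[i])
        q[key] += alpha * (blended + gamma * best_next - q[key])

# Two agents, two actions, one joint learning step with toy values.
q_tables = [defaultdict(float), defaultdict(float)]
shared_reward_q_update(q_tables, state=0, actions=[1, 0],
                       rewards=[1.0, 0.0], next_state=1, n_actions=2)
print(q_tables[0][(0, 1)], q_tables[1][(0, 0)])
```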

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
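    The core algorithm the survey outlines can be condensed into a minimal UCT sketch covering the four MCTS phases (selection with UCB1, expansion, random rollout, backpropagation). The toy "count to exactly 10" domain and all names below are illustrative assumptions, not taken from the survey.

```python
import math, random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0

def uct_search(root_state, actions, step, is_terminal, reward_fn,
               n_iter=500, c=1.4):
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: follow UCB1 while the node is fully expanded.
        while node.children and len(node.children) == len(actions):
            node = max(node.children,
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one untried child unless the node is terminal.
        if not is_terminal(node.state):
            tried = {ch.action for ch in node.children}
            a = random.choice([x for x in actions if x not in tried])
            node.children.append(Node(step(node.state, a), node, a))
            node = node.children[-1]
        # 3. Simulation: random rollout to a terminal state.
        s = node.state
        while not is_terminal(s):
            s = step(s, random.choice(actions))
        reward = reward_fn(s)
        # 4. Backpropagation: push the rollout outcome up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action

# Toy domain: start at 7, add 1 or 2 per move, win by landing exactly on 10.
best = uct_search(7, actions=[1, 2], step=lambda s, a: s + a,
                  is_terminal=lambda s: s >= 10,
                  reward_fn=lambda s: 1.0 if s == 10 else 0.0)
print(best)   # typically 1: from 8, more random continuations land on 10
```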