906 research outputs found
On the Parallelization of Monte-Carlo planning
International audienceWe provide a parallelization with and without shared-memory for Bandit-Based Monte-Carlo Planning algorithms, applied to the game of Go. The resulting algorithm won the first non-blitz game against a professionnal human player in 9x9 Go
DOP: Deep Optimistic Planning with Approximate Value Function Evaluation
Research on reinforcement learning has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. multi-agent systems or hyper-redundant robots). To alleviate this problem, we present DOP, a deep model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) plan effective policies. Specifically, we exploit deep neural networks to learn Q-functions that are used to attack the curse of dimensionality during a Monte-Carlo tree search. Our algorithm, in fact, constructs upper confidence bounds on the learned value function to select actions optimistically. We implement and evaluate DOP on different scenarios: (1) a cooperative navigation problem, (2) a fetching task for a 7-DOF KUKA robot, and (3) a human-robot handover with a humanoid robot (both in simulation and real). The obtained results show the effectiveness of DOP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance
A Neural Networks Committee for the Contextual Bandit Problem
This paper presents a new contextual bandit algorithm, NeuralBandit, which
does not need hypothesis on stationarity of contexts and rewards. Several
neural networks are trained to modelize the value of rewards knowing the
context. Two variants, based on multi-experts approach, are proposed to choose
online the parameters of multi-layer perceptrons. The proposed algorithms are
successfully tested on a large dataset with and without stationarity of
rewards.Comment: 21st International Conference on Neural Information Processin
WiseMove: A Framework for Safe Deep Reinforcement Learning for Autonomous Driving
Machine learning can provide efficient solutions to the complex problems
encountered in autonomous driving, but ensuring their safety remains a
challenge. A number of authors have attempted to address this issue, but there
are few publicly-available tools to adequately explore the trade-offs between
functionality, scalability, and safety.
We thus present WiseMove, a software framework to investigate safe deep
reinforcement learning in the context of motion planning for autonomous
driving. WiseMove adopts a modular learning architecture that suits our current
research questions and can be adapted to new technologies and new questions. We
present the details of WiseMove, demonstrate its use on a common traffic
scenario, and describe how we use it in our ongoing safe learning research
Maximum a Posteriori Estimation by Search in Probabilistic Programs
We introduce an approximate search algorithm for fast maximum a posteriori
probability estimation in probabilistic programs, which we call Bayesian ascent
Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with
varying number of mutually dependent finite, countable, and continuous random
variables. BaMC is an anytime MAP search algorithm applicable to any
combination of random variables and dependencies. We compare BaMC to other MAP
estimation algorithms and show that BaMC is faster and more robust on a range
of probabilistic models.Comment: To appear in proceedings of SOCS1
- …