Using Monte Carlo Search With Data Aggregation to Improve Robot Soccer Policies
RoboCup soccer competitions are considered among the most challenging
multi-robot adversarial environments, due to their high dynamism and the
partial observability of the environment. In this paper we introduce a method
based on a combination of Monte Carlo search and data aggregation (MCSDA) to
adapt discrete-action soccer policies for a defender robot to the strategy of
the opponent team. By exploiting a simple representation of the domain, a
supervised learning algorithm is trained over an initial collection of data
consisting of several simulations of human expert policies. Monte Carlo policy
rollouts are then generated and aggregated to previous data to improve the
learned policy over multiple epochs and games. The proposed approach has been
extensively tested both on a soccer-dedicated simulator and on real robots.
Using this method, our learning robot soccer team achieves an improvement in
ball interceptions, as well as a reduction in the number of opponents' goals.
Together with better performance, a more efficient overall positioning of
the whole team within the field is achieved.
Thinking Fast and Slow with Deep Learning and Tree Search
Sequential decision making problems, such as structured prediction, robotic
control, and game playing, require a combination of planning policies and
generalisation of those plans. In this paper, we present Expert Iteration
(ExIt), a novel reinforcement learning algorithm which decomposes the problem
into separate planning and generalisation tasks. Planning new policies is
performed by tree search, while a deep neural network generalises those plans.
Subsequently, tree search is improved by using the neural network policy to
guide search, increasing the strength of new plans. In contrast, standard deep
Reinforcement Learning algorithms rely on a neural network not only to
generalise plans, but to discover them too. We show that ExIt outperforms
REINFORCE for training a neural network to play the board game Hex, and our
final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most
recent Olympiad Champion player to be publicly released.
Comment: v1 to v2: - Add a value function in MCTS - Some MCTS hyper-parameters
changed - Repetition of experiments: improved accuracy and errors shown.
(note the reduction in effect size for the tpt/cat experiment) - Results from
a longer training run, including changes in expert strength in training -
Comparison to MoHex. v3: clarify independence of ExIt and AG0. v4: see
appendix
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
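The abstract above does not spell out how MCTS balances exploring new moves against exploiting good ones. One widely used choice is the UCB1 rule (the basis of the UCT variant). The following is a minimal sketch of that selection step only, not of the full algorithm; the function names and the default exploration constant `c` are illustrative assumptions, not taken from the survey.

```python
import math

def ucb1(reward_sum, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = reward_sum / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select_child(children, c=math.sqrt(2)):
    """children: list of (reward_sum, visits) pairs.

    Returns the index of the child with the highest UCB1 score.
    The parent's visit count is taken as the sum of its children's
    visits (a simplifying assumption for this sketch).
    """
    parent_visits = sum(v for _, v in children)
    scores = [ucb1(r, v, parent_visits, c) for r, v in children]
    return max(range(len(children)), key=scores.__getitem__)

# A rarely visited child can outrank a well-explored one
# because its exploration bonus is larger.
print(select_child([(6.0, 10), (0.5, 1)]))
```

Setting `c=0` turns the rule into pure greedy selection on the mean reward; larger values of `c` favour less-visited children.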
Q-CP: Learning Action Values for Cooperative Planning
Research on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. hyper-redundant robots and groups of robots). To alleviate this problem, we present Q-CP, a cooperative model-based reinforcement learning algorithm, which exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse-of-dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a coordination task between two mobile robots entering a door. The obtained results show the effectiveness of Q-CP in the chosen applications, where action values drive the exploration and reduce the computational demand of the planning process while achieving good performance.
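The abstract above does not say how Q-CP couples action values with the tree search, so the sketch below shows only the standard tabular Q-learning update that such action values are usually built on. The function name `q_update`, the state and action labels, and the default learning rate and discount are assumptions made for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions,
             lr=0.1, gamma=0.9):
    """One tabular Q-learning step.

    q maps (state, action) pairs to estimated action values.
    An empty next_actions list marks next_state as terminal.
    """
    # Value of the best action available in the next state (0 if terminal).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    # Move the current estimate a fraction lr of the way toward the target.
    q[(state, action)] += lr * (target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)  # unseen pairs start at 0.0
q_update(q, "s0", "go", reward=1.0, next_state="s1",
         next_actions=["go", "stay"], lr=0.5)
print(q[("s0", "go")])
```

In a planner, values learned this way could be used to rank which actions a search expands first; that coupling is not shown here.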
Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups
Monte Carlo Tree Search (MCTS) has improved the performance of game engines
in domains such as Go, Hex, and general game playing. MCTS has been shown to
outperform classic alpha-beta search in games where good heuristic evaluations
are difficult to obtain. In recent years, combining ideas from traditional
minimax search in MCTS has been shown to be advantageous in some domains, such
as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new
way to use heuristic evaluations to guide the MCTS search by storing the two
sources of information, estimated win rates and heuristic evaluations,
separately. Rather than using the heuristic evaluations to replace the
playouts, our technique backs them up implicitly during the MCTS simulations.
These minimax values are then used to guide future simulations. We show that
using implicit minimax backups leads to stronger play performance in Kalah,
Breakthrough, and Lines of Action.
Comment: 24 pages, 7 figures, 9 tables, expanded version of paper presented at
IEEE Conference on Computational Intelligence and Games (CIG) 2014 conference
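The idea of keeping win rates and heuristic evaluations separate, and backing the heuristic up in minimax fashion during simulations, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the `Node` layout, the convention that all values are from the root player's view and on a common scale, and the linear blend with weight `alpha` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    heuristic: float          # static evaluation, from the root player's view
    maximizing: bool          # True if the root player is to move here
    visits: int = 0
    reward_sum: float = 0.0   # playout rewards, from the root player's view
    minimax: float = 0.0      # heuristic value backed up from the subtree
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self):
        # Before any child is expanded, the minimax value is the node's
        # own static evaluation.
        self.minimax = self.heuristic

def backup(path, reward):
    """Update both information sources along a simulated path (root first)."""
    for node in reversed(path):
        node.visits += 1
        node.reward_sum += reward
        if node.children:
            values = [child.minimax for child in node.children]
            node.minimax = max(values) if node.maximizing else min(values)

def blended_value(node, alpha=0.4):
    """Mix the playout win rate with the backed-up heuristic value."""
    win_rate = node.reward_sum / node.visits if node.visits else 0.0
    return (1 - alpha) * win_rate + alpha * node.minimax

a = Node(heuristic=0.2, maximizing=False)
b = Node(heuristic=-0.4, maximizing=False)
root = Node(heuristic=0.0, maximizing=True, children=[a, b])
backup([root, a], reward=1.0)
print(root.minimax, blended_value(a, alpha=0.5))
```

A selection rule could then use `blended_value` in place of the plain win rate, so that the heuristic steers later simulations without replacing the playouts.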
Active End-Effector Pose Selection for Tactile Object Recognition through Monte Carlo Tree Search
This paper considers the problem of active object recognition using touch
only. The focus is on adaptively selecting a sequence of wrist poses that
achieves accurate recognition by enclosure grasps. It seeks to minimize the
number of touches and maximize recognition confidence. The actions are
formulated as wrist poses relative to each other, making the algorithm
independent of absolute workspace coordinates. The optimal sequence is
approximated by Monte Carlo tree search. We demonstrate results in a physics
engine and on a real robot. In the physics engine, most object instances were
recognized in at most 16 grasps. On a real robot, our method recognized objects
in 2--9 grasps and outperformed a greedy baseline.
Comment: Accepted to International Conference on Intelligent Robots and
Systems (IROS) 201