Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups
Monte Carlo Tree Search (MCTS) has improved the performance of game engines
in domains such as Go, Hex, and general game playing. MCTS has been shown to
outperform classic alpha-beta search in games where good heuristic evaluations
are difficult to obtain. In recent years, incorporating ideas from traditional
minimax search into MCTS has been shown to be advantageous in some domains, such
as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new
way to use heuristic evaluations to guide the MCTS search by storing the two
sources of information, estimated win rates and heuristic evaluations,
separately. Rather than using the heuristic evaluations to replace the
playouts, our technique backs them up implicitly during the MCTS simulations.
These minimax values are then used to guide future simulations. We show that
using implicit minimax backups leads to stronger play performance in Kalah,
Breakthrough, and Lines of Action.
Comment: 24 pages, 7 figures, 9 tables, expanded version of paper presented at
IEEE Conference on Computational Intelligence and Games (CIG) 2014 conference
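The backup described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the `backup` and `select_child` functions, the blend weight `alpha`, and the exploration constant `c` are all hypothetical names and choices made for illustration. In particular, mixing the win rate and the minimax value linearly is only one plausible way to let the minimax values "guide future simulations"; the abstract does not state the formula. Rewards and heuristic values are assumed to lie in [-1, 1] from the maximizing player's point of view.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    heuristic: float                 # static evaluation, max player's view
    maximizing: bool                 # True if the max player moves here
    visits: int = 0
    reward_sum: float = 0.0          # sum of playout results, max player's view
    children: list = field(default_factory=list)
    minimax: float = field(init=False, default=0.0)

    def __post_init__(self):
        # A new node's implicit minimax value starts as its heuristic value.
        self.minimax = self.heuristic


def backup(path, playout_result):
    """Walk from leaf to root, updating both sources of information separately."""
    for node in reversed(path):
        node.visits += 1
        node.reward_sum += playout_result
        if node.children:
            values = [child.minimax for child in node.children]
            node.minimax = max(values) if node.maximizing else min(values)


def blended_value(child, alpha):
    """Assumed mix of estimated win rate and implicit minimax value."""
    win_rate = child.reward_sum / child.visits if child.visits else 0.0
    return (1 - alpha) * win_rate + alpha * child.minimax


def select_child(node, alpha=0.4, c=1.0):
    """UCB-style selection that uses the blended value instead of the raw win rate."""
    sign = 1.0 if node.maximizing else -1.0

    def score(child):
        if child.visits == 0:
            return math.inf          # try unvisited children first
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return sign * blended_value(child, alpha) + explore

    return max(node.children, key=score)
```

Keeping `reward_sum` and `minimax` as separate fields mirrors the abstract's point that the two sources of information are stored separately; the playout still runs, and the heuristic only influences which child is selected.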
Detecting and Characterizing Small Dense Bipartite-like Subgraphs by the Bipartiteness Ratio Measure
We study the problem of finding and characterizing subgraphs with small
\textit{bipartiteness ratio}. We give a bicriteria approximation algorithm
\verb|SwpDB| such that if there exists a subset of volume at most and
bipartiteness ratio , then for any , it finds a set
of volume at most and bipartiteness ratio at most
. By combining a truncation operation, we give a local
algorithm \verb|LocDB|, which has asymptotically the same approximation
guarantee as the algorithm \verb|SwpDB| on both the volume and bipartiteness
ratio of the output set, and runs in time
, independent of the size of the
graph. Finally, we give a spectral characterization of the small dense
bipartite-like subgraphs by using the th \textit{largest} eigenvalue of the
Laplacian of the graph.
Comment: 17 pages; ISAAC 201
Learn to Interpret Atari Agents
Deep Reinforcement Learning (DeepRL) agents surpass human-level performance
in a multitude of tasks. However, the direct mapping from states to actions
makes it hard to interpret the rationale behind the decision making of agents.
In contrast to previous a-posteriori methods of visualizing DeepRL policies, we
propose an end-to-end trainable framework based on Rainbow, a representative
Deep Q-Network (DQN) agent. Our method automatically learns important regions
in the input domain, which enables characterizations of the decision making and
interpretations for non-intuitive behaviors. Hence we name it Region Sensitive
Rainbow (RS-Rainbow). RS-Rainbow utilizes a simple yet effective mechanism to
incorporate visualization ability into the learning model, not only improving
model interpretability but also leading to improved performance. Extensive
experiments on the challenging platform of Atari 2600 demonstrate the
superiority of RS-Rainbow. In particular, our agent achieves state of the art
at just 25% of the training frames. Demonstrations and code are available at
https://github.com/yz93/Learn-to-Interpret-Atari-Agents