Abalearn: a risk-sensitive approach to self-play learning in Abalone
This paper presents Abalearn, a self-teaching Abalone program capable of automatically reaching an intermediate level of play
without needing expert-labeled training examples, deep searches, or exposure to competent play.
Our approach is based on a reinforcement learning algorithm that is risk-seeking, since defensive players in Abalone tend never to end a game.
We show that it is this risk-sensitivity that allows successful self-play
training. We also propose a set of features that seem relevant for achieving a good level of play.
We evaluate our approach by using a fixed heuristic opponent as a benchmark, pitting our agents against human players online, and comparing
samples of our agents at different times of training.
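The abstract does not spell out its risk-seeking update rule. One standard way to make a temporal-difference update risk-sensitive (in the style of Mihatsch and Neuneier) is to weight positive and negative TD errors asymmetrically; a minimal tabular sketch under that reading, with all names and parameter values invented for illustration:

```python
# Risk-sensitive tabular TD(0) sketch: kappa in (-1, 1) scales positive and
# negative TD errors asymmetrically. kappa < 0 makes the agent risk-seeking
# (positive surprises are upweighted), matching the motivation of avoiding
# overly defensive play that never ends a game. Names are illustrative,
# not taken from Abalearn itself.

def td0_risk_update(V, s, s_next, reward, alpha=0.1, gamma=1.0, kappa=-0.5):
    """One risk-sensitive TD(0) backup on a dict-based value table V."""
    delta = reward + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    # Asymmetric weighting: with kappa < 0, positive errors count more.
    weight = (1.0 - kappa) if delta > 0 else (1.0 + kappa)
    V[s] = V.get(s, 0.0) + alpha * weight * delta
    return V[s]

V = {}
td0_risk_update(V, "s0", "s1", reward=1.0)  # positive delta, upweighted
```

With kappa = 0 this reduces to the ordinary TD(0) backup, so the risk parameter can be tuned without changing the rest of the learner.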
The Hanabi Challenge: A New Frontier for AI Research
From the early days of computing, games have been important testbeds for
studying how well machines can do sophisticated decision making. In recent
years, machine learning has made dramatic advances with artificial agents
reaching superhuman performance in challenge domains like Go, Atari, and some
variants of poker. As with their predecessors of chess, checkers, and
backgammon, these game domains have driven research by providing sophisticated
yet well-defined challenges for artificial intelligence practitioners. We
continue this tradition by proposing the game of Hanabi as a new challenge
domain with novel problems that arise from its combination of purely
cooperative gameplay with two to five players and imperfect information. In
particular, we argue that Hanabi elevates reasoning about the beliefs and
intentions of other agents to the foreground. We believe developing novel
techniques for such theory of mind reasoning will not only be crucial for
success in Hanabi, but also in broader collaborative efforts, especially those
with human partners. To facilitate future research, we introduce the
open-source Hanabi Learning Environment, propose an experimental framework for
the research community to evaluate algorithmic advances, and assess the
performance of current state-of-the-art techniques.
Comment: 32 pages, 5 figures, in press (Artificial Intelligence).
Session 5: Development, Neuroscience and Evolutionary Psychology
Proceedings of the Pittsburgh Workshop in History and Philosophy of Biology, Center for Philosophy of Science, University of Pittsburgh, March 23-24, 2001.
Temporal Difference Learning in Complex Domains
This thesis adapts and improves on the methods of TD(λ) (Sutton, 1988) that were
successfully used for backgammon (Tesauro, 1994) and applies them to other complex
games that are less amenable to simple pattern-matching approaches. The games
investigated are chess and shogi, both of which (unlike backgammon) require
significant amounts of computational effort to be expended on search in order to
achieve expert play. The improved methods are also tested in a non-game domain.
In the chess domain, the adapted TD(λ) method is shown to successfully learn the
relative values of the pieces, and matches using these learnt piece values indicate that
they perform at least as well as piece values widely quoted in elementary chess books.
The adapted TD(λ) method is also shown to work well in shogi, considered by many
researchers to be the next challenge for computer game-playing, and for which there
is no standardised set of piece values.
An original method to automatically set and adjust the major control parameters used
by TD(λ) is presented. The main performance advantage comes from the learning
rate adjustment, which is based on a new concept called temporal coherence.
Experiments in both chess and a random-walk domain show that the temporal
coherence algorithm produces both faster learning and more stable values than both
human-chosen parameters and an earlier method for learning rate adjustment.
The methods presented in this thesis allow programs to learn with as little input of
external knowledge as possible, exploring the domain on their own rather than by
being taught. Further experiments show that the method is capable of handling many
hundreds of weights, and that it is not necessary to perform deep searches during the
learning phase in order to learn effective weights.
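The temporal-coherence idea in the abstract can be sketched concretely: for each weight, track the signed sum and the absolute sum of its recent updates; their ratio stays near one while updates consistently push in one direction and decays toward zero as they oscillate, and it can serve directly as a per-weight learning rate. A minimal sketch under that reading (a sketch of the idea, not necessarily the thesis's exact algorithm; names are illustrative):

```python
class TemporalCoherenceRate:
    """Per-weight adaptive learning rate based on temporal coherence:
    alpha_i = |sum of updates| / sum of |updates|, which is 1.0 while
    updates for weight i consistently agree in sign and falls toward
    0.0 as they oscillate."""

    def __init__(self, n_weights):
        self.net = [0.0] * n_weights   # signed sum of proposed updates
        self.tot = [0.0] * n_weights   # absolute sum of proposed updates

    def rate(self, i):
        # Full learning rate until we have seen any update for weight i.
        return abs(self.net[i]) / self.tot[i] if self.tot[i] > 0 else 1.0

    def apply(self, weights, i, delta):
        """Scale the raw update delta by the coherence-based rate."""
        weights[i] += self.rate(i) * delta
        self.net[i] += delta
        self.tot[i] += abs(delta)
        return weights[i]
```

Oscillating updates (e.g. +1, -1, +1, ...) drive the rate down automatically, which is one plausible source of the stability the abstract reports relative to hand-chosen fixed rates.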
Policy Improvement in Cribbage
Cribbage is a card game involving multiple methods of scoring, each of which receives varying emphasis over the course of a typical game. Reinforcement learning is a machine learning approach in which an agent learns to accomplish a task via direct experience, collecting rewards based on performance. In this thesis, reinforcement learning is applied to the game of cribbage, improving an agent’s policy of combining multiple basic strategies according to the needs of the dynamic state of the game. From inspection, the agent learns a reasonable policy over the course of a million games, but an increase in performance was not demonstrated.
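The abstract does not give the learning setup; one plausible shape for "combining multiple basic strategies according to the state of the game" is a tabular value learner whose actions are the basic strategies, indexed by a coarse state. A hypothetical sketch (the strategy names and state abstraction are invented for illustration, not taken from the thesis):

```python
import random
from collections import defaultdict

# Hypothetical sketch: learn which basic cribbage strategy to follow in each
# coarse game state, from per-hand rewards. Strategy names are invented.
STRATEGIES = ["maximize_hand", "maximize_crib", "defensive_pegging"]

class StrategySelector:
    def __init__(self, epsilon=0.1, alpha=0.05):
        self.q = defaultdict(float)          # (state, strategy) -> value
        self.epsilon, self.alpha = epsilon, alpha

    def choose(self, state):
        # Epsilon-greedy over the fixed menu of basic strategies.
        if random.random() < self.epsilon:
            return random.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda s: self.q[(state, s)])

    def update(self, state, strategy, reward):
        # Incremental move of the estimate toward the observed reward.
        key = (state, strategy)
        self.q[key] += self.alpha * (reward - self.q[key])
```

Training then amounts to playing hands, mapping each to a coarse state (e.g. score margin and game phase), and feeding the per-hand score differential back through update().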
M2ICAL: A technique for analyzing imperfect comparison algorithms using Markov chains
PhD thesis.
- …