174 research outputs found
Functions that Emerge through End-to-End Reinforcement Learning - The Direction for Artificial General Intelligence -
Recently, triggered by Google DeepMind's impressive results in video games and
the game of Go, end-to-end reinforcement learning (RL) has been attracting
attention. Although this is not widely known, the author's group has advocated
this framework for around 20 years and has already shown various functions that
emerge in a neural network (NN) through RL. In this paper, these results are
revisited.
"Function Modularization" approach is deeply penetrated subconsciously. The
inputs and outputs for a learning system can be raw sensor signals and motor
commands. "State space" or "action space" generally used in RL show the
existence of functional modules. That has limited reinforcement learning to
learning only for the action-planning module. In order to extend reinforcement
learning to learning of the entire function on a huge degree of freedom of a
massively parallel learning system and to explain or develop human-like
intelligence, the author has believed that end-to-end RL from sensors to motors
using a recurrent NN (RNN) becomes an essential key. Especially in the higher
functions, this approach is very effective by being free from the need to
decide their inputs and outputs.
The functions whose emergence we have confirmed through RL using an NN cover a
broad range, from real-robot learning with raw camera-pixel inputs to the
acquisition of dynamic functions in an RNN. They are: (1) image recognition,
(2) color constancy (optical illusion), (3) sensor motion (active recognition),
(4) hand-eye coordination and hand-reaching movement, (5) explanation of brain
activities, (6) communication, (7) knowledge transfer, (8) memory, (9) selective
attention, (10) prediction, and (11) exploration. End-to-end RL enables the
emergence of very flexible, comprehensive functions that consider many things in
parallel, although it is difficult to draw a clear boundary around each function.
Comment: The Multi-disciplinary Conference on Reinforcement Learning and
Decision Making (RLDM) 2017, 5 pages, 4 figures
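As an illustration of what "end-to-end" means here, the minimal sketch below wires raw sensor input directly to motor commands through a recurrent network, with no hand-designed state or action space in between. It is a sketch under assumed names and sizes (the GRU cell, layer widths, and dummy frames are all illustrative), not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Map raw sensor frames directly to motor commands through an RNN,
    with no hand-designed state or action space in between."""
    def __init__(self, n_pixels=64, n_hidden=32, n_motors=2):
        super().__init__()
        self.rnn = nn.GRUCell(n_pixels, n_hidden)    # internal state carries memory
        self.motor = nn.Linear(n_hidden, n_motors)

    def forward(self, frame, h):
        h = self.rnn(frame, h)
        return torch.tanh(self.motor(h)), h          # motor commands in [-1, 1]

policy = RecurrentPolicy()
h = torch.zeros(1, 32)                               # initial hidden state
frame = torch.rand(1, 64)                            # stand-in for raw camera pixels
for t in range(10):
    action, h = policy(frame, h)
    frame = torch.rand(1, 64)                        # a real loop would step the world here
```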
Self-configuration from a Machine-Learning Perspective
The goal of machine learning is to provide solutions which are trained by
data or by experience coming from the environment. Many training algorithms
exist and some brilliant successes have been achieved. But even in structured
environments for machine learning (e.g. data mining or board games), most
applications beyond the level of toy problems need careful hand-tuning, human
ingenuity (i.e. detection of interesting patterns), or both. We discuss several
aspects of how self-configuration can help to alleviate these problems. One
aspect is self-configuration by tuning of algorithms, where recent advances
have been made in the area of SPO (Sequential Parameter Optimization). Another
aspect is self-configuration by pattern detection or feature construction.
Forming multiple features (e.g. random boolean functions) and using algorithms
(e.g. random forests) which easily digest many features can greatly increase
learning speed. However, a full-fledged theory of feature construction is not
yet available and forms a current barrier in machine learning. We discuss
several ideas for the systematic inclusion of feature construction. This may
lead to partly self-configuring machine learning solutions which show
robustness, flexibility, and fast learning in potentially changing environments.
Comment: 12 pages, 5 figures, Dagstuhl seminar 11181 "Organic Computing -
Design of Self-Organizing Systems", May 2011
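To make the feature-construction point concrete, here is a minimal sketch of the pattern described above: form many random boolean features and let a random forest digest them. The dataset, the number of features, and the conjunction-of-two-inputs form are illustrative assumptions, not the seminar's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20))           # raw binary inputs
y = (X[:, 0] ^ X[:, 3]) & X[:, 7]                # hidden pattern to be discovered

# Feature construction: many random boolean conjunctions of input pairs.
idx = rng.integers(0, X.shape[1], size=(100, 2))
feats = X[:, idx[:, 0]] & X[:, idx[:, 1]]        # 100 constructed features

# Random forests digest many weak features cheaply.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(np.hstack([X, feats]), y)
print(clf.score(np.hstack([X, feats]), y))
```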
DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess
We present an end-to-end learning method for chess, relying on deep neural
networks. Without any a priori knowledge, in particular without any knowledge
regarding the rules of chess, a deep neural network is trained using a
combination of unsupervised pretraining and supervised training. The
unsupervised training extracts high-level features from a given position, and
the supervised training learns to compare two chess positions and select the
more favorable one. The training relies entirely on datasets of several million
chess games, and no further domain specific knowledge is incorporated.
The experiments show that the resulting neural network (referred to as
DeepChess) is on a par with state-of-the-art chess playing programs, which have
been developed through many years of manual feature selection and tuning.
DeepChess is the first end-to-end machine-learning method that achieves
grandmaster-level chess-playing performance.
Comment: Winner of Best Paper Award in ICANN 2016
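The two-stage scheme reads naturally as a siamese comparator: a shared feature extractor (pretrained without supervision in DeepChess) feeds a head that decides which of two positions is preferable. The sketch below assumes the 773-bit board encoding reported in the paper but uses illustrative layer sizes; it is a structural sketch, not the trained DeepChess network.

```python
import torch
import torch.nn as nn

class Pos2Vec(nn.Module):
    """Feature extractor; in DeepChess this part is pretrained as a deep
    autoencoder on millions of positions (layer sizes here are illustrative)."""
    def __init__(self, n_in=773, n_feat=100):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, 600), nn.ReLU(),
                                 nn.Linear(600, n_feat), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class Comparator(nn.Module):
    """Siamese head: given two positions, predict which one is preferable."""
    def __init__(self, extractor, n_feat=100):
        super().__init__()
        self.extractor = extractor
        self.head = nn.Sequential(nn.Linear(2 * n_feat, 100), nn.ReLU(),
                                  nn.Linear(100, 2))   # [left better, right better]
    def forward(self, a, b):
        return self.head(torch.cat([self.extractor(a), self.extractor(b)], dim=1))

model = Comparator(Pos2Vec())
a, b = torch.rand(8, 773), torch.rand(8, 773)          # stand-ins for 773-bit boards
logits = model(a, b)                                   # train with cross-entropy
```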
Player co-modelling in a strategy board game: discovering how to play fast
In this paper we experiment with a two-player strategy board game in which
playing models are evolved using reinforcement learning and neural networks.
The models are evolved to speed up automatic game development through human
involvement of varying levels of sophistication and density, compared with
fully autonomous play. The experimental results suggest a clear and measurable
association between the ability to win games and the ability to do so quickly,
while at the same time demonstrating that there is a minimum level of human
involvement beyond which no learning really occurs.
Comment: Contains 19 pages, 6 figures, 7 tables. Submitted to a journal
Evolution of Neural Networks to Play the Game of Dots-and-Boxes
Dots-and-Boxes is a child's game which remains analytically unsolved. We
implement and evolve artificial neural networks to play this game, evaluating
them against simple heuristic players. Our networks do not evaluate or predict
the final outcome of the game, but rather recommend moves at each stage.
Superior generalisation of play by co-evolved populations is found, and a
comparison is made with networks trained by back-propagation using simple
heuristics as an oracle.
Comment: 8 pages, 5 figures, LaTeX 2.09 (works with LaTeX2e)
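A minimal sketch of the move-recommendation idea, scoring candidate successor positions directly rather than predicting the final outcome: the edge-vector board encoding, the single-layer scorer, and the toy call are illustrative assumptions; the paper's networks were evolved and evaluated by playing games.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EDGES = 24                                     # edges of a small Dots-and-Boxes grid

def board_after(board, move):
    nxt = board.copy()
    nxt[move] = 1.0                              # drawing one edge
    return nxt

def recommend(weights, board, legal_moves):
    """Score each candidate successor with a tiny network and pick the best.
    The network recommends a move; it never predicts the final game outcome."""
    W, b = weights
    scores = [np.tanh(board_after(board, m) @ W + b) for m in legal_moves]
    return legal_moves[int(np.argmax(scores))]

# Under (co-)evolution, W and b form the genome; fitness comes from played games.
weights = (rng.normal(size=N_EDGES), 0.0)
board = np.zeros(N_EDGES)
legal = [i for i in range(N_EDGES) if board[i] == 0]
print(recommend(weights, board, legal))
```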
Simulating Human Grandmasters: Evolution and Coevolution of Evaluation Functions
This paper demonstrates the use of genetic algorithms for evolving a
grandmaster-level evaluation function for a chess program. This is achieved by
combining supervised and unsupervised learning. In the supervised learning
phase the organisms are evolved to mimic the behavior of human grandmasters,
and in the unsupervised learning phase these evolved organisms are further
improved upon by means of coevolution.
While past attempts succeeded in creating a grandmaster-level program by
mimicking the behavior of existing computer chess programs, this paper presents
the first successful attempt at evolving a state-of-the-art evaluation function
by learning only from databases of games played by humans. Our results
demonstrate that the evolved program outperforms a two-time World Computer
Chess Champion.
Comment: arXiv admin note: substantial text overlap with arXiv:1711.06839, arXiv:1711.0684
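A minimal sketch of the supervised (mimicry) phase under illustrative assumptions: a candidate evaluation's fitness is the fraction of recorded decisions on which it ranks the grandmaster's actual move highest. The linear evaluation, feature sizes, and toy data are not the paper's actual genome or feature set.

```python
import numpy as np

rng = np.random.default_rng(0)

def mimicry_fitness(params, grandmaster_moves, legal_successors):
    """Supervised phase: fitness is the fraction of recorded decisions on which
    the evolved linear evaluation prefers the grandmaster's actual move."""
    hits = 0
    for gm_idx, feats in zip(grandmaster_moves, legal_successors):
        # feats holds one feature vector per legal successor position
        if int(np.argmax(feats @ params)) == gm_idx:
            hits += 1
    return hits / len(grandmaster_moves)

# Toy data: 5 decisions, each with 4 candidate successors of 8 features each.
succ = [rng.normal(size=(4, 8)) for _ in range(5)]
gm = [int(rng.integers(0, 4)) for _ in range(5)]
params = rng.normal(size=8)
print(mimicry_fitness(params, gm, succ))
```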
Genetic Algorithms for Mentor-Assisted Evaluation Function Optimization
In this paper we demonstrate how genetic algorithms can be used to reverse
engineer an evaluation function's parameters for computer chess. Our results
show that using an appropriate mentor, we can evolve a program that is on par
with top tournament-playing chess programs, outperforming a two-time World
Computer Chess Champion. This performance gain is achieved by evolving a
program with a smaller number of parameters in its evaluation function to mimic
the behavior of a superior mentor which uses a more extensive evaluation
function. In principle, our mentor-assisted approach could be used in a wide
range of problems for which appropriate mentors are available.
Comment: Winner of Best Paper Award in GECCO 2008. arXiv admin note: substantial text overlap with arXiv:1711.06840, arXiv:1711.0684
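One way to picture mentor-assisted optimization, as a hedged sketch: the fitness of a small candidate evaluation is its agreement with the mentor's scores, and a plain selection-and-mutation loop drives the population toward the mentor. The linear evaluation, population size, mutation scale, and synthetic mentor are all illustrative assumptions, not the paper's GA.

```python
import numpy as np

rng = np.random.default_rng(1)

def mentor_fitness(params, positions, mentor_scores):
    """Negative squared error of a small linear evaluation against the
    scores assigned by a stronger mentor program."""
    return -float(np.mean((positions @ params - mentor_scores) ** 2))

positions = rng.normal(size=(200, 10))             # 10 features per position
mentor_scores = positions @ rng.normal(size=10)    # synthetic stand-in for the mentor
pop = [rng.normal(size=10) for _ in range(20)]     # candidate parameter vectors
for _ in range(50):                                # selection + mutation only
    pop.sort(key=lambda p: mentor_fitness(p, positions, mentor_scores), reverse=True)
    pop = pop[:10] + [p + 0.1 * rng.normal(size=10) for p in pop[:10]]
best = pop[0]
print(mentor_fitness(best, positions, mentor_scores))
```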
Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems
In a Role-Playing Game, finding optimal trajectories is one of the most
important tasks. In fact, the strategy decision system becomes a key component
of a game engine. The way in which decisions are taken (online, batch, or
simulated) and the resources consumed in decision making (e.g. execution time,
memory) influence game performance to a major degree. When classical search
algorithms such as A* can be used, they are the first option. Nevertheless,
such methods rely on precise and complete models of the search space, and there
are many interesting scenarios where their application is not possible. In such
cases, model-free methods for sequential decision making under uncertainty are
the best choice. In this paper, we propose a heuristic planning strategy that
incorporates the heuristic-search ability of path-finding into a Dyna agent.
The proposed Dyna-H algorithm, as A* does, selects the branches most likely to
produce good outcomes, while retaining the advantages of a model-free online
reinforcement learning algorithm. The proposal was evaluated against the
one-step Q-learning and Dyna-Q algorithms with excellent experimental results:
Dyna-H significantly outperforms both methods in all experiments. We also
suggest a functional analogy between the proposed worst-trajectory-sampling
heuristic and the role of dreams (e.g. nightmares) in human behavior.
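A compact sketch of the Dyna-H idea under one plausible reading: keep a learned model of visited transitions and, in the planning sweep, replay the transitions whose successor states a heuristic ranks worst (farthest from the goal) instead of Dyna-Q's uniform random replay. The toy corridor environment, the tie-breaking, and all constants are illustrative assumptions, not the paper's RPG test bed.

```python
import random

class Corridor:
    """Toy 1-D world: start at 0, goal at N; heuristic = distance to goal."""
    N = 10
    def reset(self): self.s = 0; return self.s
    def actions(self, s): return (-1, +1)
    def sample_action(self): return random.choice((-1, +1))
    def heuristic(self, s): return self.N - s              # larger = worse
    def step(self, a):
        self.s = max(0, min(self.N, self.s + a))
        done = self.s == self.N
        return self.s, (1.0 if done else 0.0), done

def dyna_h(env, episodes=50, max_steps=200, n_planning=5,
           alpha=0.5, gamma=0.95, eps=0.1):
    Q, model = {}, {}
    for _ in range(episodes):
        s, done, steps = env.reset(), False, 0
        while not done and steps < max_steps:
            steps += 1
            a = (env.sample_action() if random.random() < eps
                 else max(env.actions(s),                   # greedy, random tie-break
                          key=lambda x: (Q.get((s, x), 0.0), random.random())))
            s2, r, done = env.step(a)
            q = Q.get((s, a), 0.0)
            best = max(Q.get((s2, b), 0.0) for b in env.actions(s2))
            Q[(s, a)] = q + alpha * (r + gamma * best - q)  # direct RL update
            model[(s, a)] = (s2, r)                         # learned model
            # Planning sweep: replay transitions whose successors the heuristic
            # ranks worst, instead of Dyna-Q's uniform random replay.
            ranked = sorted(model, key=lambda sa: env.heuristic(model[sa][0]),
                            reverse=True)
            for ps, pa in ranked[:n_planning]:
                ns, nr = model[(ps, pa)]
                pq = Q.get((ps, pa), 0.0)
                nb = max(Q.get((ns, b), 0.0) for b in env.actions(ns))
                Q[(ps, pa)] = pq + alpha * (nr + gamma * nb - pq)
            s = s2
    return Q

Q = dyna_h(Corridor())
```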
A Methodology for Learning Players' Styles from Game Records
We describe a preliminary investigation into learning a Chess player's style
from game records. The method is based on attempting to learn features of a
player's individual evaluation function using the method of temporal
differences, with the aid of a conventional Chess engine architecture. Some
encouraging results were obtained in learning the styles of two recent Chess
world champions, and we report on our attempt to use the learnt styles to
discriminate between the players from game records by trying to detect who was
playing white and who was playing black. We also discuss some limitations of
our approach and propose possible directions for future research. The method we
have presented may also be applicable to other strategic games, and may even be
generalisable to other domains where sequences of agents' actions are recorded.
Comment: 15 pages, 3 figures
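The core mechanism, learning evaluation weights from a game record by temporal differences, can be sketched generically as follows. The sigmoid value function, TD(0) in place of a fuller TD(lambda) scheme, and the random toy features are illustrative assumptions; the paper couples its updates to a conventional Chess engine's search.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def td_update_game(w, positions, result, alpha=0.01):
    """One TD(0) pass over a single recorded game.
    positions: feature vectors of successive positions from one player's view;
    result: 1.0 win, 0.5 draw, 0.0 loss (the target for the final position)."""
    for t in range(len(positions)):
        v = sigmoid(positions[t] @ w)
        target = result if t == len(positions) - 1 else sigmoid(positions[t + 1] @ w)
        # Move the current estimate toward the next position's estimate
        # (chain rule through the sigmoid gives the v * (1 - v) factor).
        w = w + alpha * (target - v) * v * (1 - v) * positions[t]
    return w

rng = np.random.default_rng(0)
w = np.zeros(8)
game = [rng.normal(size=8) for _ in range(40)]     # toy feature vectors
w = td_update_game(w, game, result=1.0)
print(w)
```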
Probabilistic Exploration in Planning while Learning
Sequential decision tasks with incomplete information are characterized by
the exploration problem; namely the trade-off between further exploration for
learning more about the environment and immediate exploitation of the accrued
information for decision-making. Within artificial intelligence, there has been
an increasing interest in studying planning-while-learning algorithms for these
decision tasks. In this paper we focus on the exploration problem in
reinforcement learning and Q-learning in particular. The existing exploration
strategies for Q-learning are of a heuristic nature and they exhibit limited
scalability in tasks with large (or infinite) state and action spaces.
Efficient experimentation is needed for resolving uncertainties when possible
plans are compared (i.e. exploration). The experimentation should be sufficient
for selecting with statistical significance a locally optimal plan (i.e.
exploitation). For this purpose, we develop a probabilistic hill-climbing
algorithm that uses a statistical selection procedure to decide how much
exploration is needed for selecting a plan which is, with arbitrarily high
probability, arbitrarily close to a locally optimal one. Due to its generality
the algorithm can be employed for the exploration strategy of robust
Q-learning. An experiment on a relatively complex control task shows that the
proposed exploration strategy performs better than a typical exploration
strategy.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995)
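The statistical-selection idea, sampling candidate plans only until one can be declared better with high confidence, can be sketched with a Hoeffding-style stopping rule. This is a generic sketch assuming returns bounded in [0, 1], not the paper's exact selection procedure; the lambda rollouts are toy stand-ins.

```python
import math, random

def select_plan(rollout_a, rollout_b, eps=0.05, delta=0.05, max_n=100000):
    """Sample returns of two candidate plans until a Hoeffding bound says one
    mean is higher with probability >= 1 - delta (assumes returns in [0, 1])."""
    sa = sb = 0.0
    for n in range(1, max_n + 1):
        sa += rollout_a()
        sb += rollout_b()
        bound = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # per-mean radius
        if abs(sa - sb) / n > 2.0 * bound + eps:              # means separated
            return ("a" if sa > sb else "b"), n
    return ("a" if sa >= sb else "b"), max_n                  # fall back on best mean

# Toy rollouts: plan "a" is better on average (mean 0.5 vs. 0.3).
pick, n = select_plan(lambda: 0.2 + 0.6 * random.random(),
                      lambda: 0.6 * random.random())
print(pick, "after", n, "paired samples")
```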