174 research outputs found
Functions that Emerge through End-to-End Reinforcement Learning - The Direction for Artificial General Intelligence -
Recently, triggered by Google DeepMind's impressive results in video games and
the game of Go, end-to-end reinforcement learning (RL) has been attracting
attention. Although this is not widely known, the author's group has advocated
this framework for around 20 years and has already shown various functions that
emerge in a neural network (NN) through RL. In this paper, these results are
revisited.
"Function Modularization" approach is deeply penetrated subconsciously. The
inputs and outputs for a learning system can be raw sensor signals and motor
commands. "State space" or "action space" generally used in RL show the
existence of functional modules. That has limited reinforcement learning to
learning only for the action-planning module. In order to extend reinforcement
learning to learning of the entire function on a huge degree of freedom of a
massively parallel learning system and to explain or develop human-like
intelligence, the author has believed that end-to-end RL from sensors to motors
using a recurrent NN (RNN) becomes an essential key. Especially in the higher
functions, this approach is very effective by being free from the need to
decide their inputs and outputs.
The functions whose emergence we have confirmed through RL using an NN cover a
broad range, from real-robot learning with raw camera-pixel inputs to the
acquisition of dynamic functions in an RNN. They are: (1) image recognition,
(2) color constancy (optical illusion), (3) sensor motion (active recognition),
(4) hand-eye coordination and hand-reaching movement, (5) explanation of brain
activities, (6) communication, (7) knowledge transfer, (8) memory, (9) selective
attention, (10) prediction, and (11) exploration. End-to-end RL enables the
emergence of very flexible, comprehensive functions that consider many things in
parallel, although it is difficult to draw a clear boundary around each function.
Comment: The Multi-disciplinary Conference on Reinforcement Learning and
Decision Making (RLDM) 2017, 5 pages, 4 figures
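As an illustration of what "end-to-end" means here, the minimal sketch below wires raw sensor input directly to motor commands through a recurrent network, with no hand-designed state or action space in between. It is a sketch under assumed names and sizes (the GRU cell, layer widths, and dummy frames are all illustrative), not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Map raw sensor frames directly to motor commands through an RNN,
    with no hand-designed state or action space in between."""
    def __init__(self, n_pixels=64, n_hidden=32, n_motors=2):
        super().__init__()
        self.rnn = nn.GRUCell(n_pixels, n_hidden)    # internal state carries memory
        self.motor = nn.Linear(n_hidden, n_motors)

    def forward(self, frame, h):
        h = self.rnn(frame, h)
        return torch.tanh(self.motor(h)), h          # motor commands in [-1, 1]

policy = RecurrentPolicy()
h = torch.zeros(1, 32)                               # initial hidden state
frame = torch.rand(1, 64)                            # stand-in for raw camera pixels
for t in range(10):
    action, h = policy(frame, h)
    frame = torch.rand(1, 64)                        # a real loop would step the world here
```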
Self-configuration from a Machine-Learning Perspective
The goal of machine learning is to provide solutions which are trained by
data or by experience coming from the environment. Many training algorithms
exist and some brilliant successes have been achieved. But even in structured
environments for machine learning (e.g. data mining or board games), most
applications beyond the level of toy problems need careful hand-tuning, human
ingenuity (i.e. detection of interesting patterns), or both. We discuss several
aspects of how self-configuration can help to alleviate these problems. One
aspect is self-configuration by tuning of algorithms, where recent advances
have been made in the area of SPO (Sequential Parameter Optimization). Another
aspect is self-configuration by pattern detection or feature construction.
Forming multiple features (e.g. random boolean functions) and using algorithms
(e.g. random forests) which easily digest many features can greatly increase
learning speed. However, a full-fledged theory of feature construction is not
yet available and forms a current barrier in machine learning. We discuss
several ideas for the systematic inclusion of feature construction. This may
lead to partly self-configuring machine learning solutions which show
robustness, flexibility, and fast learning in potentially changing environments.
Comment: 12 pages, 5 figures, Dagstuhl seminar 11181 "Organic Computing -
Design of Self-Organizing Systems", May 2011
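To make the feature-construction point concrete, here is a minimal sketch of the pattern described above: form many random boolean features and let a random forest digest them. The dataset, the number of features, and the conjunction-of-two-inputs form are illustrative assumptions, not the seminar's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20))           # raw binary inputs
y = (X[:, 0] ^ X[:, 3]) & X[:, 7]                # hidden pattern to be discovered

# Feature construction: many random boolean conjunctions of input pairs.
idx = rng.integers(0, X.shape[1], size=(100, 2))
feats = X[:, idx[:, 0]] & X[:, idx[:, 1]]        # 100 constructed features

# Random forests digest many weak features cheaply.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(np.hstack([X, feats]), y)
print(clf.score(np.hstack([X, feats]), y))
```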
DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess
We present an end-to-end learning method for chess, relying on deep neural
networks. Without any a priori knowledge, in particular without any knowledge
regarding the rules of chess, a deep neural network is trained using a
combination of unsupervised pretraining and supervised training. The
unsupervised training extracts high-level features from a given position, and
the supervised training learns to compare two chess positions and select the
more favorable one. The training relies entirely on datasets of several million
chess games, and no further domain specific knowledge is incorporated.
The experiments show that the resulting neural network (referred to as
DeepChess) is on a par with state-of-the-art chess playing programs, which have
been developed through many years of manual feature selection and tuning.
DeepChess is the first end-to-end machine-learning method that achieves
grandmaster-level chess-playing performance.
Comment: Winner of Best Paper Award in ICANN 2016
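The two-stage scheme reads naturally as a siamese comparator: a shared feature extractor (pretrained without supervision in DeepChess) feeds a head that decides which of two positions is preferable. The sketch below assumes the 773-bit board encoding reported in the paper but uses illustrative layer sizes; it is a structural sketch, not the trained DeepChess network.

```python
import torch
import torch.nn as nn

class Pos2Vec(nn.Module):
    """Feature extractor; in DeepChess this part is pretrained as a deep
    autoencoder on millions of positions (layer sizes here are illustrative)."""
    def __init__(self, n_in=773, n_feat=100):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, 600), nn.ReLU(),
                                 nn.Linear(600, n_feat), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class Comparator(nn.Module):
    """Siamese head: given two positions, predict which one is preferable."""
    def __init__(self, extractor, n_feat=100):
        super().__init__()
        self.extractor = extractor
        self.head = nn.Sequential(nn.Linear(2 * n_feat, 100), nn.ReLU(),
                                  nn.Linear(100, 2))   # [left better, right better]
    def forward(self, a, b):
        return self.head(torch.cat([self.extractor(a), self.extractor(b)], dim=1))

model = Comparator(Pos2Vec())
a, b = torch.rand(8, 773), torch.rand(8, 773)          # stand-ins for 773-bit boards
logits = model(a, b)                                   # train with cross-entropy
```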
Player co-modelling in a strategy board game: discovering how to play fast
In this paper we experiment with a two-player strategy board game in which
playing models are evolved using reinforcement learning and neural networks.
The models are evolved to speed up automatic game development through human
involvement of varying levels of sophistication and density, compared with
fully autonomous play. The experimental results suggest a clear and measurable
association between the ability to win games and the ability to do so quickly,
while at the same time demonstrating that there is a minimum level of human
involvement beyond which no learning really occurs.
Comment: Contains 19 pages, 6 figures, 7 tables. Submitted to a journal
Evolution of Neural Networks to Play the Game of Dots-and-Boxes
Dots-and-Boxes is a child's game which remains analytically unsolved. We
implement and evolve artificial neural networks to play this game, evaluating
them against simple heuristic players. Our networks do not evaluate or predict
the final outcome of the game, but rather recommend moves at each stage.
Superior generalisation of play by co-evolved populations is found, and a
comparison is made with networks trained by back-propagation using simple
heuristics as an oracle.
Comment: 8 pages, 5 figures, LaTeX 2.09 (works with LaTeX2e)
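A minimal sketch of the move-recommendation idea, scoring candidate successor positions directly rather than predicting the final outcome: the edge-vector board encoding, the single-layer scorer, and the toy call are illustrative assumptions; the paper's networks were evolved and evaluated by playing games.

```python
import numpy as np

rng = np.random.default_rng(0)
N_EDGES = 24                                     # edges of a small Dots-and-Boxes grid

def board_after(board, move):
    nxt = board.copy()
    nxt[move] = 1.0                              # drawing one edge
    return nxt

def recommend(weights, board, legal_moves):
    """Score each candidate successor with a tiny network and pick the best.
    The network recommends a move; it never predicts the final game outcome."""
    W, b = weights
    scores = [np.tanh(board_after(board, m) @ W + b) for m in legal_moves]
    return legal_moves[int(np.argmax(scores))]

# Under (co-)evolution, W and b form the genome; fitness comes from played games.
weights = (rng.normal(size=N_EDGES), 0.0)
board = np.zeros(N_EDGES)
legal = [i for i in range(N_EDGES) if board[i] == 0]
print(recommend(weights, board, legal))
```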
Simulating Human Grandmasters: Evolution and Coevolution of Evaluation Functions
This paper demonstrates the use of genetic algorithms for evolving a
grandmaster-level evaluation function for a chess program. This is achieved by
combining supervised and unsupervised learning. In the supervised learning
phase the organisms are evolved to mimic the behavior of human grandmasters,
and in the unsupervised learning phase these evolved organisms are further
improved upon by means of coevolution.
While past attempts succeeded in creating a grandmaster-level program by
mimicking the behavior of existing computer chess programs, this paper presents
the first successful attempt at evolving a state-of-the-art evaluation function
by learning only from databases of games played by humans. Our results
demonstrate that the evolved program outperforms a two-time World Computer
Chess Champion.
Comment: arXiv admin note: substantial text overlap with arXiv:1711.06839, arXiv:1711.0684
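A minimal sketch of the supervised (mimicry) phase under illustrative assumptions: a candidate evaluation's fitness is the fraction of recorded decisions on which it ranks the grandmaster's actual move highest. The linear evaluation, feature sizes, and toy data are not the paper's actual genome or feature set.

```python
import numpy as np

rng = np.random.default_rng(0)

def mimicry_fitness(params, grandmaster_moves, legal_successors):
    """Supervised phase: fitness is the fraction of recorded decisions on which
    the evolved linear evaluation prefers the grandmaster's actual move."""
    hits = 0
    for gm_idx, feats in zip(grandmaster_moves, legal_successors):
        # feats holds one feature vector per legal successor position
        if int(np.argmax(feats @ params)) == gm_idx:
            hits += 1
    return hits / len(grandmaster_moves)

# Toy data: 5 decisions, each with 4 candidate successors of 8 features each.
succ = [rng.normal(size=(4, 8)) for _ in range(5)]
gm = [int(rng.integers(0, 4)) for _ in range(5)]
params = rng.normal(size=8)
print(mimicry_fitness(params, gm, succ))
```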
Genetic Algorithms for Mentor-Assisted Evaluation Function Optimization
In this paper we demonstrate how genetic algorithms can be used to reverse
engineer an evaluation function's parameters for computer chess. Our results
show that using an appropriate mentor, we can evolve a program that is on par
with top tournament-playing chess programs, outperforming a two-time World
Computer Chess Champion. This performance gain is achieved by evolving a
program with a smaller number of parameters in its evaluation function to mimic
the behavior of a superior mentor which uses a more extensive evaluation
function. In principle, our mentor-assisted approach could be used in a wide
range of problems for which appropriate mentors are available.
Comment: Winner of Best Paper Award in GECCO 2008. arXiv admin note: substantial text overlap with arXiv:1711.06840, arXiv:1711.0684
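One way to picture mentor-assisted optimization, as a hedged sketch: the fitness of a small candidate evaluation is its agreement with the mentor's scores, and a plain selection-and-mutation loop drives the population toward the mentor. The linear evaluation, population size, mutation scale, and synthetic mentor are all illustrative assumptions, not the paper's GA.

```python
import numpy as np

rng = np.random.default_rng(1)

def mentor_fitness(params, positions, mentor_scores):
    """Negative squared error of a small linear evaluation against the
    scores assigned by a stronger mentor program."""
    return -float(np.mean((positions @ params - mentor_scores) ** 2))

positions = rng.normal(size=(200, 10))             # 10 features per position
mentor_scores = positions @ rng.normal(size=10)    # synthetic stand-in for the mentor
pop = [rng.normal(size=10) for _ in range(20)]     # candidate parameter vectors
for _ in range(50):                                # selection + mutation only
    pop.sort(key=lambda p: mentor_fitness(p, positions, mentor_scores), reverse=True)
    pop = pop[:10] + [p + 0.1 * rng.normal(size=10) for p in pop[:10]]
best = pop[0]
print(mentor_fitness(best, positions, mentor_scores))
```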
Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems
In a Role-Playing Game, finding optimal trajectories is one of the most
important tasks. In fact, the strategy decision system becomes a key component
of a game engine. The way in which decisions are taken (online, batch, or
simulated) and the resources consumed in decision making (e.g. execution time,
memory) influence game performance to a major degree. When classical search
algorithms such as A* can be used, they are the first option. Nevertheless,
such methods rely on precise and complete models of the search space, and there
are many interesting scenarios where their application is not possible. In such
cases, model-free methods for sequential decision making under uncertainty are
the best choice. In this paper, we propose a heuristic planning strategy that
incorporates the heuristic-search ability of path-finding into a Dyna agent.
The proposed Dyna-H algorithm, as A* does, selects the branches most likely to
produce good outcomes, while retaining the advantages of a model-free online
reinforcement learning algorithm. The proposal was evaluated against the
one-step Q-learning and Dyna-Q algorithms with excellent experimental results:
Dyna-H significantly outperforms both methods in all experiments. We also
suggest a functional analogy between the proposed worst-trajectory-sampling
heuristic and the role of dreams (e.g. nightmares) in human behavior.
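A compact sketch of the Dyna-H idea under one plausible reading: keep a learned model of visited transitions and, in the planning sweep, replay the transitions whose successor states a heuristic ranks worst (farthest from the goal) instead of Dyna-Q's uniform random replay. The toy corridor environment, the tie-breaking, and all constants are illustrative assumptions, not the paper's RPG test bed.

```python
import random

class Corridor:
    """Toy 1-D world: start at 0, goal at N; heuristic = distance to goal."""
    N = 10
    def reset(self): self.s = 0; return self.s
    def actions(self, s): return (-1, +1)
    def sample_action(self): return random.choice((-1, +1))
    def heuristic(self, s): return self.N - s              # larger = worse
    def step(self, a):
        self.s = max(0, min(self.N, self.s + a))
        done = self.s == self.N
        return self.s, (1.0 if done else 0.0), done

def dyna_h(env, episodes=50, max_steps=200, n_planning=5,
           alpha=0.5, gamma=0.95, eps=0.1):
    Q, model = {}, {}
    for _ in range(episodes):
        s, done, steps = env.reset(), False, 0
        while not done and steps < max_steps:
            steps += 1
            a = (env.sample_action() if random.random() < eps
                 else max(env.actions(s),                   # greedy, random tie-break
                          key=lambda x: (Q.get((s, x), 0.0), random.random())))
            s2, r, done = env.step(a)
            q = Q.get((s, a), 0.0)
            best = max(Q.get((s2, b), 0.0) for b in env.actions(s2))
            Q[(s, a)] = q + alpha * (r + gamma * best - q)  # direct RL update
            model[(s, a)] = (s2, r)                         # learned model
            # Planning sweep: replay transitions whose successors the heuristic
            # ranks worst, instead of Dyna-Q's uniform random replay.
            ranked = sorted(model, key=lambda sa: env.heuristic(model[sa][0]),
                            reverse=True)
            for ps, pa in ranked[:n_planning]:
                ns, nr = model[(ps, pa)]
                pq = Q.get((ps, pa), 0.0)
                nb = max(Q.get((ns, b), 0.0) for b in env.actions(ns))
                Q[(ps, pa)] = pq + alpha * (nr + gamma * nb - pq)
            s = s2
    return Q

Q = dyna_h(Corridor())
```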
A Methodology for Learning Players' Styles from Game Records
We describe a preliminary investigation into learning a Chess player's style
from game records. The method is based on attempting to learn features of a
player's individual evaluation function using the method of temporal
differences, with the aid of a conventional Chess engine architecture. Some
encouraging results were obtained in learning the styles of two recent Chess
world champions, and we report on our attempt to use the learnt styles to
discriminate between the players from game records by trying to detect who was
playing white and who was playing black. We also discuss some limitations of
our approach and propose possible directions for future research. The method we
have presented may also be applicable to other strategic games, and may even be
generalisable to other domains where sequences of agents' actions are recorded.
Comment: 15 pages, 3 figures
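The core mechanism, learning evaluation weights from a game record by temporal differences, can be sketched generically as follows. The sigmoid value function, TD(0) in place of a fuller TD(lambda) scheme, and the random toy features are illustrative assumptions; the paper couples its updates to a conventional Chess engine's search.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def td_update_game(w, positions, result, alpha=0.01):
    """One TD(0) pass over a single recorded game.
    positions: feature vectors of successive positions from one player's view;
    result: 1.0 win, 0.5 draw, 0.0 loss (the target for the final position)."""
    for t in range(len(positions)):
        v = sigmoid(positions[t] @ w)
        target = result if t == len(positions) - 1 else sigmoid(positions[t + 1] @ w)
        # Move the current estimate toward the next position's estimate
        # (chain rule through the sigmoid gives the v * (1 - v) factor).
        w = w + alpha * (target - v) * v * (1 - v) * positions[t]
    return w

rng = np.random.default_rng(0)
w = np.zeros(8)
game = [rng.normal(size=8) for _ in range(40)]     # toy feature vectors
w = td_update_game(w, game, result=1.0)
print(w)
```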
Probabilistic Exploration in Planning while Learning
Sequential decision tasks with incomplete information are characterized by
the exploration problem; namely the trade-off between further exploration for
learning more about the environment and immediate exploitation of the accrued
information for decision-making. Within artificial intelligence, there has been
an increasing interest in studying planning-while-learning algorithms for these
decision tasks. In this paper we focus on the exploration problem in
reinforcement learning and Q-learning in particular. The existing exploration
strategies for Q-learning are of a heuristic nature and they exhibit limited
scalability in tasks with large (or infinite) state and action spaces.
Efficient experimentation is needed for resolving uncertainties when possible
plans are compared (i.e. exploration). The experimentation should be sufficient
for selecting with statistical significance a locally optimal plan (i.e.
exploitation). For this purpose, we develop a probabilistic hill-climbing
algorithm that uses a statistical selection procedure to decide how much
exploration is needed for selecting a plan which is, with arbitrarily high
probability, arbitrarily close to a locally optimal one. Due to its generality
the algorithm can be employed for the exploration strategy of robust
Q-learning. An experiment on a relatively complex control task shows that the
proposed exploration strategy performs better than a typical exploration
strategy.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI 1995)
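The statistical-selection idea, sampling candidate plans only until one can be declared better with high confidence, can be sketched with a Hoeffding-style stopping rule. This is a generic sketch assuming returns bounded in [0, 1], not the paper's exact selection procedure; the lambda rollouts are toy stand-ins.

```python
import math, random

def select_plan(rollout_a, rollout_b, eps=0.05, delta=0.05, max_n=100000):
    """Sample returns of two candidate plans until a Hoeffding bound says one
    mean is higher with probability >= 1 - delta (assumes returns in [0, 1])."""
    sa = sb = 0.0
    for n in range(1, max_n + 1):
        sa += rollout_a()
        sb += rollout_b()
        bound = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # per-mean radius
        if abs(sa - sb) / n > 2.0 * bound + eps:              # means separated
            return ("a" if sa > sb else "b"), n
    return ("a" if sa >= sb else "b"), max_n                  # fall back on best mean

# Toy rollouts: plan "a" is better on average (mean 0.5 vs. 0.3).
pick, n = select_plan(lambda: 0.2 + 0.6 * random.random(),
                      lambda: 0.6 * random.random())
print(pick, "after", n, "paired samples")
```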