388 research outputs found

Learning to Play Othello with N-Tuple Systems

Author: Lucas Simon M
Publication date: 01/01/2008

This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best performing of these after just five hundred games of self-play learning. The conclusion is that n-tuple networks learn faster and better than the other, more conventional approaches.
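As a rough illustration of the idea in this abstract, the sketch below shows one way an n-tuple value function with a temporal-difference-style update could look. It is a simplified linear version written under assumptions: the class name `NTupleValueFunction`, the board encoding (a flat list with cell values 0, 1, 2), and the learning rate are all hypothetical, and details such as board symmetries or output squashing are omitted and may differ from the paper.

```python
class NTupleValueFunction:
    """Hypothetical sketch: a sum of look-up tables, one table per n-tuple."""

    def __init__(self, tuples, n_values=3):
        # tuples: list of cell-index sequences, e.g. [(0, 1), (2, 3)]
        # n_values: number of states a cell can take (assumed 3:
        # empty, own piece, opponent piece)
        self.tuples = tuples
        self.n_values = n_values
        self.tables = [[0.0] * (n_values ** len(t)) for t in tuples]

    def _index(self, board, cells):
        # Read the sampled cells as the digits of a base-n_values number.
        idx = 0
        for c in cells:
            idx = idx * self.n_values + board[c]
        return idx

    def value(self, board):
        # The position value is the sum of the addressed table entries.
        return sum(
            table[self._index(board, cells)]
            for table, cells in zip(self.tables, self.tuples)
        )

    def td_update(self, board, target, alpha=0.01):
        # Move the value of `board` towards `target`, for example the
        # value of the successor position or the final game result.
        error = target - self.value(board)
        for table, cells in zip(self.tables, self.tuples):
            table[self._index(board, cells)] += alpha * error
        return error


vf = NTupleValueFunction([(0, 1), (2, 3)])
board = [0, 1, 2, 0]            # toy 4-cell "board", not a real Othello position
vf.td_update(board, target=1.0, alpha=0.1)
print(vf.value(board))
```

Each update touches only one entry per table, which is one plausible reason such systems can learn quickly from few games.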

University of Essex Research Repository

CiteSeerX

Temporal difference learning with interpolated table value functions

Author: Lucas Simon M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/10/2009

This paper introduces a novel function approximation architecture especially well suited to temporal difference learning. The architecture is based on using sets of interpolated table look-up functions. These offer rapid and stable learning, and are efficient when the number of inputs is small. An empirical investigation is conducted to test their performance on a supervised learning task, and on the mountain car problem, a standard reinforcement learning benchmark. In each case, the interpolated table functions offer competitive performance. © 2009 IEEE
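To make the notion of an interpolated table look-up function concrete, here is a minimal one-dimensional sketch using linear interpolation between evenly spaced nodes. The class name `InterpolatedTable1D`, the single-input restriction, and the gradient-style update rule are assumptions chosen for illustration; the paper's architecture uses sets of such tables and may differ in its details.

```python
class InterpolatedTable1D:
    """Hypothetical sketch: a table whose output is linearly interpolated."""

    def __init__(self, lo, hi, n_nodes):
        # n_nodes evenly spaced nodes over [lo, hi]; n_nodes must be >= 2.
        self.lo, self.hi = lo, hi
        self.weights = [0.0] * n_nodes

    def _locate(self, x):
        # Clamp x to the table range, then find the left node index and
        # the fractional distance towards the right node.
        x = min(max(x, self.lo), self.hi)
        pos = (x - self.lo) / (self.hi - self.lo) * (len(self.weights) - 1)
        i = min(int(pos), len(self.weights) - 2)
        return i, pos - i

    def value(self, x):
        i, frac = self._locate(x)
        return (1.0 - frac) * self.weights[i] + frac * self.weights[i + 1]

    def update(self, x, target, alpha=0.1):
        # Distribute the error over the two neighbouring nodes in
        # proportion to their interpolation weights.
        i, frac = self._locate(x)
        error = target - self.value(x)
        self.weights[i] += alpha * error * (1.0 - frac)
        self.weights[i + 1] += alpha * error * frac
        return error


table = InterpolatedTable1D(0.0, 1.0, n_nodes=3)
table.update(0.5, target=1.0, alpha=0.5)
print(table.value(0.5), table.value(0.25))
```

In a temporal difference setting, `target` would be the reward plus the discounted value of the next state rather than a supervised label.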

University of Essex Research Repository

Crossref

Investigating learning rates for evolution and temporal difference learning

Author: Lucas Simon M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2008

Evidently, any learning algorithm can only learn on the basis of the information given to it. This paper presents a first attempt to place an upper bound on the information rates attainable with standard coevolution and with temporal difference learning (TDL). The upper bound for TDL is shown to be much higher than for coevolution. Under commonly used settings for learning to play Othello, for example, TDL may have an upper bound that is hundreds or even thousands of times higher than that of coevolution. To test how well these bounds correlate with actual learning rates, a simple two-player game called Treasure Hunt is developed. While the upper bounds cannot be used to predict the number of games required to learn the optimal policy, they do correctly predict the rank order of the number of games required by each algorithm. © 2008 IEEE

University of Essex Research Repository

Crossref

Warm-Start AlphaZero Self-Play Search Enhancements

Author: C Browne
CD Rosin
D Silver
EA Heinz
G Tesauro
H Wang
J Schmidhuber
J Tao
LV Allis
M Buro
MA Wiering
ML Zhang
N Justesen
N Srivastava
O Vinyals
R Coulom
RD Gaina
S Gelly
S Iwata
S Reisch
SY Chong
TP Runarsson
V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/04/2020

Recently, AlphaZero has achieved landmark results in deep reinforcement learning by providing a single self-play architecture that learned three different games at superhuman level. AlphaZero is a large and complicated system with many parameters, and success requires much compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data or domain-specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE) and dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with RAVE-based variants playing especially strongly.

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications

Abalearn: a risk-sensitive approach to self-play learning in Abalone

Author: Campos Pedro
Langlois Thibault
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003

This paper presents Abalearn, a self-teaching Abalone program capable of automatically reaching an intermediate level of play without needing expert-labeled training examples, deep searches or exposure to competent play. Our approach is based on a reinforcement learning algorithm that is risk seeking, since defensive players in Abalone tend to never end a game. We show that it is the risk-sensitivity that allows successful self-play training. We also propose a set of features that seem relevant for achieving a good level of play. We evaluate our approach using a fixed heuristic opponent as a benchmark, pitting our agents against human players online and comparing samples of our agents at different times of training.

Repositório Digital da Universidade da Madeira

Playing Tic-Tac-Toe Using Genetic Neural Network with Double Transfer functions

Author: Lam HK
Ling SS
Publication venue: 'Scientific Research Publishing, Inc.'
Publication date: 01/01/2011

Computational intelligence is a powerful tool for game development. In this paper, an algorithm for playing the game Tic-Tac-Toe with computational intelligence is developed. This algorithm is learned by a Neural Network with Double Transfer functions (NNDTF), which is trained by a genetic algorithm (GA). In the NNDTF, the neuron has two transfer functions and exhibits a node-to-node relationship in the hidden layer that enhances the learning ability of the network. A Tic-Tac-Toe game is used to show that the NNDTF provides better performance than the traditional neural network does.

Crossref

OPUS - University of Technology Sydney

King's Research Portal

Self-Conscious Emotions and the Right Fronto-Temporal and Right Temporal Parietal Junction

Author: Ahmad Nathira
Archer Qiana
Castaneda Ray Nunez
Keenan Julian
LaVarco Adriana
Minervini Anthony
Pardillo Matthew
Publication venue: Montclair State University Digital Commons
Publication date: 20/01/2022

For more than two decades, research focusing on both clinical and non-clinical populations has suggested a key role for specific regions in the regulation of self-conscious emotions. It is speculated that both the expression and the interpretation of self-conscious emotions are critical in humans for action planning and response, communication, learning, parenting, and most social encounters. Empathy, guilt, jealousy, shame, and pride are all categorized as self-conscious emotions, all of which are crucial components of one’s sense of self. There has been an abundance of evidence pointing to right fronto-temporal involvement in the integration of cognitive processes underlying the expression of these emotions. Numerous regions within the right hemisphere have been identified, including the right temporal parietal junction (rTPJ), the orbitofrontal cortex (OFC), and the inferior parietal lobule (IPL). In this review, we aim to investigate patient cases, in addition to clinical and non-clinical studies. We also aim to highlight these specific brain regions pivotal to the right hemispheric dominance observed in the neural correlates of such self-conscious emotions and to describe the potential role that self-conscious emotions play in evolution.

Montclair State University Digital Commons

PubMed Central