
    A Study of Arc Strong Connectivity of Digraphs

    My dissertation research was motivated by Matula and his study of a quantity he called the strength of a graph G, kappa'(G) = max{kappa'(H) : H ⊆ G}. (Abstract shortened by ProQuest.)
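
    A minimal illustrative sketch of Matula's definition (not code from the dissertation): it brute-forces the strength of a small graph by taking the maximum edge connectivity kappa'(H) over all induced subgraphs H of G. It assumes the networkx library and is exponential in the number of vertices, so it is only suitable for tiny example graphs.

```python
# Brute-force illustration of Matula's graph strength:
#   strength(G) = max over subgraphs H of G of kappa'(H) (edge connectivity).
# Assumes networkx; exponential in |V(G)|, so only for tiny example graphs.
from itertools import combinations
import networkx as nx

def strength(G: nx.Graph) -> int:
    best = 0
    nodes = list(G.nodes)
    for k in range(2, len(nodes) + 1):          # every vertex subset of size >= 2
        for subset in combinations(nodes, k):
            H = G.subgraph(subset)              # induced subgraph H of G
            best = max(best, nx.edge_connectivity(H))
    return best

if __name__ == "__main__":
    G = nx.petersen_graph()
    print(strength(G))  # 3: the maximum is attained by the Petersen graph itself
```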

    An efficient algorithm for learning with semi-bandit feedback

    We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m) over previous bounds for this algorithm. Comment: submitted to ALT 201
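
    Purely as an illustration of the FPL + Geometric Resampling scheme described above (not the authors' implementation), the sketch below uses the toy decision set "choose m of d coordinates": the perturbed leader minimises eta times the estimated cumulative loss minus exponential noise, and the GR estimate multiplies each observed loss by the number of fresh perturbed-leader draws needed until that coordinate reappears, capped at M. The learning rate eta and the cap M are assumed tuning values, not those prescribed by the paper.

```python
# Sketch of Follow-the-Perturbed-Leader with Geometric Resampling under
# semi-bandit feedback, on the toy decision set "pick m of d coordinates".
# eta (learning rate) and M (resampling cap) are assumed tuning parameters.
import numpy as np

rng = np.random.default_rng(0)

def perturbed_leader(loss_est, eta, m):
    """Return the m coordinates minimising eta * loss_est - Z with Z ~ Exp(1)."""
    scores = eta * loss_est - rng.exponential(1.0, size=loss_est.shape)
    return np.argsort(scores)[:m]

def fpl_gr(losses, m, eta=0.1, M=100):
    """losses: (T, d) array; only the played coordinates are revealed each round."""
    T, d = losses.shape
    loss_est = np.zeros(d)                      # cumulative loss estimates
    total_loss = 0.0
    for t in range(T):
        action = perturbed_leader(loss_est, eta, m)
        total_loss += losses[t, action].sum()
        # Geometric Resampling: estimate 1/p_i for each played coordinate i by
        # redrawing the perturbed leader until i reappears (at most M times).
        increments = np.zeros(d)
        for i in action:
            k = 1
            while k < M and i not in perturbed_leader(loss_est, eta, m):
                k += 1
            increments[i] = k * losses[t, i]
        loss_est += increments
    return total_loss

if __name__ == "__main__":
    d, m, T = 10, 3, 500
    print(fpl_gr(rng.uniform(size=(T, d)), m))
```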

    Quoridor agent using Monte Carlo Tree Search

    This thesis presents a preliminary study applying Monte Carlo Tree Search (MCTS) to the board game Quoridor. The system is shown to perform well against existing methods, defeating a set of player agents drawn from an existing digital implementation.
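
    To make the MCTS approach concrete, here is a minimal, game-agnostic UCT sketch (a standard formulation, not code from the thesis). It assumes a state object exposing legal_moves(), play(move), is_terminal(), to_move and result(player), and uses uniformly random rollouts with UCB1 selection.

```python
# Minimal UCT (Monte Carlo Tree Search) sketch for a generic two-player game.
# The State interface (legal_moves, play, is_terminal, to_move, result) is an
# assumption, not taken from the Quoridor implementation in the thesis.
import math
import random

class Node:
    def __init__(self, state, parent=None, move=None, player_just_moved=None):
        self.state, self.parent, self.move = state, parent, move
        self.player_just_moved = player_just_moved
        self.children, self.untried = [], list(state.legal_moves())
        self.visits, self.value = 0, 0.0

    def ucb1(self, c=1.4):
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        while not node.untried and node.children:          # 1. Selection
            node = max(node.children, key=Node.ucb1)
        if node.untried:                                    # 2. Expansion
            move = node.untried.pop(random.randrange(len(node.untried)))
            node.children.append(Node(node.state.play(move), parent=node,
                                      move=move,
                                      player_just_moved=node.state.to_move))
            node = node.children[-1]
        state = node.state                                  # 3. Random rollout
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        while node is not None:                             # 4. Backpropagation
            if node.player_just_moved is not None:
                node.value += state.result(node.player_just_moved)  # +1 / 0 / -1
            node.visits += 1
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move  # most-visited move
```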

    A Deep Learning Agent for Games with Hidden Information

    The goal of this project is to develop an agent capable of playing a particular game at an above-average human level. In order to do so, we investigated reinforcement learning and deep learning techniques for making decisions in discrete action spaces with hidden information. The methods we used to accomplish this goal include a standard word2vec implementation, an alpha-beta minimax tree search, and an LSTM network to evaluate game states. Given just the rules of the game and a vector representation of the game states, the agent learned to play the game through competitive self-play. The emergent behavior of these techniques was compared to human play.
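
    As a concrete, purely illustrative rendering of the alpha-beta minimax component mentioned above, the sketch below searches to a fixed depth and defers leaf scoring to a pluggable evaluate(state) hook, a role the project's LSTM state evaluator could fill. The State interface and the convention that evaluate() scores from the maximizing player's viewpoint are assumptions of this sketch.

```python
# Illustrative alpha-beta minimax with a pluggable leaf evaluator. The State
# interface (legal_moves, play, is_terminal) is assumed; evaluate(state) is
# expected to return a score from the maximizing (root) player's viewpoint,
# a role a learned value network such as an LSTM evaluator could fill.
def alphabeta(state, depth, alpha, beta, maximizing, evaluate):
    if depth == 0 or state.is_terminal():
        return evaluate(state)                    # heuristic / learned leaf value
    if maximizing:
        value = float("-inf")
        for move in state.legal_moves():
            value = max(value, alphabeta(state.play(move), depth - 1,
                                         alpha, beta, False, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:                     # beta cut-off
                break
        return value
    value = float("inf")
    for move in state.legal_moves():
        value = min(value, alphabeta(state.play(move), depth - 1,
                                     alpha, beta, True, evaluate))
        beta = min(beta, value)
        if beta <= alpha:                         # alpha cut-off
            break
    return value

def best_move(state, depth, evaluate):
    """Pick the root move with the highest alpha-beta value."""
    return max(state.legal_moves(),
               key=lambda m: alphabeta(state.play(m), depth - 1,
                                       float("-inf"), float("inf"),
                                       False, evaluate))
```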

    Applications of network optimization

    Includes bibliographical references (p. 41-48). Ravindra K. Ahuja ... [et al.]

    Artificial Intelligence Techniques Applied To Draughts

    This thesis documents the development of a draughts-playing program that learns game strategies using various Artificial Intelligence (AI) techniques, with the goal of playing draughts at a reasonably high skill level purely through self-play, without external guidance. Context/Background: AI is a fast-evolving field of study. The motivation is that programming computers to learn from experience should eventually eliminate the detailed, time-consuming, and costly programming effort currently required to hand-craft solutions to problems. Aims: The aim is to investigate a variety of AI techniques. The program's effectiveness is assessed both in evaluating moves and in playing a computationally intensive game. Minimax-based algorithms together with a basic scoring heuristic are used to evaluate enough of the game tree to pick high-utility moves. Later, the scoring heuristic is augmented using artificial intelligence techniques. As a result of training this "smart scoring behaviour", the program is expected to learn how best to assign values to each of the squares on the draughts board, enabling it to play at an adequately high skill level. Method: A version of the board game draughts is implemented in the Java programming language. Players were developed using a variety of techniques, and these algorithms were tested by comparing running times, the number of game-tree nodes searched, and the utility of the moves picked. In addition, a genetic algorithm is developed to assign scores to given board states. Results: The project was largely successful, permitting the creation of the game of draughts in Java. Four out of the five proposed move-selection techniques were successfully tested in isolation. Finally, the genetic algorithm demonstrated the ability to augment the scoring heuristic without the benefit of external guidance in the form of human experience.
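
    The following is a small illustrative sketch (not the thesis code) of the genetic-algorithm idea described above: each chromosome is a vector of per-square weights for a draughts scoring heuristic, fitness is estimated by playing candidate scorers in short matches, and elitism, uniform crossover, and Gaussian mutation produce each new generation. The play_match() fitness hook is a stand-in for the thesis's self-play evaluation and is stubbed out here only so the sketch runs.

```python
# Sketch of a genetic algorithm evolving per-square weights for a draughts
# board-scoring heuristic. play_match() is an assumed stand-in for the
# self-play fitness evaluation used in the thesis.
import random

SQUARES = 32                          # playable squares on an 8x8 draughts board
POP, GENS = 30, 50                    # assumed population size / generation count
MUT_RATE, MUT_SIGMA = 0.1, 0.2        # assumed mutation settings

def random_chromosome():
    return [random.uniform(0.0, 1.0) for _ in range(SQUARES)]

def score_board(board, weights):
    """Weighted material count: board[i] is +1 (own man), -1 (opponent), or 0."""
    return sum(w * piece for w, piece in zip(weights, board))

def play_match(scorer):
    """Placeholder fitness: the thesis would pit the candidate scorer against a
    baseline player over several games; random boards keep this sketch runnable."""
    boards = [[random.choice((-1, 0, 1)) for _ in range(SQUARES)] for _ in range(20)]
    return sum(scorer(b) for b in boards)

def fitness(weights):
    return play_match(lambda board: score_board(board, weights))

def evolve():
    population = [random_chromosome() for _ in range(POP)]
    for _ in range(GENS):
        ranked = sorted(population, key=fitness, reverse=True)
        nxt = ranked[:2]                                        # elitism
        while len(nxt) < POP:
            a, b = random.sample(ranked[:POP // 2], 2)          # select from top half
            child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [w + random.gauss(0, MUT_SIGMA) if random.random() < MUT_RATE else w
                     for w in child]                            # Gaussian mutation
            nxt.append(child)
        population = nxt
    return max(population, key=fitness)

if __name__ == "__main__":
    print(evolve()[:8])
```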

    Expert iteration

    In this thesis, we study how reinforcement learning algorithms can tackle classical board games without recourse to human knowledge. Specifically, we develop a framework and algorithms which learn to play the board game Hex starting from random play. We first describe Expert Iteration (ExIt), a novel reinforcement learning framework which extends Modified Policy Iteration. ExIt explicitly decomposes the reinforcement learning problem into two parts: planning and generalisation. A planning algorithm explores possible move sequences starting from a particular position to find good strategies from that position, while a parametric function approximator is trained to predict those plans, generalising to states not yet seen. Subsequently, planning is improved by using the approximated policy to guide search, increasing the strength of new plans. This decomposition allows ExIt to combine the benefits of both planning methods and function approximation methods. We demonstrate the effectiveness of the ExIt paradigm by implementing ExIt with two different planning algorithms. First, we develop a version based on Monte Carlo Tree Search (MCTS), a search algorithm which has been successful both in specific games, such as Go, Hex and Havannah, and in general game playing competitions. We then develop a new planning algorithm, Policy Gradient Search (PGS), which uses a model-free reinforcement learning algorithm for online planning. Unlike MCTS, PGS does not require an explicit search tree. Instead PGS uses function approximation within a single search, allowing it to be applied to problems with larger branching factors. Both MCTS-ExIt and PGS-ExIt defeated MoHex 2.0 (the most recent Hex Olympiad winner to be open-sourced) in 9 × 9 Hex. More importantly, whereas MoHex makes use of many Hex-specific improvements and knowledge, all our programs were trained tabula rasa using general reinforcement learning methods. This bodes well for ExIt's applicability to both other games and real-world decision-making problems.
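
    To make the Expert Iteration loop concrete, the following is a highly simplified sketch of the apprentice/expert cycle as described above (an interpretation, not the authors' code): an expert planner, here apprentice-guided search passed in as a hook, produces improved move distributions from self-play positions, and the apprentice network is retrained to imitate them before guiding the next round of planning. The game interface, the guided_search hook, and the apprentice.fit() method are all assumptions of this sketch.

```python
# Simplified sketch of the Expert Iteration (ExIt) loop: a planner guided by
# the current apprentice produces improved policy targets, and the apprentice
# is retrained to imitate them. game, apprentice and guided_search are assumed
# interfaces supplied by the caller, not names from the thesis.
import numpy as np

def expert_iteration(game, apprentice, guided_search,
                     iterations=100, games_per_iter=500, simulations=800):
    for _ in range(iterations):
        states, targets = [], []
        # 1. Expert improvement: self-play with apprentice-guided planning.
        for _ in range(games_per_iter):
            state = game.initial_state()
            while not game.is_terminal(state):
                visit_counts = guided_search(game, state, apprentice, simulations)
                pi = visit_counts / visit_counts.sum()   # improved (expert) policy
                states.append(game.encode(state))
                targets.append(pi)
                move = np.random.choice(len(pi), p=pi)
                state = game.play(state, move)
        # 2. Apprentice learning: imitate the expert's improved policies.
        apprentice.fit(np.array(states), np.array(targets))
    return apprentice
```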