Monte Carlo Approaches to Parameterized Poker Squares

Author: Anton Calin
Arrington Robert
Bogaerts Steven
Castro-Wunsch Karo
Langely Clay
Maga William
Messinger Colin M.
Neller Todd W.
Yang Zuozhi
Publication venue: The Cupola: Scholarship at Gettysburg College
Publication date: 29/06/2016

The paper summarizes a variety of Monte Carlo approaches employed by the top three performing entries in the Parameterized Poker Squares NSG Challenge competition. In all cases, the AI players benefited from real-time machine learning and various Monte Carlo game-tree search techniques.

Crossref

Gettysburg College

Reinforcement Learning via AIXI Approximation

Author: Hutter Marcus
Ng Kee Siong
Silver David
Veness Joel
Publication date: 01/01/2010

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains. Comment: 8 LaTeX pages, 1 figure
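The abstract above names a Monte Carlo Tree Search algorithm but does not state how actions are chosen inside the tree. As an illustration only, the sketch below uses the standard UCB1 rule that many MCTS variants rely on; it is an assumption, not the authors' method, and the function name `ucb1_select` and the constant `c` are hypothetical.

```python
import math


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Pick an action index using the UCB1 rule (assumed, not from the paper).

    counts[a] is how many times action a was tried at this node.
    values[a] is the sum of returns observed after taking action a.
    """
    total = sum(counts)
    best, best_score = None, -math.inf
    for a, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            # Try every action once before comparing scores.
            return a
        # Mean return plus an exploration bonus that shrinks as n grows.
        score = v / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = a, score
    return best
```

With equal visit counts the rule reduces to picking the highest mean return; a rarely tried action receives a large bonus and is revisited even if its mean is lower.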

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

The Australian National University

Association for the Advancement of Artificial Intelligence: AAAI Publications

Reactive approach for automating exploration and exploitation in ant colony optimization

Author: Allwawi Rafid Sagban Abbood
Publication date: 01/01/2016

Ant colony optimization (ACO) algorithms can be used to solve nondeterministic polynomial-time hard (NP-hard) problems. Exploration and exploitation are the main mechanisms for controlling search within ACO. Reactive search is an alternative technique for maintaining the dynamism of these mechanisms. However, the ACO-based reactive search technique has three problems. First, the memory model that records previous search regions does not completely transfer the neighborhood structures to the next iteration, which leads to arbitrary restarts and premature local search. Second, the exploration indicator is not robust because of the differing magnitudes in the distance matrices for the current population. Third, the parameter control techniques that use exploration indicators in their feedback process do not consider the problem of indicator robustness. A reactive ant colony optimization (RACO) algorithm is proposed to overcome these limitations of reactive search. RACO consists of three main components. The first is a reactive max-min ant system algorithm for recording the neighborhood structures. The second is a statistical machine learning mechanism named ACOustic that produces a robust exploration indicator. The third is an ACO-based adaptive parameter selection algorithm that addresses the parameterization problem, relying on quality, exploration, and unified criteria to assign rewards to promising parameters. The performance of RACO is evaluated on traveling salesman and quadratic assignment problems and compared with eight metaheuristic techniques in terms of success rate, the Wilcoxon signed-rank test, the chi-square test, and relative percentage deviation. Experimental results show that RACO is superior to the eight metaheuristic techniques, confirming that RACO can be used as a new direction for solving optimization problems.
RACO can be used to provide a dynamic exploration and exploitation mechanism, set a parameter value that allows an efficient search, describe the amount of exploration an ACO algorithm performs, and detect stagnation situations.
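The abstract lists relative percentage deviation among its evaluation measures without defining it. A minimal sketch follows, assuming the common definition for minimization problems (the gap between an obtained cost and the best known cost, as a percentage of the best known cost); the function name and the sample numbers are hypothetical and do not come from the thesis.

```python
def relative_percentage_deviation(cost, best_known):
    """Return 100 * (cost - best_known) / best_known.

    Assumes a minimization problem with a strictly positive best known cost,
    such as a tour length in the traveling salesman problem.
    """
    if best_known <= 0:
        raise ValueError("best_known must be positive")
    return 100.0 * (cost - best_known) / best_known


# Hypothetical example: a tour of length 105 against a best known length of 100.
rpd = relative_percentage_deviation(105.0, 100.0)
```

A value of 0 means the algorithm matched the best known solution; larger values mean a larger gap.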

Universiti Utara Malaysia: UUM eTheses