Learning Mazes with Aliasing States: An LCS Algorithm with Associative Perception
Learning classifier systems (LCSs) belong to a class of algorithms based on the principle of self-organization and have frequently been applied to the task of solving mazes, an important type of reinforcement learning (RL) problem. Maze problems represent a simplified virtual model of real environments that can be used for developing core algorithms of many real-world applications related to the problem of navigation. However, the best achievements of LCSs in maze problems are still mostly bounded to non-aliasing environments, while LCS complexity seems to obstruct a proper analysis of the reasons for failure. We construct a new LCS agent that has a simpler and more transparent performance mechanism, but that can still solve mazes better than existing algorithms. We use the structure of a predictive LCS model, strip out the evolutionary mechanism, simplify the reinforcement learning procedure, and equip the agent with the ability of associative perception, adopted from psychology. To improve our understanding of the nature and structure of maze environments, we analyze mazes used in research over the last two decades, introduce a set of maze complexity characteristics, and develop a set of new maze environments. We then run our new LCS with associative perception through the old and new aliasing mazes, which represent partially observable Markov decision problems (POMDPs), and demonstrate that it performs at least as well as, and in some cases better than, other published systems.
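To make the notion of an aliasing state concrete, here is a minimal sketch (a hypothetical maze layout, not one of the benchmarks above) in which two distinct cells produce identical local observations, so any purely reactive policy must choose the same action in both:

```python
# Minimal sketch of perceptual aliasing in a grid maze (hypothetical layout,
# not from the paper). Two distinct cells yield the same 8-neighbour wall
# pattern, so an agent that maps observations directly to actions cannot
# treat them differently.

MAZE = [
    "#########",
    "#.......#",
    "#.#####.#",
    "#.......#",
    "#########",
]

def observe(row, col):
    """Local observation: wall/free pattern of the 8 surrounding cells."""
    return tuple(
        MAZE[row + dr][col + dc] == "#"
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )

# Cells (1, 3) and (1, 5) are different states with identical observations:
assert observe(1, 3) == observe(1, 5)
```

In this layout, cells (1, 3) and (1, 5) see the same wall pattern, yet their shortest routes to a goal in the opposite corridor leave through opposite ends of the top corridor. That is exactly the ambiguity that memory or associative perception must resolve.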
Market-based reinforcement learning in partially observable worlds
Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov decision processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a recent approach to market-based RL and for the first time evaluate it in a toy POMDP setting. This work was supported by SNF grants 21-55409.98 and 2000-61847.0
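The need for short-term memory that this abstract describes can be illustrated with a toy cue-at-the-start POMDP (an illustrative construction, not the benchmark evaluated in the paper): a cue observed at step 0 determines which turn pays off later, but the later observation itself is uninformative, so a reactive policy cannot do better than chance.

```python
import random

# Toy POMDP sketch (illustrative, not the paper's benchmark): at step 0 the
# agent sees a cue ("L" or "R"); at the junction the observation is always
# "junction", so the rewarded turn depends on a remembered past event.

def episode(policy, rng):
    cue = rng.choice("LR")
    memory = policy.remember(cue)          # agent may store the cue
    action = policy.act("junction", memory)
    return 1.0 if action == cue else 0.0

class Reactive:
    def remember(self, cue): return None   # no memory of past events
    def act(self, obs, memory): return "L" # same percept -> same action

class OneBitMemory:
    def remember(self, cue): return cue    # one remembered event suffices
    def act(self, obs, memory): return memory

rng = random.Random(0)
reactive_return = sum(episode(Reactive(), rng) for _ in range(1000)) / 1000
rng = random.Random(0)
memory_return = sum(episode(OneBitMemory(), rng) for _ in range(1000)) / 1000
# memory_return is exactly 1.0; reactive_return hovers around 0.5
```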
Multi-agent evolutionary systems for the generation of complex virtual worlds
Modern films, games and virtual reality applications depend on convincing
computer graphics. Highly complex models are a requirement for the
successful delivery of many scenes and environments. While workflows such as
rendering, compositing and animation have been streamlined to accommodate
increasing demands, building such complex models is still a laborious task.
This paper introduces the computational benefits of an Interactive Genetic
Algorithm (IGA) to computer graphics modelling while compensating for the
effects of user fatigue, a common issue with Interactive Evolutionary
Computation. An intelligent agent is used in conjunction with the IGA,
offering the potential to reduce user fatigue by learning from the choices
made by the human designer and directing the search accordingly. This
workflow accelerates the layout and distribution of basic elements to form
complex models. It captures the designer's intent through interaction, and
encourages playful discovery.
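The fatigue-reducing loop described above can be sketched as follows; the function names and the nearest-neighbour preference model are illustrative assumptions, not the paper's actual agent:

```python
import random

# Hedged sketch of an interactive GA with a learned stand-in for the user:
# a simple model of the designer's past choices scores candidates, so the
# human only rates a small shortlist each generation (reducing fatigue).

def mutate(genome, rng):
    """Perturb one gene of a real-valued genome."""
    i = rng.randrange(len(genome))
    child = list(genome)
    child[i] += rng.uniform(-1.0, 1.0)
    return child

def surrogate_score(genome, liked):
    """Proximity to previously liked designs stands in for user preference."""
    if not liked:
        return 0.0
    return -min(sum((a - b) ** 2 for a, b in zip(genome, g)) for g in liked)

def iga_step(population, liked, rng, ask_user, shortlist=3):
    candidates = [mutate(g, rng) for g in population for _ in range(4)]
    candidates.sort(key=lambda g: surrogate_score(g, liked), reverse=True)
    chosen = ask_user(candidates[:shortlist])  # human rates only the shortlist
    liked.append(chosen)                       # the surrogate learns from it
    return candidates[:len(population)], liked
```

The design choice here is the key trade-off of any IGA: the surrogate filters the population so the designer sees few candidates per generation, while their actual choices keep steering the search.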
Unlocking the Power of Representations in Long-term Novelty-based Exploration
We introduce Robust Exploration via Clustering-based Online Density
Estimation (RECODE), a non-parametric method for novelty-based exploration
that estimates visitation counts for clusters of states based on their
similarity in a chosen embedding space. By adapting classical clustering to
the nonstationary setting of Deep RL, RECODE can efficiently track state
visitation counts over thousands of episodes. We further propose a novel
generalization of the inverse dynamics loss that leverages masked
transformer architectures for multi-step prediction, which in conjunction
with RECODE achieves a new state-of-the-art in a suite of challenging
3D-exploration tasks in DM-Hard-8. RECODE also sets a new state-of-the-art
in hard-exploration Atari games, and is the first agent to reach the end
screen in "Pitfall!".
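The count-based core of the method can be caricatured in a few lines; the fixed radius, centroid handling, and 1/sqrt(count) bonus below are illustrative simplifications, not RECODE's actual estimator:

```python
import math

# Much-simplified sketch of the idea described above: count visits to
# clusters of embedded states and pay an exploration bonus that decays with
# the count. Radius and bonus form are illustrative choices, not the paper's.

class ClusterCounts:
    def __init__(self, radius=1.0):
        self.radius = radius
        self.centers = []   # cluster centroids in the embedding space
        self.counts = []    # visitation count per cluster

    def bonus(self, embedding):
        # Find the nearest existing cluster, if any.
        nearest, nearest_d = None, float("inf")
        for i, center in enumerate(self.centers):
            d = math.dist(embedding, center)
            if d < nearest_d:
                nearest, nearest_d = i, d
        if nearest is None or nearest_d > self.radius:
            self.centers.append(list(embedding))  # novel region: new cluster
            self.counts.append(1)
            return 1.0
        self.counts[nearest] += 1                 # familiar region: decay
        return 1.0 / math.sqrt(self.counts[nearest])
```

Counting per cluster rather than per exact state is what lets a count-based bonus survive in continuous embedding spaces and over long horizons.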
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest
path) asks to minimize the expected total cost until the target set is
reached. We extend the traditional framework of POMDPs to model energy
consumption, which represents a hard constraint. The energy levels may
increase and decrease with transitions, and the hard constraint requires
that the energy level remain positive in all steps until the target is
reached. First, we present a novel algorithm for solving POMDPs with energy
levels, building on existing POMDP solvers and using RTDP as its main
method. Our second contribution is related to policy representation. For
larger POMDP instances the policies computed by existing solvers are too
large to be understandable. We present an automated procedure based on
machine learning techniques that extracts the important decisions of the
policy, allowing us to compute succinct human-readable policies. Finally,
we show experimentally that our algorithm performs well and computes
succinct policies on a number of POMDP instances from the literature that
were naturally enhanced with energy levels.
Comment: Technical report accompanying a paper published in proceedings of
AAMAS 201
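A deterministic, fully observable toy version of the energy constraint may help fix ideas. The paper's contribution is the far harder partially observable case; this sketch only shows the product construction of state and energy level:

```python
from heapq import heappush, heappop

# Toy, fully observable sketch of the energy constraint described above:
# each edge has a cost and an energy delta, energy must stay positive at
# every step, and we minimise total cost to the target by searching over
# (state, energy) pairs.

def min_cost_with_energy(edges, start, target, energy, cap):
    # edges: {state: [(next_state, cost, energy_delta), ...]}
    frontier = [(0, start, energy)]
    best = {}
    while frontier:
        cost, state, e = heappop(frontier)
        if state == target:
            return cost
        if best.get((state, e), float("inf")) <= cost:
            continue
        best[(state, e)] = cost
        for nxt, c, delta in edges.get(state, []):
            new_e = min(e + delta, cap)   # energy is capped from above
            if new_e > 0:                 # hard constraint: stay positive
                heappush(frontier, (cost + c, nxt, new_e))
    return None                           # no feasible route to the target
```

Note how the constraint can force a detour: a cheap path that drains the battery below zero is infeasible, so the optimum depends on the initial energy level.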
Outperforming Game Theoretic Play with Opponent Modeling in Two Player Dominoes
Dominoes is a partially observable extensive-form game with chance. The rules are simple; however, the complexity and uncertainty of the game make it difficult to solve with standard game-theoretic methods. This thesis applies strategy-prediction opponent modeling in combination with game-theoretic search algorithms in the game of two-player dominoes. This research also applies methods to compute the upper bound on the potential value that predicting a strategy can provide against specific strategy types. Furthermore, the actual values are computed according to the accuracy of a trained classifier. Empirical results show that there is a potential value gain in score over a Nash equilibrium player in fully and partially observable environments for specific strategy types. The actual value gained is positive in a fully observable environment for score and for total wins and ties. Actual value gained over the Nash equilibrium player from the opponent model exists only for score, while the opponent modeler demonstrates a higher potential to win and/or tie in comparison to a pure game-theoretic agent.
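The classify-then-best-respond pattern this line of work builds on can be sketched as follows; the strategy labels, confidence threshold, and response table are hypothetical, not taken from the thesis:

```python
from collections import Counter

# Illustrative sketch (not the thesis code): classify an opponent's strategy
# type from observed moves, best-respond to the prediction, and fall back to
# an equilibrium strategy when the classifier is not confident enough.

# Hypothetical best response against each opponent strategy type.
BEST_RESPONSE = {"aggressive": "block", "defensive": "extend"}
EQUILIBRIUM = "mixed"

def predict_strategy(observed_moves, threshold=0.6):
    """Majority-vote 'classifier' over labelled move observations."""
    if not observed_moves:
        return None
    label, votes = Counter(observed_moves).most_common(1)[0]
    confidence = votes / len(observed_moves)
    return label if confidence >= threshold else None

def choose_action(observed_moves):
    prediction = predict_strategy(observed_moves)
    if prediction is None:
        return EQUILIBRIUM       # no exploitable read: play equilibrium
    return BEST_RESPONSE[prediction]
```

The fallback to equilibrium play is what bounds the downside of a wrong read, while a confident, correct prediction is what yields the value gain over the pure Nash player reported above.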