Goal reasoning for autonomous agents using automated planning
Mención Internacional en el título de doctor

Automated planning deals with the task of finding a sequence of actions, namely a plan, which achieves a goal from a given initial state. Most planning research assumes that goals are provided by an external user, and that agents merely have to find a plan to achieve them. However, there exist many real-world domains where agents should reason not only about their actions but also about their goals, generating new ones or changing them according to the perceived environment. In this thesis we aim at broadening the goal-reasoning capabilities of planning-based agents, both when acting in isolation and when operating in the same environment as other agents.
In single-agent settings, we first explore a special type of planning task in which we aim to discover states that fulfill certain cost-based requirements with respect to a given set of goals. By computing these states, agents are able to solve interesting tasks such as finding escape plans that move agents into safe places, hiding their true goal from a potential observer, or anticipating dynamically arriving goals. We also show how learning the environment's dynamics may help agents to solve some of these tasks. Experimental results show that these states can be found quickly in practice, enabling agents to solve new planning tasks and helping them solve some existing ones.
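One way to make the cost-based idea concrete is a toy grid example: the sketch below finds states that are equidistant from a set of candidate goals, one plausible cost-based requirement a goal-hiding agent might impose. The grid world, BFS cost function, and `equidistant_states` helper are illustrative assumptions, not the formulation used in the thesis.

```python
from collections import deque

def bfs_costs(grid, start):
    """Shortest-path cost from `start` to every reachable free cell (4-connected)."""
    rows, cols = len(grid), len(grid[0])
    costs = {start: 0}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in costs):
                costs[(nr, nc)] = costs[(r, c)] + 1
                frontier.append((nr, nc))
    return costs

def equidistant_states(grid, goals):
    """States whose cost to every goal is identical: from such a state,
    an observer cannot tell which goal the agent is heading to."""
    per_goal = [bfs_costs(grid, g) for g in goals]
    common = set.intersection(*(set(c) for c in per_goal))
    return {s for s in common if len({c[s] for c in per_goal}) == 1}

# A 1x5 corridor with candidate goals at both ends: only the middle
# cell keeps both goals equally costly.
grid = [[0, 0, 0, 0, 0]]
states = equidistant_states(grid, [(0, 0), (0, 4)])
```

The same two-step pattern, compute goal-conditioned costs once, then filter states by a cost predicate, would also cover requirements such as "within a cost bound of every goal".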
In multi-agent settings, we study the automated generation of goals based on
other agents' behavior. We focus on competitive scenarios, where we are interested
in computing counterplans that prevent opponents from achieving their
goals. We frame these tasks as counterplanning, providing theoretical properties
of the counterplans that solve them. We also show how agents can benefit
from computing some of the states we propose in the single-agent setting to
anticipate their opponent's movements, thus increasing the odds of blocking
them. Experimental results show how counterplans can be found in different
environments ranging from competitive planning domains to real-time strategy
games.

Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de Madrid. Presidenta: Eva Onaindía de la Rivaherrera. Secretario: Ángel García Olaya. Vocal: Mark Robert
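As a rough grid-world caricature of the counterplanning idea (not the thesis's actual algorithm or domains; the blocking criterion and all names are assumptions), the sketch below finds a cell on the opponent's shortest path to its goal that the counterplanning agent can occupy strictly earlier:

```python
from collections import deque

def bfs_costs(grid, start):
    """Shortest-path cost from `start` to every reachable free cell (4-connected)."""
    rows, cols = len(grid), len(grid[0])
    costs = {start: 0}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in costs):
                costs[(nr, nc)] = costs[(r, c)] + 1
                frontier.append((nr, nc))
    return costs

def blocking_cell(grid, opp_pos, opp_goal, my_pos):
    """A cell on the opponent's shortest path to its goal that we can
    reach strictly before the opponent (earliest such cell on its path)."""
    opp = bfs_costs(grid, opp_pos)
    to_goal = bfs_costs(grid, opp_goal)
    mine = bfs_costs(grid, my_pos)
    path_len = opp.get(opp_goal)
    if path_len is None:
        return None  # opponent cannot reach its goal at all
    # Cells on some shortest opponent path satisfy opp + to_goal == path_len.
    on_path = [c for c in opp if c in to_goal and opp[c] + to_goal[c] == path_len]
    first = [c for c in on_path if c in mine and mine[c] < opp[c]]
    return min(first, key=lambda c: opp[c]) if first else None

# A 1x7 corridor: the opponent races from the left end to the right end,
# while the counterplanner starts near the middle.
corridor = [[0] * 7]
block = blocking_cell(corridor, (0, 0), (0, 6), (0, 4))
```

This also illustrates why the single-agent states help: anticipating where the opponent must pass determines whether a block is reachable in time.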
End-to-end deep reinforcement learning in computer systems
Abstract
The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications.
In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping.
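The decoupling described above can be caricatured in a few lines. The toy below is not RLgraph's actual API; `Graph`, `Component`, and the `build` step are invented for illustration. It shows only the same separation: a logical dataflow graph of components is assembled first, and is bound to an execution strategy only in a later build stage.

```python
# Toy illustration of decoupling logical graph construction from execution
# semantics (NOT RLgraph's real API; all names here are invented).

class Component:
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, list(inputs)

class Graph:
    def __init__(self):
        self.components = {}

    def add(self, name, fn, inputs=()):
        # Purely logical: no backend or execution notion is involved yet.
        self.components[name] = Component(name, fn, inputs)
        return name

    def build(self, backend="local"):
        # Staged build: only here is the logical graph mapped to execution.
        if backend != "local":
            raise NotImplementedError("only a local backend is sketched")
        def run(feeds):
            cache = dict(feeds)
            def ev(name):
                if name not in cache:
                    comp = self.components[name]
                    cache[name] = comp.fn(*(ev(i) for i in comp.inputs))
                return cache[name]
            return {n: ev(n) for n in self.components}
        return run

g = Graph()
g.add("q_values", lambda obs: [o * 0.5 for o in obs], inputs=["obs"])
g.add("action", lambda q: q.index(max(q)), inputs=["q_values"])
run = g.build(backend="local")
result = run({"obs": [1.0, 4.0, 2.0]})
```

Because the graph is backend-agnostic, a different `build` could in principle map the same components to a distributed or accelerator-backed runtime without touching the algorithm definition, which is the flexibility the dissertation argues for.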
Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluation of model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation which helps practitioners to incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide implementation of novel RL applications.
Low-resource learning in complex games
This project is concerned with learning to take decisions in complex domains, in games
in particular. Previous work assumes that massive data resources are available for
training, but aside from a few very popular games, this is generally not the case, and the
state of the art in such circumstances is to rely extensively on hand-crafted heuristics.
On the other hand, human players are able to quickly learn from only a handful of
examples, exploiting specific characteristics of the learning problem to accelerate their
learning process. Designing algorithms that function in a similar way is an open area
of research and has many applications in today's complex decision problems.
One solution presented in this work is to design learning algorithms that exploit the
inherent structure of the game. Specifically, we take into account how the action space
can be clustered into sets called types and exploit this characteristic to improve planning
at decision time. Action types can also be leveraged to extract high-level strategies
from a sparse corpus of human play, and this generates more realistic trajectories
during planning, further improving performance.
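A minimal sketch of the type-based abstraction, with an invented Catan-flavoured action set (the type names, priors, and two-stage sampling scheme are assumptions, not the project's actual model): actions are grouped into types, and a planner samples a type first, e.g. from statistics estimated on a small human corpus, before choosing a concrete action.

```python
import random

# Hypothetical action set grouped into types (illustrative only).
ACTION_TYPES = {
    "build": ["build_road", "build_settlement", "build_city"],
    "trade": ["trade_bank", "trade_player"],
    "end":   ["end_turn"],
}

def sample_action(type_priors, rng):
    """Two-stage sampling: pick an action *type* first, then a concrete
    action within it. Type-level priors could come from a sparse corpus."""
    types, weights = zip(*type_priors.items())
    chosen_type = rng.choices(types, weights=weights)[0]
    return rng.choice(ACTION_TYPES[chosen_type])

rng = random.Random(0)
priors = {"build": 0.6, "trade": 0.3, "end": 0.1}  # assumed corpus statistics
rollout = [sample_action(priors, rng) for _ in range(1000)]
```

The point of the abstraction is that statistics only need to be reliable at the type level, which is far easier to estimate from a handful of games than per-action statistics, and rollouts biased this way look more like human play.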
Another approach that proved successful is to use an accurate model of the environment to reduce the complexity of the learning problem. Just as human players have an internal model of the world that allows them to focus on the relevant parts of the problem, we decouple learning to win from learning the rules of the game, thereby making supervised learning more data-efficient.
Finally, to handle the partial observability usually encountered in complex
games, we propose an extension to Monte Carlo Tree Search that plans in the
Belief Markov Decision Process. We found that this algorithm does not outperform
the state-of-the-art models on our chosen domain. Our error analysis indicates that the
method struggles to handle the high uncertainty of the conditions required for the game
to end. Furthermore, our relaxed belief model can cause rollouts in the belief space to
be inaccurate, especially in complex games.
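To make the belief-MDP idea concrete, here is a heavily simplified, tiger-style sketch, not the dissertation's MCTS extension: the problem constants, helper names, and particle filter are all invented for illustration. The hidden state is tracked with weighted particles, and actions are scored by their expected reward over the belief.

```python
import random

# Toy partially observable problem (illustrative constants throughout):
# a tiger hides behind one of two doors; listening yields a noisy reading.
STATES = ["tiger_left", "tiger_right"]

def reward(state, action):
    if action == "listen":
        return -1  # small cost for gathering information
    # Opening the tiger's door is catastrophic; the other door pays off.
    return -100 if action == state.replace("tiger_", "open_") else 10

def observe(state, rng):
    # Listening reveals the tiger's side with 85% accuracy (assumed).
    return state if rng.random() < 0.85 else [s for s in STATES if s != state][0]

def update_belief(particles, obs, rng):
    # Importance-style particle filter: weight particles by observation
    # likelihood, then resample to keep the particle count fixed.
    weights = [0.85 if p == obs else 0.15 for p in particles]
    return rng.choices(particles, weights=weights, k=len(particles))

def best_action(particles):
    # Score each door by its expected reward over the belief particles;
    # a full planner would instead run tree search from this belief node.
    def value(a):
        return sum(reward(p, a) for p in particles) / len(particles)
    return max(["open_left", "open_right"], key=value)

rng = random.Random(1)
belief = [rng.choice(STATES) for _ in range(500)]  # uninformed prior
for _ in range(5):                                  # listen a few times
    obs = observe("tiger_left", rng)
    belief = update_belief(belief, obs, rng)
choice = best_action(belief)
```

The error modes reported above map onto this sketch: when the particle belief is a relaxed, inaccurate model of the true dynamics, rollouts from it mislead the value estimates, and high uncertainty about terminal conditions makes those estimates noisier still.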
We assess the proposed methods in an agent playing the highly complex board
game Settlers of Catan. Building on previous research, our strongest agent combines
planning at decision time with prior knowledge extracted from an available corpus of
general human play; but unlike this prior work, our human corpus consists of only
60 games, as opposed to many thousands. Our agent defeats the current state-of-the-art agent by a large margin, showing that the proposed modifications aid in exploiting general human play in highly complex games.