Solving planning problems with deep reinforcement learning and tree search
Deep reinforcement learning methods are capable of learning complex heuristics starting with no prior knowledge, but struggle in environments where the learning signal is sparse. In contrast, planning methods can discover the optimal path to a goal in the absence of external rewards, but often require a hand-crafted heuristic function to be effective. In this thesis, we describe a model-based reinforcement learning method that occupies the middle ground between these two approaches. When evaluated on the complex domain of Sokoban, the model-based method proved more performant, stable, and sample-efficient than a model-free baseline.
Level Generation Through Large Language Models
Large Language Models (LLMs) are powerful tools, capable of leveraging their
training on natural language to write stories, generate code, and answer
questions. But can they generate functional video game levels? Game levels,
with their complex functional constraints and spatial relationships in more
than one dimension, are very different from the kinds of data an LLM typically
sees during training. Datasets of game levels are also hard to come by,
potentially taxing the abilities of these data-hungry models. We investigate
the use of LLMs to generate levels for the game Sokoban, finding that LLMs are
indeed capable of doing so, and that their performance scales dramatically with
dataset size. We also perform preliminary experiments on controlling LLM level
generators and discuss promising areas for future work.
Self-Motivated Composition of Strategic Action Policies
In the last 50 years computers have made dramatic progress in their capabilities, but at the same time their failings have demonstrated that we, as designers, do not yet understand the nature of intelligence. Chess playing, for example, was long offered up as an example of the unassailability of the human mind to Artificial Intelligence, but now a chess engine on a smartphone can beat a grandmaster. Yet, at the same time, computers struggle to beat amateur players in simpler games, such as Stratego, where sheer processing power cannot substitute for a lack of deeper understanding.
The task of developing that deeper understanding is overwhelming, and has previously been underestimated. There are many threads and all must be investigated. This dissertation explores one of those threads, namely asking the question "How might an artificial agent decide on a sensible course of action, without being told what to do?".
To this end, this research builds upon empowerment, a universal utility which provides an entirely general method for allowing an agent to measure the preferability of one state over another. Empowerment requires no explicit goals, and instead favours states that maximise an agent's control over its environment.
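The intuition behind empowerment can be made concrete with a toy example. The sketch below is not the thesis's implementation; it assumes a small deterministic gridworld, in which n-step empowerment reduces to the log of the number of distinct states reachable in n steps. A centre cell then scores higher than a corner cell, which is why empowerment favours states of greater control.

```python
from itertools import product
import math

# Toy 5x5 deterministic gridworld: four moves, walls clip movement.
# With deterministic dynamics, n-step empowerment at a state reduces to
# log2(#distinct states reachable in n steps): the channel capacity of
# the action-sequence -> final-state channel.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
W, H = 5, 5

def step(state, action):
    x, y = state
    dx, dy = action
    nx, ny = x + dx, y + dy
    if 0 <= nx < W and 0 <= ny < H:
        return (nx, ny)
    return state  # bumped into a wall: stay put

def empowerment(state, n):
    """n-step empowerment (in bits) via exhaustive action-sequence rollout."""
    reachable = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

# A corner state reaches fewer states than the centre, so the centre
# is "preferable" under empowerment, with no external reward needed.
print(empowerment((0, 0), 2))  # corner
print(empowerment((2, 2), 2))  # centre
```

In stochastic environments the reduction to a simple state count no longer holds, and the channel capacity must be computed properly (e.g. via the Blahut-Arimoto algorithm).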
Several extensions to the empowerment framework are proposed, which drastically increase the array of scenarios to which it can be applied, and allow it to evaluate actions in addition to states. These extensions are motivated by concepts such as bounded rationality, sub-goals, and anticipated future utility.
In addition, the novel concept of strategic affinity is proposed as a general method for measuring the strategic similarity between two (or more) potential sequences of actions. It does this in a general fashion, by examining how similar the distribution of future possible states would be in the case of enacting either sequence. This allows an agent to group action sequences, even in an unknown task space, into "strategies".
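A minimal sketch of this grouping idea, again assuming a toy deterministic gridworld rather than the thesis's formulation: here the similarity between two prefix sequences is approximated by the Jaccard overlap of the state sets reachable after enacting each one, standing in for a comparison of the full future-state distributions.

```python
from itertools import product

# Toy 7x7 deterministic gridworld; walls clip movement.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
W, H = 7, 7

def step(state, action):
    x, y = state
    dx, dy = action
    nx, ny = x + dx, y + dy
    if 0 <= nx < W and 0 <= ny < H:
        return (nx, ny)
    return state

def future_states(state, prefix, horizon):
    """States reachable after enacting `prefix`, then any `horizon` moves."""
    s = state
    for a in prefix:
        s = step(s, a)
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        t = s
        for a in seq:
            t = step(t, a)
        reachable.add(t)
    return reachable

def strategic_affinity(state, seq_a, seq_b, horizon=2):
    """Jaccard overlap in [0, 1] between the two future-state sets."""
    fa = future_states(state, seq_a, horizon)
    fb = future_states(state, seq_b, horizon)
    return len(fa & fb) / len(fa | fb)

U, D, R, L = ACTIONS
start = (3, 3)
# Two routes to the same place share a strategy; heading in opposite
# directions leads to nearly disjoint futures.
print(strategic_affinity(start, [U, R], [R, U]))  # -> 1.0
print(strategic_affinity(start, [U, U], [D, D]))  # low overlap
```

An agent can then cluster action sequences by affinity: sequences whose futures largely coincide belong to the same "strategy", regardless of the order of the individual moves.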
Strategic affinity is combined with the empowerment extensions to form soft-horizon empowerment, which is capable of composing action policies in a variety of unknown scenarios.
A Pac-Man-inspired prey game and the Gambler's Problem are used to demonstrate this self-motivated action selection, and a Sokoban-inspired box-pushing scenario is used to highlight the capability to pick strategically diverse actions.
The culmination of this is that soft-horizon empowerment demonstrates a variety of "intuitive" behaviours, which are not dissimilar to what we might expect a human to try.
This work demonstrates compelling results, and several avenues for immediate further research are suggested.
One of the most promising of these would be applying the self-motivated methodology and strategic affinity method to a wider range of scenarios, with a view to developing improved heuristic approximations. Replicating similar results whilst reducing the computational overhead could help drive an improved understanding of how we may move closer to a human-like approach.
Feature matching and learning for controlling multiple identical agents with global inputs
Simple identical agent systems are becoming more common in nanotechnology, biology, and chemistry. Since, in these domains, each agent can necessarily be implemented only with simple mechanisms, the major challenge of these systems is how to control the agents using limited control input, such as broadcast control. Inspired by previous work, in which identical agents can be controlled via global inputs using a single fixed obstacle, we propose a new pipeline that uses tree search and matching methods to identify target and agent pairs to move, and their orders. In this work, we compare several matching methods, from hand-crafted template matching to learned feature-descriptor matching, and discuss their validity in the pathfinding problem. We also employ the Monte Carlo Tree Search algorithm in order to enhance the efficiency of the tree search. In experiments, we execute the proposed pipeline in shape formation tasks. We compare the total number of control steps and computation time between the different matching methods, as well as against previous work and human solutions. The results show all our methods significantly reduce the total number of input steps compared to the previous work. In particular, the combination of learned feature matching and the Monte Carlo Tree Search algorithm outperforms all other methods.
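To illustrate the pairing problem at the heart of such a pipeline, here is a deliberately simple, hypothetical stand-in for the matching step: greedily assign each target its nearest unassigned agent under Manhattan distance. The paper's actual methods (template matching and learned feature descriptors, ordered by tree search) are more sophisticated; this only shows what an agent-target matching produces.

```python
# Hypothetical greedy matcher: pair each target with its nearest
# unassigned agent (Manhattan distance), target by target. Not the
# paper's method; an illustration of the agent-target pairing task.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def greedy_match(agents, targets):
    """Return a list of (agent, target) pairs, one agent per target."""
    remaining = list(agents)
    pairs = []
    for t in targets:
        best = min(remaining, key=lambda a: manhattan(a, t))
        remaining.remove(best)
        pairs.append((best, t))
    return pairs

agents = [(0, 0), (5, 5), (2, 1)]
targets = [(1, 1), (4, 4), (0, 2)]
print(greedy_match(agents, targets))
```

A globally optimal assignment (e.g. the Hungarian algorithm) can beat this greedy pass on total distance, and the choice of matching quality directly affects how many global control steps the downstream tree search needs.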