
    Solving planning problems with deep reinforcement learning and tree search

    Deep reinforcement learning methods are capable of learning complex heuristics starting with no prior knowledge, but struggle in environments where the learning signal is sparse. In contrast, planning methods can discover the optimal path to a goal in the absence of external rewards, but often require a hand-crafted heuristic function to be effective. In this thesis, we describe a model-based reinforcement learning method that occupies the middle ground between these two approaches. When evaluated on the complex domain of Sokoban, the model-based method was found to be more performant, stable, and sample-efficient than a model-free baseline.
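    The abstract above does not include code, but a rough sketch can make the "middle ground" concrete: below, a best-first search expands states over an environment model, ordered by a learned value heuristic (here just an abstract callable). All names, signatures, and the search budget are illustrative assumptions, not the thesis's actual method.

```python
# Hedged sketch: planning over a model with a learned heuristic standing in for
# the deep RL value function; not the implementation described in the thesis.
import heapq
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

def model_based_search(
    start: Hashable,
    successors: Callable[[Hashable], Iterable[Tuple[str, Hashable]]],  # environment model
    value: Callable[[Hashable], float],  # learned heuristic: higher = closer to the goal
    is_goal: Callable[[Hashable], bool],
    budget: int = 10_000,
) -> Optional[List[str]]:
    """Expand states in order of the heuristic estimate and return an action plan."""
    frontier = [(-value(start), 0, start, [])]  # max-heap via negated values
    seen = {start}
    tie = 0  # unique tie-breaker so the heap never compares states directly
    while frontier and budget > 0:
        _, _, state, plan = heapq.heappop(frontier)
        budget -= 1
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                tie += 1
                heapq.heappush(frontier, (-value(nxt), tie, nxt, plan + [action]))
    return None  # no plan found within the budget
```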

    Level Generation Through Large Language Models

    Large Language Models (LLMs) are powerful tools, capable of leveraging their training on natural language to write stories, generate code, and answer questions. But can they generate functional video game levels? Game levels, with their complex functional constraints and spatial relationships in more than one dimension, are very different from the kinds of data an LLM typically sees during training. Datasets of game levels are also hard to come by, potentially taxing the abilities of these data-hungry models. We investigate the use of LLMs to generate levels for the game Sokoban, finding that LLMs are indeed capable of doing so, and that their performance scales dramatically with dataset size. We also perform preliminary experiments on controlling LLM level generators and discuss promising areas for future work.
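    As a hypothetical illustration of the generate-and-filter setting the abstract describes, the sketch below asks an arbitrary text-generation backend (left abstract as a `generate` callable, since the paper's exact models and prompts are not given here) for levels in the standard Sokoban ASCII encoding and keeps only structurally plausible ones.

```python
# Hedged sketch of LLM level generation with a simple structural filter.
# Standard Sokoban symbols: '#' wall, '@' player, '+' player-on-goal,
# '$' box, '*' box-on-goal, '.' goal, ' ' floor.
from typing import Callable, List

def is_structurally_valid(level: str) -> bool:
    """Cheap sanity checks: one player, at least one box, boxes match goals."""
    rows = [r for r in level.splitlines() if r.strip()]
    if not rows:
        return False
    text = "".join(rows)
    players = text.count("@") + text.count("+")
    boxes = text.count("$") + text.count("*")
    goals = text.count(".") + text.count("*") + text.count("+")
    return players == 1 and boxes >= 1 and boxes == goals and set(text) <= set("#@+$*. -_")

def sample_levels(generate: Callable[[str], str], prompt: str, n: int = 10) -> List[str]:
    """Query the (assumed) LLM backend n times and keep structurally valid levels."""
    return [lvl for lvl in (generate(prompt) for _ in range(n)) if is_structurally_valid(lvl)]
```

    Filters like this only check well-formedness; whether a generated level is actually solvable still requires a Sokoban solver or playtesting.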

    Self-Motivated Composition of Strategic Action Policies

    In the last 50 years computers have made dramatic progress in their capabilities, but at the same time their failings have demonstrated that we, as designers, do not yet understand the nature of intelligence. Chess playing, for example, was long offered up as an example of the unassailability of the human mind to Artificial Intelligence, but now a chess engine on a smartphone can beat a grandmaster. Yet, at the same time, computers struggle to beat amateur players in simpler games, such as Stratego, where sheer processing power cannot substitute for a lack of deeper understanding. The task of developing that deeper understanding is overwhelming, and has previously been underestimated. There are many threads and all must be investigated. This dissertation explores one of those threads, namely asking the question "How might an artificial agent decide on a sensible course of action, without being told what to do?". To this end, this research builds upon empowerment, a universal utility which provides an entirely general method for allowing an agent to measure the preferability of one state over another. Empowerment requires no explicit goals, and instead favours states that maximise an agent's control over its environment. Several extensions to the empowerment framework are proposed, which drastically increase the array of scenarios to which it can be applied, and allow it to evaluate actions in addition to states. These extensions are motivated by concepts such as bounded rationality, sub-goals, and anticipated future utility. In addition, the novel concept of strategic affinity is proposed as a general method for measuring the strategic similarity between two (or more) potential sequences of actions. It does this in a general fashion, by examining how similar the distribution of future possible states would be in the case of enacting either sequence. This allows an agent to group action sequences, even in an unknown task space, into 'strategies'. Strategic affinity is combined with the empowerment extensions to form soft-horizon empowerment, which is capable of composing action policies in a variety of unknown scenarios. A Pac-Man-inspired prey game and the Gambler's Problem are used to demonstrate this self-motivated action selection, and a Sokoban-inspired box-pushing scenario is used to highlight the capability to pick strategically diverse actions. The culmination of this is that soft-horizon empowerment demonstrates a variety of 'intuitive' behaviours, which are not dissimilar to what we might expect a human to try. This line of thinking demonstrates compelling results, and it is suggested there are a couple of avenues for immediate further research. One of the most promising of these would be applying the self-motivated methodology and strategic affinity method to a wider range of scenarios, with a view to developing improved heuristic approximations that generate similar results. A goal of replicating similar results, whilst reducing the computational overhead, could help drive an improved understanding of how we may get closer to replicating a human-like approach.
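    A small worked example can ground the empowerment idea (this is my own illustration under a simplifying assumption, not code from the dissertation): for a deterministic environment, n-step empowerment, defined as the channel capacity from n-step action sequences to resulting states, reduces to the logarithm of the number of distinct states those sequences can reach.

```python
# Hedged sketch: n-step empowerment for a *deterministic* transition model,
# where channel capacity equals log2 of the number of distinct reachable outcomes.
import math
from itertools import product
from typing import Callable, Hashable, Sequence

def deterministic_empowerment(
    state: Hashable,
    actions: Sequence[str],
    step: Callable[[Hashable, str], Hashable],  # deterministic transition function
    n: int,
) -> float:
    """Return n-step empowerment in bits from `state`."""
    outcomes = set()
    for seq in product(actions, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        outcomes.add(s)
    return math.log2(len(outcomes))

# Toy usage: a walker on positions 0..4 that can step left or right.
step = lambda s, a: max(0, min(4, s + (1 if a == "right" else -1)))
print(deterministic_empowerment(2, ["left", "right"], step, 2))  # 3 end states -> ~1.58 bits
```

    Stochastic environments require the full channel-capacity computation (e.g. via the Blahut-Arimoto algorithm), and the extensions described in the abstract go well beyond this toy case.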

    Feature matching and learning for controlling multiple identical agents with global inputs

    Simple identical agent systems are becoming more common in nanotechnology, biology, and chemistry. Since, in these domains, each agent can only be implemented with necessarily simple mechanisms, the major challenge of these systems is how to control the agents using limited control input, such as broadcast control. Inspired by previous work, in which identical agents can be controlled via global inputs using a single fixed obstacle, we propose a new pipeline that uses tree search and matching methods to identify target and agent pairs to move, and their orders. In this work, we compare several matching methods, from hand-crafted template matching to learned feature-descriptor matching, and discuss their validity in the pathfinding problem. We also employ the Monte Carlo Tree Search algorithm in order to enhance the efficiency of the tree search. In experiments, we execute the proposed pipeline in shape formation tasks. We compare the total number of control steps and computation time between the different matching methods, as well as against previous work and human solutions. The results show all our methods significantly reduce the total number of input steps compared to the previous work. In particular, the combination of learned feature matching and the Monte Carlo Tree Search algorithm outperforms all other methods.
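    As an illustration of where matching fits in such a pipeline (the names and the distance-based matcher below are my own stand-ins, not the paper's template- or feature-based methods), a minimal sketch pairs identical agents with target cells before a tree search decides the order in which to move them.

```python
# Hedged sketch: pair identical agents with target cells by Manhattan distance,
# a simple stand-in for the matching step of the pipeline described above.
from typing import List, Tuple

Cell = Tuple[int, int]

def match_agents_to_targets(agents: List[Cell], targets: List[Cell]) -> List[Tuple[Cell, Cell]]:
    """Greedily assign each target the nearest free agent (assumes len(agents) >= len(targets))."""
    pairs: List[Tuple[Cell, Cell]] = []
    free = list(agents)
    for t in targets:
        best = min(free, key=lambda a: abs(a[0] - t[0]) + abs(a[1] - t[1]))
        free.remove(best)
        pairs.append((best, t))
    return pairs

# Example: three identical agents steered toward a three-cell target formation.
print(match_agents_to_targets([(0, 0), (5, 5), (2, 7)], [(0, 1), (1, 1), (5, 6)]))
```

    In the paper's setting, hand-crafted template matching or learned feature descriptors would replace this naive distance-based assignment, and the tree search (e.g. Monte Carlo Tree Search) then chooses which matched pair to move next under the global-input constraint.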