26 research outputs found
Safe Reinforcement Learning through Meta-learned Instincts
An important goal in reinforcement learning is to create agents that can
quickly adapt to new goals while avoiding situations that might cause damage to
themselves or their environments. One way agents learn is through exploration
mechanisms, which are needed to discover new policies. However, in deep
reinforcement learning, exploration is normally done by injecting noise into
the action space. While this setup performs well in many domains, it carries
the inherent risk that the agent's noisy actions lead to unsafe
states in the environment. Here we introduce a novel approach called
Meta-Learned Instinctual Networks (MLIN) that allows agents to safely learn
during their lifetime while avoiding potentially hazardous states. At the core
of the approach is a plastic network trained through reinforcement learning and
an evolved "instinctual" network, which does not change during the agent's
lifetime but can modulate the noisy output of the plastic network. We test our
idea on a simple 2D navigation task with no-go zones, in which the agent has to
learn to approach new targets during deployment. MLIN outperforms standard
meta-trained networks and allows agents to learn to navigate to new targets
without colliding with any of the no-go zones. These results suggest that
meta-learning augmented with an instinctual network is a promising new approach
for safe AI, which may enable progress in this area across a variety of
domains.
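As a rough sketch of the core idea, the snippet below shows how a frozen
"instinctual" network might gate the noisy output of a plastic policy; the
names, layer sizes, and sigmoid gating scheme are illustrative assumptions,
not the paper's exact architecture.

```python
# Hypothetical sketch of the MLIN idea: a plastic policy whose noisy output
# is modulated by a fixed, evolved instinct network. Shapes and the gating
# scheme are assumptions for illustration, not the published architecture.
import torch
import torch.nn as nn

class MLINAgent(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        # Plastic policy: updated by RL during the agent's lifetime.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))
        # Instinct network: evolved across lifetimes, frozen at deployment.
        # Outputs a per-action gate in [0, 1] plus a safe fallback action.
        self.instinct = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 2 * act_dim))
        for p in self.instinct.parameters():
            p.requires_grad = False

    def forward(self, obs, noise_std=0.1):
        action = self.policy(obs)
        noisy_action = action + noise_std * torch.randn_like(action)
        gate_logits, safe_action = self.instinct(obs).chunk(2, dim=-1)
        gate = torch.sigmoid(gate_logits)  # 1 = trust the policy, 0 = override
        return gate * noisy_action + (1.0 - gate) * safe_action
```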
Safer Reinforcement Learning through Transferable Instinct Networks
Random exploration is one of the main mechanisms through which reinforcement
learning (RL) finds well-performing policies. However, it can lead to
undesirable or catastrophic outcomes when learning online in safety-critical
environments. In fact, safe learning is one of the major obstacles towards
real-world agents that can learn during deployment. One way of ensuring that
agents respect hard limitations is to explicitly configure boundaries in which
they can operate. While this might work in some cases, we do not always have
clear a priori information about which states and actions can lead dangerously
close to hazardous states. Here, we present an approach in which an additional policy
can override the main policy and offer a safer alternative action. In our
instinct-regulated RL (IR^2L) approach, an "instinctual" network is trained to
recognize undesirable situations, while guarding the learning policy against
entering them. The instinct network is pre-trained on a single task where it is
safe to make mistakes, and transferred to environments in which learning a new
task safely is critical. We demonstrate IR^2L in the OpenAI Safety Gym domain,
in which it incurs significantly fewer safety violations during training than a
baseline RL approach while reaching similar task performance.
Comment: Accepted at the ALIFE 2021 conference.
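A minimal sketch of instinct-regulated action selection in the spirit of
IR^2L follows; the danger threshold, network shapes, and override rule are
assumptions for illustration, not the paper's implementation.

```python
# Hypothetical IR^2L-style policy: a pre-trained, frozen instinct network
# estimates the risk of the current state and overrides the learning policy
# with a safer action when the risk exceeds a threshold (an assumed mechanism).
import torch
import torch.nn as nn

class InstinctRegulatedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64, danger_threshold=0.5):
        super().__init__()
        # Task policy: learns the new task online.
        self.task_policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))
        # Instinct: pre-trained on a task where mistakes are safe, then frozen
        # and transferred. Outputs a danger logit plus a safe alternative action.
        self.instinct = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim + 1))
        for p in self.instinct.parameters():
            p.requires_grad = False
        self.danger_threshold = danger_threshold

    def forward(self, obs):
        task_action = self.task_policy(obs)
        instinct_out = self.instinct(obs)
        danger = torch.sigmoid(instinct_out[..., :1])  # estimated state risk
        safe_action = instinct_out[..., 1:]
        override = (danger > self.danger_threshold).float()
        return override * safe_action + (1.0 - override) * task_action
```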
Procedurally generating rules to adapt difficulty for narrative puzzle games
This short paper focuses on procedurally generating rules and communicating
them to players to adjust difficulty. This is part of a larger project to
collect data and adapt educational games for young children, using a digital
puzzle game designed for kindergartens. A genetic algorithm is used together
with a difficulty measure to find a target number of solution sets, and a
large language model is used to communicate the rules in a narrative context.
During testing, the approach was able to find rules that approximate any given
target difficulty within two dozen generations on average. The approach was
combined with a large language model to create a narrative puzzle game where
players have to host a dinner for animals that cannot get along. Future
experiments will try to improve evaluation, specialize the language model on
children's literature, and collect multi-modal data from players to guide
adaptation.
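The search loop below gives a flavor of how a genetic algorithm might tune
generated rules toward a target difficulty; random_rule, mutate, and
count_solutions are hypothetical stand-ins for the paper's rule representation
and solver-based difficulty measure.

```python
# Illustrative GA loop: evolve rule sets until the measured number of
# solution sets matches a target. All callables are assumed placeholders,
# not the paper's actual operators.
import random

def evolve_rules(random_rule, mutate, count_solutions, target_solutions,
                 pop_size=50, generations=30):
    population = [random_rule() for _ in range(pop_size)]

    def fitness(rule):
        # Distance between measured and target number of solution sets.
        return abs(count_solutions(rule) - target_solutions)

    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            return population[0]  # exact target difficulty reached
        parents = population[: pop_size // 4]  # truncation selection
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return min(population, key=fitness)
```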
EvoCraft: A New Challenge for Open-Endedness
This paper introduces EvoCraft, a framework for Minecraft designed to study
open-ended algorithms. We introduce an API that provides an open-source Python
interface for communicating with Minecraft to place and track blocks. In
contrast to previous work in Minecraft that focused on learning to play the
game, the grand challenge we pose here is to automatically search for
increasingly complex artifacts in an open-ended fashion. Compared to other
environments used to study open-endedness, Minecraft allows the construction of
almost any kind of structure, including actuated machines with circuits and
mechanical components. We present initial baseline results in evolving simple
Minecraft creations through both interactive and automated evolution. While
evolution succeeds when tasked to grow a structure towards a specific target,
it is unable to find a solution when rewarded for creating a simple machine
that moves. Thus, EvoCraft offers a challenging new environment for automated
search methods (such as evolution) to find complex artifacts that we hope will
spur the development of more open-ended algorithms. A Python implementation of
the EvoCraft framework is available at:
https://github.com/real-itu/Evocraft-py
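For a sense of the interface, here is a short placement example in the style
of the EvoCraft-py README; the gRPC stub and message names follow that
repository, but consult the repo for the current API.

```python
# Clear a small working area, then spawn two blocks via the EvoCraft gRPC API.
# Connection details and block choices are illustrative.
import grpc
import minecraft_pb2_grpc
from minecraft_pb2 import *

channel = grpc.insecure_channel('localhost:5001')  # Minecraft server with the EvoCraft mod
client = minecraft_pb2_grpc.MinecraftServiceStub(channel)

client.fillCube(FillCubeRequest(  # reset the area to air
    cube=Cube(min=Point(x=-5, y=4, z=-5), max=Point(x=5, y=10, z=5)),
    type=AIR))
client.spawnBlocks(Blocks(blocks=[
    Block(position=Point(x=0, y=5, z=0), type=PISTON, orientation=NORTH),
    Block(position=Point(x=0, y=5, z=1), type=SLIME, orientation=NORTH),
]))
```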
Growing 3D Artefacts and Functional Machines with Neural Cellular Automata
Neural Cellular Automata (NCAs) have proven effective in simulating
morphogenetic processes, the continuous construction of complex structures
from very few starting cells. Recent developments in NCAs have centered on the
2D domain, namely reconstructing target images from a single pixel or growing
2D textures indefinitely. In this work, we propose an extension of NCAs to 3D,
utilizing 3D
convolutions in the proposed neural network architecture. Minecraft is selected
as the environment for our automaton since it allows the generation of both
static structures and moving machines. We show that despite their simplicity,
NCAs are capable of growing complex entities such as castles, apartment blocks,
and trees, some of which are composed of over 3,000 blocks. Additionally, when
trained for regeneration, the system is able to regrow parts of simple
functional machines, significantly expanding the capabilities of simulated
morphogenetic systems. The code for the experiment in this paper can be found
at: https://github.com/real-itu/3d-artefacts-nca
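To make the 3D extension concrete, a minimal NCA update step is sketched
below, assuming the common NCA recipe (perception via 3D convolution, a
per-cell update network, stochastic asynchronous updates); layer sizes are
illustrative, not the paper's configuration.

```python
# Minimal 3D NCA step: a depthwise 3D convolution gathers neighborhood
# information, a 1x1x1 network computes the state update, and a random mask
# makes cell updates asynchronous. Sizes are assumptions for illustration.
import torch
import torch.nn as nn

class NCA3D(nn.Module):
    def __init__(self, channels=16, hidden=128):
        super().__init__()
        self.perceive = nn.Conv3d(channels, channels * 3, kernel_size=3,
                                  padding=1, groups=channels, bias=False)
        self.update = nn.Sequential(
            nn.Conv3d(channels * 3, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv3d(hidden, channels, kernel_size=1))

    def forward(self, state, fire_rate=0.5):
        # state: (batch, channels, depth, height, width) voxel grid
        dx = self.update(self.perceive(state))
        mask = (torch.rand_like(state[:, :1]) < fire_rate).float()
        return state + dx * mask  # each cell fires with probability fire_rate
```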