Graphical Object-Centric Actor-Critic
There have recently been significant advances in the problem of unsupervised
object-centric representation learning and its application to downstream tasks.
The latest works support the argument that employing disentangled object
representations in image-based object-centric reinforcement learning tasks
facilitates policy learning. We propose a novel object-centric reinforcement
learning algorithm combining actor-critic and model-based approaches to utilize
these representations effectively. In our approach, we use a transformer
encoder to extract object representations and graph neural networks to
approximate the dynamics of an environment. The proposed method fills a
research gap: efficient object-centric world models for reinforcement learning
that work in environments with either discrete or continuous action spaces. Our
algorithm outperforms both the state-of-the-art model-free actor-critic
algorithm built upon a transformer architecture and the state-of-the-art
monolithic model-based algorithm, in a visually complex 3D robotic environment
and in a 2D environment with compositional structure.
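The abstract pairs a transformer encoder (to extract object representations) with graph neural networks (to approximate dynamics). As an illustration of the dynamics part only, here is a minimal sketch of one round of message passing over a fully connected graph of object slots. It predicts each object's next state as a residual update. The class name `GNNTransition`, the layer sizes, appending the action vector to every node, and the untrained random weights are all assumptions made for illustration; this is not the paper's architecture, and the encoder and actor-critic components are omitted.

```python
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def init_linear(rng: random.Random, in_dim: int, out_dim: int) -> Layer:
    # Small random weights; a trained model would learn these (assumption).
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def apply_linear(layer: Layer, x: Vector) -> Vector:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class GNNTransition:
    """Hypothetical object-centric transition model: slots -> next slots."""

    def __init__(self, slot_dim: int, action_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Edge MLP reads a pair of object states.
        self.edge_in = init_linear(rng, 2 * slot_dim, hidden_dim)
        self.edge_out = init_linear(rng, hidden_dim, hidden_dim)
        # Node MLP reads the object state, the action, and the aggregated messages.
        self.node_in = init_linear(rng, slot_dim + action_dim + hidden_dim, hidden_dim)
        self.node_out = init_linear(rng, hidden_dim, slot_dim)

    def edge_mlp(self, s_i: Vector, s_j: Vector) -> Vector:
        return apply_linear(self.edge_out, relu(apply_linear(self.edge_in, s_i + s_j)))

    def predict(self, slots: List[Vector], action: Vector) -> List[Vector]:
        next_slots = []
        for i, s_i in enumerate(slots):
            # Sum messages from all other objects; summation makes the result
            # independent of the order in which objects are listed.
            msg = [0.0] * self.hidden_dim
            for j, s_j in enumerate(slots):
                if i != j:
                    msg = [m + e for m, e in zip(msg, self.edge_mlp(s_i, s_j))]
            # Residual update: predict the change of this object's state.
            hidden = relu(apply_linear(self.node_in, s_i + action + msg))
            delta = apply_linear(self.node_out, hidden)
            next_slots.append([v + d for v, d in zip(s_i, delta)])
        return next_slots


if __name__ == "__main__":
    model = GNNTransition(slot_dim=4, action_dim=2, hidden_dim=8)
    slots = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [-0.3, 0.3, 0.1, 0.0]]
    print(model.predict(slots, action=[1.0, 0.0]))
```

One appeal of a graph-structured model here is permutation equivariance: reordering the input objects only reorders the predicted objects, which suits sets of slots that have no canonical order.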
Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog
A number of recent works have proposed techniques for end-to-end learning of
communication protocols among cooperative multi-agent populations, and have
simultaneously found the emergence of grounded human-interpretable language in
the protocols developed by the agents, all learned without any human
supervision!
In this paper, using a Task and Tell reference game between two agents as a
testbed, we present a sequence of 'negative' results culminating in a
'positive' one -- showing that while most agent-invented languages are
effective (i.e. achieve near-perfect task rewards), they are decidedly not
interpretable or compositional.
In essence, we find that natural language does not emerge 'naturally',
despite the semblance of ease of natural-language-emergence that one may gather
from recent literature. We discuss how it is possible to coax the invented
languages to become more and more human-like and compositional by increasing
restrictions on how two agents may communicate.
Comment: 9 pages, 7 figures, 2 tables, accepted at EMNLP 2017 as short paper
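The abstract's point about restricting communication can be illustrated with simple counting. If objects are described by several attributes, a vocabulary large enough to give every whole object its own atomic symbol lets agents succeed without any compositional structure. A vocabulary with one symbol per attribute value is far smaller, yet it still distinguishes every object when symbols are combined. The sketch below assumes a hypothetical inventory of three attributes with four values each; the attribute names, values, and symbol scheme are illustrative assumptions, not the paper's experimental setup.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List, Tuple

# Hypothetical attribute inventory: 3 attributes x 4 values (an assumption).
ATTRIBUTES: Dict[str, List[str]] = {
    "color": ["red", "green", "blue", "purple"],
    "shape": ["square", "triangle", "circle", "star"],
    "style": ["dotted", "solid", "filled", "dashed"],
}


def holistic_vocab_size(attrs: Dict[str, List[str]]) -> int:
    # One atomic symbol per whole object: product of attribute cardinalities.
    return prod(len(vals) for vals in attrs.values())


def compositional_codebook(attrs: Dict[str, List[str]]) -> Dict[Tuple[str, str], str]:
    # One symbol per (attribute, value) pair; symbols are recombined per object.
    codebook: Dict[Tuple[str, str], str] = {}
    for name, vals in attrs.items():
        for value in vals:
            codebook[(name, value)] = f"s{len(codebook)}"
    return codebook


def encode(obj: Dict[str, str], codebook: Dict[Tuple[str, str], str]) -> Tuple[str, ...]:
    # A message is the sequence of per-attribute symbols in a fixed order.
    return tuple(codebook[(name, obj[name])] for name in sorted(obj))


def all_objects(attrs: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    names = list(attrs)
    for combo in product(*(attrs[n] for n in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    codebook = compositional_codebook(ATTRIBUTES)
    print("holistic symbols needed:", holistic_vocab_size(ATTRIBUTES))
    print("compositional symbols needed:", len(codebook))
```

Under these assumptions a one-symbol-per-object code needs 4 × 4 × 4 = 64 symbols, while the compositional code needs 4 + 4 + 4 = 12 and still yields 64 distinct messages. A vocabulary smaller than 64 therefore rules out the holistic shortcut but leaves a compositional code available.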
Projective simulation for artificial intelligence
We propose a model of a learning agent whose interaction with the environment
is governed by a simulation-based projection, which allows the agent to project
itself into future situations before it takes real action. Projective
simulation is based on a random walk through a network of clips, which are
elementary patches of episodic memory. The network of clips changes
dynamically, both due to new perceptual input and due to certain compositional
principles of the simulation process. During simulation, the clips are screened
for specific features which trigger factual action of the agent. The scheme is
different from other, computational, notions of simulation, and it provides a
new element in an embodied cognitive science approach to intelligent action and
learning. Our model provides a natural route for generalization to
quantum-mechanical operation and connects the fields of reinforcement learning
and quantum computation.
Comment: 22 pages, 18 figures. Close to published version, with footnotes
retained
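A random walk through a clip network can be sketched in its simplest form: a two-layer network in which percept clips connect directly to action clips through edge weights (h-values). The walk hops from a percept clip to an action clip with probability proportional to the edge's h-value. Rewarded edges are strengthened, and all edges are damped back toward 1 at rate `gamma`. This two-layer variant and update rule follow later projective-simulation work as we understand it; the class name `TwoLayerPS` and its parameters are assumptions. The sketch also omits the dynamic creation of new clips and the deeper, multi-step walks described in the abstract.

```python
import random
from typing import List


class TwoLayerPS:
    """Hypothetical minimal projective-simulation agent: percept clips -> action clips."""

    def __init__(self, n_percepts: int, n_actions: int, gamma: float = 0.0,
                 reward_scale: float = 1.0, seed: int = 0):
        # Every percept->action edge starts with h-value 1 (uniform hopping).
        self.h: List[List[float]] = [[1.0] * n_actions for _ in range(n_percepts)]
        self.gamma = gamma                # damping ("forgetting") rate
        self.reward_scale = reward_scale  # how strongly a reward reinforces an edge
        self.rng = random.Random(seed)

    def hopping_probabilities(self, percept: int) -> List[float]:
        # Probability of hopping along each edge is its h-value over the row total.
        row = self.h[percept]
        total = sum(row)
        return [v / total for v in row]

    def act(self, percept: int) -> int:
        # One random-walk step from the percept clip to an action clip.
        row = self.h[percept]
        return self.rng.choices(range(len(row)), weights=row, k=1)[0]

    def learn(self, percept: int, action: int, reward: float) -> None:
        # Damping: every h-value relaxes toward its initial value 1.
        for row in self.h:
            for a in range(len(row)):
                row[a] -= self.gamma * (row[a] - 1.0)
        # Reinforcement: only the edge just traversed receives the reward.
        self.h[percept][action] += self.reward_scale * reward


if __name__ == "__main__":
    # Toy task (assumption): the rewarded action equals the percept index.
    agent = TwoLayerPS(n_percepts=2, n_actions=2, gamma=0.001)
    env_rng = random.Random(42)
    for _ in range(500):
        p = env_rng.randrange(2)
        a = agent.act(p)
        agent.learn(p, a, 1.0 if a == p else 0.0)
    print([agent.hopping_probabilities(p) for p in range(2)])
```

Because unrewarded edges stay near 1 while rewarded ones grow, the hopping distribution concentrates on the rewarded action for each percept.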
CompILE: Compositional Imitation Learning and Execution
We introduce Compositional Imitation Learning and Execution (CompILE): a
framework for learning reusable, variable-length segments of
hierarchically-structured behavior from demonstration data. CompILE uses a
novel unsupervised, fully-differentiable sequence segmentation module to learn
latent encodings of sequential data that can be re-composed and executed to
perform new tasks. Once trained, our model generalizes to sequences of longer
length and from environment instances not seen during training. We evaluate
CompILE in a challenging 2D multi-task environment and a continuous control
task, and show that it can find correct task boundaries and event encodings in
an unsupervised manner. Latent codes and associated behavior policies
discovered by CompILE can be used by a hierarchical agent, where the high-level
policy selects actions in the latent code space, and the low-level,
task-specific policies are simply the learned decoders. We found that our
CompILE-based agent could learn given only sparse rewards, where agents without
task-specific policies struggle.
Comment: ICML (2019)
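A fully differentiable segmentation module cannot use hard, discrete boundary positions. One way to keep gradients flowing is to sample boundaries with a continuous relaxation such as the Gumbel-softmax and then turn the soft boundaries into soft per-segment masks over time steps. The sketch below shows that idea. Each boundary is a distribution over the time step at which the next segment starts. A segment's mask at step `t` is the probability that it has started minus the probability that the following segment has started. The function names, the mask formula, and the assumption that later boundaries fall after earlier ones are our simplification, not necessarily CompILE's exact formulation.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], temperature: float, rng: random.Random) -> List[float]:
    # Add Gumbel(0, 1) noise to the logits, then apply a temperature-scaled softmax.
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]


def cumsum(xs: List[float]) -> List[float]:
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out


def segment_masks(boundaries: List[List[float]]) -> List[List[float]]:
    """boundaries[k][t] = probability that segment k+1 starts at time step t.

    Assumes boundary k+1 falls no earlier than boundary k, so masks stay
    non-negative.
    """
    T = len(boundaries[0])
    cdfs = [cumsum(b) for b in boundaries]  # cdfs[k][t]: segment k+1 started by t
    K = len(boundaries) + 1                 # number of segments
    masks = []
    for k in range(K):
        started = [1.0] * T if k == 0 else cdfs[k - 1]
        ended = [0.0] * T if k == K - 1 else cdfs[k]
        masks.append([s - e for s, e in zip(started, ended)])
    return masks


if __name__ == "__main__":
    rng = random.Random(0)
    soft_boundary = gumbel_softmax([0.0, 0.5, 2.0, 0.5, 0.0], temperature=0.5, rng=rng)
    for mask in segment_masks([soft_boundary]):
        print([round(v, 3) for v in mask])
```

With a one-hot boundary the masks reduce to a hard partition of the sequence. Lowering the temperature pushes the sampled boundaries toward one-hot vectors while the computation stays differentiable with respect to the logits.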