Grounding Spatio-Temporal Language with Transformers
Language is an interface to the outside world. In order for embodied agents
to use it, language must be grounded in other, sensorimotor modalities. While
there is an extended literature studying how machines can learn grounded
language, the topic of how to learn spatio-temporal linguistic concepts is
still largely uncharted. To make progress in this direction, we here introduce
a novel spatio-temporal language grounding task where the goal is to learn the
meaning of spatio-temporal descriptions of behavioral traces of an embodied
agent. This is achieved by training a truth function that predicts whether a
description matches a given history of observations. The descriptions involve
time-extended predicates in past and present tense as well as spatio-temporal
references to objects in the scene. To study the role of architectural biases
in this task, we train several models including multimodal Transformer
architectures; the latter implement different attention computations between
words and objects across space and time. We test models on two classes of
generalization: 1) generalization to randomly held-out sentences; 2)
generalization to grammar primitives. We observe that maintaining object
identity in the attention computation of our Transformers is instrumental to
achieving good performance on generalization overall, and that summarizing
object traces in a single token has little influence on performance. We then
discuss how this opens new perspectives for language-guided autonomous embodied
agents. We also release our code under open-source license as well as
pretrained models and datasets to encourage the wider community to build upon
and extend our work in the future.
Comment: Contains main article and supplementary material
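The truth-function formulation can be illustrated with a deliberately tiny sketch, assuming a toy setting that is not the paper's: descriptions come from a hypothetical two-sentence vocabulary, a behavioural trace is a list of 1-D positions of a single object, and the model is a bilinear logistic scorer rather than a multimodal Transformer. What carries over is the interface: a function maps a (description, trace) pair to the probability that the description is true, and it is trained with binary cross-entropy on labelled matching and non-matching pairs. All names here (DESCRIPTIONS, truth, make_dataset, train) are hypothetical.

```python
import math
import random

# Hypothetical toy vocabulary of descriptions (not the paper's grammar).
DESCRIPTIONS = ["moved right", "moved left"]


def encode_description(text):
    """One-hot encode a description over the toy vocabulary."""
    return [1.0 if text == d else 0.0 for d in DESCRIPTIONS]


def encode_trace(xs):
    """Summarise a history of 1-D object positions by its net displacement."""
    return xs[-1] - xs[0]


def truth(w, b, text, xs):
    """Probability that the description is true of the trace."""
    d = encode_description(text)
    # Bilinear interaction: each description has its own weight on the trace feature.
    logit = sum(wk * dk for wk, dk in zip(w, d)) * encode_trace(xs) + b
    return 1.0 / (1.0 + math.exp(-logit))


def make_dataset(n, rng):
    """Random traces paired with random descriptions, labelled true (1) or false (0)."""
    data = []
    for _ in range(n):
        start = rng.uniform(-1.0, 1.0)
        step = rng.choice([-1.0, 1.0]) * rng.uniform(0.1, 1.0)
        xs = [start + step * t for t in range(5)]
        text = rng.choice(DESCRIPTIONS)
        label = 1.0 if (text == "moved right") == (step > 0) else 0.0
        data.append((text, xs, label))
    return data


def train(data, epochs=200, lr=0.1):
    """Stochastic gradient descent on the binary cross-entropy loss."""
    w, b = [0.0] * len(DESCRIPTIONS), 0.0
    for _ in range(epochs):
        for text, xs, y in data:
            g = truth(w, b, text, xs) - y  # dLoss/dlogit for binary cross-entropy
            d, disp = encode_description(text), encode_trace(xs)
            for k in range(len(w)):
                w[k] -= lr * g * d[k] * disp
            b -= lr * g
    return w, b
```

After w, b = train(make_dataset(200, random.Random(0))), the call truth(w, b, "moved left", [4, 3, 2, 1, 0]) should be close to 1, and the same trace paired with "moved right" close to 0. In the paper's setting, the hand-made encoders and the bilinear score would be replaced by learned multimodal models.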
Simulation Tools for the Study of the Interaction between Communication and Action in Cognitive Robots
In this thesis I report the development of FARSA (Framework for Autonomous Robotics Simulation and Analysis), a simulation tool for the study of the interaction between language and action in cognitive robots and, more generally, for experiments in embodied cognitive science. Before presenting the tool, I will describe a series of experiments that involve simulated humanoid robots that acquire their behavioural and language skills autonomously through a trial-and-error adaptive process in which random variations of the free parameters of the robots’ controller are retained or discarded on the basis of their effect on the overall behaviour exhibited by the robot in interaction with the environment. More specifically, the first series of experiments shows how the availability of linguistic stimuli provided by a caretaker, which indicate the elementary actions that need to be carried out in order to accomplish a certain complex action, facilitates the acquisition of the required behavioural capacity. The second series of experiments shows how a robot trained to comprehend a set of command phrases by executing the corresponding appropriate behaviour can generalize its knowledge by comprehending new, never-experienced sentences and by producing new appropriate actions.
Together with their scientific relevance, these experiments provide a series of requirements that have been taken into account during the development of FARSA. The objective of this project is to reduce the complexity barrier that currently discourages some researchers interested in the study of behaviour and cognition from initiating experimental activity in this area. FARSA is the only available tool that provides an integrated framework for carrying out experiments of this type, i.e. it is the only tool that provides ready-to-use integrated components for defining the characteristics of the robots and of the environment, of the robots’ controller, and of the adaptive process. Overall, this enables users to set up experiments quickly, including complex ones, and to start collecting results quickly.
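The trial-and-error adaptive process described above, in which random variations of the controller's free parameters are retained or discarded according to their effect on behaviour, can be sketched as a simple mutate-and-select loop. This is an illustrative assumption, not FARSA's API: behaviour_score is a hypothetical stand-in for evaluating the robot in its environment, and the Gaussian mutation width sigma and the "keep if no worse" acceptance rule are choices made for the sketch.

```python
import random


def behaviour_score(params):
    """Hypothetical stand-in for running the robot and scoring its behaviour.

    Here the score is simply closeness of the controller parameters to an
    arbitrary target vector; higher is better.
    """
    target = [0.5, -0.2, 0.8]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def adapt(params, generations, sigma, rng):
    """Trial-and-error adaptation: keep a random variation only if it helps."""
    best = list(params)
    best_score = behaviour_score(best)
    for _ in range(generations):
        # Randomly vary every free parameter of the controller.
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        score = behaviour_score(candidate)
        # Retain the variation if behaviour is no worse; otherwise discard it.
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score
```

Because a variation is kept only when it does not lower the score, adapt([0.0, 0.0, 0.0], 2000, 0.05, random.Random(1)) never returns a score worse than that of its starting parameters.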
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
communicate smoothly with human users over the long term are crucially
important, and both require an understanding of the dynamics of symbol systems. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and
double articulation analysis, which enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.
Comment: submitted to Advanced Robotics
An Experiment on Behavior Generalization and the Emergence of Linguistic Compositionality in Evolving Robots
Populations of simulated agents controlled by dynamical neural networks are trained by artificial evolution to access linguistic instructions and to execute them by indicating, touching or moving specific target objects. During training, the agents experience only a subset of all object/action pairs. During post-evaluation, some of the successful agents proved able to access and execute linguistic instructions that they had not experienced during training as well. This ability is due to the development of a semantic space, grounded in the sensory-motor capability of the agent and organised systematically so as to facilitate linguistic compositionality and behavioural generalisation. Compositionality seems to be underpinned by the agents' capability to access and execute the instructions by temporally decomposing their linguistic and behavioural aspects into their constituent parts (i.e., finding the target object and executing the required action). A comparison between two experimental conditions, in one of which the agents are required to ignore rather than to indicate objects, shows that the composition of the behavioural set significantly influences the development of compositional semantic structures.
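The training regime described above, in which agents experience only a subset of all object/action pairs and are later tested on the rest, can be sketched as a split over the Cartesian product of actions and objects. The action names follow the abstract (indicate, touch, move); the object names and the particular held-out pairs are hypothetical, since the abstract does not list them. The check encodes the condition that makes such a test compositional: every held-out pair combines an action and an object that were each seen during training, just never together.

```python
from itertools import product

ACTIONS = ["indicate", "touch", "move"]   # from the abstract
OBJECTS = ["obj1", "obj2", "obj3"]        # hypothetical object names


def split_pairs(held_out):
    """Split all (action, object) pairs into training and held-out test pairs."""
    all_pairs = list(product(ACTIONS, OBJECTS))
    train = [p for p in all_pairs if p not in held_out]
    test = [p for p in all_pairs if p in held_out]
    return train, test


def is_compositional_split(train, test):
    """True if every test pair's action and object each occur somewhere in training."""
    seen_actions = {a for a, _ in train}
    seen_objects = {o for _, o in train}
    return all(a in seen_actions and o in seen_objects for a, o in test)
```

Holding out, say, ("touch", "obj2") and ("move", "obj3") leaves seven training pairs and yields a compositional split, whereas holding out every pair involving "move" does not, because that action would never be experienced during training.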
Using MapReduce Streaming for Distributed Life Simulation on the Cloud
Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
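The general idea of strip partitioning for a lattice simulation can be sketched in a single process: the grid is cut into horizontal strips of rows, the mapper keys each row by the strip that owns it and also copies boundary rows to the adjacent strip as halo rows, and the reducer for each strip computes the next generation of its own rows. This is an illustrative assumption, not the authors' algorithm or Hadoop streaming code: it covers only discrete Life on a non-wrapping grid with dead cells beyond the edges, rows are strings of "0" and "1", the function names are hypothetical, and the grouping dictionary stands in for the framework's shuffle phase.

```python
from collections import defaultdict


def mapper(r, row, strip_height, n_rows):
    """Send row r to the strip that owns it, and copy it as a halo row to any
    neighbouring strip whose boundary row needs it."""
    own = r // strip_height
    yield own, ("own", r, row)
    for nr in (r - 1, r + 1):
        if 0 <= nr < n_rows and nr // strip_height != own:
            yield nr // strip_height, ("halo", r, row)


def next_cell(rows, r, c, width):
    """Conway's rule for cell (r, c); rows missing from `rows` count as dead."""
    live = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if nr in rows and 0 <= nc < width and rows[nr][nc] == "1":
                live += 1
    alive = rows[r][c] == "1"
    return "1" if live == 3 or (alive and live == 2) else "0"


def reducer(records):
    """Compute the next generation of the rows owned by one strip."""
    rows = {r: row for _, r, row in records}
    width = len(records[0][2])
    for tag, r, _ in records:
        if tag == "own":
            yield r, "".join(next_cell(rows, r, c, width) for c in range(width))


def life_step(grid, strip_height):
    """One generation: map, group records by strip (the shuffle), then reduce."""
    n_rows = len(grid)
    groups = defaultdict(list)
    for r, row in enumerate(grid):
        for key, record in mapper(r, row, strip_height, n_rows):
            groups[key].append(record)
    new_rows = {}
    for key in sorted(groups):
        new_rows.update(reducer(groups[key]))
    return [new_rows[r] for r in range(n_rows)]
```

For example, a vertical blinker in a 5x5 grid should flip to horizontal under life_step(grid, 2) even though it straddles a strip boundary, and the result should not depend on the strip height; the halo rows are what let each strip be updated independently within one generation.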