6 research outputs found

    Learning motion primitives and annotative texts from crowd-sourcing

    Get PDF

    Grounding Spatio-Temporal Language with Transformers

    Full text link
    Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extensive literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temporal language grounding task where the goal is to learn the meaning of spatio-temporal descriptions of behavioral traces of an embodied agent. This is achieved by training a truth function that predicts if a description matches a given history of observations. The descriptions involve time-extended predicates in past and present tense as well as spatio-temporal references to objects in the scene. To study the role of architectural biases in this task, we train several models including multimodal Transformer architectures; the latter implement different attention computations between words and objects across space and time. We test models on two classes of generalization: 1) generalization to randomly held-out sentences; 2) generalization to grammar primitives. We observe that maintaining object identity in the attention computation of our Transformers is instrumental to achieving good performance on generalization overall, and that summarizing object traces in a single token has little influence on performance. We then discuss how this opens new perspectives for language-guided autonomous embodied agents. We also release our code under an open-source license, as well as pretrained models and datasets, to encourage the wider community to build upon and extend our work in the future. Comment: Contains main article and supplementaries.
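
    The abstract's central object is a learned truth function over (description, observation-trace) pairs, with object identity preserved in the attention computation. Below is a minimal PyTorch sketch of that idea, not the authors' released architecture; the class name, dimensions, and mean-pooling choice are illustrative assumptions.

```python
# Hypothetical sketch of a truth function p(description matches trace).
# Word tokens and per-timestep object tokens share one attention space;
# each object carries a learned identity embedding so attention can
# track the same object across time.
import torch
import torch.nn as nn

class TruthFunction(nn.Module):
    def __init__(self, vocab_size, obs_dim, n_objects, d_model=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.obs_proj = nn.Linear(obs_dim, d_model)      # object features -> d_model
        self.obj_id = nn.Embedding(n_objects, d_model)   # persistent object identity
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(d_model, 1)                # binary: does it match?

    def forward(self, words, trace):
        # words: (B, L) token ids; trace: (B, T, N, obs_dim) object observations
        B, T, N, _ = trace.shape
        obj = self.obs_proj(trace) + self.obj_id.weight[None, None, :, :]
        obj = obj.reshape(B, T * N, -1)                  # flatten (time, object) tokens
        x = torch.cat([self.word_emb(words), obj], dim=1)
        h = self.encoder(x)
        return torch.sigmoid(self.head(h.mean(dim=1)))   # truth value in [0, 1]
```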

    Simulation Tools for the Study of the Interaction between Communication and Action in Cognitive Robots

    Get PDF
    In this thesis I report the development of FARSA (Framework for Autonomous Robotics Simulation and Analysis), a simulation tool for the study of the interaction between language and action in cognitive robots and, more generally, for experiments in embodied cognitive science. Before presenting the tool, I describe a series of experiments that involve simulated humanoid robots that acquire their behavioural and language skills autonomously through a trial-and-error adaptive process, in which random variations of the free parameters of the robots’ controller are retained or discarded on the basis of their effect on the overall behaviour exhibited by the robot in interaction with the environment. More specifically, the first series of experiments shows how the availability of linguistic stimuli provided by a caretaker, which indicate the elementary actions that need to be carried out in order to accomplish a certain complex action, facilitates the acquisition of the required behavioural capacity. The second series of experiments shows how a robot trained to comprehend a set of command phrases by executing the corresponding appropriate behaviour can generalize its knowledge by comprehending new, never experienced sentences and by producing new appropriate actions. Together with their scientific relevance, these experiments provide a series of requirements that have been taken into account during the development of FARSA. The objective of this project is to reduce the complexity barrier that currently discourages some of the researchers interested in the study of behaviour and cognition from initiating experimental activity in this area. FARSA is the only available tool that provides an integrated framework for carrying out experiments of this type, i.e. it is the only tool that provides ready-to-use integrated components for defining the characteristics of the robots and of the environment, the characteristics of the robots’ controller, and the characteristics of the adaptive process. Overall this enables users to quickly set up experiments, including complex experiments, and to quickly start collecting results.
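
    The trial-and-error adaptive process described above, where random variations of the controller's free parameters are retained or discarded according to their behavioural effect, is essentially an evolutionary hill climber. A minimal Python sketch of that loop follows; the `evaluate` function is a hypothetical stand-in for a full robot/environment rollout, not part of FARSA's actual API.

```python
# (1+1)-style evolutionary hill climber: keep a random variation of the
# controller parameters only if it does not worsen the behaviour score.
import numpy as np

def evaluate(params):
    # Placeholder fitness: in a FARSA experiment this would run the
    # simulated robot in its environment and score the behaviour.
    return -np.sum(params ** 2)

def adapt(n_params=50, generations=1000, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    params = rng.normal(0.0, 1.0, n_params)    # free parameters of the controller
    fitness = evaluate(params)
    for _ in range(generations):
        candidate = params + rng.normal(0.0, sigma, n_params)  # random variation
        f = evaluate(candidate)
        if f >= fitness:                       # retain if not worse, else discard
            params, fitness = candidate, f
    return params, fitness
```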

    Symbol Emergence in Robotics: A Survey

    Full text link
    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and acquire semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding the dynamics of symbol systems is crucially important both for understanding human social interactions and for developing robots that can smoothly communicate with human users in the long term. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, which enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER. Comment: Submitted to Advanced Robotics.
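
    Of the topics the survey lists, multimodal categorization is the easiest to make concrete. The SER literature typically uses Bayesian models such as multimodal LDA; the sketch below substitutes plain k-means over fused modality features purely as a simplified illustration, with all feature arrays and dimensions assumed.

```python
# Simplified stand-in for unsupervised multimodal categorization:
# fuse per-observation visual/haptic/auditory features and cluster them
# into object categories without any labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def categorize(visual, haptic, auditory, n_categories=10):
    # Each argument: (n_observations, feature_dim) array for one modality.
    fused = np.hstack([visual, haptic, auditory])     # fuse the modalities
    fused = StandardScaler().fit_transform(fused)     # balance modality scales
    labels = KMeans(n_clusters=n_categories, n_init=10).fit_predict(fused)
    return labels                                     # unsupervised categories
```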

    An Experiment on Behavior Generalization and the Emergence of Linguistic Compositionality in Evolving Robots

    Get PDF
    Populations of simulated agents controlled by dynamical neural networks are trained by artificial evolution to access linguistic instructions and to execute them by indicating, touching or moving specific target objects. During training the agent experiences only a subset of all object/action pairs. During post-evaluation, some of the successful agents proved able to access and execute linguistic instructions that had not been experienced during training. This is due to the development of a semantic space, grounded in the sensory-motor capabilities of the agent and organised in a systematic way that facilitates linguistic compositionality and behavioural generalisation. Compositionality seems to be underpinned by the agents' capability to access and execute the instructions by temporally decomposing their linguistic and behavioural aspects into their constituent parts (i.e., finding the target object and executing the required action). The comparison between two experimental conditions, in one of which the agents are required to ignore rather than indicate objects, shows that the composition of the behavioural set significantly influences the development of compositional semantic structures.
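
    The evaluation protocol described, training on a subset of object/action pairs and post-evaluating on the held-out combinations, can be made concrete in a few lines of Python. The object and action names here are illustrative placeholders, not the paper's exact stimulus set.

```python
# Build all object/action instruction pairs, withhold some combinations
# during training, and reserve them for the generalization post-evaluation.
import itertools
import random

objects = ["red_object", "blue_object", "green_object"]   # placeholders
actions = ["indicate", "touch", "move"]

pairs = list(itertools.product(objects, actions))
random.seed(0)
random.shuffle(pairs)

held_out = pairs[:3]                        # never experienced during training
training = [p for p in pairs if p not in held_out]

print("train on:", training)
print("post-evaluate generalization on:", held_out)
```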

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Get PDF
    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s Life according to a general MR streaming pattern. We chose Life because it is simple enough to serve as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
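
    To make the MR streaming pattern concrete, here is a minimal Hadoop-streaming-style mapper/reducer pair computing one generation of discrete Life. This is a plain illustration of the pattern, not the paper's optimized algorithms (in particular it does not implement strip partitioning), and the input format of one live cell "x,y" per line is an assumption.

```python
# Mapper: for each live cell, emit a neighbour-count contribution for each
# of its eight neighbours, plus a liveness marker for the cell itself.
# Reducer: sum contributions per cell and apply Conway's rules.
import sys
from collections import defaultdict

def mapper(lines):
    for line in lines:
        x, y = map(int, line.split(","))
        print(f"{x},{y}\tALIVE")                  # mark the cell as currently live
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    print(f"{x + dx},{y + dy}\t1")  # neighbour contribution

def reducer(lines):
    # Aggregate per key in memory for simplicity; a real streaming reducer
    # would exploit the key-sorted input to emit results incrementally.
    counts, alive = defaultdict(int), set()
    for line in lines:
        key, val = line.rstrip("\n").split("\t")
        if val == "ALIVE":
            alive.add(key)
        else:
            counts[key] += int(val)
    for key, n in counts.items():
        # Survival with 2 or 3 live neighbours; birth with exactly 3.
        if n == 3 or (n == 2 and key in alive):
            print(key)

if __name__ == "__main__":
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```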