LU4R: Adaptive Spoken Language Understanding for Robots
Service robots are expected to operate in specific environments, where the presence of humans plays a key role. It is thus essential to enable natural and effective communication between humans and robots. One of the main features of such robotic platforms is the ability to react to spoken commands. This requires a comprehensive understanding of the user utterance in order to trigger the appropriate robot reaction. Moreover, the correct interpretation of linguistic interactions depends on physical, cognitive and language-dependent aspects related to the environment. In this work, we present the latest version of LU4R - adaptive spoken Language Understanding 4 Robots, a Spoken Language Understanding framework for the semantic interpretation of robotic commands that is sensitive to the operational environment. The overall system is designed according to a Client/Server architecture in order to be easily deployed on a wide range of robotic platforms. Moreover, an improved version of HuRIC - Human-Robot Interaction Corpus is presented, whose main novelty is the extension to commands expressed in Italian. To prove the effectiveness of the system, we also present empirical results for both English and Italian computed over the new HuRIC resource.
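The abstract specifies a Client/Server decomposition but not the wire protocol. Below is a minimal sketch of what a robot-side client might look like, assuming a hypothetical HTTP endpoint and JSON schema (the URL, field names, and frame structure are illustrative assumptions, not LU4R's actual API):

```python
import json
import urllib.request

# Hypothetical endpoint; the real LU4R server protocol may differ.
SLU_SERVER_URL = "http://localhost:9090/service/nlu"

def interpret(transcription: str, entities: list) -> dict:
    """Send a transcribed spoken command plus the robot's current view of
    the environment (a list of perceived entities) to the SLU server and
    return the semantic interpretation it produces."""
    payload = json.dumps({
        "hypotheses": [{"transcription": transcription, "confidence": 0.9}],
        "entities": entities,  # environment context enables adaptive parsing
    }).encode("utf-8")
    request = urllib.request.Request(
        SLU_SERVER_URL, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# A command such as "take the book to the kitchen" could come back as a
# frame pairing an action with arguments grounded in the entity list.
frame = interpret(
    "take the book to the kitchen",
    entities=[{"atom": "book1", "type": "book"},
              {"atom": "kitchen1", "type": "room"}])
print(frame)
```

The design point is that the client ships perceptual context along with the utterance, which is what makes interpretation environment-sensitive while keeping the robot-side integration thin.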
Ubiquitous Technologies for Emotion Recognition
Emotions play a very important role in how we think and behave. The emotions we feel every day can compel us to act and can influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how and why our emotions change is thus highly relevant to understanding human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and the recognition of human emotions.
Symbiotic interaction between humans and robot swarms
Comprising a potentially large team of autonomous cooperative robots that locally interact and communicate with each other, robot swarms provide a natural diversity of parallel and distributed functionalities, high flexibility, potential for redundancy, and fault tolerance. The use of autonomous mobile robots is expected to increase in the future, and swarm robotic systems are envisioned to play important roles in tasks such as search and rescue (SAR) missions, transportation of objects, surveillance, and reconnaissance operations. To robustly deploy robot swarms in the field with humans, this research addresses the fundamental problems in the relatively new field of human-swarm interaction (HSI). Four core classes of problems have been addressed for proximal interaction between humans and robot swarms: interaction and communication; swarm-level sensing and classification; swarm coordination; and swarm-level learning. The primary contribution of this research is the development of a bidirectional human-swarm communication system for non-verbal interaction between humans and heterogeneous robot swarms, with SAR missions as the guiding field of application. The core challenges and issues in HSI include: How can human operators interact and communicate with robot swarms? Which interaction modalities can be used by humans? How can human operators instruct and command robots from a swarm? Which mechanisms can be used by robot swarms to convey feedback to human operators? Which types of feedback can swarms convey to humans? In this research, to start answering these questions, hand gestures were chosen as the interaction modality for humans, since gestures are simple to use, easily recognized, and possess spatial-addressing properties. To facilitate bidirectional interaction and communication, a dialogue-based interaction system is introduced which consists of: (i) a grammar-based gesture language with a vocabulary of non-verbal commands that allows humans to efficiently provide mission instructions to swarms, and (ii) a swarm-coordinated multi-modal feedback language that enables robot swarms to robustly convey swarm-level decisions, status, and intentions to humans using multiple individual and group modalities. The gesture language allows humans to select and address single and multiple robots from a swarm, provide commands to perform tasks, specify spatial directions and application-specific parameters, and build iconic grammar-based sentences by combining individual gesture commands. Swarms convey different types of multi-modal feedback to humans using on-board lights, sounds, and locally coordinated robot movements. The swarm-to-human feedback conveys to humans the swarm's understanding of the recognized commands, allows swarms to assess their decisions (i.e., to correct mistakes made by humans in providing instructions and errors made by swarms in recognizing commands), and guides humans through the interaction process. The second contribution of this research addresses swarm-level sensing and classification: How can robot swarms collectively sense and recognize hand gestures given as visual signals by humans? Distributed sensing, cooperative recognition, and decision-making mechanisms have been developed to allow robot swarms to collectively recognize visual instructions and commands given by humans in the form of gestures. These mechanisms rely on decentralized data fusion strategies and multi-hop message-passing algorithms to robustly build swarm-level consensus decisions.
Measures have been introduced in the cooperative recognition protocol that provide a trade-off between the accuracy of swarm-level consensus decisions and the time taken to build them. The third contribution of this research addresses swarm-level cooperation: How can humans select spatially distributed robots from a swarm, and how can the robots understand that they have been selected? How can robot swarms be spatially deployed for proximal interaction with humans? With the introduction of spatially-addressed instructions (pointing gestures), humans can robustly address and select spatially-situated individuals and groups of robots from a swarm. A cascaded classification scheme is adopted in which the robot swarm first identifies the selection command (e.g., individual or group selection), and the robots then coordinate with each other to identify whether they have been selected. To obtain better views of gestures issued by humans, distributed mobility strategies have been introduced for the coordinated deployment of heterogeneous robot swarms (i.e., ground and flying robots) and for reshaping the spatial distribution of swarms. The fourth contribution of this research addresses the notion of collective learning in robot swarms. The questions that are answered include: How can robot swarms learn about the hand gestures given by human operators? How can humans be included in the loop of swarm learning? How can robot swarms cooperatively learn as a team? Online incremental learning algorithms have been developed which allow robot swarms to learn individual gestures and grammar-based gesture sentences in real time, supervised by human instructors. Humans provide different types of feedback (i.e., full or partial feedback) to swarms for improving swarm-level learning. To speed up the learning rate of robot swarms, cooperative learning strategies have been introduced which enable individual robots in a swarm to intelligently select locally sensed information and share (exchange) selected information with other robots in the swarm. The final contribution is a systemic one: it aims at building a complete HSI system for potential use in real-world applications by integrating the algorithms, techniques, mechanisms, and strategies discussed in the contributions above. The effectiveness of the global HSI system is demonstrated in a number of interactive scenarios, using emulation tests (i.e., simulations using gesture images acquired by a heterogeneous robotic swarm) and experiments with real robots, both ground and flying.
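The abstract names decentralized data fusion and multi-hop message passing as the basis of swarm-level consensus but gives no algorithmic detail. The sketch below shows one plausible scheme under stated assumptions (iterative confidence averaging over the communication graph); the mixing rule and stopping criterion are illustrative, not the thesis's actual protocol:

```python
import numpy as np

def swarm_consensus(local_probs: np.ndarray, adjacency: np.ndarray,
                    rounds: int) -> int:
    """Fuse per-robot class-probability estimates into a swarm decision.

    local_probs: (n_robots, n_classes) outputs of each robot's local
                 gesture classifier (robots without a clear view can
                 contribute a near-uniform distribution).
    adjacency:   (n_robots, n_robots) 0/1 communication graph.
    rounds:      neighbor-exchange rounds; more rounds let information
                 travel further (multi-hop) at the cost of latency.
    """
    beliefs = local_probs.copy()
    # Row-normalized mixing matrix: each robot averages itself + neighbors.
    mix = adjacency + np.eye(len(adjacency))
    mix = mix / mix.sum(axis=1, keepdims=True)
    for _ in range(rounds):
        beliefs = mix @ beliefs  # one hop of local information exchange
    # After enough rounds every robot holds approximately the same fused
    # belief, so any robot can announce the swarm-level decision.
    return int(np.argmax(beliefs.mean(axis=0)))

# Three robots on a line graph; only robot 0 sees the gesture clearly.
probs = np.array([[0.70, 0.20, 0.10],    # confident local classifier
                  [0.34, 0.33, 0.33],    # poor view
                  [0.33, 0.34, 0.33]])   # poor view
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(swarm_consensus(probs, adj, rounds=5))  # -> 0
```

Increasing `rounds` lets evidence from well-placed robots propagate further through the graph at the cost of a slower decision, mirroring the accuracy-versus-time trade-off the abstract mentions.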
Continually improving grounded natural language understanding through human-robot dialog
As robots become ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person asks an assistant robot to "take me to Alice's office", the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug", the robot must have accurate mental models of the physical concepts "heavy", "green", and "mug". To avoid forcing humans to use only key phrases or words robots already know, this thesis focuses on helping robots understand new language constructs through interactions with humans and with the world around them. To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation is often a semantic form expressed as predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language paired with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use the conversations between a robot and human users to induce pairs of natural language utterances with the target semantic forms a robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort (Thomason et al., 2015; Padmakumar et al., 2017). Meanings of many language concepts are bound to the physical world. Understanding object properties and categories such as "heavy", "green", and "mug" requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects, to gather sensory data about them. This data can be used to understand non-visual concepts like "heavy" and "empty" (e.g., "get the empty carton of milk from the fridge") and to assist with concepts that have both visual and non-visual expression (e.g., tall things look big and also exert force sooner than short things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to get labels between concept words and actual objects in the environment (Thomason et al., 2016, 2017). We also explore ways to tease out polysemy and synonymy in concept words (Thomason and Mooney, 2017), such as "light", which can refer to a weight or a color, the latter sense being synonymous with "pale". Additionally, pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, so we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept (Thomason et al., 2018). Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time.
We demonstrate that parser learning from conversations (Thomason et al., 2015) can be combined with multi-modal perception (Thomason et al., 2016), using predicate-object labels gathered through opportunistic active learning (Thomason et al., 2017) during those conversations, to improve performance in understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved through conversation-based learning.
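As an illustration of the kind of target representation the thesis mentions (predicate logic with lambda calculus) and of how dialog can induce parser training data, here is a minimal sketch; the predicate vocabulary and form syntax are illustrative assumptions, not the thesis's actual grammar:

```python
from dataclasses import dataclass

@dataclass
class ParseExample:
    """One induced training pair: an utterance and the semantic form
    the robot settled on through clarification dialog."""
    utterance: str
    semantic_form: str  # lambda-calculus meaning representation

# Illustrative pairs in the spirit of the thesis; predicates such as
# walk, bring, office, possesses are assumed for this sketch.
examples = [
    ParseExample(
        "take me to alice's office",
        "walk(the(lambda x.(office(x) and possesses(alice, x))))"),
    ParseExample(
        "bring me the heavy green mug",
        "bring(the(lambda x.(mug(x) and heavy(x) and green(x))), speaker)"),
]

def add_induced_pair(corpus: list, utterance: str,
                     confirmed_form: str) -> None:
    """A dialog-based learner adds a new pair whenever a clarification
    exchange reveals the correct form for an unparsable command."""
    corpus.append(ParseExample(utterance, confirmed_form))

add_induced_pair(examples, "go to the kitchen",
                 "walk(the(lambda x.kitchen(x)))")
for ex in examples:
    print(ex.utterance, "->", ex.semantic_form)
```

Each clarification dialog thus yields a free training example, which is how annotation effort is reduced relative to hand-built parser corpora.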
Formation of the Perceptual Meaning of Words Through Interaction (Formação do significado perceptual das palavras através de interacção)
This thesis addresses the problem of word learning in computational agents. The motivation behind this work lies in the need to support language-based communication between service robots and their human users, as well as grounded reasoning using symbols relevant for the assigned tasks. The research focuses on the problem of grounding human vocabulary in a robotic agent's sensori-motor perception.

Words have to be grounded in bodily experiences, which emphasizes the role of appropriate embodiments. On the other hand, language is a cultural product created and acquired through social interactions, which emphasizes the role of society as a source of linguistic input. Taking these aspects into account, an experimental scenario is set up in which a human instructor teaches a robotic agent the names of the objects present in a visually shared environment. The agent grounds the names of these objects in visual perception.

Word learning is an open-ended problem. Therefore, the learning architecture of the agent has to be able to acquire words and categories in an open-ended manner. In this work, four learning architectures were designed that can be used by robotic agents for long-term and open-ended word and category acquisition. The learning methods used in these architectures are designed to incrementally scale up to larger sets of words and categories.

A novel experimental evaluation methodology that takes into account the open-ended nature of word learning is proposed and applied. This methodology is based on the realization that a robot's vocabulary will be limited by its discriminatory capacity, which, in turn, depends on its sensors and perceptual capabilities. An extensive set of systematic experiments, in multiple experimental settings, was carried out to thoroughly evaluate the described learning approaches. The results indicate that all approaches were able to incrementally acquire new words and categories. Although some of the approaches could not scale up to larger vocabularies, one approach was shown to learn up to 293 categories, with potential for learning many more.
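The abstract describes incremental, open-ended category acquisition without detailing the methods. The sketch below shows one simple scheme in that spirit (an online nearest-prototype learner that adds a category whenever a new word arrives), purely as an illustration rather than any of the thesis's four architectures:

```python
import numpy as np

class OpenEndedLearner:
    """Toy open-ended category learner: one prototype per word,
    refined incrementally as an instructor labels object views."""

    def __init__(self):
        self.prototypes = {}  # word -> running-mean feature vector
        self.counts = {}      # word -> number of teaching examples

    def teach(self, word: str, features: np.ndarray) -> None:
        """Instructor names an object; create or refine its category."""
        if word not in self.prototypes:        # new word -> new category
            self.prototypes[word] = features.astype(float).copy()
            self.counts[word] = 1
        else:                                  # running-mean update
            self.counts[word] += 1
            self.prototypes[word] += (
                features - self.prototypes[word]) / self.counts[word]

    def ask(self, features: np.ndarray) -> str:
        """Classify a view as the nearest known prototype."""
        return min(self.prototypes,
                   key=lambda w: np.linalg.norm(features - self.prototypes[w]))

learner = OpenEndedLearner()
learner.teach("mug", np.array([0.9, 0.1]))
learner.teach("book", np.array([0.1, 0.9]))
learner.teach("mug", np.array([0.8, 0.2]))
print(learner.ask(np.array([0.85, 0.15])))  # -> "mug"
```

An evaluation protocol in the spirit of the proposed methodology would keep introducing new words until classification accuracy degrades, tying the achievable vocabulary size to the agent's discriminatory capacity.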
Initial Designs for Improving Conversations for People Using Speech Synthesizers
This thesis aims to determine the impact that Augmentative and Alternative Communication (AAC) devices have on social interactions, and then to improve the AAC user experience through a user-focused design process. AAC devices enable people who cannot speak to communicate with others. Unfortunately, they are tedious to use and make social interaction a dissatisfying experience. The thesis consists of three main studies. The first study focuses on gathering information about the behaviors of communication partners of AAC users, specifically gaze behaviors, and how those behaviors impact the users. The second study takes the form of a focus group with people who are experienced with AAC devices; its discussions gather ideas about the daily interactions of actual AAC users and brainstorm design ideas for technologies that can improve their interactions with others. Finally, the last study gathers feedback on prototypes created from the ideas generated in the second study.
Acoustic-based Smart Tactile Sensing in Social Robots
The sense of touch is a crucial component of human social interaction and is unique among the five senses. As the only proximal sense, touch requires close or direct physical contact to register information. This fact makes touch an interaction modality full of possibilities regarding social communication. Through touch, we are able to ascertain the other person's intention and communicate emotions. From this idea emerges the concept of social touch as the act of touching another person in a social context. It can serve various purposes, such as greeting, showing affection, persuasion, and regulating emotional and physical well-being.

Recently, the number of people interacting with artificial systems and agents has increased, mainly due to the rise of technological devices such as smartphones and smart speakers. Still, these devices are limited in their interaction capabilities. To deal with this issue, recent developments in social robotics have improved the interaction possibilities to make agents more seamless and useful. In this sense, social robots are designed to facilitate natural interactions between humans and artificial agents. In this context, the sense of touch is revealed as a natural interaction vehicle that can improve Human-Robot Interaction (HRI) due to its communicative relevance. Moreover, for a social robot, the relationship between social touch and its embodiment is direct, since it has a physical body with which to apply or receive touches.

From a technical standpoint, tactile sensing systems have recently been the subject of further research, mostly devoted to comprehending this sense in order to create intelligent systems that can improve people's lives. Currently, social robots are popular devices that include technologies for touch sensing. This is motivated by the fact that robots may encounter expected or unexpected physical contact with humans, which can either enhance or interfere with the execution of their behaviours. There is, therefore, a need to detect human touch in robot applications. Some methods even include touch-gesture recognition, although they often require significant hardware deployments with multiple sensors. Additionally, the dependability of those sensing technologies is constrained, because the majority of them still struggle with issues like false positives or poor recognition rates. Acoustic sensing, in this sense, can provide a set of features that can alleviate the aforementioned shortcomings. Even though it is a technology that has been utilised in various research fields, it has yet to be integrated into human-robot touch interaction.

Therefore, in this work, we propose the Acoustic Touch Recognition (ATR) system, a smart tactile sensing system based on acoustic sensing and designed to improve human-robot social interaction. Our system is developed to classify touch gestures and locate their source. It is also integrated into real social robotic platforms and tested in real-world applications. Our proposal is approached from two standpoints, one technical and the other related to social touch. Firstly, the technical motivation of this work centred on achieving a cost-efficient, modular and portable tactile system; for that, we explore the fields of touch sensing technologies, smart tactile sensing systems and their application in HRI. Secondly, part of the research is centred on the affective impact of social touch during human-robot interaction, resulting in two studies exploring this idea.
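The abstract states that ATR classifies touch gestures from acoustic signals but gives no implementation details. The following is a minimal sketch of a generic acoustic gesture classifier (coarse spectral features plus a nearest-centroid decision); the feature choice, parameters, and gesture labels are assumptions for illustration, not the ATR system's actual design:

```python
import numpy as np

def spectral_features(signal: np.ndarray, n_bands: int = 16) -> np.ndarray:
    """Summarize a touch-induced acoustic event as log-energies in a few
    coarse frequency bands (a stand-in for richer acoustic features)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([band.sum() for band in bands]))

class TouchGestureClassifier:
    """Nearest-centroid classifier over labelled touch recordings."""

    def __init__(self):
        self.centroids = {}  # label -> mean feature vector

    def fit(self, examples: dict) -> None:
        for label, signals in examples.items():
            feats = np.stack([spectral_features(s) for s in signals])
            self.centroids[label] = feats.mean(axis=0)

    def predict(self, signal: np.ndarray) -> str:
        f = spectral_features(signal)
        return min(self.centroids,
                   key=lambda l: np.linalg.norm(f - self.centroids[l]))

# Synthetic demo at 44.1 kHz: a short high-frequency transient ("tap")
# versus a sustained low-frequency rubbing sound ("stroke").
rng = np.random.default_rng(0)
t = np.arange(4410) / 44100.0

def tap() -> np.ndarray:
    s = np.sin(2 * np.pi * 3000 * t) * (t < 0.01)
    return s + rng.normal(0, 0.01, t.size)

def stroke() -> np.ndarray:
    s = 0.3 * np.sin(2 * np.pi * 200 * t)
    return s + rng.normal(0, 0.01, t.size)

clf = TouchGestureClassifier()
clf.fit({"tap": [tap() for _ in range(5)],
         "stroke": [stroke() for _ in range(5)]})
print(clf.predict(tap()))  # -> "tap"
```

Localizing the source could analogously compare arrival times or energy across several contact microphones on the robot's shell, though the abstract does not specify which method ATR actually uses.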