119 research outputs found
Human-Robot Interaction architecture for interactive and lively social robots
Mención Internacional en el título de doctorLa sociedad está experimentando un proceso de envejecimiento que puede provocar un desequilibrio
entre la población en edad de trabajar y aquella fuera del mercado de trabajo. Una de las soluciones
a este problema que se están considerando hoy en día es la introducción de robots en multiples
sectores, incluyendo el de servicios. Sin embargo, para que esto sea una solución viable, estos robots
necesitan ser capaces de interactuar con personas de manera satisfactoria, entre otras habilidades. En
el contexto de la aplicación de robots sociales al cuidado de mayores, esta tesis busca proporcionar
a un robot social las habilidades necesarias para crear interacciones entre humanos y robots que
sean naturales. En concreto, esta tesis se centra en tres problemas que deben ser solucionados: (i) el
modelado de interacciones entre humanos y robots; (ii) equipar a un robot social con las capacidades
expresivas necesarias para una comunicación satisfactoria; y (iii) darle al robot una apariencia vivaz.
La solución al problema de modelado de diálogos presentada en esta tesis propone diseñar estos
diálogos como una secuencia de elementos atómicos llamados Actos Comunicativos (CAs, por sus
siglas en inglés). Se pueden parametrizar en tiempo de ejecución para completar diferentes objetivos
comunicativos, y están equipados con mecanismos para manejar algunas de las imprecisiones que
pueden aparecer durante interacciones. Estos CAs han sido identificados a partir de la combinación
de dos dimensiones: iniciativa (si la tiene el robot o el usuario) e intención (si se pretende obtener o
proporcionar información). Estos CAs pueden ser combinados siguiendo una estructura jerárquica
para crear estructuras mas complejas que sean reutilizables. Esto simplifica el proceso para crear
nuevas interacciones, permitiendo a los desarrolladores centrarse exclusivamente en diseñar el flujo
del diálogo, sin tener que preocuparse de reimplementar otras funcionalidades que tienen que estar
presentes en todas las interacciones (como el manejo de errores, por ejemplo).
La expresividad del robot está basada en el uso de una librería de gestos, o expresiones,
multimodales predefinidos, modelados como estructuras similares a máquinas de estados. El
módulo que controla la expresividad recibe peticiones para realizar dichas expresiones, planifica
su ejecución para evitar cualquier conflicto que pueda aparecer, las carga, y comprueba que su
ejecución se complete sin problemas. El sistema es capaz también de generar estas expresiones en
tiempo de ejecución a partir de una lista de acciones unimodales (como decir una frase, o mover una
articulación). Una de las características más importantes de la arquitectura de expresividad propuesta
es la integración de una serie de métodos de modulación que pueden ser usados para modificar los
gestos del robot en tiempo de ejecución. Esto permite al robot adaptar estas expresiones en base
a circunstancias particulares (aumentando al mismo tiempo la variabilidad de la expresividad del robot), y usar un número limitado de gestos para mostrar diferentes estados internos (como el estado
emocional).
Teniendo en cuenta que ser reconocido como un ser vivo es un requisito para poder participar en
interacciones sociales, que un robot social muestre una apariencia de vivacidad es un factor clave
en interacciones entre humanos y robots. Para ello, esta tesis propone dos soluciones. El primer
método genera acciones a través de las diferentes interfaces del robot a intervalos. La frecuencia e
intensidad de estas acciones están definidas en base a una señal que representa el pulso del robot.
Dicha señal puede adaptarse al contexto de la interacción o al estado interno del robot. El segundo
método enriquece las interacciones verbales entre el robot y el usuario prediciendo los gestos no
verbales más apropiados en base al contenido del diálogo y a la intención comunicativa del robot.
Un modelo basado en aprendizaje automático recibe la transcripción del mensaje verbal del robot,
predice los gestos que deberían acompañarlo, y los sincroniza para que cada gesto empiece en el
momento preciso. Este modelo se ha desarrollado usando una combinación de un encoder diseñado
con una red neuronal Long-Short Term Memory, y un Conditional Random Field para predecir la
secuencia de gestos que deben acompañar a la frase del robot.
Todos los elementos presentados conforman el núcleo de una arquitectura de interacción
humano-robot modular que ha sido integrada en múltiples plataformas, y probada bajo diferentes
condiciones. El objetivo central de esta tesis es contribuir al área de interacción humano-robot
con una nueva solución que es modular e independiente de la plataforma robótica, y que se centra
en proporcionar a los desarrolladores las herramientas necesarias para desarrollar aplicaciones que
requieran interacciones con personas.Society is experiencing a series of demographic changes that can result in an unbalance between
the active working and non-working age populations. One of the solutions considered to mitigate
this problem is the inclusion of robots in multiple sectors, including the service sector. But for
this to be a viable solution, among other features, robots need to be able to interact with humans
successfully. This thesis seeks to endow a social robot with the abilities required for a natural
human-robot interactions. The main objective is to contribute to the body of knowledge on the area
of Human-Robot Interaction with a new, platform-independent, modular approach that focuses on
giving roboticists the tools required to develop applications that involve interactions with humans. In
particular, this thesis focuses on three problems that need to be addressed: (i) modelling interactions
between a robot and an user; (ii) endow the robot with the expressive capabilities required for a
successful communication; and (iii) endow the robot with a lively appearance.
The approach to dialogue modelling presented in this thesis proposes to model dialogues as a
sequence of atomic interaction units, called Communicative Acts, or CAs. They can be parametrized
in runtime to achieve different communicative goals, and are endowed with mechanisms oriented to
solve some of the uncertainties related to interaction. Two dimensions have been used to identify the
required CAs: initiative (the robot or the user), and intention (either retrieve information or to convey
it). These basic CAs can be combined in a hierarchical manner to create more re-usable complex
structures. This approach simplifies the creation of new interactions, by allowing developers to focus
exclusively on designing the flow of the dialogue, without having to re-implement functionalities
that are common to all dialogues (like error handling, for example).
The expressiveness of the robot is based on the use of a library of predefined multimodal gestures,
or expressions, modelled as state machines. The module managing the expressiveness receives requests
for performing gestures, schedules their execution in order to avoid any possible conflict that might
arise, loads them, and ensures that their execution goes without problems. The proposed approach
is also able to generate expressions in runtime based on a list of unimodal actions (an utterance,
the motion of a limb, etc...). One of the key features of the proposed expressiveness management
approach is the integration of a series of modulation techniques that can be used to modify the
robot’s expressions in runtime. This would allow the robot to adapt them to the particularities of a
given situation (which would also increase the variability of the robot expressiveness), and to display
different internal states with the same expressions. Considering that being recognized as a living being is a requirement for engaging in social
encounters, the perception of a social robot as a living entity is a key requirement to foster
human-robot interactions. In this dissertation, two approaches have been proposed. The first
method generates actions for the different interfaces of the robot at certain intervals. The frequency
and intensity of these actions are defined by a signal that represents the pulse of the robot, which can
be adapted to the context of the interaction or the internal state of the robot. The second method
enhances the robot’s utterance by predicting the appropriate non-verbal expressions that should
accompany them, according to the content of the robot’s message, as well as its communicative
intention. A deep learning model receives the transcription of the robot’s utterances, predicts
which expressions should accompany it, and synchronizes them, so each gesture selected starts at
the appropriate time. The model has been developed using a combination of a Long-Short Term
Memory network-based encoder and a Conditional Random Field for generating a sequence of
gestures that are combined with the robot’s utterance.
All the elements presented above conform the core of a modular Human-Robot Interaction
architecture that has been integrated in multiple platforms, and tested under different conditions.Programa de Doctorado en Ingeniería Eléctrica, Electrónica y Automática por la Universidad Carlos III de MadridPresidente: Fernando Torres Medina.- Secretario: Concepción Alicia Monje Micharet.- Vocal: Amirabdollahian Farshi
Enhancing the use of Haptic Devices in Education and Entertainment
This research was part of the two-years Horizon 2020 European Project "weDRAW". The aim of the project was that "specific sensory systems have specific roles to learn specific concepts". This work explores the use of the haptic modality, stimulated by the means of force-feedback devices, to convey abstract concepts inside virtual reality. After a review of the current use of haptic devices in education, available haptic software and game engines, we focus on the implementation of an haptic plugin for game engines (HPGE, based on state of the art rendering library CHAI3D) and its evaluation in human perception experiments and multisensory integration
Multimodality in {VR}: {A} Survey
Virtual reality has the potential to change the way we create and consume content in our everyday life. Entertainment, training, design and manufacturing, communication, or advertising are all applications that already benefit from this new medium reaching consumer level. VR is inherently different from traditional media: it offers a more immersive experience, and has the ability to elicit a sense of presence through the place and plausibility illusions. It also gives the user unprecedented capabilities to explore their environment, in contrast with traditional media. In VR, like in the real world, users integrate the multimodal sensory information they receive to create a unified perception of the virtual world. Therefore, the sensory cues that are available in a virtual environment can be leveraged to enhance the final experience. This may include increasing realism, or the sense of presence; predicting or guiding the attention of the user through the experience; or increasing their performance if the experience involves the completion of certain tasks. In this state-of-the-art report, we survey the body of work addressing multimodality in virtual reality, its role and benefits in the final user experience. The works here reviewed thus encompass several fields of research, including computer graphics, human computer interaction, or psychology and perception. Additionally, we give an overview of different applications that leverage multimodal input in areas such as medicine, training and education, or entertainment; we include works in which the integration of multiple sensory information yields significant improvements, demonstrating how multimodality can play a fundamental role in the way VR systems are designed, and VR experiences created and consumed
Emotion Recognition by Video: A review
Video emotion recognition is an important branch of affective computing, and
its solutions can be applied in different fields such as human-computer
interaction (HCI) and intelligent medical treatment. Although the number of
papers published in the field of emotion recognition is increasing, there are
few comprehensive literature reviews covering related research on video emotion
recognition. Therefore, this paper selects articles published from 2015 to 2023
to systematize the existing trends in video emotion recognition in related
studies. In this paper, we first talk about two typical emotion models, then we
talk about databases that are frequently utilized for video emotion
recognition, including unimodal databases and multimodal databases. Next, we
look at and classify the specific structure and performance of modern unimodal
and multimodal video emotion recognition methods, talk about the benefits and
drawbacks of each, and then we compare them in detail in the tables. Further,
we sum up the primary difficulties right now looked by video emotion
recognition undertakings and point out probably the most encouraging future
headings, such as establishing an open benchmark database and better multimodal
fusion strategys. The essential objective of this paper is to assist scholarly
and modern scientists with keeping up to date with the most recent advances and
new improvements in this speedy, high-influence field of video emotion
recognition
Multimodality in VR: A survey
Virtual reality (VR) is rapidly growing, with the potential to change the way we create and consume content. In VR, users integrate multimodal sensory information they receive, to create a unified perception of the virtual world. In this survey, we review the body of work addressing multimodality in VR, and its role and benefits in user experience, together with different applications that leverage multimodality in many disciplines. These works thus encompass several fields of research, and demonstrate that multimodality plays a fundamental role in VR; enhancing the experience, improving overall performance, and yielding unprecedented abilities in skill and knowledge transfer
Real-time generation and adaptation of social companion robot behaviors
Social robots will be part of our future homes.
They will assist us in everyday tasks, entertain us, and provide helpful advice.
However, the technology still faces challenges that must be overcome to equip the machine with social competencies and make it a socially intelligent and accepted housemate.
An essential skill of every social robot is verbal and non-verbal communication.
In contrast to voice assistants, smartphones, and smart home technology, which are already part of many people's lives today, social robots have an embodiment that raises expectations towards the machine.
Their anthropomorphic or zoomorphic appearance suggests they can communicate naturally with speech, gestures, or facial expressions and understand corresponding human behaviors.
In addition, robots also need to consider individual users' preferences: everybody is shaped by their culture, social norms, and life experiences, resulting in different expectations towards communication with a robot.
However, robots do not have human intuition - they must be equipped with the corresponding algorithmic solutions to these problems.
This thesis investigates the use of reinforcement learning to adapt the robot's verbal and non-verbal communication to the user's needs and preferences.
Such non-functional adaptation of the robot's behaviors primarily aims to improve the user experience and the robot's perceived social intelligence.
The literature has not yet provided a holistic view of the overall challenge: real-time adaptation requires control over the robot's multimodal behavior generation, an understanding of human feedback, and an algorithmic basis for machine learning.
Thus, this thesis develops a conceptual framework for designing real-time non-functional social robot behavior adaptation with reinforcement learning.
It provides a higher-level view from the system designer's perspective and guidance from the start to the end.
It illustrates the process of modeling, simulating, and evaluating such adaptation processes.
Specifically, it guides the integration of human feedback and social signals to equip the machine with social awareness.
The conceptual framework is put into practice for several use cases, resulting in technical proofs of concept and research prototypes.
They are evaluated in the lab and in in-situ studies.
These approaches address typical activities in domestic environments, focussing on the robot's expression of personality, persona, politeness, and humor.
Within this scope, the robot adapts its spoken utterances, prosody, and animations based on human explicit or implicit feedback.Soziale Roboter werden Teil unseres zukünftigen Zuhauses sein.
Sie werden uns bei alltäglichen Aufgaben unterstützen, uns unterhalten und uns mit hilfreichen Ratschlägen versorgen.
Noch gibt es allerdings technische Herausforderungen, die zunächst überwunden werden müssen, um die Maschine mit sozialen Kompetenzen auszustatten und zu einem sozial intelligenten und akzeptierten Mitbewohner zu machen.
Eine wesentliche Fähigkeit eines jeden sozialen Roboters ist die verbale und nonverbale Kommunikation.
Im Gegensatz zu Sprachassistenten, Smartphones und Smart-Home-Technologien, die bereits heute Teil des Lebens vieler Menschen sind, haben soziale Roboter eine Verkörperung, die Erwartungen an die Maschine weckt.
Ihr anthropomorphes oder zoomorphes Aussehen legt nahe, dass sie in der Lage sind, auf natürliche Weise mit Sprache, Gestik oder Mimik zu kommunizieren, aber auch entsprechende menschliche Kommunikation zu verstehen.
Darüber hinaus müssen Roboter auch die individuellen Vorlieben der Benutzer berücksichtigen.
So ist jeder Mensch von seiner Kultur, sozialen Normen und eigenen Lebenserfahrungen geprägt, was zu unterschiedlichen Erwartungen an die Kommunikation mit einem Roboter führt.
Roboter haben jedoch keine menschliche Intuition - sie müssen mit entsprechenden Algorithmen für diese Probleme ausgestattet werden.
In dieser Arbeit wird der Einsatz von bestärkendem Lernen untersucht, um die verbale und nonverbale Kommunikation des Roboters an die Bedürfnisse und Vorlieben des Benutzers anzupassen.
Eine solche nicht-funktionale Anpassung des Roboterverhaltens zielt in erster Linie darauf ab, das Benutzererlebnis und die wahrgenommene soziale Intelligenz des Roboters zu verbessern.
Die Literatur bietet bisher keine ganzheitliche Sicht auf diese Herausforderung: Echtzeitanpassung erfordert die Kontrolle über die multimodale Verhaltenserzeugung des Roboters, ein Verständnis des menschlichen Feedbacks und eine algorithmische Basis für maschinelles Lernen.
Daher wird in dieser Arbeit ein konzeptioneller Rahmen für die Gestaltung von nicht-funktionaler Anpassung der Kommunikation sozialer Roboter mit bestärkendem Lernen entwickelt.
Er bietet eine übergeordnete Sichtweise aus der Perspektive des Systemdesigners und eine Anleitung vom Anfang bis zum Ende.
Er veranschaulicht den Prozess der Modellierung, Simulation und Evaluierung solcher Anpassungsprozesse.
Insbesondere wird auf die Integration von menschlichem Feedback und sozialen Signalen eingegangen, um die Maschine mit sozialem Bewusstsein auszustatten.
Der konzeptionelle Rahmen wird für mehrere Anwendungsfälle in die Praxis umgesetzt, was zu technischen Konzeptnachweisen und Forschungsprototypen führt, die in Labor- und In-situ-Studien evaluiert werden.
Diese Ansätze befassen sich mit typischen Aktivitäten in häuslichen Umgebungen, wobei der Schwerpunkt auf dem Ausdruck der Persönlichkeit, dem Persona, der Höflichkeit und dem Humor des Roboters liegt.
In diesem Rahmen passt der Roboter seine Sprache, Prosodie, und Animationen auf Basis expliziten oder impliziten menschlichen Feedbacks an
- …