GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System
This technical paper introduces a chatting robot system that utilizes recent
advancements in large-scale language models (LLMs) such as GPT-3 and ChatGPT.
The system is integrated with a co-speech gesture generation system, which
selects appropriate gestures based on the conceptual meaning of speech. Our
motivation is to explore ways of utilizing the recent progress in LLMs for
practical robotic applications, which benefits the development of both chatbots
and LLMs. Specifically, it enables the development of highly responsive chatbot systems by leveraging LLMs, and it adds visual effects to the user interface of LLMs as additional value. The source code for the system is available on GitHub, both for our in-house robot (https://github.com/microsoft/LabanotationSuite/tree/master/MSRAbotChatSimulation) and for the Toyota HSR (https://github.com/microsoft/GPT-Enabled-HSR-CoSpeechGestures).
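The abstract gives no implementation details, but the core idea of pairing an LLM-generated reply with a gesture chosen from the reply's conceptual meaning can be sketched roughly as follows. This is a hypothetical illustration only: the gesture vocabulary, prompts, and model name are assumptions, not the interfaces used in the MSRAbot or HSR repositories.

```python
# Hypothetical sketch: generate a chatbot reply, then ask the LLM to pick a
# gesture label matching the reply's meaning. Gesture labels, prompts, and
# model name are illustrative assumptions, not the published system's API.
from openai import OpenAI

GESTURES = ["greeting", "nodding", "pointing", "shrugging", "rest"]  # assumed labels

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def reply_with_gesture(user_utterance: str) -> tuple[str, str]:
    """Return (reply text, gesture label) for one user utterance."""
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_utterance}],
    ).choices[0].message.content

    choice = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"Reply: {reply}\n"
                f"Choose the single best co-speech gesture from {GESTURES}. "
                "Answer with the label only."
            ),
        }],
    ).choices[0].message.content.strip().lower()

    gesture = choice if choice in GESTURES else "rest"  # fall back to a neutral pose
    return reply, gesture

print(reply_with_gesture("Hi robot, how are you today?"))
```

In a full system the selected label would drive gesture playback on the robot (or its simulation) while the reply is spoken; the sketch simply returns the pair.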
A Review of Verbal and Non-Verbal Human-Robot Interactive Communication
In this paper, an overview of human-robot interactive communication is
presented, covering verbal as well as non-verbal aspects of human-robot
interaction. Following a historical introduction and a motivation for fluid human-robot communication, ten desiderata are proposed that provide an organizational axis for both recent and future research on human-robot communication. The ten desiderata are then examined in detail, culminating in a unifying discussion and a forward-looking conclusion.
A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents
Embodied Conversational Agents (ECA) take on different forms, including
virtual avatars or physical agents, such as a humanoid robot. ECAs are often designed to produce nonverbal behaviour to complement or enhance their verbal communication. One form of nonverbal behaviour is co-speech gesturing, which involves movements of the agent's arms and hands that are paired with verbal communication. Co-speech gestures for ECAs can be created using
different generation methods, such as rule-based and data-driven processes.
However, reports on gesture generation methods use a variety of evaluation
measures, which hinders comparison. To address this, we conducted a systematic
review on co-speech gesture generation methods for iconic, metaphoric, deictic
or beat gestures, including their evaluation methods. We reviewed 22 studies
that had an ECA with a human-like upper body that used co-speech gesturing in a
social human-agent interaction, including a user study to evaluate its
performance. We found most studies used a within-subject design and relied on a
form of subjective evaluation, but lacked a systematic approach. Overall,
methodological quality was low-to-moderate and few systematic conclusions could
be drawn. We argue that the field requires rigorous and uniform tools for the
evaluation of co-speech gesture systems. We propose recommendations for future empirical evaluation, including standardised phrases and test scenarios for testing generative models. We also propose a research checklist that can be used to report relevant information for the evaluation of generative models as well as to evaluate co-speech gesture use.
Multimodal agents for cooperative interaction
Embodied virtual agents offer the potential to interact with a computer in a more natural manner, similar to how we interact with other people. Reaching this potential requires multimodal interaction, including both speech and gesture. This project builds on earlier work at Colorado State University and Brandeis University on just such a multimodal system, referred to as Diana. I designed and developed a new software architecture to directly address some of the difficulties of the earlier system, particularly with regard to asynchronous communication, e.g., interrupting the agent after it has begun to act. Various other enhancements were made to the agent systems, including the model itself, as well as speech recognition, speech synthesis, motor control, and gaze control. Further refactoring and new code were developed to achieve software engineering goals that are not outwardly visible but no less important: decoupling, testability, improved networking, and independence from a particular agent model. This work, combined with the effort of others in the lab, has produced a "version 2" Diana system that is well positioned to serve the lab's research needs in the future. In addition, in order to pursue new research opportunities related to developmental and intervention science, a "Faelyn Fox" agent was developed. This is a different model, with a simplified cognitive architecture, and a system for defining an experimental protocol (for example, a toy-sorting task) based on Unity's visual state machine editor. This version, too, lays a solid foundation for future research.
Human-Robot interaction with low computational-power humanoids
This article investigates the possibilities of human-humanoid interaction with robots whose computational power is limited. The project was carried out during a year of work at the Computer and Robot Vision Laboratory (VisLab), part of the Institute for Systems and Robotics in Lisbon, Portugal.
Communication, the basis of interaction, is simultaneously visual, verbal, and gestural. The robot's algorithms provide users with natural language communication, allowing it to capture and understand a person's needs and feelings. The system is therefore designed with the capability to dialogue with people in a way that makes understanding their needs possible. To keep the experience natural, it is independent of the GUI, which is used only as an auxiliary instrument. Furthermore, the humanoid can communicate through gestures, touch, and visual perception and feedback. This creates an entirely new type of interaction in which the robot is not just a machine to use, but a figure to interact and talk with: a social robot.
Conversational affective social robots for ageing and dementia support
Socially assistive robots (SAR) hold significant potential to assist older adults and people with dementia in human engagement and clinical contexts by supporting mental health and independence at home. While SAR research has recently experienced prolific growth, long-term trust, clinical translation and patient benefit remain immature. Affective human-robot interactions remain unresolved, and the deployment of robots with conversational abilities is fundamental for robustness and human-robot engagement. In this paper, we review the state of the art within the past two decades, design trends, and current applications of conversational affective SAR for ageing and dementia support. A horizon scan of AI voice technology for healthcare, including ubiquitous smart speakers, is also introduced to address current gaps inhibiting home use. We discuss the role of user-centred approaches in the design of voice systems, including the capacity to handle communication breakdowns for effective use by target populations. We summarise the state of development in interactions using speech and natural language processing, which forms a baseline for longitudinal health monitoring and cognitive assessment. Drawing from this foundation, we identify open challenges and propose future directions to advance conversational affective social robots for: 1) user engagement, 2) deployment in real-world settings, and 3) clinical translation.
Modular Customizable ROS-Based Framework for Rapid Development of Social Robots
Developing socially competent robots requires tight integration of robotics,
computer vision, speech processing, and web technologies. We present the
Socially-interactive Robot Software platform (SROS), an open-source framework
addressing this need through a modular layered architecture. SROS bridges the
Robot Operating System (ROS) layer for mobility with web and Android interface
layers using standard messaging and APIs. Specialized perceptual and
interactive skills are implemented as ROS services for reusable deployment on
any robot. This facilitates rapid prototyping of collaborative behaviors that
synchronize perception with physical actuation. We experimentally validated
core SROS technologies, including computer vision, speech processing, and GPT-2
autocomplete speech implemented as plug-and-play ROS services. Modularity is
demonstrated through the successful integration of an additional ROS package,
without changes to hardware or software platforms. The capabilities enabled
confirm SROS's effectiveness in developing socially interactive robots through
synchronized cross-domain interaction. Through demonstrations of synchronized multimodal behaviors on an example platform, we illustrate how the SROS architectural approach addresses shortcomings of previous work by lowering the barrier for researchers to advance the state of the art in adaptive, collaborative, customizable human-robot systems through novel applications integrating perceptual and social abilities.
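As a rough illustration of the "skill as a plug-and-play ROS service" pattern described above, a GPT-2 autocomplete skill could be wrapped as a ROS 1 service along the following lines. The package name, service definition, and node and service names are assumptions made for the sketch and are not SROS's actual interfaces.

```python
# Hypothetical sketch: a GPT-2 speech-autocomplete skill exposed as a ROS service.
# Assumed custom service definition (sros_skills/srv/Autocomplete.srv):
#   string prompt
#   ---
#   string completion
import rospy
from transformers import pipeline
from sros_skills.srv import Autocomplete, AutocompleteResponse  # assumed package

# Load the language model once at node start-up so each service call is fast.
generator = pipeline("text-generation", model="gpt2")

def handle_autocomplete(req):
    """Continue the caller's prompt with GPT-2 and return only the new text."""
    generated = generator(req.prompt, max_new_tokens=20, do_sample=True)[0]["generated_text"]
    return AutocompleteResponse(completion=generated[len(req.prompt):].strip())

if __name__ == "__main__":
    rospy.init_node("gpt2_autocomplete_skill")
    rospy.Service("autocomplete_speech", Autocomplete, handle_autocomplete)
    rospy.loginfo("GPT-2 autocomplete skill ready")
    rospy.spin()  # keep serving requests until the node is shut down
```

Because the skill sits behind a plain ROS service, any robot on the same ROS graph, or a web or Android layer bridged to it, can call it without knowing how the text generation is implemented, which is the kind of reuse the SROS layering aims for.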