VITA: A Multi-modal LLM-based System for Longitudinal, Autonomous, and Adaptive Robotic Mental Well-being Coaching
Recently, several works have explored whether and how robotic coaches can promote and maintain mental well-being in different settings. However, findings from these studies revealed that such robotic coaches are not yet ready for real-world deployment due to limitations ranging from technological challenges to coaching efficacy. To overcome these challenges, this paper presents VITA, a novel multi-modal LLM-based system that allows robotic coaches to autonomously adapt to the coachee's multi-modal behaviours (facial valence and speech duration) and deliver coaching exercises that promote mental well-being in adults. We identified five objectives corresponding to the challenges in the recent literature, and we show how the VITA system addresses them via experimental validations: an in-lab pilot study (N=4) that enabled us to test different robotic coach configurations (pre-scripted, generic, and adaptive models) and inform the design for real-world use, and a real-world study (N=17) conducted in a workplace over 4 weeks. Our results show that: (i) coachees perceived the VITA adaptive and generic configurations more positively than the pre-scripted one, and felt understood and heard by the adaptive robotic coach; (ii) the VITA adaptive robotic coach kept learning successfully by personalising to each coachee over time, and no interaction ruptures were detected during the coaching; (iii) coachees showed significant improvements in mental well-being through practice with the VITA-based robotic coach. The code for the VITA system is openly available at: https://github.com/Cambridge-AFAR/VITA-system.
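
To make the adaptation mechanism concrete, the following is a minimal sketch of how the two behavioural signals named above (facial valence and speech duration) could drive a per-turn coaching decision. The thresholds, data structure, and strategy labels are illustrative assumptions and are not taken from the VITA repository:

# Hypothetical sketch of a multi-modal adaptation step. The signal names
# (facial valence, speech duration) come from the abstract, but the
# thresholds, dataclass, and decision logic are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CoacheeState:
    facial_valence: float   # e.g. in [-1, 1], negative = displeasure
    speech_duration: float  # seconds the coachee spoke in the last turn

def adapt_next_prompt(state: CoacheeState) -> str:
    """Pick a coaching strategy for the next LLM prompt from observed behaviour."""
    if state.facial_valence < -0.3:
        # Negative affect: acknowledge feelings before continuing the exercise.
        return "empathic_reflection"
    if state.speech_duration < 3.0:
        # Short answers may signal disengagement: ask an open-ended question.
        return "open_question"
    return "continue_exercise"

print(adapt_next_prompt(CoacheeState(facial_valence=-0.5, speech_duration=8.0)))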
Appropriateness of LLM-equipped Robotic Well-being Coach Language in the Workplace: A Qualitative Evaluation
Robotic coaches have recently been investigated as a means of promoting mental well-being in various contexts such as workplaces and homes. With the widespread use of Large Language Models (LLMs), HRI researchers must consider language appropriateness when using such generated language for robotic mental well-being coaches in the real world. Therefore, this paper presents the first work to investigate the language appropriateness of a robotic mental well-being coach in the workplace. To this end, we conducted an empirical study in which 17 employees interacted over 4 weeks with a robotic mental well-being coach equipped with LLM-based capabilities. After the study, we interviewed the participants individually and conducted a 1.5-hour focus group with 11 of them. The focus group consisted of: i) an ice-breaking activity, ii) an evaluation of the robotic coach's language appropriateness in various scenarios, and iii) a listing of shoulds and shouldn'ts for designing appropriate robotic coach language for mental well-being. From our qualitative evaluation, we found that a language-appropriate robotic coach should (1) ask deep questions that explore the coachees' feelings, rather than superficial questions, (2) express and show emotional and empathic understanding of the context, and (3) not make any assumptions without clarifying via follow-up questions, to avoid bias and stereotyping. These results can inform the design of language-appropriate robotic coaches that promote mental well-being in real-world contexts.
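
The three findings map naturally onto instructions for an LLM-based coach. Below is a minimal sketch of how they might be encoded as a system prompt; the wording is an assumption for illustration, not the prompt used in the study:

# Illustrative only: encodes the three qualitative findings as LLM system
# instructions. The prompt text is an assumption, not the study's actual prompt.
GUIDELINES = [
    "Ask deep questions that explore the coachee's feelings, not superficial ones.",
    "Express emotional and empathic understanding of the coachee's context.",
    "Do not make assumptions; ask clarifying follow-up questions to avoid "
    "bias and stereotyping.",
]

def build_coach_system_prompt() -> str:
    rules = "\n".join(f"- {g}" for g in GUIDELINES)
    return (
        "You are a robotic mental well-being coach in a workplace.\n"
        f"Follow these language-appropriateness rules:\n{rules}"
    )

print(build_coach_system_prompt())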
Multicriteria Decision Analysis and Conversational Agents for children with autism
Conversational agents have emerged as a new means of communication and social skills training for children with autism spectrum disorders (ASD), encouraging academia, industry, and therapeutic centres to investigate them further. This paper aims to develop a methodological framework based on Multicriteria Decision Analysis (MCDA) to identify the best, i.e. the most effective, conversational agent for this target group. To our knowledge, this is the first time MCDA has been applied to this specific domain. Our contribution is twofold: i) our method is an extension of traditional MCDA, and we exemplify how to apply it to the decision-making process concerning conversational agents (CAs) for persons with autism: a methodological result that could be adopted for a broader range of technologies for persons with impairments similar to ASD; ii) our results, based on the above-mentioned method, suggest that the Embodied Conversational Agent is the most appropriate conversational technology for interacting with children with ASD.
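
To illustrate the MCDA machinery itself, here is a toy weighted-sum scoring of candidate conversational-agent types. The criteria, weights, and scores are invented for illustration and do not reproduce the paper's extended method or data:

# Toy weighted-sum MCDA: rank conversational-agent types by weighted criteria.
# Criteria, weights, and scores are illustrative assumptions, not the paper's data.
criteria_weights = {"engagement": 0.4, "accessibility": 0.3, "evidence_base": 0.3}

candidates = {
    "embodied_conversational_agent": {"engagement": 0.9, "accessibility": 0.7, "evidence_base": 0.8},
    "text_chatbot":                  {"engagement": 0.5, "accessibility": 0.9, "evidence_base": 0.6},
    "voice_assistant":               {"engagement": 0.6, "accessibility": 0.8, "evidence_base": 0.5},
}

def mcda_score(scores: dict) -> float:
    """Weighted sum of normalised criterion scores (higher is better)."""
    return sum(criteria_weights[c] * scores[c] for c in criteria_weights)

ranking = sorted(candidates, key=lambda a: mcda_score(candidates[a]), reverse=True)
print(ranking)  # the embodied agent ranks first under these toy weights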
Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?
According to the Stimulus Organism Response (SOR) theory, all human behavioural reactions are stimulated by context: people process the received stimulus and produce an appropriate reaction. This implies that, in a specific context and for a given input stimulus, a person can react differently according to their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, and a broad spectrum of listeners' non-verbal reactions might be appropriate responses to a specific speaker behaviour. A body of work already exists that has investigated the problem of automatically generating an appropriate reaction for a given input; however, none has attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and to evaluate the appropriateness of those reactions using objective measures. This paper starts by defining the facial Multiple Appropriate Reaction Generation (fMARG) task for the first time in the literature and proposes a new set of objective evaluation metrics to assess the appropriateness of the generated reactions. The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.
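
One natural form for such an objective metric scores a generated reaction against the whole set of reactions known to be appropriate for the same speaker behaviour. The sketch below uses distance to the nearest appropriate reaction as an assumed, illustrative formulation, not the paper's exact metric:

import numpy as np

def appropriateness_distance(generated: np.ndarray,
                             appropriate_set: np.ndarray) -> float:
    """Distance from a generated facial-reaction feature vector to the
    nearest reaction in the set deemed appropriate for this stimulus.
    Lower is better. The formulation is an illustrative assumption."""
    dists = np.linalg.norm(appropriate_set - generated, axis=1)
    return float(dists.min())

# Toy example: three appropriate reactions in a 4-D feature space.
appropriate = np.array([[0.1, 0.2, 0.0, 0.5],
                        [0.0, 0.3, 0.1, 0.4],
                        [0.2, 0.1, 0.0, 0.6]])
generated = np.array([0.05, 0.25, 0.05, 0.45])
print(appropriateness_distance(generated, appropriate))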
Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation
Generating facial reactions in a human-human dyadic interaction is complex and highly context-dependent, since more than one facial reaction can be appropriate for the speaker's behaviour. This has challenged existing machine learning (ML) methods, whose training strategies force models to reproduce a single specific facial reaction (rather than multiple ones) from each input speaker behaviour. This paper proposes the first multiple appropriate facial reaction generation framework that re-formulates the one-to-many facial reaction generation problem as a one-to-one mapping problem. That is, we approach the problem by generating a distribution of the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i.e., the 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training. Our model consists of a perceptual processor, a cognitive processor, and a motor processor. The motor processor is implemented with a novel Reversible Multi-dimensional Edge Graph Neural Network (REGNN). This allows us to obtain a distribution of appropriate real facial reactions during training, enabling the cognitive processor to be trained to predict that distribution. At the inference stage, the REGNN decodes an appropriate facial reaction by using this distribution as input. Experimental results demonstrate that our approach outperforms existing models in generating more appropriate, realistic, and synchronised facial reactions. The improved performance is largely attributed to the proposed appropriate facial reaction distribution learning strategy and the use of the REGNN. The code is available at https://github.com/TongXu-05/REGNN-Multiple-Appropriate-Facial-Reaction-Generation.
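
The core reformulation, summarising 'many' appropriate reaction labels as 'one' distribution label, can be illustrated with a deliberately simple per-dimension Gaussian in place of the REGNN-learned distribution:

import numpy as np

def summarise_reactions_as_distribution(reactions: np.ndarray):
    """Collapse multiple appropriate reaction labels (N x D) into a single
    per-dimension Gaussian 'distribution label' (mean, std). The Gaussian
    summary is an illustrative assumption; the paper uses a reversible GNN
    to represent the appropriate-reaction distribution."""
    return reactions.mean(axis=0), reactions.std(axis=0)

def sample_reaction(mean: np.ndarray, std: np.ndarray,
                    rng: np.random.Generator) -> np.ndarray:
    """At inference, decode one appropriate reaction from the distribution."""
    return rng.normal(mean, std)

reactions = np.array([[0.10, 0.40], [0.20, 0.50], [0.15, 0.45]])  # toy N=3, D=2
mean, std = summarise_reactions_as_distribution(reactions)
print(sample_reaction(mean, std, np.random.default_rng(0)))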
A Systematic Review on Reproducibility in Child-Robot Interaction
Research reproducibility - i.e., rerunning analyses on original data to replicate the results - is paramount for guaranteeing scientific validity. However, reproducibility is often very challenging, especially in research fields involving multi-disciplinary teams, such as child-robot interaction (CRI). This paper presents a systematic review of the last three years (2020-2022) of CRI research under the lens of reproducibility, analysing the field for transparency in reporting. Across a total of 325 studies, we found deficiencies in the reporting of demographics (e.g., age of participants), study design and implementation (e.g., length of interactions), and open data (e.g., maintaining an active code repository). From this analysis, we distil a set of guidelines and provide a checklist for systematically reporting CRI studies, to help guide research towards improved reproducibility in CRI and beyond.
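
The reported deficiencies suggest the shape of such a checklist. The sketch below groups illustrative items under the three categories named above; the concrete fields are assumptions, not the paper's published checklist:

# Illustrative CRI reporting checklist. The categories follow the abstract
# (demographics, study design/implementation, open data), but the concrete
# items are assumptions rather than the paper's published list.
CRI_CHECKLIST = {
    "demographics": ["participant ages", "sample size", "recruitment criteria"],
    "study_design": ["interaction length", "robot platform", "session protocol"],
    "open_data": ["active code repository", "data availability statement"],
}

def report_completeness(reported: set) -> float:
    """Fraction of checklist items a study actually reports."""
    all_items = [i for items in CRI_CHECKLIST.values() for i in items]
    return sum(i in reported for i in all_items) / len(all_items)

print(report_completeness({"participant ages", "interaction length"}))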
REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge
The Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the first benchmark test set for multi-modal information processing and to foster collaboration among the audio, visual, and audio-visual behaviour analysis and behaviour generation (a.k.a. generative AI) communities, in order to compare the relative merits of approaches to automatic appropriate facial reaction generation under different spontaneous dyadic interaction conditions. This paper presents: (i) the novelties, contributions, and guidelines of the REACT2023 challenge; (ii) the dataset utilised in the challenge; and (iii) the performance of the baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline-react2023.
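
The two sub-challenges differ in what a model may observe: offline generation can condition on the full speaker sequence, whereas online generation must produce each frame causally. The sketch below illustrates only this interface difference, with placeholder "models" that are not the challenge baselines:

import numpy as np

def generate_offline(speaker_frames: np.ndarray) -> np.ndarray:
    """Offline sub-challenge: the whole speaker sequence is available, so
    every output frame may use past and future context. The mean-based
    'model' here is a placeholder, not the challenge baseline."""
    context = speaker_frames.mean(axis=0)
    return np.tile(context, (len(speaker_frames), 1))

def generate_online(speaker_frames: np.ndarray) -> np.ndarray:
    """Online sub-challenge: frame t may only use speaker frames 0..t
    (a causal running mean stands in for a real model)."""
    cumsum = np.cumsum(speaker_frames, axis=0)
    counts = np.arange(1, len(speaker_frames) + 1)[:, None]
    return cumsum / counts

frames = np.random.default_rng(0).random((5, 3))  # toy: 5 frames, 3 features
print(generate_offline(frames).shape, generate_online(frames).shape)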