39 research outputs found

    VITA: A Multi-modal LLM-based System for Longitudinal, Autonomous, and Adaptive Robotic Mental Well-being Coaching

    Full text link
    Recently, several works have explored if and how robotic coaches can promote and maintain mental well-being in different settings. However, findings from these studies revealed that these robotic coaches are not ready to be used and deployed in real-world settings due to several limitations that span from technological challenges to coaching success. To overcome these challenges, this paper presents VITA, a novel multi-modal LLM-based system that allows robotic coaches to autonomously adapt to the coachee's multi-modal behaviours (facial valence and speech duration) and deliver coaching exercises in order to promote mental well-being in adults. We identified five objectives that correspond to the challenges in the recent literature, and we show how the VITA system addresses these via experimental validations that include one in-lab pilot study (N=4) that enabled us to test different robotic coach configurations (pre-scripted, generic, and adaptive models) and inform its design for using it in the real world, and one real-world study (N=17) conducted in a workplace over 4 weeks. Our results show that: (i) coachees perceived the VITA adaptive and generic configurations more positively than the pre-scripted one, and they felt understood and heard by the adaptive robotic coach, (ii) the VITA adaptive robotic coach kept learning successfully by personalising to each coachee over time and did not detect any interaction ruptures during the coaching, (iii) coachees had significant mental well-being improvements via the VITA-based robotic coach practice. The code for the VITA system is openly available via: https://github.com/Cambridge-AFAR/VITA-system

    Appropriateness of LLM-equipped Robotic Well-being Coach Language in the Workplace: A Qualitative Evaluation

    Full text link
    Robotic coaches have been recently investigated to promote mental well-being in various contexts such as workplaces and homes. With the widespread use of Large Language Models (LLMs), HRI researchers are called to consider language appropriateness when using such generated language for robotic mental well-being coaches in the real world. Therefore, this paper presents the first work that investigated the language appropriateness of robot mental well-being coach in the workplace. To this end, we conducted an empirical study that involved 17 employees who interacted over 4 weeks with a robotic mental well-being coach equipped with LLM-based capabilities. After the study, we individually interviewed them and we conducted a focus group of 1.5 hours with 11 of them. The focus group consisted of: i) an ice-breaking activity, ii) evaluation of robotic coach language appropriateness in various scenarios, and iii) listing shoulds and shouldn'ts for designing appropriate robotic coach language for mental well-being. From our qualitative evaluation, we found that a language-appropriate robotic coach should (1) ask deep questions which explore feelings of the coachees, rather than superficial questions, (2) express and show emotional and empathic understanding of the context, and (3) not make any assumptions without clarifying with follow-up questions to avoid bias and stereotyping. These results can inform the design of language-appropriate robotic coach to promote mental well-being in real-world contexts

    Multicriteria Decision Analysis and Conversational Agents for children with autism

    Get PDF
    Conversational agents has emerged as a new means of communication and social skills training for children with autism spectrum disorders (ASD), encouraging academia, industry, and therapeutic centres to investigate it further. This paper aims to develop a methodological framework based on Multicriteria Decision Analysis (MCDA) to identify the best , i.e. the most effective, conversational agent for this target group. To our knowledge, it is the first time the MCDA is applied to this specific domain. Our contribution is twofold: i) our method is an extension of traditional MCDA and we exemplify how to apply it to decision making process related to CA for person with autism: a methodological result that would be adopted for a broader range of technologies for person with impairments similar to ASD; ii) our results, based on the above mentioned method, suggest that Embodied Conversational Agent is most appropriate conversational technology to interact with children with ASD

    Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How?

    Full text link
    According to the Stimulus Organism Response (SOR) theory, all human behavioral reactions are stimulated by context, where people will process the received stimulus and produce an appropriate reaction. This implies that in a specific context for a given input stimulus, a person can react differently according to their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, where a broad spectrum of listeners' non-verbal reactions might be appropriate for responding to a specific speaker behaviour. There already exists a body of work that investigated the problem of automatically generating an appropriate reaction for a given input. However, none attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and evaluate the appropriateness of those reactions using objective measures. This paper starts by defining the facial Multiple Appropriate Reaction Generation (fMARG) task for the first time in the literature and proposes a new set of objective evaluation metrics to evaluate the appropriateness of the generated reactions. The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions

    Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation

    Full text link
    Generating facial reactions in a human-human dyadic interaction is complex and highly dependent on the context since more than one facial reactions can be appropriate for the speaker's behaviour. This has challenged existing machine learning (ML) methods, whose training strategies enforce models to reproduce a specific (not multiple) facial reaction from each input speaker behaviour. This paper proposes the first multiple appropriate facial reaction generation framework that re-formulates the one-to-many mapping facial reaction generation problem as a one-to-one mapping problem. This means that we approach this problem by considering the generation of a distribution of the listener's appropriate facial reactions instead of multiple different appropriate facial reactions, i.e., 'many' appropriate facial reaction labels are summarised as 'one' distribution label during training. Our model consists of a perceptual processor, a cognitive processor, and a motor processor. The motor processor is implemented with a novel Reversible Multi-dimensional Edge Graph Neural Network (REGNN). This allows us to obtain a distribution of appropriate real facial reactions during the training process, enabling the cognitive processor to be trained to predict the appropriate facial reaction distribution. At the inference stage, the REGNN decodes an appropriate facial reaction by using this distribution as input. Experimental results demonstrate that our approach outperforms existing models in generating more appropriate, realistic, and synchronized facial reactions. The improved performance is largely attributed to the proposed appropriate facial reaction distribution learning strategy and the use of a REGNN. The code is available at https://github.com/TongXu-05/REGNN-Multiple-Appropriate-Facial-Reaction-Generation

    A Systematic Review on Reproducibility in Child-Robot Interaction

    Full text link
    Research reproducibility - i.e., rerunning analyses on original data to replicate the results - is paramount for guaranteeing scientific validity. However, reproducibility is often very challenging, especially in research fields where multi-disciplinary teams are involved, such as child-robot interaction (CRI). This paper presents a systematic review of the last three years (2020-2022) of research in CRI under the lens of reproducibility, by analysing the field for transparency in reporting. Across a total of 325 studies, we found deficiencies in reporting demographics (e.g. age of participants), study design and implementation (e.g. length of interactions), and open data (e.g. maintaining an active code repository). From this analysis, we distill a set of guidelines and provide a checklist to systematically report CRI studies to help and guide research to improve reproducibility in CRI and beyond

    REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge

    Get PDF
    The Multiple Appropriate Facial Reaction Generation Challenge (REACT2023) is the first competition event focused on evaluating multimedia processing and machine learning techniques for generating human-appropriate facial reactions in various dyadic interaction scenarios, with all participants competing strictly under the same conditions. The goal of the challenge is to provide the first benchmark test set for multi-modal information processing and to foster collaboration among the audio, visual, and audio-visual behaviour analysis and behaviour generation (a.k.a generative AI) communities, to compare the relative merits of the approaches to automatic appropriate facial reaction generation under different spontaneous dyadic interaction conditions. This paper presents: (i) the novelties, contributions and guidelines of the REACT2023 challenge; (ii) the dataset utilized in the challenge; and (iii) the performance of the baseline systems on the two proposed sub-challenges: Offline Multiple Appropriate Facial Reaction Generation and Online Multiple Appropriate Facial Reaction Generation, respectively. The challenge baseline code is publicly available at https://github.com/reactmultimodalchallenge/baseline-react2023.</p
    corecore