Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement the acoustic detection of the active speaker, thus
improving the system robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their face. Furthermore, the method does not rely on
external annotations, thus complying with cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker dependent setting. However, in a speaker
independent setting the proposed method yields a significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions.
Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
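The abstract above describes supervising a visual classifier with cues from the auditory modality instead of human annotations. A minimal sketch of that idea, under the assumption (not stated in the abstract) that a simple frame-level RMS-energy threshold stands in for the acoustic active-speaker cue; the function name and threshold are hypothetical:

```python
import numpy as np

def audio_pseudo_labels(audio, sr=16000, frame_len=0.04, threshold=0.01):
    """Derive per-frame speaking/silent pseudo-labels from audio energy.

    Hypothetical stand-in for an acoustic voice-activity cue: frames whose
    RMS energy exceeds `threshold` are labelled 'speaking' (1). Such labels
    could then supervise a visual classifier with no manual annotation.
    """
    hop = int(sr * frame_len)                      # samples per frame
    n_frames = len(audio) // hop
    frames = audio[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1))      # per-frame RMS energy
    return (rms > threshold).astype(int)

# Toy signal: 1 s of silence followed by 1 s of loud noise
sr = 16000
rng = np.random.default_rng(0)
audio = np.concatenate([np.zeros(sr), 0.5 * rng.standard_normal(sr)])
labels = audio_pseudo_labels(audio, sr=sr)         # 50 frames of 40 ms each
```

The visual model would then be trained on face crops with `labels` as targets, which is what makes the approach self-supervised from the vision side.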
A Multi-Party Conversation-Based Effective Robotic Navigation System for Futuristic Vehicle
In response to the growing need for advanced in-car navigation systems that prioritize user experience and aim to reduce driver cognitive workload, this study addresses the research question of how to enhance the interaction between drivers and navigation systems. The focus is on minimizing distraction while providing personalized and geographically relevant information. The research introduces an innovative in-car robotic navigation system comprising three subsystem models: geofencing, personalization, and conversation. The dynamic geofencing model acquires geographic details related to the user's current location and provides information about required destinations. The personalization model tailors suggestions based on user preferences, while the conversation model, employing two virtual robots, fosters interactive multiparty conversations aligned with the driver's interests. The study's scope is specifically confined to interactive conversations centered on nearby restaurants and the driver's dietary preferences. Evaluation of the system indicates a notable prevalence of neutral expressions among participants during interaction, suggesting that the implemented system successfully mitigates cognitive workload. Participants in the experiments express higher usability and interactivity levels, as evidenced by feedback collected at the study's conclusion, affirming the system's effectiveness in enhancing the user experience while maintaining a driver-friendly environment.
Keywords: Human-Robot Interaction, Multiparty Conversation, In-Car Navigation
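The geofencing subsystem described above filters points of interest by proximity to the car's current position. A minimal sketch of such a radius check using the haversine great-circle distance; the function names, the sample coordinates, and the 1 km radius are illustrative assumptions, not details from the paper:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pois_in_geofence(car_pos, pois, radius_km=1.0):
    """Return names of POIs within `radius_km` of the car's position."""
    return [name for name, (lat, lon) in pois.items()
            if haversine_km(car_pos[0], car_pos[1], lat, lon) <= radius_km]

# Illustrative POIs: one ~50 m away, one ~5 km away
pois = {"Cafe A": (48.8570, 2.3525), "Bistro B": (48.9000, 2.4000)}
nearby = pois_in_geofence((48.8566, 2.3522), pois, radius_km=1.0)
```

Only the nearby restaurant survives the filter, which is the set the personalization and conversation models would then draw on.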
To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills
Communication shapes our social world. For a robot to be considered social,
and consequently integrated into our social environment, it is fundamental to
understand some of the dynamics that rule human-human communication. In this
work, we tackle the problem of Addressee Estimation, the ability to understand
an utterance's addressee, by interpreting and exploiting non-verbal bodily cues
from the speaker. We do so by implementing a hybrid deep learning model
composed of convolutional layers and LSTM cells taking as input images
portraying the face of the speaker and 2D vectors of the speaker's body
posture. Our implementation choices were guided by the aim to develop a model
that could be deployed on social robots and be efficient in ecological
scenarios. We demonstrate that our model is able to solve the Addressee
Estimation problem in terms of addressee localisation in space, from a robot
ego-centric point of view.
Comment: Accepted version of a paper published at 2023 International Joint
Conference on Neural Networks (IJCNN). Please find the published version and
info to cite the paper at https://doi.org/10.1109/IJCNN54540.2023.10191452 .
10 pages, 8 Figures, 3 Tables
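The abstract above fuses two input streams per frame, a face image through convolutional layers and a 2D posture vector, and processes the sequence with LSTM cells to localise the addressee. A toy numpy sketch of that fusion and temporal loop, with a pooled face encoder standing in for the convolutional stack and a plain tanh recurrent cell standing in for the LSTM; all dimensions, weights, and the 3-way output classes (left / robot / right) are illustrative assumptions, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def face_embedding(image):
    """Stub for the convolutional face encoder (placeholder pooling)."""
    return image.mean(axis=(0, 1))  # (H, W, C) -> (C,)

def rnn_step(h, x, Wh, Wx, b):
    """One plain recurrent update (tanh); stands in for an LSTM cell."""
    return np.tanh(h @ Wh + x @ Wx + b)

# Toy dimensions: 3-channel face crops, 26-d flattened 2D pose vector
T, H, W, C, P, HID = 8, 16, 16, 3, 26, 32
Wh = 0.1 * rng.standard_normal((HID, HID))
Wx = 0.1 * rng.standard_normal((C + P, HID))
b = np.zeros(HID)
Wout = 0.1 * rng.standard_normal((HID, 3))  # 3 classes: left / robot / right

h = np.zeros(HID)
for _ in range(T):
    img = rng.random((H, W, C))     # face crop for this frame
    pose = rng.random(P)            # flattened 2D posture vector
    x = np.concatenate([face_embedding(img), pose])  # per-frame fusion
    h = rnn_step(h, x, Wh, Wx, b)   # accumulate temporal context

addressee = int(np.argmax(h @ Wout))  # addressee direction from final state
```

The point of the sketch is the structure: per-frame concatenation of the two modalities, a recurrent state carrying the sequence, and a classification read-out from the final state giving the addressee's location relative to the robot.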