Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement the acoustic detection of the active speaker, thus
improving the system robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their face. Furthermore, the method does not rely on
external annotations, thus complying with cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker dependent setting. However, in a speaker
independent setting the proposed method yields a significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions.
Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
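The abstract above describes supervising a visual classifier with cues from the auditory modality instead of human annotations. A minimal sketch of that idea, under the assumption (not stated in the abstract) that a simple frame-level RMS-energy threshold stands in for the acoustic active-speaker cue; the function name and threshold are hypothetical:

```python
import numpy as np

def audio_pseudo_labels(audio, sr=16000, frame_len=0.04, threshold=0.01):
    """Derive per-frame speaking/silent pseudo-labels from audio energy.

    Hypothetical stand-in for an acoustic voice-activity cue: frames whose
    RMS energy exceeds `threshold` are labelled 'speaking' (1). Such labels
    could then supervise a visual classifier with no manual annotation.
    """
    hop = int(sr * frame_len)                      # samples per frame
    n_frames = len(audio) // hop
    frames = audio[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1))      # per-frame RMS energy
    return (rms > threshold).astype(int)

# Toy signal: 1 s of silence followed by 1 s of loud noise
sr = 16000
rng = np.random.default_rng(0)
audio = np.concatenate([np.zeros(sr), 0.5 * rng.standard_normal(sr)])
labels = audio_pseudo_labels(audio, sr=sr)         # 50 frames of 40 ms each
```

The visual model would then be trained on face crops with `labels` as targets, which is what makes the approach self-supervised from the vision side.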
A Multi-Party Conversation-Based Effective Robotic Navigation System for Futuristic Vehicle
In response to the growing need for advanced in-car navigation systems that prioritize user experience and aim to reduce driver cognitive workload, this study addresses the research question of how to enhance the interaction between drivers and navigation systems. The focus is on minimizing distraction while providing personalized and geographically relevant information. The research introduces an innovative in-car robotic navigation system comprising three subsystem models: geofencing, personalization, and conversation. The dynamic geofencing model acquires geographic details related to the user's current location and provides information about required destinations. The personalization model tailors suggestions based on user preferences, while the conversation model, employing two virtual robots, fosters interactive multiparty conversations aligned with the driver's interests. The study's scope is specifically confined to interactive conversations centered on nearby restaurants and the driver's dietary preferences. Evaluation of the system indicates a notable prevalence of neutral expressions among participants during interaction, suggesting that the implemented system successfully mitigates cognitive workload. Participants in the experiments express higher usability and interactivity levels, as evidenced by feedback collected at the study's conclusion, affirming the system's effectiveness in enhancing the user experience while maintaining a driver-friendly environment.
Keywords: Human-Robot Interaction, Multiparty Conversation, In-Car Navigation
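The geofencing subsystem described above filters points of interest by proximity to the car's current position. A minimal sketch of such a radius check using the haversine great-circle distance; the function names, the sample coordinates, and the 1 km radius are illustrative assumptions, not details from the paper:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pois_in_geofence(car_pos, pois, radius_km=1.0):
    """Return names of POIs within `radius_km` of the car's position."""
    return [name for name, (lat, lon) in pois.items()
            if haversine_km(car_pos[0], car_pos[1], lat, lon) <= radius_km]

# Illustrative POIs: one ~50 m away, one ~5 km away
pois = {"Cafe A": (48.8570, 2.3525), "Bistro B": (48.9000, 2.4000)}
nearby = pois_in_geofence((48.8566, 2.3522), pois, radius_km=1.0)
```

Only the nearby restaurant survives the filter, which is the set the personalization and conversation models would then draw on.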
To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills
Communication shapes our social world. For a robot to be considered social,
and consequently integrated into our social environment, it is fundamental to
understand some of the dynamics that rule human-human communication. In this
work, we tackle the problem of Addressee Estimation, the ability to understand
an utterance's addressee, by interpreting and exploiting non-verbal bodily cues
from the speaker. We do so by implementing a hybrid deep learning model
composed of convolutional layers and LSTM cells taking as input images
portraying the face of the speaker and 2D vectors of the speaker's body
posture. Our implementation choices were guided by the aim to develop a model
that could be deployed on social robots and be efficient in ecological
scenarios. We demonstrate that our model is able to solve the Addressee
Estimation problem in terms of addressee localisation in space, from a robot
ego-centric point of view.
Comment: Accepted version of a paper published at 2023 International Joint
Conference on Neural Networks (IJCNN). Please find the published version and
info to cite the paper at https://doi.org/10.1109/IJCNN54540.2023.10191452 .
10 pages, 8 Figures, 3 Tables
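The abstract above fuses two input streams per frame, a face image through convolutional layers and a 2D posture vector, and processes the sequence with LSTM cells to localise the addressee. A toy numpy sketch of that fusion and temporal loop, with a pooled face encoder standing in for the convolutional stack and a plain tanh recurrent cell standing in for the LSTM; all dimensions, weights, and the 3-way output classes (left / robot / right) are illustrative assumptions, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def face_embedding(image):
    """Stub for the convolutional face encoder (placeholder pooling)."""
    return image.mean(axis=(0, 1))  # (H, W, C) -> (C,)

def rnn_step(h, x, Wh, Wx, b):
    """One plain recurrent update (tanh); stands in for an LSTM cell."""
    return np.tanh(h @ Wh + x @ Wx + b)

# Toy dimensions: 3-channel face crops, 26-d flattened 2D pose vector
T, H, W, C, P, HID = 8, 16, 16, 3, 26, 32
Wh = 0.1 * rng.standard_normal((HID, HID))
Wx = 0.1 * rng.standard_normal((C + P, HID))
b = np.zeros(HID)
Wout = 0.1 * rng.standard_normal((HID, 3))  # 3 classes: left / robot / right

h = np.zeros(HID)
for _ in range(T):
    img = rng.random((H, W, C))     # face crop for this frame
    pose = rng.random(P)            # flattened 2D posture vector
    x = np.concatenate([face_embedding(img), pose])  # per-frame fusion
    h = rnn_step(h, x, Wh, Wx, b)   # accumulate temporal context

addressee = int(np.argmax(h @ Wout))  # addressee direction from final state
```

The point of the sketch is the structure: per-frame concatenation of the two modalities, a recurrent state carrying the sequence, and a classification read-out from the final state giving the addressee's location relative to the robot.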