
    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations and is therefore consistent with cognitive development; instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method on a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting, but significantly lower performance in a speaker-independent setting. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
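    The central idea of the abstract is that the auditory modality supervises learning in the visual domain. The following is a minimal, illustrative sketch of that general pattern, not the authors' implementation: audio-derived voice-activity labels are used to train a classifier on per-face visual features. The feature names, the classifier choice, and the synthetic data are all assumptions.

```python
# Minimal sketch (not the authors' code): audio-derived voice-activity labels
# supervise a visual classifier on per-face features, so no manual annotation
# is needed. Data here is synthetic; in practice the features would come from
# face crops and the labels from an acoustic voice-activity detector.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-frame face descriptors (e.g., mouth-region embeddings).
n_frames, feat_dim = 2000, 64
face_feats = rng.normal(size=(n_frames, feat_dim))

# Weak labels from the auditory modality: 1 = this person is speaking.
# Simulated here as a noisy function of the visual features.
true_w = rng.normal(size=feat_dim)
vad_labels = (face_feats @ true_w + rng.normal(scale=2.0, size=n_frames) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    face_feats, vad_labels, test_size=0.25, random_state=0)

# Visual active-speaker classifier trained on the audio-derived labels.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```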

    Design and User Satisfaction of Interactive Maps for Visually Impaired People

    Multimodal interactive maps are a solution for presenting spatial information to visually impaired people. In this paper, we present an interactive multimodal map prototype based on a tactile paper map, a multi-touch screen and audio output. We first describe the steps in designing an interactive map: drawing and printing the tactile paper map, the choice of multi-touch technology, the interaction technologies, and the software architecture. We then describe the method used to assess user satisfaction. We provide data showing that an interactive map, although based on a single, elementary double-tap interaction, has been met with a high level of user satisfaction. Interestingly, satisfaction is independent of a user's age, previous visual experience or Braille experience. This prototype will be used as a platform to design advanced interactions for spatial learning.
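    The single double-tap interaction described above can be pictured as a small hit-testing loop: the tap position on the multi-touch overlay is matched against the regions of the underlying tactile map, and the matching region's name is spoken. The sketch below is purely illustrative; the region names, coordinates, and the `speak` placeholder are invented and do not reflect the prototype's actual software architecture.

```python
# Illustrative sketch of a double-tap-to-audio interaction over a tactile map.
# Regions and coordinates below are invented examples; speak() stands in for
# whatever audio output (e.g., a text-to-speech engine) the system uses.
from dataclasses import dataclass

@dataclass
class MapRegion:
    name: str
    x: float       # bounding box in screen coordinates (pixels)
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

REGIONS = [
    MapRegion("Town hall", 100, 80, 60, 40),
    MapRegion("Main street", 0, 150, 400, 20),
    MapRegion("Park", 250, 200, 120, 90),
]

def speak(text: str) -> None:
    # Placeholder for the audio output channel.
    print(f"[TTS] {text}")

def on_double_tap(px: float, py: float) -> None:
    """Announce the tactile-map element under the user's finger."""
    for region in REGIONS:
        if region.contains(px, py):
            speak(region.name)
            return
    speak("No map element here")

on_double_tap(120, 95)   # -> "Town hall"
on_double_tap(10, 10)    # -> "No map element here"
```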

    Improving Cognitive Visual-Motor Abilities in Individuals with Down Syndrome

    Down syndrome causes a reduction in cognitive abilities, with visual-motor skills being particularly affected. In this work, we focus on this skill in order to stimulate better learning. The proposal relies on stimulating the cognitive visual-motor skills of individuals with Down syndrome (DS) through exercises on TANGO:H, a gestural interaction platform based on the KINECT sensor, with the goal of improving those skills. To validate the proposal, an experimental single-case study was designed with two groups of similar cognitive age: a control group and an experimental group. The experimental group was given didactic exercises using visual cognitive stimulation. These exercises were created with TANGO:H Designer, a platform designed for gestural interaction with the KINECT sensor. As a result, TANGO:H allows visual-motor cognitive stimulation through the movement of the hands, arms, feet and head. The Illinois Test of Psycholinguistic Abilities (ITPA) was applied to both groups as a pre-test and post-test in its four reference sections: visual comprehension, visual-motor sequential memory, visual association, and visual integration. Two checks were made: a longitudinal pre-test/post-test comparison within the experimental group, and a comparison of the differences between the pre-test and post-test means. We also used an observational methodology for the experimental group's working sessions. Although the statistical results do not show significant differences between the two groups, the observations showed an improvement in visual-motor cognitive skills.
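    The two checks mentioned above (a within-group pre/post comparison and a comparison of pre/post differences between groups) can be illustrated with a short statistical sketch. The scores below are synthetic, and the specific tests chosen here (Wilcoxon signed-rank, Mann-Whitney U, common choices for small samples) are assumptions, not necessarily those used in the study.

```python
# Sketch with synthetic scores (not the study's data) of the two checks:
# a within-group pre/post comparison for the experimental group, and a
# between-group comparison of pre/post gain scores.
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(1)

# Hypothetical ITPA composite scores for two small groups.
exp_pre, exp_post = rng.normal(40, 5, 8), rng.normal(44, 5, 8)
ctl_pre, ctl_post = rng.normal(40, 5, 8), rng.normal(41, 5, 8)

# Check 1: longitudinal pre/post comparison within the experimental group.
_, p_within = wilcoxon(exp_pre, exp_post)
print(f"experimental pre vs. post: p = {p_within:.3f}")

# Check 2: compare the pre/post gains of the two groups.
_, p_between = mannwhitneyu(exp_post - exp_pre, ctl_post - ctl_pre)
print(f"gain, experimental vs. control: p = {p_between:.3f}")
```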

    Designing multimodal interactive systems using EyesWeb XMI

    This paper introduces the EyesWeb XMI platform (for eXtended Multimodal Interaction) as a tool for fast prototyping of multimodal systems, including the interconnection of multiple smart devices, e.g., smartphones. EyesWeb is endowed with a visual programming language enabling users to compose modules into applications. Modules are collected in several libraries and include support for many input devices (e.g., video, audio, motion capture, accelerometers, and physiological sensors), output devices (e.g., video, audio, 2D and 3D graphics), and synchronized multimodal data processing. Specific libraries are devoted to real-time analysis of nonverbal expressive motor and social behavior. The EyesWeb platform also encompasses further tools, such as EyesWeb Mobile, which supports the development of customized Graphical User Interfaces for specific classes of users. The paper reviews the EyesWeb platform and its components, starting from its historical origins and with a particular focus on Human-Computer Interaction aspects.
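    EyesWeb applications are assembled in its visual programming language by connecting modules into a dataflow patch. The toy sketch below only illustrates that composition idea in Python; it is not EyesWeb's API, and the example modules (a smoothing filter feeding a sound-level mapping) are invented.

```python
# Conceptual sketch only: EyesWeb applications are built in its visual
# programming language, not in Python. This toy pipeline just illustrates
# the underlying dataflow idea of chaining input, processing, and output.
from typing import Callable, List

Module = Callable[[float], float]

def pipeline(modules: List[Module]) -> Module:
    """Compose modules so each one's output feeds the next, as patches do in EyesWeb."""
    def run(sample: float) -> float:
        for module in modules:
            sample = module(sample)
        return sample
    return run

# Invented example modules: an accelerometer-like source is smoothed and
# mapped to a sound-level parameter.
def smooth(sample: float, state: dict = {"prev": 0.0}) -> float:
    state["prev"] = 0.8 * state["prev"] + 0.2 * sample   # simple low-pass filter
    return state["prev"]

def to_sound_level(sample: float) -> float:
    return max(0.0, min(1.0, abs(sample)))               # clamp to [0, 1]

app = pipeline([smooth, to_sound_level])
for raw in [0.1, 0.9, -1.4, 0.3]:                        # fake sensor samples
    print(f"{raw:+.1f} -> {app(raw):.2f}")
```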