Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement acoustic detection of the active speaker, thus
improving the system's robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their faces. Furthermore, the method does not rely on
external annotations, and is thus consistent with cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker-dependent setting. However, in a
speaker-independent setting the proposed method yields significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
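The abstract above describes cross-modal self-supervision: the auditory modality provides the training signal for a visual classifier, so no human annotation is needed. Below is a minimal sketch of that idea, assuming a crude energy-based voice activity detector and a small PyTorch CNN; all names (energy_vad, FaceSpeakingClassifier, train_step) are illustrative assumptions, not the paper's implementation.

```python
# Sketch: audio-derived labels supervise a visual "is this face speaking?" classifier.
import numpy as np
import torch
import torch.nn as nn

def energy_vad(audio_frame: np.ndarray, threshold: float = 0.02) -> int:
    """Crude per-frame voice activity detector: 1 if speech energy is high."""
    return int(np.sqrt(np.mean(audio_frame ** 2)) > threshold)

class FaceSpeakingClassifier(nn.Module):
    """Small CNN predicting a 'speaking' logit from a single face crop."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

def train_step(model, optimizer, face_crops, audio_frames):
    """face_crops: (B, 3, H, W) tensor; audio_frames: raw audio chunks
    time-aligned with the crops. The (noisy) labels come from audio only."""
    labels = torch.tensor([energy_vad(a) for a in audio_frames],
                          dtype=torch.float32).unsqueeze(1)
    logits = model(face_crops)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the labels are derived automatically from the audio stream, the visual model can keep learning in new environments without manual annotation, at the cost of label noise when the acoustic detector itself fails.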
Design and User Satisfaction of Interactive Maps for Visually Impaired People
Multimodal interactive maps are a solution for presenting spatial information
to visually impaired people. In this paper, we present an interactive
multimodal map prototype that is based on a tactile paper map, a multi-touch
screen and audio output. We first describe the different steps for designing an
interactive map: drawing and printing the tactile paper map, choice of
multi-touch technology, interaction technologies and the software architecture.
Then we describe the method used to assess user satisfaction. We provide data
showing that an interactive map - although based on a single, elementary
double-tap interaction - has been met with a high level of user satisfaction.
Interestingly, satisfaction is independent of a user's age, previous visual
experience or Braille experience. This prototype will be used as a platform to
design advanced interactions for spatial learning.
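As a rough illustration of the double-tap interaction described above, the sketch below maps tap coordinates on the multi-touch surface to named map regions and triggers speech output when the same element is tapped twice in quick succession. The region coordinates, the DOUBLE_TAP_DELAY value, and the speak() stand-in are assumptions made for illustration, not details of the prototype.

```python
import time

# Map elements as named bounding boxes in screen coordinates (x0, y0, x1, y1);
# values are illustrative placeholders, not taken from the prototype.
REGIONS = {
    "train station": (120, 80, 220, 160),
    "park": (300, 200, 420, 320),
}
DOUBLE_TAP_DELAY = 0.4   # max seconds between taps (assumed value)
_last = {"time": 0.0, "region": None}

def speak(text: str) -> None:
    """Stand-in for the audio output (text-to-speech) backend."""
    print(f"[TTS] {text}")

def region_at(x: float, y: float):
    """Return the name of the map region containing the tap, if any."""
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

def on_tap(x: float, y: float) -> None:
    """Call from the multi-touch event loop on every tap."""
    now = time.monotonic()
    region = region_at(x, y)
    if region and region == _last["region"] and now - _last["time"] < DOUBLE_TAP_DELAY:
        speak(region)  # double tap on the same element: announce its name
    _last["time"], _last["region"] = now, region
```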
Improving Cognitive Visual-Motor Abilities in Individuals with Down Syndrome
Down syndrome causes a reduction in cognitive abilities, with visual-motor skills being
particularly affected. In this work, we have focused on this skill in order to stimulate better learning.
The proposal relies on stimulating the cognitive visual-motor skills of individuals with Down
Syndrome (DS) through exercises on TANGO:H, a gestural interaction platform based on the KINECT
sensor, with the goal of improving these skills. To validate the proposal, an experimental
single-case study method was designed using two groups: a control group and an experimental
one, with similar cognitive ages. Didactic exercises involving visual cognitive stimulation were
provided to the experimental group. These exercises were created with TANGO:H Designer, a platform
designed for gestural interaction using the KINECT sensor. In this way, TANGO:H allows for
visual-motor cognitive stimulation through movements of the hands, arms, feet, and head. The “Illinois
Test of Psycholinguistic Abilities (ITPA)” was applied to both groups as a pre-test and post-test in its
four reference sections: visual comprehension, visual-motor sequential memory, visual association,
and visual integration. Two checks were made: a longitudinal pre-test/post-test comparison within
the experimental group, and a comparison of the pre-test/post-test mean differences. We also used
an observational methodology for the working sessions of the experimental group. Although the
statistical results did not show significant differences between the two groups, the observations
showed an improvement in visual-motor cognitive skills.
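For readers unfamiliar with the two statistical checks mentioned above, the following sketch shows one way to compute them with SciPy, using hypothetical ITPA scores rather than the study's data; the choice of paired and independent t-tests is an assumption for illustration, not necessarily the tests used in the study.

```python
import numpy as np
from scipy import stats

# Hypothetical ITPA scores for illustration only (not the study's data).
exp_pre  = np.array([12, 15,  9, 14, 11])
exp_post = np.array([14, 18, 10, 17, 13])
ctl_pre  = np.array([13, 14, 10, 12, 11])
ctl_post = np.array([13, 15, 10, 13, 11])

# Check 1: longitudinal pre-test/post-test comparison within the experimental group.
t_within, p_within = stats.ttest_rel(exp_pre, exp_post)

# Check 2: compare pre/post mean differences (gain scores) between the groups.
gain_exp = exp_post - exp_pre
gain_ctl = ctl_post - ctl_pre
t_between, p_between = stats.ttest_ind(gain_exp, gain_ctl)

print(f"within-group p = {p_within:.3f}, between-group p = {p_between:.3f}")
```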
Designing multimodal interactive systems using EyesWeb XMI
This paper introduces the EyesWeb XMI platform (for eXtended Multimodal Interaction) as a tool for fast prototyping of multimodal systems, including the interconnection of multiple smart devices, e.g., smartphones. EyesWeb is endowed with a visual programming language enabling users to compose modules into applications. Modules are collected in several libraries and include support for many input devices (e.g., video, audio, motion capture, accelerometers, and physiological sensors), output devices (e.g., video, audio, 2D and 3D graphics), and synchronized multimodal data processing. Specific libraries are devoted to real-time analysis of nonverbal expressive motor and social behavior. The EyesWeb platform encompasses further tools such as EyesWeb Mobile, which supports the development of customized Graphical User Interfaces for specific classes of users. The paper reviews the EyesWeb platform and its components, starting from its historical origins and with a particular focus on Human-Computer Interaction aspects.
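To make the dataflow idea behind composing modules into applications more concrete, here is a small sketch (in Python) of processing blocks wired into a chain, much like blocks connected on a visual-programming canvas. The Module and Patch classes are hypothetical stand-ins for illustration and are not the EyesWeb XMI API.

```python
from typing import Callable, List

class Module:
    """A processing block: a name plus a function applied to each data frame."""
    def __init__(self, name: str, fn: Callable):
        self.name, self.fn = name, fn

    def process(self, frame):
        return self.fn(frame)

class Patch:
    """A chain of modules, analogous to blocks wired together on the canvas."""
    def __init__(self, modules: List[Module]):
        self.modules = modules

    def run(self, frame):
        for module in self.modules:
            frame = module.process(frame)
        return frame

# Example: accelerometer samples -> rectify -> crude "motion energy" feature.
rectify = Module("rectify", lambda xs: [abs(x) for x in xs])
energy  = Module("energy",  lambda xs: sum(x * x for x in xs) / len(xs))
patch   = Patch([rectify, energy])
print(patch.run([0.1, -0.4, 0.9, -0.3, 0.2]))
```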