Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement acoustic detection of the active speaker, thus
improving the system's robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their faces. Furthermore, the method does not rely on
external annotations, and is thus consistent with cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker-dependent setting. However, in a
speaker-independent setting the proposed method yields significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
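The abstract above describes cross-modal self-supervision: the auditory modality provides the training signal for a visual classifier, so no human annotation is needed. Below is a minimal sketch of that idea, assuming a crude energy-based voice activity detector and a small PyTorch CNN; all names (energy_vad, FaceSpeakingClassifier, train_step) are illustrative assumptions, not the paper's implementation.

```python
# Sketch: audio-derived labels supervise a visual "is this face speaking?" classifier.
import numpy as np
import torch
import torch.nn as nn

def energy_vad(audio_frame: np.ndarray, threshold: float = 0.02) -> int:
    """Crude per-frame voice activity detector: 1 if speech energy is high."""
    return int(np.sqrt(np.mean(audio_frame ** 2)) > threshold)

class FaceSpeakingClassifier(nn.Module):
    """Small CNN predicting a 'speaking' logit from a single face crop."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

def train_step(model, optimizer, face_crops, audio_frames):
    """face_crops: (B, 3, H, W) tensor; audio_frames: raw audio chunks
    time-aligned with the crops. The (noisy) labels come from audio only."""
    labels = torch.tensor([energy_vad(a) for a in audio_frames],
                          dtype=torch.float32).unsqueeze(1)
    logits = model(face_crops)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the labels are derived automatically from the audio stream, the visual model can keep learning in new environments without manual annotation, at the cost of label noise when the acoustic detector itself fails.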
Design and User Satisfaction of Interactive Maps for Visually Impaired People
Multimodal interactive maps are a solution for presenting spatial information
to visually impaired people. In this paper, we present an interactive
multimodal map prototype that is based on a tactile paper map, a multi-touch
screen and audio output. We first describe the different steps for designing an
interactive map: drawing and printing the tactile paper map, choice of
multi-touch technology, interaction technologies and the software architecture.
Then we describe the method used to assess user satisfaction. We provide data
showing that an interactive map - although based on a single, elementary
double-tap interaction - has been met with a high level of user satisfaction.
Interestingly, satisfaction is independent of a user's age, previous visual
experience or Braille experience. This prototype will be used as a platform to
design advanced interactions for spatial learning.
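As a rough illustration of the double-tap interaction described above, the sketch below maps tap coordinates on the multi-touch surface to named map regions and triggers speech output when the same element is tapped twice in quick succession. The region coordinates, the DOUBLE_TAP_DELAY value, and the speak() stand-in are assumptions made for illustration, not details of the prototype.

```python
import time

# Map elements as named bounding boxes in screen coordinates (x0, y0, x1, y1);
# values are illustrative placeholders, not taken from the prototype.
REGIONS = {
    "train station": (120, 80, 220, 160),
    "park": (300, 200, 420, 320),
}
DOUBLE_TAP_DELAY = 0.4   # max seconds between taps (assumed value)
_last = {"time": 0.0, "region": None}

def speak(text: str) -> None:
    """Stand-in for the audio output (text-to-speech) backend."""
    print(f"[TTS] {text}")

def region_at(x: float, y: float):
    """Return the name of the map region containing the tap, if any."""
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

def on_tap(x: float, y: float) -> None:
    """Call from the multi-touch event loop on every tap."""
    now = time.monotonic()
    region = region_at(x, y)
    if region and region == _last["region"] and now - _last["time"] < DOUBLE_TAP_DELAY:
        speak(region)  # double tap on the same element: announce its name
    _last["time"], _last["region"] = now, region
```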
Improving Cognitive Visual-Motor Abilities in Individuals with Down Syndrome
Down syndrome causes a reduction in cognitive abilities, with visual-motor skills being
particularly affected. In this work, we have focused on this skill in order to stimulate better learning.
The proposal relies on stimulating the cognitive visual-motor skills of individuals with Down
Syndrome (DS) through exercises on TANGO:H, a gestural interaction platform based on the KINECT
sensor, with the goal of improving these skills. To validate the proposal, an experimental
single-case study method was designed using two groups: a control group and an experimental
one, with similar cognitive ages. Didactic exercises involving visual cognitive stimulation were
provided to the experimental group. These exercises were created with TANGO:H Designer, a platform
designed for gestural interaction using the KINECT sensor. In this way, TANGO:H allows for
visual-motor cognitive stimulation through movements of the hands, arms, feet, and head. The “Illinois
Test of Psycholinguistic Abilities (ITPA)” was applied to both groups as a pre-test and post-test in its
four reference sections: visual comprehension, visual-motor sequential memory, visual association,
and visual integration. Two checks were made: a longitudinal pre-test/post-test comparison within
the experimental group, and a comparison of the pre-test/post-test mean differences. We also used
an observational methodology for the working sessions of the experimental group. Although the
statistical results did not show significant differences between the two groups, the observations
showed an improvement in visual-motor cognitive skills.
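For readers unfamiliar with the two statistical checks mentioned above, the following sketch shows one way to compute them with SciPy, using hypothetical ITPA scores rather than the study's data; the choice of paired and independent t-tests is an assumption for illustration, not necessarily the tests used in the study.

```python
import numpy as np
from scipy import stats

# Hypothetical ITPA scores for illustration only (not the study's data).
exp_pre  = np.array([12, 15,  9, 14, 11])
exp_post = np.array([14, 18, 10, 17, 13])
ctl_pre  = np.array([13, 14, 10, 12, 11])
ctl_post = np.array([13, 15, 10, 13, 11])

# Check 1: longitudinal pre-test/post-test comparison within the experimental group.
t_within, p_within = stats.ttest_rel(exp_pre, exp_post)

# Check 2: compare pre/post mean differences (gain scores) between the groups.
gain_exp = exp_post - exp_pre
gain_ctl = ctl_post - ctl_pre
t_between, p_between = stats.ttest_ind(gain_exp, gain_ctl)

print(f"within-group p = {p_within:.3f}, between-group p = {p_between:.3f}")
```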
Designing multimodal interactive systems using EyesWeb XMI
This paper introduces the EyesWeb XMI platform (for eXtended Multimodal Interaction) as a tool for fast prototyping of multimodal systems, including the interconnection of multiple smart devices, e.g., smartphones. EyesWeb is endowed with a visual programming language enabling users to compose modules into applications. Modules are collected in several libraries and include support for many input devices (e.g., video, audio, motion capture, accelerometers, and physiological sensors), output devices (e.g., video, audio, 2D and 3D graphics), and synchronized multimodal data processing. Specific libraries are devoted to real-time analysis of nonverbal expressive motor and social behavior. The EyesWeb platform encompasses further tools such as EyesWeb Mobile, which supports the development of customized Graphical User Interfaces for specific classes of users. The paper reviews the EyesWeb platform and its components, starting from its historical origins and with a particular focus on Human-Computer Interaction aspects.
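To make the dataflow idea behind composing modules into applications more concrete, here is a small sketch (in Python) of processing blocks wired into a chain, much like blocks connected on a visual-programming canvas. The Module and Patch classes are hypothetical stand-ins for illustration and are not the EyesWeb XMI API.

```python
from typing import Callable, List

class Module:
    """A processing block: a name plus a function applied to each data frame."""
    def __init__(self, name: str, fn: Callable):
        self.name, self.fn = name, fn

    def process(self, frame):
        return self.fn(frame)

class Patch:
    """A chain of modules, analogous to blocks wired together on the canvas."""
    def __init__(self, modules: List[Module]):
        self.modules = modules

    def run(self, frame):
        for module in self.modules:
            frame = module.process(frame)
        return frame

# Example: accelerometer samples -> rectify -> crude "motion energy" feature.
rectify = Module("rectify", lambda xs: [abs(x) for x in xs])
energy  = Module("energy",  lambda xs: sum(x * x for x in xs) / len(xs))
patch   = Patch([rectify, energy])
print(patch.run([0.1, -0.4, 0.9, -0.3, 0.2]))
```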