End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

Author: Busso Carlos
Tao Fei
Publication date: 12/09/2018

Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, which increases the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whispered speech) and to background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture, in a principled way, the temporal relationships between acoustic and visual features. This study explores this idea by proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of relying on hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are learned directly from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings collected from 105 speakers. The results demonstrate that the proposed framework achieves absolute improvements of up to 1.2% under practical scenarios over an audio-only voice activity detection (VAD) baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when it is evaluated using the sensors of a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech recorded with a high-definition camera and a close-talking microphone). Comment: Submitted to Speech Communication

arXiv.org e-Print Archive

Working Document on Gloss Ontology

Author: Coutaz Joelle
Dearle Alan
Dupuy-Chessa Sophie
Kirby Graham
Lachenal Christophe
Morrison Ron
Rey Gaetan
Zirintsis Evangelos
Publication date: 29/06/2010

This document describes the Gloss Ontology. The ontology and its associated class model are organised into several packages. Section 2 describes each package in detail, while Section 3 contains a summary of the whole ontology.

arXiv.org e-Print Archive

St Andrews Research Repository

An Introduction to 3D User Interface Design

Author: Bowman D.
Kruijff E.
LaViola J.
Poupyrev I.
Publication date: 01/01/2001

3D user interface design is a critical component of any virtual environment (VE) application. In this paper, we present a broad overview of three-dimensional (3D) interaction and user interfaces. We discuss the effect of common VE hardware devices on user interaction, interaction techniques for generic 3D tasks, and the use of traditional two-dimensional interaction styles in 3D environments. We divide most user interaction tasks into three categories: navigation, selection/manipulation, and system control. Throughout the paper, our focus is on presenting not only the available techniques but also practical guidelines for 3D interaction design, along with widely held myths. Finally, we briefly discuss two approaches to 3D interaction design and some example applications with complex 3D interaction requirements. We also present an annotated online bibliography as a reference companion to this article.

Computer Science Technical Reports @Virginia Tech

CiteSeerX

Crossref

Vision systems with the human in the loop

Author: Bauckhage Christian
Hanheide Marc
Kaster Thomas
Pfeiffer Michael
Sagerer Gerhard
Wrede Sebastian
Publication venue: Hindawi Limited
Publication date: 01/01/2005

The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge and can adapt to their environment, and thus exhibit high robustness. This contribution presents three vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the other two are cognitive vision systems that constitute prototypes of visual active memories, which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval and object and action recognition in an office environment, we raise the issue of assessing cognitive systems. We report experiences from psychologically evaluated human-machine interactions and stress the promising potential of psychologically based usability experiments.

University of Lincoln Institutional Repository

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Publications at Bielefeld University

Human-Computer Interaction for BCI Games: Usability and User Experience

Author: Gürkök Hayrettin
Heylen Dirk K.J.
Mühl C.
Nijholt Antinus
Plass-Oude Bos D.
Poel Mannes
Reuderink B.
Sourin A.
van de Laar B.L.A.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2010

Brain-computer interfaces (BCIs) come with many issues, such as delays, poor recognition, long training times, and cumbersome hardware. Gamers are a large potential target group for this new interaction modality, but why would healthy users want to use it? BCI provides a combination of information and features that no other input modality can offer. However, for this technology to gain general acceptance, usability and user experience need to be taken into account when designing such systems. This paper discusses the consequences of applying knowledge from Human-Computer Interaction (HCI) to the design of BCIs for games. The integration of HCI with BCI is illustrated by research examples and showcases intended to take this promising technology out of the lab. Future research needs to move beyond feasibility tests to prove that BCI is also applicable in realistic, real-world settings.

CiteSeerX

University of Twente Research Information