21,422 research outputs found
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech
processing systems, including automatic speech recognition (ASR). SAD is
particularly difficult in environments with acoustic noise. A practical
solution is to incorporate visual information, increasing the robustness of the
SAD approach. An audiovisual system has the advantage of being robust to
different speech modes (e.g., whisper speech) or background noise. Recent
advances in audiovisual speech processing using deep learning have opened
opportunities to capture in a principled way the temporal relationships between
acoustic and visual features. This study explores this idea proposing a
\emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach
models the temporal dynamic of the sequential audiovisual data, improving the
accuracy and robustness of the proposed SAD system. Instead of estimating
hand-crafted features, the study investigates an end-to-end training approach,
where acoustic and visual features are directly learned from the raw data
during training. The experimental evaluation considers a large audiovisual
corpus with over 60.8 hours of recordings, collected from 105 speakers. The
results demonstrate that the proposed framework leads to absolute improvements
up to 1.2% under practical scenarios over a VAD baseline using only audio
implemented with deep neural network (DNN). The proposed approach achieves
92.7% F1-score when it is evaluated using the sensors from a portable tablet
under noisy acoustic environment, which is only 1.0% lower than the performance
obtained under ideal conditions (e.g., clean speech obtained with a high
definition camera and a close-talking microphone).Comment: Submitted to Speech Communicatio
Working Document on Gloss Ontology
This document describes the Gloss Ontology. The ontology and associated class
model are organised into several packages. Section 2 describes each package in
detail, while Section 3 contains a summary of the whole ontology
An Introduction to 3D User Interface Design
3D user interface design is a critical component of any virtual environment (VE) application. In this paper, we present a broad overview of three-dimensional (3D) interaction and user interfaces. We discuss the effect of common VE hardware devices on user interaction, as well as interaction techniques for generic 3D tasks and the use of traditional two-dimensional interaction styles in 3D environments. We divide most user interaction tasks into three categories: navigation, selection/manipulation, and system control. Throughout the paper, our focus is on presenting not only the available techniques, but also practical guidelines for 3D interaction design and widely held myths. Finally, we briefly discuss two approaches to 3D interaction design, and some example applications with complex 3D interaction requirements. We also present an annotated online bibliography as a reference companion to this article
Vision systems with the human in the loop
The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, they can adapt to their environment and thus will exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval, the others are cognitive vision systems that constitute prototypes of visual active memories which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After we will have discussed adaptive content-based image retrieval and object and action recognition in an office environment, the issue of assessing cognitive systems will be raised. Experiences from psychologically evaluated human-machine interactions will be reported and the promising potential of psychologically-based usability experiments will be stressed
Human-Computer Interaction for BCI Games: Usability and User Experience
Brain-computer interfaces (BCI) come with a lot of issues, such as delays, bad recognition, long training times, and cumbersome hardware. Gamers are a large potential target group for this new interaction modality, but why would healthy subjects want to use it? BCI provides a combination of information and features that no other input modality can offer. But for general acceptance of this technology, usability and user experience will need to be taken into account when designing such systems. This paper discusses the consequences of applying knowledge from Human-Computer Interaction (HCI) to the design of BCI for games. The integration of HCI with BCI is illustrated by research examples and showcases, intended to take this promising technology out of the lab. Future research needs to move beyond feasibility tests, to prove that BCI is also applicable in realistic, real-world settings
- âŠ