21,422 research outputs found

    End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

    Full text link
    Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea proposing a \emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach models the temporal dynamic of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements up to 1.2% under practical scenarios over a VAD baseline using only audio implemented with deep neural network (DNN). The proposed approach achieves 92.7% F1-score when it is evaluated using the sensors from a portable tablet under noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high definition camera and a close-talking microphone).Comment: Submitted to Speech Communicatio

    Working Document on Gloss Ontology

    Get PDF
    This document describes the Gloss Ontology. The ontology and associated class model are organised into several packages. Section 2 describes each package in detail, while Section 3 contains a summary of the whole ontology

    An Introduction to 3D User Interface Design

    Get PDF
    3D user interface design is a critical component of any virtual environment (VE) application. In this paper, we present a broad overview of three-dimensional (3D) interaction and user interfaces. We discuss the effect of common VE hardware devices on user interaction, as well as interaction techniques for generic 3D tasks and the use of traditional two-dimensional interaction styles in 3D environments. We divide most user interaction tasks into three categories: navigation, selection/manipulation, and system control. Throughout the paper, our focus is on presenting not only the available techniques, but also practical guidelines for 3D interaction design and widely held myths. Finally, we briefly discuss two approaches to 3D interaction design, and some example applications with complex 3D interaction requirements. We also present an annotated online bibliography as a reference companion to this article

    Vision systems with the human in the loop

    Get PDF
    The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge, they can adapt to their environment and thus will exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval, the others are cognitive vision systems that constitute prototypes of visual active memories which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After we will have discussed adaptive content-based image retrieval and object and action recognition in an office environment, the issue of assessing cognitive systems will be raised. Experiences from psychologically evaluated human-machine interactions will be reported and the promising potential of psychologically-based usability experiments will be stressed

    Human-Computer Interaction for BCI Games: Usability and User Experience

    Get PDF
    Brain-computer interfaces (BCI) come with a lot of issues, such as delays, bad recognition, long training times, and cumbersome hardware. Gamers are a large potential target group for this new interaction modality, but why would healthy subjects want to use it? BCI provides a combination of information and features that no other input modality can offer. But for general acceptance of this technology, usability and user experience will need to be taken into account when designing such systems. This paper discusses the consequences of applying knowledge from Human-Computer Interaction (HCI) to the design of BCI for games. The integration of HCI with BCI is illustrated by research examples and showcases, intended to take this promising technology out of the lab. Future research needs to move beyond feasibility tests, to prove that BCI is also applicable in realistic, real-world settings
    • 

    corecore