Search CORE

498 research outputs found

Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization

Author: Brinklow Nathan Thanyehténhas
Littell Patrick
Pine Aidan
Richmond Korin
Wells Dan
Publication venue
Publication date: 16/05/2022
Field of study

End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

Author: Busso Carlos
Tao Fei
Publication venue
Publication date: 12/09/2018
Field of study

Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea proposing a \emph{bimodal recurrent neural network} (BRNN) framework for SAD. The approach models the temporal dynamic of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements up to 1.2% under practical scenarios over a VAD baseline using only audio implemented with deep neural network (DNN). The proposed approach achieves 92.7% F1-score when it is evaluated using the sensors from a portable tablet under noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high definition camera and a close-talking microphone).Comment: Submitted to Speech Communicatio

arXiv.org e-Print Archive

Dialogue scene detection in movies using low and mid-level visual features

Author: Lehane Bart
Murphy Noel
O'Connor Noel E.
Publication venue
Publication date: 01/10/2004
Field of study

This paper describes an approach for detecting dialogue scenes in movies. The approach uses automatically extracted low- and mid-level visual features that characterise the visual content of individual shots, and which are then combined using a state transition machine that models the shot-level temporal characteristics of the scene under investigation. The choice of visual features used is motivated by a consideration of formal film syntax. The system is designed so that the analysis may be applied in order to detect different types of scenes, although in this paper we focus on dialogue sequences as these are the most prevalent scenes in the movies considered to date

Irish Universities

DCU Online Research Access Service

Bibliographic Review on Distributed Kalman Filtering

Author: Khalid Dr. Haris M.
Mahmoud Professor Magdi S.
Publication venue
Publication date: 01/01/2013
Field of study

In recent years, a compelling need has arisen to understand the effects of distributed information structures on estimation and filtering. In this paper, a bibliographical review on distributed Kalman filtering (DKF) is provided.\ud The paper contains a classification of different approaches and methods involved to DKF. The applications of DKF are also discussed and explained separately. A comparison of different approaches is briefly carried out. Focuses on the contemporary research are also addressed with emphasis on the practical applications of the techniques. An exhaustive list of publications, linked directly or indirectly to DKF in the open literature, is compiled to provide an overall picture of different developing aspects of this area

CogPrints Cognitive Sciences Eprint Archive