Far-Field Voice Activity Detection and Its Applications in Adverse Acoustic Environments

Author: Petsatodis Theodoros
Publication date: 01/01/2012

Complete-linkage clustering for voice activity detection in audio and visual speech

Author: Dean David
Fookes Clinton
Ghaemmaghami Houman
Kalantari Shahram
Sridharan Sridha
Publication venue: International Speech Communication Association
Publication date: 01/01/2015

We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models: speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best-performing baseline system across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.
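
The scoring-and-clustering step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the dissimilarity measure, so Euclidean distance between each segment's pair of scores is assumed here, and the rule for naming the speech cluster (larger mean log-likelihood ratio) is likewise an assumption. The function names and the example scores are hypothetical.

```python
from itertools import combinations
from math import dist


def complete_linkage_two_clusters(scores):
    """Agglomeratively merge segments until two clusters remain.

    scores: list of (speech_loglik, nonspeech_loglik) pairs, one per segment.
    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(scores))]

    def linkage(a, b):
        # Complete linkage: the distance between two clusters is the
        # largest pairwise dissimilarity between their members.
        # ASSUMPTION: Euclidean distance between score pairs.
        return max(dist(scores[i], scores[j]) for i in a for j in b)

    while len(clusters) > 2:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


def label_speech(scores, clusters):
    """Mark each segment True if it falls in the cluster labelled speech.

    ASSUMPTION: the speech cluster is the one with the larger mean
    log-likelihood ratio (speech score minus non-speech score).
    """
    def mean_llr(cluster):
        return sum(scores[i][0] - scores[i][1] for i in cluster) / len(cluster)

    speech = max(clusters, key=mean_llr)
    return [i in speech for i in range(len(scores))]


# Hypothetical log-likelihood scores for five segments.
example_scores = [(-10.0, -20.0), (-11.0, -19.0), (-25.0, -12.0),
                  (-24.0, -13.0), (-9.5, -21.0)]
example_clusters = complete_linkage_two_clusters(example_scores)
print(label_speech(example_scores, example_clusters))
# prints [True, True, False, False, True]
```

The naive loop recomputes every linkage distance on each merge, which is acceptable for a handful of segments but would need a cached distance matrix for long recordings.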

Queensland University of Technology ePrints Archive

Voice Activity Detection. Fundamentals and Speech Recognition System Robustness

Author: J. C. Segura
J. M. Gorriz
J. Ramirez
Publication venue: IntechOpen
Publication date: 01/01/2007

IntechOpen

CiteSeerX

Audio-assisted movie dialogue detection

Author: Kotropoulos C.
Kotti M.
Maragos P.
Panagakis Y.
Pitas I.
Ververidis D.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported.
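
The two features named in this abstract can be illustrated with a short sketch. It is not the authors' code: the biased sample estimator, the lag window, and the evaluation of the spectrum as a direct DFT of the cross-correlation sequence are all assumptions made here for illustration, as are the function names and the example indicator sequences. Both inputs are assumed to have the same length.

```python
import cmath


def cross_correlation(x, y, max_lag):
    """Biased sample cross-correlation of two equal-length sequences.

    Returns a dict mapping lag l (|l| <= max_lag) to
    (1/N) * sum over n of x[n + l] * y[n].
    """
    n = len(x)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            if 0 <= t + lag < n:
                total += x[t + lag] * y[t]
        out[lag] = total / n
    return out


def cross_power_magnitude(xcorr, n_freqs):
    """Magnitude of the DFT of a cross-correlation sequence.

    Evaluated at n_freqs equally spaced frequencies in [0, 2*pi).
    """
    mags = []
    for k in range(n_freqs):
        omega = 2 * cmath.pi * k / n_freqs
        s = sum(c * cmath.exp(-1j * omega * lag) for lag, c in xcorr.items())
        mags.append(abs(s))
    return mags


# Hypothetical indicator functions: 1 when the actor speaks, 0 otherwise.
actor_a = [1, 1, 0, 0, 1, 1, 0, 0]
actor_b = [0, 0, 1, 1, 0, 0, 1, 1]
xc = cross_correlation(actor_a, actor_b, max_lag=3)
features = [xc[lag] for lag in sorted(xc)] + cross_power_magnitude(xc, n_freqs=4)
```

For two actors who strictly alternate, as in the example, the cross-correlation is zero at lag 0 and peaks at the lag matching the turn length, which is the kind of structure a dialogue classifier could pick up.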

Middlesex University Research Repository

Decision fusion of voice activity detectors

Author: Nasibov Zaur
Publication venue: University of Eastern Finland

UEF Electronic Publications

Audio-assisted movie dialogue detection

Author: Evangelopoulos G
Kotropoulos C
Kotti M
Maragos P
Panagakis I
Pitas I
Ververidis D
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, that is, functions that define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned 41 dialogue instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE

Crossref

Middlesex University Research Repository

DSpace at NTUA

Spiral - Imperial College Digital Repository

Studies on noise robust automatic speech recognition

Author: Kurimo Mikko
Palomäki Kalle J.
Remes Ulpu
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009

Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

Aaltodoc Publication Archive