
44 research outputs found

Robust excitation-based features for Automatic Speech Recognition

Author: Chen L
Chen X
Drugman T
Gales MJF
Stylianou Y
Publication venue: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication date: 01/01/2015

In this paper we investigate the use of noise-robust features characterizing the speech excitation signal as complementary features to the commonly used vocal-tract-based features for automatic speech recognition (ASR). The features are tested in a state-of-the-art Deep Neural Network (DNN) based hybrid acoustic model for speech recognition. The suggested excitation features expand the set of excitation features previously considered for ASR, with the expectation that they help to better discriminate the broad phonetic classes (e.g., fricatives, nasals, vowels, etc.). Relative improvements in the word error rate are observed in the AMI meeting transcription system, with greater gains (about 5%) if PLP features are combined with the suggested excitation features. For Aurora 4, significant improvements are observed as well. Combining the suggested excitation features with filter banks, a word error rate of 9.96% is achieved. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2015.717885

CiteSeerX

Crossref

Apollo (Cambridge)

CUED - Cambridge University Engineering Department

Relevant Feature Selection for Audio-Visual Speech Recognition

Author: Drugman T.
Gurban M.
Thiran Jean-Philippe
Publication date: 01/01/2007

We present a feature selection method based on information-theoretic measures, targeted at multimodal signal processing, showing how we can quantitatively assess the relevance of features from different modalities. We are able to find the features that carry the most information relevant to the recognition task while at the same time having minimal redundancy. Our application is audio-visual speech recognition, and in particular the selection of relevant visual features. Experimental results show that our method outperforms other feature selection algorithms from the literature by improving recognition accuracy even with a significantly reduced number of features.
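The abstract does not state the exact selection criterion, so the following is only a minimal sketch of one common information-theoretic scheme: a greedy "maximum relevance, minimum redundancy" rule on discrete features. The function names `mutual_information` and `select_features`, and the scoring rule (relevance minus mean redundancy), are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information in bits between two discrete, non-negative integer arrays."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)          # joint histogram of (a, b) pairs
    joint /= joint.sum()                 # normalise to a joint probability table
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                       # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))

def select_features(X, y, k):
    """Greedily pick k columns of X: high MI with y, low mean MI with earlier picks.

    Hypothetical criterion (assumption): score = relevance - mean redundancy.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean(
                [mutual_information(X[:, j], X[:, s]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Toy data: column 1 duplicates column 0, column 2 carries complementary information.
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])
X = np.stack([y // 2, y // 2, y % 2], axis=1)
chosen = select_features(X, y, 2)
```

In this toy case all three columns are equally relevant to `y`, so the redundancy term is what separates the duplicate column from the complementary one.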

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Author: A Borthwick
A Ratnaparkhi
AL Berger
AW Black
B Picart
CJ Leggetter
Fahimeh Bahmaninezhad
H Kawahara
H Liang
H Zen
Hossein Sameti
J Ghomeshi
J Nocedal
J Yamagishi
JJ Odell
K Hashimoto
K Oura
K Shinoda
K Tokuda
K Yu
L Qin
M Bijankhan
M Gibson
MJ Gales
R Kubichek
S Sakai
S Takaki
Simon King
SJ Young
Soheil Khorram
T Drugman
T Koriyama
T Toda
T Yoshimura
Thomas Drugman
V Rangarajan
VV Digalakis
Y Qian
YJ Wu
Publication venue: Springer Nature
Publication date: 01/01/2014

Crossref

Springer - Publisher Connector

Edinburgh Research Explorer

Efficient GCI detection for efficient sparse linear prediction

Author: A. Turiel
D. Giacobello
D. Meng
E. Denoel
E.J. Candès
N. Hurley
T. Drugman
Publication venue: HAL CCSD
Publication date: 01/01/2013

We propose a unified non-linear approach that offers an efficient closed-form solution for the problem of sparse linear prediction analysis. The approach is based on our previous work on minimizing the weighted l2-norm of the prediction error. The l2-norm is weighted so that less emphasis is given to the prediction error around the Glottal Closure Instants (GCIs), where the error is expected to reach its largest values; as a result, the cost function approaches the ideal l0-norm cost function for sparse residual recovery. The method therefore requires knowledge of the GCIs. In this paper we use our recently developed GCI detection algorithm, which is particularly suitable for this problem because it does not rely on the residuals themselves to detect GCIs. We show that our GCI detection algorithm provides slightly better sparsity properties than a recent powerful GCI detection algorithm. Moreover, as the computational cost of our GCI detection algorithm is quite low, the computational cost of the overall solution is considerably lower.
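The idea of a weighted l2-norm prediction error with a closed-form solution can be illustrated as a weighted least-squares problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: the weight shape (one minus a Gaussian bump centred on each GCI), the `width` parameter, and the function names `weighted_lp` and `gci_weights` are all hypothetical, and the GCI positions are assumed to be known in advance.

```python
import numpy as np

def weighted_lp(x, order, weights):
    """Closed-form minimiser of sum_n w[n] * (x[n] - sum_k a[k-1] * x[n-k])**2."""
    n = np.arange(order, len(x))
    # Each row holds the `order` past samples x[n-1], ..., x[n-order].
    past = np.stack([x[n - k] for k in range(1, order + 1)], axis=1)
    sw = np.sqrt(weights[n])
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    a, *_ = np.linalg.lstsq(past * sw[:, None], x[n] * sw, rcond=None)
    return a

def gci_weights(length, gcis, width=2.0):
    """Assumed weight shape: close to 1 away from GCIs, dropping to 0 at each GCI."""
    idx = np.arange(length)
    w = np.ones(length)
    for g in gcis:
        w -= np.exp(-0.5 * ((idx - g) / width) ** 2)
    return np.clip(w, 0.0, 1.0)

# Synthetic voiced-like signal: a second-order all-pole filter driven by an impulse train.
a_true = np.array([1.8, -0.9])
gcis = list(range(10, 400, 50))
x = np.zeros(400)
for i in range(400):
    excitation = 1.0 if i in gcis else 0.0
    x[i] = excitation
    if i >= 1:
        x[i] += a_true[0] * x[i - 1]
    if i >= 2:
        x[i] += a_true[1] * x[i - 2]

a_weighted = weighted_lp(x, 2, gci_weights(len(x), gcis))
a_plain = weighted_lp(x, 2, np.ones(len(x)))
```

On this noiseless toy signal the excitation is non-zero only at the GCIs, so down-weighting those samples lets the weighted fit recover `a_true`, while the unweighted fit is biased by the excitation impulses.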

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Fundamental frequency estimation of low-quality electroglottographic signals

Author: Ananthapadmanabha
ANSI
Babacan
Baken
Bergan
Bergé
Boersma
Camacho
Cheng
Childers
Christian T. Herbst
Deliyski
Drugman
Eaton
Fabre
Fischer
Fitch
Fletcher
Friedrich
Hagmüller
Hampala
Henrich
Herbst
Hertegard
Herzel
Hess
Jacob C. Dunn
Jang
Jones
Kadambe
Kane
Kawahara
Koike
Kounoudes
Manfredi
Naylor
Owren
Parsa
Rabiner
Roark
Rossing
Talkin
Thomas
Titze
Tsanas
Tuan
Young
Publication venue: Elsevier BV
Publication date: 01/06/2018

Fundamental frequency (fo) is often estimated from electroglottographic (EGG) signals. Due to the nature of the method, the quality of EGG signals may be impaired by factors such as amplitude or baseline drifts, mains hum, or noise. The potential adverse effects of these factors on fo estimation have to date not been investigated. Here, the performance of thirteen algorithms for estimating fo was tested on 147 synthesized EGG signals with varying degrees of signal quality deterioration. Algorithm performance was assessed through the standard deviation σfo of the difference between known and estimated fo data, expressed in octaves. With very few exceptions, simulated mains hum and amplitude and baseline drifts did not influence fo results, even though some algorithms consistently outperformed others. When increasing either cycle-to-cycle fo variation or the degree of subharmonics, the SIGMA algorithm had the best performance (max. σfo = 0.04). That algorithm was, however, more easily disturbed by typical EGG equipment noise, whereas the NDF and Praat's auto-correlation algorithms performed best in this category (σfo = 0.01). These results suggest that the algorithm for fo estimation of EGG signals needs to be selected specifically for each particular data set. Overall, estimated fo data should be interpreted with care.
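The error measure described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the study's code: the difference "expressed in octaves" is taken here as log2 of the ratio of estimated to known fo, the population standard deviation (ddof=0) is an assumption, and the function name `sigma_fo_octaves` is hypothetical.

```python
import numpy as np

def sigma_fo_octaves(known_hz, estimated_hz):
    """Standard deviation of the fo estimation error, expressed in octaves.

    Assumption: the per-sample error in octaves is log2(estimated / known),
    and the population standard deviation (ddof=0) is used.
    """
    known = np.asarray(known_hz, dtype=float)
    estimated = np.asarray(estimated_hz, dtype=float)
    error_octaves = np.log2(estimated / known)
    return float(np.std(error_octaves))

# A perfect estimate gives zero spread; one octave error on half the frames does not.
perfect = sigma_fo_octaves([100.0, 200.0], [100.0, 200.0])
octave_jumps = sigma_fo_octaves([100.0, 100.0], [100.0, 200.0])
```

Working in octaves makes the measure independent of the absolute pitch range, so errors on low and high fo values are weighted alike.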

Crossref

Anglia Ruskin Research