
    Automatic Speech Recognition: the New Millennium

    We present a new approach to automatic speech recognition (ASR) based on the formalism of Bayesian networks. We lay the foundations of new ASR systems whose robustness relies on the fidelity of the speech modeling and on the information contained in the training data.

    Unsupervised Stream-Weights Computation in Classification and Recognition Tasks

    In this paper, we provide theoretical results on the problem of optimal stream weight selection for the multi-stream classification problem. It is shown that, in the presence of estimation or modeling errors, using stream weights can decrease the total classification error. Stream weight estimates are computed for various conditions. We then turn our attention to the problem of unsupervised stream weight computation. Based on the theoretical results, we propose to use models and “anti-models” (class-specific background models) to estimate stream weights. A non-linear function of the ratio of the inter- to intra-class distance is used for stream weight estimation. The proposed unsupervised stream weight estimation algorithm is evaluated on both artificial data and on the problem of audio-visual speech classification. Finally, the proposed algorithm is extended to the problem of audio-visual speech recognition. It is shown that the proposed algorithms achieve results comparable to the supervised minimum-error training approach under most testing conditions.
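    The abstract does not specify the exact non-linear mapping used; the following is a minimal numpy sketch of the general idea, assuming a sigmoid of each stream's inter- to intra-class distance ratio followed by normalization. The function name, the sigmoid choice, and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def stream_weights(features_per_stream, labels):
    """Hedged sketch: weight each stream by a sigmoid of its
    inter-class / intra-class distance ratio, then normalize.
    The paper's actual non-linear function is not reproduced here."""
    ratios = []
    for X in features_per_stream:            # X: (n_samples, dim) for one stream
        X = np.asarray(X, dtype=float)
        classes = np.unique(labels)
        means = np.array([X[labels == c].mean(axis=0) for c in classes])
        global_mean = X.mean(axis=0)
        # inter-class scatter: spread of class means around the global mean
        inter = np.mean(np.linalg.norm(means - global_mean, axis=1))
        # intra-class scatter: average spread of samples around their class mean
        intra = np.mean([np.linalg.norm(X[labels == c] - means[i], axis=1).mean()
                         for i, c in enumerate(classes)])
        ratios.append(inter / max(intra, 1e-12))
    ratios = np.array(ratios)
    w = 1.0 / (1.0 + np.exp(-ratios))        # assumed sigmoid non-linearity
    return w / w.sum()                       # stream weights sum to 1

# toy usage: an informative and a weak stream for a 2-class problem
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
audio = rng.normal(loc=labels[:, None] * 2.0, scale=1.0, size=(100, 5))
video = rng.normal(loc=labels[:, None] * 0.5, scale=1.0, size=(100, 3))
print(stream_weights([audio, video], labels))
```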

    On classification between normal and pathological voices using the MEEI-KayPENTAX database: Issues and consequences

    A large amount of research in pathological voice classification considers the task of feature extraction for discrimination between normal and dysphonic sustained vowels. The most widely used dataset for this purpose is the Massachusetts Eye & Ear Infirmary (MEEI) Voice Disorders Database commercialized by KayPENTAX Corp. During the last two decades, dozens of methods have been proposed to extract discriminative features from these signals in order to design accurate classifiers between the two classes of this database. The main contribution of this paper is to show that the normal and dysphonic sustained vowels of the KayPENTAX database are actually perfectly separable. This implies that this dataset is not suited for the normal-vs-dysphonic classification task, as long as the only concern is to achieve high classification accuracy. Indeed, we show that a single scalar parameter extracted from a matching pursuit decomposition of these signals (with a Gabor dictionary) yields perfect classification accuracy (100% with a large margin). We then discuss the implications of this finding for the precautions that should be taken with this database and for research in pathological voice detection in general. Index Terms: pathological voice classification, speech perturbation measure, dysphonia, matching pursuit, MEEI-KayPENTAX Voice Disorders Database.
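    The abstract does not say which scalar parameter is extracted. The sketch below only illustrates the generic ingredients it names, greedy matching pursuit over a small Gabor dictionary, and uses, purely for illustration, the fraction of signal energy captured by the first atom as the scalar; all names, dictionary parameters, and the toy signal are hypothetical.

```python
import numpy as np

def gabor_dictionary(n, freqs, widths):
    """Small dictionary of unit-norm Gabor atoms (Gaussian-windowed sinusoids)."""
    t = np.arange(n)
    atoms = []
    for f in freqs:
        for s in widths:
            g = np.exp(-0.5 * ((t - n / 2) / s) ** 2) * np.cos(2 * np.pi * f * t)
            atoms.append(g / np.linalg.norm(g))
    return np.array(atoms)                    # shape (n_atoms, n)

def matching_pursuit(x, D, n_iter=10):
    """Greedy MP: at each step pick the atom most correlated with the
    residual, record its coefficient, and subtract its contribution."""
    residual = x.astype(float).copy()
    coeffs = []
    for _ in range(n_iter):
        corr = D @ residual
        k = int(np.argmax(np.abs(corr)))
        coeffs.append(corr[k])
        residual -= corr[k] * D[k]
    return np.array(coeffs), residual

# illustrative scalar feature: energy fraction captured by the first atom
# (the discriminative scalar actually used in the paper is not given here)
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 0.01 * np.arange(512)) + 0.1 * rng.standard_normal(512)
D = gabor_dictionary(512, freqs=[0.005, 0.01, 0.02], widths=[32, 64, 128])
coeffs, residual = matching_pursuit(x, D, n_iter=5)
scalar_feature = coeffs[0] ** 2 / np.sum(x ** 2)
print(scalar_feature)
```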

    Large Margin GMM for discriminative speaker verification

    Gaussian mixture models (GMM), trained using the generative criterion of maximum likelihood estimation, have been the most popular approach in speaker recognition during the last decades. This approach is also widely used in many other classification tasks and applications. Generative learning is not, however, the optimal way to address classification problems. In this paper we first present a new algorithm for discriminative learning of diagonal GMM under a large margin criterion. This algorithm has the major advantage of being highly efficient, which allows fast discriminative GMM training using large scale databases. We then evaluate its performance on a full NIST speaker verification task using NIST-SRE'2006 data. In particular, we use the popular Symmetrical Factor Analysis (SFA) for session variability compensation. The results show that our system outperforms the state-of-the-art approaches of GMM-SFA and the SVM-based one, GSL-NAP. Relative reductions of the Equal Error Rate of about 9.33% and 14.88% are respectively achieved over these systems.
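    The paper's efficient optimization procedure is not described in the abstract. As a rough illustration of the large margin criterion itself, the numpy sketch below evaluates a hinge-style loss that asks the target-class diagonal GMM log-likelihood to beat every competing class by a margin; discriminative training would minimize such a loss over the GMM parameters. Function names, the margin value, and the toy models are illustrative only.

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Log-likelihood of each row of X under a diagonal-covariance GMM."""
    X = np.atleast_2d(X)
    log_probs = []
    for w, mu, var in zip(weights, means, variances):
        d = X.shape[1]
        log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var)))
        quad = -0.5 * np.sum((X - mu) ** 2 / var, axis=1)
        log_probs.append(np.log(w) + log_norm + quad)
    return np.logaddexp.reduce(np.stack(log_probs, axis=1), axis=1)

def large_margin_loss(X, y, gmms, margin=1.0):
    """Hinge-style large margin criterion: the target-class log-likelihood
    should exceed every competing class's log-likelihood by `margin`."""
    scores = np.stack([diag_gmm_loglik(X, *g) for g in gmms], axis=1)
    target = scores[np.arange(len(y)), y]
    loss = 0.0
    for c in range(scores.shape[1]):
        mask = y != c
        loss += np.sum(np.maximum(0.0, margin - (target[mask] - scores[mask, c])))
    return loss / len(y)

# toy usage: one-component "GMMs" (weights, means, diagonal variances) for 2 classes
gmms = [
    (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    (np.array([1.0]), np.array([[2.0, 2.0]]), np.array([[1.0, 1.0]])),
]
X = np.array([[0.1, -0.2], [2.1, 1.9]])
y = np.array([0, 1])
print(large_margin_loss(X, y, gmms))
```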

    A new sequence kernel for speaker verification

    Using the theory of Reproducing Kernel Hilbert Spaces, we design a new sequence kernel that measures the similarity between two sequences of observations. We apply this kernel to a speaker verification task (NIST 2004 evaluation campaign). The results show that incorporating our new sequence kernel into an SVM architecture not only yields much better results than a baseline UBM-GMM classifier, but also performs better than a classifier using a GLDS (Generalized Linear Discriminant Sequence) kernel. Moreover, our kernel operates in a lower-dimensional space, while allowing a wide choice of kernels.
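    The paper's specific RKHS-based kernel is not given in the abstract. Below is a minimal sketch of a generic sequence kernel in the same spirit, assuming a mean-map construction that averages a frame-level RBF kernel over all pairs of frames from the two utterances; the function names, gamma value, and toy data are illustrative, not the authors' construction.

```python
import numpy as np

def sequence_kernel(A, B, gamma=0.5):
    """Mean-map sequence kernel: average frame-level RBF kernel value over
    all pairs of frames from the two sequences. This is a valid
    positive-definite kernel between variable-length sequences."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    # squared distances between every frame of A and every frame of B
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return float(np.mean(np.exp(-gamma * d2)))

# toy usage with two "utterances" of different lengths (frames x features)
rng = np.random.default_rng(2)
utt1 = rng.standard_normal((120, 13))   # e.g. 120 frames of 13 MFCC-like features
utt2 = rng.standard_normal((80, 13))
print(sequence_kernel(utt1, utt2))
```

    A kernel of this form can be plugged into any SVM implementation that accepts precomputed Gram matrices, which is how a sequence kernel is typically used for utterance-level verification.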

    Efficient multipulse approximation of speech excitation using the most singular manifold

    INTERSPEECH 2012. We propose a novel approach to find the locations of the multipulse sequence that approximates the speech source excitation. This approach is based on the notion of the Most Singular Manifold (MSM), which is associated with the set of least predictable events. The MSM is formed by identifying (directly from the speech waveform) multiscale singularities which may correspond to significant impulsive excitations of the vocal tract. This identification is done through a multiscale measure of local predictability and the estimation of its associated singularity exponents. Once the pulse locations are found using the MSM, their amplitudes are computed using the second stage of the classical MultiPulse Excitation (MPE) coder. The multipulse sequence is then fed to the classical LPC synthesizer to reconstruct speech. The resulting MSM-based algorithm is shown to be significantly more efficient than MPE. We evaluate our algorithm using 1 hour of speech from the TIMIT database and compare its performance to MPE and a recent approach based on compressed sensing (CS). The results show that our algorithm yields similar perceptual quality to MPE and outperforms the CS method when the number of pulses is low.
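    The MSM step itself (locating pulses from singularity exponents) is not sketched here. Assuming the pulse locations are already known, the following scipy/numpy sketch illustrates the classical MPE second stage the abstract refers to: a least-squares solve for the pulse amplitudes through the LPC synthesis filter, followed by resynthesis. All names and the toy frame are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def pulse_amplitudes(target, lpc, locations):
    """Second stage of multipulse excitation: with pulse locations fixed
    (here assumed to come from the MSM step), solve a least-squares problem
    for the amplitudes so the LPC-synthesized pulse train matches the frame."""
    n = len(target)
    # each column is the synthesis-filter response to a unit pulse at one location
    H = np.zeros((n, len(locations)))
    for j, loc in enumerate(locations):
        e = np.zeros(n)
        e[loc] = 1.0
        H[:, j] = lfilter([1.0], lpc, e)      # pass the pulse through 1/A(z)
    amps, *_ = np.linalg.lstsq(H, target, rcond=None)
    excitation = np.zeros(n)
    excitation[list(locations)] = amps
    return amps, lfilter([1.0], lpc, excitation)   # reconstructed frame

# toy usage: a synthetic frame generated by a known LPC filter and two pulses
lpc = np.array([1.0, -0.9])                   # A(z) = 1 - 0.9 z^{-1}
true_exc = np.zeros(160)
true_exc[[20, 100]] = [1.0, -0.7]
frame = lfilter([1.0], lpc, true_exc)
amps, recon = pulse_amplitudes(frame, lpc, [20, 100])
print(amps, np.max(np.abs(recon - frame)))
```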

    Fast training of Large Margin diagonal Gaussian mixture models for speaker identification

    Gaussian mixture models (GMM) have been widely and successfully used in speaker recognition during the last decades. They are generally trained using the generative criterion of maximum likelihood estimation. In an earlier work, we proposed an algorithm for discriminative training of GMM with diagonal covariances under a large margin criterion. In this paper, we present a new version of this algorithm which has the major advantage of being computationally highly efficient. The resulting algorithm is thus well suited to handle large scale databases. We carry out experiments on a speaker identification task using NIST-SRE'2006 data and compare our new algorithm to the baseline generative GMM using different GMM sizes. The results show that our system significantly outperforms the baseline GMM in all configurations, and with high computational efficiency.

    Combination of SVM and Large Margin GMM modeling for speaker identification

    Most state-of-the-art speaker recognition systems are partially or completely based on Gaussian mixture models (GMM). GMM have been widely and successfully used in speaker recognition during the last decades. They are traditionally estimated from a world model using the generative criterion of Maximum A Posteriori. In an earlier work, we proposed an efficient algorithm for discriminative learning of GMM with diagonal covariances under a large margin criterion. In this paper, we evaluate the combination of the large margin GMM modeling approach with SVM in the setting of speaker identification. We carry out a full NIST speaker identification task using NIST-SRE'2006 data, in a Symmetrical Factor Analysis compensation scheme. The results show that the two modeling approaches are complementary and that their combination outperforms their single use.
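    The abstract does not state how the outputs of the two systems are combined. A common baseline for such combinations is score-level fusion, sketched below as a weighted sum of z-normalized per-trial scores; the function name, the weight alpha, and the toy scores are illustrative, not from the paper.

```python
import numpy as np

def fuse_scores(gmm_scores, svm_scores, alpha=0.5):
    """Simple linear score-level fusion of two systems after z-normalizing
    each score stream; alpha would normally be tuned on a development set."""
    def znorm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.mean()) / (s.std() + 1e-12)
    return alpha * znorm(gmm_scores) + (1.0 - alpha) * znorm(svm_scores)

# toy usage: per-trial scores from the two classifiers
gmm_scores = [2.1, -0.3, 1.7, -1.2]
svm_scores = [0.9, -0.1, 0.8, -0.7]
print(fuse_scores(gmm_scores, svm_scores, alpha=0.6))
```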

    Speaker verification using Large Margin GMM discriminative training

    Gaussian mixture models (GMM) have been widely and successfully used in speaker recognition during the last decades. They are generally trained using the generative criterion of maximum likelihood estimation. In an earlier work, we proposed an algorithm for discriminative training of GMM with diagonal covariances under a large margin criterion. In this paper, we present a new version of this algorithm which has the major advantage of being computationally highly efficient. The resulting algorithm is thus well suited to handle large scale databases. To show the effectiveness of the new algorithm, we carry out a full NIST speaker verification task using NIST-SRE'2006 data. The results show that our system outperforms the baseline GMM, and with high computational efficiency.