151 research outputs found
Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition
In this paper, we present several adaptation methods for non-native speech
recognition. We have tested pronunciation modelling, MLLR and MAP non-native
pronunciation adaptation and HMM models retraining on the HIWIRE foreign
accented English speech database. The "phonetic confusion" scheme we have
developed consists of associating several sequences of confused phones with
each spoken phone. In our experiments, we have used different combinations of
acoustic models representing the canonical and the foreign pronunciations:
spoken and native models, models adapted to the non-native accent with MAP and
MLLR. The joint use of pronunciation modelling and acoustic adaptation led to
further improvements in recognition accuracy. The best combination of the
above-mentioned techniques resulted in a relative word error reduction ranging
from 46% to 71%.
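The confusion scheme above can be sketched as a table that maps each canonical phone to the phone sequences non-native speakers may produce, from which all pronunciation variants of a word are generated. The confusion entries below are illustrative placeholders, not the rules learned in the paper:

```python
from itertools import product

# Hypothetical confusion table: each canonical phone maps to the phone
# sequences it may be confused with (the phone itself is kept as one option).
CONFUSIONS = {
    "th": [["th"], ["s"], ["t"]],   # e.g. some speakers realize "th" as "s" or "t"
    "ih": [["ih"], ["iy"]],
    "r":  [["r"], ["w"]],
}

def expand_pronunciation(phones):
    """Generate all pronunciation variants allowed by the confusion table."""
    options = [CONFUSIONS.get(p, [[p]]) for p in phones]
    variants = []
    for combo in product(*options):
        # Each combo is a tuple of phone sequences; flatten it into one variant.
        variants.append([ph for seq in combo for ph in seq])
    return variants

variants = expand_pronunciation(["th", "ih", "s"])  # the word "this"
```

Each variant can then be added to the lexicon, optionally scored with the acoustic models mentioned above (native, MAP- or MLLR-adapted).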
DNN-Based Semantic Model for Rescoring N-best Speech Recognition List
The word error rate (WER) of an automatic speech recognition (ASR) system
increases when a mismatch occurs between the training and testing conditions,
for example due to noise. In this case, the acoustic information can be
less reliable. This work aims to improve ASR by modeling long-term semantic
relations to compensate for distorted acoustic features. We propose to perform
this through rescoring of the ASR N-best hypotheses list. To achieve this, we
train a deep neural network (DNN). Our DNN rescoring model is aimed at
selecting hypotheses that have better semantic consistency and therefore lower
WER. We investigate two types of representations as part of input features to
our DNN model: static word embeddings (from word2vec) and dynamic contextual
embeddings (from BERT). Acoustic and linguistic features are also included. We
perform experiments on the publicly available dataset TED-LIUM mixed with real
noise. The proposed rescoring approaches give a significant WER improvement
over the ASR system without rescoring in two noisy conditions, with both
n-gram and RNNLM language models.
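The rescoring step itself reduces to scoring each N-best hypothesis and picking the best one. The sketch below uses a linear combination as a stand-in for the trained DNN, and assumes the acoustic, LM, and semantic-consistency features are precomputed per hypothesis (the weights are illustrative):

```python
def rescore_nbest(hypotheses, weights=(1.0, 0.5, 2.0)):
    """Rerank an ASR N-best list.

    Each hypothesis is (text, acoustic_score, lm_score, semantic_score).
    In the paper the combined score comes from a DNN over word2vec/BERT-based
    features; a linear scorer stands in for it here.
    """
    w_ac, w_lm, w_sem = weights

    def score(h):
        _, ac, lm, sem = h
        return w_ac * ac + w_lm * lm + w_sem * sem

    return max(hypotheses, key=score)[0]

# Toy N-best list: the acoustically best hypothesis is semantically inconsistent.
nbest = [
    ("recognize speech", -10.0, -4.0, 0.9),
    ("wreck a nice beach", -9.5, -4.5, 0.2),
]
best = rescore_nbest(nbest)
```

Here the semantic term overrides the slightly better acoustic score of the second hypothesis, which is the behavior the rescoring model is trained to produce.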
RNN Language Model Estimation for Out-of-Vocabulary Words
One important issue of speech recognition systems is Out-of-Vocabulary words (OOV). These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of closest in-vocabulary (IV) words (list of brothers) for a given OOV proper noun. The probabilities of these words are used to estimate the probabilities of OOV proper nouns with the RNNLM. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
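The core idea, estimating an OOV proper noun's probability from the RNNLM probabilities of its in-vocabulary brothers without touching the RNNLM, can be sketched as follows. Averaging the brothers' probability mass is one plausible combination; the paper studies several ways of building the brother list, and the toy LM below is a stand-in for a real RNNLM:

```python
def oov_probability(oov_word, brothers, lm_prob, history):
    """Estimate P(oov_word | history) as the mean of the language-model
    probabilities of its closest in-vocabulary words ("list of brothers").
    The LM itself is never retrained or modified."""
    probs = [lm_prob(w, history) for w in brothers]
    return sum(probs) / len(probs) if probs else 0.0

# Toy stand-in for an RNNLM: a fixed next-word distribution per history.
toy_lm = {("president",): {"obama": 0.05, "sarkozy": 0.04, "hollande": 0.03}}

def lm_prob(word, history):
    return toy_lm.get(tuple(history), {}).get(word, 1e-6)

p = oov_probability("macron", ["obama", "sarkozy", "hollande"],
                    lm_prob, ["president"])
```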
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
State-of-the-art approaches for hate-speech detection usually exhibit poor
performance in out-of-domain settings. This occurs, typically, due to
classifiers overemphasizing source-specific information that negatively impacts
their domain invariance. Prior work has attempted to penalize terms related to
hate-speech from manually curated lists using feature attribution methods,
which quantify the importance assigned to input terms by the classifier when
making a prediction. We, instead, propose a domain adaptation approach that
automatically extracts and penalizes source-specific terms using a domain
classifier, which learns to differentiate between domains, and
feature-attribution scores for hate-speech classes, yielding consistent
improvements in cross-domain evaluation. (COLING 2022 pre-print)
Out-of-Vocabulary Word Probability Estimation using RNN Language Model
One important issue of speech recognition systems is Out-of-Vocabulary words (OOV). These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of closest in-vocabulary (IV) words (list of brothers) for a given OOV proper noun. The probabilities of these words are used to estimate the probabilities of OOV proper nouns with the RNNLM. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents
Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase the vocabulary coverage, a huge amount of text data should be used. In this paper, we extend previously proposed neural network word embedding models: the word vector representation proposed by Mikolov is enriched with an additional non-linear transformation. This model better captures lexical and semantic word relationships. In the context of broadcast news transcription, experimental results show, in terms of recall, a good ability of the proposed model to select new relevant proper names.
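Retrieval with such an enriched embedding can be sketched as: apply the extra non-linear layer to both query and candidates, then rank candidates by cosine similarity in the transformed space. The tanh layer and the toy weights below are illustrative assumptions, not the paper's trained parameters:

```python
import math

def transform(vec, W, b):
    """Additional non-linear layer on top of a word2vec-style embedding
    (W, b are assumed trained; tanh is one plausible non-linearity)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec)) + bi)
            for row, bi in zip(W, b)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, candidates, W, b, top=1):
    """Rank candidate proper names by similarity in the transformed space."""
    q = transform(query_vec, W, b)
    ranked = sorted(candidates,
                    key=lambda kv: cosine(q, transform(kv[1], W, b)),
                    reverse=True)
    return [name for name, _ in ranked[:top]]

# Toy 2-D embeddings and an identity-like trained layer, for illustration only.
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
names = retrieve([1.0, 0.0],
                 [("Paris", [0.9, 0.1]), ("Tokyo", [0.0, 1.0])], W, b)
```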
Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes
Speech enhancement promises higher efficiency in ad-hoc microphone arrays
than in constrained microphone arrays thanks to the wide spatial coverage of
the devices in the acoustic scene. However, speech enhancement in ad-hoc
microphone arrays still raises many challenges. In particular, the algorithms
should be able to handle a variable number of microphones, as some devices in
the array might appear or disappear. In this paper, we propose a solution that
can efficiently process the spatial information captured by the different
devices of the microphone array, while being robust to a link failure. To do
this, we use an attention mechanism in order to put more weight on the relevant
signals sent throughout the array and to neglect the redundant or empty
channels.
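The attention mechanism described above can be sketched as softmax-weighted fusion over however many channels are currently available, which is what makes it robust to devices appearing or disappearing. The dot-product scoring against a learned key is an assumption here; the paper's network computes the weights from richer features:

```python
import math

def attention_fuse(channel_features, key):
    """Fuse per-channel feature vectors with attention: score each channel
    against a (assumed trained) key vector, softmax over however many
    channels are present, then take the weighted sum. A device dropping
    out simply shortens the input list; nothing is retrained."""
    scores = [sum(k * f for k, f in zip(key, feats))
              for feats in channel_features]
    m = max(scores)                               # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(channel_features[0])
    fused = [sum(w * feats[i] for w, feats in zip(weights, channel_features))
             for i in range(dim)]
    return fused, weights

# Two channels: the first matches the key (relevant), the second does not.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], key=[1.0, 0.0])
```

The relevant channel receives the larger weight; an empty or redundant channel scoring low against the key would be correspondingly neglected.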