151 research outputs found
Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition
In this paper, we present several adaptation methods for non-native speech
recognition. We have tested pronunciation modelling, MLLR and MAP non-native
pronunciation adaptation and HMM models retraining on the HIWIRE foreign
accented English speech database. The "phonetic confusion" scheme we have
developed consists of associating several sequences of confused phones with
each spoken phone. In our experiments, we have used different combinations of
acoustic models representing the canonical and the foreign pronunciations:
spoken and native models, models adapted to the non-native accent with MAP and
MLLR. The joint use of pronunciation modelling and acoustic adaptation led to
further improvements in recognition accuracy. The best combination of the
above-mentioned techniques resulted in a relative word error reduction ranging
from 46% to 71%.
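The confusion scheme above can be sketched as a table that maps each canonical phone to the phone sequences non-native speakers may produce, from which all pronunciation variants of a word are generated. The confusion entries below are illustrative placeholders, not the rules learned in the paper:

```python
from itertools import product

# Hypothetical confusion table: each canonical phone maps to the phone
# sequences it may be confused with (the phone itself is kept as one option).
CONFUSIONS = {
    "th": [["th"], ["s"], ["t"]],   # e.g. some speakers realize "th" as "s" or "t"
    "ih": [["ih"], ["iy"]],
    "r":  [["r"], ["w"]],
}

def expand_pronunciation(phones):
    """Generate all pronunciation variants allowed by the confusion table."""
    options = [CONFUSIONS.get(p, [[p]]) for p in phones]
    variants = []
    for combo in product(*options):
        # Each combo is a tuple of phone sequences; flatten it into one variant.
        variants.append([ph for seq in combo for ph in seq])
    return variants

variants = expand_pronunciation(["th", "ih", "s"])  # the word "this"
```

Each variant can then be added to the lexicon, optionally scored with the acoustic models mentioned above (native, MAP- or MLLR-adapted).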
DNN-Based Semantic Model for Rescoring N-best Speech Recognition List
The word error rate (WER) of an automatic speech recognition (ASR) system
increases when a mismatch occurs between the training and testing conditions,
for example due to noise. In this case, the acoustic information can be
less reliable. This work aims to improve ASR by modeling long-term semantic
relations to compensate for distorted acoustic features. We propose to perform
this through rescoring of the ASR N-best hypotheses list. To achieve this, we
train a deep neural network (DNN). Our DNN rescoring model is aimed at
selecting hypotheses that have better semantic consistency and therefore lower
WER. We investigate two types of representations as part of input features to
our DNN model: static word embeddings (from word2vec) and dynamic contextual
embeddings (from BERT). Acoustic and linguistic features are also included. We
perform experiments on the publicly available dataset TED-LIUM mixed with real
noise. The proposed rescoring approaches give a significant WER improvement
over the ASR system without rescoring in two noisy conditions, with both
n-gram and RNNLM language models.
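The rescoring step itself reduces to scoring each N-best hypothesis and picking the best one. The sketch below uses a linear combination as a stand-in for the trained DNN, and assumes the acoustic, LM, and semantic-consistency features are precomputed per hypothesis (the weights are illustrative):

```python
def rescore_nbest(hypotheses, weights=(1.0, 0.5, 2.0)):
    """Rerank an ASR N-best list.

    Each hypothesis is (text, acoustic_score, lm_score, semantic_score).
    In the paper the combined score comes from a DNN over word2vec/BERT-based
    features; a linear scorer stands in for it here.
    """
    w_ac, w_lm, w_sem = weights

    def score(h):
        _, ac, lm, sem = h
        return w_ac * ac + w_lm * lm + w_sem * sem

    return max(hypotheses, key=score)[0]

# Toy N-best list: the acoustically best hypothesis is semantically inconsistent.
nbest = [
    ("recognize speech", -10.0, -4.0, 0.9),
    ("wreck a nice beach", -9.5, -4.5, 0.2),
]
best = rescore_nbest(nbest)
```

Here the semantic term overrides the slightly better acoustic score of the second hypothesis, which is the behavior the rescoring model is trained to produce.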
RNN Language Model Estimation for Out-of-Vocabulary Words
One important issue of speech recognition systems is Out-of-Vocabulary words (OOV). These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of closest in-vocabulary (IV) words (list of brothers) for a given OOV proper noun. The probabilities of these words are used to estimate the probabilities of OOV proper nouns with the RNNLM. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
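The core idea, estimating an OOV proper noun's probability from the RNNLM probabilities of its in-vocabulary brothers without touching the RNNLM, can be sketched as follows. Averaging the brothers' probability mass is one plausible combination; the paper studies several ways of building the brother list, and the toy LM below is a stand-in for a real RNNLM:

```python
def oov_probability(oov_word, brothers, lm_prob, history):
    """Estimate P(oov_word | history) as the mean of the language-model
    probabilities of its closest in-vocabulary words ("list of brothers").
    The LM itself is never retrained or modified."""
    probs = [lm_prob(w, history) for w in brothers]
    return sum(probs) / len(probs) if probs else 0.0

# Toy stand-in for an RNNLM: a fixed next-word distribution per history.
toy_lm = {("president",): {"obama": 0.05, "sarkozy": 0.04, "hollande": 0.03}}

def lm_prob(word, history):
    return toy_lm.get(tuple(history), {}).get(word, 1e-6)

p = oov_probability("macron", ["obama", "sarkozy", "hollande"],
                    lm_prob, ["president"])
```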
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
State-of-the-art approaches for hate-speech detection usually exhibit poor
performance in out-of-domain settings. This occurs, typically, due to
classifiers overemphasizing source-specific information that negatively impacts
their domain invariance. Prior work has attempted to penalize terms related to
hate-speech from manually curated lists using feature attribution methods,
which quantify the importance assigned to input terms by the classifier when
making a prediction. We, instead, propose a domain adaptation approach that
automatically extracts and penalizes source-specific terms using a domain
classifier, which learns to differentiate between domains, and
feature-attribution scores for hate-speech classes, yielding consistent
improvements in cross-domain evaluation. (COLING 2022 pre-print)
Out-of-Vocabulary Word Probability Estimation using RNN Language Model
One important issue of speech recognition systems is Out-of-Vocabulary words (OOV). These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of closest in-vocabulary (IV) words (list of brothers) for a given OOV proper noun. The probabilities of these words are used to estimate the probabilities of OOV proper nouns with the RNNLM. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.
Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents
Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. To increase the vocabulary coverage, a huge amount of text data should be used. In this paper, we extend previously proposed neural network word embedding models: the word vector representation proposed by Mikolov is enriched with an additional non-linear transformation. This model better captures lexical and semantic word relationships. In the context of broadcast news transcription, experimental results show, in terms of recall, a good ability of the proposed model to select new relevant proper names.
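Retrieval with such an enriched embedding can be sketched as: apply the extra non-linear layer to both query and candidates, then rank candidates by cosine similarity in the transformed space. The tanh layer and the toy weights below are illustrative assumptions, not the paper's trained parameters:

```python
import math

def transform(vec, W, b):
    """Additional non-linear layer on top of a word2vec-style embedding
    (W, b are assumed trained; tanh is one plausible non-linearity)."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec)) + bi)
            for row, bi in zip(W, b)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, candidates, W, b, top=1):
    """Rank candidate proper names by similarity in the transformed space."""
    q = transform(query_vec, W, b)
    ranked = sorted(candidates,
                    key=lambda kv: cosine(q, transform(kv[1], W, b)),
                    reverse=True)
    return [name for name, _ in ranked[:top]]

# Toy 2-D embeddings and an identity-like trained layer, for illustration only.
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
names = retrieve([1.0, 0.0],
                 [("Paris", [0.9, 0.1]), ("Tokyo", [0.0, 1.0])], W, b)
```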
Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes
Speech enhancement promises higher efficiency in ad-hoc microphone arrays
than in constrained microphone arrays thanks to the wide spatial coverage of
the devices in the acoustic scene. However, speech enhancement in ad-hoc
microphone arrays still raises many challenges. In particular, the algorithms
should be able to handle a variable number of microphones, as some devices in
the array might appear or disappear. In this paper, we propose a solution that
can efficiently process the spatial information captured by the different
devices of the microphone array, while being robust to a link failure. To do
this, we use an attention mechanism in order to put more weight on the relevant
signals sent throughout the array and to neglect the redundant or empty
channels.
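The attention mechanism described above can be sketched as softmax-weighted fusion over however many channels are currently available, which is what makes it robust to devices appearing or disappearing. The dot-product scoring against a learned key is an assumption here; the paper's network computes the weights from richer features:

```python
import math

def attention_fuse(channel_features, key):
    """Fuse per-channel feature vectors with attention: score each channel
    against a (assumed trained) key vector, softmax over however many
    channels are present, then take the weighted sum. A device dropping
    out simply shortens the input list; nothing is retrained."""
    scores = [sum(k * f for k, f in zip(key, feats))
              for feats in channel_features]
    m = max(scores)                               # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(channel_features[0])
    fused = [sum(w * feats[i] for w, feats in zip(weights, channel_features))
             for i in range(dim)]
    return fused, weights

# Two channels: the first matches the key (relevant), the second does not.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], key=[1.0, 0.0])
```

The relevant channel receives the larger weight; an empty or redundant channel scoring low against the key would be correspondingly neglected.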