Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
Multilingual Adaptation of RNN Based ASR Systems
In this work, we focus on multilingual systems based on recurrent neural
networks (RNNs), trained using the Connectionist Temporal Classification (CTC)
loss function. Using a multilingual set of acoustic units poses difficulties.
To address this issue, we previously proposed Language Feature Vectors (LFVs)
to train
language adaptive multilingual systems. Language adaptation, in contrast to
speaker adaptation, needs to be applied not only on the feature level, but also
to deeper layers of the network. In this work, we therefore extended our
previous approach by introducing a novel technique which we call "modulation".
Based on this method, we modulated the hidden layers of RNNs using LFVs. We
evaluated this approach in both full and low resource conditions, as well as
for grapheme- and phone-based systems. The use of modulation achieved lower
error rates across all conditions.
Comment: 5 pages, 1 figure, to appear in 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP 2018)
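The modulation described above can be sketched as an elementwise gating of a hidden layer by a projection of the LFV. This is a minimal illustration; the gate parameterization, dimensions, and variable names below are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modulate(hidden, lfv, W_mod, b_mod):
    """Scale each hidden unit by a language-dependent gate in (0, 1).

    hidden: (hidden_dim,)  RNN hidden-layer activations
    lfv:    (lfv_dim,)     language feature vector for the current utterance
    W_mod, b_mod: hypothetical projection from LFV space to one gate per unit
    """
    gate = sigmoid(W_mod @ lfv + b_mod)  # one multiplier per hidden unit
    return hidden * gate

rng = np.random.default_rng(0)
hidden = rng.standard_normal(8)          # toy 8-unit hidden layer
lfv = np.array([1.0, 0.0, 0.0])          # toy 3-dim language code
W = rng.standard_normal((8, 3))
b = np.zeros(8)
out = modulate(hidden, lfv, W, b)
print(out.shape)  # (8,)
```

Because the gate lies in (0, 1), modulation attenuates each unit by a language-dependent amount rather than shifting it, which is what distinguishes it from simply appending LFVs to the input features.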
Multi-stage Large Language Model Correction for Speech Recognition
In this paper, we investigate the usage of large language models (LLMs) to
improve the performance of competitive speech recognition systems. Different
from traditional language models that focus on a single data domain, the rise
of LLMs brings us the opportunity to push the limit of state-of-the-art ASR
performance, and at the same time to achieve higher robustness and generalize
effectively across multiple domains. Motivated by this, we propose a novel
multi-stage approach to combine traditional language model re-scoring and LLM
prompting. Specifically, the proposed method has two stages: the first stage
uses a language model to re-score an N-best list of ASR hypotheses and runs a
confidence check; the second stage prompts an LLM to perform ASR error
correction on less confident results from the first stage. Our experimental
results demonstrate the effectiveness of the proposed method by showing a 10% ~
20% relative improvement in WER over a competitive ASR system -- across
multiple test domains.
Comment: Submitted to ICASSP 202
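The two-stage pipeline above can be sketched as follows. The scoring combination, the margin-based confidence measure, the threshold, and the `llm_fix` callback are all hypothetical stand-ins for illustration; a real system would call an actual LLM in stage two.

```python
def lm_rescore(nbest, lm_score):
    """Stage 1: re-rank N-best hypotheses by combined ASR + LM score."""
    scored = [(h["asr_score"] + lm_score(h["text"]), h["text"]) for h in nbest]
    scored.sort(reverse=True)
    return scored

def confidence(scored):
    """Crude confidence: score margin between the top two hypotheses."""
    return scored[0][0] - scored[1][0] if len(scored) > 1 else float("inf")

def correct(nbest, lm_score, llm_fix, threshold=1.0):
    scored = lm_rescore(nbest, lm_score)
    best = scored[0][1]
    # Stage 2: only low-confidence results are sent to the LLM for correction.
    if confidence(scored) < threshold:
        best = llm_fix(best, [text for _, text in scored])
    return best

# Toy usage with stub scorers (an actual LLM prompt replaces llm_fix).
nbest = [{"text": "recognize speech", "asr_score": -1.0},
         {"text": "wreck a nice beach", "asr_score": -1.2}]
lm = lambda text: 0.0
llm_fix = lambda best, hyps: best
print(correct(nbest, lm, llm_fix))  # recognize speech
```

Gating the LLM call on confidence keeps the expensive prompting step off the hypotheses the rescorer already handles well.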
A Cornucopia of Iridium Nitrogen Compounds Produced from Laser‐Ablated Iridium Atoms and Dinitrogen
The reaction of laser‐ablated iridium atoms with dinitrogen molecules and nitrogen atoms yields several neutral and ionic iridium dinitrogen complexes such as Ir(N2), Ir(N2)+, Ir(N2)2, Ir(N2)2−, IrNNIr, as well as the nitrido complexes IrN, Ir(N)2 and IrIrN. These reaction products were deposited in solid neon, argon and nitrogen matrices and characterized by their infrared spectra. Assignments of vibrational bands are supported by ab initio and first-principles calculations as well as 14/15N isotope substitution experiments. The structural and electronic properties of the new dinitrogen and nitrido iridium complexes are discussed. While the formation of the elusive dinitrido complex Ir(N)2 was observed in a subsequent reaction of IrN with N atoms within the cryogenic solid matrices, the threefold-coordinated iridium trinitride Ir(N)3 has not been observed so far.
Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features
In an increasingly globalized world, there is a rising demand for speech recognition systems. Systems for languages like English, German or French achieve decent performance, but there is a long tail of languages for which such systems do not yet exist. State-of-the-art speech recognition systems feature Deep Neural Networks (DNNs). Because DNNs are data-driven and highly dependent on sufficient training data, a lack of resources directly degrades recognition performance. Multiple techniques exist to deal with such resource-constrained conditions; one approach is the use of additional data from other languages. In the past, it was demonstrated that multilingually trained systems benefit from adding language feature vectors (LFVs) to the input features, similar to i-Vectors. In this work, we extend this approach by adding articulatory features (AFs). We show that AFs also benefit from LFVs and that multilingual system setups benefit from adding both AFs and LFVs. Treating English as a low-resource language, we restricted ourselves to only 10h of English acoustic training data. For system training, we used additional data from French, German and Turkish. By using a combination of AFs and LFVs, we were able to decrease the WER from 18.1% to 17.3% after system combination in our setup using a multilingual phone set.
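The input-feature augmentation described above amounts to concatenating per-frame acoustic features with AF posteriors and a per-utterance LFV repeated across frames. The dimensions below (40-dim filterbanks, 12 AF classes, 4-dim LFV) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
frames = rng.standard_normal((100, 40))   # 100 frames of 40-dim acoustic features
af_posteriors = rng.random((100, 12))     # per-frame articulatory-feature posteriors
lfv = np.array([0.9, 0.05, 0.03, 0.02])   # per-utterance language feature vector

# Broadcast the utterance-level LFV to every frame, then stack all three
# feature streams along the feature axis.
augmented = np.concatenate(
    [frames, af_posteriors, np.tile(lfv, (len(frames), 1))], axis=1)
print(augmented.shape)  # (100, 56)
```

The augmented frames then feed the acoustic model in place of the plain acoustic features, letting the network condition on both articulatory and language identity information.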