129 research outputs found
Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR
In automatic speech recognition (ASR) systems, recurrent neural network
language models (RNNLM) are used to rescore a word lattice or N-best hypotheses
list. Due to the expensive training, the RNNLM's vocabulary set accommodates
only a small shortlist of the most frequent words. This leads to suboptimal
performance if the input speech contains many out-of-shortlist (OOS) words. An
effective solution is to increase the shortlist size and retrain the entire
network, which is highly inefficient. Therefore, we propose an efficient method
to expand the shortlist of a pretrained RNNLM without incurring expensive
retraining or requiring additional training data. Our method exploits the
structure of the RNNLM, which can be decoupled into three parts: the input projection
layer, the middle layers, and the output projection layer. Specifically, our method
expands the word embedding matrices in the projection layers and keeps the middle
layers unchanged. In this approach, the functionality of the pretrained RNNLM
will be correctly maintained as long as OOS words are properly modeled in two
embedding spaces. We propose to model the OOS words by borrowing linguistic
knowledge from appropriate in-shortlist words. Additionally, we propose to
generate the list of OOS words to expand the vocabulary in an unsupervised manner by
automatically extracting them from the ASR output.
Comment: 5 pages, 1 figure, accepted at INTERSPEECH 201
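As a rough illustration of the embedding-matrix expansion described above, the following PyTorch-style sketch appends one row per OOS word to a pretrained projection matrix, initialising each new row from the averaged embeddings of similar in-shortlist words. The function name, the `oos_to_similar` mapping, the plain averaging, and the RNNLM attribute names are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def expand_embeddings(emb_matrix, oos_to_similar):
    """Append one row per OOS word to a pretrained embedding matrix.

    emb_matrix:     [V, d] input or output projection (embedding) matrix.
    oos_to_similar: ordered mapping from each new OOS word to the ids of
                    in-shortlist words whose embeddings it borrows from.
    Each new row is the mean of its donor rows (an illustrative choice).
    """
    new_rows = [emb_matrix[donor_ids].mean(dim=0)
                for donor_ids in oos_to_similar.values()]
    return torch.cat([emb_matrix, torch.stack(new_rows)], dim=0)

# The middle (recurrent) layers of the pretrained RNNLM stay untouched;
# only the two projection matrices are expanded, e.g. (attribute names
# are hypothetical):
# rnnlm.input_embedding.weight.data = expand_embeddings(
#     rnnlm.input_embedding.weight.data, oos_to_similar)
# rnnlm.output_projection.weight.data = expand_embeddings(
#     rnnlm.output_projection.weight.data, oos_to_similar)
```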
Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin
This paper presents the work of restoring punctuation for ASR transcripts
generated by multilingual ASR systems. The focus languages are English,
Mandarin, and Malay, which are three of the most widely spoken languages in Singapore.
To the best of our knowledge, this is the first system that can tackle
punctuation restoration for these three languages simultaneously. Traditional
approaches usually treat the task as a sequence labeling task; however, this
work adopts a slot-filling approach that predicts the presence and type of
punctuation marks at each word boundary. The approach is similar to the
Masked-Language Model approach employed during the pre-training stages of BERT,
but instead of predicting the masked word, our model predicts masked
punctuation. Additionally, we find that using Jieba instead of relying only on the
built-in SentencePiece tokenizer of XLM-R can significantly improve the
performance of punctuating Mandarin transcripts. Experimental results on
English and Mandarin IWSLT2022 datasets and Malay News show that the proposed
approach achieved state-of-the-art results for Mandarin with a 73.8% F1-score
while maintaining reasonable F1-scores for English and Malay, i.e., 74.7% and
78%, respectively. Our source code, which allows reproducing the results and
building a simple web-based application for demonstration purposes, is available
on GitHub.
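The slot-filling formulation can be sketched as follows: a mask slot is inserted at every word boundary, and the model is trained to predict the punctuation class of each slot, analogous to masked-token prediction. The mask token, class inventory, and helper below are illustrative and not tied to the authors' XLM-R configuration.

```python
# Punctuation classes for the slots (an illustrative inventory).
PUNCT_CLASSES = ["O", "COMMA", "PERIOD", "QUESTION"]

def make_slot_filling_example(words, punct_labels, mask_token="<mask>"):
    """Insert a mask slot after every word; the model learns to predict the
    punctuation class of each slot, analogous to BERT-style masked-token
    prediction but over punctuation marks instead of words."""
    tokens, targets = [], []
    for word, label in zip(words, punct_labels):
        tokens.append(word)
        tokens.append(mask_token)                 # slot at the word boundary
        targets.append(PUNCT_CLASSES.index(label))
    return tokens, targets

tokens, targets = make_slot_filling_example(
    ["how", "are", "you"], ["O", "O", "QUESTION"])
# tokens  -> ['how', '<mask>', 'are', '<mask>', 'you', '<mask>']
# targets -> [0, 0, 3]
```

For Mandarin, the reported gain from Jieba suggests pre-segmenting the transcript into words (e.g. with jieba.cut) before building the slots, rather than defining boundaries on SentencePiece subwords alone.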
Codec Data Augmentation for Time-domain Heart Sound Classification
Heart auscultation is a low-cost and effective way of detecting valvular
heart disease early, which can save lives. Nevertheless, it has been difficult
to scale this screening method, since the effectiveness of auscultation
depends on the skill of the doctor. As such, there has been increasing research
interest in the automatic classification of heart sounds using deep learning
algorithms. However, it is currently difficult to develop good heart sound
classification models due to the limited data available for training. In this
work, we propose a simple time-domain approach to the heart sound
classification problem with a base classification error rate of 0.8 and show
that augmentation of the data through codec simulation can improve the
classification error rate to 0.2. With data augmentation, our approach
outperforms the existing time-domain CNN-BiLSTM baseline model. Critically, our
experiments show that codec data augmentation is effective in getting around
the data limitation.
Comment: Accepted by ICAICTA 202
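A minimal sketch of the codec-simulation augmentation, assuming the recordings are available as waveform tensors and a torchaudio version that still provides functional.apply_codec (newer releases replace it with torchaudio.io.AudioEffector); the codec formats chosen here are illustrative rather than the paper's exact set.

```python
import torchaudio

def codec_augment(waveform, sample_rate, formats=("mp3", "vorbis")):
    """Round-trip a recording through lossy codecs to create additional
    training examples carrying compression artefacts (formats are illustrative)."""
    return [
        torchaudio.functional.apply_codec(waveform, sample_rate, format=fmt)
        for fmt in formats
    ]

# Usage: keep the original recording and add its codec-simulated copies
# to the training set (the file path is illustrative).
# waveform, sr = torchaudio.load("heart_sound.wav")
# train_examples = [waveform] + codec_augment(waveform, sr)
```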
Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation
Neural language models (NLM) achieve strong generalization capability by
learning dense representations of words and using them to estimate the
probability distribution function. However, learning the representations of rare
words is a challenging problem that causes the NLM to produce unreliable
probability estimates. To address this problem, we propose a method to enrich
the representations of rare words in a pre-trained NLM and consequently improve its
probability estimation performance. The proposed method augments the word
embedding matrices of pre-trained NLM while keeping other parameters unchanged.
Specifically, our method updates the embedding vectors of rare words using
embedding vectors of other semantically and syntactically similar words. To
evaluate the proposed method, we enrich the rare street names in the
pre-trained NLM and use it to rescore the 100-best hypotheses output from the
Singapore English speech recognition system. The enriched NLM reduces the word
error rate by 6% relative and improves the recognition accuracy of the rare
words by 16% absolute as compared to the baseline NLM.
Comment: 5 pages, 2 figures, accepted to INTERSPEECH 201
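The embedding-matrix augmentation can be sketched much like the vocabulary-expansion example above, except that existing rows are overwritten rather than appended. The plain averaging and the `rare_to_similar` mapping are illustrative assumptions, not the paper's exact update rule.

```python
import torch

def enrich_rare_embeddings(emb, rare_to_similar):
    """Overwrite each rare word's row in the NLM embedding matrix with the
    mean embedding of semantically/syntactically similar frequent words.

    emb:             [V, d] embedding matrix of the pre-trained NLM.
    rare_to_similar: rare word id -> list of ids of similar frequent words.
    All other NLM parameters are left unchanged.
    """
    with torch.no_grad():
        for rare_id, similar_ids in rare_to_similar.items():
            emb[rare_id] = emb[similar_ids].mean(dim=0)

# e.g. enrich rare street names before rescoring the 100-best lists
# (the attribute name below is hypothetical):
# enrich_rare_embeddings(nlm.embedding.weight, rare_to_similar)
```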