129 research outputs found

    Applications of nonlinear filters with the linear-in-the-parameter structure

    Get PDF

    Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

    Full text link
    In automatic speech recognition (ASR) systems, recurrent neural network language models (RNNLM) are used to rescore a word lattice or N-best hypotheses list. Due to the expensive training, the RNNLM's vocabulary set accommodates only small shortlist of most frequent words. This leads to suboptimal performance if an input speech contains many out-of-shortlist (OOS) words. An effective solution is to increase the shortlist size and retrain the entire network which is highly inefficient. Therefore, we propose an efficient method to expand the shortlist set of a pretrained RNNLM without incurring expensive retraining and using additional training data. Our method exploits the structure of RNNLM which can be decoupled into three parts: input projection layer, middle layers, and output projection layer. Specifically, our method expands the word embedding matrices in projection layers and keeps the middle layers unchanged. In this approach, the functionality of the pretrained RNNLM will be correctly maintained as long as OOS words are properly modeled in two embedding spaces. We propose to model the OOS words by borrowing linguistic knowledge from appropriate in-shortlist words. Additionally, we propose to generate the list of OOS words to expand vocabulary in unsupervised manner by automatically extracting them from ASR output.Comment: 5 pages, 1 figure, accepted at INTERSPEECH 201

    Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin

    Full text link
    This paper presents the work of restoring punctuation for ASR transcripts generated by multilingual ASR systems. The focus languages are English, Mandarin, and Malay which are three of the most popular languages in Singapore. To the best of our knowledge, this is the first system that can tackle punctuation restoration for these three languages simultaneously. Traditional approaches usually treat the task as a sequential labeling task, however, this work adopts a slot-filling approach that predicts the presence and type of punctuation marks at each word boundary. The approach is similar to the Masked-Language Model approach employed during the pre-training stages of BERT, but instead of predicting the masked word, our model predicts masked punctuation. Additionally, we find that using Jieba1 instead of only using the built-in SentencePiece tokenizer of XLM-R can significantly improve the performance of punctuating Mandarin transcripts. Experimental results on English and Mandarin IWSLT2022 datasets and Malay News show that the proposed approach achieved state-of-the-art results for Mandarin with 73.8% F1-score while maintaining a reasonable F1-score for English and Malay, i.e. 74.7% and 78% respectively. Our source code that allows reproducing the results and building a simple web-based application for demonstration purposes is available on Github

    Codec Data Augmentation for Time-domain Heart Sound Classification

    Full text link
    Heart auscultations are a low-cost and effective way of detecting valvular heart diseases early, which can save lives. Nevertheless, it has been difficult to scale this screening method since the effectiveness of auscultations is dependent on the skill of doctors. As such, there has been increasing research interest in the automatic classification of heart sounds using deep learning algorithms. However, it is currently difficult to develop good heart sound classification models due to the limited data available for training. In this work, we propose a simple time domain approach, to the heart sound classification problem with a base classification error rate of 0.8 and show that augmentation of the data through codec simulation can improve the classification error rate to 0.2. With data augmentation, our approach outperforms the existing time-domain CNN-BiLSTM baseline model. Critically, our experiments show that codec data augmentation is effective in getting around the data limitation.Comment: Accepted by ICAICTA 202

    Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation

    Full text link
    The neural language models (NLM) achieve strong generalization capability by learning the dense representation of words and using them to estimate probability distribution function. However, learning the representation of rare words is a challenging problem causing the NLM to produce unreliable probability estimates. To address this problem, we propose a method to enrich representations of rare words in pre-trained NLM and consequently improve its probability estimation performance. The proposed method augments the word embedding matrices of pre-trained NLM while keeping other parameters unchanged. Specifically, our method updates the embedding vectors of rare words using embedding vectors of other semantically and syntactically similar words. To evaluate the proposed method, we enrich the rare street names in the pre-trained NLM and use it to rescore 100-best hypotheses output from the Singapore English speech recognition system. The enriched NLM reduces the word error rate by 6% relative and improves the recognition accuracy of the rare words by 16% absolute as compared to the baseline NLM.Comment: 5 pages, 2 figures, accepted to INTERSPEECH 201
    • …
    corecore