
    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, acquired interactively or autonomously from data in cognitive and neural systems, and on their potential or actual applications in different domains.

    Automatic Speech Recognition for Low-Resource and Morphologically Complex Languages

    The application of deep neural networks to acoustic modeling for automatic speech recognition (ASR) has produced dramatic decreases in word error rates, enabling the use of this technology in smartphones and personal home assistants for high-resource languages. Developing ASR models of this caliber, however, requires hundreds or thousands of hours of transcribed speech recordings, which presents challenges for most of the world’s languages. In this work, we investigate the applicability of three distinct architectures that have previously been used for ASR in languages with limited training resources. We tested these architectures on publicly available ASR datasets for several typologically and orthographically diverse languages, whose data were produced under a variety of conditions using different speech collection strategies, practices, and equipment. Additionally, we performed data augmentation on this audio, increasing the amount of training data nearly tenfold and synthetically creating a higher-resource training condition. We modified the architectures and their individual components and explored their parameters in search of a best-fit combination of features and modeling schemes for a given language’s morphology. Our results point to the importance of considering language-specific and corpus-specific factors, and of experimenting with multiple approaches, when developing ASR systems for resource-constrained languages.
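    As an illustrative aside, the kind of audio data augmentation mentioned above (synthetically enlarging a small corpus) can be sketched as follows; the speed factors, noise levels, and function names are assumptions for illustration, not the authors' actual pipeline.

    ```python
    # Minimal sketch of audio data augmentation for low-resource ASR:
    # speed perturbation plus additive noise to multiply the training data.
    import numpy as np

    def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
        """Resample by linear interpolation to change the apparent speaking rate."""
        old_idx = np.arange(len(waveform))
        new_len = int(len(waveform) / factor)
        new_idx = np.linspace(0, len(waveform) - 1, new_len)
        return np.interp(new_idx, old_idx, waveform)

    def add_noise(waveform: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
        """Add white noise at a target signal-to-noise ratio (in dB)."""
        signal_power = np.mean(waveform ** 2)
        noise_power = signal_power / (10 ** (snr_db / 10))
        noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
        return waveform + noise

    def augment(waveform: np.ndarray, rng: np.random.Generator) -> list[np.ndarray]:
        """Produce several synthetic variants of one utterance (illustrative factors)."""
        variants = []
        for factor in (0.9, 1.0, 1.1):      # mild speed perturbation
            for snr in (20.0, 10.0):        # two noise levels
                variants.append(add_noise(speed_perturb(waveform, factor), snr, rng))
        return variants
    ```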

    A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings

    We train neural networks of varying depth with a loss function that constrains the output representations to have a temporal profile resembling that of phonemes. We show that a simple loss function which maximizes the dissimilarity between nearby frames and long-distance frames helps to construct a speech embedding that improves phoneme discriminability, both within and across speakers, even though the loss function uses only within-speaker information. However, with too deep an architecture this loss function leads to overfitting, suggesting the need for more data and/or regularization.
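    The loss described above can be sketched as a contrastive objective over frame embeddings: nearby frames are pulled together and temporally distant frames are pushed apart. The following PyTorch snippet is a minimal illustration under assumed choices of distance, margin, and near/far offsets, not the authors' exact formulation.

    ```python
    # Minimal sketch of a temporal-coherence-style loss over acoustic frame embeddings.
    import torch
    import torch.nn.functional as F

    def temporal_coherence_loss(frames: torch.Tensor,
                                encoder: torch.nn.Module,
                                near_offset: int = 1,
                                far_offset: int = 50,
                                margin: float = 1.0) -> torch.Tensor:
        """frames: (T, feat_dim) acoustic features for one utterance (T > far_offset)."""
        emb = encoder(frames)                          # (T, emb_dim)
        T = emb.size(0)
        anchor = emb[: T - far_offset]
        near = emb[near_offset : T - far_offset + near_offset]   # temporally close frames
        far = emb[far_offset :]                                   # temporally distant frames
        d_near = F.pairwise_distance(anchor, near)     # should be small
        d_far = F.pairwise_distance(anchor, far)       # should be large
        # Hinge-style contrast: penalize cases where distant frames are not at
        # least `margin` farther from the anchor than nearby frames are.
        return F.relu(d_near - d_far + margin).mean()
    ```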

    Using weighted model averaging in distributed multilingual DNNs to improve low resource ASR

    Multilingual Deep Neural Networks (DNNs) have been successfully used to leverage out-of-language data to boost the performance of low-resource ASR. However, the mismatch between auxiliary source languages and the target language can have a negative effect on acoustic modeling for the target language. A key challenge in multilingual DNNs is therefore to exploit acoustic data from multiple donor languages to improve ASR performance while mitigating the problem of language mismatch. In this paper, we propose to employ weighted model averaging in the framework of distributed multilingual DNN training, which allows the target language or similar languages to take higher weights during training and consequently shifts the parameters towards the acoustic space of the target data. Furthermore, we utilize the same strategy in the adaptation phase, where a conventional multilingual DNN is the starting point and retraining is applied using all languages with different weights. Experiments with four languages from the GlobalPhone dataset show that recognition performance improves in both scenarios. The latter, moreover, provides a low-cost and efficient methodology for multilingual DNNs.
    Sahraeian R., Van Compernolle D., "Using weighted model averaging in distributed multilingual DNNs to improve low resource ASR", Procedia Computer Science, vol. 81, pp. 152-158, 2016 (5th International Workshop on Spoken Language Technologies for Under-resourced Languages, SLTU 2016, May 9-12, 2016, Yogyakarta, Indonesia).
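    The core weighted-averaging step can be sketched as below: each language trains its own replica of the network on its data shard, and the replicas are combined with per-language weights that favour the target (or acoustically similar) languages. The weight values and the state-dict representation are illustrative assumptions, not the authors' implementation.

    ```python
    # Minimal sketch of weighted model averaging across per-language DNN replicas.
    import torch

    def weighted_average(replicas: list[dict], weights: list[float]) -> dict:
        """Average per-language parameter dicts (state_dicts) with normalized weights."""
        total = sum(weights)
        weights = [w / total for w in weights]
        averaged = {}
        for name in replicas[0]:
            averaged[name] = sum(w * replica[name] for w, replica in zip(weights, replicas))
        return averaged

    # Hypothetical weights: one replica per donor language plus the target language,
    # with the target language taking the highest weight during averaging.
    language_weights = {"target": 0.5, "donor_1": 0.2, "donor_2": 0.2, "donor_3": 0.1}
    ```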

    Parallel Speech Collection for Under-resourced Language Studies Using the Lig-Aikuma Mobile Device App

    This paper reports on our ongoing efforts to collect speech data in under-resourced or endangered languages of Africa. Data collection is carried out using an improved version of the Android application Aikuma, developed by Steven Bird and colleagues. Features were added to the app in order to facilitate the collection of parallel speech data in line with the requirements of the French-German ANR/DFG BULB (Breaking the Unwritten Language Barrier) project. The resulting app, called Lig-Aikuma, runs on various mobile phones and tablets and offers a range of speech collection modes (recording, respeaking, translation, and elicitation). Lig-Aikuma's improved features include smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping. It was used for field data collection in Congo-Brazzaville, resulting in a total of over 80 hours of speech. Design issues of the mobile app, as well as the use of Lig-Aikuma during two recording campaigns, are further described in this paper.
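    For illustration only, a minimal sketch of how parallel recordings and speaker metadata collected with such an app might be represented; the field names, example paths, and placeholder language label are hypothetical, not Lig-Aikuma's actual data model.

    ```python
    # Illustrative data model for parallel speech collection: an original recording
    # linked to its respeaking and oral translation, with speaker metadata attached.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Speaker:
        speaker_id: str
        language: str
        region: str = ""

    @dataclass
    class ParallelRecording:
        original_wav: str                        # path to the source recording
        speaker: Speaker
        respeaking_wav: Optional[str] = None     # careful re-utterance of the original
        translation_wav: Optional[str] = None    # oral translation into a pivot language
        elicitation_prompt: Optional[str] = None # prompt used in elicitation mode

    session = ParallelRecording(
        original_wav="recordings/session01_utt001.wav",
        speaker=Speaker(speaker_id="spk01", language="example-language"),
        respeaking_wav="recordings/session01_utt001_respeak.wav",
    )
    ```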