8 research outputs found
Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning
Almost none of the 2,000+ languages spoken in Africa have widely available
automatic speech recognition systems, and the required data is also only
available for a few languages. We have experimented with two techniques which
may provide pathways to large vocabulary speech recognition for African
languages: multilingual modeling and self-supervised learning. We gathered
available open source data and collected data for 15 languages, and trained
experimental models using these techniques. Our results show that pooling the
small amounts of available data in multilingual end-to-end models, and
pre-training on unsupervised data, can help improve speech recognition quality
for many African languages.
Speech recognition with probabilistic transcriptions and end-to-end systems using deep learning
In this thesis, we develop deep learning models in automatic speech recognition (ASR) for two contrasting tasks characterized by the amounts of labeled data available for training. In the first half, we deal with scenarios in which there are limited or no labeled data for training ASR systems, a situation common in under-resourced languages. In the second half, we train ASR systems with large amounts of labeled data in English. Our objective is to improve modern end-to-end (E2E) ASR using attention modeling. Thus, the two primary contributions of this thesis are the following:
Cross-Lingual Speech Recognition in Under-Resourced Scenarios:
A well-resourced language is a language with an abundance of resources to support the development of speech technology. Those resources are usually defined in terms of 100+ hours of speech data, corresponding transcriptions, pronunciation dictionaries, and language models. In contrast, an under-resourced language lacks one or more of these resources. The most expensive and time-consuming of these to acquire is transcriptions, owing to the difficulty of finding native transcribers. The first part of the thesis proposes methods by which deep neural networks (DNNs) can be trained when there are limited or no transcribed data in the target language. Such scenarios are common for under-resourced languages.
Two key components of this proposition are Transfer Learning and Crowdsourcing. Through these methods, we demonstrate that it is possible to borrow statistical knowledge of acoustics from a variety of other well-resourced languages to learn the parameters of the DNN in the target under-resourced language. In particular, we use well-resourced languages as cross-entropy regularizers to improve the generalization capacity of the target language. A key accomplishment of this study is that it is the first to train DNNs using noisy labels in the target language produced by non-native transcribers recruited through online marketplaces.
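The cross-entropy-regularizer idea above can be illustrated as a multi-task loss in which a well-resourced auxiliary language's loss is added, down-weighted, to the target-language loss. This is a minimal sketch of that general scheme, not the thesis's implementation; the function names and the weighting factor `aux_weight` are hypothetical.

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the correct label under a softmax output.
    return -math.log(probs[label])

def multitask_loss(target_probs, target_label,
                   aux_probs, aux_label, aux_weight=0.3):
    # Target-language loss plus a down-weighted loss on a well-resourced
    # auxiliary language, acting as a regularizer on the shared parameters.
    # aux_weight is a hypothetical hyperparameter, not a value from the thesis.
    l_target = cross_entropy(target_probs, target_label)
    l_aux = cross_entropy(aux_probs, aux_label)
    return l_target + aux_weight * l_aux
```

In practice the two losses would be computed on shared hidden layers with language-specific output layers; the weighting trades off fit to the scarce target data against the regularizing signal from the auxiliary language.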
End-to-End Large Vocabulary Automatic Speech Recognition:
Recent advances in ASR have been mostly due to the advent of deep learning models. Such models have the ability to discover complex non-linear relationships between attributes that are usually found in real-world tasks. Despite these advances, building a conventional ASR system is a cumbersome procedure since it involves optimizing several components separately in a disjoint fashion. To alleviate this problem, modern ASR systems have adopted a new approach of directly transducing speech signals to text. Such systems are known as E2E systems; one example is Connectionist Temporal Classification (CTC). However, one drawback of CTC is its hard alignment: it relies only on the current input to generate the current output. In reality, the output at the current time is influenced not only by the current input but also by inputs in the past and the future.
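A concrete way to see how CTC transduces frame-level predictions to text is its collapsing rule: merge consecutive repeated labels, then remove blanks. The sketch below shows only this standard decoding rule, not any system from the thesis; the blank symbol `-` is an illustrative choice.

```python
def ctc_collapse(frame_labels, blank="-"):
    # Map a per-frame label sequence to an output string by merging
    # consecutive repeats and then dropping blanks (the standard CTC rule).
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)
```

For example, the frame sequence `hh-e-ll-l-oo` collapses to `hello`: the blank between the two `l` runs is what lets CTC emit a doubled letter. Because each frame's label is predicted from that frame alone, the alignment is "hard" in exactly the sense the abstract describes.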
Thus, the second part of the thesis proposes advancing state-of-the-art E2E speech recognition for large corpora by directly incorporating attention modeling within the CTC framework. In attention modeling, inputs in the current, past, and future are distinctively weighted depending on the degree of influence they exert on the current output. We accomplish this by deriving new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we extract more reliable content information from a network representing an implicit language model. Finally, we use vector-based attention weights that are applied on context vectors across both time and their individual components. A key accomplishment of this study is that it is the first to incorporate attention directly within the CTC network. Furthermore, we show that our proposed attention-based CTC model, even in the absence of an explicit language model, is able to achieve lower word error rates than a well-trained conventional ASR system equipped with a strong external language model.
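The weighting of past, current, and future inputs described above can be sketched as a softmax-normalized dot-product attention over a window of frames around time t. This is a generic, hypothetical illustration of the mechanism, not the thesis's time-convolution formulation; the `query` vector and `window` size are assumed for the example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(frames, t, query, window=1):
    # Attention-weighted sum of the frames in [t - window, t + window],
    # clipped to the utterance boundaries. Scores are dot products with a
    # query vector; all shapes here are hypothetical for illustration.
    idxs = [i for i in range(t - window, t + window + 1)
            if 0 <= i < len(frames)]
    scores = [sum(q * f for q, f in zip(query, frames[i])) for i in idxs]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frames[i][d] for w, i in zip(weights, idxs))
            for d in range(dim)]
```

Frames whose score against the query is higher receive proportionally more weight, so the output at time t blends information from neighboring inputs rather than depending on frame t alone, which is the soft alternative to CTC's hard alignment.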
Selected papers from the 49th Annual Conference on African Linguistics
Descriptive and Theoretical Approaches to African Linguistics contains a selection of revised and peer-reviewed papers from the 49th Annual Conference on African Linguistics, held at Michigan State University in 2018. The contributions from both students and more senior scholars, based in North America, Africa and other parts of the world, provide a glimpse of the breadth and quality of current research in African linguistics from both descriptive and theoretical perspectives. Fields of interest range from phonetics, phonology, morphology, syntax, and semantics to sociolinguistics, historical linguistics, discourse analysis, language documentation, computational linguistics and beyond. The articles reflect both the typological and genetic diversity of languages in Africa and the wide range of research areas covered by presenters at ACAL conferences.
Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. To this end, we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
Celebrating 50 years of ACAL
The papers in this volume were presented at the 50th Annual Conference on African Linguistics held at the University of British Columbia in 2019. The contributions span a range of theoretical topics as well as topics in descriptive and applied linguistics. The papers reflect the typological and genetic diversity of languages in Africa and also represent the breadth of the ACAL community, with papers from both students and more senior scholars, based in North America and beyond. They thus provide a snapshot of current research in African linguistics, from multiple perspectives. To mark the 50th anniversary of the conference, the volume editors reminisce, in the introductory chapter, about their memorable ACALs.