Non-Native Pronunciation Variation Modeling for Automatic Speech Recognition

Authors: Hong Kook Kim, Mina Kim, Yoo Rhee Oh
Publication venue: IntechOpen
Publication date: 16/08/2010

Grammaticalization and phonological reidentification in White Hmong

Author: White Nathan
Publication venue: John Benjamins Publishing Company
Publication date: 01/01/2021

The “dynamic coevolution of meaning and form” of Bybee et al. (1994: 20) has been the subject of significant discussion as regards the languages of Mainland Southeast Asia. However, little work has focused on the mechanisms through which this coevolution occurs when it does surface in these languages. The current work considers phonological reidentification resulting from phonetic reduction in White Hmong (Hmong-Mien, Laos) involving four morphemes: ntshai/ntshe ‘maybe’, saib/seb ‘see if/whether; COMP.CFACT’, puag/pug ‘LOCL;INTS’, and niaj/nej ‘each, every’. These morphemes exhibit an alternation in which a rime is phonologically reidentified in a manner consistent with typical phonetic underarticulation patterns, such that an exemplar-model approach (Pierrehumbert 2001, inter alia) provides a straightforward explanation. Furthermore, the data show that the phonological reidentification patterns found in White Hmong have parallels in other languages in the region, confirming that an areal approach to grammaticalization provides greater descriptive adequacy cross-linguistically as regards this phenomenon.

ResearchOnline@JCU

ResearchOnline at James Cook University

UTILIZING DATA-DRIVEN AND KNOWLEDGE-BASED TECHNIQUES TO ENHANCE ARABIC SPEECH RECOGNITION

KFUPM ePrints

Regularities and Irregularities in Chinese Historical Phonology

Author: Bu Tianrang
Publication venue: Digital Commons at Oberlin
Publication date: 01/01/2018

Combining methodologies from Western and traditional Chinese historical linguistics, this thesis surveys and synthetically analyzes the major sound changes in Chinese phonological history. It addresses two hypotheses, the Neogrammarian regularity hypothesis and the unidirectionality hypothesis, and questions their validity and applicability. Drawing on fourteen types of “regular” and “irregular” processes, the thesis argues that the origins and impetuses of sound change are far from limited to phonetic environment (“regular” changes) and lexical diffusion (“irregular” changes), and that sound change is not unidirectional because of the existence and significance of fortifying and bi/multidirectional changes. The thesis also examines the sociopolitical aspect of sound change by discussing language changes that result from social, geographical and historical factors, suggesting that the study of sound change should be more interdisciplinary in order to explain the phenomena more thoroughly and reach a better understanding of how human languages function both synchronically and diachronically.

Digital Commons at Oberlin (Oberlin College)

Experience with foreign accent influences non-native (L2) word recognition: The case of th-substitutions [Abstract]

Authors: Hanulikova A., Weber A.
Publication date: 01/04/2009

MPG.PuRe

Multi-dialect Arabic broadcast speech recognition

Author: Ali Ahmed Mohamed Abdel Maksoud
Publication venue: The University of Edinburgh
Publication date: 02/07/2018

Dialectal Arabic speech research suffers from a lack of labelled resources and standardised orthography. There are three main challenges in dialectal Arabic speech recognition: (i) finding labelled dialectal Arabic speech data, (ii) training robust dialectal speech recognition models from limited labelled data and (iii) evaluating speech recognition for dialects with no orthographic rules. This thesis makes the following three contributions.
Arabic Dialect Identification: We mainly deal with Arabic speech without prior knowledge of the spoken dialect. Arabic dialects can be sufficiently diverse that one can argue they are different languages rather than dialects of the same language. We have two contributions here. First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected from the Al Jazeera TV channel. From almost 1,000 hours, we obtained utterance-level dialect labels for 57 hours of high-quality speech covering four major varieties of dialectal Arabic (DA): Egyptian, Levantine, Gulf (or Arabian Peninsula), and North African (or Moroccan). Second, we build an Arabic dialect identification (ADI) system. We explored two main groups of features, namely acoustic features and linguistic features. For the linguistic features, we look at a wide range of features, addressing words, characters and phonemes. For the acoustic features, we look at raw features such as mel-frequency cepstral coefficients combined with shifted delta cepstra (MFCC-SDC), bottleneck features and the i-vector as a latent variable. We studied both generative and discriminative classifiers, in addition to deep learning approaches, namely deep neural networks (DNN) and convolutional neural networks (CNN). In our work, we propose Arabic as a five-class dialect challenge comprising the previously mentioned four dialects as well as Modern Standard Arabic.
Arabic Speech Recognition: We introduce our effort in building Arabic automatic speech recognition (ASR) and we create an open research community to advance it. This section has two main goals. The first is creating a framework for Arabic ASR that is publicly available for research. We describe our effort in building two multi-genre broadcast (MGB) challenges. MGB-2 focuses on broadcast news using more than 1,200 hours of speech and 130M words of text collected from the broadcast domain. MGB-3, however, focuses on dialectal multi-genre data with limited non-orthographic speech collected from YouTube, with special attention paid to transfer learning. The second goal is building a robust Arabic ASR system and reporting a competitive word error rate (WER) to use as a potential benchmark to advance the state of the art in Arabic ASR. Our overall system is a combination of five acoustic models (AM): unidirectional long short-term memory (LSTM), bidirectional LSTM (BLSTM), time delay neural network (TDNN), TDNN layers along with LSTM layers (TDNN-LSTM) and finally TDNN layers followed by BLSTM layers (TDNN-BLSTM). The AMs are trained as purely sequence-trained neural networks using lattice-free maximum mutual information (LFMMI). The generated lattices are rescored using a four-gram language model (LM) and a recurrent neural network with maximum entropy (RNNME) LM. Our official WER is 13%, which is the lowest WER reported on this task.
Evaluation: The third part of the thesis addresses our effort in evaluating dialectal speech with no orthographic rules. Our methods learn from multiple transcribers and align the speech hypothesis to overcome the non-orthographic aspects. Our multi-reference WER (MR-WER) approach is similar to the BLEU score used in machine translation (MT). We have also automated this process by learning different spelling variants from Twitter data. We automatically mine a huge collection of tweets in an unsupervised fashion to build more than 11M n-to-m lexical pairs, and we propose a new evaluation metric: dialectal WER (WERd). Finally, we estimate the word error rate (e-WER) with no reference transcription using decoding and language features. We show that our word error rate estimation is robust for many scenarios with and without the decoding features.
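For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of the standard word error rate (word-level edit distance divided by reference length). The function names are hypothetical, and `min_reference_wer` is only a naive illustration of scoring against several transcriptions by taking the best-matching one; it is an assumption made for illustration and is not the thesis's MR-WER or WERd.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Standard WER: edit distance over the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def min_reference_wer(references, hypothesis):
    """Naive multi-reference score (assumption): lowest WER over all references."""
    return min(wer(r, hypothesis) for r in references)
```

With spelling variants in the references, a hypothesis that matches any one transcriber exactly scores zero under the naive variant, which hints at why a single fixed reference can overstate errors for dialects without a standard orthography.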

Edinburgh Research Archive