1,057 research outputs found

    Multi-Agent Simulation of Emergence of Schwa Deletion Pattern in Hindi

    Recently, there has been a revival of interest in multi-agent simulation techniques for exploring the nature of language change. However, a lack of appropriate validation of simulation experiments against real language data often calls into question the general applicability of these methods in modeling realistic language change. We address this issue by modeling the phenomenon of schwa deletion in Hindi within a multi-agent simulation framework. The pattern of Hindi schwa deletion and its diachronic nature are well studied, not only out of general linguistic inquiry, but also to facilitate Hindi grapheme-to-phoneme conversion, which is a preprocessing step in text-to-speech synthesis. We show that under certain conditions, the schwa deletion pattern observed in modern Hindi emerges in the system from an initial state of no deletion. The simulation framework described in this work can be extended to model other phonological changes as well.
    Keywords: Language Change, Linguistic Agent, Language Game, Multi-Agent Simulation, Schwa Deletion
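The abstract does not specify the agents or their update rules; the following is a minimal, purely illustrative sketch of a language-game style simulation in which deletion emerges from an initial no-deletion state. The toy word list, the ease-of-articulation bias, and the update rule are assumptions for illustration, not the paper's model.

```python
import random

# Illustrative toy lexicon: each entry is a (word, favourable-context) pair, where
# '@' marks a schwa whose deletion context is assumed to be phonologically
# favourable (True) or unfavourable (False). Words and flags are made up.
WORDS = [("kam@la", True), ("dhadak@na", True), ("@bhi", False), ("sarak@", False)]

class Agent:
    def __init__(self):
        # per-word probability of deleting the schwa; initial state: no deletion
        self.p_delete = {w: 0.0 for w, _ in WORDS}

    def produce(self, word, favourable):
        # assumed ease-of-articulation bias: favourable contexts get a small extra
        # chance of deletion even before any deletion has been learned
        bias = 0.05 if favourable else 0.0
        return random.random() < min(1.0, self.p_delete[word] + bias)

    def perceive(self, word, deleted, rate=0.05):
        # move the hearer's own deletion probability toward the form just heard
        target = 1.0 if deleted else 0.0
        self.p_delete[word] += rate * (target - self.p_delete[word])

agents = [Agent() for _ in range(50)]
for _ in range(100000):                      # repeated pairwise "language games"
    speaker, hearer = random.sample(agents, 2)
    word, favourable = random.choice(WORDS)
    hearer.perceive(word, speaker.produce(word, favourable))

for word, favourable in WORDS:
    mean_p = sum(a.p_delete[word] for a in agents) / len(agents)
    print(f"{word:12s} favourable={favourable!s:5s} mean deletion prob={mean_p:.2f}")
```

With these (arbitrary) settings, words in favourable contexts drift toward high deletion probability while the others stay near zero, which is the qualitative pattern the paper reports emerging.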

    Linguistically-motivated sub-word modeling with applications to speech recognition

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 173-185).

    Despite the proliferation of speech-enabled applications and devices, speech-driven human-machine interaction still faces several challenges. One of these issues is the new-word, or out-of-vocabulary (OOV), problem, which occurs when the underlying automatic speech recognizer (ASR) encounters a word it does not "know". With ASR being deployed in constantly evolving domains such as restaurant ratings or music querying, as well as on handheld devices, the new word problem continues to arise.

    This thesis is concerned with the OOV problem, and in particular with the process of modeling and learning the lexical properties of an OOV word through a linguistically-motivated sub-syllabic model. The linguistic model is designed using a context-free grammar which describes the sub-syllabic structure of English words and encapsulates phonotactic and phonological constraints. The context-free grammar is supported by a probability model, which captures the statistics of the parses generated by the grammar and encodes spatio-temporal context. The two main outcomes of the grammar design are: (1) sub-word units, which encode pronunciation information and can be viewed as clusters of phonemes; and (2) a high-quality alignment between graphemic and sub-word units, which results in hybrid entities denoted as spellnemes. The spellneme units are used in the design of a statistical bi-directional letter-to-sound (L2S) model, which plays a significant role in automatically learning the spelling and pronunciation of a new word.

    The sub-word units and the L2S model are assessed on the task of automatic lexicon generation. In a first set of experiments, knowledge of the spelling of the lexicon is assumed. It is shown that the phonemic pronunciations associated with the lexicon can be successfully learned using the L2S model as well as a sub-word recognizer. In a second set of experiments, the assumption of perfect spelling knowledge is relaxed, and an iterative and unsupervised algorithm, denoted as Turbo-style, makes use of spoken instances of both spellings and words to learn the lexical entries in a dictionary.

    Sub-word speech recognition is also embedded in a parallel fashion as a backoff mechanism for a word recognizer. The resulting hybrid model is evaluated in a lexical access application, whereby a word recognizer first attempts to recognize an isolated word. Upon failure of the word recognizer, the sub-word recognizer is manually triggered. Preliminary results show that such a hybrid set-up outperforms a large-vocabulary recognizer.

    Finally, the sub-word units are embedded in a flat hybrid OOV model for continuous ASR. The hybrid ASR is deployed as a front-end to a song retrieval application, which is queried via spoken lyrics. Vocabulary compression and open-ended query recognition are achieved by designing a hybrid ASR. The performance of the front-end recognition system is reported in terms of sentence, word, and sub-word error rates. The hybrid ASR is shown to outperform a word-only system over a range of out-of-vocabulary rates (1%-50%). The retrieval performance is thoroughly assessed as a function of ASR N-best size, language model order, and index size. Moreover, it is shown that the sub-words outperform alternative linguistically-motivated sub-lexical units such as phonemes. Finally, it is observed that a dramatic vocabulary compression, by more than a factor of 10, is accompanied by only a minor loss in song retrieval performance.

    by Ghinwa F. Choueiter. Ph.D.
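The thesis grammar and its probability model are not reproduced here; the sketch below only illustrates the general idea of describing the sub-syllabic structure of a word with a context-free grammar and parsing a phoneme sequence against it. The phone inventory and rules are toy assumptions expressed with NLTK, not the grammar used in the thesis.

```python
import nltk

# Toy grammar: a word is one or more syllables, each with an optional onset,
# a nucleus, and an optional coda. Terminals are phone symbols (assumed).
grammar = nltk.CFG.fromstring("""
  Word     -> Syllable | Syllable Word
  Syllable -> Onset Rhyme | Rhyme
  Rhyme    -> Nucleus Coda | Nucleus
  Onset    -> 's' 'p' | 's' 't' | 'p' | 't' | 'k' | 'b' | 'd' | 'g' | 'm' | 'n' | 'l' | 'r'
  Nucleus  -> 'iy' | 'ih' | 'eh' | 'ae' | 'aa' | 'ah' | 'uw' | 'er'
  Coda     -> 'n' 'd' | 'n' | 't' | 'k' | 'l' | 'z'
""")

parser = nltk.ChartParser(grammar)
phones = ['s', 'p', 'iy', 'k', 'er']     # roughly "speaker"
for tree in parser.parse(phones):        # print the first parse found
    tree.pretty_print()
    break
```

In the thesis, a probability model over such parses (and an alignment to the letters of the word) is what yields the sub-word units and spellnemes; this sketch stops at the structural parse.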

    A New Acoustic-Based Pronunciation Distance Measure

    We present an acoustic distance measure for comparing pronunciations, and apply the measure to assess foreign accent strength in American English by comparing speech of non-native American-English speakers to a collection of native American-English speakers. An acoustic-only measure is valuable because it does not require the time-consuming and error-prone process of phonetically transcribing speech samples, which is necessary for current edit distance-based approaches. We minimize speaker variability in the data set by employing speaker-based cepstral mean and variance normalization, and compute word-based acoustic distances using the dynamic time warping algorithm. Our results indicate a strong correlation of r = −0.71 (p < 0.0001) between the acoustic distances and human judgments of native-likeness provided by more than 1,100 native American-English raters. The convenient acoustic measure therefore performs only slightly worse than the state-of-the-art transcription-based measure (r = −0.77). We also report the results of several small experiments which show that the acoustic measure is sensitive not only to segmental differences, but also to intonational and durational differences. However, it is not immune to unwanted differences caused by using a different recording device.
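A minimal sketch of this kind of pipeline, under simplifying assumptions: MFCC features with per-recording cepstral mean and variance normalization (the paper normalizes per speaker over all of a speaker's data), dynamic time warping via librosa, and an alignment cost normalized by path length. The file names are hypothetical.

```python
import librosa
import numpy as np

def mfcc_cmvn(path, n_mfcc=13):
    """Load a word recording and return mean/variance-normalized MFCCs."""
    y, sr = librosa.load(path, sr=16000)
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # shape (n_mfcc, frames)
    # cepstral mean and variance normalization (per recording here;
    # the paper applies it per speaker)
    return (feats - feats.mean(axis=1, keepdims=True)) / (feats.std(axis=1, keepdims=True) + 1e-8)

def word_distance(path_a, path_b):
    """DTW alignment cost between two recordings, per aligned frame pair."""
    A, B = mfcc_cmvn(path_a), mfcc_cmvn(path_b)
    D, wp = librosa.sequence.dtw(X=A, Y=B, metric='euclidean')
    return D[-1, -1] / len(wp)

# hypothetical usage: the same word spoken by a non-native and a native speaker
print(word_distance("nonnative_please.wav", "native_please.wav"))
```

Averaging such word-level distances over a speaker's recordings, relative to a pool of native reference speakers, gives a per-speaker accent score of the kind correlated with the human ratings above.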

    Pronunciation Diagnosis: What to correct at first in YOUR case?

    Native-sounding vs. intelligible: this has long been a controversial issue in language learning, and many teachers claim that intelligible pronunciation should be the goal. But what is the physical definition of intelligibility? The current work offers a strong candidate answer to this question. The first author previously proposed a new paradigm for observing speech acoustics based on structural phonology, in which all the speech events of an utterance are viewed as a single structure, and this structure was shown to be mathematically invariant with respect to static non-linguistic features such as age, gender, size, shape, microphone, room, line, and so on. This acoustic structure is purely linguistic, and the phoneme-level structure is regarded as the pronunciation structure of an individual student. This structure is matched against another linguistic structure, the lexical structure of the target language, and the degree of compatibility between the two levels of structure is calculated; this degree of compatibility is what we define as intelligibility in this work. To increase intelligibility, different instructions should be prepared for different students, because no two students are the same. The phonological structure can be divided into sub-structures. By evaluating which sub-structure causes the largest damage when the student communicates in the target language with his or her phonological structure, instruction is automatically generated on what to correct first in his or her case.
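The abstract does not spell out how the pronunciation structure is computed or matched. One common way to realize such a speaker-invariant structure is as the matrix of pairwise distances between a speaker's phoneme distributions, with two structures compared through the difference of their matrices; the sketch below follows that reading and is an assumption, not the paper's exact model.

```python
import numpy as np

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    v = (var1 + var2) / 2.0
    term1 = 0.125 * np.sum((mu1 - mu2) ** 2 / v)
    term2 = 0.5 * np.sum(np.log(v)) - 0.25 * (np.sum(np.log(var1)) + np.sum(np.log(var2)))
    return term1 + term2

def structure(phoneme_stats):
    """phoneme_stats: dict phoneme -> (mean vector, variance vector).
    Returns the pairwise distance matrix over the phoneme inventory."""
    phones = sorted(phoneme_stats)
    n = len(phones)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m1, v1 = phoneme_stats[phones[i]]
            m2, v2 = phoneme_stats[phones[j]]
            M[i, j] = M[j, i] = bhattacharyya(m1, v1, m2, v2)
    return M

def structure_mismatch(stats_learner, stats_target):
    """Root-mean-square difference of the two structures; lower = more compatible."""
    A, B = structure(stats_learner), structure(stats_target)
    iu = np.triu_indices_from(A, k=1)
    return np.sqrt(np.mean((A[iu] - B[iu]) ** 2))

# hypothetical usage with two 2-dimensional "phoneme" distributions per speaker
learner = {"iy": (np.array([1.0, 0.0]), np.array([1.0, 1.0])),
           "eh": (np.array([0.2, 0.5]), np.array([1.0, 1.0]))}
target  = {"iy": (np.array([1.5, 0.0]), np.array([1.0, 1.0])),
           "eh": (np.array([-0.5, 0.8]), np.array([1.0, 1.0]))}
print(structure_mismatch(learner, target))
```

Because only relative distances between a speaker's own sounds enter the structure, static speaker-dependent factors largely cancel out, which is the invariance property the abstract relies on.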

    Measuring foreign accent strength in English: Validating Levenshtein distance as a measure

    With an eye toward measuring the strength of foreign accents in American English, we evaluate the suitability of a modified version of the Levenshtein distance for comparing (the phonetic transcriptions of) accented pronunciations. Although this measure has been used successfully inter alia to study the differences among dialect pronunciations, it has not been applied to studying foreign accents. Here, we use it to compare the pronunciation of non-native English speakers to native American English speech. Our results indicate that the Levenshtein distance is a valid native-likeness measurement, as it correlates strongly (r = -0.81) with the average "native-like" judgments given by more than 1000 native American English raters.
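A minimal sketch of the core measure, assuming simple unit costs: the Levenshtein distance between two phonetic transcriptions represented as phone lists, normalized by the length of the longer transcription so that long and short words are comparable (one simple normalization choice; the paper's modified version is richer).

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[m][n]

def normalized_distance(a, b):
    # bounds the measure to [0, 1] regardless of word length
    return levenshtein(a, b) / max(len(a), len(b), 1)

# hypothetical example: accented vs. native transcription of "please"
print(normalized_distance(["p", "l", "i", "s"], ["p", "l", "iy", "z"]))
```

Averaging such normalized per-word distances against a set of native reference transcriptions yields a per-speaker accent strength score of the kind correlated with the human ratings above.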

    Acoustic Data-driven Pronunciation Lexicon for Large Vocabulary Speech Recognition

    Speech recognition systems normally use handcrafted pronunciation lexicons designed by linguistic experts. Building and maintaining such a lexicon is expensive and time consuming. This paper concerns automatically learning a pronunciation lexicon for speech recognition. We assume the availability of a small seed lexicon and then learn the pronunciations of new words directly from speech that is transcribed at word level. We present two implementations for refining the putative pronunciations of new words based on acoustic evidence. The first is an expectation maximization (EM) algorithm based on weighted finite state transducers (WFSTs), and the other is its Viterbi approximation. We carried out experiments on the Switchboard corpus of conversational telephone speech. The expert lexicon has a size of more than 30,000 words, from which we randomly selected 5,000 words to form the seed lexicon. Using the proposed lexicon learning method, we significantly improved accuracy compared with a lexicon learned using a grapheme-to-phoneme transformation, and obtained a word error rate that approaches the one achieved using a fully handcrafted lexicon.
    Index Terms: Lexical modelling, Probabilistic pronunciation model, Automatic speech recognition
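The WFST machinery and acoustic models cannot be reproduced in a few lines; the following is only a much-simplified sketch of a Viterbi-style selection loop, with a hypothetical acoustic_score(utt, pron) standing in for forced alignment of a spoken instance against a candidate pronunciation. The data structures and the voting rule are assumptions, not the paper's implementation.

```python
from collections import Counter

def refine_lexicon(candidates, utterances, acoustic_score, n_iter=5):
    """
    candidates : dict word -> list of candidate pronunciations (e.g. G2P n-best),
                 each pronunciation a hashable tuple of phones
    utterances : dict word -> list of spoken instances of that word
    acoustic_score(utt, pron) : hypothetical forced-alignment score stand-in
    Returns a dict word -> single selected pronunciation.
    """
    # start from the top candidate for every word (the putative G2P pronunciation)
    lexicon = {w: prons[0] for w, prons in candidates.items()}
    for _ in range(n_iter):
        changed = False
        for word, prons in candidates.items():
            if not utterances.get(word):
                continue
            # hard (Viterbi-style) assignment: each spoken instance votes for the
            # candidate that scores best on it, rather than weighting all candidates
            # by posterior probability as the EM variant would
            votes = Counter(max(prons, key=lambda p: acoustic_score(u, p))
                            for u in utterances[word])
            best = votes.most_common(1)[0][0]
            if best != lexicon[word]:
                lexicon[word] = best
                changed = True
        # in the full system the scores are recomputed against models and alignments
        # that depend on the current lexicon, which is why iterating helps; with a
        # fixed scorer this loop converges after the first pass
        if not changed:
            break
    return lexicon
```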

    Determinants of English accents

    In this study we investigate which factors affect the degree of non-native accent of L2 speakers of English who learned English in school and, in most cases, lived for some time in an anglophone setting. We use data from the Speech Accent Archive containing over 700 speakers with almost 160 different native languages. We show that, besides several important predictors including the age of English onset and the length of anglophone residence, the linguistic distance between the speaker’s native language and English is a significant predictor of the degree of non-native accent in pronunciation. This study extends an earlier study which focused only on Indo-European L2 learners of Dutch and used a general speaking proficiency measure.
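A minimal sketch of the kind of regression analysis implied here, with a hypothetical file and hypothetical column names; the study itself works from the Speech Accent Archive and uses a more elaborate statistical model.

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical per-speaker table: accent rating plus candidate predictors
df = pd.read_csv("speakers.csv")

# ordinary least squares: does linguistic distance predict accent strength
# over and above age of onset and length of anglophone residence?
model = smf.ols(
    "accent_rating ~ age_of_english_onset + years_anglophone_residence + linguistic_distance",
    data=df,
).fit()
print(model.summary())
```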

    An interactive speech training system with virtual reality articulation for Mandarin-speaking hearing impaired children

    The present project involved the development of a novel interactive speech training system based on virtual reality articulation, and an examination of the efficacy of the system for hearing impaired (HI) children. Twenty meaningful Mandarin words were presented to the HI children via a 3-D talking head during articulation training. Electromagnetic articulography (EMA) and graphic transform technology were used to depict the movements of the various articulators. In addition, speech corpora were organized into listening and speaking training modules of the system to help improve the language skills of the HI children. The accuracy of the virtual reality articulatory movements was evaluated through a series of experiments. Finally, a pilot test was performed in which two HI children were trained using the system. Preliminary results showed improvement in speech production by the HI children, and the system was judged acceptable and interesting by the children. It can be concluded that the training system is effective and valid for articulation training of HI children. © 2013 IEEE.

    ESL Learners' Enhancement of Standard English Accent Among Khotimul Quran of A Primary School Students in Malaysia

    CEFR (the Common European Framework of Reference for Languages) is a new syllabus for the English language subject taught to Malaysian primary school students. However, none of the pronunciation elements involved (stress, rhythm, and intonation) are emphasized in order to accomplish a Standard English accent. The issue is that Malaysian primary school students do not apply correct stress and intonation while speaking and reading Standard English. Therefore, this study aims to identify the use of stress, rhythm, and intonation in the spoken English words of two groups of students. The researcher chose 15 Khotimul Quran students, who had completed reciting the Quran in full (all 30 Juz), as the experimental group, and 15 other students as the control group, all from Sekolah Kebangsaan Pusat Air Tawar, Johor, as the unit of analysis. The researcher uses semi-structured interviews, observation, and focus group discussion to triangulate the data. Pilot analysis of the responses showed a strong correlation between speaking with a Standard accent and correct stress, rhythm, and intonation among the Khotimul Quran students. Based on these results, the researcher expects the experimental group, who have a Quranic-phonological background, to achieve a higher percentage of accuracy in speaking with a Standard English accent compared to the control group.