354 research outputs found

    Perception and production of Mandarin-Accented English: The effect of degree of Accentedness on the Interlanguage Speech Intelligibility Benefit for Listeners (ISIB-L) and Talkers (ISIB-T)

    Previous research on the Interlanguage Speech Intelligibility Benefit (ISIB) indicates that nonnative listeners may have an advantage in understanding the nonnative speech of talkers with the same first language (L1) due to shared interlanguage knowledge. The present study offers a comprehensive analysis of various factors that may modulate this advantage, including the proficiency of both the listeners and the talkers, the mapping of phonemes between the L1 and second language (L2), and the acoustic properties of the phones. Accuracy scores on a lexical decision task were used to investigate how native English listeners and native Mandarin learners of English perceive native English and Mandarin-accented English speech. Results show clear ISIB-L and ISIB-T effects and demonstrate the dynamic nature of ISIB effects, with both being modulated by speaker and listener proficiency. More striking ISIB effects typically occur at the most extreme ends of accentedness. Additionally, an advantage for common-phoneme over unique-phoneme words in nonnative speech was observed. While nonnative productions of common-phoneme words are more accurate than those of unique-phoneme words, for the most accented productions, nonnative listeners are faster to respond to these unique, often mispronounced, productions. The nonnative listener advantage at perceiving nonnative speech depends on various factors, including listener proficiency, speaker proficiency, phoneme characteristics, and the acoustics of specific speech tokens.

    The Acoustic Correlates of Stress-Shifting Suffixes in Native and Nonnative English

    Although laboratory phonology techniques have been widely employed to discover the interplay between the acoustic correlates of English Lexical Stress (ELS), namely fundamental frequency, duration, and intensity, studies on ELS in polysyllabic words are rare, and cross-linguistic acoustic studies in this area are even rarer. Consequently, the effects of language experience on L2 lexical stress acquisition are not clear. This investigation of adult Arabic (Saudi Arabian) and Mandarin (Mainland Chinese) speakers analyzes their ELS production in tokens with seven different stress-shifting suffixes, i.e., Level 1 [+cyclic] derivations to phonologists. Stress productions are then systematically analyzed and compared with those of speakers of Midwest American English using the acoustic phonetic software Praat. In total, one hundred subjects participated in the study, spread evenly across the three language groups, and 2,125 vowels in 800 spectrograms were analyzed (excluding stress placement and pronunciation errors). Nonnative speakers completed a sociometric survey prior to recording so that statistical sampling techniques could be used to evaluate acquisition of accurate ELS production. The speech samples of native speakers were analyzed to provide norm values for cross-reference and to provide insights into the proposed Salience Hierarchy of the Acoustic Correlates of Stress (SHACS). The results support the notion that a SHACS does exist in the L1 sound system, and that native-like command of this system through accurate ELS production can be acquired by proficient L2 learners via increased L2 input. Other findings raise questions as to the accuracy of standard American English dictionary pronunciations as well as the generalizability of claims made about the acoustic properties of tonic accent shift.

    Multilingual Spoken Language Translation


    Developing Sparse Representations for Anchor-Based Voice Conversion

    Voice conversion is the task of transforming speech from one speaker to sound as if it were produced by another speaker, changing the identity while retaining the linguistic content. There are many methods for performing voice conversion, but these methods often have onerous training requirements or fail in instances where one speaker has a nonnative accent. To address these issues, this dissertation presents and evaluates a novel “anchor-based” representation of speech that separates speaker content from speaker identity by modeling how speakers form English phonemes. We call the proposed method Sparse, Anchor-Based Representation of Speech (SABR), and explore methods for optimizing the parameters of this model in native-to-native and native-to-nonnative voice conversion contexts. We begin the dissertation by demonstrating how sparse coding in combination with a compact, phoneme-based dictionary can be used to separate speaker identity from content in objective and subjective tests. The formulation of the representation then presents several research questions. First, we propose a method for improving the synthesis quality by using the sparse coding residual in combination with a frequency warping algorithm to convert the residual from the source to the target speaker’s space, and add it to the target speaker’s estimated spectrum. Experimentally, we find that synthesis quality is significantly improved via this transform. Second, we propose and evaluate two methods for selecting and optimizing SABR anchors in native-to-native and native-to-nonnative voice conversion. We find that synthesis quality is significantly improved by the proposed methods over baseline algorithms, especially in native-to-nonnative voice conversion. In a detailed analysis of the algorithms, we find they focus on phonemes that are difficult for nonnative speakers of English or naturally have multiple acoustic states. Following this, we examine methods for adding temporal constraints to SABR via the Fused Lasso. The proposed method significantly reduces the inter-frame variance in the sparse codes over other methods that incorporate temporal features into sparse coding representations. Finally, in a case study, we examine the use of the SABR methods and optimizations in the context of a computer aided pronunciation training system for building “Golden Speakers”, or ideal models for nonnative speakers of a second language to learn correct pronunciation. Under the hypothesis that the optimal “Golden Speaker” was the learner’s voice, synthesized with a native accent, we used SABR to build voice models for nonnative speakers and evaluated the resulting synthesis in terms of quality, identity, and accentedness. We found that even when deployed in the field, the SABR method generated synthesis with low accentedness and similar acoustic identity to the target speaker, validating the use of the method for building “Golden Speakers”.

    Methods for pronunciation assessment in computer aided language learning

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 149-176). Learning a foreign language is a challenging endeavor that entails acquiring a wide range of new knowledge including words, grammar, gestures, sounds, etc. Mastering these skills requires extensive practice by the learner, and opportunities may not always be available. Computer Aided Language Learning (CALL) systems provide non-threatening environments where foreign language skills can be practiced wherever and whenever a student desires. These systems often have several technologies to identify the different types of errors made by a student. This thesis focuses on the problem of identifying mispronunciations made by a foreign language student using a CALL system. We make several assumptions about the nature of the learning activity: it takes place using a dialogue system, it is a task- or game-oriented activity, the student should not be interrupted by the pronunciation feedback system, and the goal of the feedback system is to identify severe mispronunciations with high reliability. Detecting mispronunciations requires a corpus of speech with human judgements of pronunciation quality. Typical approaches to collecting such a corpus use an expert phonetician to both phonetically transcribe and assign judgements of quality to each phone in a corpus. This is time consuming and expensive. It also places an extra burden on the transcriber. We describe a novel method for obtaining phone level judgements of pronunciation quality by utilizing non-expert, crowd-sourced, word level judgements of pronunciation. Foreign language learners typically exhibit high variation and pronunciation shapes distinct from native speakers that make analysis for mispronunciation difficult. We detail a simple but effective method for transforming the vowel space of non-native speakers to make mispronunciation detection more robust and accurate. We show that this transformation not only enhances performance on a simple classification task, but also results in distributions that can be better exploited for mispronunciation detection. This transformation of the vowel space is exploited to train a mispronunciation detector using a variety of features derived from acoustic model scores and vowel class distributions. We confirm that the transformation technique results in a more robust and accurate identification of mispronunciations than traditional acoustic models. By Mitchell A. Peabody. Ph.D.

    Loan Phonology

    For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by a great interest among phonologists in the issue of how the nativization of loanwords occurs. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system as well as for studying L1 phonological processes in action, and thus onto the true synchronic phonology of L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language’s sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena.

    The Effect of Using Authentic Videos on English Major Students' Prosodic Competence

    This study aims to investigate the effect of using authentic videos on the prosodic competence of foreign language learners. It is widely hypothesized that authentic videos have a positive effect on EFL learners' suprasegmental competence. The population of the study included 32 students majoring in English Language at Taibah University in KSA during the academic year 2011/2012. The sample consisted of two sections, a control group and an experimental one. A pretest was administered to both groups to ensure that they were homogeneous. The control group was taught suprasegmental aspects of language using a traditional approach, while the experimental group was taught using authentic videos. About four months later, a posttest was administered. The results of the study showed that there was much progress in the experimental group, which significantly outperformed the control group in the different aspects of prosody. These findings confirm the hypothesis that authentic videos can have a positive effect on EFL learners' suprasegmental competence. Keywords: Suprasegmental competence, authentic videos, Saudi English major students as EFL learners, Intonation, Pronunciation, Stress, Pause, Juncture, Rhyme, and Prosodic aspects of language

    Automatic Pronunciation Assessment -- A Review

    Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. We categorize the main challenges observed in prominent research trends and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work. Comment: 9 pages, accepted to EMNLP Findings

    Tonal Adaptation of Loanwords in Mandarin: Phonology and Beyond

    This study examines the tonal adaptation of English and Japanese loanwords in Mandarin, and considers data collected from different types of sources. The purpose overall is to identify the mechanisms underlying the adaptation processes by which tone is assigned, and to check if the same mechanisms are invoked regardless of donor languages and source types. Both corpus and experimental methods were utilized to survey a broad sampling of borrowings and a wide array of syllable types that target specific phonetic properties. To maximally rule out the effect of semantic tingeing, this study examined English place names that were extracted from a dictionary and from online travel blogs. And to explore how semantic association might interfere with the adaptation processes, this study also investigated a separate corpus of Japanese manga role names and brand names. Revisiting discussions in previous studies about how phonetic properties of the source form might affect tonal assignments in the adapted forms, this study also included an expanded reanalysis of adaptations elicited in an experimental setting. Observations made in the study suggest that the primary mechanisms behind tonal assignments for loanwords in Mandarin operate at a level beyond any usual phonological concerns: the adaptation processes are heavily reliant on factors that are inherent to Mandarin lexical distributions, such as tone probability and character frequency. Adapters apparently utilize their tacit knowledge about such distributional properties when assigning tones. Also crucial to the tonal assignment mechanism is the seeking of appropriate characters based on their meanings, either to avoid unintended readings of loanwords or to form desired interpretations. Such adaptation mechanisms are mainly attributable to the morpho-syllabic nature of the Chinese writing system, the language’s high productivity of compound words, and its high incidence of homophony. Also noted in the study is the influence of prescriptive conventions formulated for formally established loanwords. Research findings reported in this study highlight such non-phonological aspects of loanword adaptation, especially the role of the writing system, that have been underestimated to date in the field of loanword phonology and cross-linguistic studies of loanword typology.