
    Temporal articulatory stability, phonological variation, and lexical contrast preservation in diaspora Tibetan

    This dissertation examines how lexical tone can be represented with articulatory gestures, and the ways a gestural perspective can inform synchronic and diachronic analysis of the phonology and phonetics of a language. Tibetan is chosen as an example of a language with interacting laryngeal and tonal phonology, a history of tonogenesis and dialect diversification, and recent contact-induced realignment of the tonal and consonantal systems. Despite variation in voice onset time (VOT) and presence/absence of the lexical tone contrast, speakers retain a consistent relative timing of consonant and vowel gestures. Recent research has attempted to integrate tone into the framework of Articulatory Phonology through the addition of tone gestures. Unlike other theories of phonetics-phonology, Articulatory Phonology incorporates relative timing as a key parameter. This allows the system to represent contrasts instantiated not just in the presence or absence of gestures, but also in how gestures are timed with each other. Building on the different predictions of various timing relations, along with the historical developments in the language, hypotheses are generated and tested with acoustic and articulatory experiments. Following an overview of relevant theory, the second chapter surveys past literature on the history of sound change and present phonological diversity of Tibetic dialects. Whereas Old Tibetan lacked lexical tone, contrasted voiced and voiceless obstruents, and exhibited complex clusters, a series of overlapping sound changes has led to some modern varieties that are tonal, lack clusters, and vary in the expression of voicing and aspiration. Furthermore, speakers in the Tibetan diaspora use a variety that has grown out of the contact between diverse Tibetic dialects.
The state of the language and the dynamics of diaspora have created a situation ripe for sound change, including the recombination of elements from different dialects and, potentially, the loss of tone contrasts. The nature of diaspora Tibetan is investigated through an acoustic corpus study. Recordings made in Kathmandu, Nepal, are being transcribed and forced-aligned into a useful audio corpus. Speakers in the corpus come from diverse backgrounds across and outside traditional Tibetan-speaking regions, but the analysis presented here focuses on speakers who grew up in diaspora, with a mixed input of Standard Tibetan (spyi skad) and other Tibetan varieties. Especially notable among these speakers is the high variability of voice onset time (VOT) and its interaction with tone. An analysis of this data in terms of the relative timing of oral, laryngeal, and tone gestures leads to the generation of hypotheses for testing using articulatory data. The articulatory study is conducted using electromagnetic articulography (EMA) with six Tibetan-speaking participants. The key finding is that the relative timing of consonant and vowel gestures is consistent across phonological categories and across speakers who do and do not contrast tone. This result leads to the conclusion that the relative timing of speech gestures is conserved and acquired independently. Speakers acquire and generalize a limited inventory of timing patterns, and can use timing patterns even when the conditioning environment for the development of those patterns, namely tone, has been lost.

    The production and perception of coronal fricatives in Seoul Korean: The case for a fourth laryngeal category

    This article presents new data on the contrast between the two voiceless coronal fricatives of Korean, variously described as a lenis/fortis or aspirated/fortis contrast. In utterance-initial position, the fricatives were found to differ in centroid frequency; duration of frication, aspiration, and the following vowel; and several aspects of the following vowel onset, including intensity profile, spectral tilt, and F1 onset. The between-fricative differences varied across vowel contexts, however, and spectral differences in the vowel onset especially were more pronounced for /a/ than for /i, ɯ, u/. This disparity led to the hypothesis that cues in the following vowel onset would exert a weaker influence on perception for high vowels than for low vowels. Perception data provided general support for this hypothesis, indicating that while vowel onset cues had the largest impact on perception for both high- and low-vowel stimuli, this influence was weaker for high vowels. Perception was also strongly influenced by aspiration duration, with modest contributions from frication duration and f0 onset. Taken together, these findings suggest that the 'non-fortis' fricative is best characterized not in terms of the lenis or aspirated categories for stops, but in terms of a unique representation that is both lenis and aspirated.

    Improving the Speech Intelligibility By Cochlear Implant Users

    In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations in a quiet background for most patients with profound hearing loss in both ears. However, CI users still have serious problems understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes are further factors that make hearing in noisy conditions harder for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to the binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, which was motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of the conversation. Research has shown that when adults are familiar with someone’s voice, they can process and understand what the person is saying more accurately and even more quickly. This effect, known as the “familiar talker advantage”, was our motivation to examine its impact on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.

    Tones in Zhangzhou: Pitch and Beyond

    This study draws on various approaches (field linguistics, auditory and acoustic phonetics, and statistics) to explore and explain the nature of tones in Zhangzhou, an under-described Southern Min variety. Several original findings emerged from the analyses of the data from 21 speakers. The realisations of Zhangzhou tones are multidimensional. The single parameter of pitch/F0 is not sufficient to characterise tonal contrasts in either monosyllabic or polysyllabic settings in Zhangzhou. Instead, various parameters, including pitch/F0, duration, vowel quality, voice quality, and syllable coda type, interact in a complicated but consistent way to code tonal distinctions. Zhangzhou has eight tones rather than seven tones as proposed in previous studies. This finding resulted from examining the realisations of diverse parameters across three different contexts (isolation, phrase-initial, and phrase-final), rather than classifying tones in citation and in terms of the preservation of Middle Chinese tonal categories. Tonal contrasts in Zhangzhou can be neutralised across different linguistic contexts. Identifying the number of tonal contrasts based simply on tonal realisations in the citation environment is not sufficient. Instead, examining tonal realisations across different linguistic contexts beyond monosyllables is imperative for understanding the nature of tone. Tone sandhi in Zhangzhou is syntactically relevant. The tone sandhi domain is not phonologically determined but rather is aligned with a syntactic phrase XP. Within a given XP, the realisations of the tones at non-phrase-final positions undergo alternation phonologically and phonetically. Nevertheless, the alternations are sensitive only to the phrase boundaries and are not affected by the internal structure of syntactic phrases. Tone sandhi in Zhangzhou is phonologically inert but phonetically sensitive.
The realisations of Zhangzhou tones in disyllabic phrases are not categorically affected by their surrounding tones but are phonetically sensitive to surrounding environments. For instance, the pitch/F0 onsets of phrase-final tones are largely sensitive to pitch/F0 offsets of preceding tones and appear to have diverse variants. The mappings between Zhangzhou citation and disyllabic tones are morphologically conditioned. Phrase-initial tones are largely not related to the citation tones at either the phonological or the phonetic level, while phrase-final tones are categorically related to the citation tones but phonetically are not quite the same because of predictable sensitivity to surrounding environments. Each tone in Zhangzhou can be regarded as a single morpheme having two alternating allomorphs (tonemes), one for non-phrase-final variants and one for variants in citation and phrase-final contexts, both of which are listed in the mental lexicon of native Zhangzhou speakers but are phonetically distant on the surface. In summary, the realisations of Zhangzhou tones are multidimensional, involving a variety of segmental and suprasegmental parameters. The interactions of Zhangzhou tones are complicated, involving phonetics, phonology, syntax, and morphology. Neutralisation of Zhangzhou tonal contrasts occurs across different contexts, including citation, phrase-final, and non-phrase-final. Thus, researchers must go beyond pitch to understand tone thoroughly as a phenomenon in Southern Min.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread to other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.

    Physiologically-Motivated Feature Extraction Methods for Speaker Recognition

    Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent the unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms: cross-lingual speaker identification, cross song-type avian speaker identification, and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.

    Acoustic Characteristics of the Shanghai-Zhenhai Syllable Types
