Temporal articulatory stability, phonological variation, and lexical contrast preservation in diaspora Tibetan
This dissertation examines how lexical tone can be represented with articulatory gestures, and the ways a gestural perspective can inform synchronic and diachronic analysis of the phonology and phonetics of a language. Tibetan is chosen as an example of a language with interacting laryngeal and tonal phonology, a history of tonogenesis and dialect diversification, and recent contact-induced realignment of the tonal and consonantal systems. Despite variation in voice onset time (VOT) and presence/absence of the lexical tone contrast, speakers retain a consistent relative timing of consonant and vowel gestures. Recent research has attempted to integrate tone into the framework of Articulatory Phonology through the addition of tone gestures. Unlike other theories of phonetics-phonology, Articulatory Phonology incorporates relative timing as a key parameter. This allows the system to represent contrasts instantiated not just in the presence or absence of gestures, but also in how gestures are timed with each other. Building on the different predictions of various timing relations, along with the historical developments in the language, hypotheses are generated and tested with acoustic and articulatory experiments. Following an overview of relevant theory, the second chapter surveys past literature on the history of sound change and present phonological diversity of Tibetic dialects. Whereas Old Tibetan lacked lexical tone, contrasted voiced and voiceless obstruents, and exhibited complex clusters, a series of overlapping sound changes has led to some modern varieties that are tonal, lack clusters, and vary in the expression of voicing and aspiration. Furthermore, speakers in the Tibetan diaspora use a variety that has grown out of the contact between diverse Tibetic dialects.
The state of the language and the dynamics of diaspora have created a situation ripe for sound change, including the recombination of elements from different dialects and, potentially, the loss of tone contrasts. The nature of diaspora Tibetan is investigated through an acoustic corpus study. Recordings made in Kathmandu, Nepal, are being transcribed and forced-aligned into a useful audio corpus. Speakers in the corpus come from diverse backgrounds across and outside traditional Tibetan-speaking regions, but the analysis presented here focuses on speakers who grew up in diaspora, with a mixed input of Standard Tibetan (spyi skad) and other Tibetan varieties. Especially notable among these speakers is the high variability of voice onset time (VOT) and its interaction with tone. An analysis of this data in terms of the relative timing of oral, laryngeal, and tone gestures leads to the generation of hypotheses for testing using articulatory data. The articulatory study is conducted using electromagnetic articulography (EMA) with six Tibetan-speaking participants. The key finding is that the relative timing of consonant and vowel gestures is consistent across phonological categories and across speakers who do and do not contrast tone. This result leads to the conclusion that the relative timing of speech gestures is conserved and acquired independently. Speakers acquire and generalize a limited inventory of timing patterns, and can use timing patterns even when the conditioning environment for the development of those patterns, namely tone, has been lost.
The production and perception of coronal fricatives in Seoul Korean: The case for a fourth laryngeal category
This article presents new data on the contrast between the two voiceless coronal fricatives of Korean, variously described as a lenis/fortis or aspirated/fortis contrast. In utterance-initial position, the fricatives were found to differ in centroid frequency; duration of frication, aspiration, and the following vowel; and several aspects of the following vowel onset, including intensity profile, spectral tilt, and F1 onset. The between-fricative differences varied across vowel contexts, however, and spectral differences in the vowel onset especially were more pronounced for /a/ than for /i, ɯ, u/. This disparity led to the hypothesis that cues in the following vowel onset would exert a weaker influence on perception for high vowels than for low vowels. Perception data provided general support for this hypothesis, indicating that while vowel onset cues had the largest impact on perception for both high- and low-vowel stimuli, this influence was weaker for high vowels. Perception was also strongly influenced by aspiration duration, with modest contributions from frication duration and f0 onset. Taken together, these findings suggest that the 'non-fortis' fricative is best characterized not in terms of the lenis or aspirated categories for stops, but in terms of a unique representation that is both lenis and aspirated.
Improving the Speech Intelligibility By Cochlear Implant Users
In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations in a quiet background for most patients with profound hearing loss in both ears. However, CI users still have serious problems understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes are further factors that make hearing in noisy conditions more difficult for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to the binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, which was motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of the conversation. Research has shown that when adults are familiar with someone’s voice, they can more accurately, and even more quickly, process and understand what the person is saying. This effect, known as the “familiar talker advantage”, motivated us to examine its impact on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.
Tones in Zhangzhou: Pitch and Beyond
This study draws on various approaches (field linguistics,
auditory and acoustic phonetics, and statistics) to explore and
explain the nature of tones in Zhangzhou, an under-described
Southern Min variety. Several original findings emerged from the
analyses of the data from 21 speakers. The realisations of
Zhangzhou tones are multidimensional. The single parameter of
pitch/F0 is not sufficient to characterise tonal contrasts in
either monosyllabic or polysyllabic settings in Zhangzhou.
Instead, various parameters, including pitch/F0, duration, vowel
quality, voice quality, and syllable coda type, interact in a
complicated but consistent way to code tonal distinctions.
Zhangzhou has eight tones rather than seven tones as proposed in
previous studies. This finding resulted from examining the
realisations of diverse parameters across three different
contexts (isolation, phrase-initial, and phrase-final), rather
than classifying tones in citation and in terms of the
preservation of Middle Chinese tonal categories. Tonal contrasts
in Zhangzhou can be neutralised across different linguistic
contexts. Identifying the number of tonal contrasts based simply
on tonal realisations in the citation environment is not
sufficient. Instead, examining tonal realisations across
different linguistic contexts beyond monosyllables is imperative
for understanding the nature of tone.
Tone sandhi in Zhangzhou is syntactically relevant. The tone
sandhi domain is not phonologically determined but rather is
aligned with a syntactic phrase XP. Within a given XP, the
realisations of the tones at non-phrase-final positions undergo
alternation phonologically and phonetically. Nevertheless, the
alternations are sensitive only to the phrase boundaries and are
not affected by the internal structure of syntactic phrases.
Tone sandhi in Zhangzhou is phonologically inert but phonetically
sensitive. The realisations of Zhangzhou tones in disyllabic
phrases are not categorically affected by their surrounding tones
but are phonetically sensitive to surrounding environments. For
instance, the pitch/F0 onsets of phrase-final tones are largely
sensitive to pitch/F0 offsets of preceding tones and appear to
have diverse variants.
The mappings between Zhangzhou citation and disyllabic tones are
morphologically conditioned. Phrase-initial tones are largely not
related to the citation tones at either the phonological or the
phonetic level while phrase-final tones are categorically related
to the citation tones but phonetically are not quite the same
because of predictable sensitivity to surrounding environments.
Each tone in Zhangzhou can be regarded as a single morpheme
having two alternating allomorphs (tonemes), one for
non-phrase-final variants and one for variants in citation and
phrase-final contexts, both of which are listed in the mental
lexicon of native Zhangzhou speakers but are phonetically distant
on the surface.
In summary, the realisations of Zhangzhou tones are
multidimensional, involving a variety of segmental and
suprasegmental parameters. The interactions of Zhangzhou tones
are complicated, involving phonetics, phonology, syntax, and
morphology. Neutralisation of Zhangzhou tonal contrasts occurs
across different contexts, including citation, phrase-final, and
non-phrase-final. Thus, researchers must go beyond pitch to
understand tone thoroughly as a phenomenon in Southern Min.
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread into other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.
Physiologically-Motivated Feature Extraction Methods for Speaker Recognition
Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent the unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms: cross-lingual speaker identification, cross song-type avian speaker identification and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.