
    Temporal articulatory stability, phonological variation, and lexical contrast preservation in diaspora Tibetan

    This dissertation examines how lexical tone can be represented with articulatory gestures, and the ways a gestural perspective can inform synchronic and diachronic analysis of the phonology and phonetics of a language. Tibetan is chosen as an example of a language with interacting laryngeal and tonal phonology, a history of tonogenesis and dialect diversification, and recent contact-induced realignment of the tonal and consonantal systems. Despite variation in voice onset time (VOT) and presence/absence of the lexical tone contrast, speakers retain a consistent relative timing of consonant and vowel gestures. Recent research has attempted to integrate tone into the framework of Articulatory Phonology through the addition of tone gestures. Unlike other theories of phonetics-phonology, Articulatory Phonology incorporates relative timing as a key parameter. This allows the system to represent contrasts instantiated not just in the presence or absence of gestures, but also in how gestures are timed with each other. Building on the different predictions of various timing relations, along with the historical developments in the language, hypotheses are generated and tested with acoustic and articulatory experiments. Following an overview of relevant theory, the second chapter surveys past literature on the history of sound change and present phonological diversity of Tibetic dialects. Whereas Old Tibetan lacked lexical tone, contrasted voiced and voiceless obstruents, and exhibited complex clusters, a series of overlapping sound changes has led to some modern varieties that are tonal, lack clusters, and vary in the expression of voicing and aspiration. Furthermore, speakers in the Tibetan diaspora use a variety that has grown out of the contact between diverse Tibetic dialects. 
The state of the language and the dynamics of diaspora have created a situation ripe for sound change, including the recombination of elements from different dialects and, potentially, the loss of tone contrasts. The nature of diaspora Tibetan is investigated through an acoustic corpus study. Recordings made in Kathmandu, Nepal, are being transcribed and forced-aligned into a useful audio corpus. Speakers in the corpus come from diverse backgrounds across and outside traditional Tibetan-speaking regions, but the analysis presented here focuses on speakers who grew up in diaspora, with a mixed input of Standard Tibetan (spyi skad) and other Tibetan varieties. Especially notable among these speakers is the high variability of voice onset time (VOT) and its interaction with tone. An analysis of these data in terms of the relative timing of oral, laryngeal, and tone gestures leads to the generation of hypotheses for testing using articulatory data. The articulatory study is conducted using electromagnetic articulography (EMA) with six Tibetan-speaking participants. The key finding is that the relative timing of consonant and vowel gestures is consistent across phonological categories and across speakers who do and do not contrast tone. This result leads to the conclusion that the relative timing of speech gestures is conserved and acquired independently. Speakers acquire and generalize a limited inventory of timing patterns, and can use timing patterns even when the conditioning environment for the development of those patterns, namely tone, has been lost.

    CONTROL AND BIOMECHANICS IN COARTICULATION: INSIGHTS FROM AN ULTRASOUND STUDY OF STANDARD MANDARIN APICAL VOWELS

    This study investigated the extent to which speaker-induced control and biomechanics play a role in determining the outcome of spatial coarticulation. Employing ultrasound tongue imaging, coarticulatory effects from and induced on adjacent consonants were quantified as measures of coarticulatory resistance and aggressiveness for the two apical vowels of Standard Mandarin in comparison to the three corner vowels. The results show that the two apical vowels are much less resistant to coarticulatory effects than the vowels [i a u], and they often do not induce larger effects on adjacent consonants than these vowels, due to speaker-targeted effects. It was also found that the retroflex apical vowel was consistently more resistant and aggressive than the dental apical vowel, due to biomechanical differences. Together, these findings implicate the roles of speaker control and biomechanics in coarticulation and highlight the need for a model of coarticulation to include both of these factors. Master of Arts

    A syllable-based investigation of coarticulation

    Coarticulation has long been investigated in Speech Sciences and Linguistics (Kühnert & Nolan, 1999). This thesis explores coarticulation through a syllable-based model (Y. Xu, 2020). First, it is hypothesised that consonant and vowel are synchronised at the syllable onset for the sake of reducing temporal degrees of freedom, and such synchronisation is the essence of coarticulation. Previous efforts in the examination of CV alignment mainly report onset asynchrony (Gao, 2009; Shaw & Chen, 2019). The first study of this thesis tested the synchrony hypothesis using articulatory and acoustic data in Mandarin. Departing from conventional approaches, a minimal triplet paradigm was applied, in which the CV onsets were determined through the consonant and vowel minimal pairs, respectively. Both articulatory and acoustic results showed that CV articulation started in close temporal proximity, supporting the synchrony hypothesis. The second study extended the research to English and syllables with cluster onsets. By using acoustic data in conjunction with Deep Learning, supporting evidence was found for co-onset, which is in contrast to the widely reported c-center effect (Byrd, 1995). Secondly, the thesis investigated the mechanism that can maximise synchrony – Dimension Specific Sequential Target Approximation (DSSTA), which is highly relevant to what is commonly known as coarticulation resistance (Recasens & Espinosa, 2009). Evidence from the first two studies shows that, when conflicts arise due to articulation requirements between CV, the CV gestures can be fulfilled by the same articulator on separate dimensions simultaneously. Last but not least, the final study tested the hypothesis that resyllabification is the result of coarticulation asymmetry between onset and coda consonants. 
It was found that neural-network-based models could infer syllable affiliation of consonants, and those inferred resyllabified codas had a coarticulatory structure similar to that of canonical onset consonants. In conclusion, this thesis found that many coarticulation-related phenomena, including local vowel-to-vowel anticipatory coarticulation, coarticulation resistance, and resyllabification, stem from the articulatory mechanism of the syllable.

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006) but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. 
Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm as there are many other factors that can influence values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of current debate on the descriptions of rhythm.
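
The four duration-based metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the study's own procedure: it assumes the commonly cited definitions from Ramus et al. (1999) and Grabe & Low (2002), uses the population standard deviation for ∆C (an assumption; the abstract does not say which estimator was used), and all function names and interval durations are hypothetical.

```python
from statistics import pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.

    Population standard deviation is an assumption made for this sketch.
    """
    return pstdev(consonantal)


def raw_pvi(durations):
    """rPVI: mean absolute difference between successive interval durations."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def normalized_pvi(durations):
    """nPVI: like rPVI, but each difference is divided by the pair's mean
    duration and the result is scaled by 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2.0) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical interval durations in seconds for one utterance.
    vocalic = [0.08, 0.12, 0.09, 0.11]
    consonantal = [0.07, 0.10, 0.06, 0.09]
    print("%V  :", percent_v(vocalic, consonantal))
    print("dC  :", delta_c(consonantal))
    print("rPVI:", raw_pvi(consonantal))
    print("nPVI:", normalized_pvi(vocalic))
```

In Grabe & Low's usage, rPVI is typically applied to consonantal intervals and nPVI to vocalic intervals, because the normalisation is intended to reduce the influence of speech rate on vowel durations.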

    The Development Of Glide Deletion In Seoul Korean: A Corpus And Articulatory Study

    This dissertation investigates the pathways and causes of the development of glide deletion in Seoul Korean. Seoul provides fertile ground for studies of linguistic innovation in an urban setting since it has seen rapid historical, social and demographic changes in the twentieth century. The phenomenon under investigation is the variable deletion of the labiovelar glide /w/ found to be on the rise in Seoul Korean (Silva, 1991; Kang, 1997). I present two studies addressing variation and change at two different levels: a corpus study tracking the development of /w/-deletion at the phonological level and an articulatory study examining the phonetic aspect of this change. The corpus data are drawn from sociolinguistic interviews with 48 native Seoul Koreans between 2015 and 2017. A trend comparison with the data from an earlier study of /w/-deletion (Kang, 1997) reveals that /w/-deletion in postconsonantal position has begun to retreat, while non-postconsonantal /w/-deletion has been rising vigorously. More importantly, the effect of the preceding segment, which used to be the strongest constraint on /w/-deletion, has weakened over time. I conclude that /w/-deletion in Seoul Korean is being reanalyzed with the structural details being diluted over time. I analyze this weakening of the original pattern as the result of linguistic diffusion induced by a great influx of migrants into Seoul after the Korean War (1950-1953). In an articulatory study, ultrasound data of tongue movements and video data of lip rounding for the production of /w/ for three native Seoul Koreans in their 20s, 30s and 50s were analyzed using Optical Flow Analysis. I find that /w/ in Seoul Korean is subject to both gradient reduction and categorical deletion and that younger speakers exhibit significantly larger articulatory gestures for /w/ after a bilabial than the older generation, which is consistent with the pattern of phonological change found in the corpus study. 
This dissertation demonstrates the importance of using both corpus and articulatory data in the investigation of a change, finding the coexistence of gradient and categorical effects in segmental deletion processes. Finally, it advances our understanding of the outcome of migration-induced dialect contact in contemporary urban settings.

    Articulation of the Japanese Moraic Nasal: Place of Articulation, Assimilation, and L2 Transfer

    The moraic nasal /N/ in Japanese has been transcribed in multiple ways, but very few studies have examined its articulation. The nature of its assimilation has often been described in phonology, but again, very few articulatory investigations have been conducted. Also, while a first language (L1) effect on second language (L2) production has been discussed for some phonemes, there is no adequate research on the effect of Japanese /N/ on L2 English syllable-final nasals. This dissertation investigates the articulation of the moraic nasal /N/ in Japanese using an ultrasound articulatory imaging technique to assess 1) its place of articulation, 2) patterns of place assimilation to the following segment, and 3) the effect of L1 /N/ on L2 English syllable-final nasal production. Eight native speakers of Japanese participated. Their productions of Japanese words and English words were analyzed acoustically and articulatorily. The results showed that the place of articulation for utterance-final /N/ following the vowel /a/ varied across native speakers of Japanese from alveolar to uvular, which is compatible with previous descriptions of /N/ in intervocalic position. Patterns of place assimilation of the moraic nasal to a following segment were not always categorical, and a gesture for the target of the moraic nasal, while varying among individuals, sometimes remained, depending on the phonological environment. This suggests that the assimilation takes place not only at the phonological level but also at the phonetic level, even if the assimilation is considered to be obligatory. An effect of L1 /N/ on the production of word-final nasals in L2 English was observed, although the degree of the effect varied across speakers. In conclusion, these findings enhance our understanding of the articulatory characteristics of the moraic nasal /N/ in Japanese, providing a firmer basis for phonological and phonetic arguments. 
The findings should also encourage further investigation and discussion of the phonological and phonetic behavior of /N/.

    Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab

    Articulatory copy synthesis (ACS), a subarea of speech inversion, refers to the reproduction of natural utterances and involves both the physiological articulatory processes and their corresponding acoustic results. This thesis proposes two novel methods for the ACS of human speech using the articulatory speech synthesizer VocalTractLab (VTL) to address or mitigate the existing problems of speech inversion, such as non-unique mapping, acoustic variation among different speakers, and the time-consuming nature of the process. The first method involved finding appropriate VTL gestural scores for given natural utterances using a genetic algorithm. It consisted of two steps: gestural score initialization and optimization. In the first step, gestural scores were initialized using the given acoustic signals with speech recognition, grapheme-to-phoneme (G2P) conversion, and a VTL rule-based method for converting phoneme sequences to gestural scores. In the second step, the initial gestural scores were optimized by a genetic algorithm via an analysis-by-synthesis (ABS) procedure that sought to minimize the cosine distance between the acoustic features of the synthetic and natural utterances. The articulatory parameters were also regularized during the optimization process to restrict them to reasonable values. The second method was based on long short-term memory (LSTM) and convolutional neural networks, which were responsible for capturing the temporal dependence and the spatial structure of the acoustic features, respectively. Neural network regression models were trained that used acoustic features as inputs and produced articulatory trajectories as outputs. In addition, to cover as much of the articulatory and acoustic space as possible, the training samples were augmented by manipulating the phonation type, speaking effort, and the vocal tract length of the synthetic utterances. 
Furthermore, two regularization methods were proposed: one based on the smoothness loss of articulatory trajectories and another based on the acoustic loss between original and predicted acoustic features. The best-performing genetic algorithms and convolutional LSTM systems (evaluated in terms of the difference between the estimated and reference VTL articulatory parameters) obtained average correlation coefficients of 0.985 and 0.983 for speaker-dependent utterances, respectively, and their reproduced speech achieved recognition accuracies of 86.25% and 64.69% for speaker-independent utterances of German words, respectively. When applied to German sentence utterances, as well as English and Mandarin Chinese word utterances, the neural-network-based ACS systems achieved recognition accuracies of 73.88%, 52.92%, and 52.41%, respectively. The results showed that both of these methods reproduced not only the articulatory processes but also the acoustic signals of reference utterances. Moreover, the regularization methods led to more physiologically plausible articulatory processes and made the estimated articulatory trajectories more articulatorily preferred by VTL, thus reproducing more natural and intelligible speech. This study also found that the convolutional layers, when used in conjunction with batch normalization layers, automatically learned more distinctive features from log power spectrograms. Furthermore, the neural-network-based ACS systems trained using German data could be generalized to the utterances of other languages.
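
Two of the quantities named in this abstract can be illustrated with a short sketch: the cosine distance that the analysis-by-synthesis step seeks to minimize, and a smoothness penalty on an articulatory trajectory. The abstract does not give the feature representation or the exact form of the smoothness loss, so the mean squared frame-to-frame difference used here is an assumption, and the function names and values are hypothetical. Nothing below calls VocalTractLab.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def smoothness_penalty(trajectory):
    """Mean squared difference between successive frames of one parameter.

    This specific form is an assumption made for illustration; the abstract
    only states that a smoothness loss on articulatory trajectories was used.
    """
    steps = [(b - a) ** 2 for a, b in zip(trajectory, trajectory[1:])]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    # Hypothetical acoustic feature vectors for a natural and a synthetic frame.
    natural = [0.9, 0.1, 0.4]
    synthetic = [0.8, 0.2, 0.5]
    print("cosine distance:", cosine_distance(natural, synthetic))

    # Hypothetical trajectory of a single articulatory parameter over time.
    print("smoothness penalty:", smoothness_penalty([0.0, 0.1, 0.3, 0.2]))
```

A smaller cosine distance means the synthetic features point in nearly the same direction as the natural ones regardless of overall magnitude, which makes it a natural fitness measure when overall signal level differs between the two utterances.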