
    A syllable-based investigation of coarticulation

    Coarticulation has long been investigated in Speech Sciences and Linguistics (Kühnert & Nolan, 1999). This thesis explores coarticulation through a syllable-based model (Y. Xu, 2020). First, it is hypothesised that consonant and vowel are synchronised at the syllable onset in order to reduce temporal degrees of freedom, and that such synchronisation is the essence of coarticulation. Previous examinations of CV alignment mainly report onset asynchrony (Gao, 2009; Shaw & Chen, 2019). The first study of this thesis tested the synchrony hypothesis using articulatory and acoustic data in Mandarin. Departing from conventional approaches, a minimal triplet paradigm was applied, in which the CV onsets were determined through consonant and vowel minimal pairs, respectively. Both articulatory and acoustic results showed that CV articulation started in close temporal proximity, supporting the synchrony hypothesis. The second study extended the research to English and to syllables with cluster onsets. Using acoustic data in conjunction with deep learning, supporting evidence was found for co-onset, in contrast to the widely reported c-center effect (Byrd, 1995). Second, the thesis investigated the mechanism that can maximise synchrony, Dimension Specific Sequential Target Approximation (DSSTA), which is highly relevant to what is commonly known as coarticulation resistance (Recasens & Espinosa, 2009). Evidence from the first two studies shows that, when conflicts arise between the articulatory requirements of the consonant and the vowel, the CV gestures can be fulfilled by the same articulator on separate dimensions simultaneously. Finally, the last study tested the hypothesis that resyllabification is the result of coarticulation asymmetry between onset and coda consonants. It was found that neural-network-based models could infer the syllable affiliation of consonants, and that the inferred resyllabified codas had a coarticulatory structure similar to that of canonical onset consonants. In conclusion, this thesis found that many coarticulation-related phenomena, including local vowel-to-vowel anticipatory coarticulation, coarticulation resistance, and resyllabification, stem from the articulatory mechanism of the syllable.
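    The abstract mentions neural-network-based models that infer the syllable affiliation of consonants but gives no implementation details. The sketch below is a purely hypothetical illustration of that general idea, labelling consonant tokens as onset or coda with a small multilayer perceptron; the feature set, architecture, and randomly generated stand-in data are assumptions, not the thesis's actual pipeline.

```python
# Hypothetical sketch: classify consonant tokens as syllable onset vs. coda
# from acoustic features. Features, labels, and model choices are illustrative
# assumptions; the thesis's actual models and data are not described here.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in features (e.g. durations and formant-transition measures around
# the consonant) and stand-in labels: 0 = onset, 1 = coda.
X = rng.normal(size=(500, 12))
y = rng.integers(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```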

    The analysis of breathing and rhythm in speech

    Speech rhythm can be described as the temporal patterning by which speech events, such as vocalic onsets, occur. Despite efforts to quantify and model speech rhythm across languages, it remains a scientifically enigmatic aspect of prosody. For instance, one challenge lies in determining how best to quantify and analyse speech rhythm. Techniques range from manual phonetic annotation to the automatic extraction of acoustic features, and it is currently unclear how closely these differing approaches correspond to one another. Moreover, the primary means of speech rhythm research has been the analysis of the acoustic signal alone. Investigations of speech rhythm may instead benefit from a range of complementary measures, including physiological recordings such as measures of respiratory effort. This thesis therefore combines acoustic recording with inductive plethysmography (breath belts) to capture the temporal characteristics of speech and speech breathing rhythms. The first part examines the performance of existing phonetic and algorithmic techniques for acoustic prosodic analysis in a new corpus of rhythmically diverse English and Mandarin speech. The second part addresses the need for an automatic speech breathing annotation technique by developing a novel function that is robust to the noisy plethysmography typical of spontaneous, naturalistic speech production. These methods are then applied in the following section to the analysis of English speech and speech breathing in a second, larger corpus. Finally, behavioural experiments were conducted to investigate listeners' perception of speech breathing using a novel gap detection task. The thesis establishes the feasibility, as well as the limits, of automatic methods in comparison to manual annotation. In the speech breathing corpus analysis, these methods help show that speakers maintain a normative, yet contextually adaptive, breathing style during speech. The perception experiments in turn demonstrate that listeners are sensitive to violations of these speech breathing norms, even if unconsciously so. The thesis concludes by underscoring breathing as a necessary, yet often overlooked, component in speech rhythm planning and production.
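    The thesis's novel annotation function is not specified in the abstract. As a minimal baseline sketch of the general task, the following detects putative breath events in a noisy belt signal by low-pass filtering and peak picking; this is not the thesis's method, and the sampling rate, cutoff, thresholds, and synthetic signal are assumptions for illustration only.

```python
# Generic illustration of automatic breath annotation from a belt signal:
# low-pass smooth the trace, then mark peaks of the breathing cycle.
# NOT the thesis's novel function; parameters here are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def annotate_breaths(belt_signal, fs=100.0, cutoff_hz=2.0, min_interval_s=1.0):
    """Return sample indices of putative breath peaks in a plethysmography trace."""
    # Low-pass filter to suppress measurement noise and speech-related ripple.
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    smoothed = filtfilt(b, a, belt_signal)
    # Peaks in the smoothed trace approximate maxima of the breathing cycle;
    # enforce a minimum spacing so jitter does not create spurious breaths.
    peaks, _ = find_peaks(smoothed, distance=int(min_interval_s * fs),
                          prominence=0.1 * np.ptp(smoothed))
    return peaks

# Hypothetical usage with a synthetic breathing-like signal:
fs = 100.0
t = np.arange(0, 60, 1 / fs)
signal = np.sin(2 * np.pi * 0.3 * t) + 0.2 * np.random.default_rng(1).normal(size=t.size)
print(annotate_breaths(signal, fs=fs)[:5])
```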

    Are tones aligned with articulatory events? Evidence from Italian and French

    Tonal alignment work has suggested that the temporal location of tonal targets relative to segmental "anchors" might be governed by principles of synchrony and stability (Arvaniti et al. 1998; Ladd et al. 1999, inter alia). However, a number of discrepancies have emerged in the cross-linguistic study of alignment. For instance, despite some regularities in the alignment of L targets (Caspers and van Heuven 1993; Prieto et al. 1995), the alignment of H targets appears to be quite controversial: it is sometimes difficult to find definite segmental landmarks to which such targets might be aligned. Moreover, most alignment proposals so far inherently assume that if anchors for tonal alignment exist, they must be acoustic in nature. A plausible alternative is that such anchors are primarily articulatory, which would explain why the underlying regularities are sometimes masked. Hence, we adopt a new experimental paradigm for alignment research in which articulatory measures are performed simultaneously with acoustic measures. To test the constant alignment hypothesis, a preliminary study (D'Imperio et al. 2003) was conducted in which various latency measures, both acoustically and articulatorily based, were analyzed. Specifically, the kinematics of OPTOTRAK markers attached to the speaker's upper and lower lips was tracked over time during the production of the corpus sentences. The melodic target considered is the H tone of LH nuclear rises in Neapolitan Italian. In this variety, yes/no question LH rises are systematically later than (narrow focus) statement LH rises (D'Imperio 2000, 2001, 2002; D'Imperio and House 1997). To test the hypothesis of constant anchoring of H targets, the materials were produced at two different speech rates, normal and fast. Summarizing the results, the H targets of nuclear rises in Neapolitan statements and questions appear to be more closely phased with the articulatory dimension of between-lip distance than with two of the most commonly employed acoustic segmental landmarks for tonal alignment (the onset and offset of the stressed vowel). Statement H tones are phased with the maximum between-lip distance within the stressed syllable. Note that this location does not correspond to any identifiable segmental boundary, acoustic event or phonological unit, and does not coincide with the RMS amplitude peak; in fact, RMS peaks were generally much earlier than articulatory peaks, hence further away from the H peaks. This calls for the collection and analysis of more articulatory data (especially jaw and tongue movements) to shed light on tonal alignment issues.
    In a second study, a French corpus was collected on the basis of the alignment contrast found by Welby (2003, in press). Welby's results show that listeners use the alignment of the initial rise (LHi) in French Accentual Phrases as a cue to speech segmentation; specifically, they exploit the presence of an early rise to demarcate the beginning of a content word. In the present study, a corpus was built with a set of utterances displaying this specific alignment contrast. The kinematics of 10 pellets (8 on the face and tongue, 2 references) was tracked over time using an electromagnetometer (EMA, Carstens). The phasing of several articulatory events relative to the L and H parts of the early rise was examined. The preliminary results seem to point to some kind of fine alignment specification for the L and H targets.
    Specifically, we hypothesize that the tonal target commands of Neapolitan as well as French rises are phased with the commands of the supralaryngeal articulators involved in producing the segments to which the tone is associated. Regarding the word segmentation issue for French, it is important to study alignment from a diachronic perspective, since we know of cases in which speech segmentation errors lead to lexical reinterpretation and change (l'abondance "abundance" > la bondance, from Welby 2003). We also take these results to suggest that not all rises align in the same way with the associated syllable. Though the role of articulatory constraints is important, the exact phasing properties of prosodic events are language-specific. Since prosody has recently become a realm of investigation for the Task Dynamics program (Byrd and Saltzman 2003), our alignment work will be cast within such a perspective.
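    As an illustration of the kind of articulatorily anchored latency measure described above, the sketch below computes the lag between the f0 peak (the H target) and the maximum between-lip distance within a stressed-syllable window; the signal names, the shared sampling rate, and the synthetic example are assumptions, not the studies' actual processing.

```python
# Hypothetical latency measure: time from the lip-aperture maximum to the f0
# peak within a stressed-syllable window. Assumes both signals are already
# time-aligned and sampled at the same rate fs (an assumption for illustration).
import numpy as np

def h_to_lip_maximum_lag(f0, lip_aperture, fs, syll_start_s, syll_end_s):
    """Return the lag in seconds of the H target (f0 peak) relative to the
    maximum between-lip distance; positive means the H peak comes later."""
    i0, i1 = int(syll_start_s * fs), int(syll_end_s * fs)
    h_idx = i0 + int(np.nanargmax(f0[i0:i1]))            # acoustic H target
    lip_idx = i0 + int(np.argmax(lip_aperture[i0:i1]))   # articulatory anchor
    return (h_idx - lip_idx) / fs

# Synthetic usage example:
fs = 200.0
t = np.arange(0, 0.5, 1 / fs)
f0 = 120 + 40 * np.exp(-((t - 0.30) ** 2) / 0.002)       # f0 peak near 300 ms
lip = 10 + 5 * np.exp(-((t - 0.25) ** 2) / 0.002)        # lip maximum near 250 ms
print(f"H peak lag: {h_to_lip_maximum_lag(f0, lip, fs, 0.1, 0.45):.3f} s")
```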

    The role of time in phonetic spaces: Temporal resolution in Cantonese tone perception

    The role of temporal resolution in speech perception (e.g. whether tones are parameterized with fundamental frequency sampled every 10 ms, or just twice per syllable) is sometimes overlooked, and the temporal resolution relevant for tonal perception is still an open question. The choice of temporal resolution matters because how we understand the recognition, dispersion, and learning of phonetic categories is entirely predicated on the parameters we use to define the phonetic space in which they lie. Here, we present a tonal perception experiment in Cantonese in which we used interrupted speech in trisyllabic stimuli to study the effect of temporal resolution on human tonal identification. We also performed acoustic classification of the stimuli with support vector machines. Our results show that just a few samples per syllable are enough for humans and machines to classify Cantonese tones with reasonable accuracy, with little difference in performance from having the full speech signal available. The confusion patterns and machine classification results suggest that loss of detailed information about the temporal alignment and shape of fundamental frequency contours was a major cause of decreasing accuracy as resolution decreased. Moreover, the machine classification experiments show that accurate identification of rising tones in Cantonese requires extending the temporal sampling window into the following syllable, due to peak delay.
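    The machine classification described above used support vector machines on a few f0 samples per syllable; a minimal, hypothetical sketch of such a setup is shown below. The feature layout (k evenly spaced f0 samples per syllable, optionally extending into the following syllable), the random stand-in data, and the hyperparameters are assumptions, not the study's actual pipeline.

```python
# Illustrative sketch only: classify tones from a handful of f0 samples per
# syllable with a support vector machine. Data and hyperparameters are
# stand-ins, not the study's materials or settings.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical data: n syllables, each represented by k f0 samples (e.g. in
# semitones), taken at evenly spaced points across the syllable.
n_syllables, k_samples = 600, 4
X = rng.normal(size=(n_syllables, k_samples))   # stand-in for real f0 measurements
y = rng.integers(0, 6, size=n_syllables)        # six Cantonese tone categories

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```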

    Early and Late Stage Mechanisms for Vocalization Processing in the Human Auditory System

    The human auditory system is able to rapidly process incoming acoustic information, actively filtering, categorizing, or suppressing different elements of the incoming acoustic stream. Vocalizations produced by other humans (conspecifics) likely represent the most ethologically relevant sounds encountered by hearing individuals. Subtle acoustic characteristics of these vocalizations aid in determining the identity, emotional state, health, intent, and other attributes of the producer. The ability to assess vocalizations is likely subserved by a specialized network of structures and functional connections that is optimized for this stimulus class. Early elements of this network would show sensitivity to the most basic acoustic features of these sounds; later elements may show categorically selective response patterns that represent high-level semantic organization of different classes of vocalizations. A combination of functional magnetic resonance imaging (fMRI) and electrophysiological studies was performed to investigate and describe some of the earlier and later stage mechanisms of conspecific vocalization processing in human auditory cortices. Using fMRI, cortical representations of harmonic signal content were found along the middle superior temporal gyri, between the primary auditory cortices on Heschl's gyri and the higher-order auditory regions of the superior temporal sulci. Electrophysiological findings likewise demonstrated a parametric response profile to harmonic signal content. Using a novel class of vocalizations, human-mimicked versions of animal vocalizations, we demonstrated the presence of a left-lateralized cortical hierarchy for processing conspecific vocalizations, contrary to previous findings describing similar bilateral networks. This hierarchy originated near primary auditory cortices and was further supported by auditory evoked potential data suggesting differential temporal processing dynamics for conspecific human vocalizations versus those produced by other species. Taken together, these results suggest that there are auditory cortical networks highly optimized for processing utterances produced by the human vocal tract. Understanding the function and structure of these networks will be critical for advancing the development of novel communicative therapies and the design of future assistive hearing devices.

    Processing of proper names in Mandarin Chinese: a behavioral and neuroimaging study

    Yen H-L. Processing of proper names in Mandarin Chinese: a behavioral and neuroimaging study. Bielefeld (Germany): Bielefeld University; 2006. Proper names have been considered a universal language class (Bright, 2003; Müller, 2004). The distinction between proper names and common nouns has been postulated for several thousand years in ancient Greek and Chinese philosophy of language (e.g., Kripke, 1972; Wu, 1997). Furthermore, this dissociation has been supported by experimental data (e.g., Müller & Kutas, 1996) and case studies (e.g., Lyons, 2002). This dissertation explores the processing of proper names in Mandarin Chinese, in which the morphology of proper names and the tradition of name giving differ greatly from Indo-European languages such as German and English. It aims to determine whether the theoretically based dissociation has a neurocognitive reality. In addition to a behavioral study (a category decision task), we carried out an auditory and a visual functional magnetic resonance imaging (fMRI) experiment. Forty native speakers of Mandarin Chinese took part in the behavioral study, whereas twelve different participants took part in the fMRI study. For both modalities, forty personal names, forty geographical names and forty common nouns were used as experimental stimuli; different words were used in the auditory and the visual experiment. The behavioral study additionally tested 20 brand names. Overall, the Mandarin Chinese speakers in the present study recognized proper names as a type (here: personal names and geographical names) significantly faster than common nouns in both the auditory and the visual modality. In contrast, brand names did not show faster reaction times than common nouns; their reaction times were significantly longer than those for personal names and geographical names. In the fMRI study, the processing of proper names and the processing of common nouns revealed partially different brain activation patterns. Contrasts between personal names and common nouns, as well as contrasts between geographical names and common nouns, revealed significant activation. In the auditory modality, proper names produced more activation in bilateral anterior temporal cortices, the premotor area and the anterior precuneus. In the visual modality, proper names evoked significant activation in the frontal lobe, including the frontal eye fields and the premotor area. Regardless of modality, common nouns yielded significant activation in the left posterior temporal cortex; further characteristics of common noun processing were activation in the occipital area and the temporo-parietal junction. Our fMRI findings support the view that the processing of proper names and that of common nouns involve different brain areas. This may be due to the difference between general semantic processing and identity-specific semantic processing. Moreover, the cognitive mechanism underlying proper name processing may differ from that for common nouns. The present findings in Mandarin Chinese are also supported by previous research on Indo-European languages. In line with the hypothesis that has been discussed in the philosophy of language for over two thousand years, the special status of proper names is supported by our neurocognitive evidence.

    Loan Phonology

    For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by great interest among phonologists in how the nativization of loanwords occurs. The general view is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system, as well as for studying L1 phonological processes in action, and thus offers access to the true synchronic phonology of the L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language's sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena.