
    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, which is held every two years, collect the scientific papers presented at the conference as oral and poster contributions. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images in support of the clinical diagnosis and classification of vocal pathologies.

    A syllable-based investigation of coarticulation

    Coarticulation has long been investigated in speech science and linguistics (Kühnert & Nolan, 1999). This thesis explores coarticulation through a syllable-based model (Y. Xu, 2020). First, it hypothesises that consonant and vowel are synchronised at the syllable onset to reduce temporal degrees of freedom, and that such synchronisation is the essence of coarticulation. Previous examinations of CV alignment mainly report onset asynchrony (Gao, 2009; Shaw & Chen, 2019). The first study of this thesis tested the synchrony hypothesis using articulatory and acoustic data from Mandarin. Departing from conventional approaches, it applied a minimal triplet paradigm, in which the C and V onsets were determined through consonant and vowel minimal pairs, respectively. Both articulatory and acoustic results showed that C and V articulation started in close temporal proximity, supporting the synchrony hypothesis. The second study extended the research to English and to syllables with cluster onsets. Using acoustic data in conjunction with deep learning, it found supporting evidence for co-onset, in contrast to the widely reported c-center effect (Byrd, 1995). Second, the thesis investigated the mechanism that can maximise synchrony, Dimension Specific Sequential Target Approximation (DSSTA), which is closely related to what is commonly known as coarticulation resistance (Recasens & Espinosa, 2009). Evidence from the first two studies shows that when the articulatory requirements of C and V conflict, the two gestures can be fulfilled simultaneously by the same articulator on separate dimensions. Finally, the last study tested the hypothesis that resyllabification results from a coarticulation asymmetry between onset and coda consonants. It found that neural-network-based models could infer the syllable affiliation of consonants, and that codas inferred to be resyllabified had a coarticulatory structure similar to that of canonical onset consonants. In conclusion, this thesis found that many coarticulation-related phenomena, including local vowel-to-vowel anticipatory coarticulation, coarticulation resistance, and resyllabification, stem from the articulatory mechanism of the syllable.

    The analysis of breathing and rhythm in speech

    Speech rhythm can be described as the temporal patterning by which speech events, such as vocalic onsets, occur. Despite efforts to quantify and model speech rhythm across languages, it remains a scientifically enigmatic aspect of prosody. One challenge, for instance, lies in determining how best to quantify and analyse speech rhythm: techniques range from manual phonetic annotation to the automatic extraction of acoustic features, and it is currently unclear how closely these approaches correspond to one another. Moreover, speech rhythm research has relied primarily on analysis of the acoustic signal alone. Investigations of speech rhythm may instead benefit from a range of complementary measures, including physiological recordings such as measures of respiratory effort. This thesis therefore combines acoustic recording with inductive plethysmography (breath belts) to capture the temporal characteristics of speech and speech breathing rhythms. The first part examines the performance of existing phonetic and algorithmic techniques for acoustic prosodic analysis in a new corpus of rhythmically diverse English and Mandarin speech. The second part addresses the need for an automatic speech breathing annotation technique by developing a novel function that is robust to the noisy plethysmography typical of spontaneous, naturalistic speech production. The following part applies these methods to the analysis of English speech and speech breathing in a second, larger corpus. Finally, behavioural experiments investigate listeners' perception of speech breathing using a novel gap detection task. The thesis establishes the feasibility, as well as the limits, of automatic methods in comparison with manual annotation. In the speech breathing corpus analysis, these methods help show that speakers maintain a normative, yet contextually adaptive, breathing style during speech. The perception experiments in turn demonstrate that listeners are sensitive to violations of these speech breathing norms, even if unconsciously. The thesis concludes by underscoring breathing as a necessary, yet often overlooked, component of speech rhythm planning and production.

    Re-examining Phonological and Lexical Correlates of Second Language Comprehensibility: The Role of Rater Experience

    Few researchers and teachers would disagree that some linguistic aspects of second language (L2) speech are more crucial than others for successful communication. Underlying this idea is the assumption that communicative success can be broadly defined in terms of speakers' ability to convey the intended meaning to the interlocutor, which is frequently captured through a listener-based rating of comprehensibility, or ease of understanding (e.g. Derwing & Munro, 2009; Levis, 2005). Previous research has shown that communicative success (for example, as defined through comprehensible L2 speech) depends on several linguistic dimensions of L2 output, including its segmental and suprasegmental pronunciation, fluency-based characteristics, lexical and grammatical content, and discourse structure (e.g. Field, 2005; Hahn, 2004; Kang et al., 2010; Trofimovich & Isaacs, 2012). Our chief objective in the current study was to explore the L2 comprehensibility construct from a language assessment perspective (e.g. Isaacs & Thomson, 2013) by targeting rater experience as a possible source of variance in the degree to which raters use various characteristics of speech in judging L2 comprehensibility. In keeping with this objective, we asked the following question: To what extent do the linguistic aspects of L2 speech that contribute to comprehensibility ratings depend on raters' experience?

    A Companion to Language and Linguistics

    The study guide contains educational material on the main topics of a course in theoretical linguistics: the nature of language and the central branches of and approaches to language study; the phonetic, morphological, syntactic, semantic, pragmatic, and sociolinguistic aspects of English; theories of language acquisition; types of writing systems; and historical linguistics. Questions and practical tasks for each unit allow readers to study the material independently. It is intended for students, graduate students, teachers, and anyone interested in language and linguistics.

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held on 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Determining normal and abnormal lip shapes during movement for use as a surgical outcome measure

    Craniofacial assessment for diagnosis, treatment planning and outcome evaluation has traditionally relied on imaging techniques that provide a static image of the facial structure. However, objective measures of facial movement are becoming increasingly important for clinical interventions in which surgical repositioning of facial structures can influence soft tissue mobility. These applications include the management of patients with cleft lip or facial nerve palsy and of patients undergoing orthognathic surgery. Although technological advances in medical imaging have made three-dimensional (3D) motion scanners commercially available, their clinical application to date has been limited. The aim of this study is therefore to use such a scanner to determine normal and abnormal lip shapes during movement for use as a clinical outcome measure. Lip movements were captured from an average population using a 3D motion scanner. Consideration was given to the type of facial movement captured (i.e. verbal or non-verbal) and to the method of feature extraction (i.e. manual or semi-automatic landmarking). Statistical models of appearance (Active Shape Models) were used to convert the video motion sequences into linear data and to identify reproducible facial movements via pattern recognition. Average templates of lip movement were created from the most reproducible lip movements using Geometric Morphometrics (GMM), incorporating Generalised Procrustes Analysis (GPA) and Principal Component Analysis (PCA). Finally, lip movement data from a patient group undergoing orthognathic surgery were incorporated into the model, and Discriminant Analysis (DA) was employed in an attempt to statistically distinguish abnormal lip movement. The results showed that manual landmarking was the preferred method of feature extraction. Verbal facial gestures (i.e. words) were significantly more reproducible (repeatable) over time than non-verbal gestures (i.e. facial expressions). It was possible to create average templates of lip movement from the control group, which acted as an outcome measure and from which abnormalities in movement could be discriminated pre-surgery. These abnormalities were found to normalise post-surgery. The concepts of this study form the basis for analysing facial movement in the clinical context, and the methods are transferable to other patient groups. Specifically, patients undergoing orthognathic surgery have differences in lip shape and movement compared with an average population, and correcting the position of the basal bones in this group of patients appears to normalise lip mobility.
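
The Generalised Procrustes Analysis (GPA) step named in this abstract can be illustrated with a short sketch. It is not the study's pipeline: it assumes 2D landmarks (the study used a 3D scanner), encodes each landmark as a complex number x + yj, and uses hypothetical helper names (`normalise`, `rotate_onto`, `gpa`) and made-up example shapes. Each shape is centred and scaled to unit centroid size, then all shapes are repeatedly rotated onto the current mean until the mean stops changing.

```python
import cmath
from typing import List, Tuple

Shape = List[complex]  # one 2D landmark per entry, encoded as x + yj (assumption: 2D only)


def normalise(shape: Shape) -> Shape:
    """Remove translation (centre on the centroid) and scale (unit centroid size)."""
    centroid = sum(shape) / len(shape)
    centred = [p - centroid for p in shape]
    size = sum(abs(p) ** 2 for p in centred) ** 0.5
    return [p / size for p in centred]


def rotate_onto(shape: Shape, target: Shape) -> Shape:
    """Rotate `shape` about the origin to minimise squared distance to `target`."""
    # For centred shapes, the least-squares rotation angle is arg(sum(conj(z_i) * w_i)).
    cross = sum(p.conjugate() * q for p, q in zip(shape, target))
    rotation = cmath.exp(1j * cmath.phase(cross))
    return [rotation * p for p in shape]


def gpa(shapes: List[Shape], tol: float = 1e-12, max_iter: int = 100) -> Tuple[List[Shape], Shape]:
    """Generalised Procrustes Analysis: align all shapes to an iteratively refined mean."""
    shapes = [normalise(s) for s in shapes]
    mean = shapes[0]  # initial reference: the first shape
    aligned = shapes
    for _ in range(max_iter):
        aligned = [rotate_onto(s, mean) for s in shapes]
        # New mean: landmark-wise average, re-normalised and kept in the old orientation.
        new_mean = normalise([sum(pts) / len(pts) for pts in zip(*aligned)])
        new_mean = rotate_onto(new_mean, mean)
        change = sum(abs(a - b) ** 2 for a, b in zip(new_mean, mean))
        mean = new_mean
        if change < tol:
            break
    return aligned, mean


# Usage: a hypothetical four-landmark outline and two similarity-transformed copies.
base = [0j, 2 + 0j, 1 + 1.5j, 0.5 + 0.2j]
copies = [base,
          [3 * cmath.exp(0.7j) * p + (5 - 2j) for p in base],
          [0.5 * cmath.exp(-1.2j) * p + 1j for p in base]]
aligned, mean = gpa(copies)
# The copies differ only by translation, scale and rotation, so each aligned
# shape should coincide with the mean up to floating-point error.
print(max(abs(a - m) for shape in aligned for a, m in zip(shape, mean)))
```

Only rotations (unit complex multipliers) are considered, so reflections are excluded, as is usual in Procrustes alignment; a 3D version would typically compute the rotation with an SVD instead. PCA would then be run on the aligned coordinates.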

    Studying dialects to understand human language

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (leaves 65-71). This thesis investigates the study of dialect variation as a way to understand how humans might process speech. It evaluates some of the important research in dialect identification and draws conclusions about the insights its results give into human speech processing. Dialects are clustered using k-means clustering. Self-organizing maps are proposed as a tool for dialect research, and a self-organizing map is implemented to test this proposal. Several areas for further research are identified, including how dialects are stored in the brain, more detailed descriptions of how dialects vary (including contextual effects), and more sophisticated visualization tools. Keywords: dialect, accent, identification, recognition, self-organizing maps, words, lexical sets, clustering. By Akua Afriyie Nti. M.Eng.
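
The abstract does not say which features were clustered. As a hedged illustration only, the sketch below runs Lloyd's k-means algorithm on hypothetical per-speaker feature vectors (for example, mean F1 and F2 of one vowel, in Hz); the function name `kmeans`, the data, and the first-k-points initialisation are assumptions for the example, not details of the thesis.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means. Centroids start at the first k points so results are deterministic."""
    centroids: List[Vector] = [tuple(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


# Hypothetical (F1, F2) means in Hz for six speakers.
speakers = [(300.0, 2200.0), (320.0, 2150.0), (700.0, 1200.0),
            (680.0, 1250.0), (310.0, 2180.0), (690.0, 1220.0)]
labels, centroids = kmeans(speakers, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Initialising from the first k points keeps the example reproducible, but it is sensitive to data order; random restarts or k-means++ seeding are common alternatives.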

    The Attentional Control of Reading: Insights from Behavior, Imaging and Development

    How the initially attention-demanding task of transforming scribbles into meaningful concepts eventually becomes effortless remains a central riddle of cognitive neuroscience. This body of work represents an effort to make progress on the question of how attentional control mediates reading, both by considering different stages of reading competence (development) and by seeking convergence between types of evidence (behavior and imaging). Inspired by a study published by Balota and colleagues in 2000, the paradigm used throughout this work compares a simple speeded reading task with a regularize ("sound out") task (Balota et al. 2000). In the first data chapter, I replicate the essential findings of the Balota et al. study in two young adult cohorts, confirming that stimulus characteristics, including lexicality and frequency, influence reading task performance in a manner that is modulated by top-down attentional control. I further argue that the reaction time (RT) patterns are consistent with two distinct mechanisms by which top-down attentional control interacts with reading processes: pathway control and response checking. I then present evidence, motivated by this two-mechanism hypothesis, that two sets of brain regions, including members of previously defined attentional control networks, show separable activity patterns that map well onto roles reflecting pathway control and response checking. In the second data chapter, I show that 8- to 10-year-old children, like young adults, can perform the regularize task. Unexpectedly, the early readers are faster than the experienced readers to regularize, and this speed advantage for children holds for both words and pseudowords. Because children are slower than adults across a range of cognitive tasks (e.g., Kail 1991) and show particular immaturity in inhibiting prepotent responses (e.g., Davidson et al. 2006), this developmental observation is remarkable in and of itself. Complemented by a set of post hoc analyses, the age-group differences can also be interpreted as additional support for the two-mechanism account of how attention and reading interact. Together, these results suggest that dissociable subcomponents of attentional control interact with subcomponents of reading processing, and that these interactions are dynamic across skill development and across task demands.