136 research outputs found
Acoustic articulatory evidence for quantal vowel categories: the features [low] and [back]
Thesis (Ph.D.)--Harvard-MIT Division of Health Sciences and Technology, 2009. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 139-142). In recent years, research in human speech communication has suggested that the inventory of sound units observed in vowels across languages is strongly influenced by the acoustic properties of the human subglottal system. That is, there is a discrete set of possible vowel features constrained by the interaction of the acoustic/articulatory properties of the vowels with a small set of attributes observed in the subglottal region. This thesis tests the hypothesis that subglottal resonances govern vowel feature boundaries for three populations: adult speakers of English, adult speakers of Korean, and children learning English. First, we explored the relations among F1 of vowels, the first subglottal resonance (SubF1), and the feature [low] in English. For the diphthong [??], F1 peaks showed an acoustic irregularity near the speaker's SubF1. For monophthongs, analysis of F1 frequency distributions showed a boundary between [+low] and [-low] vowels at the speakers' SubF1. Second, we studied the relations among F2 of Korean vowels, SubF2, and the feature [back], to test whether the relation between subglottal resonances and the feature boundary, demonstrated earlier for English, also applies to other languages. Results show that the F2 boundary between [back] and [front] vowels lay near SubF2 in Korean, as in English. Third, we explored the development of vowel formants in relation to subglottal resonances for 10 children aged 2;6-3;9 years, using the database of Imbrie (2005).
Results show that at the earlier ages, formant values deviated from the expected relations, but during the six-month period in which the measurements were made, there was considerable movement toward the expected values. The transition to the expected relations appeared to occur by the age of 3 years for most of these children, in a developmental pattern inconsistent with an account in terms of simple anatomical growth. These three sets of observations provide evidence that subglottal resonances play a role in defining vowel feature boundaries, as predicted by Stevens' (1972) hypothesis that contrastive phonological features in human languages have arisen from quantal discontinuities in articulatory-acoustic space. by Youngsook Jung. Ph.D.
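The feature assignment the thesis argues for can be sketched as a simple comparison of a vowel's F1 against the speaker's SubF1. This is a minimal illustration, not the thesis's analysis pipeline; the SubF1 and formant values below are assumed, illustrative numbers rather than measured data.

```python
# Sketch: a vowel is [+low] when its F1 lies above the speaker's first
# subglottal resonance (SubF1), [-low] otherwise. All values illustrative.

def low_feature(f1_hz, subf1_hz):
    """Assign the feature [low] by comparing F1 to SubF1."""
    return "+low" if f1_hz > subf1_hz else "-low"

SUBF1 = 600.0  # assumed SubF1 for an adult speaker, in Hz

# Illustrative vowel F1 values (Hz), not thesis data.
vowel_f1 = {"i": 280.0, "u": 310.0, "ae": 730.0, "a": 780.0}
labels = {v: low_feature(f1, SUBF1) for v, f1 in vowel_f1.items()}
```

Under these assumed values, the high vowels come out [-low] and the low vowels [+low], mirroring the boundary the thesis locates at SubF1.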
The role of lower airway resonances in defining vowel feature contrasts.
Thesis (Ph.D.)--Harvard-MIT Division of Health Sciences and Technology, 2006. Includes bibliographical references (p. 139-145). This electronic version was prepared by the author. The certified thesis is available in the Institute Archives and Special Collections. Ph.D.
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.
Automatic classification of consonant-vowel sequences based on subglottal resonances
In recent years, the international literature has begun to investigate subglottal resonances, the resonances of the lower airways, intensively. Earlier research has shown that they divide vowels into natural classes. In consonant-vowel sequences, the formant values of the vowel are not constant, owing to coarticulation. Stop consonants, for example, modify the formants of the neighboring vowel depending on their place of articulation. Comparing the second-formant value measured at the end of the consonant with that at the middle of the vowel yields the locus-equation space, in which the individual speech-sound classes appear separated according to their place of articulation. Our hypothesis is that subglottal resonances also contribute to the separation of these groups, similarly to the categorical division they induce among vowels. In the present study, based on a native speaker of Hungarian, we further investigate the position of consonant-vowel sequences in the locus-equation space and analyze the group-separating role of subglottal resonances. We also present an automatic classifier that groups consonant-vowel sequences based on the relation between the subglottal resonances and the second formant.
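The locus-equation analysis described above can be sketched as a least-squares regression of F2 at the consonant-vowel boundary on F2 at the vowel midpoint, with tokens grouped by whether the midpoint F2 lies above or below the speaker's SubF2. All names and numbers here are illustrative assumptions, not the study's data or classifier.

```python
# Sketch: locus equation F2_onset = slope * F2_mid + intercept, plus a
# front/back grouping of tokens relative to SubF2. Illustrative values only.

def locus_equation(f2_mid, f2_onset):
    """Least-squares slope/intercept of F2_onset = slope * F2_mid + intercept."""
    n = len(f2_mid)
    mx = sum(f2_mid) / n
    my = sum(f2_onset) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(f2_mid, f2_onset))
    sxx = sum((x - mx) ** 2 for x in f2_mid)
    slope = sxy / sxx
    return slope, my - slope * mx

def backness(f2_mid_hz, subf2_hz):
    """Group a CV token as front/back by vowel-midpoint F2 relative to SubF2."""
    return "front" if f2_mid_hz > subf2_hz else "back"

# Illustrative CV tokens lying exactly on a locus line F2_onset = 0.5*F2_mid + 600.
f2_mid = [800.0, 1200.0, 1600.0, 2000.0]
f2_onset = [0.5 * x + 600.0 for x in f2_mid]
slope, intercept = locus_equation(f2_mid, f2_onset)
```

In the locus-equation literature, the fitted slope indexes the degree of coarticulation; the grouping step is the part the abstract hypothesizes SubF2 contributes to.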
The phonetics of speech breathing: pauses, physiology, acoustics, and perception
Speech is made up of a continuous stream of speech sounds that is interrupted by pauses and breathing. As phoneticians are primarily interested in describing the segments of the speech stream, pauses and breathing are often neglected in phonetic studies, even though they are vital for speech. The present work contributes to a more detailed view of both pausing and speech breathing, with a special focus on the latter and the resulting breath noises, investigating their acoustic, physiological, and perceptual aspects. We present an overview of how a selection of corpora annotate pauses and pause-internal particles, as well as a recording setup that can be used for further studies on speech breathing. For pauses, this work emphasizes their optionality and variability under different tempos, as well as the temporal composition of silence and breath noise within breath pauses. For breath noises, we first focused on acoustic and physiological characteristics: we explored the alignment of the onsets and offsets of audible breath noises with the start and end of expansion of both the rib cage and the abdomen. Further, we found similarities between speech breath noises and the aspiration phases of /k/, and found that breath noises may be produced with a more open and slightly more fronted place of articulation than realizations of schwa. We found positive correlations between acoustic and physiological parameters, suggesting that when speakers inhaled faster, the resulting breath noises were more intense and produced more anteriorly in the mouth. Inspecting the entire spectrum of speech breath noises, we observed relatively flat spectra with several weak peaks. These peaks largely overlapped with resonances reported for inhalations produced with a central vocal tract configuration. We used 3D-printed vocal tract models representing four vowels and four fricatives to simulate in- and exhalations by reversing airflow direction.
We found that airflow direction had no general effect across all models; it mattered only for those with high-tongue configurations, as opposed to the more open ones. We then compared inhalations produced with the schwa model to human inhalations, in an attempt to approximate the vocal tract configuration used in speech breathing. There were some similarities; however, several complexities of human speech breathing not captured by the models complicated the comparison. In two perception studies, we investigated how much information listeners can extract from breath noises by ear. First, we tested the categorization of breath noises into six types based on airflow direction and airway usage, e.g. oral inhalation; around two thirds of all answers were correct. Second, we investigated how well breath noises can be used to discriminate between speakers and to extract coarse information on speaker characteristics such as age (old/young) and sex (female/male). Listeners were able to judge whether two breath noises came from the same or different speakers in around two thirds of all cases. Hearing a single breath noise, classification of sex was successful in around 64% of cases, while for age it was at 50%, suggesting that sex is more readily perceived than age from breath noises. Deutsche Forschungsgemeinschaft (DFG) – Projektnummer 418659027: "Pause-internal phonetic particles in speech communication"
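The spectral inspection described above, locating weak peaks in relatively flat breath-noise spectra, can be sketched as a magnitude spectrum plus a naive peak pick. This is a stdlib-only illustration on a synthetic signal, not the dissertation's analysis; a real study would use an FFT library and proper peak-prominence criteria.

```python
# Sketch: magnitude spectrum via a naive DFT, then report local maxima that
# rise above 10% of the spectrum's maximum. Synthetic signal, for illustration.
import math

def dft_magnitude(x):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2), stdlib only)."""
    n = len(x)
    return [abs(sum(x[t] * complex(math.cos(2 * math.pi * k * t / n),
                                   -math.sin(2 * math.pi * k * t / n))
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectral_peaks(x, sample_rate):
    """Frequencies (Hz) of strict local maxima above 10% of the spectral max."""
    mag = dft_magnitude(x)
    floor = 0.1 * max(mag)
    n = len(x)
    return [k * sample_rate / n
            for k in range(1, len(mag) - 1)
            if mag[k] > floor and mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]

# Synthetic "resonance" at 1500 Hz: 0.1 s of a sinusoid at 8 kHz sampling.
SR, N = 8000, 800
signal = [math.sin(2 * math.pi * 1500 * t / SR) for t in range(N)]
peaks = spectral_peaks(signal, SR)
```

On real breath noises the spectrum is noisy, so the amplitude floor (and ideally a prominence measure) matters far more than it does for this clean test tone.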
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2003), held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.
Speaker- and Age-Invariant Training for Child Acoustic Modeling Using Adversarial Multi-Task Learning
One of the major challenges in acoustic modelling of child speech is the
rapid change in children's articulators as they grow up, their differing
growth rates, and the resulting high variability within the same age group.
These large acoustic variations, along with the scarcity of child speech
corpora, have impeded the development of reliable speech recognition systems
for children. In this paper, a speaker- and age-invariant training approach
based on adversarial multi-task learning is proposed. The system consists of
one generator shared network that learns to generate speaker- and age-invariant
features connected to three discrimination networks, for phoneme, age, and
speaker. The generator network is trained to minimize the
phoneme-discrimination loss and maximize the speaker- and age-discrimination
losses in an adversarial multi-task learning fashion. The generator network is
a Time Delay Neural Network (TDNN) architecture while the three discriminators
are feed-forward networks. The system was applied to the OGI speech corpora and
achieved a 13% reduction in the WER of the ASR. Comment: Submitted to ICASSP202
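The adversarial training described above is commonly realized with a gradient-reversal layer between the shared generator and the speaker/age discriminators. The sketch below is a framework-free illustration of that mechanism only, not the paper's TDNN system; `lam` is an assumed trade-off weight.

```python
# Sketch: gradient reversal. Features pass through unchanged in the forward
# direction; gradients coming back from the speaker and age discriminators
# are negated (and scaled by lam), so gradient descent on the generator
# *increases* those discriminators' losses, yielding invariant features.

class GradientReversal:
    def __init__(self, lam=1.0):
        self.lam = lam  # adversarial trade-off weight (assumed hyperparameter)

    def forward(self, features):
        # Identity in the forward pass: discriminators see the raw features.
        return list(features)

    def backward(self, grad_from_discriminator):
        # Flip the sign and scale: the generator is pushed to *hurt* the
        # speaker/age discriminators while still minimizing the phoneme loss.
        return [-self.lam * g for g in grad_from_discriminator]

grl = GradientReversal(lam=0.5)
```

In a real toolkit this sits as a custom autograd op; only the phoneme branch's gradients reach the generator unreversed.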
Optimization and automation of relative fundamental frequency for objective assessment of vocal hyperfunction
The project objective is to improve clinical assessment and diagnosis of the voice disorder vocal hyperfunction (VH). VH is a condition characterized by excessive laryngeal and paralaryngeal tension, and is assumed to be the underlying cause of the majority of voice disorders. Current clinical assessment of VH is subjective and demonstrates poor inter-rater reliability. Recent work indicates that a new acoustic measure, relative fundamental frequency (RFF), is sensitive to the maladaptive functional behaviors associated with VH and can potentially be used to objectively characterize VH.
Here, we explored and enhanced the potential of RFF as a measure of VH in three ways. First, the current protocol for RFF estimation was optimized to simplify the recording procedure and reduce estimation time. Second, RFF was compared with the current state-of-the-art measures of VH: listener perception of vocal effort and the aerodynamic ratio of sound pressure level to subglottal pressure level. Third, an automated algorithm that utilized the optimized recording protocol was developed and validated against manual estimation methods and listener perception. This work enables large-scale studies on RFF to determine the specific physiological elements that contribute to the measure's ability to capture VH, and may provide a non-invasive and readily implemented solution for this long-standing clinical issue.
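RFF is conventionally computed as the f0 of each vocal cycle flanking a voiceless consonant, expressed in semitones relative to a steady-state reference cycle. The sketch below shows only that normalization step; the cycle f0 values are illustrative assumptions, not clinical data, and the full protocol (cycle selection, voicing offset/onset detection) is not modeled.

```python
# Sketch: RFF of one vocal cycle in semitones relative to a steady-state
# reference cycle: RFF = 12 * log2(f0_cycle / f0_ref). Illustrative values.
import math

def rff_semitones(cycle_f0_hz, ref_f0_hz):
    """RFF of one vocal cycle, in semitones re the steady-state reference."""
    return 12.0 * math.log2(cycle_f0_hz / ref_f0_hz)

# Ten voicing-offset cycles approaching a voiceless consonant (assumed data).
offset_f0 = [200.0, 199.0, 198.0, 197.0, 196.0,
             195.0, 193.0, 190.0, 186.0, 180.0]
ref = offset_f0[0]  # the cycle farthest from the consonant as reference
offset_rff = [rff_semitones(f, ref) for f in offset_f0]
```

The clinically interesting quantity is how sharply the offset cycles drop below zero semitones; the literature associates a reduced drop with the elevated laryngeal tension of VH.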
A novel framework for high-quality voice source analysis and synthesis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are among the most challenging areas of speech processing, owing to the fact that humans produce a wide range of voice source realizations and that voice source estimates commonly contain artifacts due to the non-linear, time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is Liljencrants-Fant's (LF) model, developed in late 1985. Because of its overly simplistic treatment of voice source dynamics, the LF model can represent neither the fine temporal structure of glottal flow derivative realizations nor sufficient spectral richness to enable truly natural-sounding speech synthesis. In this thesis we introduce Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model, we demonstrate that the proposed method preserves higher levels of speaker-dependent information from the voice source estimates and achieves a more natural-sounding speech synthesis. In general, we show that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis, across speakers and genders. We applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method achieves the desired perceptual effects, and the modified speech remains as natural-sounding and intelligible as natural speech. In this thesis, we also develop an optimal wavelet thresholding strategy for voice source signals which suppresses aspiration noise while retaining both the slow and the rapid variations in the voice source estimate.
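The wavelet-thresholding idea mentioned above can be illustrated with one level of a Haar transform: shrink the detail coefficients toward zero (soft thresholding), then invert. This is a minimal sketch of the mechanics only; the thesis's optimal strategy for voice source signals is more elaborate, and the Haar basis and threshold value here are assumptions.

```python
# Sketch: one-level Haar transform + soft thresholding of detail coefficients.
import math

def haar_forward(x):
    """One level of the orthonormal Haar transform (even-length input)."""
    s2 = math.sqrt(2.0)
    approx = [(a + b) / s2 for a, b in zip(x[::2], x[1::2])]
    detail = [(a - b) / s2 for a, b in zip(x[::2], x[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s2 = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s2, (a - d) / s2])
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (soft thresholding)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(x, t):
    """Threshold the detail band, keep the approximation band, reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))
```

With `t = 0` the transform reconstructs the input exactly; raising `t` removes small high-frequency detail (the aspiration-noise-like component) while the slow variations survive in the approximation band.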