19 research outputs found

    Compensating hyperarticulation for automatic speech recognition

    Identifying and describing prosodic domain interaction with duration and hyperarticulation

    Motivated by the ambiguities of prosodic constituency and prosodic domain interaction, this study asks whether pitch accent acts upon non-segmental features (specifically right-edge word boundaries), and whether right-edge word boundaries induce hyperarticulation in the preceding syllable. By looking at the duration of diphthongs in both word-initial and word-final positions, my research shows that pitch accent does indeed appear to hyperarticulate word boundaries, providing evidence of prosodic interactions across different phonological domains. Additionally, with few exceptions, the data collected in this study support the hypothesis that right-edge word boundaries do not hyperarticulate preceding diphthongs. These results contribute to current discourse regarding prosodic domain interactions. Finally, this work proposes and employs a method of measuring hyperarticulation in diphthongs, a process as yet unexplored, using first and second formant values.

    Articulatory features for conversational speech recognition

    Combining research methods for an experimental study of West Central Bavarian vowels in adults and children

    The overall goal of this thesis was to systematically measure the defining vowel characteristics of the West Central Bavarian (WCB) dialect for an acoustically based analysis of the Bavarian vowel system, and simultaneously to investigate to what extent these characteristics are being preserved across generations and whether a sound change in progress is observable in which younger speakers show more characteristics of Standard German (SG) than older speakers on some Bavarian vowel attributes. To address these aims, we made acoustic recordings of WCB-speaking adults and WCB-speaking primary school children, which were then compared with each other in an apparent-time analysis. For a more accurate view of changes in progress, we combined this apparent-time comparison with longitudinal data from the WCB children, obtained at annual intervals over three years. The acoustic data were supplemented by articulatory data from ultrasound recordings of a subset of the same WCB-speaking children at two timepoints one year apart. Analyses of the acoustic data revealed both adult/child and longitudinal changes in the direction of the standard in the children’s tendency towards a merger of two open vowels and a collapse of a long/short consonant contrast, neither of which exists in SG. There was some evidence that children, in comparison with adults, were beginning to develop both tensity and rounding contrasts, which occur in SG but not in WCB. There were no observed changes to the pattern of opening and closing diphthongs, which differ markedly between the two varieties. Also, no changes were observed in the WCB front vowel that resulted historically from /l/-vocalization, for which articulatory data from a subset of the children were related to the acoustic measures.
The general conclusion is that WCB change is most likely to occur as a consequence of exaggerating phonetic variation that already happens to be in the direction of the standard, and therefore internal factors motivated by general principles of vowel change might play a more decisive role in inducing a shift than external factors such as dialect contact.

    Three-dimensional point-cloud room model in room acoustics simulations

    Prototype modeling of vowel perception and production in a quantity language

    Vowel prototypes refer to the psychological memory representations of the best exemplars of a vowel category. This thesis examines the role of prototypes in the perception and production of Finnish short and long vowels. A comparison with German as a linguistically different language with a similar vowel system is also made. The thesis reports on a series of four experiments in which prototypes are examined by means of behavioral psychoacoustic measurements and compared with vowel productions in quiet and in noise. In the perception experiments, Finnish and German listeners were asked to identify and evaluate the goodness of synthesized vowels representing either the entire vowel space or selected subareas of the space. In the production experiments, only Finnish speakers were recruited, but earlier reported production data were used for the comparison of Finnish and German. The new concept of the weighted prototype (Pω) is introduced in Study I, and its usability in contrast to absolute prototypes (Pa) and category centroids (Pc) is examined in Study IV. Generally, the results support the finding that vowel categories are not homogenous in quality, but have an internal structure, and that there are significant quality differences between category members in terms of goodness ratings. The results of Studies I, II and III support the identity group interpretation of the Finnish quantity opposition by showing that the differences in the perceived quality and in the produced short and long vowels are not demonstrably dependent on the physical duration of the stimuli, although the production experiments in Studies I and III indicated that the short peripheral vowels, especially /u/ in Study III, are more centralized in the vowel space than the long vowels. 
On the basis of the results of Study II, the spectral and durational local effective vowel indicators of the initial auditory theory of vowel perception appear to be independent of each other, thus suggesting that the auditory vowel space (AVS) is orthogonal in terms of the measures used in the experiment. Furthermore, the reaction time results of Study II indicate that stimulus typicality in terms of vowel quantity affects the categorization process of quality but not its end result. The noise masking of production in Study III indicated that both of the noise types applied in the experiment, pink noise and babble noise, resulted in a prolongation of all vowel durations, as reported earlier for the Lombard effect. However, the noise masking did not affect the Euclidean distances between the short and long vowels, but caused a minor systematic drift in F1–F2 space in both vowel types. The minor differences suggest that prototypes act as articulatory targets in a fire-and-forget manner without the auditory feedback affecting the immediate articulation. The results concerning the different prototype measures indicated that the Pa and Pω differ significantly from the Pc, with the Pa being most peripheral. This gives some support to the adaptive dispersion effect in perception. The individual variations of the measures were normally distributed, with some exceptions for Pa in Finnish, and were, in terms of the coefficient of variation (CV), of the order of the difference limen (DL) of frequency. These results suggest that, for normally distributed prototypes, and especially for Pω, which showed the least variation, two thirds of the subjects detected the best category representatives from a subset of stimuli that lie within the limits of the DL of frequency from each other in the F1–F2 space. This finding can be regarded as strong evidence for prototype theories: in other words, the best category representatives play a role by acting as templates in vowel perception.
The listeners were able to recognize quality differences between and within vowel categories, but the majority of them ranked the best category exemplars from a subset of stimuli that were hardly distinguishable from each other. There were some minor differences in the vowel systems of Finnish and German as indicated by the different prototype measures: the absolute prototypes showed the largest differences between the languages in /e/, /ø/ and /u/. This is in line with the earlier investigations on produced vowels in Finnish and German. Generally, the vowel systems of these two linguistically unrelated languages were strikingly similar, especially in the light of the Pω measure. As presented in this thesis, the prototype approach provides a feasible tool for research, and the results lend support to the idea that speech comprehension on the auditory, phonetic, and even phonological processing levels is based on the memory representations of typical speech sounds of one’s native tongue, formed during the early language acquisition phase, and that these representations may be similar for the speakers and listeners of two different languages with comparable vowel systems.

    Diphthong Synthesis using the Three-Dimensional Dynamic Digital Waveguide Mesh

    The human voice is a complex and nuanced instrument, and despite many years of research, no system is yet capable of producing natural-sounding synthetic speech. This affects intelligibility for some groups of listeners, in applications such as automated announcements and screen readers. Furthermore, those who require a computer to speak, due to surgery or a degenerative disease, are limited to unnatural-sounding voices that lack expressive control and may not match the user's gender, age or accent. It is evident that natural, personalised and controllable synthetic speech systems are required. A three-dimensional digital waveguide model of the vocal tract, based on magnetic resonance imaging data, is proposed here in order to address these issues. The model uses a heterogeneous digital waveguide mesh method to represent the vocal tract airway and surrounding tissues, facilitating dynamic movement and hence speech output. The accuracy of the method is validated by comparison with audio recordings of natural speech, and perceptual tests are performed which confirm that the proposed model sounds significantly more natural than simpler digital waveguide mesh vocal tract models. Control of such a model is also considered, and a proof-of-concept study is presented using a deep neural network to control the parameters of a two-dimensional vocal tract model, resulting in intelligible speech output and paving the way for extension of the control system to the proposed three-dimensional vocal tract model. Future improvements to the system are also discussed in detail. This project considers both the naturalness and control issues associated with synthetic speech and therefore represents a significant step towards improved synthetic speech for use across society.

    Vowel nasalization in German
