3,300 research outputs found

    Segmental Duration Control Based on an Articulatory Model

    This paper proposes a new method that determines segmental duration for text-to-speech conversion based on the movement of the articulatory organs that compose an articulatory model. The articulatory model comprises four time-variable articulatory parameters representing the conditions of articulatory organs whose physical restriction seems to significantly influence the segmental duration. The parameters are controlled according to an input sequence of phonetic symbols, following which segmental duration is determined based on the variation of the articulatory parameters. The proposed method is evaluated through an experiment using a Japanese speech database that consists of 150 phonetically balanced sentences. The results indicate that the mean square error of predicted segmental duration is approximately 15 ms for the closed set and 15-17 ms for the open set. The error is within 20 ms, the level of acceptability for distortion of segmental duration without loss of naturalness, and hence the method is shown to predict segmental duration effectively.
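
    The evaluation described in this abstract compares predicted and observed segment durations against a 20 ms acceptability level. A minimal sketch of such a check follows, assuming the reported figure is a root-mean-square error (it is given in milliseconds). The function name and the duration values are hypothetical and are not taken from the paper.

```python
import math


def rms_error_ms(predicted_ms, observed_ms):
    """Root-mean-square error between two equal-length lists of durations (ms)."""
    if not predicted_ms or len(predicted_ms) != len(observed_ms):
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted_ms, observed_ms)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical segment durations in milliseconds (illustration only).
predicted = [80.0, 120.0, 60.0, 95.0]
observed = [70.0, 135.0, 65.0, 75.0]

ACCEPTABILITY_MS = 20.0  # threshold quoted in the abstract

rmse = rms_error_ms(predicted, observed)
within_threshold = rmse <= ACCEPTABILITY_MS
print(f"RMSE = {rmse:.1f} ms, within threshold: {within_threshold}")
```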

    Speech synthesis, Speech simulation and speech science

    Speech synthesis research has been transformed in recent years through the exploitation of speech corpora, both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology and the new techniques it brings call into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. The paper suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech. But at the same time, it leaves speech scientists free to return to their proper goal of building a computational model of human speech production.

    Pauses and the temporal structure of speech

    Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure, in particular the prolongation of final syllables of lexemes, as well as for the pausal structure in the utterance. In this chapter, a description of the temporal structure and a summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated.

    Speaking Rate Effects on Locus Equation Slope

    A locus equation describes a first-order regression fit to a scatter of vowel steady-state frequency values predicting vowel onset frequency values. Locus equation coefficients are often interpreted as indices of coarticulation. Speaking rate variations with a constant consonant–vowel form are thought to induce changes in the degree of coarticulation. In the current work, the hypothesis that locus slope is a transparent index of coarticulation is examined through the analysis of acoustic samples of large-scale, nearly continuous variations in speaking rate. Following the methodological conventions for locus equation derivation, data pooled across ten vowels yield locus equation slopes that are mostly consistent with the hypothesis that locus equations vary systematically with coarticulation. Comparable analyses between different four-vowel pools reveal variations in the locus slope range and changes in locus slope sensitivity to rate change. Analyses across rate but within vowels are substantially less consistent with the locus hypothesis. Taken together, these findings suggest that the practice of vowel pooling exerts a non-negligible influence on locus outcomes. Results are discussed within the context of articulatory accounts of locus equations and the effects of speaking rate change.
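
    The first-order regression named in this abstract can be sketched as an ordinary least-squares line, onset = slope * steady_state + intercept. The sketch below assumes the frequencies are second-formant (F2) values in Hz, which the abstract does not state. The function name and the data points are hypothetical and chosen only for illustration.

```python
def fit_locus_equation(steady_state_hz, onset_hz):
    """Least-squares fit of onset = slope * steady_state + intercept."""
    n = len(steady_state_hz)
    if n < 2 or n != len(onset_hz):
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(steady_state_hz) / n
    mean_y = sum(onset_hz) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in steady_state_hz)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(steady_state_hz, onset_hz))
    if sxx == 0:
        raise ValueError("steady-state values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical vowel steady-state and vowel onset frequencies (Hz).
steady_state = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
onset = [1200.0, 1400.0, 1600.0, 1800.0, 2000.0]

slope, intercept = fit_locus_equation(steady_state, onset)
print(f"locus slope = {slope:.2f}, intercept = {intercept:.0f} Hz")
```

    Under the interpretation the abstract examines, a slope nearer 1 would mean onsets track the vowel closely (more coarticulation) and a slope nearer 0 would mean onsets stay near a fixed locus. Fitting separately per vowel pool or per speaking rate amounts to calling the function on each subset.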

    Feedforward and feedback control in apraxia of speech: effects of noise masking on vowel production

    PURPOSE: This study was designed to test two hypotheses about apraxia of speech (AOS) derived from the Directions Into Velocities of Articulators (DIVA) model (Guenther et al., 2006): the feedforward system deficit hypothesis and the feedback system deficit hypothesis. METHOD: The authors used noise masking to minimize auditory feedback during speech. Six speakers with AOS and aphasia, 4 with aphasia without AOS, and 2 groups of speakers without impairment (younger and older adults) participated. Acoustic measures of vowel contrast, variability, and duration were analyzed. RESULTS: Younger, but not older, speakers without impairment showed significantly reduced vowel contrast with noise masking. Relative to older controls, the AOS group showed longer vowel durations overall (regardless of masking condition) and a greater reduction in vowel contrast under masking conditions. There were no significant differences in variability. Three of the 6 speakers with AOS demonstrated the group pattern. Speakers with aphasia without AOS did not differ from controls in contrast, duration, or variability. CONCLUSION: The greater reduction in vowel contrast with masking noise for the AOS group is consistent with the feedforward system deficit hypothesis but not with the feedback system deficit hypothesis; however, effects were small and not present in all individual speakers with AOS. Theoretical implications and alternative interpretations of these findings are discussed.
    R01 DC002852 - NIDCD NIH HHS; R01 DC007683 - NIDCD NIH HH

    Augev Method and an Innovative Use of Vocal Spectroscopy in Evaluating and Monitoring the Rehabilitation Path of Subjects Showing Severe Communication Pathologies

    A defining feature of developmental disorders (DS) is the total or partial impairment of verbal communication and, more generally, of social interaction. The method of Vocal-verb self-management (Augev) is a systemic, organicistic method that can intervene successfully in problems of verbal, spoken and written language development. This study intends to demonstrate that it is possible to objectify this progress through a spectrographic examination of vocal signals, which detects voice phonetic-acoustic parameters. This survey allows an objective evaluation of how effective an educational-rehabilitation intervention is. This study was performed on a population of 40 subjects (34 males and 6 females) diagnosed with developmental disorders (DS), specifically with autism spectrum disorder according to the DSM-5. The 40 subjects were treated in “la Comunicazione” centers, whose headquarters are near Bari, Brindisi and Rome. The results show a statistically significant correlation among the observed variables: supervisory status, attention, general dynamic coordination, understanding and execution of orders, performing simple unshielded rhythmic beats, word rhythm, oral praxies, phono-articulatory praxies, pronunciation of vowels, execution of graphemes, visual perception, acoustic perception, proprioceptive sensitivity, selective attention, short-term memory, segmental coordination, performance of simple rhythmic beatings, word rhythm, voice setting, intonation of sounds within a fifth, vowel pronunciation, consonant pronunciation, graphematic decoding, syllabic decoding, pronunciation of caudate syllables, coding of final syllable consonant, lexical decoding, phoneme-grapheme conversion, homographic grapheme decoding, homogeneous grapheme decoding, graphic stroke

    Defective neural motor speech mappings as a source for apraxia of speech : evidence from a quantitative neural model of speech processing

    This unique resource reviews research evidence pertaining to best practice in the clinical assessment of established areas such as intelligibility and physiological functioning, as well as introducing recently developed topics such as conversational analysis, participation measures, and telehealth. In addition, new and established research methods from areas such as phonetics, kinematics, imaging, and neural modeling are reviewed in relation to their applicability and value for the study of disordered speech. Based on the broad coverage of topics and methods, the textbook represents a valuable resource for a wide-ranging audience, including clinicians and researchers, as well as students with an interest in speech pathology and clinical phonetics.

    Phonetic variability and grammatical knowledge: an articulatory study of Korean place assimilation.

    The study reported here uses articulatory data to investigate Korean place assimilation of coronal stops followed by labial or velar stops, both within words and across words. The results show that this place-assimilation process is highly variable, both within and across speakers, and is also sensitive to factors such as the place of articulation of the following consonant, the presence of a word boundary and, to some extent, speech rate. Gestures affected by the process are generally reduced categorically (deleted), while sporadic gradient reduction of gestures is also observed. We further compare the results for coronals to our previous findings on the assimilation of labials, discussing implications of the results for grammatical models of phonological/phonetic competence. The results suggest that speakers’ language-particular knowledge of place assimilation has to be relatively detailed and context-sensitive, and has to encode systematic regularities about its obligatory/variable application as well as categorical/gradient realisation.

    Context-dependent articulation of consonant gemination in Estonian

    Creative Commons Attribution License (CC BY 4.0)
    The three-way quantity system is a well-known phonological feature of Estonian. In a number of studies it has been shown that quantity is realized in a disyllabic foot by the stressed-to-unstressed syllable rhyme duration ratio and also by pitch movement as the secondary cue. The stressed syllable rhyme duration is achieved by combining the length of the vowel and the coda consonant, which enables minimal septets of CVCV-sequences based on segmental duration. In this study we analyze articulatory (EMA) recordings from four native Estonian speakers producing all possible quantity combinations of intervocalic bilabial stops in two vocalic contexts (/ɑ-i/ vs. /i-ɑ/). The analysis shows that kinematic characteristics (gesture duration, spatial extent, and peak velocity) are primarily affected by quantity on the segmental level: Phonologically longer segments are produced by longer and larger lip closing gestures and, in reverse, shorter and smaller lip opening movements. Tongue transition gesture is consistently lengthened and slowed down by increasing consonant quantity. In general, both kinematic characteristics and intergestural coordination are influenced by non-linear interactions between segmental quantity levels as well as vocalic context.
    Peer reviewed

    Temporal markers of prosodic boundaries in children's speech production

    It is often thought that the ability to use prosodic features accurately is mastered in early childhood. However, research to date has produced conflicting evidence, notably about the development of children's ability to mark prosodic boundaries. This paper investigates (i) whether, by the age of eight, children use temporal boundary features in their speech in a systematic way, and (ii) to what extent adult listeners are able to interpret their production accurately and unambiguously. The material consists of minimal pairs of utterances: one utterance includes a compound noun, in which there is no prosodic boundary after the first noun, e.g. ‘coffee-cake and tea’, while the other utterance includes simple nouns, separated by a prosodic boundary, e.g. ‘coffee, cake and tea’. Ten eight-year-old children took part, and their productions were rated by 23 adult listeners. Two phonetic exponents of prosodic boundaries were analysed: pause duration and phrase-final lengthening. The results suggest that, at the age of eight, there is considerable variability among children in their ability to mark phrase boundaries of the kind analysed in the experiment, with some children failing to differentiate between the members of the minimal pairs reliably. The differences between the children in their use of boundary features were reflected in the adults' perceptual judgements. Both temporal cues to prosodic boundaries significantly affected the perceptual ratings, with pause being a more salient determinant of ratings than phrase-final lengthening.