3,300 research outputs found
Segmental Duration Control Based on an Articulatory Model
This paper proposes a new method that determines segmental duration for text-to-speech conversion based on the movement of articulatory organs which compose an articulatory model. The articulatory model comprises four time-variable articulatory parameters representing the conditions of articulatory organs whose physical restriction seems to significantly influence the segmental duration. The parameters are controlled according to an input sequence of phonetic symbols, following which segmental duration is determined based on the variation of the articulatory parameters. The proposed method is evaluated through an experiment using a Japanese speech database that consists of 150 phonetically balanced sentences. The results indicate that the mean square error of predicted segmental duration is approximately 15[ms] for the closed set and 15-17[ms] for the open set. The error is within 20[ms], the level of acceptability for distortion of segmental duration without loss of naturalness, and hence the method is shown to predict segmental duration effectively.
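The evaluation described in this abstract can be illustrated with a minimal sketch. It assumes that the reported error, given in milliseconds, is a root-mean-square error between predicted and observed segment durations; the abstract does not spell this out. The function names and the sample durations are hypothetical. Only the 20 ms acceptability level comes from the abstract.

```python
import math

# Acceptability level for duration distortion quoted in the abstract.
ACCEPTABLE_ERROR_MS = 20.0


def rms_error_ms(predicted_ms, observed_ms):
    """Root-mean-square error between two equal-length duration lists (ms)."""
    if len(predicted_ms) != len(observed_ms) or not predicted_ms:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted_ms, observed_ms)]
    return math.sqrt(sum(squared) / len(squared))


def within_acceptable_error(predicted_ms, observed_ms):
    """True if the RMS error stays within the 20 ms acceptability level."""
    return rms_error_ms(predicted_ms, observed_ms) <= ACCEPTABLE_ERROR_MS


# Hypothetical segment durations, not data from the paper.
predicted = [100.0, 80.0, 120.0]
observed = [110.0, 95.0, 100.0]
error = rms_error_ms(predicted, observed)
```

Under this reading, a prediction set counts as acceptable when `within_acceptable_error` returns `True`.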
Speech synthesis, Speech simulation and speech science
Speech synthesis research has been transformed in recent years through the exploitation of speech corpora - both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology and the new techniques it brings call into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. The paper suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech. But at the same time, it leaves speech scientists free to return to their proper goal of building a computational model of human speech production.
Pauses and the temporal structure of speech
Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure, in particular the prolongation of final syllables of lexemes, as well as for the pausal structure in the utterance. In this chapter, a description of the temporal structure and a summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated.
Speaking Rate Effects on Locus Equation Slope
A locus equation describes a first-order regression fit to a scatter of vowel steady-state frequency values predicting vowel onset frequency values. Locus equation coefficients are often interpreted as indices of coarticulation. Speaking rate variations with a constant consonant–vowel form are thought to induce changes in the degree of coarticulation. In the current work, the hypothesis that locus slope is a transparent index of coarticulation is examined through the analysis of acoustic samples of large-scale, nearly continuous variations in speaking rate. Following the methodological conventions for locus equation derivation, data pooled across ten vowels yield locus equation slopes that are mostly consistent with the hypothesis that locus equations vary systematically with coarticulation. Comparable analyses between different four-vowel pools reveal variations in the locus slope range and changes in locus slope sensitivity to rate change. Analyses across rate but within vowels are substantially less consistent with the locus hypothesis. Taken together, these findings suggest that the practice of vowel pooling exerts a non-negligible influence on locus outcomes. Results are discussed within the context of articulatory accounts of locus equations and the effects of speaking rate change.
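The first-order regression described in this abstract can be sketched as an ordinary least-squares line fit, with vowel steady-state frequency as the predictor and vowel onset frequency as the response. This is a minimal illustration only: the function name, variable names, and sample frequencies are hypothetical and are not taken from the study.

```python
from statistics import mean


def locus_equation(steady_state_hz, onset_hz):
    """Fit onset = slope * steady_state + intercept by least squares.

    Returns (slope, intercept). The slope is the coefficient that the
    abstract describes as commonly read as an index of coarticulation.
    """
    if len(steady_state_hz) != len(onset_hz) or len(steady_state_hz) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = mean(steady_state_hz)
    y_bar = mean(onset_hz)
    # Sum of squared deviations of the predictor.
    sxx = sum((x - x_bar) ** 2 for x in steady_state_hz)
    if sxx == 0:
        raise ValueError("steady-state values must not all be identical")
    # Sum of cross-deviations between predictor and response.
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(steady_state_hz, onset_hz))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical formant values in Hz; onsets are constructed to lie exactly
# on the line onset = 0.5 * steady_state + 900.
steady_state = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
onset = [1300.0, 1500.0, 1700.0, 1900.0, 2100.0]
slope, intercept = locus_equation(steady_state, onset)
```

Pooling tokens across vowels, as the abstract discusses, would correspond to passing the onset and steady-state values of all pooled vowels in a single call; a within-vowel analysis would call the function once per vowel.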
Feedforward and feedback control in apraxia of speech: effects of noise masking on vowel production
PURPOSE: This study was designed to test two hypotheses about apraxia of speech (AOS) derived from the Directions Into Velocities of Articulators (DIVA) model (Guenther et al., 2006): the feedforward system deficit hypothesis and the feedback system deficit hypothesis. METHOD: The authors used noise masking to minimize auditory feedback during speech. Six speakers with AOS and aphasia, 4 with aphasia without AOS, and 2 groups of speakers without impairment (younger and older adults) participated. Acoustic measures of vowel contrast, variability, and duration were analyzed. RESULTS: Younger, but not older, speakers without impairment showed significantly reduced vowel contrast with noise masking. Relative to older controls, the AOS group showed longer vowel durations overall (regardless of masking condition) and a greater reduction in vowel contrast under masking conditions. There were no significant differences in variability. Three of the 6 speakers with AOS demonstrated the group pattern. Speakers with aphasia without AOS did not differ from controls in contrast, duration, or variability. CONCLUSION: The greater reduction in vowel contrast with masking noise for the AOS group is consistent with the feedforward system deficit hypothesis but not with the feedback system deficit hypothesis; however, effects were small and not present in all individual speakers with AOS. Theoretical implications and alternative interpretations of these findings are discussed. R01 DC002852 - NIDCD NIH HHS; R01 DC007683 - NIDCD NIH HH
Augev Method and an Innovative Use of Vocal Spectroscopy in Evaluating and Monitoring the Rehabilitation Path of Subjects Showing Severe Communication Pathologies
A strongly connotative element of developmental disorders (DS) is the total or partial impairment of verbal communication and, more generally, of social interaction. The method of Vocal-verb self-management (Augev) is a systemic organicistic method able to intervene successfully in problems regarding verbal, spoken and written language development. This study intends to demonstrate that it is possible to objectify this progress through a spectrographic examination of vocal signals, which detects phonetic-acoustic parameters of the voice. This survey allows an objective evaluation of how effective an educational-rehabilitation intervention is. This study was performed on a population of 40 subjects (34 males and 6 females) diagnosed with developmental disorders (DS), specifically with a diagnosis of autism spectrum disorder according to the DSM-5. The 40 subjects were treated in “la Comunicazione” centers, whose headquarters are near Bari, Brindisi and Rome. The results show a statistically significant correlation among the observed variables: supervisory status, attention, general dynamic coordination, understanding and execution of orders, performing simple unshielded rhythmic beats, word rhythm, oral praxies, phono-articulatory praxies, pronunciation of vowels, execution of graphemes, visual perception, acoustic perception, proprioceptive sensitivity, selective attention, short-term memory, segmental coordination, performance of simple rhythmic beatings, word rhythm, voice setting, intonation of sounds within a fifth, vowel pronunciation, consonant pronunciation, graphematic decoding, syllabic decoding, pronunciation of caudate syllables, coding of final syllable consonant, lexical decoding, phoneme-grapheme conversion, homographic grapheme decoding, homogeneous grapheme decoding, graphic stroke
Defective neural motor speech mappings as a source for apraxia of speech : evidence from a quantitative neural model of speech processing
This unique resource reviews research evidence pertaining to best practice in the clinical assessment of established areas such as intelligibility and physiological functioning, as well as introducing recently developed topics such as conversational analysis, participation measures, and telehealth. In addition, new and established research methods from areas such as phonetics, kinematics, imaging, and neural modeling are reviewed in relation to their applicability and value for the study of disordered speech. Based on the broad coverage of topics and methods, the textbook represents a valuable resource for a wide-ranging audience, including clinicians and researchers, as well as students with an interest in speech pathology and clinical phonetics.
Phonetic variability and grammatical knowledge: an articulatory study of Korean place assimilation.
The study reported here uses articulatory data to investigate Korean place assimilation of coronal stops followed by labial or velar stops, both within words and across words. The results show that this place-assimilation process is highly variable, both within and across speakers, and is also sensitive to factors such as the place of articulation of the following consonant, the presence of a word boundary and, to some extent, speech rate. Gestures affected by the process are generally reduced categorically (deleted), while sporadic gradient reduction of gestures is also observed. We further compare the results for coronals to our previous findings on the assimilation of labials, discussing implications of the results for grammatical models of phonological/phonetic competence. The results suggest that speakers’ language-particular knowledge of place assimilation has to be relatively detailed and context-sensitive, and has to encode systematic regularities about its obligatory/variable application as well as categorical/gradient realisation.
Context-dependent articulation of consonant gemination in Estonian
Creative Commons Attribution License (CC BY 4.0). The three-way quantity system is a well-known phonological feature of Estonian. In a number of studies it has been shown that quantity is realized in a disyllabic foot by the stressed-to-unstressed syllable rhyme duration ratio and also by pitch movement as the secondary cue. The stressed syllable rhyme duration is achieved by combining the length of the vowel and the coda consonant, which enables minimal septets of CVCV-sequences based on segmental duration. In this study we analyze articulatory (EMA) recordings from four native Estonian speakers producing all possible quantity combinations of intervocalic bilabial stops in two vocalic contexts (/alpha-i/ vs. /i-alpha/). The analysis shows that kinematic characteristics (gesture duration, spatial extent, and peak velocity) are primarily affected by quantity on the segmental level: Phonologically longer segments are produced by longer and larger lip closing gestures and, in reverse, shorter and smaller lip opening movements. Tongue transition gesture is consistently lengthened and slowed down by increasing consonant quantity. In general, both kinematic characteristics and intergestural coordination are influenced by non-linear interactions between segmental quantity levels as well as vocalic context. Peer reviewed.
Temporal markers of prosodic boundaries in children's speech production
It is often thought that the ability to use prosodic features accurately is mastered in early childhood. However, research to date has produced conflicting evidence, notably about the development of children's ability to mark prosodic boundaries. This paper investigates (i) whether, by the age of eight, children use temporal boundary features in their speech in a systematic way, and (ii) to what extent adult listeners are able to interpret their production accurately and unambiguously. The material consists of minimal pairs of utterances: one utterance includes a compound noun, in which there is no prosodic boundary after the first noun, e.g. ‘coffee-cake and tea’, while the other utterance includes simple nouns, separated by a prosodic boundary, e.g. ‘coffee, cake and tea’. Ten eight-year-old children took part, and their productions were rated by 23 adult listeners. Two phonetic exponents of prosodic boundaries were analysed: pause duration and phrase-final lengthening. The results suggest that, at the age of eight, there is considerable variability among children in their ability to mark phrase boundaries of the kind analysed in the experiment, with some children failing to differentiate between the members of the minimal pairs reliably. The differences between the children in their use of boundary features were reflected in the adults' perceptual judgements. Both temporal cues to prosodic boundaries significantly affected the perceptual ratings, with pause being a more salient determinant of ratings than phrase-final lengthening.