2,688 research outputs found

    A Timing Model for Fast French

    Get PDF
    Models of speech timing are of both fundamental and applied interest. At the fundamental level, the prediction of time periods occupied by syllables and segments is required for general models of speech prosody and segmental structure. At the applied level, complete models of timing are an essential component of any speech synthesis system. Previous research has established that a large number of factors influence various levels of speech timing. Statistical analysis and modelling can identify order of importance and mutual influences between such factors. In the present study, a three-tiered model was created by a modified step-wise statistical procedure. It predicts the temporal structure of French, as produced by a single, highly fluent speaker at a fast speech rate (100 phonologically balanced sentences, hand-scored in the acoustic signal). The first tier models segmental influences due to phoneme type and contextual interactions between phoneme types. The second tier models syllable-level influences of lexical vs. grammatical status of the containing word, presence of schwa and the position within the word. The third tier models utterance-final lengthening. The complete segmental-syllabic model correlated with the original corpus of 1204 syllables at an overall r = 0.846. Residuals were normally distributed. An examination of subsets of the data set revealed some variation in the closeness of fit of the model. The results are considered to be useful for an initial timing model, particularly in a speech synthesis context. However, further research is required to extend the model to other speech rates and to examine inter-speaker variability in greater detail

    Revisiting the Status of Speech Rhythm

    Get PDF
    Text-to-Speech synthesis offers an interesting manner of synthesising various knowledge components related to speech production. To a certain extent, it provides a new way of testing the coherence of our understanding of speech production in a highly systematic manner. For example, speech rhythm and temporal organisation of speech have to be well-captured in order to mimic a speaker correctly. The simulation approach used in our laboratory for two languages supports our original hypothesis of multidimensionality and non-linearity in the production of speech rhythm. This paper presents an overview of our approach towards this issue, as it has been developed over the last years. We conceive the production of speech rhythm as a multidimensional task, and the temporal organisation of speech as a key component of this task (i.e., the establishment of temporal boundaries and durations). As a result of this multidimensionality, text-to-speech systems have to accommodate a number of systematic transformations and computations at various levels. Our model of the temporal organisation of read speech in French and German emerges from a combination of quantitative and qualitative parameters, organised according to psycholinguistic and linguistic structures. (An ideal speech synthesiser would also take into account subphonemic as well as pragmatic parameters. However such systems are not yet available)

    Pauses and the temporal structure of speech

    Get PDF
    Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure and in particuliar the prolongation of final syllables of lexemes as well as for the pausal structure in the utterance. In this chapter, a description of the temporal structure and the summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated

    Fast and Slow Speech Rate: a Characterisation for French

    Get PDF
    This paper is concerned with the evaluation of speech rate in French. Usually, this dynamic parameter is described as a unidimensional quantitative dimension. It is shown that the slowing down of speech has also major qualitative effects that must be taken into account. The theory on slowing down speech is thus revised

    Temporal Variability and Stability in Infant-Directed Sung Speech: Evidence for Language-specific Patterns.

    Get PDF
    In this paper, sung speech is used as a methodological tool to explore temporal variability in the timing of word-internal consonants and vowels. It is hypothesized that temporal variability/stability becomes clearer under the varying rhythmical conditions induced by song. This is explored crosslinguistically in German – a language that exhibits a potential vocalic quantity distinction – and the non-quantity languages French and Russian. Songs by non-professional singers, i.e. parents that sang to their infants aged 2 to 13 months in a non-laboratory setting, were recorded and analyzed. Vowel and consonant durations at syllable contacts of trochaic word types with ¦CVCV or ¦CVːCV structure were measured under varying rhythmical conditions. Evidence is provided that in German non-professional singing, the two syllable structures can be differentiated by two distinct temporal variability patterns: vocalic variability (and consonantal stability) was found to be dominant in ¦CVːCV structures whereas consonantal variability (and vocalic stability) was characteristic for ¦CVCV structures. In French and Russian, however, only vocalic variability seemed to apply. Additionally, findings suggest that the different temporal patterns found in German were also supported by the stability pattern at the tonal level. These results point to subtle (supra) segmental timing mechanisms in sung speech that affect temporal targets according to the specific prosodic nature of the language in question

    Comparing timing models of two Swiss German dialects

    Get PDF
    Research on dialectal varieties was for a long time concentrated on phonetic aspects of language. While there was a lot of work done on segmental aspects, suprasegmentals remained unexploited until the last few years, despite the fact that prosody was remarked as a salient aspect of dialectal variants by linguists and by naive speakers. Actual research on dialectal prosody in the German speaking area often deals with discourse analytic methods, correlating intonations curves with communicative functions (P. Auer et al. 2000, P. Gilles & R. Schrambke 2000, R. Kehrein & S. Rabanus 2001). The project I present here has another focus. It looks at general prosodic aspects, abstracted from actual situations. These global structures are modelled and integrated in a speech synthesis system. Today, mostly intonation is being investigated. However, rhythm, the temporal organisation of speech, is not a core of actual research on prosody. But there is evidence that temporal organisation is one of the main structuring elements of speech (B. Zellner 1998, B. Zellner Keller 2002). Following this approach developed for speech synthesis, I will present the modelling of the timing of two Swiss German dialects (Bernese and Zurich dialect) that are considered quite different on the prosodic level. These models are part of the project on the "development of basic knowledge for research on Swiss German prosody by means of speech synthesis modelling" founded by the Swiss National Science Foundation

    Language identification with suprasegmental cues: A study based on speech resynthesis

    Get PDF
    This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1), rhythm and intonation (Condition 2), intonation only (Condition 3), or rhythm only (Condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the new methodology proposed appears to be well-suited to study language discrimination. Applications for other domains of psycholinguistic research and for automatic language identification are considered

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Get PDF
    International audienceSpeech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output
    corecore