    A Timing Model for Fast French

    Models of speech timing are of both fundamental and applied interest. At the fundamental level, predicting the time occupied by syllables and segments is required for general models of speech prosody and segmental structure. At the applied level, a complete timing model is an essential component of any speech synthesis system. Previous research has established that a large number of factors influence speech timing at various levels. Statistical analysis and modelling can identify the relative importance of these factors and their mutual influences. In the present study, a three-tiered model was created by a modified stepwise statistical procedure. It predicts the temporal structure of French as produced by a single, highly fluent speaker at a fast speech rate (100 phonologically balanced sentences, hand-scored in the acoustic signal). The first tier models segmental influences due to phoneme type and contextual interactions between phoneme types. The second tier models syllable-level influences: the lexical vs. grammatical status of the containing word, the presence of schwa, and the position within the word. The third tier models utterance-final lengthening. The complete segmental-syllabic model correlated with the original corpus of 1204 syllables at an overall r = 0.846, and its residuals were normally distributed. An examination of subsets of the data set revealed some variation in how closely the model fit. The results are considered useful for an initial timing model, particularly in a speech synthesis context. However, further research is required to extend the model to other speech rates and to examine inter-speaker variability in greater detail.
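    A minimal sketch of the fit statistic reported above, assuming the model outputs one predicted duration per syllable that is paired with the hand-measured duration: Pearson's r between the two sequences, plus the residuals (observed minus predicted) whose distribution the abstract describes. The function name and the five durations are hypothetical and for illustration only; they are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed durations (e.g. in ms)."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and both variances share the same 1/n factor, so it cancels out.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)


# Made-up illustrative syllable durations in ms, NOT data from the study.
predicted_ms = [112.0, 95.0, 140.0, 88.0, 160.0]
observed_ms = [120.0, 90.0, 131.0, 95.0, 171.0]

# Residuals: observed minus predicted duration for each syllable.
residuals = [o - p for p, o in zip(predicted_ms, observed_ms)]

print(round(pearson_r(predicted_ms, observed_ms), 3))
print(residuals)
```

    In a real evaluation the normality of the residuals would then be checked with a histogram or a formal test; that step is omitted here.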

    Developing the modelling of Swedish prosody in spontaneous dialogue

    The main goal of our current research is the development of the Swedish prosody model. In our analysis of discourse and dialogue intonation we are exploiting model-based resynthesis. By comparing synthesized default and fine-tuned pitch contours for the dialogues under study, we are able to isolate relevant intonation patterns. This analysis of intonation is related to an independent modelling of topic structure consisting of lexical-semantic analysis and text segmentation. Some results from our model-based acoustic analysis are presented, and the implementation in text-to-speech synthesis is discussed.

    Removing micromelody from fundamental frequency contours

    In this paper we describe a new method for reducing microprosodic components of fundamental frequency (F0) contours by applying weight functions linked to microprosodically classified phone combinations. For vowel segments in obstruent environments, our algorithm outperforms standard smoothing algorithms such as moving-average filtering, Savitzky-Golay filtering or MOMEL at reducing F0 variation related to microprosodic factors, while retaining significant differences related to macroprosody.
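    The abstract does not spell out the weight functions, so the following is only a sketch under stated assumptions: a centred moving average (one of the baselines named above) in which each frame's contribution is scaled by a per-frame weight. Giving low weights to frames assumed to be perturbed by microprosody, for example at an obstruent-vowel transition, is a hypothetical stand-in for the authors' classification of phone combinations; `weighted_moving_average` and its parameters are illustrative names, not the paper's method.

```python
from typing import Sequence


def weighted_moving_average(
    f0: Sequence[float], weights: Sequence[float], half_window: int = 2
) -> list[float]:
    """Smooth an F0 contour (Hz) with a centred, per-frame-weighted moving average.

    Hypothetical illustration: frames with weight near 0 (assumed microprosodic
    perturbation) pull the smoothed value less than frames with weight 1.
    With all weights equal to 1 this reduces to a plain moving average.
    """
    if len(f0) != len(weights):
        raise ValueError("f0 and weights must have the same length")
    n = len(f0)
    smoothed = []
    for i in range(n):
        # Window is truncated at the contour edges.
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        num = sum(f0[j] * weights[j] for j in range(lo, hi))
        den = sum(weights[j] for j in range(lo, hi))
        # Fall back to the raw value if every weight in the window is zero.
        smoothed.append(num / den if den > 0 else f0[i])
    return smoothed


# Hypothetical contour with a local F0 spike at frame 2.
contour = [100.0, 100.0, 150.0, 100.0, 100.0]
plain = weighted_moving_average(contour, [1.0] * 5, half_window=1)
weighted = weighted_moving_average(contour, [1.0, 1.0, 0.0, 1.0, 1.0], half_window=1)
print(plain)
print(weighted)
```

    The plain average only dilutes the spike, while zero-weighting the perturbed frame removes it; the open question the paper addresses is how to assign such weights without also flattening macroprosodic movements.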

    Prediction of intonation patterns of accented words in a corpus of read Swedish news

    This paper describes an initial attempt at constructing a data-driven model of Swedish intonation. The study is mainly concerned with model building and with predicting the intonation patterns of accented words in a corpus of read Swedish news. Pitch information is extracted by stylizing the pitch contours. This information is used to build a model that predicts pitch patterns from linguistic features such as accent type and position of stress. The model is tested against unseen data from the same corpus, and the evaluation is done by numerical comparison: the RMSE between predicted and original contours for the different categories ranges from 3.7 to 31.4 Hz. The results are promising for future studies.
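    The RMSE figure above can be read as the square root of the mean squared difference, RMSE = sqrt((1/N) Σ (p_i − o_i)²), where p_i and o_i are the predicted and original F0 values in Hz. A minimal sketch, assuming both contours are sampled at the same N time points (the abstract does not state how contours are aligned); the function name and the four-point contours are hypothetical.

```python
import math
from typing import Sequence


def rmse_hz(predicted: Sequence[float], original: Sequence[float]) -> float:
    """Root-mean-square error between two F0 contours in Hz.

    Assumes both contours are sampled at the same time points.
    """
    if len(predicted) != len(original) or not predicted:
        raise ValueError("contours must be non-empty and equally long")
    squared = [(p - o) ** 2 for p, o in zip(predicted, original)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 4-point contours, not values from the corpus.
print(rmse_hz([180.0, 200.0, 190.0, 170.0], [176.0, 206.0, 187.0, 168.0]))
```

    Because errors are squared before averaging, a few large deviations (for example on an accent peak) raise the RMSE more than many small ones.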

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Towards Hierarchical Prosodic Prominence Generation in TTS Synthesis

    We address the problem of identifying pitch accents from text and generating them in HMM-based English TTS synthesis. We show, through a large-scale perceptual test, that a large improvement in the binary discrimination between pitch-accented and non-accented words has no effect on the quality of the speech generated by the system. On the other hand, adding a third accent type that emphatically marks words conveying "contrastive" focus (automatically identified from text) has beneficial effects on the synthesized speech. These results support accounts of prosodic prominence that treat the prosodic patterns of utterances as hierarchically structured, and they point out the limits of flattening such structure into a simple accent/non-accent distinction. Index Terms: speech synthesis, HMM, pitch accents, focus detection