407 research outputs found

    Automatic prosodic analysis for computer aided pronunciation teaching

    Get PDF
    Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech..

    A Phonetic model of English intonation

    Get PDF
    This thesis proposes a phonetic model of English intonation which is a system for linking the phonological and F₀, descriptions of an utterance.It is argued that such a model should take the form of a rigorously defined formal system which does not require any human intuition or expertise to operate. It is also argued that this model should be capable of both analysis (F₀ to phonology) and synthesis (phonology to F₀). Existing phonetic models are reviewed and it is shown that none meet the specification for the type of formal model required.A new phonetic model is presented that has three levels of description: the F₀ level, the intermediate level and the phonological level. The intermediate level uses the three basic elements of rise,fall and connection to model F₀ contours. A mathematical equation is specified for each of these elements so that a continuous lb contour can be created from a sequence of elements. The phonological system uses H and L to describe high and low pitch accents, C to describe connection elements and B to describe the rises that occur at phrase boundaries. A fully specified grammar is described which links the intermediate and F₀ levels. A grammar is specified for linking the phonological and intermediate levels, but this is only partly complete due to problems with the phonological level of description.A computer implementation of the model is described. Most of the implementation work concentrated on the relationship between the intermediate level and the F₀ level. Results are given showing that the computer analysis system labels F₀ contours quite accurately, but is significantly worse than a human labeller. It is shown that the synthesis system produces artificial F₀ contours that are very similar to naturally occurring F₀ contoursThe thesis concludes with some indications of further work and ideas on how the computer implementation of the model could be of practical benefit in speech synthesis and recognition

    Automatisation of intonation modelling and its linguistic anchoring

    Get PDF
    This paper presents a fully machine-driven approach for intonation description and its linguistic interpretation. For this purpose,a new intonation model for bottom-up F0 contour analysis and synthesis is introduced, the CoPaSul model which is designed in the tradition of parametric, contour-based, and superpositional approaches. Intonation is represented by a superposition of global and local contour classes that are derived from F0 parameterisation. These classes were linguistically anchored with respect to information status by aligning them with a text which had been coarsely analysed for this purpose by means of NLP techniques. To test the adequacy of this data-driven interpretation a perception experiment was carried out, which confirmed 80% of the findings

    Modelling prosodic and dialogue information for automatic speech recognition

    Get PDF

    Tagging Prosody and Discourse Structure in Elicited Spontaneous Speech

    Get PDF
    This paper motivates and describes the annotation and analysis of prosody and discourse structure for several large spoken language corpora. The annotation schema are of two types: tags for prosody and intonation, and tags for several aspects of discourse structure. The choice of the particular tagging schema in each domain is based in large part on the insights they provide in corpus-based studies of the relationship between discourse structure and the accenting of referring expressions in American English. We first describe these results and show that the same models account for the accenting of pronouns in an extended passage from one of the Speech Warehouse hotel-booking dialogues. We then turn to corpora described in Venditti [Ven00], which adapts the same models to Tokyo Japanese. Japanese is interesting to compare to English, because accent is lexically specified and so cannot mark discourse focus in the same way. Analyses of these corpora show that local pitch range expansion serves the analogous focusing function in Japanese. The paper concludes with a section describing several outstanding questions in the annotation of Japanese intonation which corpus studies can help to resolve.Work reported in this paper was supported in part by a grant from the Ohio State University Office of Research, to Mary E. Beckman and co-principal investigators on the OSU Speech Warehouse project, and by an Ohio State University Presidential Fellowship to Jennifer J. Venditti

    Phonetics of segmental FO and machine recognition of Korean speech

    Get PDF

    An acoustic-phonetic approach in automatic Arabic speech recognition

    Get PDF
    In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. The resultant sequence of labels are used for lexical access to retrieve the word or a small set of words sharing the same broad phonetic labelling. For the case of having more than one word-candidates, a verification procedure is used to choose the most likely one

    Recognizing intonational patterns in English speech

    Get PDF
    Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 44-47).by Erin Marie Panttaja.S.B.and M.Eng
    • …
    corecore