185 research outputs found

    Automatic prosodic analysis for computer aided pronunciation teaching

    Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech...

    On the Use of Wavelets and Cepstrum Excitation for Pitch Determination in Real-Time

    In this paper, we propose a new pitch tracking technique based on a wavelet transform in the temporal domain. Our algorithm is designed to determine the pitch frequency of the speech signal using a simple voicing decision algorithm. The pitch period is extracted from the cepstrum excitation signal processed by a wavelet transform; the pitch contour is then refined by thresholding and correction algorithms without any post-processing. The results obtained show that the proposed algorithm provides very good pitch contours compared to those furnished by the Bagshaw database.
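
As background for the abstract above, the following is a minimal sketch of plain cepstral pitch estimation: the pitch period is taken as the quefrency of the largest cepstral peak within a plausible range. It is not the wavelet-refined method the paper proposes, and it includes no voicing decision. The function names, the 60 to 400 Hz search range, the spectral floor, and the synthetic impulse-train input are all assumptions made for illustration.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform; adequate for one short frame."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-9):
    """Real cepstrum of a frame: inverse DFT of the log-magnitude spectrum."""
    n = len(frame)
    # Periodic Hamming window (an assumed choice) to reduce edge effects.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / n))
        for i, s in enumerate(frame)
    ]
    # The small floor (assumed value) avoids log(0) on empty spectral bins.
    log_mag = [math.log(abs(c) + floor) for c in dft(windowed)]
    # The log-magnitude spectrum of a real signal is real and even, so a
    # forward DFT scaled by 1/n equals the inverse DFT.
    return [c.real / n for c in dft(log_mag)]


def cepstral_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate F0 in Hz from the largest cepstral peak between fmin and fmax."""
    ceps = real_cepstrum(frame)
    q_min = int(sample_rate / fmax)
    q_max = min(int(sample_rate / fmin), len(frame) // 2)
    peak = max(range(q_min, q_max + 1), key=lambda q: ceps[q])
    return sample_rate / peak


def impulse_train(period, n):
    """Synthetic glottal-like excitation: one unit impulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


if __name__ == "__main__":
    # 400 samples at 8 kHz with a 40-sample period, i.e. a 200 Hz excitation.
    print(cepstral_pitch(impulse_train(40, 400), 8000))
```

A real system would run this frame by frame, add a voicing decision, and use an FFT rather than the naive transform shown here.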

    Using Deep Neural Networks for Smoothing Pitch Profiles in Connected Speech

    This paper presents a new pitch tracking smoother based on deep neural networks (DNN). It leverages Long Short-Term Memory networks, a particular kind of recurrent neural network, to correct pitch detection errors produced by state-of-the-art Pitch Detection Algorithms. The proposed system has been extensively tested using two reference benchmarks for English and performed very well in correcting the outputs of pitch detection algorithms when compared with the gold standard obtained with laryngographs.

    Exploring Speech Technologies for Language Learning

    The teaching of the pronunciation of any foreign language must encompass both segmental and suprasegmental aspects of speech. In computational terms, the two levels of language learning activities can be decomposed at least into phonemic aspects, which include the correct pronunciation of single phonemes and the co-articulation of phonemes into higher phonological units, as well as prosodic aspects, which include: (i) the correct position of stress at word level; (ii) the alternation of stressed and unstressed syllables in terms of compensation and vowel reduction; (iii) the correct position of sentence accent; (iv) the generation of adequate rhythm from the interleaving of stress, accent, and phonological rules; (v) the generation of an adequate intonational pattern for each utterance, related to communicative functions. As appears from the above, for a student to communicate intelligibly and as close as possible to a native speaker's pronunciation, prosody is very important [3]. We also assume that incorrect prosody may hamper communication, and this may be regarded as a strong motivation for making the teaching of prosody an integral part of any language course. From our point of view it is much more important to stress the achievement of successful communication as the main objective of a second language learner than the overcoming of what has been termed “foreign accent”, which can be deemed a secondary goal. In any case, the two goals are certainly not coincident, even though they may overlap in some cases. We will discuss these matters in the following sections. All prosodic questions related to “rhythm” will be discussed in the first section of this chapter. In [4] the author argues in favour of prosodic aids, in particular because a strong placement of word stress may impair the listener's understanding of the word being pronounced. He also argues in favour of acquiring correct timing of phonological units to overcome the impression of “foreign accent”, which may ensue from an incorrect distribution of stressed vs. unstressed stretches of linguistic units such as syllables or metric feet. Timing is not to be confused with speaking rate, which need not be forcibly increased to give the impression of good fluency: trying to increase speaking rate may result in lower intelligibility. The question of “foreign accent” is also discussed at length in (Jilka, 1999). This work is particularly relevant to the intonational features of a second language learner, which we will address in the second section of this chapter. Correcting the Intonational Foreign Accent (henceforth IFA) is an important component of a Prosodic Module for self-learning activities, as categorical aspects of the intonation of the two languages in contact, L1 and L2, are far apart and thus neatly distinguishable. The choice of the two languages in contact is determined mainly by the fact that the distance in prosodic terms between English and Italian is maximal (Ramus and Mehler, 1999; Ramus et al., 1999).

    Prosodic tools for language learning

    In this paper we will be concerned with the role played by prosody in language learning and with the speech technology already available, as commercial product or as prototype, that is capable of helping language learners improve their knowledge of a second language from the prosodic point of view. The paper is divided into two sections: Section One deals with Rhythm and all related topics; Section Two deals with Intonation. In the Introduction we will argue that the use of ASR (Automatic Speech Recognition) as a teaching aid should be under-utilized and should be targeted at narrowly focused spoken exercises, disallowing open-ended dialogues, in order to ensure consistency of evaluation. Finally, we will support the combined use of ASR technology and prosodic tools to produce GOP usable for linguistically consistent and adequate feedback to the student. This will be illustrated by presenting the state of the art for both sections, with systems well documented in the scientific literature of the respective field. In order to discuss the scientific foundations of prosodic analysis we will present data related to English and Italian and make comparisons to clarify the issues at hand. In this context, we will also present the Prosodic Module of a courseware for computer-assisted foreign language learning called SLIM, an acronym for Multimedia Interactive Linguistic Software, developed at the University of Venice (Delmonte et al. in Convegno GFS-AIA, pp. 47–58, 1996a; Ed-Media 96, AACE, pp. 326–333, 1996b). The Prosodic Module has been created to improve a student's performance in both the perception and the production of prosodic aspects of spoken language activities. It is composed of two different sets of Learning Activities: the first deals with phonetic and prosodic problems at word level and at syllable level; the second deals with prosodic aspects at the phonological phrase and utterance suprasegmental level. The main goal of Prosodic Activities is to ensure consistent and pedagogically sound feedback to students intending to improve their pronunciation in a foreign language.

    Directions for the future of technology in pronunciation research and teaching

    This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2) pronunciation research and teaching should be enhanced comprehensibility and intelligibility as opposed to native-likeness. Three main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research with a special focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider how these insights pave the way for researchers and developers working to create research-informed, computer-assisted pronunciation teaching resources. We conclude with predictions for future developments.

    Speech fundamental frequency tracking with efficient voicing detection (Beszéd alapfrekvencia követés hatékony zöngésség detektálással)

    Algorithms that determine the fundamental frequency of the speech signal, also known as pitch detectors, can work correctly only if the automatic voiced/unvoiced distinction is also reliable. Below we describe our pitch detector, in which voicing detection operates with a lower error rate than competing methods. Our algorithm is based on the well-known autocorrelation method. We evaluated the voicing detection performance of our algorithm on a database in which a laryngograph signal was recorded synchronously with the speech.
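
The autocorrelation approach with a voicing decision, as mentioned in the abstract above, can be sketched roughly as follows. This is a generic textbook-style illustration, not the authors' detector: the function names, the 60 to 400 Hz search range, and the 0.5 voicing threshold on the normalised autocorrelation peak are assumptions chosen for the example.

```python
import math


def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, voicing_threshold=0.5):
    """Return an F0 estimate in Hz, or None if the frame is judged unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                        # silence: nothing to track

    lag_min = int(sample_rate / fmax)      # shortest period considered
    lag_max = min(int(sample_rate / fmin), n - 1)

    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r

    # Voicing decision (assumed rule): a weak peak means no clear periodicity.
    if best_r < voicing_threshold:
        return None
    return sample_rate / best_lag


def sine_frame(freq, sample_rate, n):
    """Synthetic test frame: a pure sine of the given frequency."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    # 50 ms of a 200 Hz tone sampled at 8 kHz.
    print(autocorr_pitch(sine_frame(200.0, 8000, 400), 8000))
    # An all-zero frame is reported as unvoiced.
    print(autocorr_pitch([0.0] * 400, 8000))
```

Evaluating such a detector against a laryngograph reference, as the abstract describes, would mean comparing the voiced/unvoiced decision and the F0 value frame by frame with the reference track.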
