
    The effect of speech rhythm and speaking rate on assessment of pronunciation in a second language

    Published online: 24 April 2019

    The study explores the effect of deviations from native speech rhythm and rate norms on the assessment of pronunciation mastery of a second language (L2) when the native language of the learner is either rhythmically similar to or different from the target language. Using the concatenative speech synthesis technique, different versions of the same sentence were created in order to produce segmentally and intonationally identical utterances that differed only in rhythmic patterns and/or speaking rate. Speech rhythm and tempo patterns modeled those from the speech of French or German native learners of English at different proficiency levels. Native British English speakers rated the original sentences and the synthesized utterances for accentedness. The analysis shows that (a) differences in speech rhythm and speaking tempo influence the perception of accentedness; (b) idiosyncratic differences in speech rhythm and speech rate are sufficient to differentiate between the proficiency levels of L2 learners; (c) the relative salience of rhythm and rate on perceived accentedness in L2 speech is modulated by the native language of the learners; and (d) intonation facilitates the perception of finer differences in speech rhythm between otherwise identical utterances. These results emphasize the importance of prosodic timing patterns for the perception of speech delivered by L2 learners.

    L.P. was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) via a Juan de la Cierva fellowship. M.O. was supported by the IKERBASQUE–Basque Foundation for Science. The research institution was supported through the “Severo Ochoa” Programme for Centres/Units of Excellence in R&D (SEV-2015-490).
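    The manipulation this abstract describes can be approximated computationally: stretch or compress each segment of an utterance so its durations follow a target rhythmic pattern while the segmental content stays fixed. The sketch below is a rough stand-in (the study used concatenative synthesis, not time-stretching), and the file name, segment boundaries, and target durations are invented for illustration.

```python
# A minimal sketch of rhythm/rate manipulation: rescale each segment's
# duration toward a target pattern. Assumes a pre-segmented utterance;
# "utterance.wav", the boundaries, and the targets are hypothetical.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)

# (start_s, end_s) source segment boundaries and target durations (seconds)
segments = [(0.00, 0.35), (0.35, 0.80), (0.80, 1.20)]
targets = [0.30, 0.55, 0.40]

pieces = []
for (start, end), target in zip(segments, targets):
    chunk = y[int(start * sr):int(end * sr)]
    rate = (end - start) / target  # rate > 1 shortens, rate < 1 lengthens
    pieces.append(librosa.effects.time_stretch(chunk, rate=rate))

# Concatenate the rescaled segments: same segmental content,
# different rhythm and overall tempo.
y_modified = np.concatenate(pieces)
```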

    A prototype for a conversational companion for reminiscing about images

    This work was funded by the COMPANIONS project sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434. Companions demonstrators can be seen at: http://www.dcs.shef.ac.uk/∼roberta/companions/Web/.

    This paper describes an initial prototype of the Companions project (www.companions-project.org): the Senior Companion (SC), designed to be a platform to display novel approaches to: (1) the use of Information Extraction (IE) techniques to extract the content of incoming dialogue utterances after an ASR phase; (2) the conversion of the input to RDF form to allow the generation of new facts from existing ones, under the control of a Dialogue Manager (DM) that also has access to stored knowledge and to knowledge accessed in real time from the web, all in RDF form; (3) a DM expressed as a stack and network virtual machine that models mixed initiative in dialogue control; and (4) a tuned dialogue act detector based on corpus evidence. The prototype platform was evaluated, and we describe this evaluation. The platform is also designed to support more extensive forms of emotion detection carried by both speech and lexical content, as well as extended forms of machine learning. We describe preliminary studies and results for these, in particular a novel approach to enabling reinforcement learning for open dialogue systems through the detection of emotion in the speech signal and its deployment as a form of learned DM, operating at a higher level than the DM virtual machine and able to direct the SC’s responses to a more emotionally appropriate part of its repertoire. © 2010 Elsevier Ltd. All rights reserved. Peer-reviewed.
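    As a rough illustration of point (2), the sketch below shows how new facts can be derived from stored RDF-style triples by forward chaining. The predicates and inference rules are invented stand-ins; the abstract does not specify the Senior Companion's actual ontology or rule set.

```python
# A toy sketch of RDF-style fact derivation: new triples are inferred
# from stored ones by forward-chaining rules until a fixed point.
# All predicates ("isMotherOf", "hasParent", ...) are hypothetical.
facts = {
    ("Mary", "isMotherOf", "John"),
    ("John", "isFatherOf", "Anna"),
}

def derive(facts):
    """Apply simple inference rules until no new triples appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in derived:
            if p in ("isMotherOf", "isFatherOf"):
                new.add((o, "hasParent", s))           # child has parent
        for s, p, o in derived:
            if p == "hasParent":
                for s2, p2, o2 in derived:
                    if p2 == "hasParent" and s2 == o:  # parent's parent
                        new.add((s, "hasGrandparent", o2))
        if not new <= derived:
            derived |= new
            changed = True
    return derived

for triple in sorted(derive(facts)):
    print(triple)  # includes the derived ("Anna", "hasGrandparent", "Mary")
```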

    Simulating emotional reactions in medical dramas

    Presenting information on emotionally charged topics is a delicate task: if bare facts alone are conveyed, there is a risk of boring the audience or coming across as cold and unfeeling; on the other hand, emotional presentation can be appropriate when carefully handled, but when overdone or mishandled it risks being perceived as patronising or in poor taste. When Natural Language Generation (NLG) systems present emotionally charged information linguistically, by generating scripts for embodied agents, emotional/affective aspects cannot be ignored. It is important to ensure that viewers consider the presentation appropriate and sympathetic. We are investigating the role of affect in communicating medical information in the context of an NLG system that generates short medical dramas enacted by embodied agents. The dramas have both an informational and an educational purpose in that they help patients review their medical histories whilst receiving explanations of less familiar medical terms and demonstrations of their usage. The dramas are also personalised, since they are generated from the patients' own medical records. We view generation of natural/appropriate emotional language as a way to engage and maintain the viewers' attention. For our medical setting, we hypothesize that viewers will consider dialogues more natural when they have an enthusiastic and sympathetic emotional tone. Our second hypothesis proposes that such dialogues are also better at engaging the viewers' attention. As well as describing our NLG system for generating natural emotional language in medical dialogue, we present a pilot study with which we investigate our two hypotheses. Our results were not quite as unequivocal as we had hoped. Firstly, our participants did notice whether a character sympathised with the patient and was enthusiastic; this did not, however, lead them to judge such a character as behaving more naturally or the dialogue as being more engaging. However, when pooling data from our two conditions (dialogues with versus without emotionally appropriate language use), we discovered, somewhat surprisingly, that participants did consider a dialogue more engaging if they believed that the characters showed sympathy towards the patient, were not cold and unfeeling, and were natural (the last true for the female agent only).

    Speech Perception in “Bubble” Noise: Korean Fricatives and Affricates by Native and Non-native Korean Listeners

    The current study examines the acoustic cues used by second language learners of Korean to discriminate between Korean fricatives and affricates in noise, and how these cues relate to those used by native Korean listeners. Stimuli consist of naturally-spoken consonant-vowel-consonant-vowel (CVCV) syllables: /sɑdɑ/, /s*ɑdɑ/, /tʃɑdɑ/, /tʃhɑdɑ/, and /tʃ*ɑdɑ/. In this experiment, the “bubble noise” methodology of Mandel et al. (2016) was used to identify the time-frequency locations of important cues in each utterance, i.e., points where audibility is significantly correlated with correct identification of the utterance in noise. Results show that non-native Korean listeners can discriminate between Korean fricatives and affricates in noise after training with the specific utterances. However, the acoustic cues used by L2 Korean listeners differ from those used by native Korean listeners; there were explicit differences between the two groups in the use of acoustic cues for identifying tenseness. The results of this study contribute to a better understanding of how second language learners of Korean process the language, and of how people learning a second language perceive speech in noisy environments.
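    The core of the bubble-noise analysis can be sketched as follows: for each time-frequency point, the audibility of that point across many noisy presentations is correlated with whether the listener identified the utterance correctly. This is a hedged illustration with fabricated arrays and an assumed significance threshold, not the exact procedure of Mandel et al. (2016).

```python
# Per-listener sketch of the "bubble noise" cue analysis: correlate the
# audibility of each time-frequency (TF) point across trials with
# correct/incorrect identification. Shapes and threshold are illustrative.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
n_trials, n_freq, n_time = 200, 64, 100

# audibility[i, f, t]: how audible TF point (f, t) was on trial i (0..1)
audibility = rng.random((n_trials, n_freq, n_time))
# correct[i]: 1 if the listener identified the utterance on trial i
correct = rng.integers(0, 2, n_trials)

corr_map = np.zeros((n_freq, n_time))
for f in range(n_freq):
    for t in range(n_time):
        r, p = pointbiserialr(correct, audibility[:, f, t])
        corr_map[f, t] = r if p < 0.05 else 0.0  # keep significant cells

# Nonzero regions of corr_map mark TF locations where audibility predicts
# correct identification, i.e. candidate acoustic cues for that utterance.
```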

    The perception of isochrony and phonetic synchronisation in dubbing: An introduction to how Spanish cinema-goers perceive French and English dubbed films in terms of the audio-visual matching experience

    The McGurk-MacDonald effect describes the perception of speech as a duality, with auditory and visual stimuli perceived separately by the cognitive system. Dubbing combines two stimuli of different linguistic origin. This study analyses the perception of the auditory and visual stimuli in speech and the dyschronies in their matching in dubbing. English and French scenes dubbed into Spanish were selected. The experiment reveals that Spanish viewers develop a high tolerance for dyschronies in dubbing. Furthermore, subjects perceived the articulatory features of English as more natural than those of French, even though Spanish and French naturally resemble each other more closely.

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. Further analysis was carried out in light of Fletcher's (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as there are many other factors that can influence the values of consonantal and vocalic intervals, and Arvaniti's (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners' perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
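    For concreteness, the metrics named above can be computed from measured interval durations as in the following sketch. The formulas follow the cited definitions (∆C and %V from Ramus et al., 1999; rPVI and nPVI from Grabe & Low, 2002); the example durations are invented, not Malay data.

```python
# Rhythm metrics from vocalic and consonantal interval durations.
from statistics import pstdev

def percent_v(vowels, consonants):
    """%V: proportion of total utterance duration that is vocalic."""
    total = sum(vowels) + sum(consonants)
    return 100 * sum(vowels) / total

def delta_c(consonants):
    """Delta-C: standard deviation of consonantal interval durations."""
    return pstdev(consonants)

def rpvi(intervals):
    """Raw Pairwise Variability Index (typically over consonantal
    intervals): mean absolute difference between successive durations."""
    pairs = zip(intervals, intervals[1:])
    return sum(abs(a - b) for a, b in pairs) / (len(intervals) - 1)

def npvi(intervals):
    """Normalised PVI (typically over vocalic intervals): as rPVI, but
    each difference is scaled by the mean of the pair, times 100."""
    pairs = zip(intervals, intervals[1:])
    return 100 * sum(abs(a - b) / ((a + b) / 2)
                     for a, b in pairs) / (len(intervals) - 1)

# Hypothetical durations (seconds) for one sentence
vocalic = [0.08, 0.11, 0.07, 0.09, 0.10]
consonantal = [0.06, 0.05, 0.07, 0.06, 0.05]
print(percent_v(vocalic, consonantal), delta_c(consonantal),
      rpvi(consonantal), npvi(vocalic))
```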

    Fine phonetic detail and intonational meaning

    The development of theories about form-function relations in intonation should be informed by a better understanding of the dependencies that hold among different phonetic parameters. Fine phonetic detail encodes both linguistically structured meaning and paralinguistic meaning.

    Music, Language, and Rhythmic Timing

    Neural, perceptual, and cognitive oscillations synchronize with rhythmic events in both speech (Luo & Poeppel, 2007) and music (Snyder & Large, 2005). This synchronization decreases perceptual thresholds to temporally predictable events (Lawrance et al., 2014), improves task performance (Ellis & Jones, 2010), and enables speech intelligibility (Peelle & Davis, 2012). Despite implications of music-language transfer effects for improving language outcomes (Gordon et al., 2015), proposals that shared neural and cognitive resources underlie music and speech rhythm perception (e.g., Tierney & Kraus, 2014) are not yet substantiated. The present research aimed to explore this potential overlap by testing whether music-induced oscillations affect metric speech tempo perception, and vice versa. In each of 432 trials, we presented a prime sequence (seven repetitions of either a metric speech utterance or an analogous musical phrase) followed by a standard-comparison pair (either two identical speech utterances or two identical musical phrases). Twenty-two participants judged whether the comparison was slower than, faster than, or the same tempo as the standard. We manipulated whether the prime was slower than, faster than, or the same tempo as the standard. Tempo discrimination accuracy was higher when the standard tempo was the same as the prime tempo than when it was slower or faster. These findings support the shared-resources view over the independent-resources view, and they have implications for music-language transfer effects showing improvements in verbal memory (Chan et al., 1998), speech-in-noise perception (Strait et al., 2012), and reading ability in children and adults (Tierney & Kraus, 2013).
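    The key comparison in this design, accuracy on trials where the prime tempo matched the standard versus trials where it did not, can be sketched as below; the trial records here are fabricated for illustration only.

```python
# Sketch of the prime-standard congruence analysis: group trials by the
# prime's tempo relation to the standard and compare accuracy. The trial
# data are invented, not from the study.
from collections import defaultdict

# Each trial: (prime tempo relative to standard, response correct?)
trials = [
    ("same", True), ("same", True), ("faster", False),
    ("slower", True), ("same", True), ("faster", True),
    ("slower", False), ("same", False), ("faster", False),
]

by_condition = defaultdict(list)
for prime_rel, correct in trials:
    by_condition[prime_rel].append(correct)

for condition, results in by_condition.items():
    accuracy = sum(results) / len(results)
    print(f"prime {condition} relative to standard: "
          f"accuracy = {accuracy:.2f}")
```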