2,095 research outputs found

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Get PDF
    International audienceSpeech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output

    The effect of speech rhythm and speaking rate on assessment of pronunciation in a second language

    Get PDF
    Published online: 24 April 2019The study explores the effect of deviations from native speech rhythm and rate norms on the assessement of pronunciation mastery of a second language (L2) when the native language of the learner is either rhythmically similar to or different from the target language. Using the concatenative speech synthesis technique, different versions of the same sentence were created in order to produce segmentally and intonationally identical utterances that differed only in rhythmic patterns and/or speaking rate. Speech rhythm and tempo patterns modeled those from the speech of French or German native learners of English at different proficiency levels. Native British English speakers rated the original sentences and the synthesized utterances for accentedness. The analysis shows that (a) differences in speech rhythm and speaking tempo influence the perception of accentedness; (b) idiosyncratic differences in speech rhythm and speech rate are sufficient to differentiate between the proficiency levels of L2 learners; (c) the relative salience of rhythm and rate on perceived accentedness in L2 speech is modulated by the native language of the learners; and (d) intonation facilitates the perception of finer differences in speech rhythm between otherwise identical utterances. These results emphasize the importance of prosodic timing patterns for the perception of speech delivered by L2 learners.L.P. was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) via Juan de la Cierva fellowship. M.O. was supported by the IKERBASQUE–Basque Foundation for Science. The research institution was supported through the “Severo Ochoa” Programme for Centres/Units of Excellence in R&D (SEV-2015-490)

    PoeticTTS -- Controllable Poetry Reading for Literary Studies

    Full text link
    Speech synthesis for poetry is challenging due to specific intonation patterns inherent to poetic speech. In this work, we propose an approach to synthesise poems with almost human like naturalness in order to enable literary scholars to systematically examine hypotheses on the interplay between text, spoken realisation, and the listener's perception of poems. To meet these special requirements for literary studies, we resynthesise poems by cloning prosodic values from a human reference recitation, and afterwards make use of fine-grained prosody control to manipulate the synthetic speech in a human-in-the-loop setting to alter the recitation w.r.t. specific phenomena. We find that finetuning our TTS model on poetry captures poetic intonation patterns to a large extent which is beneficial for prosody cloning and manipulation and verify the success of our approach both in an objective evaluation as well as in human studies.Comment: Accepted to Interspeech 202

    Segmental and prosodic improvements to speech generation

    Get PDF

    Correlates of linguistic rhythm in the speech signal

    Get PDF
    Spoken languages have been classified by linguists according to their rhythmic properties, and psycholinguists have relied on this classification to account for infants’ capacity to discriminate languages. Although researchers have measured many speech signal properties, they have failed to identify reliable acoustic characteristics for language classes. This paper presents instrumental measurements based on a consonant/vowel segmentation for eight languages. The measurements suggest that intuitive rhythm types reflect specific phonological properties, which in turn are signaled by the acoustic/phonetic properties of speech. The data support the notion of rhythm classes and also allow the simulation of infant language discrimination, consistent with the hypothesis that newborns rely on a coarse segmentation of speech. A hypothesis is proposed regarding the role of rhythm perception in language acquisition

    Engaging adolescents with Down syndrome in an educational video game

    Get PDF
    Producción CientíficaThis article describes the design, implementation and evaluation of an educational video game that helps individuals with Down syndrome to improve their speech skills, specifically those related to prosody. Special attention has been paid to the design of the user interface, taking into account the cognitive, learning, and attentional limitations of people with Down syndrome. The learning content is conveyed by activities of production and perception of prosodic phenomena, aimed at increasing their communicative competence. These activities are introduced within the narrative of a video game so that the players do not conceive the tool as a mere succession of learning activities, but so that they learn and improve their speech while playing. The evaluation strategy that has been followed involves real users and combines different evaluation activities. Results show a high level of acceptance by participants and also by professionals, speech therapists, and special education teachers.2018-09-01MEC-FEDER Grant TIN2014-59852-R y la Junta de Castilla y León Regional Grant VA145U1

    Directions for the future of technology in pronunciation research and teaching

    Get PDF
    This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2) pronunciation research and teaching should be enhanced comprehensibility and intelligibility as opposed to native-likeness. Three main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research with a special focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider how these insights pave the way for researchers and developers working to create research-informed, computer-assisted pronunciation teaching resources. We conclude with predictions for future developments

    Prosodic detail in Neapolitan Italian

    Get PDF
    Recent findings on phonetic detail have been taken as supporting exemplar-based approaches to prosody. Through four experiments on both production and perception of both melodic and temporal detail in Neapolitan Italian, we show that prosodic detail is not incompatible with abstractionist approaches either. Specifically, we suggest that the exploration of prosodic detail leads to a refined understanding of the relationships between the richly specified and continuous varying phonetic information on one side, and coarse phonologically structured contrasts on the other, thus offering insights on how pragmatic information is conveyed by prosody
    corecore