Pauses and the temporal structure of speech
Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure, in particular the prolongation of final syllables of lexemes, as well as for the pausal structure of the utterance. In this chapter, a description of the temporal structure and a summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated
Comparing timing models of two Swiss German dialects
Research on dialectal varieties was for a long time concentrated on the phonetic aspects of language. While much work was done on segmental aspects, suprasegmentals remained largely unexplored until recent years, despite the fact that prosody has been noted as a salient aspect of dialectal variants both by linguists and by naive speakers. Current research on dialectal prosody in the German-speaking area often applies discourse-analytic methods, correlating intonation curves with communicative functions (P. Auer et al. 2000, P. Gilles & R. Schrambke 2000, R. Kehrein & S. Rabanus 2001). The project I present here has another focus: it looks at general prosodic aspects, abstracted from actual situations. These global structures are modelled and integrated into a speech synthesis system. Today it is mostly intonation that is investigated; rhythm, the temporal organisation of speech, is not at the core of current research on prosody. Yet there is evidence that temporal organisation is one of the main structuring elements of speech (B. Zellner 1998, B. Zellner Keller 2002). Following this approach developed for speech synthesis, I present the modelling of the timing of two Swiss German dialects (Bernese and Zurich dialect) that are considered quite different at the prosodic level. These models are part of the project on the "development of basic knowledge for research on Swiss German prosody by means of speech synthesis modelling" funded by the Swiss National Science Foundation
Intonation in neurogenic foreign accent syndrome
Foreign accent syndrome (FAS) is a motor speech disorder in which changes to segmental as well as suprasegmental aspects lead to the perception of a foreign accent in speech. This paper focuses on one suprasegmental aspect, namely that of intonation. It provides an in-depth analysis of the intonation system of four speakers with FAS with the aim of establishing the intonational changes that have taken place as well as their underlying origin. Using the autosegmental-metrical framework of intonational analysis, four different levels of intonation, i.e. inventory, distribution, realisation and function, were examined. Results revealed that the speakers with FAS had the same structural inventory at their disposal as the control speakers, but that they differed from the latter in relation to the distribution, implementation and functional use of their inventory. In contrast to previous findings, the current results suggest that these intonational changes cannot be entirely attributed to an underlying intonation deficit but also reflect secondary manifestations of physiological constraints affecting speech support systems and compensatory strategies. These findings have implications for the debate surrounding intonational deficits in FAS, advocating a reconsideration of current assumptions regarding the underlying nature of intonation impairment in FAS
Speech synthesis, speech simulation and speech science
Speech synthesis research has been transformed in recent years through the exploitation of speech corpora - both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology and the new techniques it brings call into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. The paper suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech. At the same time, it leaves speech scientists free to return to their proper goal of building a computational model of human speech production
Time as a strand of the dance medium
Time and space are at the core of our aesthetic experiences of dance performances, yet dance has frequently been categorised as a space-based art. In this paper I revise the choreological perspective developed by Preston-Dunlop and Sánchez-Colberg that conceives dance as an embodied performative art articulated in a multistranded medium (performer, movement, sound, space). I argue that time should be allowed a distinct place in the choreological discourse, since its presence is key to the expressivity of a dance piece. I conceptualise the meaning of the time strand and show how different substrands emerge, connect with others and become expressive in dance performances. My investigation considers in particular the aesthetics of time in live performances in the theatre compared to dances created for the camera, focusing specifically on instances of contemporary transpositions from one context to the other
Playing with Cases: Rendering Expressive Music with Case-Based Reasoning
This article surveys long-term research on the problem of rendering expressive music by means of AI techniques, with an emphasis on case-based reasoning (CBR). Following a brief overview discussing why people prefer listening to expressive music instead of non-expressive synthesized music, we examine a representative selection of well-known approaches to expressive computer music performance, with an emphasis on AI-related approaches. In the main part of the article we focus on the existing CBR approaches to the problem of synthesizing expressive music, and particularly on Tempo-Express, a case-based reasoning system developed at our Institute for applying musically acceptable tempo transformations to monophonic audio recordings of musical performances. Finally, we briefly describe an ongoing extension of our previous work consisting of complementing audio information with information about the gestures of the musician. Music is played through our bodies; therefore capturing the gestures of the performer is a fundamental aspect that has to be taken into account in future expressive music renderings. This article is based on the "2011 Robert S. Engelmore Memorial Lecture" given by the first author at AAAI/IAAI 2011. This research is partially supported by the Ministry of Science and Innovation of Spain under the project NEXT-CBR (TIN2009-13692-C03-01) and by Generalitat de Catalunya AGAUR Grant 2009-SGR-1434. Peer Reviewed
Bridging the divide : embedding voice-leading analysis in string pedagogy and performance.
Experience as a music lecturer in higher/further education and as an instrumental teacher suggests that instrumental pedagogy – focused on strings – and music analysis could usefully be brought closer together to enhance performance. The benefits of linkage include stimulating intellectual enquiry and creative interpretation, as well as honing improvisatory skills; voice-leading analysis, particularly, may even aid technical issues of pitching, fingering, shifting and bowing. This article details an experimental curriculum, entitled ‘Voice-leading for Strings’, which combines voice-leading principles with approaches to string teaching developed from Nelson, Rolland and Suzuki, supplemented by Kodály's hand-signs. Findings from informal trials at Lancaster University (1995–7), which also adapted material for other melody instruments and keyboard, strongly support this perceived symbiotic relationship