1,394 research outputs found

    Continuous Interaction with a Virtual Human

    Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking, modifying its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access.

    Speech Processing and Prosody

    The prosody of the speech signal conveys information beyond the linguistic content of the message: prosody structures the utterance and also carries information about the speaker's attitude and emotion. Duration of sounds, energy, and fundamental frequency are the prosodic features. However, their automatic computation and usage are not straightforward. Sound duration features are usually extracted from speech recognition results or from a forced speech-text alignment. Although the resulting segmentation is usually acceptable on clean native speech data, performance degrades on noisy or non-native speech. Many algorithms have been developed for computing the fundamental frequency; they perform rather well on clean speech but, again, performance degrades in noisy conditions. However, in some applications, for example computer-assisted language learning, the relevance of the prosodic features is critical; indeed, the quality of the diagnosis of the learner's pronunciation depends heavily on the precision and reliability of the estimated prosodic parameters. The paper considers the computation of prosodic features, shows the limitations of automatic approaches, and discusses the problem of computing confidence measures on such features. The paper then discusses the role of prosodic features and how they can be handled for automatic processing in tasks such as the detection of discourse particles, the characterization of emotions, and the classification of sentence modalities, as well as in computer-assisted language learning and expressive speech synthesis.

    Impact of visual based prosody training on listening micro-skills.

    Fostering the listening skill in English language learners has typically been approached through the development of strategies for enhancing listening comprehension. However, even when addressing this receptive skill in class, language educators have paid scant regard to the development of the listening micro-skills needed to enable genuine communication. As Ylinen (2010) contends, the correct comprehension of foreign language speech requires an adequate recognition of the speech sounds. With this in mind, this study sought to determine to what extent visual-based training in the prosody of the language would facilitate the development of learners’ listening micro-skills. This research involved 29 pre-service English language teachers from a public university who were enrolled in the Phonetics and Phonology course. In-class instruction and out-of-class activities were implemented using the visual feedback of acoustic analysis software.

    A Study of Accommodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology for monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame, which circumvents strict attribution of speaker turns by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
    Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
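
The time-aligned moving average described in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `tama_series`, the frame length, the step size, and the pitch values are all hypothetical, and the abstract does not say how measurements within a frame are weighted, so a plain mean is assumed here.

```python
from statistics import mean

def tama_series(samples, frame_len, step, total_dur):
    """Average a feature over overlapping frames on a shared time grid.

    samples: list of (time_sec, value) measurements for ONE speaker.
    The frame boundaries depend only on frame_len, step and total_dur,
    so calling this once per speaker yields time-aligned series.
    Returns one average per frame, or None where the speaker has no data.
    """
    series = []
    start = 0.0
    while start + frame_len <= total_dur:
        # Collect the measurements that fall inside the current frame.
        in_frame = [v for t, v in samples if start <= t < start + frame_len]
        series.append(mean(in_frame) if in_frame else None)
        start += step  # step < frame_len gives overlapping frames
    return series

# Hypothetical pitch measurements (time in seconds, pitch in Hz).
speaker_a = [(1, 200), (6, 210), (12, 220), (18, 230), (25, 240)]
speaker_b = [(3, 120), (9, 130), (15, 140), (22, 150), (28, 160)]

# Assumed settings: 20 s frames advanced in 10 s steps over a 30 s dialogue.
a = tama_series(speaker_a, frame_len=20, step=10, total_dur=30)
b = tama_series(speaker_b, frame_len=20, step=10, total_dur=30)
print(a, b)
```

Because both series share the same frame boundaries, they can be compared frame by frame, for example with a correlation or a time series model, which is the kind of analysis the abstract refers to.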

    Seeing sentence boundaries: the production and perception of visual markers signalling boundaries in signed languages

    Current definitions of prosody present a problem for signed languages since they are based on languages that exist in the oral-aural modality. Despite this, researchers have illustrated that although signed languages are produced in a different modality, a prosodic system exists whereby a signed stream can be structured into prosodic constituents, which are marked by systematic manual and non-manual phenomena (see Nespor & Sandler, 1999; Wilbur, 1999, 2000). However, there is little research examining prosody in British Sign Language (BSL). This thesis represents the first serious attempt to address this gap in the literature by investigating the type and frequency of a number of visual markers at intonational phrase (IP) boundaries in BSL narratives. An analysis of 418 IP boundaries shows that linguistic visual markers are not frequently observed. The most frequent markers observed were single head movements (46%), followed by holds (30%), brow movements (22%), and head nods (21%). This finding suggests that none of the visual markers included in this study can be considered a consistent marker of IP boundaries in BSL narratives. As well as examining the production of markers at IP boundaries, the thesis investigates the perception of boundaries by different groups in a series of online segmentation experiments. Results from both experiments indicate that boundaries can be identified in a reliable way even when watching an unknown signed language. In addition, an analysis of responses suggests that participants identified a boundary corresponding to a discourse level (such as when a new theme is established). The results suggest that visual markers (to these boundaries at least) are informative in the absence of cues that can only be perceived by native users of a language (such as cues deriving from lexical and grammatical information). Following presentation of results, directions for future research in this area are suggested.

    Automatic prosodic analysis for computer aided pronunciation teaching

    Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech..