2,131 research outputs found

    Multimodal corpora for silent speech interaction

    A Silent Speech Interface (SSI) allows speech communication to take place in the absence of an acoustic signal. This type of interface is an alternative to conventional Automatic Speech Recognition, which is not adequate for users with some speech impairments or in the presence of environmental noise. The work presented here establishes the conditions to explore and analyze complex combinations of input modalities applicable in SSI research. Exploring non-invasive and promising modalities, we have selected the following sensing technologies used in human-computer interaction: Video and Depth input, Ultrasonic Doppler sensing and Surface Electromyography. This paper describes a novel data collection methodology in which these independent streams of information are synchronously acquired, with the aim of supporting research and development of a multimodal SSI. The reported recordings were divided into two rounds: a first round in which the prompts were uttered silently and a second round in which speakers pronounced the scripted prompts in an audible, normal tone. In the first round, a total of 53.94 minutes were captured, of which 30.25% was estimated to be silent speech. In the second round, a total of 30.45 minutes were obtained, of which 30.05% was audible speech.
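
    The abstract does not describe the synchronization mechanism itself, so the sketch below is only a minimal illustration of the general idea: each independent sensor stream (Video/Depth, Ultrasonic Doppler, sEMG) is read in its own thread and every frame is stamped against a shared monotonic clock so the streams can be aligned offline. All reader functions and frame periods are hypothetical placeholders, not taken from the paper.

        # Minimal sketch of synchronous multi-stream acquisition (hypothetical setup).
        # Each modality is read in its own thread; every frame is timestamped against
        # a shared monotonic clock so streams can be aligned after recording.
        import threading
        import time

        def acquire(stream_name, read_frame, frames, stop_event):
            """Read frames from one modality until stop_event is set."""
            while not stop_event.is_set():
                frame = read_frame()                      # placeholder sensor read
                frames.append((time.monotonic(), stream_name, frame))
                time.sleep(0.01)                          # placeholder frame period

        stop = threading.Event()
        recording = []                                    # (timestamp, stream, frame)
        streams = {
            "video_depth": lambda: b"rgbd-frame",         # hypothetical readers
            "ultrasonic_doppler": lambda: b"uds-frame",
            "semg": lambda: b"emg-frame",
        }
        threads = [threading.Thread(target=acquire, args=(name, reader, recording, stop))
                   for name, reader in streams.items()]
        for t in threads:
            t.start()
        time.sleep(0.1)                                   # record briefly for the demo
        stop.set()
        for t in threads:
            t.join()
        print(f"captured {len(recording)} timestamped frames from {len(streams)} streams")

    In a real acquisition setup, hardware triggers or a common clock source would typically replace or complement the software timestamps used in this sketch.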

    Continuous Interaction with a Virtual Human

    Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, be able to attend to its interaction partner while it is speaking, and modify its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE'10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access.

    Towards a silent speech interface for Portuguese: Surface electromyography and the nasality challenge

    A Silent Speech Interface (SSI) aims at performing Automatic Speech Recognition (ASR) in the absence of an intelligible acoustic signal. It can be used as a human-computer interaction modality in high-background-noise environments, such as living rooms, or to aid speech-impaired individuals, whose prevalence increases with ageing. If this interaction modality is made available for users' own native language with adequate performance, then, since it does not rely on acoustic information, it will be less susceptible to problems related to environmental noise, privacy, information disclosure and exclusion of speech-impaired persons. To contribute to the existence of this promising modality for Portuguese, for which no SSI implementation is known, we are exploring and evaluating the potential of state-of-the-art approaches. One of the major challenges we face in SSI for European Portuguese is the recognition of nasality, a core characteristic of the phonetics and phonology of this language. In this paper, a silent speech recognition experiment based on Surface Electromyography is presented. Results confirmed recognition problems between minimal pairs of words that differ only in the nasality of one of the phones, which caused 50% of the total error and evidenced accuracy degradation, correlating well with existing knowledge.
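
    The abstract does not specify the sEMG front-end or classifier used in the experiment. The following is a minimal sketch, assuming a common recipe of windowed time-domain features (RMS and mean absolute value) fed to a linear classifier, trained here on synthetic stand-in data for a hypothetical oral/nasal minimal pair; none of these choices are taken from the paper.

        # Illustrative sketch only: windowed time-domain sEMG features plus a linear
        # classifier for a word-level experiment on an oral/nasal minimal pair.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def emg_features(signal, win=200, hop=100):
            """Per-window RMS and mean absolute value, concatenated into one vector."""
            feats = []
            for start in range(0, len(signal) - win + 1, hop):
                w = signal[start:start + win]
                feats.extend([np.sqrt(np.mean(w ** 2)), np.mean(np.abs(w))])
            return np.array(feats)

        rng = np.random.default_rng(0)
        # Synthetic stand-in data: 40 utterances, 2000 samples each, two classes.
        X = np.stack([emg_features(rng.standard_normal(2000)) for _ in range(40)])
        y = np.array([0, 1] * 20)                          # oral vs. nasal label
        clf = LogisticRegression(max_iter=1000).fit(X[:30], y[:30])
        print("held-out accuracy:", clf.score(X[30:], y[30:]))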

    Can ultrasonic Doppler help detect nasality for silent speech interfaces? An exploratory analysis based on alignment of the Doppler signal with velum aperture information from real-time MRI

    This paper describes an exploratory analysis of the usefulness of the information made available from Ultrasonic Doppler signal data, collected from a single speaker, to detect the velum movement associated with European Portuguese nasal vowels. This is directly related to the unsolved problem of detecting nasality in silent speech interfaces. The applied procedure uses Real-Time Magnetic Resonance Imaging (RT-MRI), collected from the same speaker, providing a means to interpret the reflected ultrasonic data. By ensuring compatible scenario conditions and proper time alignment between the Ultrasonic Doppler signal data and the RT-MRI data, we are able to accurately estimate when the velum moves and the type of movement during a nasal vowel occurrence. The combination of these two sources revealed a moderate relation between velum movement and the average energy of frequency bands around the carrier, indicating a probable presence of velum information in the Ultrasonic Doppler signal.
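
    As a rough illustration of the kind of measure mentioned in the abstract, the sketch below computes frame-wise energy in a narrow band around the ultrasonic carrier and correlates it with a time-aligned velum-aperture track. The sampling rate, carrier frequency, band width and the synthetic signals are assumptions for demonstration only, not values from the paper.

        # Frame-wise energy in a narrow band around the ultrasonic carrier (assumed
        # parameters), compared against a stand-in velum-aperture track.
        import numpy as np

        FS = 96_000          # assumed sampling rate of the Doppler receiver (Hz)
        CARRIER = 40_000.0   # assumed ultrasonic carrier frequency (Hz)

        def band_energy(signal, fs=FS, carrier=CARRIER, frame=1024, hop=512, half_bw=500.0):
            """Average spectral energy in [carrier - half_bw, carrier + half_bw] per frame."""
            freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
            band = (freqs >= carrier - half_bw) & (freqs <= carrier + half_bw)
            energies = []
            for start in range(0, len(signal) - frame + 1, hop):
                spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * np.hanning(frame)))
                energies.append(np.mean(spectrum[band] ** 2))
            return np.array(energies)

        rng = np.random.default_rng(0)
        # Synthetic demonstration: a carrier whose amplitude is slowly modulated.
        t = np.arange(FS) / FS
        doppler = np.sin(2 * np.pi * CARRIER * t) * (1 + 0.3 * np.sin(2 * np.pi * 2 * t))
        energy = band_energy(doppler)
        # Stand-in RT-MRI velum-aperture track, resampled to the frame rate.
        velum_aperture = np.interp(np.linspace(0, 1, len(energy)),
                                   np.linspace(0, 1, 50), rng.random(50))
        print("correlation:", np.corrcoef(energy, velum_aperture)[0, 1])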

    Creating and exploiting multimodal annotated corpora

    The paper presents a project of the Laboratoire Parole et Langage (LPL) which aims at collecting, annotating and exploiting a corpus of spoken French from a multimodal perspective. The project directly meets present needs in linguistics, where a growing number of researchers are becoming aware that a theory of communication aiming to describe real interactions should take the complexity of those interactions into account. However, in order to do so, linguists need access to spoken corpora annotated across different fields. The paper presents the annotation schemes used at the LPL for phonetics, morphology and syntax, prosody, and gesture, together with the type of linguistic description made from the annotations, illustrated with two examples.

    An XML Coding Scheme for Multimodal Corpus Annotation

    Multimodality has become one of today's most crucial challenges both for linguistics and computer science, entailing theoretical issues as well as practical ones (verbal interaction description, human-machine dialogue, virtual reality, etc.). Understanding interaction processes is one of the main targets of these sciences, and requires taking into account the whole set of modalities and the way they interact. From a linguistic standpoint, language and speech analysis are based on studies from distinct research fields, such as phonetics, phonemics, syntax, semantics, pragmatics or gesture studies. Each of these has been investigated in the past either separately or in relation with another field considered to be closely connected (e.g. syntax and semantics, prosody and syntax, etc.). The perspective adopted by modern linguistics is considerably broader: even though each domain retains a certain degree of autonomy, it cannot be accounted for independently of its interactions with the other domains. Accordingly, the study of the interaction between the fields appears to be as important as the study of each distinct field, and is a prerequisite for the elaboration of a valid theory of language. However, as important as the needs in this area might be, high-level multimodal resources and adequate methods for constructing them are scarce and unequally developed. Ongoing projects mainly focus on one modality as the main target, with an alternate modality as an optional complement. Moreover, coding standards in this field remain very partial and do not cover all the needs of multimodal annotation. One of the first issues we have to face is the definition of a coding scheme providing adequate responses to the needs of the various levels encompassed, from phonetics to pragmatics or syntax. While working in the general context of international coding standards, we plan to create a specific coding standard designed to meet the particular needs of multimodal annotation, as the solutions currently available in the area do not seem to be fully satisfactory.
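
    The abstract does not reproduce the coding scheme itself, so the snippet below is only a hypothetical miniature of what a multi-tier, time-aligned XML annotation might look like, built with Python's standard library; all element and attribute names are illustrative.

        # Hypothetical miniature of a multi-tier, time-aligned XML annotation.
        # Element and attribute names are illustrative, not the paper's scheme.
        import xml.etree.ElementTree as ET

        corpus = ET.Element("annotation", speaker="S1", lang="fr")
        tokens = ET.SubElement(corpus, "tier", type="token")
        ET.SubElement(tokens, "unit", start="0.00", end="0.42", pos="PRO").text = "je"
        ET.SubElement(tokens, "unit", start="0.42", end="0.90", pos="VER").text = "pense"
        prosody = ET.SubElement(corpus, "tier", type="prosody")
        ET.SubElement(prosody, "unit", start="0.00", end="0.90", label="L+H*")
        gesture = ET.SubElement(corpus, "tier", type="gesture")
        ET.SubElement(gesture, "unit", start="0.10", end="0.80", label="head-nod")

        ET.indent(corpus)                     # pretty-print (Python 3.9+)
        print(ET.tostring(corpus, encoding="unicode"))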
