
    Gesture and sign language recognition with temporal residual networks


    Semi-automation of gesture annotation by machine learning and human collaboration

    Gesture and multimodal communication researchers typically annotate video data manually, even though this can be a very time-consuming task. In the present work, a method to detect gestures is proposed as a fundamental step towards a semi-automatic gesture annotation tool. The proposed method can be applied to RGB videos and requires annotations of part of a video as input. The technique deploys a pose estimation method and active learning. The experiments show that if about 27% of a video is annotated, the remaining parts can be annotated automatically with an F-score of at least 0.85. Users can first run the tool with a small number of annotations; if the predicted annotations for the remainder of the video are not satisfactory, they can add further annotations and run the tool again. The code has been released so that other researchers and practitioners can use the results of this research. The tool has been confirmed to work in conjunction with ELAN.
    Ienaga, Naoto; Cravotta, Alice; Terayama, Kei; Scotney, Bryan W.; Saito, Hideo; Busà, M. Grazia
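    The abstract gives no implementation details, but the annotate-predict-review loop it describes can be sketched roughly as follows. This is a minimal sketch assuming frame-level pose features and a scikit-learn classifier; the feature layout, classifier choice, and confidence threshold are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of the annotate-predict-review loop described above.
# Frame-level pose features would come from a pose estimator; here random
# vectors stand in for them. Names and thresholds are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def propose_labels(features, labels, confidence_threshold=0.8):
    """Train on the frames the user has annotated (labels >= 0), predict
    gesture/no-gesture for all frames, and return the predictions plus a
    mask of low-confidence frames that should go back to the annotator."""
    annotated = labels >= 0
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(features[annotated], labels[annotated])

    proba = clf.predict_proba(features)      # per-frame class probabilities
    predictions = proba.argmax(axis=1)
    needs_review = proba.max(axis=1) < confidence_threshold
    return predictions, needs_review

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(1000, 34))   # e.g. 17 keypoints x (x, y)
    labels = np.full(1000, -1)               # -1 = not yet annotated
    labels[:270] = rng.integers(0, 2, 270)   # roughly 27% manually annotated
    preds, review = propose_labels(features, labels)
    print(f"{review.sum()} frames flagged for manual review")
```

    In an iterative workflow like the one the abstract describes, the flagged frames would be shown to the user, their corrections merged into the labels, and the classifier retrained until the predictions are acceptable.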

    Data-based analysis of speech and gesture: the Bielefeld Speech and Gesture Alignment corpus (SaGA) and its applications

    Lücking A, Bergmann K, Hahn F, Kopp S, Rieser H. Data-based analysis of speech and gesture: the Bielefeld Speech and Gesture Alignment corpus (SaGA) and its applications. Journal on Multimodal User Interfaces. 2013;7(1-2):5-18.

    Communicating face-to-face, interlocutors frequently produce multimodal meaning packages consisting of speech and accompanying gestures. We discuss a systematically annotated speech and gesture corpus consisting of 25 route-and-landmark-description dialogues, the Bielefeld Speech and Gesture Alignment corpus (SaGA), collected in experimental face-to-face settings. We first describe the primary and secondary data of the corpus and its reliability assessment. Then we go into some of the projects carried out using SaGA, demonstrating the wide range of its usability: on the empirical side, there is work on gesture typology, individual and contextual parameters influencing gesture production, and gestures' functions for dialogue structure. Speech-gesture interfaces have been established extending unification-based grammars. In addition, the development of a computational model of speech-gesture alignment and its implementation constitutes a research line we focus on.

    An XML Coding Scheme for Multimodal Corpus Annotation

    No full text
    Multimodality has become one of today's most crucial challenges for both linguistics and computer science, entailing theoretical as well as practical issues (verbal interaction description, human-machine dialogues, virtual reality, etc.). Understanding interaction processes is one of the main targets of these sciences, and it requires taking into account the whole set of modalities and the way they interact.

    From a linguistic standpoint, language and speech analysis are based on studies of distinct research fields, such as phonetics, phonemics, syntax, semantics, pragmatics or gesture studies. Each of them has been investigated in the past either separately or in relation with another field that was considered closely connected (e.g. syntax and semantics, prosody and syntax, etc.). The perspective adopted by modern linguistics is considerably broader: even though each domain reveals a certain degree of autonomy, it cannot be accounted for independently from its interactions with the other domains. Accordingly, the study of the interaction between the fields appears to be as important as the study of each distinct field, and it is a prerequisite for the elaboration of a valid theory of language.

    However, as important as the needs in this area might be, high-level multimodal resources and adequate methods for constructing them are scarce and unequally developed. Ongoing projects mainly focus on one modality as the main target, with an alternate modality as an optional complement. Moreover, coding standards in this field remain very partial and do not cover all the needs of multimodal annotation. One of the first issues to face is the definition of a coding scheme providing adequate responses to the needs of the various levels encompassed, from phonetics to pragmatics or syntax. While working in the general context of international coding standards, we plan to create a specific coding standard designed to meet the specific needs of multimodal annotation, as the available solutions in the area do not seem to be totally satisfactory.
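    As a rough illustration of what a tier-based multimodal coding scheme of this kind might look like in practice, the sketch below builds a tiny annotation document with Python's standard ElementTree module. Every element and attribute name here is an assumption made for demonstration, not the scheme actually defined by the authors.

```python
# Illustrative only: a minimal multimodal annotation serialized as XML,
# assuming time-stamped tiers for tokens, prosody and gesture. Element and
# attribute names are hypothetical, not the paper's proposed coding scheme.
import xml.etree.ElementTree as ET

root = ET.Element("annotation", attrib={"dialogue": "demo-01"})

tokens = ET.SubElement(root, "tier", attrib={"type": "token"})
ET.SubElement(tokens, "segment", attrib={"start": "0.00", "end": "0.42", "value": "turn"})
ET.SubElement(tokens, "segment", attrib={"start": "0.42", "end": "0.80", "value": "left"})

prosody = ET.SubElement(root, "tier", attrib={"type": "intonation-phrase"})
ET.SubElement(prosody, "segment", attrib={"start": "0.00", "end": "0.80", "value": "IP1"})

gesture = ET.SubElement(root, "tier", attrib={"type": "gesture-stroke"})
ET.SubElement(gesture, "segment", attrib={"start": "0.30", "end": "0.65", "value": "pointing"})

ET.indent(root)  # pretty-print (available in Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

    Keeping each modality in its own time-aligned tier, as in this toy example, is what allows annotations from phonetics through gesture to be cross-referenced by time stamps rather than forced into a single hierarchy.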

    Timing Relationships between Speech and Co-Verbal Gestures in Spontaneous French

    Several studies have described the links between gesture and speech in terms of timing, most of them concentrating on the production of hand gestures during speech or during pauses (Beattie & Aboudan, 1994; Nobe, 2000). Other studies have focused on the anticipation, synchronization or delay of gestures with respect to their co-occurring speech (Schegloff, 1984; McNeill, 1992, 2005; Kipp, 2003; Loehr, 2004; Chui, 2005; Kida & Faraco, 2008; Leonard & Cummins, 2009), and the present paper contributes to this debate. We studied the timing relationships between iconic gestures and their lexical affiliates (Kipp, Neff et al., 2001) in a corpus of French conversational speech involving 6 speakers, annotated both in Praat (Boersma & Weenink, 2009) and Anvil (Kipp, 2001). The timing relationships we observed concern the position of the gesture stroke relative to that of the lexical affiliate and the Intonation Phrase, as well as the position of the Gesture Phrase relative to that of the Intonation Phrase. The main results show that although gesture and speech co-occur, gestures generally start before the related speech segment.
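    The timing measurement itself is straightforward to express in code. Below is a minimal, hypothetical version that pairs each gesture stroke with the nearest lexical affiliate and reports onset offsets; the interval representation and the pairing heuristic are assumptions for illustration, not the authors' annotation pipeline (in practice the intervals would be exported from Praat and Anvil tiers).

```python
# A minimal sketch of the kind of timing measurement described above:
# for each iconic gesture stroke, find the nearest lexical affiliate and
# report the onset difference (negative = gesture starts before the word).
from dataclasses import dataclass

@dataclass
class Interval:
    start: float   # seconds
    end: float
    label: str

def onset_offsets(strokes, affiliates):
    """Return (stroke label, affiliate label, stroke.start - affiliate.start)
    for the affiliate whose midpoint lies closest to each stroke's midpoint."""
    results = []
    for s in strokes:
        s_mid = (s.start + s.end) / 2
        a = min(affiliates, key=lambda x: abs((x.start + x.end) / 2 - s_mid))
        results.append((s.label, a.label, s.start - a.start))
    return results

# Toy intervals standing in for annotation tiers exported from Praat/Anvil.
strokes = [Interval(1.20, 1.65, "iconic:turn"), Interval(3.05, 3.40, "iconic:left")]
affiliates = [Interval(1.35, 1.70, "tourner"), Interval(3.10, 3.52, "gauche")]

for stroke, word, offset in onset_offsets(strokes, affiliates):
    sign = "before" if offset < 0 else "after"
    print(f"{stroke} starts {abs(offset):.2f}s {sign} '{word}'")
```

    With real tier data, aggregating these per-pair offsets is what supports conclusions such as the one reported here, namely that strokes tend to precede their lexical affiliates.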

