
    The Role of Gesture in Multimodal Referring Actions

    When deictic gestures are produced on a touch screen, they can take forms that lead to several sorts of ambiguity. Considering that the resolution of a multimodal reference requires identifying the referents and the context (the “reference domain”) from which these referents are extracted, we focus on the linguistic, gestural, and visual clues that a dialogue system may exploit to comprehend the referring intention. We explore the links between words, gestures, and perceptual groups in terms of the clues that delimit the reference domain. We also show the importance of taking the domain into account for dialogue management, particularly for the comprehension of subsequent utterances when they seem to implicitly rely on a pre-existing restriction to a subset of objects. We propose a strategy for multimodal reference resolution based on this notion of reference domain, and we illustrate its efficiency with prototypical examples built from a study of significant referring situations extracted from a corpus. Finally, we outline future directions of this work concerning linguistic and task aspects not yet integrated here.
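
    To make the reference-domain strategy more concrete, the sketch below shows one way a dialogue system could resolve a multimodal referring act against a stack of candidate reference domains and remember the resulting restriction for later utterances. It is only an illustrative outline under assumed names (ReferenceDomain, resolve_reference, the description predicate), not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ReferenceDomain:
    """A candidate subset of displayed objects from which referents are drawn."""
    objects: set                               # hashable object identifiers
    focus: set = field(default_factory=set)    # restriction left by earlier references

def resolve_reference(description, gesture_targets, domains):
    """Resolve one multimodal referring act (illustrative sketch).

    description     -- predicate over objects derived from the spoken words
    gesture_targets -- objects plausibly indicated by the (possibly ambiguous) gesture
    domains         -- candidate ReferenceDomains, most salient first
    Returns (referents, domain), or (empty set, None) if resolution fails.
    """
    for domain in domains:
        # A gesture narrows the search space; otherwise prefer any restriction
        # established by previous utterances, falling back to the whole domain.
        candidates = (domain.objects & gesture_targets
                      if gesture_targets
                      else (domain.focus or domain.objects))
        matches = {obj for obj in candidates if description(obj)}
        if matches:
            # Store the restriction so a later "the red one" can implicitly
            # reuse this subset as its reference domain.
            domain.focus = matches
            return matches, domain
    return set(), None
```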

    Salience and pointing in multimodal reference

    Pointing combined with verbal referring is one of the most paradigmatic human multimodal behaviours. The aim of this paper is foundational: to uncover the central notions required for a computational model of human-generated multimodal referring acts. The paper draws on existing work on the generation of referring expressions and shows that, in order to extend that work with pointing, the notion of salience needs to play a pivotal role. The paper investigates the role of salience in the generation of referring expressions and introduces a distinction between two opposing approaches: salience-first and salience-last accounts. It then argues that these differ not only in computational efficiency, as has been pointed out previously, but also make incompatible empirical predictions. The second half of the paper shows how a salience-first account meshes nicely with a range of existing empirical findings on multimodal reference. A novel account of the circumstances under which speakers choose to point is proposed that directly links salience with pointing. Finally, a multidimensional model of salience is proposed to flesh this account out.
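
    As a rough illustration of how the two accounts diverge, the sketch below pairs a greedy incremental property-selection routine with a salience-first and a salience-last wrapper. The threshold, the property encoding, and the point/no-point decision are hypothetical simplifications for illustration, not the model proposed in the paper.

```python
def distinguishing_description(target, distractors, properties):
    """Greedily pick properties of the target until all distractors are ruled out."""
    chosen, remaining = [], list(distractors)
    for name, holds in properties:
        if holds(target):
            survivors = [d for d in remaining if holds(d)]
            if len(survivors) < len(remaining):
                chosen.append(name)
                remaining = survivors
        if not remaining:
            return chosen
    return None  # no purely verbal description distinguishes the target

def refer_salience_first(target, domain, salience, properties, threshold=0.5):
    # Salience-first: restrict the context to sufficiently salient objects,
    # then build the description within that reduced context.
    context = [o for o in domain if o != target and salience[o] >= threshold]
    return distinguishing_description(target, context, properties)

def refer_salience_last(target, domain, salience, properties, threshold=0.5):
    # Salience-last: describe against the full domain first, then let the
    # target's own salience decide whether a pointing gesture is added.
    description = distinguishing_description(
        target, [o for o in domain if o != target], properties)
    return description, salience[target] < threshold   # (words, point?)

# Toy scene: three objects with colour/shape features and unequal salience.
FEATURES = {"b1": {"blue", "cube"}, "b2": {"red", "cube"}, "b3": {"blue", "ball"}}
PROPS = [(f, lambda o, f=f: f in FEATURES[o]) for f in ("blue", "red", "cube", "ball")]
SALIENCE = {"b1": 0.9, "b2": 0.2, "b3": 0.1}
print(refer_salience_first("b1", FEATURES, SALIENCE, PROPS))  # [] -- the salient context already singles out b1
print(refer_salience_last("b1", FEATURES, SALIENCE, PROPS))   # (['blue', 'cube'], False)
```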

    Electrophysiological and kinematic correlates of communicative intent in the planning and production of pointing gestures and speech

    Acknowledgements: We thank Albert Russel for assistance in setting up the experiments, and Charlotte Paulisse for help in data collection.

    How Do I Address You? Modelling addressing behavior based on an analysis of a multi-modal corpora of conversational discourse

    Addressing is a special kind of referring, and thus the principles of multi-modal referring-expression generation are also basic to the generation of address terms and addressing gestures for conversational agents. Addressing is a special kind of referring because of the different role (second person instead of object) that the referent has in the interaction. Based on an analysis of addressing behaviour in multi-party face-to-face conversations (meetings, TV discussions, and theater plays), we outline a model for generating multi-modal verbal and non-verbal addressing behaviour for agents in multi-party interactions.

    Directional adposition use in English, Swedish and Finnish

    Directional adpositions such as to the left of describe where a Figure is in relation to a Ground. English and Swedish directional adpositions refer to the location of a Figure in relation to a Ground, whether both are static or in motion. In contrast, the Finnish directional adpositions edellĂ€ (in front of) and jĂ€ljessĂ€ (behind) solely describe the location of a moving Figure in relation to a moving Ground (Nikanne, 2003). When using directional adpositions, a frame of reference must be assumed for interpreting their meaning. For example, the meaning of to the left of in English can be based on a relative (speaker- or listener-based) reference frame or an intrinsic (object-based) reference frame (Levinson, 1996). When a Figure and a Ground are both in motion, it is possible for a Figure to be described as being behind or in front of the Ground even if neither has intrinsic features. As shown by Walker (in preparation), there are good reasons to assume that in the latter case a motion-based reference frame is involved. This means that if Finnish speakers use edellĂ€ (in front of) and jĂ€ljessĂ€ (behind) more frequently in situations where both the Figure and the Ground are in motion, a difference in reference-frame use between Finnish on the one hand and English and Swedish on the other could be expected. We asked native English, Swedish and Finnish speakers to select adpositions from a language-specific list to describe the location of a Figure relative to a Ground when both were shown to be moving on a computer screen. We were interested in any differences between Finnish, English and Swedish speakers. All languages showed a predominant use of directional spatial adpositions referring to the lexical concepts TO THE LEFT OF, TO THE RIGHT OF, ABOVE and BELOW. There were no differences between the languages in directional adposition use or reference-frame use, including reference-frame use based on motion. We conclude that despite differences in the grammars of the languages involved, and potential differences in reference-frame system use, the three languages investigated encode Figure location in relation to Ground location in a similar way when both are in motion.
    Levinson, S. C. (1996). Frames of reference and Molyneux's question: Crosslinguistic evidence. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space (pp. 109-170). Massachusetts: MIT Press.
    Nikanne, U. (2003). How Finnish postpositions see the axis system. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space. Oxford, UK: Oxford University Press.
    Walker, C. (in preparation). Motion encoding in language: the use of spatial locatives in a motion context. Unpublished doctoral dissertation, University of Lincoln, Lincoln, United Kingdom.

    Introduction: Multimodal interaction

    That human social interaction involves the intertwined cooperation of different modalities is uncontroversial. Researchers in several allied fields have, however, only recently begun to document the precise ways in which talk, gesture, gaze, and aspects of the material surround are brought together to form coherent courses of action. The papers in this volume are attempts to develop this line of inquiry. Although the authors draw on a range of analytic, theoretical, and methodological traditions (conversation analysis, ethnography, distributed cognition, and workplace studies), all are concerned to explore and illuminate the inherently multimodal character of social interaction. Recent studies, including those collected in this volume, suggest that different modalities work together not only to elaborate the semantic content of talk but also to constitute coherent courses of action. In this introduction we present evidence for this position. We begin by reviewing some select literature focusing primarily on communicative functions and interactive organizations of specific modalities before turning to consider the integration of distinct modalities in interaction.

    What and where: an empirical investigation of pointing gestures and descriptions in multimodal referring actions

    Pointing gestures are pervasive in human referring actions, and are often combined with spoken descriptions. Combining gesture and speech naturally to refer to objects is an essential task in multimodal NLG systems. However, the way gesture and speech should be combined in a referring act remains an open question. In particular, it is not clear whether, in planning a pointing gesture in conjunction with a description, an NLG system should seek to minimise the redundancy between them, e.g. by letting the pointing gesture indicate locative information, with other, nonlocative properties of a referent included in the description. This question has a bearing on whether the gestural and spoken parts of referring acts are planned separately or arise from a common underlying computational mechanism. This paper investigates this question empirically, using machine-learning techniques on a new corpus of dialogues involving multimodal references to objects. Our results indicate that human pointing strategies interact with descriptive strategies. In particular, pointing gestures are strongly associated with the use of locative features in referring expressions.
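
    The reported association between pointing and locative wording can be pictured with a simple contingency-table check. The sketch below is not the paper's machine-learning pipeline, and the counts are invented purely for illustration.

```python
from collections import Counter

def chi_square_2x2(pairs):
    """Pearson chi-square statistic for a 2x2 table of (pointing, locative) flags."""
    counts = Counter(pairs)                      # (pointing?, locative?) -> frequency
    n = sum(counts.values())
    row = {v: sum(c for (p, _), c in counts.items() if p == v) for v in (True, False)}
    col = {v: sum(c for (_, l), c in counts.items() if l == v) for v in (True, False)}
    chi2 = 0.0
    for p in (True, False):
        for loc in (True, False):
            expected = row[p] * col[loc] / n     # expected count under independence
            chi2 += (counts[(p, loc)] - expected) ** 2 / expected
    return chi2

# Invented referring acts: (pointing gesture used?, locative feature in description?).
acts = ([(True, True)] * 40 + [(True, False)] * 10
        + [(False, True)] * 15 + [(False, False)] * 35)
print(chi_square_2x2(acts))  # ≈ 25.3, well above the 3.84 critical value (df = 1, p < .05)
```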

    Detecting Emotional Involvement in Professional News Reporters: An Analysis of Speech and Gestures

    This study aims to investigate the extent to which reporters' voice and body behaviour may betray different degrees of emotional involvement when reporting on emergency situations. The hypothesis is that emotional involvement is associated with an increase in body movements and in pitch and intensity variation. The object of investigation is a corpus of 21 10-second videos of Italian news reports on flooding taken from Italian nation-wide TV channels. The gestures and body movements of the reporters were first inspected visually. Then, measures of the reporters' pitch and intensity variation were calculated and related to the reporters' gestures. The effects of the variability in the reporters' voice and gestures were tested with an evaluation test. The results show that the reporters vary greatly in the extent to which they move their hands and body in their reports. Two gestures seem to characterise reporters' communication of emergencies: beats and deictics. The reporters' use of gestures partially parallels their variations in pitch and intensity. The evaluation study shows that increased gesturing is associated with greater emotional involvement and less professionalism. The data were used to create an ontology of gestures for the communication of emergencies.
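
    A rough sense of how per-clip pitch and intensity variation can be quantified is given below, using librosa as an assumed toolkit; the paper does not specify this procedure, and the file name, the pitch range and the RMS-based intensity proxy are illustrative choices only.

```python
import numpy as np
import librosa

def voice_variability(wav_path):
    """Return simple spread measures for pitch and intensity in one clip."""
    y, sr = librosa.load(wav_path, sr=None)
    # Fundamental frequency via pYIN; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[~np.isnan(f0)]
    # Frame-level RMS energy in dB as a crude intensity proxy.
    rms_db = librosa.amplitude_to_db(librosa.feature.rms(y=y)[0], ref=np.max)
    return {"pitch_sd_hz": float(np.std(f0)) if f0.size else float("nan"),
            "intensity_sd_db": float(np.std(rms_db))}

# Larger standard deviations would correspond to more variable delivery.
print(voice_variability("clip.wav"))
```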
    • 
