A corpus-based analysis of route instructions in human-robot interaction
This paper investigates how users employ spatial descriptions to navigate a speech-enabled robot. We created a simulated environment in which users gave route instructions in a dialogic real-time interaction with a robot, which was operated by naïve participants. The robot's monitoring ability was also manipulated across two experimental conditions. The results provide evidence that the content of the instructions and the strategies of the users vary depending on the conditions and demands of the interaction. As expected, the route instructions were frequently underspecified and arbitrary. The findings of this study elucidate the complexity of interpreting spatial language in HRI. However, they also point to the need to endow mobile robots with richer dialogue resources to compensate for the uncertainties arising from language as well as from the environment.
Visual Complexity and Its Effects on Referring Expression Generation
Speakers' perception of a visual scene influences the language they use to describe it: which objects they choose to mention and how they characterize the relationships between them. We show that visual complexity can either delay or facilitate description generation, depending on how much disambiguating information is required and how useful the scene's complexity can be in providing, for example, helpful landmarks. To do so, we measure speech onset times, eye gaze, and utterance content in a reference production experiment in which the target object is either unique or non-unique in a visual scene of varying size and complexity. Speakers delay speech onset if the target object is non-unique and requires disambiguation, and we argue that this reflects the cost of deciding on a high-level strategy for describing it. The eye-tracking data demonstrate that these delays increase when the speaker is able to conduct an extensive early visual search, implying that when a speaker scans too little of the scene early on, they may decide to begin speaking before becoming aware that their description is underspecified. Speakers' content choices reflect the visual makeup of the scene: the number of distractors present and the availability of useful landmarks. Our results highlight the complex role of visual perception in reference production, showing that speakers can make good use of complexity in ways that reflect their visual processing of the scene.
Zoom: a corpus of natural language descriptions of map locations
This paper describes an experiment to elicit referring expressions from human subjects for research in natural language generation and related fields, and preliminary results of a computational model for the generation of these expressions. Unlike existing resources of this kind, the resulting data set, the Zoom corpus of natural language descriptions of map locations, takes into account a domain that is significantly closer to real-world applications than what has been considered in previous work, and addresses more complex situations of reference, including contexts with different levels of detail, and instances of singular and plural reference produced by speakers of Spanish and Portuguese.
Fil: Altamirano, Ivana Romina. Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física; Argentina.
Fil: Ferreira, Thiago. Universidade de São Paulo, Escola de Artes, Ciências e Humanidades; Brasil.
Fil: Paraboni, Ivandré. Universidade de São Paulo, Escola de Artes, Ciências e Humanidades; Brasil.
Fil: Benotti, Luciana. Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física; Argentina.
Ciencias de la Computación
Spatial Relations and Natural-Language Semantics for Indoor Scenes
Over the past 15 years, there have been increased efforts to represent and communicate spatial information about entities within indoor environments. Automated annotation of information about indoor environments is needed for natural-language processing tasks such as spatially anchoring events, tracking objects in motion, describing scenes, and interpreting thematic places in relation to confirmed locations. Descriptions of indoor scenes often require a fine granularity of spatial information about the meaning of natural-language spatial utterances to improve human-computer interactions and applications for the retrieval of spatial information. The development needs of these systems provide a rationale as to why, despite an extensive body of research in spatial cognition and spatial linguistics, it is still necessary to investigate basic understandings of how humans conceptualize and communicate about objects and structures in indoor space. This thesis investigates the alignment of conceptual spatial relations and natural-language (NL) semantics in the representation of indoor space. The foundation of this work is grounded in spatial information theory as well as spatial cognition and spatial linguistics. In order to better understand how to align computational models and NL expressions about indoor space, this dissertation used an existing dataset of indoor scene descriptions to investigate patterns in entity identification, spatial relations, and spatial preposition use within vista-scale indoor settings. Three human-subject experiments were designed and conducted within virtual indoor environments. These experiments investigate the alignment of human-subject NL expressions for a subset of conceptual spatial relations (contact, disjoint, and part-of) within a controlled virtual environment.
Each scene was designed to focus participant attention on a single relation depicted in the scene and to elicit a spatial preposition term(s) describing the focal relationship. The major results of this study are the identification of object and structure categories, spatial relationships, and patterns of spatial preposition use in the indoor scene descriptions that were consistent across open-response, closed-response, and ranking-type items. There appeared to be a strong preference for describing scene objects in relation to the structural objects that bound the room depicted in the indoor scenes. Furthermore, for each of the three relations (contact, disjoint, and part-of), a small set of spatial prepositions emerged that were strongly preferred by participants at statistically significant levels, based on the overall frequency of response, image sorting, and ranking judgments. The use of certain spatial prepositions to describe relations between room structures suggests there may be differences in how indoor vista-scale space is understood relative to tabletop and geographic scales. Finally, an indoor scene description corpus was developed as a product of this work, which should provide researchers with new human-subject-based datasets for training NL algorithms used to generate more accurate and intuitive NL descriptions of indoor scenes.
The Role of Perception in Situated Spatial Reference
This position paper sets out the argument that an interesting avenue for the exploration and study of universals and variation in spatial reference is to address this topic in terms of the universals in human perception and attention, and to explore how these universals shape spatial reference across cultures and languages.
Augmenting Situated Spoken Language Interaction with Listener Gaze
Collaborative task solving in a shared environment requires referential success. Human speakers follow the listener's behavior in order to monitor language comprehension (Clark, 1996). Furthermore, a natural language generation (NLG) system can exploit listener gaze to realize an effective interaction strategy by responding to it with verbal feedback in virtual environments (Garoufi, Staudte, Koller, & Crocker, 2016). We augment situated spoken language interaction with listener gaze and investigate its role in human-human and human-machine interactions. Firstly, we evaluate its impact on the prediction of reference resolution using a multimodal corpus collected in virtual environments. Secondly, we explore if and how a human speaker uses listener gaze in an indoor guidance task while spontaneously referring to real-world objects in a real environment. Thirdly, we consider an object identification task for assembly under system instruction. We developed a multimodal interactive system and two NLG systems that integrate listener gaze into their generation mechanisms. The NLG system "Feedback" reacts to gaze with verbal feedback, either underspecified or contrastive. The NLG system "Installments" uses gaze to refer to an object incrementally, in the form of installments. Our results showed that gaze features improved the accuracy of automatic prediction of reference resolution. Further, we found that human speakers are very good at producing referring expressions, and showing listener gaze did not improve performance but elicited more negative feedback. In contrast, we showed that an NLG system that exploits listener gaze benefits the listener's understanding. Specifically, combining a short, ambiguous instruction with contrastive feedback resulted in faster interactions compared to underspecified feedback, and even outperformed following long, unambiguous instructions.
Moreover, alternating the underspecified and contrastive responses in an interleaved manner led to better engagement with the system and efficient information uptake, and resulted in equally good performance. Somewhat surprisingly, when gaze was incorporated more indirectly in the generation procedure and used to trigger installments, the non-interactive approach that outputs an instruction all at once was more effective. However, if the spatial expression was mentioned first, referring in gaze-driven installments was as efficient as following an exhaustive instruction. In sum, we provide a proof of concept that listener gaze can be used effectively in situated human-machine interaction. An assistance system using gaze cues is more attentive and adapts to listener behavior to ensure communicative success.
Reference Production as Search: The Impact of Domain Size on the Production of Distinguishing Descriptions
When producing a description of a target referent in a visual context, speakers need to choose a set of properties that distinguish it from its distractors. Computational models of language production/generation usually model this as a search process and predict that the time taken will increase both with the number of distractors in a scene and with the number of properties required to distinguish the target. These predictions are reminiscent of classic findings in visual search; however, unlike models of reference production, visual search models also predict that search can become very efficient under certain conditions, something that reference production models do not consider. This paper investigates the predictions of these models empirically. In two experiments, we show that the time taken to plan a referring expression, as reflected by speech onset latencies, is influenced by distractor set size and by the number of properties required, but that this crucially depends on the discriminability of the properties under consideration. We discuss the implications for current models of reference production and for recent work on the role of salience in visual search.
What is not where: the challenge of integrating spatial representations into deep learning architectures
This paper examines to what degree current deep learning architectures for image caption generation capture spatial language. On the basis of the evaluation of examples of generated captions from the literature, we argue that systems capture what objects are in the image data but not where these objects are located: the captions generated by these systems are the output of a language model conditioned on the output of an object detector that cannot capture fine-grained location information. Although language models provide useful knowledge for image captions, we argue that deep learning image captioning architectures should also model geometric relations between objects.
Comment: 15 pages, 10 figures. Appears in CLASP Papers in Computational Linguistics Vol 1: Proceedings of the Conference on Logic and Machine Learning in Natural Language (LaML 2017), pp. 41-5