524 research outputs found

    The Role of Perception in Situated Spatial Reference

    This position paper sets out the argument that an interesting avenue for exploring universals and variation in spatial reference is to address the topic in terms of the universals in human perception and attention, and to explore how these universals affect spatial reference across cultures and languages.

    Generating text descriptions for geographically distributed sensors

    Sensor networks, with thousands of geographically distributed sensors and different types of quantitative measures, need software tools to help users understand the meaning of those measures. In this paper we address the problem of automatically generating geographic descriptions in natural language for geographically distributed sensors. We describe this problem in the context of a web application in the domain of hydrology, which is part of a more complex multimedia presentation system that combines text and graphics. We describe the web application and the algorithm that we designed to generate the geographic descriptions for sensors. Besides GIS data files, our method uses two information sources: an online server for geographic names (Geonames) and a specific knowledge base of text patterns that we constructed to process sensor identifiers. The evaluation results confirm that online geographic information resources such as Geonames are useful for generating names for sensors, but that they need to be combined with other, more specific information sources (such as our knowledge base) to obtain good descriptions. We also compare our method with related work and outline future lines of work.
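
    The idea of combining a pattern knowledge base with a geographic-name lookup can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the identifier formats, the templates, and the `nearest_place` argument (a stand-in for a name that a gazetteer such as Geonames might return) are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical pattern knowledge base: each entry pairs a regular expression
# over a sensor identifier with a description template. These formats are
# invented for illustration; the paper does not list its patterns.
PATTERNS = [
    (re.compile(r"^RIVER_(?P<river>[A-Z]+)_AT_(?P<place>[A-Z]+)$"),
     "flow sensor on the river {river} at {place}"),
    (re.compile(r"^RESERVOIR_(?P<name>[A-Z]+)$"),
     "level sensor at the {name} reservoir"),
]


def describe(identifier, nearest_place=None):
    """Return a short description for a sensor identifier.

    Patterns are tried first; if none matches, fall back to a place name
    supplied by the caller (standing in for a gazetteer lookup), and
    finally to the raw identifier.
    """
    for pattern, template in PATTERNS:
        match = pattern.match(identifier)
        if match:
            # Turn "EBRO" into "Ebro" before filling the template.
            fields = {key: value.title() for key, value in match.groupdict().items()}
            return template.format(**fields)
    if nearest_place:
        return f"sensor near {nearest_place}"
    return f"sensor {identifier}"


print(describe("RIVER_EBRO_AT_ZARAGOZA"))
print(describe("X17", nearest_place="Tudela"))
```

    Trying the specific patterns before the generic place-name fallback mirrors the abstract's conclusion that general resources help but work best alongside a more specific knowledge source.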

    What is not where: the challenge of integrating spatial representations into deep learning architectures

    This paper examines to what degree current deep learning architectures for image caption generation capture spatial language. On the basis of an evaluation of examples of generated captions from the literature, we argue that systems capture what objects are in the image data but not where these objects are located: the captions generated by these systems are the output of a language model conditioned on the output of an object detector that cannot capture fine-grained location information. Although language models provide useful knowledge for image captions, we argue that deep learning image captioning architectures should also model geometric relations between objects. Comment: 15 pages, 10 figures. Appears in CLASP Papers in Computational Linguistics Vol 1: Proceedings of the Conference on Logic and Machine Learning in Natural Language (LaML 2017), pp. 41-5

    What Is Not Where: the Challenge of Integrating Spatial Representations Into Deep Learning Architectures

    This paper examines to what degree current deep learning architectures for image caption generation capture spatial language. On the basis of the evaluation of examples of generated captions from the literature we argue that systems capture what objects are in the image data but not where these objects are located: the captions generated by these systems are the output of a language model conditioned on the output of an object detector that cannot capture fine-grained location information. Although language models provide useful knowledge for image captions, we argue that deep learning image captioning architectures should also model geometric relations between objects.

    A perceptually based computational framework for the interpretation of spatial language

    The goal of this work is to develop a semantic framework to underpin the development of natural language (NL) interfaces for three-dimensional (3-D) simulated environments. The thesis of this work is that the computational interpretation of language in such environments should be based on a framework that integrates a model of visual perception with a model of discourse. When interacting with a 3-D environment, users have two main goals: the first is to move around in the simulated environment and the second is to manipulate objects in the environment. In order to interact with an object through language, users need to be able to refer to the object. There are many different types of referring expressions, including definite descriptions, pronominals, demonstratives, one-anaphora, other-expressions, and locative expressions. Some of these expressions are anaphoric (e.g., pronominals, one-anaphora, other-expressions). In order to interpret these computationally, it is necessary to develop and implement a discourse model. Interpreting locative expressions requires a semantic model for prepositions and a mechanism for selecting the user’s intended frame of reference. Finally, many of these expressions presuppose a visual context. In order to interpret them, this context must be modelled and utilised. This thesis develops a perceptually grounded, discourse-based computational model of reference resolution capable of handling anaphoric and locative expressions. There are three novel contributions in this framework: a visual saliency algorithm, a semantic model for locative expressions containing projective prepositions, and a discourse model. The visual saliency algorithm grades the prominence of the objects in the user's view volume at each frame. This algorithm is based on the assumption that objects which are larger and more central to the user's view are more prominent than objects which are smaller or on the periphery of their view. 
The resulting saliency ratings for each frame are stored in a data structure linked to the NL system’s context model. This approach gives the system a visual memory that may be drawn upon in order to resolve references. The semantic model for locative expressions defines a computational algorithm for interpreting locatives that contain a projective preposition, specifically the prepositions in front of, behind, to the right of, and to the left of. There are several novel components within this model. First, there is a procedure for handling the issue of frame of reference selection. Second, there is an algorithm for modelling the spatial templates of projective prepositions. This algorithm integrates a topological model with visual perceptual cues. This approach allows us to correctly define the regions described by projective prepositions in the viewer-centred frame of reference, in situations that previous models (Yamada 1993, Gapp 1994a, Olivier et al. 1994, Fuhr et al. 1998) have found problematic. Third, the abstraction used to represent the candidate trajectors of a locative expression ensures that each candidate is ascribed the highest rating possible. This approach guarantees that the candidate trajector that occupies the location with the highest applicability in the preposition's spatial template is selected as the locative’s referent. The context model extends the work of Salmon-Alt and Romary (2001) by integrating the perceptual information created by the visual saliency algorithm with a model of discourse. Moreover, the context model defines an interpretation process that provides an explicit account of how the visual and linguistic information sources are utilised when attributing a referent to a nominal expression. It is important to note that the context model provides the set of candidate referents and candidate trajectors for the locative expression interpretation algorithm. These are restricted to those objects that the user has seen. 
The thesis shows that visual salience provides a qualitative control in NL interpretation for 3-D simulated environments and captures interesting and significant effects such as graded judgments. Moreover, it provides an account of how object occlusion affects the semantics of projective prepositions that are canonically aligned with the front-back axis in the viewer-centred frame of reference.
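
    The saliency assumption stated in this abstract (larger and more central objects are more prominent) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the per-pixel linear centrality weight, the `VisibleObject` type, and the normalisation by screen area are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class VisibleObject:
    name: str
    pixels: list  # (x, y) screen coordinates covered by the object in one frame


def saliency(obj, width, height):
    """Score an object by size and centrality (hypothetical weighting).

    Each covered pixel contributes a weight that falls linearly from 1 at
    the screen centre to 0 at the corners, so larger and more central
    objects receive higher scores.
    """
    cx, cy = (width - 1) / 2, (height - 1) / 2
    max_dist = math.hypot(cx, cy) or 1.0  # avoid dividing by zero on a 1x1 view
    total = sum(1.0 - math.hypot(x - cx, y - cy) / max_dist for x, y in obj.pixels)
    return total / (width * height)


def rank_by_saliency(objects, width, height):
    """Return (name, score) pairs for one frame, most salient first."""
    scored = [(o.name, saliency(o, width, height)) for o in objects]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A 10x10 view with a large central object, a small central object,
# and a large object in the corner.
big = VisibleObject("big", [(x, y) for x in range(3, 7) for y in range(3, 7)])
small = VisibleObject("small", [(4, 4), (5, 5)])
corner = VisibleObject("corner", [(x, y) for x in range(0, 4) for y in range(0, 4)])

for name, score in rank_by_saliency([corner, small, big], 10, 10):
    print(f"{name}: {score:.3f}")
```

    Storing the ranked list for each frame would give a simple form of the visual memory the abstract describes, though how the thesis actually structures that memory is not specified here.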

    Using Open Geographic Data to Generate Natural Language Descriptions for Hydrological Sensor Networks

    Providing descriptions of isolated sensors and sensor networks in natural language, understandable by the general public, helps users find relevant sensors and analyze sensor data. In this paper, we discuss the feasibility of using geographic knowledge from public databases available on the Web (such as OpenStreetMap, Geonames, or DBpedia) to automatically construct such descriptions. We present a general method that uses such information to generate sensor descriptions in natural language. The evaluation of our method on a national hydrological sensor network showed that this approach is feasible and capable of generating adequate sensor descriptions with lower development effort than other approaches. In the paper we also analyze certain problems that we found in public databases (e.g., heterogeneity, non-standard use of labels, or rigid search methods) and their impact on the generation of sensor descriptions.