27,312 research outputs found

    A semantic and language-based representation of an environmental scene

    The modeling of a landscape environment is a cognitive activity that requires appropriate spatial representations. The research presented in this paper introduces a structural and semantic categorization of a landscape view based on panoramic photographs that act as a substitute for a given natural environment. Verbal descriptions of a landscape scene provide the modeling input of our approach. This structure-based model identifies the spatial, relational, and semantic constructs that emerge from these descriptions. Concepts in the environment are qualified according to a semantic classification, their proximity and direction relative to the observer, and the spatial relations that hold between them. The resulting model is represented in a way that provides a modeling support for the study of environmental scenes and a contribution to further research on mapping verbal descriptions onto geographical information system-based representations.
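
    As an illustration of the kind of structure this abstract describes, the following is a minimal Python sketch, not taken from the paper: hypothetical data classes for concepts qualified by semantic class, proximity, and direction to the observer, plus pairwise spatial relations. All names and categories are assumptions made for the example.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Concept:
        name: str              # e.g. "lake", "mountain"
        semantic_class: str    # category from the semantic classification
        proximity: str         # e.g. "near", "far"
        direction: str         # e.g. "left", "ahead", "right"

    @dataclass
    class SpatialRelation:
        subject: str           # concept name
        relation: str          # e.g. "behind", "next to"
        object: str            # concept name

    @dataclass
    class SceneDescription:
        concepts: List[Concept] = field(default_factory=list)
        relations: List[SpatialRelation] = field(default_factory=list)

    # Built from a verbal description such as
    # "a lake ahead, with mountains far behind it":
    scene = SceneDescription(
        concepts=[
            Concept("lake", "water body", "near", "ahead"),
            Concept("mountain", "landform", "far", "ahead"),
        ],
        relations=[SpatialRelation("mountain", "behind", "lake")],
    )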

    Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

    Recognising objects according to a pre-defined fixed set of class labels has been well studied in Computer Vision. However, in many practical applications the subjects of interest are not known beforehand, or are not so easily delineated. In many of these cases natural-language dialog is a natural way to specify the subject of interest, and the task of achieving this capability (a.k.a. Referring Expression Comprehension) has recently attracted attention. To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is referred to by a natural-language expression of variable length, from a short phrase query to a long multi-round dialog. The PLAN network has two attention mechanisms that relate parts of the expression both to the global visual content and directly to object candidates. Furthermore, the attention mechanisms are recurrent, making the referring process visualizable and explainable. The attended information from these dual sources is combined to reason about the referred object. The two attention mechanisms can be trained in parallel, and we find the combined system outperforms the state of the art on several benchmark datasets with language inputs of different lengths, such as RefCOCO, RefCOCO+, and GuessWhat?!.
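
    The following is an illustrative sketch, not the authors' code, of the dual-attention idea this abstract describes: encoded expression features attend to global image features and to per-candidate object features, and the two attended summaries are combined to score candidates. The module name, dimensions, and pooling choice are assumptions made for the example.

    import torch
    import torch.nn as nn

    class DualAttentionScorer(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            # Attention over global image regions, queried by the expression.
            self.global_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            # Attention over the expression, queried by each object candidate.
            self.object_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.score = nn.Linear(2 * dim, 1)

        def forward(self, expr, global_feats, object_feats):
            # expr:         (B, T, D) encoded expression tokens
            # global_feats: (B, R, D) global image region features
            # object_feats: (B, K, D) candidate object features
            g, _ = self.global_attn(expr, global_feats, global_feats)   # (B, T, D)
            o, _ = self.object_attn(object_feats, expr, expr)           # (B, K, D)
            g_pooled = g.mean(dim=1, keepdim=True).expand(-1, o.size(1), -1)
            # Combine the dual attended sources and score each candidate.
            return self.score(torch.cat([o, g_pooled], dim=-1)).squeeze(-1)  # (B, K)

    # Usage: pick the candidate with the highest score for a given expression.
    scorer = DualAttentionScorer(dim=256)
    scores = scorer(torch.randn(2, 12, 256), torch.randn(2, 49, 256), torch.randn(2, 5, 256))
    best = scores.argmax(dim=-1)  # index of the referred object per example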