    Spatial Reasoning for Image Retrieval

    A Pooling Approach to Modelling Spatial Relations for Image Retrieval and Annotation

    Over the last two decades we have witnessed strong progress on modeling visual object classes, scenes, and attributes, which has significantly contributed to automated image understanding. Surprisingly little progress has been made, however, on incorporating spatial representation and reasoning into the inference process. In this work, we propose a pooling interpretation of spatial relations and show how it improves image retrieval and annotation tasks involving spatial language. Due to the complexity of spatial language, we argue for a learning-based approach that acquires a representation of spatial relations by learning the parameters of the pooling operator. We show improvements over previous work on two datasets and two different tasks, and we provide additional insights on a new dataset with an explicit focus on spatial relations.
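
    To make the pooling interpretation concrete, here is a minimal sketch, assuming a grid-based image encoding: a spatial relation such as "above" is represented as a grid of pooling weights applied to per-cell detector scores, and it is these weights that would be learned. The class name, grid size, and scoring scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a pooling-based spatial relation (illustrative only).
import numpy as np

class SpatialRelationPooling:
    """One pooling weight per spatial cell of an image grid."""

    def __init__(self, grid_size=8, rng=None):
        rng = rng or np.random.default_rng(0)
        # In the paper's spirit these weights would be learned from data;
        # here they are randomly initialized for demonstration.
        self.weights = rng.normal(scale=0.1, size=(grid_size, grid_size))

    def score(self, object_score_map):
        # Pool per-cell object-detector responses into one relation score.
        return float(np.sum(self.weights * object_score_map))

# Hypothetical usage: score a "dog above sofa" query, given detector
# responses for "dog" expressed in the sofa's frame of reference.
relation_above = SpatialRelationPooling(grid_size=8)
dog_scores = np.random.default_rng(1).random((8, 8))
print(relation_above.score(dog_scores))
```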

    Scale and Orientation-invariant Scene Similarity Metrics for Image Queries

    In this paper we extend our previous work on shape-based queries to support queries on configurations of image objects. Here we consider spatial reasoning, especially directional and metric object relationships. Existing models for spatial reasoning tend to rely on pre-identified cardinal directions and minimal scale variation, assumptions that cannot be taken as given in our image applications, where orientation and scale may vary substantially and are often unknown. Accordingly, we have developed the method of varying baselines to identify similarities in direction and distance relations. Our method allows us to evaluate directional similarities without a priori knowledge of cardinal directions, and to compare distance relations even when the query scene and database content differ in scale by unknown amounts. We use our method to evaluate similarity between a user-defined query scene and object configurations. Here we present this new method and discuss its role within a broader image retrieval framework.
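
    As an illustration of the varying-baselines idea, the sketch below measures directions as angles relative to a baseline pair of objects and distances as ratios of the baseline length, which makes the comparison invariant to rotation and uniform scaling. Encoding scenes as ordered points and summing absolute differences are simplifying assumptions for the example, not the authors' method.

```python
# Sketch of scale- and orientation-invariant configuration comparison.
import math

def relative_configuration(points):
    """Describe points[2:] by angle and distance relative to the
    baseline points[0] -> points[1]."""
    (x0, y0), (x1, y1) = points[0], points[1]
    base_angle = math.atan2(y1 - y0, x1 - x0)
    base_len = math.hypot(x1 - x0, y1 - y0)
    desc = []
    for x, y in points[2:]:
        angle = math.atan2(y - y0, x - x0) - base_angle
        angle = math.atan2(math.sin(angle), math.cos(angle))  # wrap to (-pi, pi]
        dist = math.hypot(x - x0, y - y0) / base_len
        desc.append((angle, dist))
    return desc

def dissimilarity(query_pts, scene_pts):
    """Crude score: sum of angle and distance-ratio differences."""
    pairs = zip(relative_configuration(query_pts),
                relative_configuration(scene_pts))
    return sum(abs(a1 - a2) + abs(d1 - d2) for (a1, d1), (a2, d2) in pairs)

query = [(0, 0), (2, 0), (1, 1)]
scene = [(0, 0), (0, 4), (-2, 2)]   # the query rotated 90 degrees, scaled by 2
print(dissimilarity(query, scene))  # ~0.0: same configuration
```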

    Structured Knowledge Representation for Image Retrieval

    We propose a structured approach to the problem of retrieving images by content and present a description logic devised for the semantic indexing and retrieval of images containing complex objects. Like other approaches, we start from low-level features extracted with image analysis to detect and characterize regions in an image. In contrast with feature-based approaches, however, we provide a syntax to describe segmented regions as basic objects and complex objects as compositions of basic ones. We then introduce a companion extensional semantics for defining reasoning services, such as retrieval, classification, and subsumption. These services can be used for both exact and approximate matching, using similarity measures. Using our logical approach as a formal specification, we implemented a complete client-server image retrieval system that allows a user to pose both queries by sketch and queries by example. A set of experiments has been carried out on a testbed of images to assess the retrieval capabilities of the system against expert users' rankings. Results are presented using a well-established measure of quality borrowed from textual information retrieval.
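
    The toy sketch below is only loosely inspired by the paper's description logic, but it illustrates the compositional idea: complex objects are described as compositions of basic region descriptions, and a naive structural test stands in for subsumption between descriptions. The class names and the matching rule are assumptions, not the paper's syntax or semantics.

```python
# Toy structural subsumption over composed object descriptions.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BasicObject:
    color: str
    shape: str

@dataclass
class ComplexObject:
    parts: list = field(default_factory=list)

def subsumes(general, specific):
    """True if every part required by `general` is matched in `specific`."""
    return all(part in specific.parts for part in general.parts)

wheel = BasicObject("black", "circle")
body = BasicObject("red", "rectangle")
car = ComplexObject([wheel, wheel, body])
vehicle = ComplexObject([wheel])   # a more general description
print(subsumes(vehicle, car))      # True: every part of `vehicle` is in `car`
```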

    Reasoning about Cardinal Directions between Extended Objects

    Direction relations between extended spatial objects are important commonsense knowledge. Recently, Goyal and Egenhofer proposed a formal model, known as the Cardinal Direction Calculus (CDC), for representing direction relations between connected plane regions. CDC is perhaps the most expressive qualitative calculus for directional information and has attracted increasing interest from areas such as artificial intelligence, geographical information science, and image retrieval. Given a network of CDC constraints, the consistency problem is deciding whether the network is realizable by connected regions in the real plane. This paper provides a cubic algorithm for checking the consistency of basic CDC constraint networks and proves that reasoning with CDC is in general an NP-complete problem. For a consistent network of basic CDC constraints, our algorithm also returns a 'canonical' solution in cubic time. The cubic algorithm is further adapted to cope with cardinal directions between possibly disconnected regions, for which the best previous algorithm has time complexity O(n^5).
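
    For intuition, here is a small sketch of a basic CDC relation in the Goyal-Egenhofer style: the reference region's bounding box partitions the plane into nine tiles (NW, N, NE, W, O, E, SW, S, SE), and the relation records which tiles the target region intersects. Encoding regions as sets of unit cells is an assumption made for the example; this does not implement the paper's consistency algorithm.

```python
# Sketch: compute which of the nine CDC tiles a target region occupies.
def cdc_relation(target, reference):
    """Regions are sets of (x, y) cells; returns the occupied tiles."""
    xs = [x for x, _ in reference]
    ys = [y for _, y in reference]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)

    def tile(x, y):
        col = "W" if x < xmin else ("E" if x > xmax else "")
        row = "S" if y < ymin else ("N" if y > ymax else "")
        return (row + col) or "O"   # "O" is the central (bounding-box) tile

    return sorted({tile(x, y) for x, y in target})

reference = {(0, 0), (1, 0), (0, 1), (1, 1)}
target = {(0, 3), (3, 3)}
print(cdc_relation(target, reference))   # ['N', 'NE']
```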

    Post processing of multimedia information - concepts, problems, and techniques

    Currently, most research on multimedia information processing focuses on multimedia information storage and retrieval, especially indexing and content-based access. We argue that multimedia information processing should include one more level: post-processing. Here "post-processing" means further processing of retrieved multimedia information, which includes fusion of multimedia information and reasoning with multimedia information to reach new conclusions. In this paper, the three levels of multimedia information processing (storage, retrieval, and post-processing) are discussed. The concepts and problems of multimedia information post-processing are identified, and potential techniques that can be used in post-processing are suggested. By highlighting the problems in multimedia information post-processing, we hope this paper will stimulate further research on this important but neglected topic.

    A lightweight web video model with content and context descriptions for integration with linked data

    The rapid increase of video data on the Web has created an urgent need for effective representation, management, and retrieval of web videos. Recently, many studies have been carried out on the ontological representation of videos, using either domain-dependent or generic schemas such as MPEG-7, MPEG-4, and COMM. In spite of their extensive coverage and sound theoretical grounding, these schemas are yet to be widely adopted; two likely reasons are the complexity involved and a lack of tool support. We propose a lightweight video content model for content-context description and integration. The uniqueness of the model is that it captures the emerging social context used to describe and interpret the video. Our approach is grounded on exploiting easily extractable, evolving contextual metadata and on the availability of existing data on the Web. This enables representational homogeneity and a firm basis for information integration among semantically enabled data sources. The model reuses many existing schemas to describe its ontology classes and shows the scope for interlinking with the Linked Data cloud.
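
    As an example of the kind of lightweight, Linked-Data-friendly description the model aims for, the sketch below uses the rdflib library to describe a video by reusing existing vocabularies (Dublin Core, FOAF) and linking a user tag to a DBpedia resource. The specific classes, properties, and URIs are illustrative assumptions, not the model's actual schema.

```python
# Sketch: a minimal RDF description of a web video, reusing existing
# vocabularies so it can interlink with the Linked Data cloud.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC, FOAF, RDF

g = Graph()
video = URIRef("http://example.org/video/42")          # hypothetical URI

g.add((video, RDF.type, FOAF.Document))                # placeholder class
g.add((video, DC.title, Literal("Lecture on image retrieval")))
g.add((video, DC.creator, URIRef("http://example.org/user/alice")))
# Social/contextual metadata: a tag linked to a DBpedia resource.
g.add((video, DC.subject, URIRef("http://dbpedia.org/resource/Image_retrieval")))

print(g.serialize(format="turtle"))
```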

    SMAN : Stacked Multi-Modal Attention Network for cross-modal image-text retrieval

    This article focuses on the task of cross-modal image-text retrieval, an interdisciplinary topic in both the computer vision and natural language processing communities. Existing methods based on aligning global representations fail to pinpoint the semantically meaningful portions of images and texts, while local alignment schemes suffer a huge computational burden from exhaustively aggregating the similarities of visual fragments and textual words. In this article, we propose a stacked multimodal attention network (SMAN) that uses a stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, thereby mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal and multimodal information as guidance to perform multiple steps of attention reasoning, so that the fine-grained correlation between image and text can be modeled. As a consequence, we can discover the semantically meaningful visual regions or words in a sentence, which contributes to measuring cross-modal similarity more precisely. Moreover, we present a novel bidirectional ranking loss that pulls matched multimodal instances closer together, allowing us to make full use of pairwise supervised information to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that SMAN consistently yields competitive performance compared to state-of-the-art methods.
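
    To illustrate the flavor of a bidirectional ranking loss, here is a minimal numpy sketch: matched image-text pairs, placed on the diagonal of a similarity matrix, must outscore the hardest mismatched candidate in both retrieval directions by a margin. The margin value and the hardest-negative choice are assumptions for the example, not necessarily SMAN's exact formulation.

```python
# Sketch of a bidirectional hinge ranking loss over a similarity matrix.
import numpy as np

def bidirectional_ranking_loss(sim, margin=0.2):
    """sim[i, j] is the similarity of image i and text j; matched pairs
    sit on the diagonal."""
    pos = np.diag(sim)
    neg = sim.astype(float).copy()
    np.fill_diagonal(neg, -np.inf)            # exclude the positives
    hardest_text = neg.max(axis=1)            # image -> text direction
    hardest_image = neg.max(axis=0)           # text -> image direction
    loss_i2t = np.maximum(0.0, margin + hardest_text - pos)
    loss_t2i = np.maximum(0.0, margin + hardest_image - pos)
    return float((loss_i2t + loss_t2i).mean())

sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.8, 0.4],
                [0.2, 0.1, 0.7]])
print(bidirectional_ranking_loss(sim))  # 0.0: every pair clears the margin
```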