5 research outputs found

    Efficient image retrieval using multi neural hash codes and bloom filters

    This paper presents an efficient approach to image retrieval that uses multiple neural hash codes and limits the number of queries by using Bloom filters to rule out definite non-matches beforehand. Traditional neural-network approaches to image retrieval tend to use the higher layers for feature extraction, yet the activations of lower layers have proven more effective in a number of scenarios. Our approach leverages local deep convolutional neural networks that combine the features of both lower and higher layers to create feature maps, which are compressed using PCA and, after binary sequencing with a modified multi-k-means approach, inserted into a Bloom filter. The feature maps are further used in the retrieval process in a hierarchical coarse-to-fine manner: images are first compared in the higher layers for semantic similarity, then progressively in the lower layers for structural similarity. At search time, the neural hashes of the query image are recomputed and looked up in the Bloom filter, which reports whether the query image is definitely absent from the set or possibly present. If the Bloom filter does not rule out the query, it proceeds to the full retrieval process. This approach is particularly helpful when the image store is distributed, since it supports parallel querying.

    Comment: 2020 IEEE International Conference for Innovation in Technology; Asian Journal for Convergence in Technology (AJCT), Volume VI, Issue II
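    A minimal sketch of the gating step this abstract describes, assuming a toy Bloom filter and a hypothetical `maybe_retrieve` helper (both illustrative, not the authors' implementation): the filter is populated with the binarized hash codes of stored images, and a query only enters the expensive retrieval pass when the filter cannot rule it out.

    ```python
    import hashlib

    class BloomFilter:
        """Toy Bloom filter over binary hash codes (illustrative only)."""

        def __init__(self, num_bits=2 ** 16, num_hashes=4):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8)

        def _positions(self, key: bytes):
            # Derive k bit positions by salting a single SHA-256 digest.
            for i in range(self.num_hashes):
                digest = hashlib.sha256(bytes([i]) + key).digest()
                yield int.from_bytes(digest[:8], "big") % self.num_bits

        def add(self, key: bytes):
            # Insert the binarized neural hash of a stored image.
            for pos in self._positions(key):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, key: bytes) -> bool:
            # False means "definitely absent"; True means "maybe present".
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(key))

    def maybe_retrieve(query_code: bytes, bloom: BloomFilter, retrieve):
        # Skip the full coarse-to-fine retrieval pass when the filter
        # rules the query out; otherwise fall through to it.
        if not bloom.might_contain(query_code):
            return None  # definitely not in the store
        return retrieve(query_code)  # maybe present: run full retrieval
    ```

    Because each shard of a distributed image store can hold its own filter, this membership check parallelises naturally, which matches the abstract's point about parallel querying.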

    Bloom filters and compact hash codes for efficient and distributed image retrieval


    Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval

    We address the problem of image-to-video retrieval: given a query image, the aim is to identify the frame or scene within a collection of videos that best matches the visual input. Matching images to videos is an asymmetric task that requires features which capture the visual information of images while, at the same time, compacting the temporal correlation of videos. Methods proposed so far are based on the temporal aggregation of hand-crafted features. In this work, we propose a deep learning architecture that learns specific asymmetric spatio-temporal embeddings for image-to-video retrieval. Our method learns non-linear projections from training data for both images and videos and projects their visual content into a common latent space, where they can be easily compared with a standard similarity function. Experiments show that our asymmetric spatio-temporal embeddings outperform the state-of-the-art on standard image-to-video retrieval datasets.
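    The projection scheme the abstract describes can be sketched as two learned branches mapping into one shared latent space; the feature dimensions, layer counts, and PyTorch framing below are assumptions for illustration, not the paper's exact architecture.

    ```python
    import torch.nn as nn
    import torch.nn.functional as F

    class AsymmetricEmbedding(nn.Module):
        """Sketch: branch-specific non-linear projections into a common space."""

        def __init__(self, img_dim=2048, vid_dim=4096, latent_dim=256):
            super().__init__()
            # Image branch: per-image spatial features -> latent space.
            self.img_proj = nn.Sequential(
                nn.Linear(img_dim, latent_dim), nn.ReLU(),
                nn.Linear(latent_dim, latent_dim),
            )
            # Video branch: temporally aggregated clip features -> same space.
            self.vid_proj = nn.Sequential(
                nn.Linear(vid_dim, latent_dim), nn.ReLU(),
                nn.Linear(latent_dim, latent_dim),
            )

        def forward(self, img_feat, vid_feat):
            # L2-normalise so a plain dot product acts as cosine similarity.
            img_emb = F.normalize(self.img_proj(img_feat), dim=-1)
            vid_emb = F.normalize(self.vid_proj(vid_feat), dim=-1)
            return img_emb, vid_emb

    # Ranking a video collection against one query image:
    #   sims = query_emb @ video_embs.T   # higher score = better match
    ```

    Normalising both branches makes the dot product a cosine similarity, i.e. the kind of standard similarity function the abstract refers to for comparing the two modalities in the common space.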

    Spatial and temporal representations for multi-modal visual retrieval

    This dissertation studies the problem of finding relevant content within a visual collection according to a specific query, addressing three key modalities depending on the kind of data to be processed: symmetric visual retrieval, asymmetric visual retrieval, and cross-modal retrieval. In symmetric visual retrieval, the query object and the elements in the collection are the same kind of visual data, i.e. images or videos. Inspired by the human visual perception system, we propose new techniques to estimate visual similarity on image-to-image retrieval datasets based on non-metric functions, improving retrieval performance on top of state-of-the-art methods. Asymmetric visual retrieval, on the other hand, is the problem in which queries and elements in the dataset are different types of visual data. We propose methods to aggregate the temporal information of video segments so that image-video comparisons can be computed with standard similarity functions; on image-to-video retrieval datasets, our algorithms drastically reduce memory storage while maintaining high accuracy. Finally, we introduce new solutions for cross-modal retrieval, the task in which either the queries or the elements in the collection are non-visual objects. In particular, we study text-image retrieval in the domain of art, introducing new models for semantic art understanding and obtaining results close to human performance. Overall, this thesis advances the state-of-the-art in visual retrieval by presenting novel solutions for some of the key tasks in the field. The contributions derived from this work have potential direct applications in the era of big data, as visual datasets grow exponentially and new techniques for storing, accessing and managing large-scale visual collections are required.
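    The storage claim for asymmetric retrieval is easiest to see with the simplest possible temporal aggregator; the mean pooling below is a placeholder assumption, not the dissertation's actual aggregation method.

    ```python
    import numpy as np

    def aggregate_segment(frame_feats: np.ndarray) -> np.ndarray:
        """Collapse per-frame features of shape (T, D) into one (D,) descriptor.

        Mean pooling is the simplest scheme; the thesis studies more
        sophisticated aggregators, so treat this as a stand-in.
        """
        return frame_feats.mean(axis=0)

    # A 300-frame segment with 2048-d frame features shrinks from
    # 300 * 2048 floats to a single 2048-d vector: a 300x storage saving,
    # at the cost of discarding frame-level detail. The aggregated vector
    # can then be compared to image embeddings with a standard similarity.
    ```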