5 research outputs found
Efficient image retrieval using multi neural hash codes and bloom filters
This paper delivers an efficient, modified approach for image retrieval using multiple neural hash codes, limiting the number of queries with Bloom filters by identifying false positives beforehand. Traditional neural-network approaches to image retrieval tend to use the higher layers for feature extraction, but the activations of lower layers have proven more effective in a number of scenarios. Our approach leverages local deep convolutional neural networks, which combine the features of both lower and higher layers to create feature maps; these are compressed using PCA and, after binary sequencing with a modified multi k-means approach, fed to a Bloom filter. The feature maps are then used in the retrieval process in a hierarchical coarse-to-fine manner: images are first compared at the higher layers for semantic similarity, then gradually at the lower layers for structural similarity. At search time, the neural hashes for the query image are recomputed and looked up in the Bloom filter, which reports whether the query image is definitely absent from the set or possibly present. If the Bloom filter does not rule out the query, it proceeds to the full image retrieval process. This approach can be particularly helpful when the image store is distributed, since the approach supports parallel querying.
Comment: 2020 IEEE International Conference for Innovation in Technology. Asian Journal for Convergence in Technology (AJCT) Volume VI Issue II
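The screening step described in the abstract above can be sketched with a minimal Bloom filter over binary hash codes. This is an illustrative reconstruction, not the paper's implementation: the `BloomFilter` class, its sizing (`m`, `k`), and the toy four-bit codes are all assumptions made for the example; a query whose code definitely is not in the filter can skip the full coarse-to-fine retrieval.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions by salting a single cryptographic hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # False => definitely absent; True => possibly present
        # (false positives can occur, false negatives cannot).
        return all(self.bits[p] for p in self._positions(item))

# Index binary hash codes for stored images (toy 4-bit codes),
# then screen a query code before running full retrieval.
store = BloomFilter()
for code in ["0110", "1011", "1110"]:
    store.add(code)

print(store.might_contain("1011"))  # stored code -> True
if not store.might_contain("0000"):
    print("definitely absent: skip retrieval")
```

Because a Bloom filter is a fixed-size bit array queried independently per code, several filters (one per shard of a distributed image store) can be probed in parallel, which matches the parallel-querying claim in the abstract.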
Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval
We address the problem of image-to-video retrieval. Given a query image, the aim is to identify the frame or scene within a collection of videos that best matches the visual input. Matching images to videos is an asymmetric task that requires specific features for capturing the visual information in images while, at the same time, compacting the temporal correlation in videos. Methods proposed so far are based on the temporal aggregation of hand-crafted features. In this work, we propose a deep learning architecture for learning specific asymmetric spatio-temporal embeddings for image-to-video retrieval. Our method learns non-linear projections from training data for both images and videos and projects their visual content into a common latent space, where they can be easily compared with a standard similarity function. Experiments conducted here show that our proposed asymmetric spatio-temporal embeddings outperform the state of the art on standard image-to-video retrieval datasets.
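The common-latent-space comparison described above can be sketched as follows. This is a hedged illustration, not the paper's architecture: the dimensions, the `tanh` projection, and the random matrices standing in for the learned image and video networks are all assumptions; the point is that two differently shaped inputs are mapped into one space where a standard similarity (cosine, here) ranks them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4096-d image features, 512-d aggregated video
# features, 256-d shared latent space. The matrices below stand in for
# the learned non-linear projections described in the abstract.
D_IMG, D_VID, D_LAT = 4096, 512, 256
W_img = rng.standard_normal((D_IMG, D_LAT)) / np.sqrt(D_IMG)
W_vid = rng.standard_normal((D_VID, D_LAT)) / np.sqrt(D_VID)

def embed(x, W):
    z = np.tanh(x @ W)            # simple non-linear projection (illustrative)
    return z / np.linalg.norm(z)  # L2-normalise: dot product = cosine similarity

# A query image and five compacted video segments, embedded asymmetrically
# into the same latent space.
query = embed(rng.standard_normal(D_IMG), W_img)
videos = np.stack([embed(rng.standard_normal(D_VID), W_vid) for _ in range(5)])

scores = videos @ query           # standard similarity in the common space
best = int(np.argmax(scores))
print(f"best-matching segment: {best}")
```

Storing only one compact latent vector per video segment, rather than per-frame features, is what yields the memory savings claimed for this family of methods.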
Spatial and temporal representations for multi-modal visual retrieval
This dissertation studies the problem of finding relevant content within a visual collection according to a specific query by addressing three key modalities, depending on the kind of data to be processed: symmetric visual retrieval, asymmetric visual retrieval and cross-modal retrieval. In symmetric visual retrieval, the query object and the elements in the collection are of the same kind of visual data, i.e. images or videos. Inspired by the human visual perception system, we propose new techniques to estimate visual similarity in image-to-image retrieval datasets based on non-metric functions, improving image retrieval performance on top of state-of-the-art methods. Asymmetric visual retrieval, on the other hand, is the problem in which queries and elements in the dataset are of different types of visual data. We propose methods to aggregate the temporal information of video segments so that image-video comparisons can be computed using similarity functions. When evaluated on image-to-video retrieval datasets, our algorithms drastically reduce memory storage while maintaining high accuracy rates. Finally, we introduce new solutions for cross-modal retrieval, the task in which either the queries or the elements in the collection are non-visual objects. In particular, we study text-image retrieval in the domain of art by introducing new models for semantic art understanding, obtaining results close to human performance. Overall, this thesis advances the state of the art in visual retrieval by presenting novel solutions for some of the key tasks in the field. The contributions derived from this work have potential direct applications in the era of big data, as visual datasets are growing exponentially every day and new techniques for storing, accessing and managing large-scale visual collections are required.