90,713 research outputs found
Aggregating Deep Features For Image Retrieval
Measuring visual similarity between two images is useful in several multimedia applications such as visual search and image retrieval. However, measuring visual similarity between two images is an ill-posed problem, which makes it a challenging task. This problem has been tackled extensively by the computer vision and machine learning communities. Nevertheless, with the recent advancements in deep learning, it is now possible to design novel image representations that allow systems to measure visual similarity more accurately than existing and widely adopted approaches, such as Fisher vectors. Unfortunately, deep-learning-based visual similarity approaches typically require post-processing stages that can be computationally expensive. To alleviate this issue, this thesis describes deep-learning-based visual image representations that allow a system to measure visual similarity without requiring post-processing stages. Specifically, this thesis describes max-pooling-based aggregation layers that, combined with a convolutional neural network, produce rich image representations for image retrieval without requiring expensive post-processing stages. Moreover, the proposed max-pooling-based aggregation layers are general and can be seamlessly integrated with any existing, pre-trained network. Experiments on large-scale image retrieval datasets confirm that the introduced image representations yield visual similarity measures that achieve comparable or better retrieval performance than state-of-the-art approaches, without requiring expensive post-processing operations.
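The core aggregation idea can be sketched in a few lines. The snippet below shows a MAC-style max pool over a convolutional feature map, a common instance of max-pooling-based aggregation; the thesis's exact layer design may differ, and the toy activations are illustrative only.

```python
import numpy as np

def mac_pool(feature_maps):
    """Max-pooling-based aggregation (MAC-style): collapse each channel's
    spatial activation map to its maximum, yielding one global descriptor.
    feature_maps: array of shape (C, H, W) from any pre-trained CNN layer.
    Returns an L2-normalised descriptor of shape (C,)."""
    v = feature_maps.reshape(feature_maps.shape[0], -1).max(axis=1)
    return v / np.linalg.norm(v)

# Toy example: pretend activations from a conv layer with 4 channels.
fmap = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
desc = mac_pool(fmap)
```

Because the pool acts per channel on any (C, H, W) activation tensor, it can be attached to an existing pre-trained network without retraining, which is the property the abstract emphasises.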
Hierarchy-based Image Embeddings for Semantic Image Retrieval
Deep neural networks trained for classification have been found to learn
powerful image representations, which are also often used for other tasks such
as comparing images w.r.t. their visual similarity. However, visual similarity
does not imply semantic similarity. In order to learn semantically
discriminative features, we propose to map images onto class embeddings whose
pair-wise dot products correspond to a measure of semantic similarity between
classes. Such an embedding not only improves image retrieval results, but
could also facilitate integrating semantics for other tasks, e.g., novelty
detection or few-shot learning. We introduce a deterministic algorithm for
computing the class centroids directly based on prior world-knowledge encoded
in a hierarchy of classes such as WordNet. Experiments on CIFAR-100, NABirds,
and ImageNet show that our learned semantic image embeddings improve the
semantic consistency of image retrieval results by a large margin.Comment: Accepted at WACV 2019. Source code:
https://github.com/cvjena/semantic-embedding
Learning Non-Metric Visual Similarity for Image Retrieval
Measuring visual similarity between two or more instances within a data distribution is a fundamental task in image retrieval. Theoretically, non-metric distances are able to generate a more complex and accurate similarity model than metric distances, provided that the non-linear data distribution is precisely captured by the system. In this work, we explore neural network models for learning a non-metric similarity function for instance search. We argue that non-metric similarity functions based on neural networks can build a better model of human visual perception than standard metric distances. As our proposed similarity function is differentiable, we explore a real end-to-end trainable approach for image retrieval, i.e. we learn the weights from the input image pixels to the final similarity score. Experimental evaluation shows that non-metric similarity networks are able to learn visual similarities between images and improve performance on top of state-of-the-art image representations, boosting results on standard image retrieval datasets with respect to standard metric distances.
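The notion of a learned non-metric similarity can be illustrated with a minimal sketch: a small MLP that scores a pair of image descriptors. The architecture and weights below are hypothetical (and untrained); the point is only that nothing in such a network enforces symmetry or the triangle inequality, so the learned function need not be a metric.

```python
import numpy as np

rng = np.random.default_rng(0)

def similarity_net(x, y, W1, b1, w2, b2):
    """Score a pair of descriptors with a tiny MLP. The output need not
    satisfy metric axioms such as symmetry or the triangle inequality."""
    h = np.maximum(0.0, W1 @ np.concatenate([x, y]) + b1)  # ReLU layer
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))            # score in (0, 1)

d = 4                                   # toy descriptor dimension
W1 = rng.standard_normal((8, 2 * d))    # untrained, illustrative weights
b1 = rng.standard_normal(8)
w2 = rng.standard_normal(8)
b2 = 0.0

x, y = rng.standard_normal(d), rng.standard_normal(d)
s_xy = similarity_net(x, y, W1, b1, w2, b2)
s_yx = similarity_net(y, x, W1, b1, w2, b2)
# In general s_xy and s_yx differ: the function is not forced to be
# a symmetric metric distance.
```

Since every operation in the scorer is differentiable, gradients can flow from the similarity score back through the descriptor network, which is what enables the end-to-end training the abstract describes.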
Instance-weighted Central Similarity for Multi-label Image Retrieval
Deep hashing has been widely applied to large-scale image retrieval by
encoding high-dimensional data points into binary codes for efficient
retrieval. Compared with pairwise/triplet similarity based hash learning,
central similarity based hashing can more efficiently capture the global data
distribution. For multi-label image retrieval, however, previous methods only
use multiple hash centers with equal weights to generate one centroid as the
learning target, which ignores the relationship between the weights of hash
centers and the proportion of instance regions in the image. To address the
above issue, we propose a two-step alternative optimization approach,
Instance-weighted Central Similarity (ICS), to automatically learn the center
weight corresponding to a hash code. Firstly, we apply the maximum entropy
regularizer to prevent one hash center from dominating the loss function, and
compute the center weights via projection gradient descent. Secondly, we update
neural network parameters by standard back-propagation with fixed center
weights. More importantly, the learned center weights can well reflect the
proportion of foreground instances in the image. Our method achieves the
state-of-the-art performance on the image retrieval benchmarks, and especially
improves the mAP by 1.6%-6.4% on the MS COCO dataset. Comment: 10 pages, 6 figures
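The first step of the alternating optimisation keeps the center weights on the probability simplex, which projected gradient descent handles with a Euclidean simplex projection. The sketch below shows that projection and the resulting weighted centroid for a toy multi-label image; the hash centers and weights are made up, and this is only one standard way to realise the projection step, not necessarily the paper's exact implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (all weights >= 0, summing to 1), applied after a gradient step
    on the center weights."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

# Toy setup: three +/-1 hash centers for the three labels present in a
# multi-label image; their weighted combination is the learning target.
centers = np.array([[ 1,  1, -1, -1],
                    [ 1, -1,  1, -1],
                    [-1,  1,  1,  1]], dtype=float)

w = project_simplex(np.array([0.8, 0.3, 0.1]))  # e.g. after a gradient step
target = w @ centers                            # instance-weighted centroid
```

The maximum-entropy regularizer mentioned in the abstract discourages this projection from collapsing onto a single center, so no one label's hash center dominates the target code.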