29,252 research outputs found
Unsupervised Triplet Hashing for Fast Image Retrieval
Hashing has played a pivotal role in large-scale image retrieval. With the
development of Convolutional Neural Network (CNN), hashing learning has shown
great promise. But existing methods are mostly tuned for classification, which
are not optimized for retrieval tasks, especially for instance-level retrieval.
In this study, we propose a novel hashing method for large-scale image
retrieval. Considering the difficulty in obtaining labeled datasets for image
retrieval task in large scale, we propose a novel CNN-based unsupervised
hashing method, namely Unsupervised Triplet Hashing (UTH). The unsupervised
hashing network is designed under the following three principles: 1) more
discriminative representations for image retrieval; 2) minimum quantization
loss between the original real-valued feature descriptors and the learned hash
codes; 3) maximum information entropy for the learned hash codes. Extensive
experiments on CIFAR-10, MNIST and In-shop datasets have shown that UTH
outperforms several state-of-the-art unsupervised hashing methods in terms of
retrieval accuracy
Semantic bottleneck for computer vision tasks
This paper introduces a novel method for the representation of images that is
semantic by nature, addressing the question of computation intelligibility in
computer vision tasks. More specifically, our proposition is to introduce what
we call a semantic bottleneck in the processing pipeline, which is a crossing
point in which the representation of the image is entirely expressed with
natural language , while retaining the efficiency of numerical representations.
We show that our approach is able to generate semantic representations that
give state-of-the-art results on semantic content-based image retrieval and
also perform very well on image classification tasks. Intelligibility is
evaluated through user centered experiments for failure detection
CNN Features off-the-shelf: an Astounding Baseline for Recognition
Recent results indicate that the generic descriptors extracted from the
convolutional neural networks are very powerful. This paper adds to the
mounting evidence that this is indeed the case. We report on a series of
experiments conducted for different recognition tasks using the publicly
available code and model of the \overfeat network which was trained to perform
object classification on ILSVRC13. We use features extracted from the \overfeat
network as a generic image representation to tackle the diverse range of
recognition tasks of object image classification, scene recognition, fine
grained recognition, attribute detection and image retrieval applied to a
diverse set of datasets. We selected these tasks and datasets as they gradually
move further away from the original task and data the \overfeat network was
trained to solve. Astonishingly, we report consistent superior results compared
to the highly tuned state-of-the-art systems in all the visual classification
tasks on various datasets. For instance retrieval it consistently outperforms
low memory footprint methods except for sculptures dataset. The results are
achieved using a linear SVM classifier (or distance in case of retrieval)
applied to a feature representation of size 4096 extracted from a layer in the
net. The representations are further modified using simple augmentation
techniques e.g. jittering. The results strongly suggest that features obtained
from deep learning with convolutional nets should be the primary candidate in
most visual recognition tasks.Comment: version 3 revisions: 1)Added results using feature processing and
data augmentation 2)Referring to most recent efforts of using CNN for
different visual recognition tasks 3) updated text/captio
- …