Unsupervised Triplet Hashing for Fast Image Retrieval
Hashing has played a pivotal role in large-scale image retrieval. With the
development of Convolutional Neural Networks (CNNs), hash learning has shown
great promise. However, existing methods are mostly tuned for classification
and are not optimized for retrieval tasks, especially instance-level retrieval.
In this study, we propose a novel hashing method for large-scale image
retrieval. Considering the difficulty of obtaining labeled datasets for
large-scale image retrieval, we propose a novel CNN-based unsupervised
hashing method, namely Unsupervised Triplet Hashing (UTH). The unsupervised
hashing network is designed under three principles: 1) more
discriminative representations for image retrieval; 2) minimum quantization
loss between the original real-valued feature descriptors and the learned hash
codes; 3) maximum information entropy for the learned hash codes. Extensive
experiments on the CIFAR-10, MNIST and In-shop datasets show that UTH
outperforms several state-of-the-art unsupervised hashing methods in
retrieval accuracy.
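
The three principles map naturally onto three loss terms. The following is a
minimal PyTorch-style sketch of how such a combined objective could look; the
function name, weights, and the use of tanh hash-layer activations are
illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def uth_style_loss(anchor, positive, negative, margin=0.5, lam_q=0.1, lam_e=0.1):
    """Combine the three UTH principles on real-valued hash-layer outputs.

    anchor/positive/negative: (batch, bits) tanh activations in [-1, 1].
    """
    # 1) Discriminative representations: triplet ranking loss pulls the
    #    anchor toward the positive and pushes it away from the negative.
    triplet = F.triplet_margin_loss(anchor, positive, negative, margin=margin)

    # 2) Minimum quantization loss: real-valued activations should lie close
    #    to the binary codes sign(h) they will be thresholded into.
    h = torch.cat([anchor, positive, negative], dim=0)
    quant = (h - torch.sign(h)).pow(2).mean()

    # 3) Maximum information entropy: each bit should fire about half the
    #    time, i.e. its mean activation over the batch should be near zero.
    entropy = h.mean(dim=0).pow(2).mean()

    return triplet + lam_q * quant + lam_e * entropy
```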
Unsupervised Learning of Visual Representations using Videos
Is strong supervision necessary for learning a good visual representation? Do
we really need millions of semantically labeled images to train a Convolutional
Neural Network (CNN)? In this paper, we present a simple yet surprisingly
powerful approach for unsupervised learning of CNNs. Specifically, we use
hundreds of thousands of unlabeled videos from the web to learn visual
representations. Our key idea is that visual tracking provides the supervision:
two patches connected by a track should have similar visual
representations in deep feature space, since they probably belong to the same
object or object part. We design a Siamese-triplet network with a ranking loss
function to train this CNN representation. Without using a single image from
ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train
an ensemble of unsupervised networks that achieves 52% mAP (no bounding-box
regression). This performance comes tantalizingly close to its
ImageNet-supervised counterpart, an ensemble which achieves an mAP of 54.4%. We
also show that our unsupervised network can perform competitively in other
tasks such as surface-normal estimation.
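
A hedged sketch of the tracking-as-supervision idea: the first and last
patches of a track form a positive pair, while a patch from an unrelated clip
serves as the negative. The function name and the cosine-distance choice here
are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def track_ranking_loss(query_feat, tracked_feat, random_feat, margin=0.5):
    """Hinge ranking loss on CNN features from a shared (Siamese) network.

    All inputs: (batch, dim) feature vectors of image patches.
    """
    # Cosine distances: the tracked patch should be closer to the query
    # patch than an unrelated patch is, by at least the margin.
    d_pos = 1.0 - F.cosine_similarity(query_feat, tracked_feat)
    d_neg = 1.0 - F.cosine_similarity(query_feat, random_feat)
    return F.relu(d_pos - d_neg + margin).mean()
```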
- …