Near-Duplicate Image Retrieval Based on Contextual Descriptor
The state of the art in near-duplicate image retrieval is mostly based on the Bag-of-Visual-Words model. However, visual words are prone to mismatches because of quantization errors in the local features they represent. To improve the precision of visual-word matching, contextual descriptors are designed to strengthen the discriminative power of visual words and to measure their contextual similarity. This paper presents a new contextual descriptor that measures the contextual similarity of visual words in order to discard mismatches immediately and reduce the number of candidate images. The new contextual descriptor encodes the relationships of dominant orientation and spatial position between the referential visual words and their context. Experimental results on the benchmark Copydays dataset demonstrate its efficiency and effectiveness for near-duplicate image retrieval.
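As a rough illustration of the general idea (not the paper's actual descriptor), one can encode, for each referential keypoint, a joint histogram of the relative dominant orientations and relative spatial angles of the keypoints in its context, and accept a visual-word match only when the two histograms agree. A minimal Python sketch, where `contextual_descriptor`, `contextual_match`, and all parameter choices are hypothetical:

```python
import numpy as np

def contextual_descriptor(ref_xy, ref_angle, ctx_xy, ctx_angles, n_bins=8):
    """Joint histogram over context keypoints of (relative dominant
    orientation, relative spatial angle) w.r.t. the referential keypoint."""
    hist = np.zeros((n_bins, n_bins))
    for (x, y), a in zip(ctx_xy, ctx_angles):
        d_ori = (a - ref_angle) % (2 * np.pi)  # relative dominant orientation
        d_pos = (np.arctan2(y - ref_xy[1], x - ref_xy[0])
                 - ref_angle) % (2 * np.pi)    # relative spatial position
        i = int(d_ori / (2 * np.pi) * n_bins) % n_bins
        j = int(d_pos / (2 * np.pi) * n_bins) % n_bins
        hist[i, j] += 1
    s = hist.sum()
    return hist / s if s else hist

def contextual_match(desc_a, desc_b, tau=0.5):
    """Histogram intersection; pairs below tau are treated as mismatches."""
    return np.minimum(desc_a, desc_b).sum() >= tau
```

A visual-word match that survives `contextual_match` keeps its candidate image; the rest are discarded before ranking, which is what reduces the candidate count.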
Benchmarking unsupervised near-duplicate image detection
Unsupervised near-duplicate detection has many practical applications, ranging from social media analysis and web-scale retrieval to digital image forensics. It entails running a threshold-limited query on a set of descriptors extracted from the images, with the goal of identifying all possible near-duplicates while limiting the false positives due to visually similar images. Since the rate of false alarms grows with the dataset size, a very high specificity is required, up to 1 - 10^-9 for realistic use cases; this important requirement, however, is often overlooked in the literature. In recent years, descriptors based on deep convolutional neural networks have matched or surpassed traditional feature extraction methods in content-based image retrieval tasks. To the best of our knowledge, ours is the first attempt to establish the performance range of deep learning-based descriptors for unsupervised near-duplicate detection on a range of datasets encompassing a broad spectrum of near-duplicate definitions. We leverage both established and new benchmarks, such as the Mir-Flickr Near-Duplicate (MFND) dataset, in which a known ground truth is provided for all possible pairs over a general, large-scale image collection. To compare the specificity of different descriptors, we reduce the problem of unsupervised detection to that of binary classification of near-duplicate vs. not-near-duplicate images. The latter can be conveniently characterized using the Receiver Operating Characteristic (ROC) curve. Our findings generally favor fine-tuning deep convolutional networks over using off-the-shelf features, but differences at high-specificity settings depend on the dataset and are often small. The best performance was observed on the MFND benchmark, achieving 96% sensitivity at a false positive rate of 1.43x10^-6.
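The reduction to binary classification can be illustrated as follows: score every image pair by descriptor similarity, treat the ground-truth near-duplicate flag as the label, and read sensitivity off the ROC curve at the target false positive rate. A minimal sketch, assuming L2-normalizable descriptors and scikit-learn; `sensitivity_at_fpr` and its defaults are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_curve

def sensitivity_at_fpr(desc_a, desc_b, labels, target_fpr=1.43e-6):
    """Score each pair by cosine similarity of its two descriptors and
    report sensitivity (TPR) at the largest FPR <= target_fpr.

    desc_a, desc_b: (n_pairs, dim) descriptor arrays, one row per pair.
    labels: 1 = near-duplicate pair, 0 = not-near-duplicate pair.
    """
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    scores = np.sum(a * b, axis=1)           # cosine similarity per pair
    fpr, tpr, _ = roc_curve(labels, scores)
    mask = fpr <= target_fpr
    return tpr[mask].max() if np.any(mask) else 0.0
```

Note that reliably estimating an operating point near 10^-6 requires on the order of millions of labeled negative pairs, which is one reason exhaustive pair-level ground truth such as MFND's is valuable.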
Nearest Neighbors Using Compact Sparse Codes
In this paper, we propose a novel scheme for approximate nearest neighbor (ANN) retrieval based on dictionary learning and sparse coding. Our key innovation is to build compact codes, dubbed SpANN codes, using the active set of sparse-coded data. These codes are then used to index an inverted file table for fast retrieval. The active sets are often found to be sensitive to small differences among data points, resulting in the retrieval of only near-duplicates. We show that this sensitivity is related to the coherence of the dictionary: smaller coherence results in better retrieval. To this end, we propose a novel dictionary learning formulation with incoherence constraints and an efficient method to solve it. Experiments are conducted on two state-of-the-art computer vision datasets with 1M data points and show an order of magnitude improvement in retrieval accuracy, without sacrificing memory or query time, compared to state-of-the-art methods.
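The indexing step can be sketched with off-the-shelf sparse coding. This uses plain OMP rather than the paper's incoherence-constrained dictionary learning, and `build_inverted_file` and `query` are hypothetical names:

```python
import numpy as np
from collections import defaultdict
from sklearn.decomposition import sparse_encode

def build_inverted_file(X, dictionary, n_nonzero=5):
    """Index each data point under the atoms in its sparse-code active set.

    X: (n_samples, n_features); dictionary: (n_atoms, n_features).
    """
    codes = sparse_encode(X, dictionary, algorithm='omp',
                          n_nonzero_coefs=n_nonzero)
    index = defaultdict(list)
    for i, code in enumerate(codes):
        for atom in np.flatnonzero(code):   # the active set of point i
            index[atom].append(i)
    return index

def query(q, dictionary, index, n_nonzero=5):
    """Candidates = points whose active sets overlap the query's."""
    code = sparse_encode(q.reshape(1, -1), dictionary, algorithm='omp',
                         n_nonzero_coefs=n_nonzero)[0]
    candidates = set()
    for atom in np.flatnonzero(code):
        candidates.update(index[atom])
    return candidates
```

The paper's point, as the abstract describes it, is that with a low-coherence dictionary the active sets become more stable under small perturbations, so true nearest neighbors, not just exact near-duplicates, share atoms with the query and appear in the candidate set.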
Do We Train on Test Data? Purging CIFAR of Near-Duplicates
The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the "fair CIFAR" (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. The ciFAIR dataset and pre-trained models are available at https://cvjena.github.io/cifair/, where we also maintain a leaderboard.
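Detecting such train/test duplicates is, at its core, a nearest-neighbor search in some feature space followed by thresholding; the paper's exact procedure is not described in the abstract. A minimal sketch with hypothetical names, assuming precomputed feature vectors (e.g., CNN embeddings):

```python
import numpy as np

def flag_duplicate_candidates(train_feats, test_feats, threshold=0.95):
    """For each test image, find its most similar training image by cosine
    similarity; pairs above threshold are duplicate candidates.

    Returns a list of (test_index, train_index, similarity) tuples.
    """
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    te = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = te @ tr.T                  # (n_test, n_train) similarity matrix
    nn = sims.argmax(axis=1)          # nearest training image per test image
    best = sims[np.arange(len(te)), nn]
    return [(i, int(nn[i]), float(best[i]))
            for i in np.flatnonzero(best >= threshold)]
```

Candidate pairs above the threshold would still need manual inspection, since visually similar but genuinely distinct images can also score highly.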