46 research outputs found

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding the data items in a large database whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
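    To make the first category concrete, the sketch below implements random-hyperplane hashing (sign random projections), one classic data-independent LSH family for cosine similarity. The survey covers many families; this choice, the function names, and the 64-bit code length are illustrative assumptions. Each bit records which side of a random Gaussian hyperplane a vector falls on, and candidates are ranked by Hamming distance in the code space.

```python
import random

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes; they ignore the data (data-independent)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_vector(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    return tuple(1 if sum(h_i * x_i for h_i, x_i in zip(h, x)) >= 0 else 0
                 for h in hyperplanes)

def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(u != v for u, v in zip(a, b))

def search(query, database, hyperplanes, k=1):
    """Return indices of the k database items whose codes are closest in Hamming distance."""
    q_code = hash_vector(query, hyperplanes)
    codes = [hash_vector(x, hyperplanes) for x in database]
    order = sorted(range(len(database)), key=lambda i: hamming(q_code, codes[i]))
    return order[:k]

if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, num_bits=64)
    db = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(search([0.9, 0.1, 0.0], db, planes, k=1))
```

    A learning-to-hash method would replace the random hyperplanes with projections fitted to the data distribution, while the Hamming ranking step could stay the same.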

    Learning binary codes for maximum inner product search

    Binary coding or hashing techniques are recognized to accomplish efficient near neighbor search, and have thus attracted broad interest in recent vision and learning studies. However, such studies have rarely been dedicated to Maximum Inner Product Search (MIPS), which plays a critical role in various vision applications. In this paper, we investigate learning binary codes to exclusively handle the MIPS problem. Inspired by the latest advances in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting. Specifically, two sets of coding functions are learned such that the inner products between their generated binary codes reveal the inner products between the original data vectors. We also propose an alternative, simpler objective which maximizes the correlations between the inner products of the produced binary codes and those of the raw data vectors. In both objectives, the binary codes and coding functions are learned simultaneously without continuous relaxations, which is the key to achieving high-quality binary codes. We evaluate the proposed method, dubbed Asymmetric Inner-product Binary Coding (AIBC), with both objectives on several large-scale image datasets. Both variants outperform state-of-the-art binary coding and hashing methods on MIPS tasks.
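    As a hedged illustration of why fitting inner products matters, the sketch below contrasts exact MIPS with Euclidean nearest neighbor search on a toy database, and checks the standard identity for codes in {-1, +1}^r: the code inner product equals r minus twice the Hamming distance, so code inner products can be evaluated with fast Hamming operations. This is not AIBC itself; the learned asymmetric coding functions are not reproduced, and all function names are hypothetical.

```python
def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))

def exact_mips(query, database, k=1):
    """Brute-force MIPS: indices of the k items with the largest inner product."""
    return sorted(range(len(database)), key=lambda i: -dot(query, database[i]))[:k]

def exact_nn(query, database, k=1):
    """Brute-force Euclidean nearest neighbors, for comparison with MIPS."""
    def sq_dist(x):
        return sum((q - v) ** 2 for q, v in zip(query, x))
    return sorted(range(len(database)), key=lambda i: sq_dist(database[i]))[:k]

def hamming(b, c):
    """Number of positions where two codes differ."""
    return sum(u != v for u, v in zip(b, c))

def code_inner_product(b, c):
    """Inner product of two {-1, +1} codes of length r via Hamming distance:
    matching positions contribute +1 and differing ones -1, so <b, c> = r - 2 * H(b, c)."""
    return len(b) - 2 * hamming(b, c)

if __name__ == "__main__":
    database = [[1.0, 0.0], [3.0, 3.0], [0.0, 2.0]]
    query = [1.0, 1.0]
    print(exact_mips(query, database))  # [1]: inner products are 1, 6, 2
    print(exact_nn(query, database))    # [0]: squared distances are 1, 8, 2
    print(code_inner_product([1, -1, 1, 1], [1, 1, -1, 1]))  # 0
```

    The toy case shows that the item with the largest inner product need not be the nearest one, which is why MIPS calls for codes tailored to inner products rather than distances.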

    Instance-based Bird Species Identification with Undiscriminant Features Pruning

    This paper reports the participation of Inria in the audio-based bird species identification challenge of the LifeCLEF 2014 campaign. Inspired by recent works on fine-grained image classification, we introduce an instance-based classification scheme based on the dense indexing of MFCC features and the pruning of the non-discriminant ones. To make such a strategy scalable to the 30M MFCC features extracted from the tens of thousands of audio recordings in the training set, we used high-dimensional hashing techniques coupled with an efficient approximate nearest neighbors search algorithm with controlled quality. Further improvements are obtained by (i) using a sliding classifier with max pooling, (ii) weighting the query features according to their semantic coherence, and (iii) making use of the metadata to filter incoherent species. Results show the effectiveness of the proposed technique, which ranked 3rd among the 10 participating groups.

    Scalable Image Retrieval by Sparse Product Quantization

    Fast approximate nearest neighbor (ANN) search for high-dimensional feature indexing and retrieval is the crux of large-scale image retrieval. A recent promising technique is Product Quantization, which indexes high-dimensional image features by decomposing the feature space into a Cartesian product of low-dimensional subspaces and quantizing each of them separately. Despite the promising results reported, this quantization approach follows the hard assignment of traditional quantization methods, which may result in large quantization errors and thus inferior search performance. Unlike existing approaches, in this paper we propose a novel approach called Sparse Product Quantization (SPQ) to encode high-dimensional feature vectors into sparse representations. We optimize the sparse representations of the feature vectors by minimizing their quantization errors, so that the resulting representation stays close to the original data in practice. Experiments show that the proposed SPQ technique not only compresses data but is also an effective encoding technique. We obtain state-of-the-art results for ANN search on four public image datasets, and promising results in content-based image retrieval further validate the efficacy of our proposed method.
    Comment: 12 pages
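    The hard-assignment baseline that SPQ improves on can be sketched as follows: standard product quantization splits a vector into m subvectors, replaces each with the index of its nearest codeword in that subspace's codebook, and answers queries with asymmetric distance computation (ADC) through per-subspace lookup tables. The tiny hand-written codebooks stand in for ones that would normally be trained (typically with k-means); they and the function names are assumptions. SPQ replaces the single hard assignment with a sparse representation, which is not implemented here.

```python
def split(x, m):
    """Split vector x into m equal-length contiguous subvectors."""
    d = len(x) // m
    return [x[j * d:(j + 1) * d] for j in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def pq_encode(x, codebooks):
    """Hard assignment: the nearest codeword index in each subspace."""
    subs = split(x, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(s, cb[k]))
            for s, cb in zip(subs, codebooks)]

def pq_decode(code, codebooks):
    """Reconstruct a vector by concatenating the selected codewords."""
    out = []
    for k, cb in zip(code, codebooks):
        out.extend(cb[k])
    return out

def adc_distances(query, codes, codebooks):
    """ADC: the query stays uncompressed, database items are PQ codes.
    One table of query-to-codeword distances per subspace, then table lookups."""
    subs = split(query, len(codebooks))
    tables = [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]
    return [sum(tables[j][k] for j, k in enumerate(code)) for code in codes]

if __name__ == "__main__":
    # Hypothetical codebooks: m = 2 subspaces, 2 codewords each, 4-D vectors.
    codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
    x = [0.9, 1.1, 0.1, -0.1]
    code = pq_encode(x, codebooks)
    print(code)                                          # [1, 0]
    print(sq_dist(x, pq_decode(code, codebooks)))        # quantization error
    print(adc_distances([1.0, 1.0, 2.0, 2.0], [[1, 0], [0, 1]], codebooks))  # [8.0, 2.0]
```

    The quantization error printed by the usage example is the quantity that hard assignment can leave large and that SPQ's optimization targets.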

    Floristic participation at LifeCLEF 2016 Plant Identification Task

    This paper describes the participation of the Floristic consortium in the LifeCLEF 2016 plant identification challenge [18]. The aim of the task was to produce a list of relevant species for a large set of plant images related to 1000 species of trees, herbs and ferns living in Western Europe, knowing that some of these images belonged to categories unseen in the training set, such as plant species from other areas, horticultural plants, or even off-topic images (people, keyboards, animals, etc.). To address this challenge, we first evaluated as a baseline, without any rejection procedure, a Convolutional Neural Network (CNN) approach based on a slightly modified GoogLeNet model. In a second run, we applied a simple rejection criterion based on probability thresholds estimated on the output of the CNN, one for each species, to automatically remove species propositions judged irrelevant. In the third run, rather than definitively eliminating some species predictions at the risk of producing false negatives, we applied various attenuation factors to revise the probability distributions given by the CNN, treated as confidence scores expressing how strongly a query relates to the known species. More precisely, for this last run we used the geographical information and several cohesion measures in terms of observation, "organ" tags and taxonomy (genus and family levels), based on the results of a kNN similarity search within the training set.
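    The second and third runs differ in how low-confidence species are handled. A minimal sketch, assuming the CNN output is a dict from species to probability and that the per-species thresholds and attenuation factors are already given (how they are estimated from geography, observation, organ tags, and taxonomy is not reproduced): hard rejection drops species below their own threshold, while attenuation rescales scores and re-ranks, so no species is removed outright. Function names, species labels, and numbers are hypothetical.

```python
def reject(probs, thresholds):
    """Hard rejection: keep a species only if its probability reaches its own threshold."""
    return {s: p for s, p in probs.items() if p >= thresholds.get(s, 0.0)}

def attenuate(probs, factors):
    """Soft alternative: scale each probability by a factor in [0, 1] (default 1.0, i.e.
    unchanged) and return species ranked by the revised score; nothing is discarded."""
    revised = {s: p * factors.get(s, 1.0) for s, p in probs.items()}
    return sorted(revised, key=revised.get, reverse=True)

if __name__ == "__main__":
    probs = {"species_a": 0.6, "species_b": 0.3, "species_c": 0.1}        # hypothetical CNN output
    thresholds = {"species_a": 0.5, "species_b": 0.4, "species_c": 0.05}  # hypothetical thresholds
    print(reject(probs, thresholds))             # {'species_a': 0.6, 'species_c': 0.1}
    print(attenuate(probs, {"species_a": 0.2}))  # ['species_b', 'species_a', 'species_c']
```

    With rejection, species_b disappears from the list entirely; with attenuation, a down-weighted species such as species_a only moves down the ranking, which is the trade-off the third run is designed around.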

    Participation of INRIA & Pl@ntNet to ImageCLEF 2011 plant images classification task

    This paper presents the participation of the INRIA IMEDIA group and the Pl@ntNet project in the ImageCLEF 2011 plant identification task. ImageCLEF's plant identification task provides a testbed for the system-oriented evaluation of tree species identification based on leaf images. The aim is to investigate image retrieval approaches in the context of crowdsourced images of leaves collected in a collaborative manner. IMEDIA submitted two runs to this task and obtained the best evaluation score for two of the three image categories addressed within the benchmark. The paper presents the two approaches employed and provides an analysis of the obtained evaluation results.