8,438 research outputs found

    MIHash: Online Hashing with Mutual Information

    Full text link
    Learning-based hashing methods are widely used for nearest neighbor retrieval, and recently, online hashing methods have demonstrated good performance-complexity trade-offs by learning hash functions from streaming data. In this paper, we first address a key challenge for online hashing: the binary codes for indexed data must be recomputed to keep pace with updates to the hash functions. We propose an efficient quality measure for hash functions, based on an information-theoretic quantity, mutual information, and use it successfully as a criterion to eliminate unnecessary hash table updates. Next, we also show how to optimize the mutual information objective using stochastic gradient descent. We thus develop a novel hashing method, MIHash, that can be used in both online and batch settings. Experiments on image retrieval benchmarks (including a 2.5M image dataset) confirm the effectiveness of our formulation, both in reducing hash table recomputations and in learning high-quality hash functions.Comment: International Conference on Computer Vision (ICCV), 201

    Online hashing for fast similarity search

    Full text link
    In this thesis, the problem of online adaptive hashing for fast similarity search is studied. Similarity search is a central problem in many computer vision applications. The ever-growing size of available data collections and the increasing usage of high-dimensional representations in describing data have increased the computational cost of performing similarity search, requiring search strategies that can explore such collections in an efficient and effective manner. One promising family of approaches is based on hashing, in which the goal is to map the data into the Hamming space where fast search mechanisms exist, while preserving the original neighborhood structure of the data. We first present a novel online hashing algorithm in which the hash mapping is updated in an iterative manner with streaming data. Being online, our method is amenable to variations of the data. Moreover, our formulation is orders of magnitude faster to train than state-of-the-art hashing solutions. Secondly, we propose an online supervised hashing framework in which the goal is to map data associated with similar labels to nearby binary representations. For this purpose, we utilize Error Correcting Output Codes (ECOCs) and consider an online boosting formulation in learning the hash mapping. Our formulation does not require any prior assumptions on the label space and is well-suited for expanding datasets that have new label inclusions. We also introduce a flexible framework that allows us to reduce hash table entry updates. This is critical, especially when frequent updates may occur as the hash table grows larger and larger. Thirdly, we propose a novel mutual information measure to efficiently infer the quality of a hash mapping and retrieval performance. This measure has lower complexity than standard retrieval metrics. With this measure, we first address a key challenge in online hashing that has often been ignored: the binary representations of the data must be recomputed to keep pace with updates to the hash mapping. Based on our novel mutual information measure, we propose an efficient quality measure for hash functions, and use it to determine when to update the hash table. Next, we show that this mutual information criterion can be used as an objective in learning hash functions, using gradient-based optimization. Experiments on image retrieval benchmarks confirm the effectiveness of our formulation, both in reducing hash table recomputations and in learning high-quality hash functions

    ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks

    Full text link
    Hash codes are efficient data representations for coping with the ever growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting `1' for the visited tree leaf, and `0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, rendering the random forests approach to hashing challenging. To address this, we propose to first randomly group arriving classes at each tree split node into two groups, obtaining a significantly simplified two-class classification problem, which can be handled using a light-weight CNN weak learner. Such random class grouping scheme enables code uniqueness by enforcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets, while performing at the level of other state-of-the-art image classification techniques while utilizing a more compact and efficient scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of light-weight CNNs, instead of simply going deeper.Comment: Accepted to ECCV 201

    Towards Optimal Discrete Online Hashing with Balanced Similarity

    Full text link
    When facing large-scale image datasets, online hashing serves as a promising solution for online retrieval and prediction tasks. It encodes the online streaming data into compact binary codes, and simultaneously updates the hash functions to renew codes of the existing dataset. To this end, the existing methods update hash functions solely based on the new data batch, without investigating the correlation between such new data and the existing dataset. In addition, existing works update the hash functions using a relaxation process in its corresponding approximated continuous space. And it remains as an open problem to directly apply discrete optimizations in online hashing. In this paper, we propose a novel supervised online hashing method, termed Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above problems in a unified framework. BSODH employs a well-designed hashing algorithm to preserve the similarity between the streaming data and the existing dataset via an asymmetric graph regularization. We further identify the "data-imbalance" problem brought by the constructed asymmetric graph, which restricts the application of discrete optimization in our problem. Therefore, a novel balanced similarity is further proposed, which uses two equilibrium factors to balance the similar and dissimilar weights and eventually enables the usage of discrete optimizations. Extensive experiments conducted on three widely-used benchmarks demonstrate the advantages of the proposed method over the state-of-the-art methods.Comment: 8 pages, 11 figures, conferenc
    • …
    corecore