1,211 research outputs found

    Dual Asymmetric Deep Hashing Learning

    Full text link
    Due to the impressive learning power, deep learning has achieved a remarkable performance in supervised hash function learning. In this paper, we propose a novel asymmetric supervised deep hashing method to preserve the semantic structure among different categories and generate the binary codes simultaneously. Specifically, two asymmetric deep networks are constructed to reveal the similarity between each pair of images according to their semantic labels. The deep hash functions are then learned through two networks by minimizing the gap between the learned features and discrete codes. Furthermore, since the binary codes in the Hamming space also should keep the semantic affinity existing in the original space, another asymmetric pairwise loss is introduced to capture the similarity between the binary codes and real-value features. This asymmetric loss not only improves the retrieval performance, but also contributes to a quick convergence at the training phase. By taking advantage of the two-stream deep structures and two types of asymmetric pairwise functions, an alternating algorithm is designed to optimize the deep features and high-quality binary codes efficiently. Experimental results on three real-world datasets substantiate the effectiveness and superiority of our approach as compared with state-of-the-art.Comment: 12 pages, 6 figures, 7 tables, 37 conference

    Collaborative Learning for Extremely Low Bit Asymmetric Hashing

    Full text link
    Hashing techniques are in great demand for a wide range of real-world applications such as image retrieval and network compression. Nevertheless, existing approaches could hardly guarantee a satisfactory performance with the extremely low-bit (e.g., 4-bit) hash codes due to the severe information loss and the shrink of the discrete solution space. In this paper, we propose a novel \textit{Collaborative Learning} strategy that is tailored for generating high-quality low-bit hash codes. The core idea is to jointly distill bit-specific and informative representations for a group of pre-defined code lengths. The learning of short hash codes among the group can benefit from the manifold shared with other long codes, where multiple views from different hash codes provide the supplementary guidance and regularization, making the convergence faster and more stable. To achieve that, an asymmetric hashing framework with two variants of multi-head embedding structures is derived, termed as Multi-head Asymmetric Hashing (MAH), leading to great efficiency of training and querying. Extensive experiments on three benchmark datasets have been conducted to verify the superiority of the proposed MAH, and have shown that the 8-bit hash codes generated by MAH achieve 94.3%94.3\% of the MAP (Mean Average Precision (MAP)) score on the CIFAR-10 dataset, which significantly surpasses the performance of the 48-bit codes by the state-of-the-arts in image retrieval tasks

    Deep Supervised Hashing leveraging Quadratic Spherical Mutual Information for Content-based Image Retrieval

    Full text link
    Several deep supervised hashing techniques have been proposed to allow for efficiently querying large image databases. However, deep supervised image hashing techniques are developed, to a great extent, heuristically often leading to suboptimal results. Contrary to this, we propose an efficient deep supervised hashing algorithm that optimizes the learned codes using an information-theoretic measure, the Quadratic Mutual Information (QMI). The proposed method is adapted to the needs of large-scale hashing and information retrieval leading to a novel information-theoretic measure, the Quadratic Spherical Mutual Information (QSMI). Apart from demonstrating the effectiveness of the proposed method under different scenarios and outperforming existing state-of-the-art image hashing techniques, this paper provides a structured way to model the process of information retrieval and develop novel methods adapted to the needs of each application

    Learning to Hash for Indexing Big Data - A Survey

    Full text link
    The explosive growth in big data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the straightforward solution using exhaustive comparison is infeasible due to the prohibitive computational complexity and memory requirement. In response, Approximate Nearest Neighbor (ANN) search based on hashing techniques has become popular due to its promising performance in both efficiency and accuracy. Prior randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore data-independent hash functions with random projections or permutations. Although having elegant theoretic guarantees on the search quality in certain metric spaces, performance of randomized hashing has been shown insufficient in many real-world applications. As a remedy, new approaches incorporating data-driven learning methods in development of advanced hash functions have emerged. Such learning to hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes are able to preserve the proximity of neighboring data in the original feature spaces in the hash code spaces. The goal of this paper is to provide readers with systematic understanding of insights, pros and cons of the emerging techniques. We provide a comprehensive survey of the learning to hash framework and representative techniques of various types, including unsupervised, semi-supervised, and supervised. In addition, we also summarize recent hashing approaches utilizing the deep learning models. Finally, we discuss the future direction and trends of research in this area

    Recent Advance in Content-based Image Retrieval: A Literature Survey

    Full text link
    The explosive increase and ubiquitous accessibility of visual data on the Web have led to the prosperity of research activity in image search or retrieval. With the ignorance of visual content as a ranking clue, methods with text search techniques for visual retrieval may suffer inconsistency between the text words and visual content. Content-based image retrieval (CBIR), which makes use of the representation of visual content to identify relevant images, has attracted sustained attention in recent two decades. Such a problem is challenging due to the intention gap and the semantic gap problems. Numerous techniques have been developed for content-based image retrieval in the last decade. The purpose of this paper is to categorize and evaluate those algorithms proposed during the period of 2003 to 2016. We conclude with several promising directions for future research.Comment: 22 page

    Deep Cross-Modal Hashing

    Full text link
    Due to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, almost all existing CMH methods are based on hand-crafted features which might not be optimally compatible with the hash-code learning procedure. As a result, existing CMH methods with handcrafted features may not achieve satisfactory performance. In this paper, we propose a novel cross-modal hashing method, called deep crossmodal hashing (DCMH), by integrating feature learning and hash-code learning into the same framework. DCMH is an end-to-end learning framework with deep neural networks, one for each modality, to perform feature learning from scratch. Experiments on two real datasets with text-image modalities show that DCMH can outperform other baselines to achieve the state-of-the-art performance in cross-modal retrieval applications.Comment: 12 page

    Clustering is Efficient for Approximate Maximum Inner Product Search

    Full text link
    Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes. Solutions based on locality-sensitive hashing (LSH) as well as tree-based solutions have been investigated in the recent literature, to perform approximate MIPS in sublinear time. In this paper, we compare these to another extremely simple approach for solving approximate MIPS, based on variants of the k-means clustering algorithm. Specifically, we propose to train a spherical k-means, after having reduced the MIPS problem to a Maximum Cosine Similarity Search (MCSS). Experiments on two standard recommendation system benchmarks as well as on large vocabulary word embeddings, show that this simple approach yields much higher speedups, for the same retrieval precision, than current state-of-the-art hashing-based and tree-based methods. This simple method also yields more robust retrievals when the query is corrupted by noise.Comment: 10 pages, Under review at ICLR 201

    Extreme Classification in Log Memory

    Full text link
    We present Merged-Averaged Classifiers via Hashing (MACH) for K-classification with ultra-large values of K. Compared to traditional one-vs-all classifiers that require O(Kd) memory and inference cost, MACH only need O(d log K) (d is dimensionality )memory while only requiring O(K log K + d log K) operation for inference. MACH is a generic K-classification algorithm, with provably theoretical guarantees, which requires O(log K) memory without any assumption on the relationship between classes. MACH uses universal hashing to reduce classification with a large number of classes to few independent classification tasks with small (constant) number of classes. We provide theoretical quantification of discriminability-memory tradeoff. With MACH we can train ODP dataset with 100,000 classes and 400,000 features on a single Titan X GPU, with the classification accuracy of 19.28%, which is the best-reported accuracy on this dataset. Before this work, the best performing baseline is a one-vs-all classifier that requires 40 billion parameters (160 GB model size) and achieves 9% accuracy. In contrast, MACH can achieve 9% accuracy with 480x reduction in the model size (of mere 0.3GB). With MACH, we also demonstrate complete training of fine-grained imagenet dataset (compressed size 104GB), with 21,000 classes, on a single GPU. To the best of our knowledge, this is the first work to demonstrate complete training of these extreme-class datasets on a single Titan X

    Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval

    Full text link
    With the increasing number of online stores, there is a pressing need for intelligent search systems to understand the item photos snapped by customers and search against large-scale product databases to find their desired items. However, it is challenging for conventional retrieval systems to match up the item photos captured by customers and the ones officially released by stores, especially for garment images. To bridge the customer- and store- provided garment photos, existing studies have been widely exploiting the clothing attributes (\textit{e.g.,} black) and landmarks (\textit{e.g.,} collar) to learn a common embedding space for garment representations. Unfortunately they omit the sequential correlation of attributes and consume large quantity of human labors to label the landmarks. In this paper, we propose a deep multi-task cross-domain hashing termed \textit{DMCH}, in which cross-domain embedding and sequential attribute learning are modeled simultaneously. Sequential attribute learning not only provides the semantic guidance for embedding, but also generates rich attention on discriminative local details (\textit{e.g.,} black buttons) of clothing items without requiring extra landmark labels. This leads to promising performance and 306×\times boost on efficiency when compared with the state-of-the-art models, which is demonstrated through rigorous experiments on two public fashion datasets

    CNN-VWII: An Efficient Approach for Large-Scale Video Retrieval by Image Queries

    Full text link
    This paper aims to solve the problem of large-scale video retrieval by a query image. Firstly, we define the problem of top-kk image to video query. Then, we combine the merits of convolutional neural networks(CNN for short) and Bag of Visual Word(BoVW for short) module to design a model for video frames information extraction and representation. In order to meet the requirements of large-scale video retrieval, we proposed a visual weighted inverted index(VWII for short) and related algorithm to improve the efficiency and accuracy of retrieval process. Comprehensive experiments show that our proposed technique achieves substantial improvements (up to an order of magnitude speed up) over the state-of-the-art techniques with similar accuracy.Comment: submitted to Pattern Recognition Letter
    • …
    corecore