2,108 research outputs found

    Deep Discrete Supervised Hashing

    Hashing has been widely used for large-scale search due to its low storage cost and fast query speed. By using supervised information, supervised hashing can significantly outperform unsupervised hashing. Discrete supervised hashing and deep hashing are two representative recent advances in supervised hashing. On one hand, hashing is essentially a discrete optimization problem. Hence, utilizing supervised information to directly guide the discrete (binary) coding procedure can avoid sub-optimal solutions and improve accuracy. On the other hand, deep hashing, which integrates deep feature learning and hash-code learning into an end-to-end architecture, can enhance the feedback between feature learning and hash-code learning. The key in discrete supervised hashing is to adopt supervised information to directly guide the discrete coding procedure in hashing. The key in deep hashing is to adopt the supervised information to directly guide the deep feature learning procedure. However, no existing work uses the supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure in the same framework. In this paper, we propose a novel deep hashing method, called deep discrete supervised hashing (DDSH), to address this problem. DDSH is the first deep hashing method that can utilize supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure, and thus enhance the feedback between these two important procedures. Experiments on three real datasets show that DDSH can outperform other state-of-the-art baselines, including both discrete hashing and deep hashing baselines, for image retrieval.

    Attribute-Guided Network for Cross-Modal Zero-Shot Hashing

    Zero-Shot Hashing aims at learning a hashing model that is trained only on instances from seen categories but can generalize well to those of unseen categories. Typically, this is achieved by utilizing a semantic embedding space to transfer knowledge from the seen domain to the unseen domain. Existing efforts mainly focus on single-modal retrieval tasks, especially Image-Based Image Retrieval (IBIR). However, as a highlighted research topic in the field of hashing, cross-modal retrieval is more common in real-world applications. To address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task, we propose a novel Attribute-Guided Network (AgNet), which can perform not only IBIR, but also Text-Based Image Retrieval (TBIR). In particular, AgNet aligns data from different modalities into a semantically rich attribute space, which bridges the gap caused by modality heterogeneity and the zero-shot setting. We also design an effective strategy that exploits the attributes to guide the generation of hash codes for image and text within the same network. Extensive experimental results on three benchmark datasets (AwA, SUN, and ImageNet) demonstrate the superiority of AgNet on both cross-modal and single-modal zero-shot image retrieval tasks.
    Comment: 9 pages, 8 figures

    Hadamard Matrix Guided Online Hashing

    Online image hashing, which receives large-scale data in a streaming manner to update the hash functions on the fly, has attracted increasing research attention recently. Its key challenge lies in the difficulty of balancing learning timeliness and model accuracy. To this end, most works follow a supervised setting, i.e., using class labels to boost the hashing performance, which has two defects: First, strong constraints, e.g., orthogonality or similarity preservation, are used, but these are typically relaxed, leading to a large drop in accuracy. Second, a large number of training batches are required to learn up-to-date hash functions, which largely increases the learning complexity. To handle the above challenges, a novel supervised online hashing scheme termed Hadamard Matrix Guided Online Hashing (HMOH) is proposed in this paper. Our key innovation lies in introducing the Hadamard matrix, which is an orthogonal binary matrix built via the Sylvester method. In particular, to remove the need for strong constraints, we regard each column of the Hadamard matrix as the target code for each class label, which by nature satisfies several desired properties of hashing codes. To accelerate the online training, LSH is first adopted to align the lengths of the target code and the to-be-learned binary code. We then treat the learning of hash functions as a set of binary classification problems to fit the assigned target code. Finally, extensive experiments demonstrate the superior accuracy and efficiency of the proposed method over various state-of-the-art methods. Codes are available at https://github.com/lmbxmu/mycode
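The Sylvester construction mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the HMOH implementation: the function names and the choice to assign column `j` to class `j` are assumptions made for the example. It shows why Hadamard columns make convenient target codes: any two distinct columns are orthogonal, so they differ in exactly half of their bits.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix of the given order (must be a power of 2)."""
    if order < 1 or order & (order - 1) != 0:
        raise ValueError("order must be a power of 2")
    h = [[1]]
    while len(h) < order:
        # Sylvester step: H_2n = [[H_n, H_n], [H_n, -H_n]]
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hamming(u, v):
    """Number of positions where two +1/-1 codes differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


H = sylvester_hadamard(8)

# Hypothetical assignment: class label j gets column j as its target code.
targets = {label: column(H, label) for label in range(4)}

print(dot(targets[1], targets[2]))      # orthogonal columns
print(hamming(targets[1], targets[2]))  # half of the 8 bits differ
```

Because every entry is +1 or -1, a zero inner product between two columns of length n means they agree in n/2 positions and disagree in n/2, which gives evenly separated target codes without an explicit orthogonality constraint.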

    Image Super-Resolution Using TV Priori Guided Convolutional Network

    We propose a TV prior guided deep learning method for single image super-resolution (SR). Our TV prior guided convolutional neural network combines a new up-sampling algorithm based on the TV prior, a new learning method, and a new neural network architecture, and directly learns an end-to-end mapping from low-level to high-level images.
    Comment: This paper is under review in Pattern Recognition Letters
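The abstract above does not define its TV prior. Assuming "TV" denotes total variation, as is common in image restoration, the quantity being penalized can be sketched as follows. This is only an illustration of the anisotropic total variation of a grayscale image, not the paper's up-sampling method; the function name and the toy images are hypothetical.

```python
def total_variation(img):
    """Anisotropic total variation: sum of absolute differences between
    horizontally and vertically adjacent pixels of a 2D image."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(img[r][c + 1] - img[r][c])
            if r + 1 < rows:  # vertical neighbour
                tv += abs(img[r + 1][c] - img[r][c])
    return tv


# Toy images (assumed for illustration): a flat patch and a vertical edge.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
edge = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]

print(total_variation(flat))  # 0.0
print(total_variation(edge))  # 30.0
```

A flat region has zero total variation, while a single sharp edge contributes only its height per row, which is why a TV prior favours piecewise-smooth images with clean edges over noisy ones.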

    Occlusion-guided compact template learning for ensemble deep network-based pose-invariant face recognition

    Concatenation of the deep network representations extracted from different facial patches helps to improve face recognition performance. However, the concatenated facial template increases in size and contains redundant information. Previous solutions aim to reduce the dimensionality of the facial template without considering the occlusion pattern of the facial patches. In this paper, we propose an occlusion-guided compact template learning (OGCTL) approach that only uses the information from visible patches to construct the compact template. The compact face representation is not sensitive to the number of patches that are used to construct the facial template and is more suitable for incorporating the information from different view angles for image-set based face recognition. Instead of using occlusion masks in face matching (e.g., DPRFS [38]), the proposed method uses occlusion masks in template construction and achieves significantly better image-set based face verification performance on a challenging database with a template size that is an order of magnitude smaller than DPRFS.
    Comment: Accepted by International Conference on Biometrics (ICB 2019) as an Oral presentation

    Transfer Adversarial Hashing for Hamming Space Retrieval

    Hashing is widely applied to large-scale image retrieval due to its storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting, which allows the database and the training set to come from different but relevant domains. However, the transfer retrieval setting introduces two technical difficulties: first, the hash model trained on the source domain cannot work well on the target domain due to the large distribution gap; second, the domain gap makes it difficult to concentrate the database points within a small Hamming ball. As a consequence, transfer retrieval performance within Hamming radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH), a new hybrid deep architecture that incorporates a pairwise t-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state-of-the-art Hamming space retrieval performance on standard datasets.
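Retrieval "within Hamming radius 2", as discussed in the abstract above, means returning every database item whose binary code differs from the query code in at most two bits. The following is a minimal sketch using a linear scan over integer-packed codes; the helper names and the 8-bit example codes are made up for illustration, and a production system would typically use hash-table lookups rather than scanning.

```python
def hamming(a, b):
    """Hamming distance between two binary codes packed into integers."""
    return bin(a ^ b).count("1")


def within_radius(query, database, radius=2):
    """Indices of database codes inside the Hamming ball around the query."""
    return [i for i, code in enumerate(database)
            if hamming(query, code) <= radius]


# Hypothetical 8-bit hash codes.
database = [0b10110010, 0b10110011, 0b10010001, 0b01001101]
query = 0b10110010

hits = within_radius(query, database)
print(hits)  # [0, 1]
```

If codes for similar items are not concentrated, relevant items fall outside the ball and are never returned, which is the degradation the abstract attributes to the domain gap.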

    DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs

    Due to its high storage and search efficiency, hashing has become prevalent for large-scale similarity search. In particular, deep hashing methods have greatly improved the search performance under supervised scenarios. In contrast, unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of reliable supervisory similarity signals. To address this issue, we propose a novel deep unsupervised hashing model, dubbed DistillHash, which can learn a distilled data set consisting of data pairs with confident similarity signals. Specifically, we investigate the relationship between the initial noisy similarity signals learned from local structures and the semantic similarity labels assigned by a Bayes optimal classifier. We show that under a mild assumption, some data pairs, whose labels are consistent with those assigned by the Bayes optimal classifier, can potentially be distilled. Inspired by this fact, we design a simple yet effective strategy to distill data pairs automatically and further adopt a Bayesian learning framework to learn hash functions from the distilled data set. Extensive experimental results on three widely used benchmark datasets show that the proposed DistillHash consistently achieves state-of-the-art search performance.

    Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss

    Deep supervised hashing has emerged as an influential solution to large-scale semantic image retrieval problems in computer vision. In light of recent progress, convolutional neural network based hashing methods typically rely on pair-wise or triplet labels to conduct similarity-preserving learning. However, complex semantic concepts of visual content are hard to capture with similar/dissimilar labels, which limits the retrieval performance. Generally, pair-wise or triplet losses not only suffer from expensive training costs but also fail to extract sufficient semantic information. In this regard, we propose a novel deep supervised hashing model to learn more compact class-level similarity-preserving binary codes. Our deep learning based model is motivated by deep metric learning that directly takes semantic labels as supervised information in training and generates corresponding discriminant hashing codes. Specifically, a novel cubic constraint loss function based on the Gaussian distribution is proposed, which preserves semantic variations while penalizing the overlapping part of different classes in the embedding space. To address the discrete optimization problem introduced by binary codes, a two-step optimization strategy is proposed to provide efficient training and avoid the problem of vanishing gradients. Extensive experiments on four large-scale benchmark databases show that our model can achieve state-of-the-art retrieval performance. Moreover, when training samples are limited, our method surpasses other supervised deep hashing methods by non-negligible margins.

    A Decade Survey of Content Based Image Retrieval using Deep Learning

    Content based image retrieval aims to find images in a large-scale dataset that are similar to a query image. Generally, the similarity between the representative features of the query image and the dataset images is used to rank the images for retrieval. In the early days, various hand-designed feature descriptors were investigated, based on visual cues such as color, texture, and shape that represent the images. However, over the past decade deep learning has emerged as a dominant alternative to hand-designed feature engineering. It learns the features automatically from the data. This paper presents a comprehensive survey of deep learning based developments in the past decade for content based image retrieval. Existing state-of-the-art methods are also categorized from different perspectives for a better understanding of the progress. The taxonomy used in this survey covers different types of supervision, networks, descriptors, and retrieval. A performance analysis of the state-of-the-art methods is also performed. Insights are also presented to help researchers observe the progress and make the best choices. The survey presented in this paper will help further research progress in image retrieval using deep learning.

    Learning Visual Knowledge Memory Networks for Visual Question Answering

    Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions cannot be directly or clearly answered from visual content but require reasoning from structured human knowledge with confirmation from visual content. This paper proposes the visual knowledge memory network (VKMN) to address this issue, which seamlessly incorporates structured human knowledge and deep visual features into memory networks in an end-to-end learning framework. Compared to existing methods that leverage external knowledge to support VQA, this paper places more emphasis on two missing mechanisms. The first is the mechanism for integrating visual content with knowledge facts. VKMN handles this issue by embedding knowledge triples (subject, relation, target) and deep visual features jointly into the visual knowledge features. The second is the mechanism for handling multiple knowledge facts expanded from question and answer pairs. VKMN stores the joint embeddings using a key-value pair structure in the memory networks so that it is easy to handle multiple facts. Experiments show that the proposed method achieves promising results on both the VQA v1.0 and v2.0 benchmarks, while outperforming state-of-the-art methods on knowledge-reasoning related questions.
    Comment: Supplementary to CVPR 2018 version