Deep Discrete Supervised Hashing
Hashing has been widely used for large-scale search due to its low storage
cost and fast query speed. By using supervised information, supervised hashing
can significantly outperform unsupervised hashing. Recently, discrete
supervised hashing and deep hashing have emerged as two representative advances in
supervised hashing. On one hand, hashing is essentially a discrete optimization
problem. Hence, utilizing supervised information to directly guide discrete
(binary) coding procedure can avoid sub-optimal solution and improve the
accuracy. On the other hand, deep hashing, which integrates deep feature
learning and hash-code learning into an end-to-end architecture, can enhance
the feedback between feature learning and hash-code learning. The key in
discrete supervised hashing is to adopt supervised information to directly
guide the discrete coding procedure in hashing. The key in deep hashing is to
adopt the supervised information to directly guide the deep feature learning
procedure. However, no existing work uses the supervised information to
directly guide both the discrete coding procedure and the deep feature
learning procedure in the same framework. In this paper, we propose a novel
deep hashing method, called deep discrete supervised hashing (DDSH), to address
this problem. DDSH is the first deep hashing method which can utilize
supervised information to directly guide both discrete coding procedure and
deep feature learning procedure, and thus enhance the feedback between these
two important procedures. Experiments on three real datasets show that DDSH can
outperform other state-of-the-art baselines, including both discrete hashing
and deep hashing baselines, for image retrieval.
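The gap between relaxed and discrete optimization described above can be illustrated with a minimal sketch (plain NumPy, not the authors' code): relaxation-based methods binarize real-valued embeddings with a sign function after training, and a supervised pairwise objective then measures how well the resulting codes preserve the given similarity labels.

```python
import numpy as np

def binarize(features):
    """Relaxation-based coding: take the sign of real-valued embeddings.
    DDSH argues that optimizing codes directly in {-1, +1} avoids the
    quantization error this post-hoc step introduces."""
    return np.where(np.asarray(features) >= 0, 1, -1)

def pairwise_loss(codes, sim):
    """Supervised pairwise objective: inner products of k-bit codes should
    approach +k for similar pairs (sim=1) and -k for dissimilar ones (sim=0)."""
    k = codes.shape[1]
    target = np.where(sim == 1, k, -k)
    return float(np.mean((codes @ codes.T - target) ** 2))

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))               # 6 items, 8-bit codes
sim = (rng.random((6, 6)) > 0.5).astype(int)  # toy similarity labels
print(pairwise_loss(binarize(feats), sim))
```

The loss form and dimensions here are illustrative; the point is only that binarization happens outside the objective, which is the sub-optimality discrete supervised hashing removes.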
Attribute-Guided Network for Cross-Modal Zero-Shot Hashing
Zero-Shot Hashing aims at learning a hashing model that is trained only by
instances from seen categories but can generalize well to those of unseen
categories. Typically, it is achieved by utilizing a semantic embedding space
to transfer knowledge from seen domain to unseen domain. Existing efforts
mainly focus on single-modal retrieval task, especially Image-Based Image
Retrieval (IBIR). However, as a highlighted research topic in the field of
hashing, cross-modal retrieval is more common in real world applications. To
address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task, we propose a
novel Attribute-Guided Network (AgNet), which can perform not only IBIR, but
also Text-Based Image Retrieval (TBIR). In particular, AgNet aligns different
modal data into a semantically rich attribute space, which bridges the gap
caused by modality heterogeneity and zero-shot setting. We also design an
effective strategy that exploits the attribute to guide the generation of hash
codes for image and text within the same network. Extensive experimental
results on three benchmark datasets (AwA, SUN, and ImageNet) demonstrate the
superiority of AgNet on both cross-modal and single-modal zero-shot image
retrieval tasks. Comment: 9 pages, 8 figures.
Hadamard Matrix Guided Online Hashing
Online image hashing, which receives large-scale data in a streaming manner
and updates the hash functions on-the-fly, has attracted increasing research
attention recently. Its key challenge lies in the difficulty of balancing the
learning timeliness and model accuracy. To this end, most works follow a
supervised setting, i.e., using class labels to boost the hashing performance,
which is deficient in two aspects: First, strong constraints, e.g., orthogonality
or similarity preservation, are imposed; these are typically relaxed during
optimization and lead to a large accuracy drop. Second, large amounts of
training batches are required to learn up-to-date hash functions, which greatly
increases the learning complexity. To handle the above challenges, a novel supervised online hashing
scheme termed Hadamard Matrix Guided Online Hashing (HMOH) is proposed in this
paper. Our key innovation lies in introducing Hadamard matrix, which is an
orthogonal binary matrix built via Sylvester's construction. In particular, to
remove the need for strong constraints, we regard each column of the Hadamard matrix as the
target code for each class label, which by nature satisfies several desired
properties of hashing codes. To accelerate the online training, LSH is first
adopted to align the lengths of target code and to-be-learned binary code. We
then treat the learning of hash functions as a set of binary classification
problems to fit the assigned target code. Finally, extensive experiments
demonstrate the superior accuracy and efficiency of the proposed method over
various state-of-the-art methods. Codes are available at
https://github.com/lmbxmu/mycode
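Sylvester's construction mentioned above is simple enough to sketch directly; the following is an illustrative NumPy version, not the authors' released code:

```python
import numpy as np

def hadamard(k):
    """Sylvester's construction: H_{2n} = [[H_n, H_n], [H_n, -H_n]],
    starting from H_1 = [1]. The order k must be a power of two."""
    H = np.array([[1]])
    while H.shape[0] < k:
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(8)
# Columns are mutually orthogonal {-1, +1} vectors, which is why HMOH can
# assign one column per class as that class's target hash code.
assert np.array_equal(H @ H.T, 8 * np.eye(8))
```

Because the columns are orthogonal and binary by construction, the "desired properties of hashing codes" come for free, with no relaxed constraint to enforce during online updates.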
Image Super-Resolution Using TV Prior Guided Convolutional Network
We propose a total variation (TV) prior guided deep learning method for single
image super-resolution (SR). A new TV-prior-based up-sampling method, a new
learning method, and a new network architecture are combined in our TV prior
guided convolutional neural network, which directly learns an end-to-end
mapping from low-resolution to high-resolution images. Comment: This paper is under review in Pattern Recognition Letters.
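For reference, the total variation prior underlying the method can be written down directly; this is the standard anisotropic TV of an image, not code from the paper:

```python
import numpy as np

def total_variation(img):
    """Anisotropic total variation: sum of absolute differences between
    vertically and horizontally adjacent pixels. TV-based priors favor
    piecewise-smooth images, which is what guides the SR reconstruction."""
    img = np.asarray(img, dtype=float)
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

flat = np.ones((4, 4))
edge = np.zeros((4, 4)); edge[:, 2:] = 1.0
print(total_variation(flat), total_variation(edge))  # 0.0 and 4.0
```

A flat image has zero TV while a sharp edge contributes one unit per crossing row, so penalizing TV suppresses noise while tolerating true edges.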
Occlusion-guided compact template learning for ensemble deep network-based pose-invariant face recognition
Concatenation of the deep network representations extracted from different
facial patches helps to improve face recognition performance. However, the
concatenated facial template increases in size and contains redundant
information. Previous solutions aim to reduce the dimensionality of the facial
template without considering the occlusion pattern of the facial patches. In
this paper, we propose an occlusion-guided compact template learning (OGCTL)
approach that only uses the information from visible patches to construct the
compact template. The compact face representation is not sensitive to the
number of patches that are used to construct the facial template and is more
suitable for incorporating the information from different view angles for
image-set based face recognition. Instead of using occlusion masks in face
matching (e.g., DPRFS [38]), the proposed method uses occlusion masks in
template construction and achieves significantly better image-set based face
verification performance on a challenging database with a template size that is
an order-of-magnitude smaller than DPRFS. Comment: Accepted by the International Conference on Biometrics (ICB 2019) as an oral presentation.
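The visible-patch pooling idea can be caricatured in a few lines; this is a simplified stand-in for OGCTL's learned aggregation, with hypothetical names and a plain mean in place of the learned combination:

```python
import numpy as np

def masked_template(patch_features, visible):
    """Build a compact template from visible patches only: pool the deep
    features of patches whose occlusion mask marks them as visible.
    Occluded patches contribute nothing, so the template's size and content
    do not depend on how many patches were extracted."""
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        raise ValueError("no visible patches to pool")
    return np.asarray(patch_features)[visible].mean(axis=0)

feats = np.array([[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]])  # 3 patches, 2-D features
print(masked_template(feats, [True, True, False]))       # pools patches 0 and 1 only
```

Using the mask at construction time, rather than at match time as in DPRFS, is what keeps the stored template both compact and occlusion-aware.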
Transfer Adversarial Hashing for Hamming Space Retrieval
Hashing is widely applied to large-scale image retrieval due to the storage
and retrieval efficiency. Existing work on deep hashing assumes that the
database in the target domain is identically distributed with the training set
in the source domain. This paper relaxes this assumption to a transfer
retrieval setting, which allows the database and the training set to come from
different but relevant domains. However, the transfer retrieval setting will
introduce two technical difficulties: first, the hash model trained on the
source domain cannot work well on the target domain due to the large
distribution gap; second, the domain gap makes it difficult to concentrate the
database points to be within a small Hamming ball. As a consequence, transfer
retrieval performance within Hamming Radius 2 degrades significantly in
existing hashing methods. This paper presents Transfer Adversarial Hashing
(TAH), a new hybrid deep architecture that incorporates a pairwise
t-distribution cross-entropy loss to learn concentrated hash codes and an
adversarial network to align the data distributions between the source and
target domains. TAH can generate compact transfer hash codes for efficient
image retrieval on both source and target domains. Comprehensive experiments
validate that TAH yields state-of-the-art Hamming space retrieval performance
on standard datasets.
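Hamming-ball retrieval itself is cheap to state; a minimal sketch with integer-packed codes shows the lookup whose quality degrades when codes are not concentrated:

```python
def hamming(a, b):
    """Hamming distance between two integer-packed binary codes."""
    return bin(a ^ b).count("1")

def radius_search(query, database, r=2):
    """Indices of database codes within Hamming radius r of the query.
    With a multi-index hash table this becomes a constant-time bucket
    lookup, but only codes inside the ball are ever returned -- hence the
    sharp degradation TAH addresses when a domain gap scatters the codes."""
    return [i for i, code in enumerate(database) if hamming(query, code) <= r]

db = [0b10110010, 0b10110000, 0b01001101]
print(radius_search(0b10110011, db))  # [0, 1]: those codes differ in <= 2 bits
```

If a distribution gap pushes relevant database codes more than r bits away from the query code, they are simply never retrieved, which is why concentrating codes inside the ball matters.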
DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs
Due to the high storage and search efficiency, hashing has become prevalent
for large-scale similarity search. Particularly, deep hashing methods have
greatly improved the search performance under supervised scenarios. In
contrast, unsupervised deep hashing models can hardly achieve satisfactory
performance due to the lack of reliable supervisory similarity signals. To
address this issue, we propose a novel deep unsupervised hashing model, dubbed
DistillHash, which learns a distilled data set consisting of data pairs with
confident similarity signals. Specifically, we investigate the
relationship between the initial noisy similarity signals learned from local
structures and the semantic similarity labels assigned by a Bayes optimal
classifier. We show that, under a mild assumption, some data pairs, whose
labels are consistent with those assigned by the Bayes optimal classifier, can
be potentially distilled. Inspired by this fact, we design a simple yet
effective strategy to distill data pairs automatically and further adopt a
Bayesian learning framework to learn hash functions from the distilled data
set. Extensive experimental results on three widely used benchmark datasets
show that the proposed DistillHash consistently achieves state-of-the-art
search performance.
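The distilling step can be caricatured as confidence thresholding over an initial noisy similarity signal; the sketch below uses cosine similarity of features and illustrative thresholds, not the paper's actual Bayes-optimal criterion:

```python
import numpy as np

def distill_pairs(features, t_pos=0.9, t_neg=0.1):
    """Keep only pairs whose noisy similarity signal is confident:
    cosine similarity >= t_pos -> label 1 (similar),
    cosine similarity <= t_neg -> label 0 (dissimilar);
    anything in between is discarded as unreliable supervision."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    s = f @ f.T
    out = []
    for i in range(len(f)):
        for j in range(i + 1, len(f)):
            if s[i, j] >= t_pos:
                out.append((i, j, 1))
            elif s[i, j] <= t_neg:
                out.append((i, j, 0))
    return out

feats = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(distill_pairs(feats))  # keeps a confident positive and a confident negative
```

Hash functions are then trained only on the surviving pairs, so unreliable middle-ground similarities never inject noise into the objective.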
Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss
Deep supervised hashing has emerged as an influential solution to large-scale
semantic image retrieval problems in computer vision. In the light of recent
progress, convolutional neural network based hashing methods typically seek
pair-wise or triplet labels to conduct the similarity preserving learning.
However, complex semantic concepts of visual contents are hard to capture by
similar/dissimilar labels, which limits the retrieval performance. Generally,
pair-wise or triplet losses not only incur expensive training costs but
also fail to extract sufficient semantic information. In this regard, we
propose a novel deep supervised hashing model to learn more compact class-level
similarity preserving binary codes. Our deep learning based model is motivated
by deep metric learning that directly takes semantic labels as supervised
information in training and generates corresponding discriminant hashing code.
Specifically, a novel cubic constraint loss function based on Gaussian
distribution is proposed, which preserves semantic variations while penalizing
the overlap between different classes in the embedding space. To address the
discrete optimization problem introduced by binary codes, a two-step
optimization strategy is proposed to provide efficient training and avoid the
problem of gradient vanishing. Extensive experiments on four large-scale
benchmark databases show that our model can achieve the state-of-the-art
retrieval performance. Moreover, when training samples are limited, our method
surpasses other supervised deep hashing methods by non-negligible margins.
A Decade Survey of Content Based Image Retrieval using Deep Learning
Content-based image retrieval aims to find images similar to a query image in a
large-scale dataset. Generally, the similarity between
the representative features of the query image and dataset images is used to
rank the images for retrieval. In the early days, various hand-designed feature
descriptors were investigated based on visual cues such as color,
texture, and shape that represent the images. Over the past decade, however,
deep learning has emerged as a dominant alternative to hand-designed feature
engineering: it learns the features automatically from the data. This paper presents
a comprehensive survey of deep learning based developments in the past decade
for content based image retrieval. The categorization of existing
state-of-the-art methods from different perspectives is also performed for
greater understanding of the progress. The taxonomy used in this survey covers
different types of supervision, networks, descriptors, and retrieval. A
performance analysis of state-of-the-art methods is also carried out. Insights
are presented to help researchers observe the progress and make the best
choices. The survey presented in this paper will aid further research progress
in image retrieval using deep learning.
Learning Visual Knowledge Memory Networks for Visual Question Answering
Visual question answering (VQA) requires joint comprehension of images and
natural language questions, where many questions cannot be directly or clearly
answered from visual content but require reasoning from structured human
knowledge with confirmation from visual content. This paper proposes visual
knowledge memory network (VKMN) to address this issue, which seamlessly
incorporates structured human knowledge and deep visual features into memory
networks in an end-to-end learning framework. Compared to existing methods for
leveraging external knowledge for supporting VQA, this paper stresses more on
two missing mechanisms. First is the mechanism for integrating visual contents
with knowledge facts. VKMN handles this issue by embedding knowledge triples
(subject, relation, target) and deep visual features jointly into the visual
knowledge features. Second is the mechanism for handling multiple knowledge
facts expanding from question and answer pairs. VKMN stores joint embedding
using key-value pair structure in the memory networks so that it is easy to
handle multiple facts. Experiments show that the proposed method achieves
promising results on both VQA v1.0 and v2.0 benchmarks, while outperforming
state-of-the-art methods on knowledge-reasoning related questions. Comment: Supplementary to CVPR 2018 version.
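The key-value memory read at the heart of VKMN is standard attention over stored keys; a minimal NumPy sketch follows, with shapes and names that are illustrative rather than taken from the paper:

```python
import numpy as np

def memory_read(query, keys, values):
    """Key-value memory read: softmax-normalized similarity between the
    query and each stored key weights a sum over the stored values. In
    VKMN, keys and values hold joint embeddings of knowledge triples
    (subject, relation, target) fused with visual features."""
    scores = keys @ query              # (num_slots,) similarity per slot
    w = np.exp(scores - scores.max())  # numerically stable softmax
    w = w / w.sum()                    # attention weights over slots
    return w @ values                  # weighted sum of value vectors

keys = np.eye(3)                       # 3 memory slots, 3-D keys
values = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(memory_read(np.array([10.0, 0.0, 0.0]), keys, values))  # ~ values[0]
```

Because the read is a soft weighted sum rather than a hard lookup, several stored facts can contribute at once, which is how the architecture handles multiple knowledge facts per question-answer pair.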