A Deep Hashing Learning Network
Hashing-based methods seek compact and efficient binary codes that preserve
the neighborhood structure in the original data space. For most existing
hashing methods, an image is first encoded as a vector of hand-crafted visual
features, followed by a hash projection and quantization step to get the compact
binary vector. Because most hand-crafted features encode only low-level
information of the input, they may not preserve the semantic similarities of
image pairs. Meanwhile, the hash function learning process is independent of
the feature representation, so the features may not be optimal for the hash
projection. In this paper, we propose a supervised hashing method based on a
well-designed deep convolutional neural network, which tries to learn hash
codes and compact representations of data simultaneously. The proposed model
learns the binary codes by adding a compact sigmoid layer before the loss
layer. Experiments on several image data sets show that the proposed model
outperforms other state-of-the-art methods.
Comment: 7 pages, 5 figures
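As a rough illustration of how a sigmoid layer can yield binary codes, the sketch below squashes real-valued activations into (0, 1) and thresholds them. The 0.5 threshold and the function names are assumptions made for this example; the abstract does not state how the sigmoid outputs are quantized.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_hash_code(activations, threshold=0.5):
    """Quantize sigmoid outputs into bits (threshold is an assumed choice)."""
    return [1 if sigmoid(a) >= threshold else 0 for a in activations]


# Hypothetical activations of a 4-unit hash layer for one image.
code = to_hash_code([2.3, -1.1, 0.4, -3.0])
print(code)
```

The number of units in the sigmoid layer sets the code length, so a shorter layer trades retrieval accuracy for storage.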
First-Take-All: Temporal Order-Preserving Hashing for 3D Action Videos
With the prevalence of commodity depth cameras, the new paradigm of user
interfaces based on 3D motion capturing and recognition has dramatically
changed the way humans and computers interact. Human action
recognition, as one of the key components in these devices, plays an important
role in guaranteeing the quality of user experience. Although model-driven
methods have achieved huge success, they cannot provide a scalable solution for
efficiently storing, retrieving and recognizing actions in large-scale
applications. These models are also vulnerable to temporal translation and
warping, as well as to variations in motion scales and execution rates. To
address these challenges, we treat 3D human action recognition
as a video-level hashing problem and propose a novel First-Take-All (FTA)
Hashing algorithm capable of hashing the entire video into hash codes of fixed
length. We demonstrate that this FTA algorithm produces a compact
representation of the video that is invariant to the above-mentioned variations,
through which action recognition can be solved by an efficient nearest-neighbor
search using the Hamming distance between the FTA hash codes. Experiments on
public 3D human action datasets show that the FTA algorithm can reach a
recognition accuracy higher than 80%, with about 15 bits per frame, given
65 frames per video across the datasets.
Comment: 9 pages, 11 figures
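The retrieval step described above, nearest-neighbor search by Hamming distance over fixed-length codes, can be sketched as follows. This does not implement FTA hashing itself; the 8-bit codes and action labels are made up for illustration, and the last line only works out the code length implied by the figures quoted in the abstract.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def nearest(query, database):
    """Return the (label, code) entry closest to the query in Hamming distance."""
    return min(database, key=lambda item: hamming(query, item[1]))


# Hypothetical 8-bit codes; real video-level codes would be much longer.
database = [
    ("wave", 0b10110010),
    ("kick", 0b01001101),
    ("jump", 0b10100111),
]
label, code = nearest(0b10110011, database)
print(label)

# About 15 bits per frame at 65 frames per video gives roughly this many bits.
BITS_PER_VIDEO = 15 * 65
```

Because the distance is an XOR followed by a bit count, a linear scan stays cheap even for large galleries.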
Set-to-Set Hashing with Applications in Visual Recognition
Visual data, such as an image or a sequence of video frames, is often
naturally represented as a point set. In this paper, we consider the
fundamental problem of finding, from a collection of sets, the set nearest to a
query set. This problem has obvious applications in large-scale visual
retrieval and recognition, and also in applied fields beyond computer vision.
One challenge stands out in solving the problem: set representation and the
measure of similarity. In particular, the query set and the sets in the dataset
collection can have varying cardinalities. The training collection is large
enough that a linear scan is impractical. We propose a simple representation
scheme that encodes both statistical and structural information of the sets.
The derived representations are integrated in a kernel framework for flexible
similarity measurement. For the query set process, we adopt a learning-to-hash
pipeline that turns the kernel representations into hash bits based on simple
learners, using multiple kernel learning. Experiments on two visual retrieval
datasets show unambiguously that our set-to-set hashing framework outperforms
prior methods that do not take the set-to-set search setting into account.
Comment: 9 pages
Discrete Hashing with Deep Neural Network
This paper addresses the problem of learning binary hash codes for
large-scale image search by proposing a novel hashing method based on a deep
neural network. The advantage of our deep model over previous deep models used
in hashing is that our model contains necessary criteria for producing good codes,
such as similarity preservation, balance and independence. Another advantage of
our method is that, instead of relaxing the binary constraint on the codes during
the learning process as most previous works do, we introduce an
auxiliary variable and reformulate the optimization into two sub-optimization
steps, allowing us to efficiently handle the binary constraints without any
relaxation.
The proposed method is also extended to supervised hashing by leveraging
the label information such that the learned binary codes preserve the pairwise
labels of the inputs.
The experimental results on three benchmark datasets show that the proposed
methods outperform state-of-the-art hashing methods.
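The balance and independence criteria mentioned in this abstract can be checked on a set of codes as in the sketch below. It uses a common formulation for codes in {-1, +1} (each bit averages to zero, and different bits are uncorrelated); the abstract does not give the paper's exact definitions, so this formulation and the function names are assumptions.

```python
def bit_balance(codes):
    """Mean of each bit over all samples; 0.0 means the bit is balanced."""
    n = len(codes)
    return [sum(row[j] for row in codes) / n for j in range(len(codes[0]))]


def bit_correlation(codes):
    """(1/n) * B^T B; the identity matrix means the bits are uncorrelated."""
    n = len(codes)
    length = len(codes[0])
    return [
        [sum(row[i] * row[j] for row in codes) / n for j in range(length)]
        for i in range(length)
    ]


# Four hypothetical 2-bit codes that satisfy both criteria exactly.
codes = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
print(bit_balance(codes))
print(bit_correlation(codes))
```

Balanced, uncorrelated bits spread the data evenly over the available codes, which is why both properties are usually treated as signs of a good hash function.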
Semantic Hierarchy Preserving Deep Hashing for Large-scale Image Retrieval
Convolutional neural networks have been widely used in content-based image
retrieval. To better deal with large-scale data, the deep hashing model is
proposed as an effective method, which maps an image to a binary code that can
be used for hashing search. However, most existing deep hashing models only
utilize fine-level semantic labels or convert them to similar/dissimilar labels
for training. The natural semantic hierarchy structures are ignored in the
training stage of the deep hashing model. In this paper, we present an
effective algorithm to train a deep hashing model that can preserve a semantic
hierarchy structure for large-scale image retrieval. Experiments on two
datasets show that our method improves the fine-level retrieval performance.
Meanwhile, our model achieves state-of-the-art results in terms of hierarchical
retrieval.
Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss
Deep supervised hashing has emerged as an influential solution to large-scale
semantic image retrieval problems in computer vision. In the light of recent
progress, convolutional neural network based hashing methods typically seek
pair-wise or triplet labels to conduct similarity-preserving learning.
However, complex semantic concepts of visual contents are hard to capture by
similar/dissimilar labels, which limits the retrieval performance. Generally,
pair-wise or triplet losses not only suffer from expensive training costs but
also fail to extract sufficient semantic information. In this regard, we
propose a novel deep supervised hashing model to learn more compact class-level
similarity-preserving binary codes. Our deep learning based model is motivated
by deep metric learning that directly takes semantic labels as supervised
information in training and generates corresponding discriminant hash codes.
Specifically, a novel cubic constraint loss function based on the Gaussian
distribution is proposed, which preserves semantic variations while penalizing
the overlapping part of different classes in the embedding space. To address the
discrete optimization problem introduced by binary codes, a two-step
optimization strategy is proposed to provide efficient training and avoid the
problem of vanishing gradients. Extensive experiments on four large-scale
benchmark databases show that our model can achieve state-of-the-art
retrieval performance. Moreover, when training samples are limited, our method
surpasses other supervised deep hashing methods with non-negligible margins.
SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval
Hashing methods have been widely used for efficient similarity retrieval on
large-scale image databases. Traditional hashing methods learn hash functions to
generate binary codes from hand-crafted features, which achieve limited
accuracy since the hand-crafted features cannot optimally represent the image
content and preserve the semantic similarity. Recently, several deep hashing
methods have shown better performance because the deep architectures generate
more discriminative feature representations. However, these deep hashing
methods are mainly designed for supervised scenarios, which only exploit the
semantic similarity information, but ignore the underlying data structures. In
this paper, we propose the semi-supervised deep hashing (SSDH) approach to
perform more effective hash function learning by simultaneously preserving
semantic similarity and underlying data structures. The main contributions are
as follows: (1) We propose a semi-supervised loss to jointly minimize the
empirical error on labeled data, as well as the embedding error on both labeled
and unlabeled data, which can preserve the semantic similarity and capture the
meaningful neighbors on the underlying data structures for effective hashing.
(2) A semi-supervised deep hashing network is designed to extensively exploit
both labeled and unlabeled data, in which we propose an online graph
construction method to benefit from the evolving deep features during training
to better capture semantic neighbors. To the best of our knowledge, the
proposed deep network is the first deep hashing method that can perform hash
code learning and feature learning simultaneously in a semi-supervised fashion.
Experimental results on five widely used datasets show that our proposed approach
outperforms the state-of-the-art hashing methods.
Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for
Video Technology
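The neighbor graph referred to in this abstract can be pictured as a k-nearest-neighbor graph over the current deep features. The sketch below builds one with Euclidean distance; the distance measure, the value of k, and the toy 2-D features are assumptions for illustration and are not taken from the paper. In an online setting, the graph would be rebuilt as the features change during training.

```python
import math


def knn_graph(features, k):
    """Map each sample index to the indices of its k nearest neighbors."""
    graph = {}
    for i, f in enumerate(features):
        # Sort all other samples by distance to sample i.
        dists = sorted(
            (math.dist(f, g), j) for j, g in enumerate(features) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Hypothetical 2-D features forming two well-separated pairs.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(knn_graph(features, k=1))
```

Labeled and unlabeled samples are treated alike here, which is what lets unlabeled data contribute neighborhood structure.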
Tattoo Image Search at Scale: Joint Detection and Compact Representation Learning
The explosive growth of digital images in video surveillance and social media
has led to a significant need for efficient search of persons of interest in
law enforcement and forensic applications. Despite tremendous progress in
person identification based on primary biometric traits (e.g., face and
fingerprint), a single biometric trait alone cannot meet the desired
recognition accuracy in forensic scenarios. Tattoos, as one of the important
soft biometric traits, have been found to be valuable for assisting in person
identification. However, tattoo search in a large collection of unconstrained
images remains a difficult problem, and existing tattoo search methods mainly
focus on matching cropped tattoos, which differs from real application
scenarios. To close the gap, we propose an efficient tattoo search approach
that is able to learn tattoo detection and compact representation jointly in a
single convolutional neural network (CNN) via multi-task learning. While the
features in the backbone network are shared by both tattoo detection and
compact representation learning, individual latent layers of each sub-network
optimize the shared features toward the detection and feature learning tasks,
respectively. We resolve the small batch size issue inside the joint tattoo
detection and compact representation learning network via random image stitch
and preceding feature buffering. We evaluate the proposed tattoo search system
using multiple public-domain tattoo benchmarks, and a gallery set with about
300K distracter tattoo images compiled from these datasets and images from the
Internet. In addition, we also introduce a tattoo sketch dataset containing 300
tattoos for sketch-based tattoo search. Experimental results show that the
proposed approach has superior performance in tattoo detection and tattoo
search at scale compared to several state-of-the-art tattoo retrieval
algorithms.
Comment: Technical Report (15 pages, 14 figures)
Push for Quantization: Deep Fisher Hashing
Current massive datasets demand light-weight access for analysis. Discrete
hashing methods are thus beneficial because they map high-dimensional data to
compact binary codes that are efficient to store and process, while preserving
semantic similarity. To optimize powerful deep learning methods for image
hashing, gradient-based methods are required. Binary codes, however, are
discrete and thus have no continuous derivatives. Relaxing the problem by
solving it in a continuous space and then quantizing the solution is not
guaranteed to yield separable binary codes. The quantization needs to be
included in the optimization. In this paper we push for quantization: We
optimize maximum class separability in the binary space. We introduce a margin
on distances between dissimilar image pairs as measured in the binary space. In
addition to pair-wise distances, we draw inspiration from Fisher's Linear
Discriminant Analysis (Fisher LDA) to maximize the binary distances between
classes and at the same time minimize the binary distance of images within the
same class. Experiments on CIFAR-10, NUS-WIDE and ImageNet100 demonstrate
compact codes comparing favorably to the current state of the art.
Comment: BMVC 201
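The idea of a margin on distances between dissimilar pairs, measured in the binary space, can be sketched with a simple hinge-style pair loss on Hamming distance. This is a generic contrastive-style formulation chosen for illustration, not the paper's loss; the margin value and function names are assumptions, and the Fisher LDA term is not modelled.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))


def pair_loss(a, b, similar, margin=2):
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    d = hamming(a, b)
    return d if similar else max(0, margin - d)


anchor = [1, 0, 1, 1]
print(pair_loss(anchor, [1, 0, 1, 0], similar=True))   # similar pair, 1 bit apart
print(pair_loss(anchor, [0, 1, 0, 0], similar=False))  # dissimilar pair, 4 bits apart
print(pair_loss(anchor, [1, 0, 1, 0], similar=False))  # dissimilar pair inside the margin
```

A dissimilar pair contributes to the loss only while it sits inside the margin, so pairs that are already well separated stop influencing training.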
Object Detection based Deep Unsupervised Hashing
Recently, similarity-preserving hashing methods have been extensively studied
for large-scale image retrieval. Compared with unsupervised hashing, supervised
hashing methods for labeled data usually perform better by utilizing
semantic label information. Intuitively, for unlabeled data, the performance of
unsupervised hashing methods should improve if we first mine some
supervised semantic 'label information' from the unlabeled data and then
incorporate the 'label information' into the training process. Thus, in this
paper, we propose a novel Object Detection based Deep Unsupervised Hashing
method (ODDUH). Specifically, a pre-trained object detection model is used
to mine supervised 'label information', which is used to guide the learning
process to generate high-quality hash codes. Extensive experiments on two public
datasets demonstrate that the proposed method outperforms the state-of-the-art
unsupervised hashing methods in the image retrieval task.
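One plausible way to turn detector output into pairwise 'label information' is to call two images similar when they share at least one detected object class. The sketch below shows only that step; the sharing rule, the function name, and the sample detections are assumptions for illustration, since the abstract does not specify how ODDUH derives or uses the mined labels.

```python
def pseudo_similarity(classes_a, classes_b):
    """Return 1 if two images share a detected object class, else 0 (assumed rule)."""
    return 1 if set(classes_a) & set(classes_b) else 0


# Hypothetical detector output: object classes found in each unlabeled image.
detections = {
    "img0": {"dog", "person"},
    "img1": {"dog"},
    "img2": {"car"},
}
print(pseudo_similarity(detections["img0"], detections["img1"]))
print(pseudo_similarity(detections["img0"], detections["img2"]))
```

The resulting 0/1 values could then stand in for ground-truth pair labels in a similarity-preserving loss.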