Training Triplet Networks with GAN
Triplet networks are widely used models that are characterized by good
performance in classification and retrieval tasks. In this work we propose to
train a triplet network by using it as the discriminator in Generative
Adversarial Nets (GANs). We exploit the discriminator's strong capacity for
representation learning to increase the predictive quality of the model.
We evaluated our approach on the CIFAR-10 and MNIST datasets and observed a
significant improvement in classification performance using the simple k-NN
method.
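The k-NN evaluation mentioned above can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by some trained network; the function name `knn_predict`, the Euclidean metric, and the default `k=3` are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter

def knn_predict(query, embeddings, labels, k=3):
    """Classify `query` by majority vote among its k nearest embeddings.

    Hypothetical helper: `embeddings` is a list of equal-length vectors
    and `labels` the corresponding class labels.
    """
    # Rank database points by Euclidean distance to the query.
    nearest = sorted(
        range(len(embeddings)),
        key=lambda i: math.dist(query, embeddings[i]),
    )[:k]
    # Majority vote over the labels of the k nearest points.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

A better embedding space pulls same-class points together, which is why even this simple classifier can serve as a probe of representation quality.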
Improved Search in Hamming Space using Deep Multi-Index Hashing
Similarity-preserving hashing is a widely used method for nearest neighbour
search in large-scale image retrieval tasks. There has been considerable
research on generating efficient image representations via
deep-network-based hashing methods. However, the issue of efficient searching
in the deep representation space remains largely unsolved. To this end, we
propose a simple yet efficient deep-network-based multi-index hashing method
that simultaneously learns a powerful image representation and supports
efficient searching. To achieve these two goals, we introduce the multi-index
hashing (MIH) mechanism into the proposed deep architecture, which divides the
binary codes into multiple substrings. Because non-uniformly distributed codes
result in inefficient searching, we add two balance constraints, at the
feature level and the instance level, respectively. Extensive evaluations on
several benchmark image retrieval datasets show that the learned balanced
binary codes bring dramatic speedups and achieve performance comparable to
the existing baselines.
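The substring idea behind multi-index hashing can be sketched as follows. This is a generic illustration of MIH search on fixed binary codes, not the deep architecture proposed in the paper; all function names are hypothetical, codes are stored as Python integers, and the code length is assumed to be divisible by the number of substrings `m`. It relies on the pigeonhole argument: if two codes differ in at most `r` bits, at least one of their `m` substrings differs in at most `r // m` bits.

```python
from collections import defaultdict
from itertools import combinations

def split_code(code, n_bits, m):
    """Split an n_bits-wide integer code into m equal-width substrings."""
    width = n_bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_index(codes, n_bits, m):
    """Build one hash table per substring position."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, sub in zip(tables, split_code(code, n_bits, m)):
            table[sub].append(idx)
    return tables

def neighbours_within(sub, width, radius):
    """Yield every width-bit value within Hamming distance `radius` of sub."""
    for k in range(radius + 1):
        for positions in combinations(range(width), k):
            flipped = sub
            for p in positions:
                flipped ^= 1 << p
            yield flipped

def mih_search(query, codes, tables, n_bits, m, r):
    """Return indices of all codes within Hamming distance r of query."""
    width = n_bits // m
    sub_radius = r // m  # pigeonhole bound on the closest substring
    candidates = set()
    for table, sub in zip(tables, split_code(query, n_bits, m)):
        for probe in neighbours_within(sub, width, sub_radius):
            candidates.update(table.get(probe, ()))
    # Verify candidates against the full code to remove false positives.
    return sorted(i for i in candidates
                  if bin(codes[i] ^ query).count("1") <= r)
```

When codes are spread unevenly over the buckets, a few buckets hold most of the database and the candidate set grows towards a linear scan, which is the inefficiency the balance constraints in the abstract are meant to reduce.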
Transductive Zero-Shot Hashing via Coarse-to-Fine Similarity Mining
Zero-shot hashing (ZSH) aims to learn hashing models for novel/target classes
without training data, which is an important and challenging problem. Most
existing ZSH approaches exploit transfer learning via intermediate shared
semantic representations between the seen/source classes and novel/target
classes. However, because the source and target classes are disjoint, the hash
functions learned from the source dataset are biased when applied directly to
the target classes. In this paper, we study transductive ZSH, i.e., the
setting in which unlabeled data are available for the novel classes. We put
forward a simple yet efficient joint learning approach via coarse-to-fine
similarity mining, which transfers knowledge from source data to target data.
The proposed deep architecture mainly consists of two building blocks:
1) a shared two-stream network, in which the first stream operates
on the source data and the second stream operates on the unlabeled data, to
learn effective common image representations, and 2) a coarse-to-fine
module, which begins by finding the most representative images from the target
classes and then further detects similarities among these images, to transfer
the similarities of the source data to the target data in a greedy fashion.
Extensive evaluation results on several benchmark datasets demonstrate that the
proposed hashing method achieves significant improvement over the
state-of-the-art methods.
Deep Policy Hashing Network with Listwise Supervision
Deep-network-based hashing has become a leading approach for large-scale
image retrieval; it learns a similarity-preserving network that maps similar
images to nearby hash codes. Pairwise and triplet losses are two widely
used similarity-preserving approaches for deep hashing. These approaches ignore
the fact that hashing is a prediction task on the list of binary codes. However,
learning deep hashing with listwise supervision is challenging in two respects:
1) how to obtain the rank list of the whole training set when the batch size of
the deep network is always small, and 2) how to utilize the listwise
supervision. In this paper, we present a novel deep policy hashing architecture
in which two systems are learned in parallel: a query network and a shared and
slowly changing database network. The following three steps are repeated until
convergence: 1) the
database network encodes all training samples into binary codes to obtain a
whole rank list, 2) the query network is trained based on policy learning to
maximize a reward that indicates the performance of the whole ranking list of
binary codes, e.g., mean average precision (MAP), and 3) the database network
is updated as the query network. Extensive evaluations on several benchmark
datasets show that the proposed method brings substantial improvements over
state-of-the-art hashing methods.
Comment: 8 pages, accepted by ACM ICM
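The abstract names mean average precision (MAP) over the whole ranking list as an example reward. The sketch below shows how such a score could be computed from binary codes ranked by Hamming distance; it does not reproduce the paper's policy-learning procedure. The function names are hypothetical, codes are plain sequences of 0/1 values, and ties in distance are broken by database order.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def average_precision(query_code, db_codes, relevant):
    """Average precision of the Hamming ranking of db_codes for one query.

    `relevant[i]` is True when database item i is a correct match.
    """
    # sorted() is stable, so equal distances keep database order.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0

def mean_average_precision(query_codes, db_codes, relevance):
    """Mean of the per-query average precision values."""
    scores = [average_precision(q, db_codes, rel)
              for q, rel in zip(query_codes, relevance)]
    return sum(scores) / len(scores)
```

Because this score depends on the sorted order of the entire database, it is not differentiable with respect to the codes, which is one plausible reason for treating it as a reward rather than as a loss.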
Sampling Matters in Deep Embedding Learning
Deep embeddings answer one simple question: How similar are two images?
Learning these embeddings is the bedrock of verification, zero-shot learning,
and visual search. The most prominent approaches optimize a deep convolutional
network with a suitable loss function, such as contrastive loss or triplet
loss. While a rich line of work focuses solely on the loss functions, we show
in this paper that selecting training examples plays an equally important role.
We propose distance weighted sampling, which selects more informative and
stable examples than traditional approaches. In addition, we show that a simple
margin-based loss is sufficient to outperform all other loss functions. We
evaluate our approach on the Stanford Online Products, CAR196, and the
CUB200-2011 datasets for image retrieval and clustering, and on the LFW dataset
for face verification. Our method achieves state-of-the-art performance on all
of them.
Comment: Add supplementary material. Paper published in ICCV 201
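The abstract does not state the loss formula, so the following is only a sketch of one common form of a margin-based pair loss: a hinge on the distance relative to a boundary `beta` with margin `alpha`. The parameter values and function names are illustrative assumptions, and the distance weighted sampling step is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def margin_loss(anchor, other, is_positive, alpha=0.2, beta=1.2):
    """Hinge-style pair loss: max(0, alpha + y * (d - beta)).

    y = +1 for a positive pair (penalised when d > beta - alpha),
    y = -1 for a negative pair (penalised when d < beta + alpha).
    alpha and beta are illustrative values, assumed for this sketch.
    """
    y = 1.0 if is_positive else -1.0
    return max(0.0, alpha + y * (euclidean(anchor, other) - beta))
```

Under this form, a negative pair that is already far apart contributes zero loss, so which pairs are selected for training strongly affects how much gradient signal each batch carries.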
Visual Tracking via Shallow and Deep Collaborative Model
In this paper, we propose a robust tracking method based on the collaboration
of a generative model and a discriminative classifier, where features are
learned by shallow and deep architectures, respectively. For the generative
model, we introduce a block-based incremental learning scheme, in which a local
binary mask is constructed to deal with occlusion. The similarity degrees
between the local patches and their corresponding subspace are integrated to
formulate a more accurate global appearance model. In the discriminative model,
we exploit the advances of deep learning architectures to learn generic
features which are robust to both background clutter and foreground appearance
variations. To this end, we first construct a discriminative training set from
auxiliary video sequences. A deep classification neural network is then trained
offline on this training set. Through online fine-tuning, both the hierarchical
feature extractor and the classifier can be adapted to the appearance change of
the target for effective online tracking. The collaboration of these two models
achieves a good balance in handling occlusion and target appearance change,
which are two contradictory challenging factors in visual tracking. Both
quantitative and qualitative evaluations against several state-of-the-art
algorithms on challenging image sequences demonstrate the accuracy and the
robustness of the proposed tracker.
Comment: Undergraduate Thesis, appearing in Pattern Recognitio
Triplet-Based Deep Hashing Network for Cross-Modal Retrieval
Given the benefits of its low storage requirements and high retrieval
efficiency, hashing has recently received increasing attention. In
particular, cross-modal hashing has been widely and successfully used in
multimedia similarity search applications. However, almost all existing methods
employing cross-modal hashing cannot obtain powerful hash codes because they
ignore the relative similarity between heterogeneous data, which contains
richer semantic information, leading to unsatisfactory retrieval performance.
In this paper, we propose a triplet-based deep hashing (TDH) network for
cross-modal retrieval. First, we utilize triplet labels, which describe
the relative relationships among three instances, as supervision in order to
capture more general semantic correlations between cross-modal instances. We
then establish a loss function from the inter-modal view and the intra-modal
view to boost the discriminative abilities of the hash codes. Finally, graph
regularization is introduced into our proposed TDH method to preserve the
original semantic similarity between hash codes in Hamming space. Experimental
results show that our proposed method outperforms several state-of-the-art
approaches on two popular cross-modal datasets.
Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval
Deep distance metric learning (DDML), which is proposed to learn image
similarity metrics in an end-to-end manner based on the convolutional neural
network, has achieved encouraging results in many computer vision
tasks. L2-normalization in the embedding space has been used to improve the
performance of several DDML methods. However, the commonly used Euclidean
distance is no longer an accurate metric for the L2-normalized embedding space,
i.e., a hyper-sphere. Another challenge of current DDML methods is that their
loss functions are usually based on rigid data formats, such as the triplet
tuple. Thus, an extra process is needed to prepare data in specific formats. In
addition, their losses are obtained from a limited number of samples, which
leads to a lack of the global view of the embedding space. In this paper, we
replace the Euclidean distance with the cosine similarity to better utilize the
L2-normalization, which is able to attenuate the curse of dimensionality.
More specifically, a novel loss function based on the von Mises-Fisher
distribution is proposed to learn a compact hyper-spherical embedding space.
Moreover, a new efficient learning algorithm is developed to better capture the
global structure of the embedding space. Experiments for both classification
and retrieval tasks on several standard datasets show that our method achieves
state-of-the-art performance with a simpler training procedure. Furthermore, we
demonstrate that, even with a small number of convolutional layers, our model
can still obtain significantly better classification performance than the
widely used softmax loss.
Comment: codes will come soo
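A small sketch of the quantities involved in the hyper-spherical setting, under stated assumptions: for unit vectors the squared Euclidean distance equals 2 - 2cos(theta), and the von Mises-Fisher density is proportional to exp(kappa * mu^T x). The function names are hypothetical, the normalising constant of the density is omitted, and this is not the paper's loss function.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def vmf_log_score(x, mu, kappa):
    """Unnormalised von Mises-Fisher log-density: kappa * cos(x, mu).

    mu is the mean direction and kappa >= 0 the concentration;
    the log normalising constant is left out of this sketch.
    """
    return kappa * cosine_similarity(x, mu)
```

A larger `kappa` concentrates the distribution around `mu`, so embeddings aligned with a class direction receive sharply higher scores than those at wider angles.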
Triplet Permutation Method for Deep Learning of Single-Shot Person Re-Identification
Solving Single-Shot Person Re-Identification (Re-Id) by training Deep
Convolutional Neural Networks is a daunting challenge, due to the lack of
training data, since only two images per person are available. This causes
the models to overfit, leading to degraded performance. This paper
formulates the Triplet Permutation method to generate multiple training sets
from a given Re-Id dataset. This is a novel strategy for feeding triplet
networks, which reduces the overfitting of the Single-Shot Re-Id model. The
improved performance has been demonstrated on one of the most challenging
Re-Id datasets, PRID2011, demonstrating the effectiveness of the method.
Deep Discrete Supervised Hashing
Hashing has been widely used for large-scale search due to its low storage
cost and fast query speed. By using supervised information, supervised hashing
can significantly outperform unsupervised hashing. Recently, discrete
supervised hashing and deep hashing have been two representative advances in
supervised hashing. On one hand, hashing is essentially a discrete optimization
problem. Hence, utilizing supervised information to directly guide the discrete
(binary) coding procedure can avoid sub-optimal solutions and improve the
accuracy. On the other hand, deep hashing, which integrates deep feature
learning and hash-code learning into an end-to-end architecture, can enhance
the feedback between feature learning and hash-code learning. The key in
discrete supervised hashing is to adopt supervised information to directly
guide the discrete coding procedure in hashing. The key in deep hashing is to
adopt the supervised information to directly guide the deep feature learning
procedure. However, no existing work can use the supervised
information to directly guide both the discrete coding procedure and the deep
feature learning procedure in the same framework. In this paper, we propose a novel
deep hashing method, called deep discrete supervised hashing (DDSH), to address
this problem. DDSH is the first deep hashing method which can utilize
supervised information to directly guide both discrete coding procedure and
deep feature learning procedure, and thus enhance the feedback between these
two important procedures. Experiments on three real datasets show that DDSH can
outperform other state-of-the-art baselines, including both discrete hashing
and deep hashing baselines, for image retrieval.