Transfer Adversarial Hashing for Hamming Space Retrieval
Hashing is widely applied to large-scale image retrieval due to its storage
and retrieval efficiency. Existing work on deep hashing assumes that the
database in the target domain is identically distributed with the training set
in the source domain. This paper relaxes this assumption to a transfer
retrieval setting, which allows the database and the training set to come from
different but relevant domains. However, the transfer retrieval setting will
introduce two technical difficulties: first, the hash model trained on the
source domain cannot work well on the target domain due to the large
distribution gap; second, the domain gap makes it difficult to concentrate the
database points to be within a small Hamming ball. As a consequence, transfer
retrieval performance within Hamming Radius 2 degrades significantly in
existing hashing methods. This paper presents Transfer Adversarial Hashing
(TAH), a new hybrid deep architecture that incorporates a pairwise
t-distribution cross-entropy loss to learn concentrated hash codes and an
adversarial network to align the data distributions between the source and
target domains. TAH can generate compact transfer hash codes for efficient
image retrieval on both source and target domains. Comprehensive experiments
validate that TAH yields state-of-the-art Hamming space retrieval performance
on standard datasets.
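The Hamming-ball lookup underlying this setting can be sketched in a few lines: retrieval within Hamming radius 2 means returning every database code whose XOR with the query code has at most two set bits. The 16-bit codes and helper names below are illustrative, not from TAH:

```python
# Hamming-ball lookup: return all database codes within a small radius of the
# query. Codes are plain Python ints; popcount of XOR gives Hamming distance.

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary hash codes."""
    return bin(a ^ b).count("1")

def lookup_within_radius(query: int, database: list[int], radius: int = 2) -> list[int]:
    """Indices of database codes whose Hamming distance to the query is <= radius."""
    return [i for i, code in enumerate(database)
            if hamming_distance(query, code) <= radius]

database = [0b1010101010101010,   # identical to the query (distance 0)
            0b1010101010101000,   # one flipped bit (distance 1)
            0b0101010101010101]   # bitwise complement (distance 16)
print(lookup_within_radius(0b1010101010101010, database))  # -> [0, 1]
```

In practice such lookups are served by multi-index hash tables rather than a linear scan, but the distance computation is exactly this popcount.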
A Survey on Learning to Hash
Nearest neighbor search is the problem of finding the data points in the
database whose distances to the query point are the smallest.
Learning to hash is one of the major solutions to this problem and has been
widely studied recently. In this paper, we present a comprehensive survey of
the learning to hash algorithms, categorize them according to how they
preserve similarities into: pairwise similarity preserving, multiwise
similarity preserving, implicit similarity preserving, as well as quantization,
and discuss their relations. We separate quantization from pairwise similarity
preserving as the objective function is very different though quantization, as
we show, can be derived from preserving the pairwise similarities. In addition,
we present the evaluation protocols and the general performance analysis, and
point out that the quantization algorithms perform superiorly in terms of
search accuracy, search time cost, and space cost. Finally, we introduce a few
emerging topics.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI).
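As a minimal illustration of the pairwise similarity preserving family the survey covers (the relaxation and loss below are one common formulation, not any specific algorithm from the paper): relaxed codes are trained so that their inner products match a pairwise similarity matrix, then binarized with the sign function.

```python
import numpy as np

# Toy pairwise similarity-preserving objective: relaxed codes H in (-1, 1)^{n x L}
# are pushed so that <h_i, h_j>/L matches S[i, j] (+1 similar, -1 dissimilar),
# then binarized with sign().

rng = np.random.default_rng(0)
L = 8                                       # code length in bits
H = np.tanh(rng.normal(size=(4, L)))        # relaxed (pre-binarization) codes
S = np.array([[ 1,  1, -1, -1],             # items 0-1 similar, 2-3 similar
              [ 1,  1, -1, -1],
              [-1, -1,  1,  1],
              [-1, -1,  1,  1]], dtype=float)

loss = np.mean((H @ H.T / L - S) ** 2)      # pairwise similarity-preserving loss
codes = np.sign(H)                          # final binary codes in {-1, +1}
print(codes.shape)                          # -> (4, 8)
```

An actual learning-to-hash method would minimize this loss over H (or over network parameters producing H) by gradient descent; the snippet only evaluates it once to show the quantities involved.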
SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network
Cross-modal hashing aims to map heterogeneous multimedia data into a common
Hamming space, which can realize fast and flexible retrieval across different
modalities. Supervised cross-modal hashing methods have achieved considerable
progress by incorporating semantic side information. However, they mainly have
two limitations: (1) they rely heavily on large-scale labeled cross-modal
training data, which are labor-intensive and hard to obtain; (2) they ignore
the rich information contained in the large amount of unlabeled data across
different modalities, especially the margin examples that are easily retrieved
incorrectly, which can help to model the correlations. To address these problems,
in this paper we propose a novel Semi-supervised Cross-Modal Hashing approach
by Generative Adversarial Network (SCH-GAN). We aim to take advantage of GAN's
ability for modeling data distributions to promote cross-modal hashing learning
in an adversarial way. The main contributions can be summarized as follows: (1)
We propose a novel generative adversarial network for cross-modal hashing. In
our proposed SCH-GAN, the generative model tries to select margin examples of
one modality from unlabeled data given a query of another modality, while
the discriminative model tries to distinguish the selected examples from true
positive examples of the query. These two models play a minimax game so that
the generative model can promote the hashing performance of the discriminative
model. (2) We propose a reinforcement learning based algorithm to drive the
training of proposed SCH-GAN. The generative model takes the correlation score
predicted by discriminative model as a reward, and tries to select the examples
close to the margin to promote discriminative model by maximizing the margin
between positive and negative data. Experiments on 3 widely-used datasets
verify the effectiveness of our proposed approach.
Comment: 12 pages, submitted to IEEE Transactions on Cybernetics.
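The reward-driven selection loop can be caricatured with a toy REINFORCE update (purely illustrative; the fixed score vector and softmax policy below merely stand in for SCH-GAN's discriminator and generator):

```python
import numpy as np

# A softmax "generator" policy over five unlabeled candidates is updated by
# REINFORCE, with a fixed score vector standing in for the discriminator's
# reward. Not SCH-GAN's architecture, just the shape of the training signal.

rng = np.random.default_rng(0)
theta = np.zeros(5)                               # generator policy parameters
reward = np.array([0.1, 0.9, 0.2, 0.8, 0.1])      # stand-in discriminator scores

for _ in range(2000):
    p = np.exp(theta - theta.max())
    p /= p.sum()                                  # softmax selection probabilities
    i = rng.choice(5, p=p)                        # generator picks a candidate
    grad = -p.copy()
    grad[i] += 1.0                                # d log p(i) / d theta
    theta += 0.1 * reward[i] * grad               # REINFORCE update

# After training, the policy concentrates on the high-reward candidates (1 and 3).
```

In SCH-GAN the reward is the discriminator's correlation score for the selected example, so the generator learns to propose hard (margin) examples that sharpen the discriminator.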
Deep Class-Wise Hashing: Semantics-Preserving Hashing via Class-wise Loss
Deep supervised hashing has emerged as an influential solution to large-scale
semantic image retrieval problems in computer vision. In the light of recent
progress, convolutional neural network based hashing methods typically seek
pair-wise or triplet labels to conduct the similarity preserving learning.
However, complex semantic concepts of visual contents are hard to capture by
similar/dissimilar labels, which limits the retrieval performance. Generally,
pair-wise or triplet losses not only suffer from expensive training costs but
also fail to extract sufficient semantic information. In this regard, we
propose a novel deep supervised hashing model to learn more compact class-level
similarity preserving binary codes. Our deep learning based model is motivated
by deep metric learning that directly takes semantic labels as supervised
information in training and generates corresponding discriminant hashing code.
Specifically, a novel cubic constraint loss function based on the Gaussian
distribution is proposed, which preserves semantic variations while penalizing
the overlap between different classes in the embedding space. To address the
discrete optimization problem introduced by binary codes, a two-step
optimization strategy is proposed to provide efficient training and avoid the
problem of gradient vanishing. Extensive experiments on four large-scale
benchmark databases show that our model can achieve the state-of-the-art
retrieval performance. Moreover, when training samples are limited, our method
surpasses other supervised deep hashing methods by non-negligible margins.
A Survey on Web Multimedia Mining
Modern developments in digital media technologies have made transmitting and
storing large amounts of multi/rich media data (e.g. text, images, music, video
and their combinations) more feasible and affordable than ever before. However,
state-of-the-art techniques to process, mine, and manage those rich media
are still in their infancy. Rapid advances in multimedia acquisition and
storage technology have led to a fast-growing amount of data stored in
databases. Useful information can be revealed to users if these multimedia
files are analyzed. Multimedia mining deals with the extraction of implicit
knowledge, multimedia data relationships, or other patterns not explicitly
stored in multimedia files. In retrieval, indexing, and classification of
multimedia data, efficient information fusion of the different modalities is
also essential for the system's overall performance. The purpose of this paper
is to provide a systematic overview of multimedia mining. This article also
presents the issues in the application process component for multimedia mining,
followed by the multimedia mining models.
Comment: 13 pages; The International Journal of Multimedia & Its Applications
(IJMA) Vol.3, No.3, August 2011.
Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search
As hashing becomes an increasingly appealing technique for large-scale image
retrieval, multi-label hashing is also attracting more attention for its
ability to exploit multi-level semantic contents. In this paper, we propose a
novel deep hashing method for scalable multi-label image search. Unlike
existing approaches with conventional objectives such as contrast and triplet
losses, we employ a rank list, rather than pairs or triplets, to provide
sufficient global supervision information for all the samples. Specifically, a
new rank-consistency objective is applied to align the similarity orders from
two spaces: the original space and the Hamming space. A powerful loss function
is designed to penalize the samples whose semantic similarity and Hamming
distance are mismatched between the two spaces. In addition, a multi-label softmax
cross-entropy loss is presented to enhance the discriminative power with a
concise formulation of the derivative function. In order to manipulate the
neighborhood structure of the samples with different labels, we design a
multi-label clustering loss to cluster the hashing vectors of the samples with
the same labels by reducing the distances between the samples and their
multiple corresponding class centers. The state-of-the-art experimental results
achieved on three public multi-label datasets, MIRFLICKR-25K, IAPRTC12 and
NUS-WIDE, demonstrate the effectiveness of the proposed method.
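One plausible formulation of a multi-label softmax cross-entropy (the paper's exact loss may differ) normalizes the multi-hot label vector into a target distribution and matches it against the softmax of the logits:

```python
import numpy as np

# Multi-label softmax cross-entropy sketch: the multi-hot label vector is
# normalized into a distribution q, then the usual cross-entropy H(q, softmax(z))
# is averaged over the batch. Names and values here are illustrative.

def multilabel_softmax_ce(logits: np.ndarray, labels: np.ndarray) -> float:
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)                # softmax predictions
    q = labels / labels.sum(axis=1, keepdims=True)   # normalized multi-hot targets
    return float(-np.mean(np.sum(q * np.log(p + 1e-12), axis=1)))

logits = np.array([[2.0, 1.0, -1.0],
                   [0.5, 0.5, 0.5]])
labels = np.array([[1.0, 1.0, 0.0],    # sample 1 carries two labels
                   [0.0, 0.0, 1.0]])   # sample 2 carries one label
print(round(multilabel_softmax_ce(logits, labels), 3))
```

The normalization of the label vector is what keeps the derivative as concise as in the single-label case: the gradient with respect to the logits is simply p - q.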
Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing
In recent years, cross-modal hashing (CMH) has attracted increasing
attention, mainly because of its potential to map contents from
different modalities, especially vision and language, into the same space,
which makes cross-modal data retrieval efficient. There are two main
frameworks for CMH, differing from each other in whether semantic supervision
is required. Compared to the unsupervised methods, the supervised methods often
enjoy more accurate results, but require much heavier labor in data
annotation. In this paper, we propose a novel approach that enables guiding a
supervised method using outputs produced by an unsupervised method.
Specifically, we make use of teacher-student optimization for propagating
knowledge. Experiments are performed on two popular CMH benchmarks, i.e., the
MIRFlickr and NUS-WIDE datasets. Our approach outperforms all existing
unsupervised methods by a large margin.
Comment: This paper has been accepted for CVPR2020.
From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories
Sound effects play an essential role in producing high-quality radio stories
but require enormous labor cost to add. In this paper, we address the problem
of automatically adding sound effects to radio stories with a retrieval-based
model. However, directly implementing a tag-based retrieval model leads to many
false positives due to the ambiguity of story contents. To solve this problem,
we introduce a retrieval-based framework hybridized with a semantic inference
model which helps to achieve robust retrieval results. Our model relies on
carefully designed features extracted from the context of candidate triggers. We
collect two story dubbing datasets through crowdsourcing to analyze the setting
of adding sound effects and to train and test our proposed methods. We further
discuss the importance of each feature and introduce several heuristic rules
for the trade-off between precision and recall. Together with the
text-to-speech technology, our results reveal a promising automatic pipeline
for producing high-quality radio stories.
Comment: In the Proceedings of the 42nd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR 2019).
TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems
There has been an explosion of interest in designing various Knowledge Graph
Neural Networks (KGNNs), which achieve state-of-the-art performance and provide
great explainability for recommendation. The promising performance mainly
results from their capability of capturing high-order proximity messages over
the knowledge graphs. However, training KGNNs at scale is challenging due to
the high memory usage. In the forward pass, the automatic differentiation
engines (\textsl{e.g.}, TensorFlow/PyTorch) generally need to cache all
intermediate activation maps in order to compute gradients in the backward
pass, which leads to a large GPU memory footprint. Existing work solves this
problem by utilizing multi-GPU distributed frameworks. Nonetheless, this poses
a practical challenge when seeking to deploy KGNNs in memory-constrained
environments, especially for industry-scale graphs.
Here we present TinyKG, a memory-efficient GPU-based training framework for
KGNNs for the tasks of recommendation. Specifically, TinyKG uses exact
activations in the forward pass while storing a quantized version of
activations in the GPU buffers. During the backward pass, these low-precision
activations are dequantized back to full-precision tensors, in order to compute
gradients. To reduce the quantization errors, TinyKG applies a simple yet
effective quantization algorithm to compress the activations, which ensures
unbiasedness with low variance. As such, the training memory footprint of KGNNs
is largely reduced with negligible accuracy loss. To evaluate the performance
of our TinyKG, we conduct comprehensive experiments on real-world datasets. We
found that our TinyKG with INT2 quantization aggressively reduces the memory
footprint of activation maps with only a small loss in accuracy, allowing us
to deploy KGNNs on memory-constrained devices.
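The quantize-then-dequantize round trip with unbiased stochastic rounding can be sketched as follows (an illustrative scheme in the spirit of TinyKG, not the paper's exact algorithm):

```python
import numpy as np

# Activations are mapped onto a (2**bits)-level grid; rounding up with
# probability equal to the fractional part keeps the estimate unbiased:
# E[dequantize(quantize(x))] = x.

def quantize(x, bits, rng):
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2 ** bits - 1)      # grid step
    t = (x - lo) / scale                     # position on the grid
    q = np.floor(t)
    q += rng.random(x.shape) < (t - q)       # stochastic rounding
    return q.astype(np.uint8), lo, scale

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
x = rng.normal(size=100_000).astype(np.float32)   # stand-in activation map
q, lo, scale = quantize(x, bits=2, rng=rng)        # INT2: grid levels {0, 1, 2, 3}
x_hat = dequantize(q, lo, scale)
print(abs(float(np.mean(x_hat - x))))              # empirical bias, close to 0
```

Storing q (one byte per value here, two bits with packing) instead of the float32 activations is what shrinks the cached forward-pass state; the dequantized tensor is reconstructed only during the backward pass.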
Simultaneous Region Localization and Hash Coding for Fine-grained Image Retrieval
Fine-grained image hashing is a challenging problem due to the difficulties
of discriminative region localization and hash code generation. Most existing
deep hashing approaches solve the two tasks independently, although these two
tasks are correlated and can reinforce each other. In this paper, we propose a
deep fine-grained hashing approach to simultaneously localize the discriminative
regions and generate efficient binary codes. The proposed approach consists of a
region localization module and a hash coding module. The region localization
module aims to provide informative regions to the hash coding module. The hash
coding module aims to generate effective binary codes and give feedback for
learning a better localizer. Moreover, to better capture subtle differences,
multi-scale regions at different layers are learned without the need of
bounding-box/part annotations. Extensive experiments are conducted on two
public benchmark fine-grained datasets. The results demonstrate significant
improvements in the performance of our method relative to other fine-grained
hashing algorithms.