Supervised Matrix Factorization for Cross-Modality Hashing
Matrix factorization has been recently utilized for the task of multi-modal
hashing for cross-modality visual search, where basis functions are learned to
map data from different modalities to the same Hamming embedding. In this
paper, we propose a novel cross-modality hashing algorithm termed Supervised
Matrix Factorization Hashing (SMFH) which tackles the multi-modal hashing
problem with a collective non-negative matrix factorization across the different
modalities. In particular, SMFH employs a well-designed binary code learning
algorithm to preserve the similarities among multi-modal original features
through a graph regularization. At the same time, semantic labels, when
available, are incorporated into the learning procedure. We conjecture that all
these help preserve the most relevant information during the
binary quantization process, and hence improve the retrieval accuracy. We
demonstrate the superior performance of SMFH on three cross-modality visual
search benchmarks, i.e., the PASCAL-Sentence, Wiki, and NUS-WIDE, with
quantitative comparison to various state-of-the-art methods.
Comment: 7 pages, 4 figures
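The factorize-then-quantize idea behind such methods can be sketched in a toy form (this is an illustration only, not SMFH itself: graph regularization and label supervision are omitted, and all data are synthetic). Two modalities are factorized against one shared latent matrix, whose sign gives the unified codes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-modal data: 6 samples observed as image features (d1 = 8) and
# text features (d2 = 5), generated from a shared 4-dim latent factor.
n, d1, d2, k = 6, 8, 5, 4
V_true = rng.standard_normal((k, n))
X1 = rng.standard_normal((d1, k)) @ V_true  # image modality
X2 = rng.standard_normal((d2, k)) @ V_true  # text modality

def collective_mf(X1, X2, k, iters=50):
    """Alternating least squares for a latent factor V shared by both
    modalities: X1 ~ U1 V and X2 ~ U2 V (illustrative, unregularized)."""
    V = rng.standard_normal((k, X1.shape[1]))
    for _ in range(iters):
        U1 = X1 @ np.linalg.pinv(V)
        U2 = X2 @ np.linalg.pinv(V)
        # The shared V solves the least-squares problem of both modalities stacked.
        V = np.linalg.pinv(np.vstack([U1, U2])) @ np.vstack([X1, X2])
    return U1, U2, V

U1, U2, V = collective_mf(X1, X2, k)
B = np.sign(V)  # unified binary codes, one column per sample
print(B.shape)  # (4, 6)
```

Because both modalities share the single factor V, a query from either modality lands in the same Hamming embedding after quantization.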
Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval
Supervised cross-modal hashing has gained increasing research interest in
large-scale retrieval tasks owing to its satisfactory performance and
efficiency. However, it still has some challenging issues to be further
studied: 1) most of them fail to preserve the semantic correlations well in
hash codes because of the large heterogeneous gap; 2) most of them relax the
discrete constraint on hash codes, leading to large quantization error and
consequent low performance; 3) most of them suffer from relatively high memory
cost and computational complexity during training procedure, which makes them
unscalable. In this paper, to address the above issues, we propose a supervised
cross-modal hashing method based on matrix factorization dubbed Efficient
Discrete Supervised Hashing (EDSH). Specifically, collective matrix
factorization on heterogeneous features and semantic embedding with class labels
are seamlessly integrated to learn hash codes. Therefore, the feature based
similarities and semantic correlations can be both preserved in hash codes,
which makes the learned hash codes more discriminative. Then an efficient
discrete optimization algorithm is proposed to handle the scalability issue.
Instead of learning hash codes bit by bit, the hash code matrix can be obtained
directly, which is more efficient. Extensive experimental results on three public
real-world datasets demonstrate that EDSH produces a superior performance in
both accuracy and scalability over some existing cross-modal hashing methods.
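Why a whole-matrix discrete step can replace bit-by-bit learning is easy to illustrate on a simplified subproblem (a generic sketch, not EDSH's exact update, which has additional coupling terms): maximizing tr(B^T Z) over B in {-1, +1} decouples entrywise, so the entire code matrix is obtained by one sign operation.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Z stands in for the real-valued auxiliary matrix from some continuous step.
Z = rng.standard_normal((4, 10))

# max tr(B^T Z) over B in {-1,+1}^{4x10} decouples: each b_ij contributes
# b_ij * z_ij independently, so the closed-form optimum is elementwise sign.
B = np.sign(Z)

# Sanity check: exhaustive search over one 4-bit column agrees with the closed form.
col = Z[:, 0]
best = max(product([-1, 1], repeat=4), key=lambda b: float(np.dot(b, col)))
assert np.array_equal(np.array(best, dtype=float), B[:, 0])
```

This single matrix operation is what makes such discrete solvers cheap relative to updating one bit (or one column) at a time.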
Discriminative Supervised Hashing for Cross-Modal Similarity Search
With the advantage of low storage cost and high retrieval efficiency, hashing
techniques have recently been an emerging topic in cross-modal similarity
search. As multi-modal data reflect similar semantic content, much research
aims at learning unified binary codes. However, the discriminative hashing
features learned by these methods are not adequate, which results in
lower accuracy and robustness. We propose a novel hashing learning framework
which jointly performs classifier learning, subspace learning and matrix
factorization to preserve class-specific semantic content, termed
Discriminative Supervised Hashing (DSH), to learn discriminative unified
binary codes for multi-modal data. Besides, reducing the loss of information
and preserving the non-linear structure of data, DSH non-linearly projects
different modalities into the common space in which the similarity among
heterogeneous data points can be measured. Extensive experiments conducted on
the three publicly available datasets demonstrate that the framework proposed
in this paper outperforms several state-of-the-art methods.
Comment: 7 pages, 3 figures, 4 tables; the paper is under consideration at Image
and Vision Computing
Unsupervised Cross-Media Hashing with Structure Preservation
Recent years have seen the exponential growth of heterogeneous multimedia
data. The need for effective and accurate data retrieval from heterogeneous
data sources has attracted much research interest in cross-media retrieval.
Here, given a query of any media type, cross-media retrieval seeks to find
relevant results of different media types from heterogeneous data sources. To
facilitate large-scale cross-media retrieval, we propose a novel unsupervised
cross-media hashing method. Our method incorporates local affinity and distance
repulsion constraints into a matrix factorization framework. Correspondingly,
the proposed method learns hash functions that generate unified hash codes
from different media types, while ensuring that the intrinsic geometric
structure of the data distribution is preserved. These hash codes allow the
similarity between data of different media types to be evaluated directly.
Experimental results on
two large-scale multimedia datasets demonstrate the effectiveness of the
proposed method, where we outperform the state-of-the-art methods.
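A local-affinity constraint of the kind described above is typically enforced in factorization frameworks through a graph-Laplacian smoothness term tr(V L V^T), which grows when linked samples receive different latent codes. A minimal sketch (the affinity graph and codes here are invented for illustration):

```python
import numpy as np

# 3 samples; samples 0 and 1 are neighbors in the affinity graph W.
W = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 0.]])
L = np.diag(W.sum(axis=1)) - W   # graph Laplacian

V_smooth = np.array([[1., 1., -1.]])   # codes agree on the linked pair
V_rough  = np.array([[1., -1., -1.]])  # codes disagree on the linked pair

def smoothness(V, L):
    """tr(V L V^T) = 1/2 * sum_ij W_ij * ||v_i - v_j||^2."""
    return float(np.trace(V @ L @ V.T))

print(smoothness(V_smooth, L), smoothness(V_rough, L))  # 0.0 4.0
```

Minimizing this term alongside the factorization loss is what pulls neighboring samples toward identical hash codes.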
Unsupervised Multi-modal Hashing for Cross-modal Retrieval
With the advantage of low storage cost and high efficiency, hashing learning
has received much attention in the domain of Big Data. In this paper, we
propose a novel unsupervised hashing learning method to cope with the open
problem of directly preserving the manifold structure by hashing. To this end,
both the semantic correlation in the textual space and the locally
geometric structure in the visual space are explored simultaneously in our
framework. Besides, the ℓ2,1-norm constraint is imposed on the projection
matrices to learn the discriminative hash function for each modality. Extensive
experiments are performed to evaluate the proposed method on the three publicly
available datasets and the experimental results show that our method can
achieve superior performance over the state-of-the-art methods.
Comment: 4 pages, 4 figures
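The ℓ2,1-norm used above is the sum of the ℓ2 norms of a matrix's rows; penalizing it drives whole rows of a projection matrix to zero, which is what makes the learned hash functions select discriminative features. A minimal sketch:

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the l2 norms of the rows of W.
    Zero rows contribute nothing, so the penalty encourages row sparsity."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

W = np.array([[3.0, 4.0],   # row norm 5
              [0.0, 0.0],   # pruned feature: zero row
              [1.0, 0.0]])  # row norm 1
print(l21_norm(W))  # 6.0
```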
Triplet-Based Deep Hashing Network for Cross-Modal Retrieval
Given the benefits of its low storage requirements and high retrieval
efficiency, hashing has recently received increasing attention. In
particular, cross-modal hashing has been widely and successfully used in
multimedia similarity search applications. However, almost all existing methods
employing cross-modal hashing cannot obtain powerful hash codes because they
ignore the relative similarity between heterogeneous data, which contains
richer semantic information, leading to unsatisfactory retrieval performance.
In this paper, we propose a triplet-based deep hashing (TDH) network for
cross-modal retrieval. First, we utilize triplet labels, which describe
the relative relationships among three instances as supervision in order to
capture more general semantic correlations between cross-modal instances. We
then establish a loss function from the inter-modal view and the intra-modal
view to boost the discriminative abilities of the hash codes. Finally, graph
regularization is introduced into our proposed TDH method to preserve the
original semantic similarity between hash codes in Hamming space. Experimental
results show that our proposed method outperforms several state-of-the-art
approaches on two popular cross-modal datasets.
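A relaxed triplet ranking loss of the kind supervised by triplet labels can be sketched as follows (toy 4-bit codes; the actual TDH loss additionally spans inter- and intra-modal views and adds graph regularization):

```python
import numpy as np

def triplet_loss(h_q, h_pos, h_neg, margin=2.0):
    """Relaxed triplet ranking loss: the query code should be closer (in
    squared distance) to the positive than to the negative by `margin`."""
    d_pos = float(np.sum((h_q - h_pos) ** 2))
    d_neg = float(np.sum((h_q - h_neg) ** 2))
    return max(0.0, margin + d_pos - d_neg)

# Toy 4-bit codes: an image query against two text codes.
h_q   = np.array([ 1.,  1., -1.,  1.])
h_pos = np.array([ 1.,  1., -1., -1.])   # squared distance 4 (1 bit differs)
h_neg = np.array([-1., -1.,  1.,  1.])   # squared distance 12 (3 bits differ)
print(triplet_loss(h_q, h_pos, h_neg))  # 0.0: ranking already satisfies the margin
```

Swapping the positive and negative makes the hinge active, so the gradient then pushes the codes toward the correct relative ordering.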
Attribute-Guided Network for Cross-Modal Zero-Shot Hashing
Zero-Shot Hashing aims at learning a hashing model that is trained only by
instances from seen categories but can generalize well to those of unseen
categories. Typically, it is achieved by utilizing a semantic embedding space
to transfer knowledge from the seen domain to the unseen domain. Existing efforts
mainly focus on single-modal retrieval task, especially Image-Based Image
Retrieval (IBIR). However, as a highlighted research topic in the field of
hashing, cross-modal retrieval is more common in real-world applications. To
address the Cross-Modal Zero-Shot Hashing (CMZSH) retrieval task, we propose a
novel Attribute-Guided Network (AgNet), which can perform not only IBIR, but
also Text-Based Image Retrieval (TBIR). In particular, AgNet aligns different
modal data into a semantically rich attribute space, which bridges the gap
caused by modality heterogeneity and zero-shot setting. We also design an
effective strategy that exploits the attribute to guide the generation of hash
codes for image and text within the same network. Extensive experimental
results on three benchmark datasets (AwA, SUN, and ImageNet) demonstrate the
superiority of AgNet on both cross-modal and single-modal zero-shot image
retrieval tasks.
Comment: 9 pages, 8 figures
Semi-supervised Multimodal Hashing
Retrieving nearest neighbors across correlated data in multiple modalities,
such as image-text pairs on Facebook and video-tag pairs on YouTube, has become
a challenging task due to the huge amount of data. Multimodal hashing methods
that embed data into binary codes can boost the retrieving speed and reduce
storage requirements. As unsupervised multimodal hashing methods are usually
inferior to supervised ones, while supervised ones require too much manually
labeled data, the method proposed in this paper utilizes a subset of the labels
to design a semi-supervised multimodal hashing method. It first computes
the transformation matrices for data matrices and label matrix. Then, with
these transformation matrices, fuzzy logic is introduced to estimate a label
matrix for unlabeled data. Finally, it uses the estimated label matrix to learn
hashing functions for data in each modality to generate a unified binary code
matrix. Experiments show that the proposed semi-supervised method with 50%
of the labels achieves middling performance among the compared supervised
methods, and approaches the performance of the best supervised method when
given 90% of the labels. With only 10% of the labels, the proposed method can
still compete with the worst of the compared supervised methods.
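The abstract does not spell out the fuzzy-logic step, so the following is only a loose, hypothetical stand-in for estimating a soft label: fuzzy-c-means-style memberships computed from distances to labeled samples, followed by a weighted vote over their labels.

```python
import numpy as np

# Two labeled samples with one-hot labels, and one unlabeled sample near class 0.
X_lab = np.array([[0., 0.], [10., 10.]])
Y_lab = np.array([[1, 0], [0, 1]])
x_unl = np.array([1., 1.])

d = np.linalg.norm(X_lab - x_unl, axis=1)  # distances to labeled samples
m = 2.0                                    # fuzzifier (assumed value)
u = 1.0 / d ** (2 / (m - 1))
u /= u.sum()                               # fuzzy memberships, sum to 1
y_est = u @ Y_lab                          # estimated soft label
print(y_est.argmax())  # 0
```

The estimated soft label can then supervise hash-function learning for the unlabeled portion of each modality.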
Weakly-paired Cross-Modal Hashing
Hashing has been widely adopted for large-scale data retrieval in many
domains, due to its low storage cost and high retrieval speed. Existing
cross-modal hashing methods optimistically assume that the correspondence
between training samples across modalities is readily available. This
assumption is unrealistic in practical applications. In addition, these methods
generally require the same number of samples across different modalities, which
restricts their flexibility. We propose a flexible cross-modal hashing approach
(FlexCMH) to learn effective hash codes from weakly-paired data, whose
correspondence across modalities is partially (or even totally) unknown.
FlexCMH first introduces a clustering-based matching strategy to explore the
local structure of each cluster, and thus to find the potential correspondence
between clusters (and samples therein) across modalities. To reduce the impact
of an incomplete correspondence, it jointly optimizes in a unified objective
function the potential correspondence, the cross-modal hashing functions
derived from the correspondence, and a hashing quantization loss. An
alternating optimization technique is also proposed to coordinate the
correspondence and hash functions, and to reinforce the reciprocal effects of
the two objectives. Experiments on public multi-modal datasets show that
FlexCMH achieves significantly better results than state-of-the-art methods,
and it indeed offers a high degree of flexibility for practical cross-modal
hashing tasks.
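A clustering-based matching strategy of this flavor can be caricatured as finding the cluster permutation that minimizes total centroid distance across modalities (brute force over permutations here; FlexCMH's actual strategy also exploits the local structure within each cluster, which this toy omits):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)

# Toy cluster centroids of two modalities in a shared space; the text-side
# clusters are a shuffled, slightly perturbed copy of the image-side ones.
c_img = np.array([[0., 0.], [5., 5.], [10., 0.]])
c_txt = c_img[[2, 0, 1]] + 0.1 * rng.standard_normal((3, 2))

def match_clusters(a, b):
    """Return the permutation p minimizing sum_i ||a[i] - b[p[i]]||."""
    k = len(a)
    best = min(permutations(range(k)),
               key=lambda p: sum(np.linalg.norm(a[i] - b[p[i]]) for i in range(k)))
    return list(best)

print(match_clusters(c_img, c_txt))  # [1, 2, 0]: the shuffle is recovered
```

Once clusters are matched, samples inside corresponding clusters supply the potential pairings that the joint objective then refines.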
SCH-GAN: Semi-supervised Cross-modal Hashing by Generative Adversarial Network
Cross-modal hashing aims to map heterogeneous multimedia data into a common
Hamming space, which can realize fast and flexible retrieval across different
modalities. Supervised cross-modal hashing methods have achieved considerable
progress by incorporating semantic side information. However, they mainly have
two limitations: (1) Heavily rely on large-scale labeled cross-modal training
data which are labor intensive and hard to obtain. (2) Ignore the rich
information contained in the large amount of unlabeled data across different
modalities, especially the margin examples that are easily retrieved
incorrectly but can help to model the correlations. To address these problems,
in this paper we propose a novel Semi-supervised Cross-Modal Hashing approach
by Generative Adversarial Network (SCH-GAN). We aim to take advantage of GAN's
ability for modeling data distributions to promote cross-modal hashing learning
in an adversarial way. The main contributions can be summarized as follows: (1)
We propose a novel generative adversarial network for cross-modal hashing. In
our proposed SCH-GAN, the generative model tries to select margin examples of
one modality from unlabeled data when given a query of another modality, while
the discriminative model tries to distinguish the selected examples from true
positive examples of the query. These two models play a minimax game so that
the generative model can promote the hashing performance of the discriminative
model. (2) We propose a reinforcement-learning-based algorithm to drive the
training of the proposed SCH-GAN. The generative model takes the correlation
score predicted by the discriminative model as a reward, and tries to select the
examples close to the margin to promote the discriminative model by maximizing
the margin
between positive and negative data. Experiments on 3 widely-used datasets
verify the effectiveness of our proposed approach.
Comment: 12 pages; submitted to IEEE Transactions on Cybernetics
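The generator's reward-driven selection can be caricatured as softmax sampling over discriminator scores, so that near-margin candidates are chosen most often (the scores below are invented; SCH-GAN actually learns this selection policy with policy-gradient updates rather than fixing it):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical discriminator rewards for four unlabeled candidates;
# candidate 2 sits near the margin and earns the largest reward.
scores = np.array([0.1, 0.2, 2.0, 0.1])
p = np.exp(scores) / np.exp(scores).sum()  # generator's sampling policy

picks = rng.choice(len(scores), size=2000, p=p)
print(np.bincount(picks).argmax())  # 2: the high-reward candidate dominates
```

Feeding these hard, near-margin selections to the discriminator is what drives the minimax game described above.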