9 research outputs found
Fusion-supervised Deep Cross-modal Hashing
Deep hashing has recently received attention in cross-modal retrieval for its
impressive advantages. However, existing hashing methods for cross-modal
retrieval cannot fully capture the heterogeneous multi-modal correlation and
exploit the semantic information. In this paper, we propose a novel
\emph{Fusion-supervised Deep Cross-modal Hashing} (FDCH) approach. Firstly,
FDCH learns unified binary codes through a fusion hash network with paired
samples as input, which effectively enhances the modeling of the correlation of
heterogeneous multi-modal data. Then, these high-quality unified hash codes
further supervise the training of the modality-specific hash networks for
encoding out-of-sample queries. Meanwhile, both pair-wise similarity
information and classification information are embedded in the hash networks
under a one-stream framework, which simultaneously preserves cross-modal
similarity and keeps semantic consistency. Experimental results on two
benchmark datasets demonstrate the state-of-the-art performance of FDCH.
Attention Model Enhanced Network for Classification of Breast Cancer Image
Breast cancer classification remains a challenging task due to inter-class
ambiguity and intra-class variability. Existing deep learning-based methods try
to confront this challenge by utilizing complex nonlinear projections. However,
these methods typically extract global features from entire images, neglecting
the fact that the subtle detail information can be crucial in extracting
discriminative features. In this study, we propose a novel method named
Attention Model Enhanced Network (AMEN), which is formulated in a multi-branch
fashion with a pixel-wise attention model and a classification submodule.
Specifically, the feature learning part in AMEN generates a pixel-wise
attention map, while the classification submodule is used to classify the
samples. To focus more on subtle detail information, the sample image is
enhanced by the pixel-wise attention map generated by the former branch.
Furthermore, a boosting strategy is adopted to fuse classification results from
different branches for better performance. Experiments conducted on three
benchmark datasets demonstrate the superiority of the proposed method under
various scenarios.
Learning Binary Semantic Embedding for Histology Image Classification and Retrieval
With the development of medical imaging technology and machine learning,
computer-assisted diagnosis, which can provide a useful reference to
pathologists, attracts extensive research interest. The exponential growth of
medical images and uninterpretability of traditional classification models have
hindered the applications of computer-assisted diagnosis. To address these
issues, we propose a novel method for Learning Binary Semantic Embedding
(LBSE). Based on the efficient and effective embedding, classification and
retrieval are performed to provide interpretable computer-assisted diagnosis
for histology images. Furthermore, double supervision, bit uncorrelation and
balance constraint, asymmetric strategy and discrete optimization are
seamlessly integrated in the proposed method for learning binary embedding.
Experiments conducted on three benchmark datasets validate the superiority of
LBSE under various scenarios.
Supervised Online Hashing via Similarity Distribution Learning
Online hashing has attracted extensive research attention when facing
streaming data. Most online hashing methods, learning binary codes based on
pairwise similarities of training instances, fail to capture the semantic
relationship and suffer from poor generalization in large-scale applications
due to large variations. In this paper, we propose to model the similarity
distributions between the input data and the hashing codes, upon which a novel
supervised online hashing method, dubbed as Similarity Distribution based
Online Hashing (SDOH), is proposed, to keep the intrinsic semantic relationship
in the produced Hamming space. Specifically, we first transform the discrete
similarity matrix into a probability matrix via a Gaussian-based normalization
to address the extremely imbalanced distribution issue. Then, we introduce
a scaling Student t-distribution to solve the challenging initialization
problem, and efficiently bridge the gap between the known and unknown
distributions. Lastly, we align the two distributions by minimizing the
Kullback-Leibler divergence (KL-divergence) with stochastic gradient descent
(SGD), by which an intuitive similarity constraint is imposed to update the
hashing model on the new streaming data with a powerful generalization ability
to the past data. Extensive experiments on three widely-used benchmarks
validate the superiority of the proposed SDOH over the state-of-the-art methods
in the online retrieval task.
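The alignment step described in the SDOH abstract, matching two similarity distributions by minimizing the KL divergence, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the Gaussian kernel that turns a 0/1 similarity matrix into probabilities, the `sigma` and `scale` parameters, and the one-degree-of-freedom Student t kernel over Hamming distances. The sketch only evaluates the objective; the SGD update of the hashing model is omitted.

```python
import numpy as np

def gaussian_normalize(S, sigma=1.0):
    """Turn a discrete 0/1 similarity matrix into a probability matrix.
    ASSUMPTION: a Gaussian kernel over (1 - S); not the paper's exact formula."""
    W = np.exp(-((1.0 - np.asarray(S, dtype=float)) ** 2) / (2.0 * sigma ** 2))
    return W / W.sum()

def student_t_similarity(B_new, B_old, scale=1.0):
    """Probability matrix over pairs of +/-1 hash codes.
    ASSUMPTION: Student t kernel with one degree of freedom on scaled
    Hamming distance; `scale` is a hypothetical parameter."""
    K = B_new.shape[1]
    # For +/-1 codes, Hamming distance = (K - inner product) / 2.
    dist = 0.5 * (K - B_new @ B_old.T)
    W = 1.0 / (1.0 + (dist / scale) ** 2)
    return W / W.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) for two probability matrices of the same shape."""
    return float(np.sum(P * (np.log(P + eps) - np.log(Q + eps))))

# Toy data: 2 new items, 3 stored items, 4-bit codes.
S = np.array([[1, 0, 0],
              [0, 1, 1]])
B_old = np.array([[1, 1, 1, 1], [-1, -1, -1, -1], [-1, -1, -1, -1]])
B_consistent = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]])    # agrees with S
B_inconsistent = -B_consistent                               # contradicts S

P = gaussian_normalize(S)
kl_consistent = kl_divergence(P, student_t_similarity(B_consistent, B_old))
kl_inconsistent = kl_divergence(P, student_t_similarity(B_inconsistent, B_old))
print(kl_consistent < kl_inconsistent)
```

Codes that agree with the label similarity give a lower divergence than codes that contradict it, which is the signal a gradient-based update would follow under these assumptions.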
Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval
Supervised cross-modal hashing has gained increasing research interest in
large-scale retrieval tasks owing to its satisfactory performance and
efficiency. However, several challenging issues remain to be further
studied: 1) most methods fail to preserve the semantic correlations in
hash codes well because of the large heterogeneous gap; 2) most methods relax
the discrete constraint on hash codes, leading to large quantization error and
consequently low performance; 3) most methods suffer from relatively high
memory cost and computational complexity during the training procedure, which
makes them unscalable. In this paper, to address the above issues, we propose a
supervised cross-modal hashing method based on matrix factorization, dubbed
Efficient Discrete Supervised Hashing (EDSH). Specifically, collective matrix
factorization on heterogeneous features and semantic embedding with class
labels are seamlessly integrated to learn hash codes. Therefore, both the
feature-based similarities and the semantic correlations can be preserved in
the hash codes, which makes the learned hash codes more discriminative. Then an
efficient discrete optimization algorithm is proposed to handle the scalability
issue. Instead of learning hash codes bit by bit, the hash code matrix can be
obtained directly, which is more efficient. Extensive experimental results on
three public real-world datasets demonstrate that EDSH produces superior
performance in both accuracy and scalability over some existing cross-modal
hashing methods.
Deep Cross-modal Proxy Hashing
Due to their high retrieval efficiency and low storage cost for cross-modal
search tasks, cross-modal hashing methods have attracted considerable attention.
For supervised cross-modal hashing methods, how to make the learned hash codes
preserve semantic structure information sufficiently is a key point to further
enhance the retrieval performance. As far as we know, almost all supervised
cross-modal hashing methods rely, fully or partly, on the at-least-one
similarity definition to preserve semantic structure information, i.e., two
datapoints are defined as similar if they share at least one common category;
otherwise they are dissimilar. Obviously, the at-least-one similarity misses abundant
semantic structure information. To tackle this problem, in this paper, we
propose a novel Deep Cross-modal Proxy Hashing, called DCPH. Specifically, DCPH
first learns a proxy hashing network to generate a discriminative proxy hash
code for each category. Then, by utilizing the learned proxy hash code as
supervised information, a novel -- is proposed
without defining the at-least-one similarity between datapoints. By minimizing
the novel --, the learned hash codes will
simultaneously preserve the cross-modal similarity and abundant semantic
structure information well. Extensive experiments on two benchmark datasets
show that the proposed method outperforms the state-of-the-art baselines in
the cross-modal retrieval task.
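The at-least-one similarity definition that the DCPH abstract criticizes can be made concrete with a short NumPy sketch. It assumes multi-hot label vectors (one column per category); the function names and the toy labels are hypothetical and are not taken from the paper. The second function counts shared categories only to show what the binary definition discards.

```python
import numpy as np

def at_least_one_similarity(labels_a, labels_b):
    """S[i, j] = 1 if item i of labels_a and item j of labels_b share
    at least one category, otherwise 0. Inputs are multi-hot matrices."""
    shared = np.asarray(labels_a) @ np.asarray(labels_b).T
    return (shared > 0).astype(np.int8)

def shared_category_counts(labels_a, labels_b):
    """Number of categories each pair has in common (illustration only)."""
    return np.asarray(labels_a) @ np.asarray(labels_b).T

# Four items, three categories (toy data).
labels = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [1, 0, 1],
                   [0, 0, 1]])

S = at_least_one_similarity(labels, labels)
counts = shared_category_counts(labels, labels)
print(S)
print(counts)
```

Items 0 and 1 share two categories while items 0 and 2 share only one, yet both pairs receive the same similarity value of 1. This is the kind of semantic structure information that the abstract says the at-least-one definition misses.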
Task-adaptive Asymmetric Deep Cross-modal Hashing
Supervised cross-modal hashing aims to embed the semantic correlations of
heterogeneous modality data into the binary hash codes with discriminative
semantic labels. Because of its advantages in retrieval and storage efficiency,
it is widely used for efficient cross-modal retrieval. However,
existing studies handle the different tasks of cross-modal retrieval
equally, and simply learn the same pair of hash functions in a symmetric
way for them. Under such circumstances, the uniqueness of different cross-modal
retrieval tasks is ignored, which may lead to sub-optimal performance.
Motivated by this, we present a Task-adaptive Asymmetric Deep Cross-modal
Hashing (TA-ADCMH) method in this paper. It can learn task-adaptive hash
functions for two sub-retrieval tasks via simultaneous modality representation
and asymmetric hash learning. Unlike previous cross-modal hashing approaches,
our learning framework jointly optimizes semantic preservation, which transforms
deep features of multimedia data into binary hash codes, and semantic
regression, which directly regresses the query modality representation to the
explicit label. With our model, the binary codes can effectively preserve
semantic correlations across different modalities while adaptively capturing
the query semantics. The superiority of TA-ADCMH is demonstrated on two
standard datasets from many aspects.
Deep Cross-Modal Hashing with Hashing Functions and Unified Hash Codes Jointly Learning
Due to their high retrieval efficiency and low storage cost, cross-modal
hashing methods have attracted considerable attention. Generally, compared with
shallow cross-modal hashing methods, deep cross-modal hashing methods can
achieve more satisfactory performance by integrating feature learning and
hash code optimization into the same framework. However, most existing deep
cross-modal hashing methods either cannot learn a unified hash code for the two
correlated data-points of different modalities in a database instance or cannot
guide the learning of unified hash codes by the feedback of hashing function
learning procedure, to enhance the retrieval accuracy. To address the issues
above, in this paper, we propose a novel end-to-end Deep Cross-Modal Hashing
with Hashing Functions and Unified Hash Codes Jointly Learning (DCHUC).
Specifically, by an iterative optimization algorithm, DCHUC jointly learns
unified hash codes for image-text pairs in a database and a pair of hash
functions for unseen query image-text pairs. With the iterative optimization
algorithm, the learned unified hash codes can be used to guide the hashing
function learning procedure; meanwhile, the learned hashing functions can
feed back to guide the unified hash code optimization procedure. Extensive
experiments on three public datasets demonstrate that the proposed method
outperforms the state-of-the-art cross-modal hashing methods.
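Once database hash codes and a query hash function are available, as in the DCHUC abstract, retrieval in hashing methods is typically done by ranking database codes by Hamming distance to the query code. The following NumPy sketch shows that step only. The function names are hypothetical, the real-valued query output stands in for whatever a learned hash function would produce, and none of this reproduces the paper's training procedure.

```python
import numpy as np

def to_hash_code(real_valued_output):
    """Binarize a real-valued network output to a +/-1 code (0 maps to +1)."""
    return np.where(np.asarray(real_valued_output) >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query.
    For +/-1 codes of length K, distance = (K - inner product) / 2."""
    db = np.asarray(database_codes, dtype=np.int32)
    q = np.asarray(query_code, dtype=np.int32)
    K = db.shape[1]
    dists = (K - db @ q) // 2
    order = np.argsort(dists, kind="stable")
    return order, dists[order]

# Toy database of three 4-bit unified codes (hypothetical values).
database_codes = np.array([[ 1,  1, -1, -1],
                           [ 1, -1, -1, -1],
                           [-1, -1,  1,  1]], dtype=np.int8)

# Hypothetical real-valued output of a query hash function.
query_code = to_hash_code([0.3, 0.9, -0.2, -0.7])
order, dists = hamming_rank(query_code, database_codes)
print(order, dists)
```

Because the inner product of two ±1 codes equals K minus twice the Hamming distance, the ranking needs only one matrix-vector product, which is the source of the retrieval efficiency these abstracts refer to.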
Asymmetric Correlation Quantization Hashing for Cross-modal Retrieval
Due to their superiority in similarity computation and database storage for
large-scale multi-modal data, cross-modal hashing (CMH) methods have
attracted extensive attention in similarity retrieval across heterogeneous
modalities. However, there are still some limitations to be taken into
account: (1) most current CMH methods transform real-valued data points into
discrete compact binary codes under the binary constraints, limiting the
capability of representation for the original data on account of abundant loss
of information and producing suboptimal hash codes; (2) the discrete binary
constraint learning model is hard to solve, and relaxing the binary constraints
may greatly reduce retrieval performance because of large quantization error;
(3) the learning problem of CMH is handled in a symmetric framework, leading to
a difficult and complex optimization objective. To address the above
challenges, in this paper, a novel Asymmetric Correlation Quantization Hashing
(ACQH) method is proposed. Specifically, ACQH learns the projection matrices of
heterogeneous modality data points for transforming a query into a
low-dimensional real-valued vector in the latent semantic space, and constructs
a stacked compositional quantization embedding in a coarse-to-fine manner for
representing database points by a series of learnt real-valued codewords in the
codebook, with the help of pointwise label information regression
simultaneously. Besides, the unified hash codes across modalities can be
directly obtained by the discrete iterative optimization framework devised in
the paper. Comprehensive experiments on three diverse benchmark datasets have
shown the effectiveness and rationality of ACQH.
Comment: 12 pages