9 research outputs found

    Fusion-supervised Deep Cross-modal Hashing

    Deep hashing has recently received considerable attention in cross-modal retrieval because of its retrieval efficiency and low storage cost. However, existing hashing methods for cross-modal retrieval cannot fully capture the correlations among heterogeneous modalities or fully exploit semantic information. In this paper, we propose a novel Fusion-supervised Deep Cross-modal Hashing (FDCH) approach. First, FDCH learns unified binary codes through a fusion hash network that takes paired samples as input, which strengthens the modeling of correlations across heterogeneous multi-modal data. These unified hash codes then supervise the training of modality-specific hash networks, which encode out-of-sample queries. Meanwhile, both pairwise similarity information and classification information are embedded in the hash networks within a single-stream framework, which simultaneously preserves cross-modal similarity and maintains semantic consistency. Experimental results on two benchmark datasets demonstrate the state-of-the-art performance of FDCH.
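
    As a rough illustration of the two supervision signals described above, the NumPy sketch below scores modality-specific network outputs against a pairwise similarity matrix, against unified codes produced by a fusion network, and against class labels. The specific loss forms are assumptions: the pairwise term is a DCMH-style sigmoid log-likelihood, the code term is a squared distance, and the label term is sigmoid cross-entropy. The abstract does not state FDCH's actual losses, and all function names and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np

# Assumed loss forms for illustration; not taken from the FDCH paper.

def pairwise_similarity_loss(F, G, S):
    """Negative log-likelihood of S under sigmoid(0.5 * <F_i, G_j>).

    F, G: (n, k) real-valued outputs of the image and text hash networks.
    S: (n, n) with S[i, j] = 1 if image i and text j share a label, else 0.
    """
    theta = 0.5 * F @ G.T
    # log(1 + exp(theta)) - S * theta, computed stably via logaddexp
    return float(np.sum(np.logaddexp(0.0, theta) - S * theta))

def unified_code_loss(F, B):
    """Pull one modality network's outputs toward the fused unified codes B (+/-1)."""
    return float(np.sum((F - B) ** 2))

def label_loss(logits, Y):
    """Sigmoid cross-entropy between predicted logits and multi-hot labels Y."""
    return float(np.sum(np.logaddexp(0.0, logits) - Y * logits))

def total_loss(F, G, B, S, logits, Y, alpha=1.0, beta=1.0):
    # Hypothetical weighting of similarity, code-supervision, and label terms
    return (pairwise_similarity_loss(F, G, S)
            + alpha * (unified_code_loss(F, B) + unified_code_loss(G, B))
            + beta * label_loss(logits, Y))
```

    Aligned image and text outputs for similar pairs give a lower pairwise loss than opposed ones, which is the behavior the similarity term is meant to reward.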

    Attention Model Enhanced Network for Classification of Breast Cancer Image

    Breast cancer classification remains a challenging task because of inter-class ambiguity and intra-class variability. Existing deep learning-based methods try to address this challenge with complex nonlinear projections. However, these methods typically extract global features from entire images, neglecting the fact that subtle detail information can be crucial for extracting discriminative features. In this study, we propose a novel method named Attention Model Enhanced Network (AMEN), which is formulated in a multi-branch fashion with a pixel-wise attention model and classification submodules. Specifically, the feature learning part of AMEN generates a pixel-wise attention map, while the classification submodules classify the samples. To focus more on subtle detail information, the sample image is enhanced by the pixel-wise attention map generated by the preceding branch. Furthermore, a boosting strategy is adopted to fuse the classification results of the different branches for better performance. Experiments on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
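
    The two mechanisms in this abstract, attention-based image enhancement and boosting-based fusion of branch outputs, can be sketched in NumPy as follows. Both details are assumptions: the enhancement rule `image * (1 + a)` with the map rescaled to [0, 1], and AdaBoost-style branch weights `0.5 * ln((1 - e) / e)` computed from each branch's error rate `e`, are common choices but are not confirmed by the abstract. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; AMEN's exact enhancement and boosting rules are not given.

def enhance_with_attention(image, attention):
    """image: (H, W, C) floats; attention: (H, W) map from the preceding branch."""
    a = attention - attention.min()
    a = a / (a.max() + 1e-8)               # rescale the map to [0, 1]
    return image * (1.0 + a[..., None])    # amplify attended pixels, keep the rest

def boost_weights(branch_errors):
    """AdaBoost-style weight per branch; negative if a branch is worse than chance."""
    e = np.clip(np.asarray(branch_errors, dtype=float), 1e-6, 1 - 1e-6)
    return 0.5 * np.log((1 - e) / e)

def fuse_branches(probs, branch_errors):
    """probs: (num_branches, n, num_classes) class probabilities per branch."""
    w = boost_weights(branch_errors)
    fused = np.tensordot(w, probs, axes=1)  # weighted sum over branches -> (n, num_classes)
    return fused.argmax(axis=1)
```

    Here a confident, low-error branch dominates the vote: with error rates 0.1 and 0.4, the first branch receives a weight of about 1.10 and the second about 0.20.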

    Learning Binary Semantic Embedding for Histology Image Classification and Retrieval

    With the development of medical imaging technology and machine learning, computer-assisted diagnosis, which can provide valuable reference to pathologists, has attracted extensive research interest. However, the exponential growth of medical images and the lack of interpretability of traditional classification models have hindered the application of computer-assisted diagnosis. To address these issues, we propose a novel method for Learning Binary Semantic Embedding (LBSE). Based on this efficient and effective embedding, classification and retrieval are performed to provide interpretable computer-assisted diagnosis for histology images. Furthermore, double supervision, bit uncorrelation and balance constraints, an asymmetric strategy, and discrete optimization are seamlessly integrated into the proposed method for learning the binary embedding. Experiments on three benchmark datasets validate the superiority of LBSE under various scenarios.
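
    Bit balance and bit uncorrelation are often written as B^T 1 = 0 and B^T B = nI for an n x k code matrix B with entries in {-1, +1}; assuming LBSE uses these standard forms (the abstract does not spell them out), the sketch below measures how far a code matrix is from satisfying them and ranks database codes by Hamming distance for retrieval. The function names are hypothetical.

```python
import numpy as np

# Assumes the standard forms B^T 1 = 0 (balance) and B^T B = n I (uncorrelation).

def balance_penalty(B):
    """||B^T 1||^2: zero when every bit is +1 on exactly half of the samples."""
    return float(np.sum(B.sum(axis=0) ** 2))

def uncorrelation_penalty(B):
    """||B^T B - n I||_F^2: zero when the k bits are pairwise uncorrelated."""
    n, k = B.shape
    return float(np.sum((B.T @ B - n * np.eye(k)) ** 2))

def hamming_rank(query_code, db_codes):
    """Rank +/-1 database codes by Hamming distance (k - <b, q>) / 2 to the query."""
    k = db_codes.shape[1]
    dist = 0.5 * (k - db_codes @ query_code)
    return np.argsort(dist, kind="stable")
```

    For example, the four codes [1, 1], [1, -1], [-1, 1], [-1, -1] are perfectly balanced and uncorrelated, so both penalties are zero, whereas four copies of [1, 1] score 32 on each.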

    Supervised Online Hashing via Similarity Distribution Learning

    Online hashing has attracted extensive research attention for handling streaming data. Most online hashing methods learn binary codes from pairwise similarities of training instances; they fail to capture semantic relationships and generalize poorly in large-scale applications because of large data variations. In this paper, we propose to model the similarity distributions of the input data and of the hash codes, and on this basis we present a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), which preserves the intrinsic semantic relationships in the resulting Hamming space. Specifically, we first transform the discrete similarity matrix into a probability matrix via a Gaussian-based normalization to address the extremely imbalanced distribution. Then, we introduce a scaled Student t-distribution to solve the challenging initialization problem and to efficiently bridge the gap between the known and unknown distributions. Finally, we align the two distributions by minimizing the Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD), which imposes an intuitive similarity constraint when updating the hashing model on new streaming data while generalizing well to past data. Extensive experiments on three widely used benchmarks show that SDOH outperforms state-of-the-art methods on the online retrieval task.
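
    The distribution-alignment idea above can be sketched in NumPy: a target distribution P from the label similarity matrix, a code-space distribution Q from a Student t kernel on normalized Hamming distances, and their KL divergence. The exact Gaussian normalization, the kernel parameters `nu` and `scale`, and the row-wise normalization are assumptions for illustration, not SDOH's published formulas; during SGD training the +/-1 codes would typically be replaced by relaxed (for example tanh) outputs, which these functions also accept.

```python
import numpy as np

# Illustrative forms only; SDOH's exact normalization and scaling are not given here.

def gaussian_normalize(S, sigma=1.0):
    """Map a binary similarity matrix to row-stochastic probabilities (assumed form)."""
    P = np.exp(-((1.0 - S) ** 2) / (2.0 * sigma ** 2))
    return P / P.sum(axis=1, keepdims=True)

def student_t_similarity(B, nu=1.0, scale=1.0):
    """Row-stochastic similarities of codes from a scaled Student t kernel."""
    k = B.shape[1]
    d = 0.5 * (k - B @ B.T) / k          # normalized Hamming distance in [0, 1]
    Q = (1.0 + (scale * d) ** 2 / nu) ** (-(nu + 1.0) / 2.0)
    return Q / Q.sum(axis=1, keepdims=True)

def kl_divergence(P, Q, eps=1e-12):
    """Sum over rows of KL(P_i || Q_i); eps guards against log(0)."""
    return float(np.sum(P * (np.log(P + eps) - np.log(Q + eps))))
```

    Codes whose Hamming structure matches the label similarities yield a much smaller divergence than codes that pair the wrong items, which is the signal SGD would follow.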

    Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval

    Supervised cross-modal hashing has gained increasing research interest for large-scale retrieval tasks owing to its satisfactory performance and efficiency. However, some challenging issues remain: 1) most methods fail to preserve semantic correlations in the hash codes well because of the large heterogeneity gap; 2) most methods relax the discrete constraint on hash codes, leading to large quantization error and consequently low performance; 3) most methods suffer from relatively high memory cost and computational complexity during training, which makes them unscalable. In this paper, to address these issues, we propose a supervised cross-modal hashing method based on matrix factorization, dubbed Efficient Discrete Supervised Hashing (EDSH). Specifically, collective matrix factorization on heterogeneous features and semantic embedding with class labels are seamlessly integrated to learn hash codes. As a result, both feature-based similarities and semantic correlations are preserved in the hash codes, which makes the learned codes more discriminative. An efficient discrete optimization algorithm is then proposed to address scalability. Instead of learning the hash codes bit by bit, it obtains the hash code matrix directly, which is more efficient. Extensive experiments on three public real-world datasets demonstrate that EDSH outperforms several existing cross-modal hashing methods in both accuracy and scalability.
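
    To show what a collective matrix factorization with a shared latent factor looks like, the sketch below alternates closed-form ridge updates for min sum_m w_m ||X_m - U_m V||_F^2 + gamma (sum_m ||U_m||_F^2 + ||V||_F^2), treating the label matrix as one more view. This is a simplified stand-in, not EDSH itself: the paper's semantic embedding term and discrete optimization are not reproduced, and the final sign step is an assumed binarization. Names and defaults are hypothetical.

```python
import numpy as np

# Plain collective MF by alternating ridge solves; a stand-in, not EDSH's algorithm.

def collective_mf(views, weights, k, gamma=1e-2, iters=20, seed=0):
    """views: list of (d_m, n) matrices sharing columns (e.g. image, text, labels).

    Returns +/-1 codes of shape (k, n) and the objective after each iteration.
    """
    n = views[0].shape[1]
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((k, n))      # shared latent factor across all views
    I = np.eye(k)

    def objective(Us, V):
        fit = sum(w * np.sum((X - U @ V) ** 2) for X, U, w in zip(views, Us, weights))
        reg = gamma * (sum(np.sum(U ** 2) for U in Us) + np.sum(V ** 2))
        return float(fit + reg)

    history = []
    for _ in range(iters):
        # U-step: U_m = X_m V^T (V V^T + (gamma / w_m) I)^{-1}, via a symmetric solve
        Us = [np.linalg.solve(V @ V.T + (gamma / w) * I, V @ X.T).T
              for X, w in zip(views, weights)]
        # V-step: (sum_m w_m U_m^T U_m + gamma I) V = sum_m w_m U_m^T X_m
        A = sum(w * U.T @ U for U, w in zip(Us, weights)) + gamma * I
        rhs = sum(w * U.T @ X for U, X, w in zip(Us, views, weights))
        V = np.linalg.solve(A, rhs)
        history.append(objective(Us, V))
    B = np.where(V >= 0, 1, -1)          # simple sign binarization of the shared factor
    return B, history
```

    Because each step exactly minimizes the objective over one block of variables, the recorded objective never increases, which makes a convenient sanity check.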

    Deep Cross-modal Proxy Hashing

    Owing to their high retrieval efficiency and low storage cost in cross-modal search tasks, cross-modal hashing methods have attracted considerable attention. For supervised cross-modal hashing methods, making the learned hash codes sufficiently preserve semantic structure information is key to further improving retrieval performance. As far as we know, almost all supervised cross-modal hashing methods preserve semantic structure information by relying wholly or partly on the at-least-one similarity definition, i.e., two data points are defined as similar if they share at least one common category and as dissimilar otherwise. This definition discards much of the available semantic structure information; for example, two points that share one category and two points that share all of their categories are treated as equally similar. To tackle this problem, we propose a novel Deep Cross-modal Proxy Hashing method, called DCPH. Specifically, DCPH first learns a proxy hashing network to generate a discriminative proxy hash code for each category. Then, using the learned proxy hash codes as supervised information, a novel Margin-SoftMax-like loss is proposed that does not require defining the at-least-one similarity between data points. By minimizing this loss, the learned hash codes simultaneously preserve cross-modal similarity and rich semantic structure information. Extensive experiments on two benchmark datasets show that the proposed method outperforms state-of-the-art baselines on the cross-modal retrieval task.
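
    A margin-softmax loss against per-category proxy codes can be sketched as below, using a CosFace-style additive cosine margin as a stand-in; DCPH's exact "Margin-SoftMax-like" formulation is not given in the abstract. For multi-label samples, this sketch (an assumption) scores each positive category against all negative categories, so no pairwise at-least-one similarity is needed. The scale `s`, margin `m`, and function name are hypothetical.

```python
import numpy as np

# CosFace-style additive-margin softmax against proxies; a stand-in for DCPH's loss.

def margin_softmax_proxy_loss(H, proxies, Y, s=8.0, m=0.2):
    """H: (n, k) continuous codes; proxies: (c, k) proxy codes; Y: (n, c) multi-hot."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    Pn = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    cos = Hn @ Pn.T                              # (n, c) cosine to each category proxy
    total, count = 0.0, 0
    for i in range(H.shape[0]):
        neg_logits = s * cos[i, Y[i] == 0]       # all negative categories of sample i
        for c in np.flatnonzero(Y[i]):
            # Positive logit is penalized by the margin m before the softmax
            logits = np.concatenate(([s * (cos[i, c] - m)], neg_logits))
            total += np.logaddexp.reduce(logits) - logits[0]   # -log softmax(positive)
            count += 1
    return total / max(count, 1)
```

    Raising the margin `m` makes the loss stricter for the same codes, pushing each code closer to its own category proxies and farther from the others.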

    Task-adaptive Asymmetric Deep Cross-modal Hashing

    Supervised cross-modal hashing aims to embed the semantic correlations of heterogeneous modality data into binary hash codes using discriminative semantic labels. Because of its retrieval and storage efficiency, it is widely used for efficient cross-modal retrieval. However, existing research treats the different cross-modal retrieval tasks equally and simply learns the same pair of hash functions for them in a symmetric way. In this setting, the distinct characteristics of each cross-modal retrieval task are ignored, which may lead to sub-optimal performance. Motivated by this, we present a Task-adaptive Asymmetric Deep Cross-modal Hashing (TA-ADCMH) method in this paper. It learns task-adaptive hash functions for the two sub-retrieval tasks via simultaneous modality representation and asymmetric hash learning. Unlike previous cross-modal hashing approaches, our learning framework jointly optimizes semantic preservation, which transforms deep features of multimedia data into binary hash codes, and semantic regression, which directly regresses the query modality representation to explicit labels. With our model, the binary codes can effectively preserve semantic correlations across modalities while adaptively capturing the query semantics. The superiority of TA-ADCMH is demonstrated on two standard datasets from multiple perspectives.
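
    The task-adaptive, asymmetric setup can be pictured with a much simpler linear model: one query-side hash function per retrieval direction (image-to-text and text-to-image), each fitted by ridge regression from the query modality's features to the other modality's fixed database codes, followed by Hamming ranking. This is an illustrative assumption, not TA-ADCMH: it uses linear projections instead of deep networks and omits the semantic regression term. All names are hypothetical.

```python
import numpy as np

# Linear stand-in for task-specific query hash functions; not the TA-ADCMH model.

def fit_query_hash(Xq, B_db, lam=1.0):
    """Ridge regression W = argmin ||Xq W - B_db||^2 + lam ||W||^2 (rows are pairs)."""
    d = Xq.shape[1]
    return np.linalg.solve(Xq.T @ Xq + lam * np.eye(d), Xq.T @ B_db)

def encode(X, W):
    """Binarize projected queries into +/-1 codes."""
    return np.where(X @ W >= 0, 1, -1)

def rank_database(q_code, B_db):
    """Order database items by Hamming distance to a +/-1 query code."""
    k = B_db.shape[1]
    return np.argsort(0.5 * (k - B_db @ q_code), kind="stable")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_img, X_txt = rng.standard_normal((100, 32)), rng.standard_normal((100, 16))
    B_img = np.where(rng.standard_normal((100, 8)) >= 0, 1, -1)
    B_txt = np.where(rng.standard_normal((100, 8)) >= 0, 1, -1)
    # Separate (task-adaptive) functions for each sub-retrieval direction
    W = {"img2txt": fit_query_hash(X_img, B_txt), "txt2img": fit_query_hash(X_txt, B_img)}
    print(rank_database(encode(X_img[:1], W["img2txt"])[0], B_txt)[:5])
```

    The asymmetry lies in the fact that database items keep fixed binary codes while only the query side learns a projection, and each retrieval direction gets its own projection.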

    Deep Cross-Modal Hashing with Hashing Functions and Unified Hash Codes Jointly Learning

    Owing to their high retrieval efficiency and low storage cost, cross-modal hashing methods have attracted considerable attention. Generally, compared with shallow cross-modal hashing methods, deep cross-modal hashing methods achieve more satisfactory performance by integrating feature learning and hash code optimization into the same framework. However, most existing deep cross-modal hashing methods either cannot learn a unified hash code for the two correlated data points of different modalities in a database instance, or cannot use feedback from the hashing function learning procedure to guide the learning of unified hash codes and thereby enhance retrieval accuracy. To address these issues, we propose a novel end-to-end method, Deep Cross-Modal Hashing with Hashing Functions and Unified Hash Codes Jointly Learning (DCHUC). Specifically, using an iterative optimization algorithm, DCHUC jointly learns unified hash codes for the image-text pairs in a database and a pair of hash functions for unseen query image-text pairs. With this algorithm, the learned unified hash codes guide the hashing function learning procedure; meanwhile, the learned hashing functions provide feedback that guides the optimization of the unified hash codes. Extensive experiments on three public datasets demonstrate that the proposed method outperforms state-of-the-art cross-modal hashing methods.
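
    The two-way guidance described above is an alternating minimization. The sketch below illustrates it with linear hash functions under an assumed objective ||B - X_img W_img||^2 + ||B - X_txt W_txt||^2 + lam (||W_img||^2 + ||W_txt||^2): fixing the unified codes B gives ridge solutions for the functions, and fixing the functions gives B = sign(X_img W_img + X_txt W_txt) in closed form. DCHUC's deep networks and its similarity-preserving term are omitted, so the codes here carry no label information; all names are hypothetical.

```python
import numpy as np

# Linear sketch of alternating code/function learning; not DCHUC's full objective.

def ridge(X, T, lam):
    """argmin_W ||X W - T||^2 + lam ||W||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ T)

def joint_learning(X_img, X_txt, k, iters=10, lam=1.0, seed=0):
    """X_img, X_txt: (n, d) features of the n paired database items."""
    rng = np.random.default_rng(seed)
    B = np.where(rng.standard_normal((X_img.shape[0], k)) >= 0, 1.0, -1.0)
    history = []
    for _ in range(iters):
        # Codes guide the hash functions: ridge fit of each modality to B
        W_img = ridge(X_img, B, lam)
        W_txt = ridge(X_txt, B, lam)
        # Functions feed back: per entry, argmin over b in {-1, 1} of
        # (b - a)^2 + (b - c)^2 is sign(a + c)
        B = np.where(X_img @ W_img + X_txt @ W_txt >= 0, 1.0, -1.0)
        history.append(float(np.sum((B - X_img @ W_img) ** 2)
                             + np.sum((B - X_txt @ W_txt) ** 2)
                             + lam * (np.sum(W_img ** 2) + np.sum(W_txt ** 2))))
    return B, W_img, W_txt, history
```

    An unseen image query would then be encoded as `np.where(x @ W_img >= 0, 1, -1)` and compared with the unified database codes `B`. Since each step is an exact block minimization, `history` is non-increasing.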

    Asymmetric Correlation Quantization Hashing for Cross-modal Retrieval

    Owing to their advantages in similarity computation and database storage for large-scale multi-modal data, cross-modal hashing (CMH) methods have attracted extensive attention for similarity retrieval across heterogeneous modalities. However, some limitations still need to be addressed: (1) most current CMH methods transform real-valued data points into compact binary codes under binary constraints, which limits their ability to represent the original data because of substantial information loss and produces suboptimal hash codes; (2) the binary-constrained learning model is hard to solve, while relaxing the binary constraints can greatly degrade retrieval performance because of large quantization error; (3) most methods handle the CMH learning problem in a symmetric framework, leading to a difficult and complex optimization objective. To address these challenges, we propose a novel Asymmetric Correlation Quantization Hashing (ACQH) method. Specifically, ACQH learns projection matrices for the heterogeneous modalities that transform queries into low-dimensional real-valued vectors in a latent semantic space, and constructs a stacked compositional quantization embedding in a coarse-to-fine manner that represents database points by a series of learned real-valued codewords from the codebooks, while simultaneously regressing pointwise label information. In addition, unified hash codes across modalities are obtained directly by the discrete iterative optimization framework devised in the paper. Comprehensive experiments on three diverse benchmark datasets demonstrate the effectiveness and rationality of ACQH.
    Comment: 12 pages
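
    A coarse-to-fine, stacked codebook representation can be illustrated with residual quantization: each stage quantizes whatever the previous stages left over, a database point is stored as one codeword index per stage, and a real-valued query is compared against the reconstructed sum without binarizing it. Residual quantization is a standard scheme used here as a stand-in; ACQH's codebook learning, projection matrices, and label regression are not reproduced, and the function names are hypothetical.

```python
import numpy as np

# Residual quantization as a stand-in for ACQH's stacked compositional quantization.

def residual_encode(x, codebooks):
    """Greedy coarse-to-fine encoding: each stage quantizes the remaining residual."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for C in codebooks:                        # C: (num_codewords, dim)
        idx = int(np.argmin(np.sum((C - residual) ** 2, axis=1)))
        codes.append(idx)
        residual -= C[idx]
    return codes

def decode(codes, codebooks):
    """Reconstruct a point as the sum of its selected codewords."""
    return sum(C[i] for C, i in zip(codebooks, codes))

def asymmetric_distances(q, db_codes, codebooks):
    """Squared distances from a real-valued query to quantized database items."""
    recon = np.array([decode(c, codebooks) for c in db_codes])
    return np.sum((recon - q) ** 2, axis=1)
```

    For instance, with a coarse codebook {[0, 0], [4, 4]} and a fine codebook {[0, 0], [1, -1]}, the point [5, 3] is encoded as indices [1, 1] and reconstructed exactly. The distance is "asymmetric" because only the database side is quantized.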