
    Enhanced Discrete Multi-modal Hashing: More Constraints yet Less Time to Learn (Extended Abstract)

    This paper proposes a novel method, Enhanced Discrete Multi-modal Hashing (EDMH), which learns binary codes and hash functions simultaneously from the pairwise similarity matrix of data for large-scale cross-view retrieval. EDMH distinguishes itself from existing methods by considering not just the binarization constraint but also the balance and decorrelation constraints. Although these additional discrete constraints make the optimization problem of EDMH look considerably more complicated, we are able to develop a fast iterative learning algorithm for it in the alternating-optimization framework, because, after introducing a couple of auxiliary variables, each optimization subproblem turns out to have a closed-form solution. Extensive experiments confirm that EDMH consistently delivers better retrieval performance than state-of-the-art multi-modal hashing (MH) methods at lower computational cost.
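    The three discrete constraints named above can be stated concretely. A minimal sketch (illustrative toy data, not the paper's implementation): a code matrix B of n samples by r bits is binarized if every entry is in {-1, +1}, balanced if each bit column sums to zero, and decorrelated if B^T B = nI. A stacked Hadamard block happens to satisfy all three:

    ```python
    import numpy as np

    # A 4-bit Hadamard block: mutually orthogonal +/-1 rows and columns.
    H = np.array([[ 1,  1,  1,  1],
                  [ 1, -1,  1, -1],
                  [ 1,  1, -1, -1],
                  [ 1, -1, -1,  1]], dtype=float)
    B = np.vstack([H, -H])          # n=8 codes, r=4 bits each
    n, r = B.shape

    binarized    = bool(np.all(np.abs(B) == 1))               # B in {-1,+1}^(n x r)
    balanced     = bool(np.allclose(B.sum(axis=0), 0))        # each bit half +1, half -1
    decorrelated = bool(np.allclose(B.T @ B, n * np.eye(r)))  # B^T B = n I
    print(binarized, balanced, decorrelated)  # -> True True True
    ```

    Balance maximizes the information carried by each bit, while decorrelation prevents redundant bits; enforcing both exactly over a discrete domain is what makes the optimization hard.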

    Enhanced Discrete Multi-modal Hashing: more constraints yet less time to learn

    Due to the exponential growth of multimedia data, multi-modal hashing, as a promising technique for making cross-view retrieval scalable, is attracting more and more attention. However, most existing multi-modal hashing methods either divide the learning process unnaturally into two separate stages or treat the discrete optimization problem simplistically as a continuous one, which leads to suboptimal results. Recently, a few discrete multi-modal hashing methods that try to address such issues have emerged, but they still ignore several important discrete constraints (such as the balance and decorrelation of hash bits). In this paper, we overcome those limitations by proposing a novel method named "Enhanced Discrete Multi-modal Hashing (EDMH)", which learns binary codes and hash functions simultaneously from the pairwise similarity matrix of data, under the aforementioned discrete constraints. Although the model of EDMH looks considerably more complex than other multi-modal hashing models, we are able to develop a fast iterative learning algorithm for it, since the subproblems of its optimization all have closed-form solutions after introducing two auxiliary variables. Our experimental results on three real-world datasets reveal the usefulness of those previously ignored discrete constraints and demonstrate that EDMH not only performs much better than state-of-the-art competitors according to several retrieval metrics but also runs much faster than most of them.
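    The alternating-optimization idea with closed-form subproblems can be sketched on a stripped-down version of the problem. The toy below (assumed setup, not the paper's actual updates) keeps only the binarization constraint and greedily improves tr(B^T S B) by iterating B <- sign(S B); for a linear objective over {-1,+1}^(n x r), the sign function is the closed-form maximizer of each step. EDMH's real subproblems, which add auxiliary variables for the balance and decorrelation constraints, are richer but follow the same fix-and-solve pattern:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, r = 20, 8
    X = rng.standard_normal((n, 5))
    S = np.sign(X @ X.T)                      # toy pairwise similarity in {-1,+1}

    B = np.sign(rng.standard_normal((n, r)))  # random init of binary codes
    for _ in range(10):
        # With everything else fixed, the objective is linear in B, so
        # sign(.) gives the exact (closed-form) maximizer over {-1,+1}.
        B_new = np.sign(S @ B)
        B_new[B_new == 0] = 1                 # break ties toward +1
        if np.array_equal(B_new, B):          # converged
            break
        B = B_new
    ```

    Because every step is a closed-form update rather than a gradient descent in a continuous relaxation, each iteration is cheap, which is the source of the "less time to learn" claim in the title.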

    New ideas and trends in deep multimodal content understanding: a review

    The focus of this survey is on the analysis of two modalities in multimodal deep learning: image and text. Unlike classic reviews of deep learning, where monomodal image classifiers such as VGG, ResNet and the Inception module are central topics, this paper examines recent multimodal deep models and structures, including auto-encoders, generative adversarial nets and their variants. These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) multimodal tasks. We also analyze two key challenges for better content understanding in deep multimodal applications. We then introduce current ideas and trends in deep multimodal feature learning, such as feature embedding approaches and objective function design, which are crucial in overcoming the aforementioned challenges. Finally, we include several promising directions for future research.

    Generalized Semantic Preserving Hashing for Cross-Modal Retrieval

    Cross-modal retrieval is gaining importance due to the availability of large amounts of multimedia data. Hashing-based techniques provide an attractive solution to this problem when the data size is large. For cross-modal retrieval, data from the two modalities may be associated with a single label or multiple labels, and in addition, may or may not have a one-to-one correspondence. This work proposes a simple hashing framework which has the capability to work with different scenarios while effectively capturing the semantic relationship between the data items. The work proceeds in two stages: the first stage learns the optimum hash codes by factorizing an affinity matrix constructed using the label information. In the second stage, ridge regression and kernel logistic regression are used to learn the hash functions for mapping the input data to the bit domain. We also propose a novel iterative solution for cases where the training data is very large, or when the whole training data is not available at once. Extensive experiments on the single-label dataset Wiki and multi-label datasets such as MirFlickr, NUS-WIDE, Pascal, and LabelMe, and comparisons with the state of the art, show the usefulness of the proposed approach.
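    The two-stage pipeline described above can be sketched as follows (an illustrative toy, with assumed names and a simple eigenvector relaxation standing in for the authors' factorization; only the ridge-regression branch of stage two is shown):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n, d, r = 30, 10, 4
    labels = rng.integers(0, 3, size=n)
    X = rng.standard_normal((n, d)) + labels[:, None]   # toy single-label features

    # Stage 1: label affinity A_ij = +1 if same label else -1; binary codes
    # from the top-r eigenvectors of A, sign-thresholded (a common relaxation).
    A = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
    vals, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    B = np.sign(vecs[:, -r:])               # n x r target codes
    B[B == 0] = 1

    # Stage 2: ridge regression W = (X^T X + lam*I)^-1 X^T B maps features
    # to the bit domain; the hash function is h(x) = sign(x @ W).
    lam = 1.0
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ B)
    codes = np.sign(X @ W)
    ```

    Decoupling the stages is what lets the framework handle mismatched scenarios: stage one needs only labels (single or multiple), while a separate stage-two regressor can be fit per modality even without one-to-one correspondence.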