179 research outputs found

    MESH : a flexible manifold-embedded semantic hashing for cross-modal retrieval

    Hashing-based methods for cross-modal retrieval have been widely explored in recent years. However, most of them focus on preserving neighborhood relationships and label consistency while ignoring the proximity of neighbors and the proximity of classes, which degrades the discrimination of the hash codes. Moreover, most of them learn hash codes and hashing functions simultaneously, which limits the flexibility of the algorithms. To address these issues, we propose a two-step cross-modal retrieval method named Manifold-Embedded Semantic Hashing (MESH). It exploits Local Linear Embedding to model neighborhood proximity and uses class semantic embeddings to capture the proximity of classes. In this way, MESH not only extracts the manifold structure in different modalities but also embeds class semantic information into the hash codes, further improving their discrimination. Moreover, the two-step scheme makes MESH flexible with respect to the choice of hashing function. Extensive experimental results on three datasets show that MESH is superior to 10 state-of-the-art cross-modal hashing methods. MESH also demonstrates superiority on deep features compared with the deep cross-modal hashing method. © 2013 IEEE
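The two-step idea in the abstract above (first obtain binary codes, then fit any hashing function to them) can be illustrated with a minimal sketch. This is not MESH's actual optimization: the codes `B` are assumed to be given by some first step, the hashing function is assumed to be a ridge-regressed linear projection, and the names `fit_linear_hash`, `hash_codes`, and `hamming_rank` are hypothetical.

```python
import numpy as np

def fit_linear_hash(X, B, reg=1e-2):
    """Step 2 (assumed form): ridge-regress given codes B (n x k, entries +1/-1)
    on features X (n x d) to obtain a linear projection W (d x k)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ B)

def hash_codes(X, W):
    """Hash features by taking the sign of the linear projection."""
    return np.where(X @ W >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.
    For +1/-1 codes of length k, distance = (k - dot product) / 2."""
    k = db_codes.shape[1]
    dist = (k - db_codes @ query_code) // 2
    return np.argsort(dist, kind="stable")
```

Because the codes are fixed before the hashing function is fitted, the ridge regression here could be swapped for another predictor (for example a kernel model or a neural network) without changing the first step, which is the kind of flexibility the abstract attributes to a two-step scheme.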

    Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval

    In recent years, the amount of multimedia data on the Internet, such as images, texts, and videos, has been growing rapidly. Motivated by this trend, this thesis is dedicated to exploiting hashing-based solutions to reveal multimedia data correlations and to support intra-media and inter-media similarity search among huge volumes of multimedia data. We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in the audio and visual modalities exhibit similar temporal changing patterns in certain feature spaces. We propose a permutation-based random hashing technique that captures the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling shows superior performance in the localization and segmentation of sounding objects in videos. The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem and to study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure to learn optimized ranking-based hash functions that can be used for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can be easily extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval. We further study the ranking-based hashing method for the cross-media similarity search problem. Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that the ranking orders of features in the different linear subspaces maximally preserve the cross-media similarities. Additionally, we develop this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. Extensive experiments on several real-world datasets demonstrate that the proposed cross-media hashing method achieves superior cross-media retrieval performance compared with several state-of-the-art algorithms. Lastly, to make better use of the supervisory label information and to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, rather than pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples binary code learning and hash function learning into two separate stages, making it equally applicable to single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method.
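As a rough illustration of what a permutation-based, ranking-sensitive hash can look like, here is a minimal sketch in the style of winner-take-all hashing. The abstract does not specify the thesis's exact scheme, so the construction below, the `window` parameter, and the names `make_permutations`, `wta_hash`, and `code_similarity` are all assumptions made for illustration.

```python
import numpy as np

def make_permutations(dim, n_hashes, seed=0):
    """Draw n_hashes random permutations of the feature indices 0..dim-1."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.permutation(dim) for _ in range(n_hashes)])

def wta_hash(x, perms, window=4):
    """For each permutation, look at the first `window` permuted entries of x
    and record the position of the largest one. The code depends only on the
    relative order of the entries, not on their magnitudes."""
    x = np.asarray(x)
    return np.argmax(x[perms[:, :window]], axis=1)

def code_similarity(a, b):
    """Fraction of hash positions on which two codes agree."""
    return float(np.mean(a == b))
```

Since each code symbol records only which entry wins within a window, any strictly increasing transformation of the input leaves the code unchanged. Applied to a feature sequence indexed by time, two signals whose values rise and fall in the same order would therefore receive similar codes, which is one way to read the "temporal order dynamics" mentioned above.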