    Internet cross-media retrieval based on deep learning

    With the development of the Internet, multimedia information such as images and video is widely used. Finding the required multimedia data quickly and accurately among a large number of resources has therefore become a research focus in the field of information processing. In this paper, we propose a real-time Internet cross-media retrieval method based on deep learning. As an innovation, we improve both feature extraction and distance computation. After obtaining a large number of image feature vectors, we sort the elements of each vector by their contribution and then eliminate unnecessary features. Experiments show that our method achieves high precision in image-text cross-media retrieval while requiring less retrieval time. The method has broad application potential in the field of cross-media retrieval.
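
    The abstract does not define how an element's "contribution" is scored, so the sketch below assumes a simple variance-based criterion: dimensions of the image feature vectors are ranked by their variance across the dataset and the lowest-ranked ones are dropped. The function `prune_features` and its `keep_ratio` parameter are illustrative names, not from the paper.

```python
import numpy as np

def prune_features(features, keep_ratio=0.5):
    """Keep only the highest-contribution dimensions of a feature matrix.

    `features` is an (n_samples, n_dims) array of image feature vectors.
    Contribution is scored here by per-dimension variance across the
    dataset -- an assumption, since the paper does not define the score.
    """
    scores = features.var(axis=0)                  # contribution per dimension
    order = np.argsort(scores)[::-1]               # best dimensions first
    keep = order[: int(features.shape[1] * keep_ratio)]
    return features[:, keep], keep                 # pruned features + kept indices

# Usage: prune 4096-d features down to the top half of the dimensions.
X = np.random.randn(1000, 4096).astype(np.float32)
X_small, kept = prune_features(X, keep_ratio=0.5)
print(X_small.shape)  # (1000, 2048)
```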

    Supervised Coupled Dictionary Learning with Group Structures for Multi-modal Retrieval

    A better similarity mapping function across heterogeneous high-dimensional features is highly desirable for many applications involving multi-modal data. In this paper, we introduce coupled dictionary learning (DL) into supervised sparse coding for multi-modal (cross-media) retrieval. We call this method Supervised coupled dictionary learning with group structures for Multi-Modal retrieval (SliM2). SliM2 formulates multi-modal mapping as a constrained dictionary learning problem. By exploiting the intrinsic power of DL to handle heterogeneous features, SliM2 extends unimodal DL to multi-modal DL. Moreover, SliM2 employs label information to discover the structure shared within each modality among samples of the same class, via a mixed norm (i.e., the `l1/l2`-norm). As a result, multi-modal retrieval is conducted via a set of jointly learned mapping functions across the multi-modal data. Experimental results show the effectiveness of the proposed model when applied to cross-media retrieval.
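
    As a concrete reading of the group structure, the minimal sketch below evaluates an `l1/l2` mixed norm on a sparse-code matrix: codes of same-class samples form a group, an l2 norm is taken per dictionary atom within each group, and the group norms are summed (the l1 part), which encourages same-class samples to activate the same atoms. Grouping by class label is our assumption from the abstract; SliM2's exact regularizer may differ.

```python
import numpy as np

def mixed_l1l2_norm(codes, labels):
    """`l1/l2` mixed norm over class-label groups (illustrative, not SliM2's exact form).

    `codes` is an (n_atoms, n_samples) sparse-code matrix; `labels` gives the
    class of each sample (column).  Within each class, each atom's row of
    codes gets an l2 norm; summing those norms (an l1 sum) pushes same-class
    samples to share the same small set of active atoms.
    """
    total = 0.0
    for c in np.unique(labels):
        block = codes[:, labels == c]                     # codes of one class
        total += np.sqrt((block ** 2).sum(axis=1)).sum()  # l2 per atom, l1 over atoms
    return total

# Toy check: two classes, random codes.
codes = np.random.randn(8, 6)
labels = np.array([0, 0, 0, 1, 1, 1])
print(mixed_l1l2_norm(codes, labels))
```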

    Learning Multimodal Structures in Computer Vision

    A phenomenon or event can be observed by various kinds of detectors or under different conditions; each such acquisition framework is a modality of the phenomenon. Because the modalities of a multimodal phenomenon are related, a single modality cannot fully describe the event of interest, and the fact that several modalities report on the same event introduces new challenges compared with exploiting each modality separately. We are interested in designing new algorithmic tools for sensor fusion within the signal representation framework of sparse coding, a widely used methodology in signal processing, machine learning, and statistics for representing data. This coding scheme is based on a machine learning technique and has been shown to represent many modalities, such as natural images, effectively. We consider situations where we want the support of the model not only to be sparse, but also to reflect a priori knowledge about the application at hand. Our goal is to extract a discriminative representation of the multimodal data that makes its essential characteristics easy to recover in a subsequent analysis step, e.g., regression or classification.

    More precisely, sparse coding represents signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes the intrinsic properties of the multimodal data in a decomposition coefficient vector with maximal discriminatory power. We carefully design a multimodal representation framework that learns discriminative feature representations by fully exploiting both the modality-shared information, i.e., the information shared across modalities, and the modality-specific information, i.e., the information content of each individual modality. In addition, the framework automatically learns the weights of the various feature components in a data-driven manner. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities while leveraging the modality-specific character of each modality, adjusting their corresponding weights for different parts of the feature during recognition.
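
    To make the sparse-coding step concrete, here is a minimal ISTA sketch that codes one signal over a dictionary whose columns are split into a shared block and a modality-specific block. The split, the function name `sparse_code`, and the parameters `lam` and `n_iter` are illustrative assumptions based on the abstract, not the thesis' actual formulation.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(y, D, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||y - D a||^2 + lam*||a||_1 with ISTA.

    `y` is one signal and `D` a dictionary whose columns are atoms.
    Stacking a shared block and a modality-specific block in `D` is our
    reading of the abstract, not necessarily the thesis' exact model.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)           # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Dictionary = [shared atoms | modality-specific atoms] (illustrative split).
rng = np.random.default_rng(0)
D_shared = rng.standard_normal((64, 32))
D_spec = rng.standard_normal((64, 32))
D = np.hstack([D_shared, D_spec])
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
y = rng.standard_normal(64)
a = sparse_code(y, D)
print(np.count_nonzero(a), "active atoms out of", a.size)
```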