9,940 research outputs found

    Multi-label modality enhanced attention based self-supervised deep cross-modal hashing

    Deep cross-modal hashing (DCMH) has recently achieved superior performance in effective and efficient cross-modal retrieval and has therefore drawn increasing attention. Nevertheless, most existing DCMH methods still have two limitations: (1) single labels are usually used to measure the semantic similarity of cross-modal pairwise instances, neglecting that many cross-modal datasets contain abundant semantic information in their multi-labels; (2) several DCMH methods use multi-labels to supervise the learning of hash functions, but the multi-label feature space is sparse, which leads to sub-optimal hash function learning. This paper therefore proposes a multi-label modality enhanced attention-based self-supervised deep cross-modal hashing (MMACH) framework. Specifically, a multi-label modality enhanced attention module is designed to integrate the significant features from cross-modal data into the multi-label feature representations, aiming to make them more complete. Moreover, a multi-label cross-modal triplet loss is defined based on the criterion that the feature representations of cross-modal pairwise instances with more common categories should preserve higher semantic similarity than other instances. To the best of our knowledge, this is the first multi-label cross-modal triplet loss designed for cross-modal retrieval. Extensive experiments on four multi-label cross-modal datasets demonstrate the effectiveness and efficiency of the proposed MMACH, which also outperforms several state-of-the-art methods on the task of cross-modal retrieval. The source code of MMACH is available at https://github.com/SWU-CS-MediaLab/MMACH. (c) 2021 Elsevier B.V. All rights reserved. Computer Systems, Imagery and Medi
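The triplet criterion in the abstract above (pairs sharing more categories should be more similar) can be illustrated with a minimal NumPy sketch. This is not the MMACH implementation from the linked repository: the function names, the use of cosine similarity, the hinge form, and the `margin` value are all assumptions chosen for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise L2 normalisation, then pairwise cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def multilabel_triplet_loss(img_feats, txt_feats, labels, margin=0.2):
    """Hypothetical hinge-style triplet loss over image-text pairs.

    For an image i and texts j, k: if text j shares more labels with
    image i than text k does, sim(i, j) should exceed sim(i, k) by
    at least `margin` (an assumed hyperparameter).
    """
    sim = cosine_sim(img_feats, txt_feats)       # shape (n, n)
    shared = labels @ labels.T                   # shared[i, j] = number of common labels
    # more[i, j, k] is True when text j shares more labels with image i than text k.
    more = shared[:, :, None] > shared[:, None, :]
    violation = np.maximum(0.0, margin - sim[:, :, None] + sim[:, None, :])
    n_triplets = more.sum()
    if n_triplets == 0:
        return 0.0
    return float((violation * more).sum() / n_triplets)
```

The loss is zero when every text that shares more labels with an image is also at least `margin` more similar to it, and grows as that ordering is violated.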

    Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes

    In this paper, we present a label transfer model from texts to images for image classification tasks. Image classification is often much more challenging than text classification. On the one hand, labeled text data is more widely available than labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability and is often more directly related to class labels, whereas image features are not directly related to the concepts inherent in class labels. One of our goals in this paper is to develop a model that reveals the functional relationships between text and image features, so as to directly transfer intermodal and intramodal labels to annotate the images. This is implemented by learning a transfer function as a bridge to propagate the labels between the two multimodal spaces. However, intermodal label transfer could be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring image labels instead when relevant text is absent from the source corpus. In addition, we generalize the intermodal label transfer to the zero-shot learning scenario, where only text examples are available to label unseen classes of images, without any positive image examples. We evaluate our algorithm on an image classification task and show its effectiveness relative to the compared algorithms. Comment: The paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. It will appear in a future issu

    Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

    Where previous reviews on content-based image retrieval emphasize what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval, is presented. While existing works vary in terms of their targeted tasks and methodology, they rely on the key functionality of tag relevance, i.e., estimating the relevance of a specific tag with respect to the visual content of a given image and its social context. By analyzing what information a specific method exploits to construct its tag relevance function and how such information is exploited, this paper introduces a taxonomy to structure the growing literature, understand the ingredients of the main works, clarify their connections and differences, and recognize their merits and limitations. For a head-to-head comparison of the state of the art, a new experimental protocol is presented, with training sets containing 10k, 100k and 1m images and an evaluation on three test sets, contributed by various research groups. Eleven representative works are implemented and evaluated. Putting all this together, the survey aims to provide an overview of the past and foster progress for the near future. Comment: to appear in ACM Computing Survey

    Learning a Policy for Opportunistic Active Learning

    Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks. Comment: EMNLP 2018 Camera Read
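As a point of reference for the first sentence of the abstract above, the following is a minimal sketch of plain uncertainty sampling, a common active-learning baseline. It is not the reinforcement-learned policy the paper describes; the function names and the entropy criterion are assumptions for illustration only.

```python
import numpy as np

def entropy(probs):
    # Predictive entropy per row; clipping avoids log(0).
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_queries(probs, budget):
    """Return indices of the `budget` unlabeled points whose predicted
    class distributions are most uncertain (highest entropy)."""
    scores = entropy(np.asarray(probs, dtype=float))
    return np.argsort(-scores)[:budget]
```

Here `probs` is assumed to be an array of per-class probabilities from the current model, one row per unlabeled point. An opportunistic setting would additionally restrict which rows are eligible at each interaction.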