6 research outputs found

    TTT-UCDR: Test-time Training for Universal Cross-Domain Retrieval

    Full text link
    Image retrieval under generalized test scenarios has gained significant momentum in the literature, and the recently proposed protocol of Universal Cross-Domain Retrieval is a pioneer in this direction. A common practice in any such generalized classification or retrieval algorithm is to exploit samples from multiple domains during training to learn a domain-invariant representation of the data. Such a criterion is often restrictive, and thus in this work, for the first time, we explore the challenges associated with generalized retrieval problems under a low-data regime, which is quite relevant in many real-world scenarios. We attempt to make any retrieval model trained on a small cross-domain dataset (containing just two training domains) more generalizable towards any unknown query domain or category by quickly adapting it to the test data during inference. In this work, this form of test-time training or adaptation of the retrieval model is explored by means of a number of self-supervision-based loss functions, for example RotNet, jigsaw-puzzle solving, and Barlow Twins. Extensive experiments on multiple large-scale datasets demonstrate the effectiveness of the proposed approach. Comment: 9 pages, 1 figure, 3 tables
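    As an illustration of the test-time adaptation idea sketched in this abstract, the snippet below briefly fine-tunes a retrieval backbone on the unlabeled test queries with a RotNet-style rotation-prediction loss before computing their embeddings. It is a minimal PyTorch sketch; the names (backbone, rotation_head, adapt_steps) and the choice of a single self-supervised loss are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of test-time adaptation with a RotNet-style self-supervised loss.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


def rotate_batch(images):
    """Create four rotated copies (0/90/180/270 degrees) of an NCHW batch with rotation labels."""
    rotations = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return torch.cat(rotations, dim=0), labels


def test_time_adapt(backbone, rotation_head, query_images, adapt_steps=5, lr=1e-4):
    """Briefly fine-tune the backbone on the unlabeled test queries by predicting
    image rotations, then return the adapted query embeddings for retrieval."""
    params = list(backbone.parameters()) + list(rotation_head.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)
    backbone.train()
    for _ in range(adapt_steps):
        rotated, labels = rotate_batch(query_images)
        logits = rotation_head(backbone(rotated))          # predict which rotation was applied
        loss = F.cross_entropy(logits, labels.to(logits.device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    backbone.eval()
    with torch.no_grad():
        return backbone(query_images)                      # adapted embeddings used for retrieval
```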

    Generalized Zero-Shot Cross-Modal Retrieval

    No full text
    Cross-modal retrieval is an important research area due to its wide range of applications, and several algorithms have been proposed to address this task. We feel that it is the right time to take a step back and analyze the current status of research in this area. As new object classes are continuously being discovered over time, it is necessary to design algorithms that can generalize to data from previously unseen classes. Towards that goal, our first contribution is to establish protocols for generalized zero-shot cross-modal retrieval and analyze the generalization ability of standard cross-modal algorithms. Second, we propose a semantic-aware ranking algorithm that can be used as an add-on to any existing cross-modal approach to improve its performance on both seen and unseen classes. Finally, we propose a modification of the standard evaluation metrics (MAP for single-label data and NDCG for multi-label data), which we feel is a more intuitive measure of cross-modal retrieval performance. Extensive experiments on two single-label and three multi-label cross-modal datasets show the effectiveness of the proposed approach.
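    For reference, the snippet below computes the two standard metrics this abstract builds on: mean Average Precision (MAP) for single-label retrieval and NDCG for multi-label retrieval. This is the conventional formulation, not the modified measure the paper proposes.

```python
# Standard retrieval metrics: MAP (single-label, binary relevance) and NDCG (multi-label, graded relevance).
import numpy as np


def average_precision(relevance):
    """relevance: binary 0/1 array over the ranked list for one query; MAP is the mean of this over queries."""
    relevance = np.asarray(relevance, dtype=float)
    if relevance.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(relevance) / (np.arange(len(relevance)) + 1)
    return float((precision_at_k * relevance).sum() / relevance.sum())


def ndcg(gains, k=None):
    """gains: graded relevance (e.g., number of shared labels) over the ranked list.
    The ideal DCG is computed over the same truncated list for simplicity."""
    gains = np.asarray(gains, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(len(gains)) + 2)
    dcg = float((gains * discounts).sum())
    ideal = float((np.sort(gains)[::-1] * discounts).sum())
    return dcg / ideal if ideal > 0 else 0.0
```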

    Cross-modal retrieval in challenging scenarios using attributes

    No full text
    Cross-modal retrieval is an important field of research today because of the abundance of multimedia data. In this work, we attempt to address two challenging scenarios that we may encounter in real-life cross-modal retrieval but which are relatively unexplored in the literature. First, due to the ever-increasing number of new categories of data, cross-modal algorithms should be able to generalize to categories which they have not seen during training. Second, the data that is available during testing may be degraded (for example, it may have low resolution or noise) compared to the data available during training. Here, we evaluate how these adverse conditions affect the performance of state-of-the-art cross-modal approaches. We also propose a unified framework that can handle all of these diverse and challenging scenarios without any modification. In the proposed approach, the data from different modalities are projected into a common semantic-preserving latent space in which the semantic relations given by the class-name embeddings (attributes) are preserved. Extensive experiments on diverse cross-modal data, including image-text and RGB-depth, and comparisons with state-of-the-art approaches show the usefulness of the proposed approach for these challenging scenarios.
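    The sketch below illustrates the general recipe described here: project each modality into a shared latent space and pull every projected sample toward the attribute (class-name) embedding of its class so that semantic relations are preserved. The encoder architecture, the cosine-based alignment loss, and all names are illustrative assumptions rather than the paper's exact formulation; it also assumes the latent dimension matches the attribute dimension.

```python
# Minimal sketch of projecting modalities into a common, semantics-preserving latent space.
# All module and function names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityProjector(nn.Module):
    """Maps modality-specific features (image, text, depth, ...) into the shared latent space."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, latent_dim),
                                 nn.ReLU(),
                                 nn.Linear(latent_dim, latent_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


def semantic_alignment_loss(latent, class_attributes, labels):
    """Pull each projected sample toward the attribute embedding of its class.
    class_attributes: [num_classes, latent_dim]; labels: long tensor of class indices."""
    targets = F.normalize(class_attributes[labels], dim=-1)
    return (1.0 - F.cosine_similarity(latent, targets, dim=-1)).mean()
```

    One projector per modality trained with this shared alignment objective lets queries and gallery items from different modalities be compared directly by cosine similarity in the latent space.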