TTT-UCDR: Test-time Training for Universal Cross-Domain Retrieval
Image retrieval under generalized test scenarios has gained significant
momentum in the literature, and the recently proposed protocol of Universal
Cross-Domain Retrieval is a pioneer in this direction. A common practice in any
such generalized classification or retrieval algorithm is to exploit samples
from multiple domains during training to learn a domain-invariant
representation of the data. Such a criterion is often restrictive, and thus in this
work, for the first time, we explore the challenges associated with generalized
retrieval problems under a low-data regime, which is quite relevant in many
real-world scenarios. We attempt to make any retrieval model trained on a small
cross-domain dataset (containing just two training domains) more generalizable
towards any unknown query domain or category by quickly adapting it to the test
data during inference. This form of test-time training or adaptation of the
retrieval model is explored by means of a number of self-supervision-based loss
functions, such as RotNet, jigsaw-puzzle solving, and Barlow Twins, in this
work. Extensive experiments on multiple large-scale datasets demonstrate the
effectiveness of the proposed approach.
Comment: 9 pages, 1 figure, 3 tables
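One of the self-supervised objectives named above, Barlow Twins, is compact enough to sketch directly. The following is a hedged NumPy illustration of the loss alone (batch shapes and the `lam` weight are illustrative assumptions, not the paper's settings): the cross-correlation matrix of two augmented views' embeddings is pushed toward the identity, so diagonal terms enforce invariance and off-diagonal terms reduce redundancy.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins self-supervision loss on two batches of embeddings.

    z_a, z_b: (N, D) embeddings of two augmented views of the same batch.
    The loss drives the (D, D) cross-correlation matrix toward the identity:
    diagonal terms -> 1 (invariance), off-diagonal -> 0 (redundancy reduction).
    """
    # Standardise each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    n = z_a.shape[0]
    c = z_a.T @ z_b / n                         # cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()   # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

At test time, such a loss needs no labels, which is what makes it usable for adapting a retrieval model to unseen query domains during inference.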
Generalized Zero-Shot Cross-Modal Retrieval
Cross-modal retrieval is an important research area due to its wide range of applications, and several algorithms have been proposed to address this task. We feel that it is the right time to take a step back and analyze the current status of research in this area. As new object classes are continuously being discovered over time, it is necessary to design algorithms that can generalize to data from previously unseen classes. Towards that goal, our first contribution is to establish protocols for generalized zero-shot cross-modal retrieval and analyze the generalization ability of the standard cross-modal algorithms. Second, we propose a semantic-aware ranking algorithm that can be used as an add-on to any existing cross-modal approach to improve its performance on both seen and unseen classes. Finally, we propose a modification of the standard evaluation metric (MAP for single-label data and NDCG for multi-label data), which we feel is a more intuitive measure of cross-modal retrieval performance. Extensive experiments on two single-label and three multi-label cross-modal datasets show the effectiveness of the proposed approach.
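Since the abstract's metric proposal builds on NDCG, a minimal implementation of the standard metric may help fix ideas. This is a generic sketch of plain NDCG for one ranked list, not the authors' modified version:

```python
import numpy as np

def ndcg(relevances, k=None):
    """Normalized Discounted Cumulative Gain for one ranked list.

    relevances: graded relevance of the retrieved items, in ranked order.
    k: optional cutoff (NDCG@k); None scores the full list.
    """
    rel = np.asarray(relevances, dtype=float)[:k]
    if rel.sum() == 0:
        return 0.0                      # no relevant item retrieved
    # Log-discount: position 1 gets weight 1, later positions less.
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = (rel * discounts).sum()
    # Ideal DCG: the same relevances sorted into the best possible order.
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = (ideal * discounts[:ideal.size]).sum()
    return dcg / idcg
```

A perfectly ordered list scores 1.0; any swap that demotes a more relevant item below a less relevant one lowers the score.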
Cross-modal retrieval in challenging scenarios using attributes
Cross-modal retrieval is an important field of research today because of the abundance of multimedia data. In this work, we attempt to address two challenging scenarios that we may encounter in real-life cross-modal retrieval, but which are relatively unexplored in the literature. First, due to the ever-increasing number of new categories of data, cross-modal algorithms should be able to generalize to categories that they have not seen during training. Second, the data available during testing may be degraded (for example, it has low resolution or noise) as compared to the data available during training. Here, we evaluate how these adverse conditions affect the performance of the state-of-the-art cross-modal approaches. We also propose a unified framework that can handle all these diverse and challenging scenarios without any modification. In the proposed approach, the data from different modalities are projected into a common semantic-preserving latent space in which the semantic relations given by the classname embeddings (attributes) are preserved. Extensive experiments on diverse cross-modal data, including image-text and RGB-depth, and comparison with the state-of-the-art approaches show the usefulness of the proposed approach for these challenging scenarios.
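The general idea of projecting each modality into a shared attribute space can be sketched in a few lines. The ridge-regression projections and cosine-similarity retrieval below are illustrative choices for this sketch, not the paper's exact formulation; the function names and dimensions are assumptions:

```python
import numpy as np

def fit_projection(feats, attrs, lam=1.0):
    """Ridge-regression projection of modality features into attribute space.

    feats: (N, D) features of one modality (e.g. image or text).
    attrs: (N, A) classname embeddings (attributes) of each sample.
    Returns W of shape (D, A) minimising ||feats @ W - attrs||^2 + lam ||W||^2.
    """
    d = feats.shape[1]
    return np.linalg.solve(feats.T @ feats + lam * np.eye(d), feats.T @ attrs)

def retrieve(query_feat, W_q, gallery_feats, W_g, topk=5):
    """Rank gallery items of another modality by cosine similarity
    in the shared attribute space."""
    q = query_feat @ W_q
    g = gallery_feats @ W_g
    q = q / (np.linalg.norm(q) + 1e-8)
    g = g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-8)
    return np.argsort(-(g @ q))[:topk]
```

Because both modalities are anchored to the same classname embeddings, a query from one modality can be matched against a gallery from another, including for classes represented only by their attribute vectors.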