review journal article

Dual alignment : Partial negative and soft-label alignment for text-to-image person retrieval

Abstract

Text-to-image person retrieval is a task to retrieve the right matched images based on a given textual description of the interested person. The main challenge lies in the inherent modal difference between texts and images. Most existing works narrow the modality gap by aligning the feature representations of text and image in a latent embedding space. However, these methods usually leverage the hard label and mine insufficient or incorrect hard negatives to achieve cross-modal alignment, generating incorrect hard negative pairs so as to suboptimal performance. To tackle the above problems, we propose a dual alignment framework, Partial negative and Soft-label Alignment (PASA), which includes the partial negative alignment (PA) strategy and the Soft-label Alignment (SA) strategy. Specifically, PA pushes far away the hard negatives in the triplet loss by considering a certain amount of negatives within each mini-batch as hard negatives, preventing the distraction to the positive text–image pairs. Based on PA, SA further achieves the alignment between the similarity distribution on these hard negatives by the manner of soft-label, as well as the alignment between inter-modal and intra-modal. Extensive experiments on three public datasets, CUHK-PEDES, ICFG-PEDES and RSTPReid, demonstrate that our proposed PASA method can consistently improve the performance of text-to-image person retrieval, and achieve new state-of-the-art results on the above three datasets

Similar works

This paper was published in Lancaster E-Prints.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.