836 research outputs found

    Ranked List Loss for Deep Metric Learning

    Full text link
    The objective of deep metric learning (DML) is to learn embeddings that can capture semantic similarity and dissimilarity information among data points. Existing pairwise or tripletwise loss functions used in DML are known to suffer from slow convergence due to a large proportion of trivial pairs or triplets as the model improves. To improve this, ranking-motivated structured losses are proposed recently to incorporate multiple examples and exploit the structured information among them. They converge faster and achieve state-of-the-art performance. In this work, we unveil two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. First, given a query, only a fraction of data points is incorporated to build the similarity structure. Consequently, some useful examples are ignored and the structure is less informative. To address this, we propose to build a set-based similarity structure by exploiting all instances in the gallery. The learning setting can be interpreted as few-shot retrieval: given a mini-batch, every example is iteratively used as a query, and the rest ones compose the gallery to search, i.e., the support set in few-shot setting. The rest examples are split into a positive set and a negative set. For every mini-batch, the learning objective of ranked list loss is to make the query closer to the positive set than to the negative set by a margin. Second, previous methods aim to pull positive pairs as close as possible in the embedding space. As a result, the intraclass data distribution tends to be extremely compressed. In contrast, we propose to learn a hypersphere for each class in order to preserve useful similarity structure inside it, which functions as regularisation. Extensive experiments demonstrate the superiority of our proposal by comparing with the state-of-the-art methods.Comment: Accepted to T-PAMI. Therefore, to read the offical version, please go to IEEE Xplore. Fine-grained image retrieval task. Our source code is available online: https://github.com/XinshaoAmosWang/Ranked-List-Loss-for-DM

    Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search

    Full text link
    Text-based person search aims to retrieve the corresponding person images in an image database by virtue of a describing sentence about the person, which poses great potential for various applications such as video surveillance. Extracting visual contents corresponding to the human description is the key to this cross-modal matching problem. Moreover, correlated images and descriptions involve different granularities of semantic relevance, which is usually ignored in previous methods. To exploit the multilevel corresponding visual contents, we propose a pose-guided multi-granularity attention network (PMA). Firstly, we propose a coarse alignment network (CA) to select the related image regions to the global description by a similarity-based attention. To further capture the phrase-related visual body part, a fine-grained alignment network (FA) is proposed, which employs pose information to learn latent semantic alignment between visual body part and textual noun phrase. To verify the effectiveness of our model, we perform extensive experiments on the CUHK Person Description Dataset (CUHK-PEDES) which is currently the only available dataset for text-based person search. Experimental results show that our approach outperforms the state-of-the-art methods by 15 \% in terms of the top-1 metric.Comment: published in AAAI2020(oral

    A Simple Deep Learning Architecture for City-scale Vehicle Re-identification

    Get PDF
    The task of vehicle re-identification aims to identify a vehicle across different cameras with non overlapping fields of view and it is a challenging research problem due to viewpoint orientation, scene occlusions and intrinsic inter-class similarity of the data. In this paper, we propose a simplistic approach for one-shot vehicle re-identification based on a siamese/triple convolutional architecture for feature representation. Our method involves learning a feature space in which the vehicles of the same identities are projected closer to one another compared to those with different identities. Moreover, we provide an extensive evaluation of loss functions, including a novel combination of triplet loss with classification loss, and other network parameters applied to our vehicle re-identification system. Compared to most existing state-of-the-art approaches, our approach is simpler and more straightforward for training, utilizing only identity-level annotations. The proposed method is evaluated on the large-scale CityFlow-ReID dataset

    Attention in Multimodal Neural Networks for Person Re-identification

    Get PDF
    • …
    corecore