
    AXM-Net: Cross-Modal Context Sharing Attention Network for Person Re-ID

    Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align inter-modality representations according to the semantic information present for a person while ignoring background information. In this work, we present AXM-Net, a novel CNN-based architecture designed for learning semantically aligned visual and textual representations. The underlying building block consists of multiple streams of feature maps coming from the visual and textual modalities, together with a novel learnable context-sharing semantic alignment network. We also propose complementary intra-modal attention learning mechanisms to focus on fine-grained local details in the features, along with a cross-modal affinity loss for robust feature matching. Our design is unique in its ability to implicitly learn feature alignments from data, and the entire AXM-Net can be trained in an end-to-end manner. We report results on both person search and cross-modal Re-ID tasks. Extensive experimentation validates the proposed framework and demonstrates its superiority, outperforming current state-of-the-art methods by a significant margin.
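The core retrieval step the abstract describes — matching textual descriptions against visual features in a shared embedding space — can be sketched minimally as follows. This is not AXM-Net's alignment network or affinity loss; it is a generic cosine-similarity matcher over toy feature matrices, with all names and shapes chosen here for illustration.

```python
import numpy as np

def cosine_similarity_matrix(visual, textual):
    """Pairwise cosine similarity between rows of two feature matrices.

    Rows are L2-normalised so the dot product equals cosine similarity.
    """
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    t = textual / np.linalg.norm(textual, axis=1, keepdims=True)
    return v @ t.T

def match_text_to_images(visual, textual):
    """For each textual query (row), index of the closest visual feature."""
    # argmax over the visual axis picks the best-matching image per query.
    return cosine_similarity_matrix(visual, textual).argmax(axis=0)
```

In a trained cross-modal system, `visual` and `textual` would come from modality-specific encoders optimised so that matching person/description pairs score highest.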

    Échantillonnage sémantique et classification des couleurs pour la recherche de personnes dans les vidéos (Semantic sampling and color classification for person search in videos)

    ABSTRACT: In this work, we look into the application of person search by keyword description, in the context of images taken from security-camera videos. In this type of application, we aim to describe the persons present in videos according to salient characteristics (e.g. the colors of their clothes), without concern for their identities, which are unknown a priori. The approach is thus similar to generating composite portraits, where persons are described by their semantic parts (such as t-shirt, hat, pants), and one is looking for persons with specific characteristics. These characteristics can then be attached to images as metadata, facilitating the search for specific profiles in videos, for example, a person with green pants and a blue shirt. In a first part, we identify the clothes being worn as salient characteristics, and in particular we decide to describe their colors very precisely. The goal of our work is thus to propose a color classification method for clothing in images that aims to be as close as possible to human perception. To this end, we consider both the vocabulary used, which must be sufficiently general, and the color spaces considered, which must be able to emulate the properties of human color perception.
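The abstract argues for classifying clothing colors in a color space that emulates human perception. A minimal sketch of that idea (not the thesis's actual method): convert sRGB values to CIE L*a*b*, where Euclidean distance roughly tracks perceived color difference (the CIE76 metric), then name a color by its nearest prototype. The prototype palette and color names below are illustrative assumptions, not the thesis's vocabulary.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* under a D65 white point."""
    def linearize(c):
        # Undo the sRGB gamma curve.
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):
        # Piecewise cube-root used by the Lab transform.
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

# Hypothetical generalist color vocabulary with sRGB prototypes.
PROTOTYPES = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "yellow": (255, 255, 0),
}

def classify_color(rgb):
    """Name of the prototype closest to rgb in Lab space (CIE76 distance)."""
    lab = srgb_to_lab(rgb)
    return min(PROTOTYPES,
               key=lambda name: math.dist(lab, srgb_to_lab(PROTOTYPES[name])))
```

Nearest-prototype matching in Lab is only a baseline; a method tuned to human judgments would also learn the prototype positions and category boundaries from perceptual data.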