AXM-Net: Cross-Modal Context Sharing Attention Network for Person Re-ID
Cross-modal person re-identification (Re-ID) is critical for modern video
surveillance systems. The key challenge is to align inter-modality
representations according to the semantic information describing a person while
ignoring background information. In this work, we present AXM-Net, a novel
CNN-based architecture designed for learning semantically aligned visual and
textual representations. The underlying building block consists of multiple
streams of feature maps coming from the visual and textual modalities and a
novel learnable context-sharing semantic alignment network. We also propose
complementary intra-modal attention learning mechanisms to focus on
fine-grained local details in the features, along with a cross-modal affinity
loss for robust feature matching. Our design is unique in its ability to
implicitly learn feature alignments from data. The entire AXM-Net can be
trained in an end-to-end manner. We report results on both person search and
cross-modal Re-ID tasks. Extensive experimentation validates the proposed
framework and demonstrates its superiority by outperforming the current
state-of-the-art methods by a significant margin.
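The abstract above mentions a cross-modal affinity loss for matching visual and textual features. The paper's exact formulation is not given here, so the following is only a minimal sketch of one common family of such losses: a bidirectional hinge (triplet-style) loss over the cosine-affinity matrix of a batch of paired embeddings, with the hardest in-batch negatives. The function name, the margin value, and the hardest-negative mining strategy are all assumptions for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize rows to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_modal_affinity_loss(visual, textual, margin=0.2):
    """Hypothetical bidirectional matching loss over paired embeddings.

    visual, textual: (batch, dim) arrays; row i of each is a matched pair.
    """
    v = l2_normalize(visual)
    t = l2_normalize(textual)
    sim = v @ t.T                       # (batch, batch) cosine affinity matrix
    pos = np.diag(sim)                  # similarities of matched pairs
    n = sim.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # hardest negative per row (image -> text) and per column (text -> image)
    neg_i2t = np.where(off_diag, sim, -np.inf).max(axis=1)
    neg_t2i = np.where(off_diag, sim, -np.inf).max(axis=0)
    # hinge: push matched similarity above the hardest negative by `margin`
    loss = (np.maximum(0.0, margin - pos + neg_i2t).mean()
            + np.maximum(0.0, margin - pos + neg_t2i).mean())
    return loss
```

With perfectly aligned pairs (identical orthonormal embeddings) the hinge terms vanish and the loss is zero; misaligned pairs incur a positive penalty in both matching directions.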
Échantillonnage sémantique et classification des couleurs pour la recherche de personnes dans les vidéos (Semantic Sampling and Color Classification for Person Search in Videos)
ABSTRACT: In this work, we look into the application of person search by keyword description, in images taken from security cameras. In this type of application, we aim to describe the persons present in videos according to salient characteristics (e.g., the colors of their clothes), without regard to their identities, which are unknown a priori. The approach is thus similar to generating composite portraits, where persons are described by their semantic parts (such as t-shirt, hat, pants, etc.) and one searches for persons with specific characteristics. These characteristics can then be attached to images as metadata, facilitating the search for specific profiles in videos, for example, a person with green pants and a blue shirt. In a first step, we identify the clothes being worn as the salient characteristics, and in particular we set out to describe their colors precisely. The goal of our work is thus to develop a method for classifying clothing colors in images that comes as close as possible to human perception. To this end, we consider both the vocabulary used, which must be sufficiently general, and the color spaces employed, which must emulate the properties of human color perception.
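The abstract describes classifying clothing colors against a general vocabulary in a perceptually meaningful color space. The thesis's actual vocabulary and chosen color spaces are not given here, so the following is only a minimal sketch of the general idea: convert sRGB values to CIELAB (a standard, approximately perceptually uniform space, using the D65 white point) and assign the nearest name from a small hypothetical six-word vocabulary by Euclidean distance.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert an sRGB triple in [0, 1] to CIELAB (D65 white point)."""
    rgb = np.asarray(rgb, dtype=float)
    # undo the sRGB gamma to get linear RGB
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # linear RGB -> CIE XYZ (sRGB primaries, D65)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = M @ lin
    xyz /= np.array([0.95047, 1.0, 1.08883])   # normalize by D65 white
    d = 6 / 29
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4 / 29)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return np.array([L, a, b])

# Hypothetical toy vocabulary of color names with sRGB prototypes.
VOCAB = {"red": (1, 0, 0), "green": (0, 0.5, 0), "blue": (0, 0, 1),
         "yellow": (1, 1, 0), "white": (1, 1, 1), "black": (0, 0, 0)}

def classify_color(rgb):
    """Return the vocabulary name whose prototype is nearest in CIELAB."""
    lab = srgb_to_lab(rgb)
    return min(VOCAB, key=lambda name: np.linalg.norm(lab - srgb_to_lab(VOCAB[name])))
```

Measuring distances in CIELAB rather than raw RGB is what brings the classification closer to human judgment: equal Euclidean steps in CIELAB correspond roughly to equal perceived color differences, which is not true of RGB.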