4 research outputs found
Diverse Knowledge Distillation for End-to-End Person Search
Person search aims to localize and identify a specific person from a gallery
of images. Recent methods can be categorized into two groups, i.e., two-step
and end-to-end approaches. The former views person search as two independent
tasks and achieves dominant results using separately trained person detection
and re-identification (Re-ID) models. The latter performs person search in an
end-to-end fashion. Although the end-to-end approaches yield higher inference
efficiency, they lag far behind their two-step counterparts in terms of
accuracy. In this paper, we argue that the gap between the two kinds of methods
is mainly caused by the Re-ID sub-networks of end-to-end methods. To this end,
we propose a simple yet strong end-to-end network with diverse knowledge
distillation to break this bottleneck. We also design a spatial-invariant
augmentation that helps the model remain robust to inaccurate detection results.
Experimental results on the CUHK-SYSU and PRW datasets demonstrate the
superiority of our method over existing approaches -- it achieves accuracy
on par with state-of-the-art two-step methods while maintaining high
efficiency due to the single joint model. Code is available at:
https://git.io/DKD-PersonSearch
Comment: Accepted to AAAI, 2021.
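For orientation, feature-level distillation of a two-step teacher's Re-ID embeddings into an end-to-end student can be sketched as below. This is a generic illustration under our own assumptions, not the paper's actual loss; all function and variable names are ours:

```python
import numpy as np

def distill_loss(student_emb, teacher_emb, eps=1e-8):
    """Pull student Re-ID embeddings toward a frozen teacher's.

    Combines an L2 term with a cosine-similarity term so the student
    matches both the magnitude and the direction of the teacher's
    features. Both inputs are (batch, dim) arrays.
    """
    l2 = np.mean(np.sum((student_emb - teacher_emb) ** 2, axis=1))
    s = student_emb / (np.linalg.norm(student_emb, axis=1, keepdims=True) + eps)
    t = teacher_emb / (np.linalg.norm(teacher_emb, axis=1, keepdims=True) + eps)
    cos = np.mean(1.0 - np.sum(s * t, axis=1))
    return l2 + cos

# Identical embeddings incur (near-)zero loss.
emb = np.random.default_rng(0).normal(size=(4, 128))
assert distill_loss(emb, emb) < 1e-6
```

In practice the teacher is a separately trained two-step Re-ID model, frozen during student training, and this term is added to the student's detection and identification losses.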
Ground-to-Aerial Person Search: Benchmark Dataset and Approach
In this work, we construct a large-scale dataset for Ground-to-Aerial Person
Search, named G2APS, which contains 31,770 images of 260,559 annotated bounding
boxes for 2,644 identities appearing in both UAV and ground
surveillance cameras. To our knowledge, this is the first dataset for
cross-platform intelligent surveillance applications, where the UAVs could work
as a powerful complement for the ground surveillance cameras. To more
realistically simulate the actual cross-platform Ground-to-Aerial surveillance
scenarios, the surveillance cameras are fixed about 2 meters above the ground,
while the UAVs capture videos of persons at different locations, with a variety
of view-angles, flight attitudes and flight modes. Therefore, the dataset has
the following unique characteristics: 1) drastic view-angle changes between
query and gallery person images from cross-platform cameras; 2) diverse
resolutions, poses and views of the person images under 9 rich real-world
scenarios. On the basis of the G2APS benchmark dataset, we present a detailed
analysis of current two-step and end-to-end person search methods, and
further propose a simple yet effective knowledge distillation scheme on the
head of the ReID network, which achieves state-of-the-art performance on
G2APS as well as the two previous public person search datasets, i.e., PRW and
CUHK-SYSU. The dataset and source code are available at
\url{https://github.com/yqc123456/HKD_for_person_search}.
Comment: Accepted by ACM MM 202
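Whatever the training scheme, retrieval in person search ultimately ranks the detected gallery persons by embedding similarity to the query. A minimal sketch (the function names are ours, not the benchmark's code):

```python
import numpy as np

def rank_gallery(query, gallery):
    """Rank gallery person embeddings by cosine similarity to the query.

    query: (dim,) embedding of the query person.
    gallery: (n, dim) embeddings of detected persons in gallery images.
    Returns gallery indices, best match first.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)

q = np.array([1.0, 0.0])
g = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
# Gallery entry 1 points nearly the same direction as the query.
assert rank_gallery(q, g)[0] == 1
```

Cross-platform evaluation on G2APS stresses exactly this step: the drastic view-angle gap between UAV and ground cameras pushes matched pairs apart in embedding space.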
Semantic Sampling and Color Classification for Person Search in Videos
ABSTRACT: In this work, we look into the application of person search by
keyword-based description, in images taken from security videos. In this type
of application, we aim to describe the persons present in videos according to
salient characteristics, such as the colors of their clothes, without regard to
their identities, which are unknown a priori. The approach is thus similar to
generating composite portraits, where persons are described by their semantic
parts (such as t-shirt, hat, or pants), and these descriptions can be attached
to images as metadata, facilitating the search for specific profiles in videos,
for example, a person with green pants and a blue shirt. In a first part, we
identify the clothes being worn as the salient characteristics, and in
particular we set out to describe their colors precisely. The goal of our work
is thus to develop a method for classifying clothing colors in images that is
as close as possible to human perception. To this end, we consider both the
vocabulary used, which must be sufficiently general, and the color spaces
employed, which must emulate the properties of human color perception.
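As a sketch of perception-oriented color classification (illustrative only; the thesis's actual vocabulary and color spaces are its own), one standard route is to convert sRGB to the roughly perceptually uniform CIELAB space and assign the nearest named reference color:

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert an sRGB triple in [0, 1] to CIELAB (D65 white point)."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma curve.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ (standard sRGB/D65 matrix).
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = m @ lin
    # Normalize by the D65 reference white, then apply the Lab transfer function.
    xyz /= np.array([0.95047, 1.0, 1.08883])
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16 / 116)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return np.array([L, a, b])

# A hypothetical vocabulary of named reference colors (sRGB values).
NAMED = {"black": (0, 0, 0), "white": (1, 1, 1), "red": (1, 0, 0),
         "green": (0, 0.5, 0), "blue": (0, 0, 1), "yellow": (1, 1, 0)}

def classify_color(rgb):
    """Return the vocabulary name whose Lab coordinates are nearest."""
    lab = srgb_to_lab(rgb)
    return min(NAMED, key=lambda n: np.linalg.norm(lab - srgb_to_lab(NAMED[n])))

assert classify_color((0.9, 0.05, 0.1)) == "red"
```

Euclidean distance in CIELAB approximates perceived color difference far better than distance in raw RGB, which is why perceptually motivated classifiers operate in such spaces; a richer vocabulary would simply add entries to the reference table.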