People tracking and re-identification by face recognition for RGB-D camera networks
This paper describes a face recognition-based people tracking and re-identification system for RGB-D camera networks. The system tracks people and learns their faces online so that it can keep track of their identities even after they temporarily leave the camera's field of view. For robust people re-identification, the system combines a deep neural network-based face representation with a Bayesian inference-based face classification method. The system also provides a predefined people identification capability: it associates the online learned faces with predefined face images and names to determine people's whereabouts, thus allowing a rich human-system interaction. Through experiments, we validate the re-identification and the predefined people identification capabilities of the system and show an example of the integration of the system with a mobile robot. The overall system is built as a Robot Operating System (ROS) module, which simplifies integration with the many existing robotic systems and algorithms that use this middleware. The code of this work has been released as open source in order to provide a baseline for future publications in this field.
A multi-viewpoint feature-based re-identification system driven by skeleton keypoints
Thanks to the increasing popularity of 3D sensors, robotic vision has seen huge improvements across a wide range of applications and systems in recent years. Besides its many benefits, this migration caused some incompatibilities with systems that cannot be based on range sensors, such as intelligent video surveillance systems, since the two kinds of sensor data lead to different representations of people and objects. This work aims to bridge that gap and presents a novel re-identification system that takes advantage of multiple video flows to enhance the performance of a skeletal tracking algorithm, which is in turn exploited for driving the re-identification. A new geometry-based method for joining the detections provided by the skeletal tracker from multiple video flows is introduced. It is capable of dealing with many people in the scene and of coping with the errors introduced in each view by the skeletal tracker. Such a method has a high degree of generality and can be applied to any kind of body pose estimation algorithm. The system was tested on a public dataset for video surveillance applications, demonstrating the improvements achieved by the multi-viewpoint approach in the accuracy of both body pose estimation and re-identification. The proposed approach was also compared with a skeletal tracking system working on 3D data, and the comparison confirmed the good performance level of the multi-viewpoint approach. This means that the lack of the rich information provided by 3D sensors can be compensated for by the availability of more than one viewpoint.
Learning nuisances to track pedestrians in autonomous vehicles
Autonomous vehicles rely on an accurate perception module. One of the fundamental challenges is to efficiently track pedestrians surrounding a vehicle in order to anticipate risky situations. Over the past decades, researchers have formulated the tracking problem as a data association problem and have proposed various representations aiming for invariance to nuisances such as viewpoint changes, body deformation, object occlusion, and illumination changes. However, these methods still struggle to handle abrupt changes, since they do not explicitly model the nature of the nuisances. In this work, we propose to train a classifier that recognizes these nuisances, more specifically the rotational body deformation of pedestrians. We aim to detect deformations as a way to find a good representation that will lead to better tracking of pedestrians as well as to improvements in other tasks.
Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification
Gait-based person re-identification (Re-ID) is valuable for safety-critical
applications, and using only 3D skeleton data to extract discriminative gait
features for person Re-ID is an emerging open topic. Existing methods either
adopt hand-crafted features or learn gait features by traditional supervised
learning paradigms. Unlike previous methods, we for the first time propose a
generic gait encoding approach that can utilize unlabeled skeleton data to
learn gait representations in a self-supervised manner. Specifically, we first
propose to introduce self-supervision by learning to reconstruct input skeleton
sequences in reverse order, which facilitates learning richer high-level
semantics and better gait representations. Second, inspired by the fact that
motion's continuity endows temporally adjacent skeletons with higher
correlations ("locality"), we propose a locality-aware attention mechanism that
encourages learning larger attention weights for temporally adjacent skeletons
when reconstructing the current skeleton, so as to learn locality when encoding
gait. Finally, we propose Attention-based Gait Encodings (AGEs), which are
built using context vectors learned by locality-aware attention, as final gait
representations. AGEs are directly utilized to realize effective person Re-ID.
Our approach typically improves existing skeleton-based methods by 10-20%
Rank-1 accuracy, and it achieves comparable or even superior performance to
multi-modal methods with extra RGB or depth information. Our code is
available at https://github.com/Kali-Hac/SGE-LA.
Comment: Accepted at IJCAI 2020 Main Track. Sole copyright holder is IJCAI.
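The two ideas named in this abstract, reverse-order reconstruction and locality, can be illustrated with a minimal NumPy sketch. It assumes a skeleton sequence is stored as an array of shape (frames, joints, 3). The function names `reverse_target` and `locality_prior` are hypothetical, and the Gaussian-shaped weighting is only an illustrative stand-in for "larger weights for temporally adjacent skeletons"; it is not the attention mechanism or the loss used by the authors.

```python
import numpy as np

def reverse_target(seq):
    """Return the frames of a skeleton sequence in reverse order.

    In a reverse-reconstruction setup, this would be the sequence the
    model is asked to reproduce from the forward-ordered input.
    """
    return seq[::-1].copy()

def locality_prior(num_frames, sigma=1.0):
    """Row-normalized weights that favor temporally adjacent frames.

    Assumption: a Gaussian over the frame-index distance, chosen only
    for illustration.
    """
    idx = np.arange(num_frames)
    dist = np.abs(idx[:, None] - idx[None, :])
    weights = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum(axis=1, keepdims=True)

# Toy sequence: 4 frames, 2 joints, 3 coordinates per joint.
seq = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
target = reverse_target(seq)
prior = locality_prior(4)
```

Each row of `prior` sums to one, and the weight a frame assigns to another frame shrinks as their temporal distance grows.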
AveRobot: An audio-visual dataset for people re-identification and verification in human-robot interaction
Intelligent technologies have pervaded our daily life, making it easier for people to complete their activities. One emerging application involves the use of robots for assisting people in various tasks (e.g., visiting a museum). In this context, it is crucial to enable robots to correctly identify people. Existing robots often use facial information to establish the identity of a person of interest. However, the face alone may not offer enough relevant information due to variations in pose, illumination, resolution, and recording distance. Other biometric modalities, such as the voice, can improve the recognition performance in these conditions. Yet the existing datasets for robotic scenarios usually do not include the audio cue and tend to suffer from one or more limitations: most of them are acquired under controlled conditions, limited in the number of identities or samples per user, collected by the same recording device, and/or not freely available. In this paper, we propose AveRobot, an audio-visual dataset of 111 participants vocalizing short sentences under robot assistance scenarios. The collection took place in a three-floor building using eight different cameras with built-in microphones. The performance of face and voice re-identification and verification was evaluated on this dataset with deep learning baselines and compared against audio-visual datasets from diverse scenarios. The results showed that AveRobot is a challenging dataset for people re-identification and verification.
LiDAR-based Person Re-identification
Camera-based person re-identification (ReID) systems have been widely applied
in the field of public security. However, cameras often lack the perception of
3D morphological information of humans and are susceptible to various
limitations, such as inadequate illumination, complex background, and personal
privacy. In this paper, we propose a LiDAR-based ReID framework, ReID3D, that
utilizes a pre-training strategy to retrieve features of 3D body shape and
introduces a Graph-based Complementary Enhancement Encoder for extracting
comprehensive features. Due to the lack of LiDAR datasets, we build LReID, the
first LiDAR-based person ReID dataset, which is collected in several outdoor
scenes with variations in natural conditions. Additionally, we introduce
LReID-sync, a simulated pedestrian dataset designed for pre-training encoders
with tasks of point cloud completion and shape parameter learning. Extensive
experiments on LReID show that ReID3D achieves exceptional performance with a
rank-1 accuracy of 94.0, highlighting the significant potential of LiDAR in
addressing person ReID tasks. To the best of our knowledge, we are the first to
propose a solution for LiDAR-based ReID. The code and datasets will be released
soon.
Hand-Crafted System for Person Re-Identification: A Comprehensive Review
In video surveillance, person re-identification (Re-ID) consists of recognizing an individual who has already been observed (hence the term re-identification) over a network of cameras. Usually, a person Re-ID system is divided into two stages: (i) constructing a person's appearance signature by extracting feature representations that should be robust against pose variations, illumination changes, and occlusions, and (ii) establishing the correspondence between the feature representations of the probe and the gallery by learning similarity metrics or ranking functions. A gallery is a dataset composed of images of people with known IDs, whereas a probe is a collection of detected persons with unknown IDs from different cameras. Specifically, person re-identification aims at matching individuals across non-overlapping cameras at different instants and locations. However, the matching is challenging due to disparities between human bodies and visual ambiguities across different cameras. This paper provides an overview of hand-crafted systems for person re-identification, including feature extraction and metric learning, as well as their advantages and drawbacks. The performance of some state-of-the-art person Re-ID methods on the commonly used benchmark datasets is compared and analyzed. The paper also provides a starting point for researchers who want to conduct novel investigations on this challenging topic.
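The probe-to-gallery matching stage, and the Rank-1 accuracy figure quoted by several abstracts in this list, can be sketched as follows. This is a minimal illustration that assumes feature vectors have already been extracted; it uses plain cosine similarity in place of a learned similarity metric, and the function name `rank1_accuracy` and the toy data are hypothetical. Benchmark protocols often add rules such as excluding same-camera matches, which this sketch ignores.

```python
import numpy as np

def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose most similar gallery entry has the same ID."""
    # L2-normalize rows so that the dot product equals cosine similarity.
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    similarity = p @ g.T                  # shape: (num_probes, num_gallery)
    best_match = similarity.argmax(axis=1)
    return float(np.mean(gallery_ids[best_match] == probe_ids))

# Toy gallery: three known identities with 2-D appearance features.
gallery_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gallery_ids = np.array([10, 20, 30])

# Toy probe: the third sample is deliberately closest to the wrong identity.
probe_feats = np.array([[0.9, 0.1], [0.1, 0.8], [1.0, -0.2]])
probe_ids = np.array([10, 20, 30])

acc = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
```

Here two of the three probes retrieve the correct identity first, so `acc` is 2/3.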