Features for Multi-Target Multi-Camera Tracking and Re-Identification
Multi-Target Multi-Camera Tracking (MTMCT) tracks many people through video
taken from several cameras. Person Re-Identification (Re-ID) retrieves, from a
gallery, images of people similar to a query image of a person. We learn good
features for both MTMCT and Re-ID with a convolutional neural network. Our
contributions include an adaptive weighted triplet loss for training and a new
technique for hard-identity mining. Our method outperforms the state of the art
both on the DukeMTMC benchmarks for tracking, and on the Market-1501 and
DukeMTMC-ReID benchmarks for Re-ID. We examine the correlation between good
Re-ID and good MTMCT scores, and perform ablation studies to elucidate the
contributions of the main components of our system. Code is available. Comment: Accepted as spotlight at CVPR 201
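The abstract names an adaptive weighted triplet loss without defining it. The sketch below shows one plausible formulation, assuming that positive distances are weighted by a softmax (so the farthest, hardest positives dominate), negative distances by a softmin (so the closest, hardest negatives dominate), and that a softplus acts as a soft margin. All function names are hypothetical, and this is not the authors' released code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax(values: List[float]) -> List[float]:
    """Normalised exponential weights; larger values receive larger weights."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def adaptive_weighted_triplet_loss(
    anchor: Vector, positives: List[Vector], negatives: List[Vector]
) -> float:
    """Weighted triplet loss for a single anchor (illustrative sketch)."""
    d_pos = [euclidean(anchor, p) for p in positives]
    d_neg = [euclidean(anchor, n) for n in negatives]
    # Softmax over positive distances: the farthest (hardest) positives dominate.
    w_pos = softmax(d_pos)
    # Softmin over negative distances: the closest (hardest) negatives dominate.
    w_neg = softmax([-d for d in d_neg])
    pos_term = sum(w * d for w, d in zip(w_pos, d_pos))
    neg_term = sum(w * d for w, d in zip(w_neg, d_neg))
    # Soft margin: near zero when weighted positives are much closer than
    # weighted negatives, growing roughly linearly otherwise.
    return softplus(pos_term - neg_term)
```

With a single positive and a single negative, the weights are both 1 and the loss reduces to softplus(d_pos - d_neg). The adaptive weights only matter when a batch contains several samples of each kind.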
Appearance Descriptors for Person Re-identification: a Comprehensive Review
In video surveillance, person re-identification is the task of recognising
whether an individual has already been observed over a network of cameras.
Typically, this is achieved by exploiting the clothing appearance, as classical
biometric traits like the face are impractical in real-world video surveillance
scenarios. Clothing appearance is represented by means of low-level
local and/or global features of the image, usually extracted
according to some part-based body model to treat different body parts (e.g.
torso and legs) independently. This paper provides a comprehensive review of
current approaches to build appearance descriptors for person
re-identification. The most relevant techniques are described in detail, and
categorised according to the body models and features used. The aim of this
work is to provide a structured body of knowledge and a starting point for
researchers wishing to conduct novel investigations on this challenging topic.
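As a generic illustration of the part-based idea (not any specific descriptor from the review), the sketch below splits a person image into two horizontal bands standing in for torso and legs, computes a joint RGB colour histogram per band, and compares two descriptors part by part with the Hellinger distance. The band boundaries, bin count, and function names are hypothetical choices made for this example.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255
Image = List[List[Pixel]]     # rows of pixels, top to bottom

# Hypothetical two-part body model: horizontal bands as fractions of image height.
PARTS: Dict[str, Tuple[float, float]] = {"torso": (0.15, 0.55), "legs": (0.55, 1.0)}


def colour_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Joint RGB histogram with `bins` levels per channel, L1-normalised."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..bins-1.
        ri, gi, bi = (c * bins // 256 for c in (r, g, b))
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = sum(hist) or 1.0  # avoid division by zero for empty parts
    return [h / total for h in hist]


def part_descriptor(image: Image, bins: int = 4) -> Dict[str, List[float]]:
    """One colour histogram per body part, each computed independently."""
    height = len(image)
    descriptor = {}
    for name, (top, bottom) in PARTS.items():
        rows = image[int(top * height):int(bottom * height)]
        descriptor[name] = colour_histogram([p for row in rows for p in row], bins)
    return descriptor


def hellinger(p: List[float], q: List[float]) -> float:
    """Hellinger distance between two normalised histograms (0 = identical)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def descriptor_distance(d1: Dict[str, List[float]], d2: Dict[str, List[float]]) -> float:
    """Average of per-part distances: each part is compared only with the same part."""
    return sum(hellinger(d1[name], d2[name]) for name in PARTS) / len(PARTS)
```

Comparing parts separately is what lets a red shirt over blue trousers differ from a blue shirt over red trousers, even though a single global histogram of the two images could be nearly identical.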
Dissimilarity-based people re-identification and search for intelligent video surveillance
Intelligent video surveillance is currently one of the most active research fields in computer science. It brings together a wide variety of computer vision and machine learning techniques to provide useful tools for surveillance operators and forensic video analytics. Person
re-identification is among these tools; it consists of recognising whether an individual has already been observed over a network of cameras. Person re-identification has various possible applications, e.g., off-line retrieval of all the video sequences showing an individual of interest whose image is given as a query, or on-line pedestrian tracking over multiple cameras. The task is typically achieved by exploiting the clothing appearance, as classical biometric traits like the face are impractical in real-world video surveillance scenarios. Clothing appearance
is represented by means of low-level local and global features of the images, usually extracted according to some part-based body model to treat different body parts (e.g.
torso and legs) independently. The use of novel sensor technologies, e.g. RGB-D cameras like the MS Kinect, could also allow for the extraction of anthropometric measures from a reconstructed 3D model of the body, which can be used in combination with the clothing appearance to increase recognition accuracy. This thesis presents a novel framework, named Multiple Component Dissimilarity (MCD),
to construct descriptors of images of persons, using dissimilarity representations, a recent paradigm in machine learning in which the objects of interest are described as vectors of dissimilarities to a set of predefined prototypes. MCD extends the original dissimilarity
paradigm to objects decomposable into multiple parts and with localised characteristics, to better deal with the peculiarities of the human body. The use of MCD has at least three important advantages:
(i) a drastic reduction of computational needs, mostly due to the compactness of dissimilarity
representations (essentially, small vectors of real numbers that are easy to store and very fast to match);
(ii) a fully generic formulation of the underlying low-level representation, which allows
one to combine different descriptors, even if they are heterogeneous in terms of the
model and features used, into a single dissimilarity vector;
(iii) a natural way to learn high-level concepts from low-level representations. Building on these features, MCD is used in this thesis to achieve several objectives:
(i) develop an approach to speed up existing person re-identification methods;
(ii) implement a novel person re-identification method based on the combination of different
local and global features into a single dissimilarity vector, able to attain state-of-the-art performance;
(iii) develop a multi-modal approach to person re-identification (a novelty in the literature),
by combining the clothing appearance with anthropometric measures extracted
through novel RGB-D sensors into a single dissimilarity vector;
(iv) develop a method to perform a novel task, proposed for the first time in this thesis,
consisting of finding, among a set of images of individuals, those relevant to a textual, semantic query describing the clothing appearance of an individual of interest. This task has been named appearance-based people search and can be useful in applications like
forensic video analysis, where a textual description of the individual of interest given by a witness may be available instead of an image.
Person re-identification and appearance-based people search are different tasks, aimed at addressing different problems. Still, they can be seen as instances of the more general problem of searching and matching people in multimedia data, e.g., video footage, range/depth data, and speech audio data. Building on the commonalities with Information Retrieval, the final part of the thesis proposes a possible formulation of the task of people search in multimedia data, with some suggestions and guidelines on how to exploit the
MCD framework to address this novel class of problems.
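The abstract describes MCD only at a high level. The sketch below illustrates the general dissimilarity-representation idea it builds on, under assumptions of our own: each body part is a set of local feature vectors, each part has its own fixed list of prototypes, and the dissimilarity of a part to a prototype is taken as the distance to the closest local vector. These choices, and all function names, are hypothetical and not necessarily those of the thesis. Per-part dissimilarities are concatenated into one fixed-length vector, so matching reduces to comparing short vectors of real numbers.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def part_dissimilarities(components: List[Vector], prototypes: List[Vector]) -> List[float]:
    """Dissimilarity of one body part to each of its prototypes.

    Assumption for this sketch: a part is a set of local feature vectors, and its
    dissimilarity to a prototype is the distance to the closest of those vectors.
    """
    return [min(euclidean(c, p) for c in components) for p in prototypes]


def dissimilarity_vector(parts: Dict[str, List[Vector]],
                         prototypes: Dict[str, List[Vector]]) -> List[float]:
    """Concatenate per-part dissimilarities into one fixed-length vector."""
    vector: List[float] = []
    for name in sorted(prototypes):  # fixed part order so vectors are comparable
        vector.extend(part_dissimilarities(parts[name], prototypes[name]))
    return vector


def rank_gallery(probe: List[float], gallery: List[List[float]]) -> List[int]:
    """Gallery indices sorted from most to least similar to the probe."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(probe, gallery[i]))
```

The length of every vector equals the total number of prototypes, independent of how many local features each image contains, which is where the compactness claimed in the abstract comes from. Heterogeneous information, such as anthropometric measures from RGB-D data, could in principle be added by appending its own dissimilarities to the same vector.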