Optimisation of people tracking in a camera network
This thesis addresses the problem of improving the performance of the people-tracking process in a network of cameras, through a new framework called the Global Tracker, which evaluates the quality of the trajectories produced by a simple tracker and recovers potential errors from that first stage. The first part of the Global Tracker estimates the quality of the tracking results using a statistical model that analyses the distributions of the target's features (such as its dimensions, speed, and direction) to detect potential anomalies. To differentiate real errors from natural phenomena, we analyse all the interactions between the tracked object and its surroundings (other moving objects and background elements). In the second part, a post-tracking method associates different tracklets (reliable trajectory segments) corresponding to the same person that were not linked by the first tracking stage. This tracklet-matching process selects the most salient and discriminative appearance features to compute a visual signature adapted to each tracklet. Finally, the Global Tracker is evaluated on several benchmark datasets reproducing a wide variety of real-life situations; across these experiments its performance equals or exceeds that of state-of-the-art trackers.
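The quality-estimation step described above, flagging frames whose feature values fall outside the target's usual distribution, can be sketched minimally in Python. This is a toy z-score test on illustrative features; the thesis's actual statistical model is richer and also accounts for interactions with surrounding objects.

```python
import numpy as np

def anomaly_flags(features, threshold=3.0):
    """Flag frames whose target features (e.g. width, height, speed)
    deviate strongly from the trajectory's own distribution.
    A minimal sketch of the quality-estimation idea; the threshold
    and the choice of features are illustrative assumptions."""
    features = np.asarray(features, dtype=float)   # shape (frames, n_features)
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-9            # avoid division by zero
    z = np.abs(features - mu) / sigma
    # A frame is anomalous if any feature deviates beyond the threshold.
    return (z > threshold).any(axis=1)
```

A downstream step would then inspect the flagged frames' surroundings to decide whether the deviation is a real tracking error or a natural phenomenon such as an occlusion.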
Person re-Identification over distributed spaces and time
Replicating the human visual system and cognitive abilities that the brain uses to process the
information it receives is an area of substantial scientific interest. With the prevalence of video
surveillance cameras a portion of this scientific drive has been into providing useful automated
counterparts to human operators. A prominent task in visual surveillance is that of matching
people between disjoint camera views, or re-identification. This allows operators to locate people
of interest, to track people across cameras and can be used as a precursory step to multi-camera
activity analysis. However, due to the contrasting conditions between camera views and their
effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes
solutions for reducing the visual ambiguity in observations of people between camera views.
This thesis first looks at a method for mitigating the effects of differing lighting conditions
between camera views on the appearance of people. It builds on work modelling
inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer
Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited
training samples. Unlike previous methods that use a mean-based representation for a set of
training samples, the cumulative nature of the CBTF retains colour information from underrepresented
samples in the training set. Additionally, the bi-directionality of the mapping function
is explored to maximise re-identification accuracy by ensuring samples are accurately
mapped between cameras.
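The cumulative idea behind the CBTF can be sketched as follows: accumulate brightness histograms over all training samples first, then match the two cumulative distributions. This is a minimal illustration with assumed variable names; per-channel application and the bi-directional variant are omitted.

```python
import numpy as np

def cbtf(source_vals, target_vals, levels=256):
    """Estimate a brightness transfer function between two camera
    views from corresponding brightness samples, using cumulative
    histograms (a sketch of the CBTF idea, not the thesis's exact
    implementation)."""
    # Accumulate histograms over ALL training samples before mapping,
    # so that under-represented colours still contribute (the key
    # difference from averaging per-pair transfer functions).
    h_s, _ = np.histogram(source_vals, bins=levels, range=(0, levels))
    h_t, _ = np.histogram(target_vals, bins=levels, range=(0, levels))
    # Normalised cumulative distributions.
    c_s = np.cumsum(h_s) / max(h_s.sum(), 1)
    c_t = np.cumsum(h_t) / max(h_t.sum(), 1)
    # Map each source level b to the target level whose cumulative
    # count first reaches c_s[b].
    return np.array([np.searchsorted(c_t, c_s[b]) for b in range(levels)])
```

In a hypothetical usage, the inputs would be brightness samples of the same people seen in two cameras; the returned lookup table maps camera-A values into camera-B's brightness space, one table per colour channel.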
Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing
lighting conditions within a single camera. As the CBTF requires manually labelled training
samples it is limited to static lighting conditions and is less effective if the lighting changes. This
Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting
change over time, or rely on camera transition time information to update. By utilising contextual
information drawn from the background in each camera view, an estimation of the lighting
change within a single camera can be made. This background lighting model allows the mapping
of colour information back to the original training conditions and thus removes the need for
retraining.
Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous
methods use a score based on a direct distance measure over a set of features to form a correct/incorrect
match result. Rather than offering an operator a single outcome, the ranking paradigm is to give
the operator a ranked list of possible matches and allow them to make the final decision. By utilising
a Support Vector Machine (SVM) ranking method, a weighting on the appearance features
can be learned that capitalises on the fact that not all image features are equally important to
re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues
by separating the training samples into smaller subsets and boosting the trained models.
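The ranking formulation can be illustrated with a toy pairwise ranker: learn a weight vector so that difference vectors from correct matches score above those from incorrect ones. This is a sub-gradient hinge-loss stand-in for the SVM solver the thesis uses; all names and hyperparameters are illustrative.

```python
import numpy as np

def train_rank_weights(relevant, irrelevant, epochs=500, lr=0.01):
    """Learn weights w so that relevant probe-gallery difference
    vectors score higher than irrelevant ones, via sub-gradient
    descent on the pairwise hinge loss max(0, 1 - w.(x_rel - x_irr)).
    A sketch of the RankSVM idea, not the thesis's solver."""
    relevant = np.asarray(relevant, float)
    irrelevant = np.asarray(irrelevant, float)
    w = np.zeros(relevant.shape[1])
    for _ in range(epochs):
        for xr in relevant:
            for xi in irrelevant:
                margin = w @ (xr - xi)
                if margin < 1.0:          # violated ranking constraint
                    w += lr * (xr - xi)   # hinge sub-gradient step
    return w
```

The learned weighting captures the observation in the text that not all image features are equally important to re-identification; the Ensemble-RankSVM variant would train such models on subsets and boost them.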
Finally, the thesis looks at a practical, real-world application of the ranking paradigm.
The system encompasses both the re-identification stage and the precursory extraction
and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined
to extract relevant information from the video, while several matching
techniques are combined with temporal priors to form a more comprehensive overall matching
criterion.
The effectiveness of the proposed approaches is tested on datasets obtained from a variety
of challenging environments including offices, apartment buildings, airports and outdoor public
spaces.
Re-identifying people in the crowd
Developing an automated surveillance system is of great interest for various reasons including forensic and security applications. In the case of a network of surveillance cameras with non-overlapping fields of view, person detection and tracking alone are insufficient to track a subject of interest across the network. In this case, instances of a person captured in one camera view need to be retrieved among a gallery of different people, in other camera views. This vision problem is commonly known as person re-identification (re-id).
Cross-view instances of pedestrians exhibit varied levels of illumination, viewpoint, and pose variations which makes the problem very challenging. Despite recent progress towards improving accuracy, existing systems suffer from low applicability to real-world scenarios. This is mainly caused by the need for large amounts of annotated data from pairwise camera views to be available for training. Given the difficulty of obtaining such data and annotating it, this thesis aims to bring the person re-id problem a step closer to real-world deployment.
In the first contribution, the single-shot protocol, where each individual is represented by a pair of images that need to be matched, is considered. Following the extensive annotation of four datasets for six attributes, an evaluation of the most widely used feature extraction schemes is conducted. The results reveal two high-performing descriptors among those evaluated, and show illumination variation to have the most impact on re-id accuracy.
Motivated by the wide availability of videos from surveillance cameras and the additional visual and temporal information they provide, video-based person re-id is then investigated, and a supervised system is developed. This is achieved by improving and extending the best-performing image-based person descriptor into three dimensions and combining it with distance metric learning. The system obtained achieves state-of-the-art results on two widely used datasets.
Given the cost and difficulty of obtaining labelled data from pairwise cameras in a network to train the model, an unsupervised video-based person re-id method is also developed. It is based on a set-based distance measure that leverages rank vectors to estimate the similarity scores between person tracklets. The proposed system outperforms other unsupervised methods by a large margin on two datasets, while competing with deep learning methods on another large-scale dataset.
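One plausible reading of the rank-vector idea can be sketched as follows: rank a shared gallery by distance from each tracklet, then compare the resulting rank vectors, since tracklets of the same person should induce similar rankings. This is a hedged illustration, not the thesis's exact measure; the comparison function and features are assumptions.

```python
import numpy as np

def rank_vector(query_feat, gallery_feats):
    """Rank gallery tracklet features by distance to the query; the
    returned vector holds each gallery item's rank (0 = closest)."""
    d = np.linalg.norm(gallery_feats - query_feat, axis=1)
    ranks = np.empty(len(d), dtype=int)
    ranks[np.argsort(d)] = np.arange(len(d))
    return ranks

def rank_similarity(r1, r2):
    """Similarity between two rank vectors (here simply the negative
    L1 distance): tracklets of the same person should rank a shared
    gallery similarly, needing no pairwise-camera labels."""
    return -np.abs(r1 - r2).sum()
```

Because the comparison is between rankings rather than raw distances, it is insensitive to per-camera scale differences in the feature space, which is one motivation for rank-based unsupervised matching.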
Smart video surveillance of pedestrians: fixed, aerial, and multi-camera methods
Crowd analysis from video footage is an active research topic in the field of computer vision. Crowds can be analysed using different approaches, depending on their characteristics. Furthermore, analysis can be performed on footage obtained from different sources: fixed CCTV cameras can be used, as well as cameras mounted on moving vehicles. To begin, a literature review is provided, in which research in the fields of crowd analysis, object and people tracking, occlusion handling, multi-view and sensor fusion, and multi-target tracking is analysed and compared, and its advantages and limitations highlighted. Following that, the three contributions of this thesis are presented: in a first study, crowds are classified based on various cues (e.g. density, entropy), so that the best approaches for further behaviour analysis can be selected; then, some of the challenges of individual target tracking from aerial video footage are tackled; finally, a study on the analysis of groups of people from multiple cameras is proposed. The analysis covers the movements of people and objects in the scene; the idea is to track as many people as possible within the crowd, to obtain knowledge from their movements as a group, and to classify different types of scenes. An additional contribution of this thesis is two novel datasets: on the one hand, a set to test the proposed aerial video analysis methods; on the other, a set to validate the third study, with groups of people recorded from multiple overlapping cameras performing different actions.
Few-Shot Deep Adversarial Learning for Video-based Person Re-identification
Video-based person re-identification (re-ID) refers to matching people across
camera views from arbitrary unaligned video footage. Existing methods rely on
supervision signals to optimise a projected space under which the distances
between inter/intra-video pairs are maximised/minimised. However, this demands
exhaustively labelling people across camera views, rendering such methods
unable to scale to large camera networks. Moreover, learning video
representations with view invariance is not explicitly addressed, and features
otherwise exhibit different distributions across views. Thus, matching videos
for person re-ID demands flexible models to capture the dynamics in time-series
observations and learn view-invariant representations with access to limited
labeled training samples. In this paper, we propose a novel few-shot deep
learning approach to video-based person re-ID, to learn comparable
representations that are discriminative and view-invariant. The proposed method
is developed on the variational recurrent neural networks (VRNNs) and trained
adversarially to produce latent variables with temporal dependencies that are
highly discriminative yet view-invariant in matching persons. Through extensive
experiments conducted on three benchmark datasets, we empirically show the
capability of our method in creating view-invariant temporal features and
state-of-the-art performance achieved by our method.
Comment: Appearing at IEEE Transactions on Image Processing
Understanding Target Trajectory Behavior: A Dynamic Scene Modeling Approach
Human behavior analysis is one of the most active computer vision research fields. As the number of cameras increases, especially in controlled environments such as airports, train stations, or museums, automatic systems that can catalogue the information provided by the cameras become crucial. In crowded scenes, it is very difficult to distinguish people's behavior from their gestures, because the whole body is not visible. Thus, behavior analysis relies on the evaluation of trajectories, adding high-level reasoning techniques in order to use that information in several applications, such as video surveillance or traffic analysis.
The proposal of this research is the design of a fully automatic human behavior analysis system operating at a distance. On the one hand, two different multiple-target tracking methods and a novel person re-identification procedure are presented to detect every target of interest in the scene, returning their trajectories as output. On the other hand, a novel behavior analysis system, which includes information about the scene's environment, is provided. It is based on the idea that every person trying to reach a goal in the scene tends to follow the same path that the majority of people use. An extremely fast set of abnormal-movement metrics is presented, providing the method with the capabilities needed to be used in real-time scenarios.
A Multi-Resident Number Estimation Method for Smart Homes
Population aging requires innovative solutions to increase the quality of life and preserve autonomous and independent living at home. A need of particular significance is the identification of behavioral drifts. A relevant behavioral drift concerns sociality: older people tend to isolate themselves. There is therefore the need to find methodologies to identify if, when, and for how long the person is in the company of other people (possibly, also considering their number). The challenge is to address this task in poorly sensorized apartments, with non-intrusive sensors that are typically wireless and can only provide local and simple information. The proposed method addresses technological issues, such as PIR (Passive InfraRed) blind times, topological issues, such as sensor interference due to the inability to separate detection areas, and algorithmic issues. The house is modeled as a graph to constrain transitions between adjacent rooms. Each room is associated with a set of values, one for each identified person. These values decay over time and represent the probability that each person is still in the room. Because the sensors used cannot determine the number of people, the approach is based on a multi-branch inference that, over time, differentiates the movements in the apartment and estimates the number of people. The proposed algorithm has been validated with real data, obtaining an accuracy of 86.8%.
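The graph-and-decay part of the model can be sketched as follows. Room names, the adjacency map, and the decay constant are all illustrative assumptions, and the multi-branch inference over the number of people is omitted.

```python
import math

# The house as a graph of rooms: a PIR event can only move a person
# between adjacent rooms (illustrative layout, not from the paper).
ADJACENT = {
    "hall": {"kitchen", "bedroom"},
    "kitchen": {"hall"},
    "bedroom": {"hall"},
}

class OccupancyModel:
    def __init__(self, decay=0.1):
        self.decay = decay          # assumed decay constant (1/seconds)
        # presence[person][room] -> probability the person is still there
        self.presence = {}

    def sensor_fired(self, person, room, dt):
        """Update after a PIR event: decay the old presence values with
        elapsed time dt, then move the person's mass to the fired room
        if the transition is topologically feasible. Returns the most
        likely room for the person."""
        rooms = self.presence.setdefault(person, {room: 1.0})
        # Exponential decay of presence with elapsed time.
        for r in rooms:
            rooms[r] *= math.exp(-self.decay * dt)
        prev = max(rooms, key=rooms.get)
        # Accept only transitions between adjacent rooms (or staying put);
        # an infeasible event is treated as interference and ignored.
        if room == prev or room in ADJACENT.get(prev, set()):
            rooms.clear()
            rooms[room] = 1.0
        return max(rooms, key=rooms.get)
```

Rejecting topologically impossible transitions is one simple way the graph structure compensates for PIR interference between rooms whose detection areas overlap.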