21 research outputs found

    QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios

    Concerns about individuals' security have justified the increasing number of surveillance cameras deployed in both private and public spaces. However, contrary to popular belief, these devices are in most cases used solely for recording, rather than feeding intelligent analysis processes capable of extracting information about the observed individuals. Thus, even though video surveillance has already proved essential for solving multiple crimes, obtaining relevant details about the subjects involved in a crime still depends on the manual inspection of recordings. As such, the current goal of the research community is the development of automated surveillance systems capable of monitoring and identifying subjects in surveillance scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric recognition algorithms on data acquired in surveillance scenarios. In particular, we aim at designing a visual surveillance system capable of acquiring biometric data at a distance (e.g., face, iris or gait) without requiring human intervention in the process, as well as devising biometric recognition methods robust to the degradation factors resulting from the unconstrained acquisition process. Regarding the first goal, the analysis of data acquired by typical surveillance systems shows that large acquisition distances significantly decrease the resolution of biometric samples, leaving them insufficiently discriminative for recognition purposes. In the literature, diverse works identify Pan-Tilt-Zoom (PTZ) cameras as the most practical way of acquiring high-resolution imagery at a distance, particularly when used in a master-slave configuration. In this configuration, the video acquired by a typical surveillance camera is analyzed to obtain regions of interest (e.g., car, person), which are subsequently imaged at high resolution by the PTZ camera.
Several methods have already shown that this configuration can be used for acquiring biometric data at a distance. Nevertheless, these methods failed to provide effective solutions to the typical challenges of this strategy, restricting its use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development of a biometric data acquisition system based on the cooperation of a PTZ camera with a typical surveillance camera. The first proposal is a camera calibration method capable of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ camera, without the aid of additional optical devices. The second proposal is a camera scheduling method for determining, in real time, the sequence of acquisitions that maximizes the number of different targets observed while minimizing the cumulative transition time. In order to achieve the first goal of this thesis, both methods were combined with state-of-the-art approaches from the human monitoring field to develop a fully automated surveillance system, designated QUIS-CAMPI, capable of acquiring biometric data at a distance and without human cooperation. The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis of the performance of state-of-the-art biometric recognition approaches shows that they attain almost ideal recognition rates on unconstrained data (e.g., recognition rates above 99% on the LFW dataset). However, this performance is inconsistent with the recognition rates observed in surveillance scenarios, suggesting that current datasets do not truly contain the degradation factors typical of those scenarios. Taking this drawback of current biometric datasets into account, this thesis introduces a novel dataset comprising biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at distances ranging from 5 to 40 meters and without human intervention in the acquisition process. This set allows an objective assessment of the performance of state-of-the-art biometric recognition methods on data that truly encompass the covariates of surveillance scenarios.
As such, this set was used to promote the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained by the nine methods specially designed for this competition. In addition, the data acquired by the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the development of methods robust to the covariates of surveillance scenarios. The first proposal is a method for detecting corrupted features in biometric signatures by analyzing the redundancy among feature subsets. The second proposal is a caricature-based face recognition approach that enhances recognition performance by automatically generating a caricature from a single 2D photo. The experimental evaluation of these methods shows that both approaches improve recognition performance on unconstrained data.
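The camera scheduling goal described above (maximizing the number of distinct targets imaged while minimizing cumulative PTZ transition time) can be sketched as a greedy nearest-target heuristic. This is an illustrative simplification, not the thesis's actual real-time algorithm; all names and the distance model are assumptions:

```python
import math

def greedy_schedule(current_pan_tilt, targets, transition_speed=1.0):
    """Greedily order PTZ acquisitions: always visit the closest
    unvisited target next, approximating the stated goal of imaging
    as many distinct subjects as possible with minimal transition time.
    `targets` maps a target id to its (pan, tilt) angles."""
    order, total_time = [], 0.0
    pos = current_pan_tilt
    remaining = dict(targets)
    while remaining:
        # pick the target with the smallest angular transition cost
        tid = min(remaining, key=lambda t: math.dist(pos, remaining[t]))
        total_time += math.dist(pos, remaining[tid]) / transition_speed
        pos = remaining.pop(tid)
        order.append(tid)
    return order, total_time
```

A greedy policy is not optimal in general, but it runs in O(n²) time, which fits the real-time constraint the abstract emphasizes.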

    Robust real-time tracking in smart camera networks


    Motion-Based Vision Methods and Their Applications

    Motion detection is a basic video analytics operation on which many high-level computer vision tasks are built, e.g., pedestrian detection, anomaly detection, scene understanding and object tracking. Even though a large number of motion detection methods have been proposed in recent decades, some important questions remain unanswered, including: (1) how to separate the foreground from the background accurately, even under extremely challenging circumstances such as strong background motion and illumination changes? (2) how to evaluate different motion detection methods? and (3) how to use the motion information extracted by motion detection to improve high-level computer vision tasks? In this thesis, we address four problems related to motion detection: 1. How, and on which videos, can we benchmark motion detection methods? Current datasets are either too small, with a limited number of scenarios, or only provide bounding-box ground truth that indicates the rough location of foreground objects. As a solution, we built the largest and most objective motion detection dataset in the world, with pixel-accurate ground truth, to evaluate and compare motion detection methods, and organized an international competition on it (CVPR 2014). We also explored various evaluation metrics as well as different strategies for combining motion detection methods. 2. Providing pixel-accurate ground truth is a huge challenge when building a motion detection dataset. While automatic and semi-automatic labeling methods suffer from a false detection rate too large for ground-truth use, manually labeling hundreds of thousands of frames is extremely time consuming. To solve this problem, we proposed an interactive deep learning method for segmenting moving objects in videos. The proposed method reaches human-level accuracy while lowering the labeling time by a factor of 40. 3. Pedestrian detectors suffer from either false positive or false negative detections depending on the parameter tuning, and manually adjusting parameters for a large number of videos is not feasible in practice. In order to make pedestrian detectors more robust across a large variety of videos, we combined motion detection with various state-of-the-art pedestrian detectors through a novel motion-based nonlinear filtering process that improves the detectors by a significant margin. 4. Scene background initialization is the process by which a method tries to recover the background image of a video without the foreground objects in it. One of the reasons background modeling remains challenging is that there was no good dataset and benchmarking framework for estimating the performance of background modeling methods. To fix this problem, we proposed an extensive survey as well as a novel benchmarking framework for scene background initialization, built the largest dataset in the world for this type of application, and organized an international competition (ICPR 2016).
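The motion-based nonlinear filtering idea in point 3 can be illustrated with a toy rule that keeps only detections whose bounding box overlaps enough detected motion. This is a hedged sketch under my own assumptions (function name, box format and threshold are illustrative, not the thesis's formulation):

```python
import numpy as np

def filter_detections(boxes, scores, motion_mask, min_motion_ratio=0.2):
    """Suppress pedestrian detections whose bounding box contains too
    little detected motion. `boxes` are (x, y, w, h) tuples, `scores`
    the detector confidences, and `motion_mask` a binary foreground
    mask from a motion detection method."""
    kept = []
    for (x, y, w, h), s in zip(boxes, scores):
        roi = motion_mask[y:y + h, x:x + w]
        ratio = roi.mean() if roi.size else 0.0  # fraction of moving pixels
        if ratio >= min_motion_ratio:
            kept.append(((x, y, w, h), s))
    return kept
```

In practice a hard ratio threshold is the crudest form of such a filter; a nonlinear reweighting of the detector score by the motion evidence would be closer in spirit to the abstract's description.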

    A Background-Centric Approach for Detecting Moving Objects with a Moving Camera

    Doctoral dissertation, Department of Electrical and Computer Engineering, Graduate School of Seoul National University, February 2017. Advisor: Jin Young Choi. A number of surveillance cameras have been installed for safety and security in real environments. To achieve human-level visual intelligence via cameras, much effort has gone into developing computer vision algorithms that realize visual functions from low level to high level. Among them, moving object detection is a fundamental function, because attention to a moving object is essential for understanding its high-level behavior. Most moving object detection algorithms for a fixed camera adopt a background-centric modeling approach. However, the background-centric approach does not work well with a moving camera, because modeling a moving background in an online way is challenging. Until now, most algorithms for object detection with a moving camera have instead relied on an object-centric approach using appearance-based recognition schemes, which suffers from heavy computational complexity. In this thesis, we propose an efficient and robust scheme based on the background-centric approach to detect moving objects in dynamic background environments using moving cameras. To tackle the challenges arising from the dynamic background, we deal with four problems: false positives from inaccurate camera motion estimation; sudden scene changes such as illumination; objects moving slowly relative to the camera movement; and the limitations of the motion model in dashcam video. To solve the false positives caused by motion estimation error, we propose a new scheme that improves the robustness of moving object detection with a moving camera. To lessen the influence of background motion, we adopt a dual-mode kernel model that builds two background models using grid-based modeling. In addition, to reduce false detections and missed true objects, we introduce an attentional sampling scheme based on the spatio-temporal properties of moving objects. From these properties, we build a foreground probability map and generate a sampling map that selects candidate pixels for finding the actual objects; background subtraction and model updates are applied only to the selected pixels. To resolve the problems of sudden scene changes and slowly moving objects, we propose a situation-aware background learning method that handles dynamic scenes for moving object detection with a moving camera. New modules estimate situation variables and build the background model adaptively: the method compensates for camera movement and updates the background model according to the situation variables, which enables the algorithm to build a clean background model without contamination by the foreground. To overcome the limitation of the motion model in dashcam video, we propose a prior-based attentional update scheme that handles dynamic scene changes. Motivated by the center-focused and structure-focused tendencies of human attention, we extend the compensation-based method to concentrate on changes near the image center while neglecting minor changes on the important scene structure. The center-focused tendency is implemented by increasing the learning rate of the boundary region through multiplication of the attention map and the age model. The structure-focused tendency builds a robust background model through model selection after the road and sky regions are estimated. In experiments, the proposed framework shows its efficiency and robustness through qualitative and quantitative comparison with the state of the art. With the first scheme, processing one frame takes only 4.8 ms without parallel processing. The second scheme adapts to rapidly changing scenes while maintaining performance and speed.
Through the third scheme, targeting the driving situation, successful results are shown for background modeling and moving object detection in dashcam videos. Contents: 1 Introduction; 2 Problem Statements; 3 Dual Modeling with Attentional Sampling; 4 Situation-aware Background Learning; 5 Prior-based Attentional Update for Dashcam Video; 6 Experiments; 7 Concluding Remarks and Future Works; Bibliography.
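The age-model idea named in the outline (an adaptive learning rate that lets newly observed background adapt quickly while mature regions stay stable) can be sketched per pixel. This toy model is a minimal reconstruction under stated assumptions, not the grid-based dual-mode kernel model of the thesis:

```python
import numpy as np

class AgeBackgroundModel:
    """Toy per-pixel background model with an age-based learning rate:
    young pixels adapt fast (alpha = 1/age), old pixels become stable.
    Threshold and age cap are illustrative choices."""

    def __init__(self, first_frame, max_age=30, fg_threshold=30.0):
        self.mean = first_frame.astype(np.float64)
        self.age = np.ones(first_frame.shape)
        self.max_age = max_age
        self.fg_threshold = fg_threshold

    def apply(self, frame):
        frame = frame.astype(np.float64)
        foreground = np.abs(frame - self.mean) > self.fg_threshold
        alpha = 1.0 / self.age
        # blend only background pixels into the model, to avoid
        # contaminating the background with foreground appearance
        self.mean += np.where(foreground, 0.0, alpha * (frame - self.mean))
        self.age = np.minimum(self.age + 1, self.max_age)  # cap keeps it adaptive
        return foreground
```

Capping the age bounds the minimum learning rate, which is one way to keep the model responsive to the gradual scene changes a moving camera produces.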

    Unsupervised maritime target detection

    The unsupervised detection of maritime targets in greyscale video is a difficult problem in maritime video surveillance. Most approaches assume that the camera is static and employ pixel-wise background modelling techniques for foreground detection; other methods rely on colour or thermal information to detect targets. These methods fail in real-world situations where the static-camera assumption is violated and colour or thermal data are unavailable. In defence and security applications, prior information and training samples of targets may be unavailable for training a classifier, and learning a one-class classifier for the background may be impossible as well. Thus, an unsupervised online approach that learns from the scene data is highly desirable. In this thesis, the characteristics of the maritime scene and the ocean texture are exploited for foreground detection, and two fast and effective methods are investigated for target detection. Firstly, online region-based background texture models are explored for describing the appearance of the ocean. This approach avoids the need for frame registration because the model is built spatially rather than temporally. The texture appearance of the ocean is described using Local Binary Pattern (LBP) descriptors. Two models are proposed: one is a Gaussian Mixture Model (GMM) and the other, referred to as a Sparse Texture Model (STM), is a set of histogram texture distributions. The foreground detections are optimized using a Graph Cut (GC) that enforces spatial coherence. Secondly, feature tracking is investigated as a means of detecting stable features in an image frame that typically correspond to maritime targets, while unstable features correspond to background regions. This approach follows a Track-Before-Detect (TBD) concept and is implemented using a hierarchical scheme for motion estimation and matching of Scale-Invariant Feature Transform (SIFT) appearance features.
The experimental results show that these approaches are feasible for foreground detection in maritime video when the camera is either static or moving. Receiver Operating Characteristic (ROC) curves were generated for five test sequences, and the Area Under the ROC Curve (AUC) was analyzed to assess the performance of the proposed methods. The texture models, without GC optimization, achieved an AUC of 0.85 or greater on four of the five test videos. At a 50% True Positive Rate (TPR), these four test scenarios had a False Positive Rate (FPR) below 2%. With the GC optimization, an AUC greater than 0.8 was achieved for all test cases, and the FPR was reduced in every case compared to the results without the GC. Compared to the state of the art in background modelling for maritime scenes, our texture model methods achieved the best or comparable performance, and both executed at a reasonable processing frame rate. The experimental results for TBD show that target features can be detected using a simple track score based on track length; at 50% TPR, an FPR of less than 4% is achieved for four of the five test scenarios. These results are very promising for maritime target detection.
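The LBP descriptors used above reduce each pixel's neighbourhood to a binary code; histograms of these codes over a region form the texture distributions that the GMM and STM compare. A minimal sketch of the basic 8-neighbour code follows (the thesis may use a different LBP variant; this is the textbook formulation):

```python
import numpy as np

def lbp_code(patch):
    """Basic 8-neighbour Local Binary Pattern code for the centre pixel
    of a 3x3 patch: each neighbour contributes one bit, set when the
    neighbour is at least as bright as the centre."""
    c = patch[1, 1]
    # clockwise neighbour order starting at the top-left corner
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum((1 << i) for i, n in enumerate(neighbours) if n >= c)
```

Because the code depends only on sign comparisons against the centre pixel, it is invariant to monotonic illumination changes, which suits the constantly re-lit ocean surface.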

    Self-supervised foreground segmentation by sequences of images without camera motion

    Undeniably, the last decade has proven to be a success for deep-learning-based algorithms. They have positively impacted a wide range of fields, from computer vision and natural language processing to biomedical imaging and personalized medicine. Despite this, most of the tasks that have been solved by artificial neural networks rely on a vast amount of annotated samples, which require a lot of human work to obtain. This is where the need for new training schemes like self-supervised learning arises: they replace labeled data with a larger amount of samples and a strategy to extract meaningful information from them. In this project we propose a novel approach for image foreground segmentation that does not rely on already-segmented images for training. Instead, it exploits the similarity between sequences of images with common backgrounds to extract representations that successfully distinguish foreground regions. We then propose a method to cluster these regions to discover groups of similar types of objects. Although this approach only works with sets of images taken with a fixed camera, we take an extra step and suggest a method to generalize to unseen backgrounds, which allows us to test our results on established benchmarks.
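The self-supervision signal described above, a background shared across a fixed-camera sequence, can be sketched with a simple median-background rule that generates foreground pseudo-labels without manual annotation. This is an illustrative stand-in for the training signal, not the proposed network or its actual representation learning:

```python
import numpy as np

def pseudo_foreground_masks(frames, threshold=25.0):
    """Generate foreground pseudo-labels for a fixed-camera sequence:
    the per-pixel temporal median approximates the shared background,
    and large deviations from it mark foreground pixels. The threshold
    is an illustrative choice."""
    stack = np.stack([f.astype(np.float64) for f in frames])
    background = np.median(stack, axis=0)  # shared background estimate
    return [np.abs(f - background) > threshold for f in stack]
```

Pseudo-labels like these could then supervise a segmentation network, which is what lets the learned model generalize beyond the specific backgrounds seen during training.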

    Mutual Segmentation of Objects of Interest in Multispectral Stereo Image Sequences

    The automated video surveillance systems currently deployed around the world are still quite far, in terms of capabilities, from the ones that have inspired countless science fiction works over the past few years. One of the reasons behind this lag in development is the lack of low-level tools that allow raw image data to be processed directly in the field. This preprocessing is used to reduce the amount of information transferred to centralized servers, which have to interpret the captured visual content for further use. The identification of objects of interest in raw images based on motion is an example of a preprocessing step that a large system might require. However, in a surveillance context, the preprocessing method can seldom rely on an appearance or shape model to recognize these objects, since their exact nature cannot be known in advance. This complicates the elaboration of low-level image processing methods. In this thesis, we present different methods that detect and segment objects of interest from video sequences in a fully unsupervised fashion. We first explore monocular video segmentation approaches based on background subtraction. These approaches are based on the idea that the background of an observed scene can be modeled over time, and that any drastic variation in appearance that is not predicted by the model reveals the presence of an intruding object. The main challenge that background subtraction methods must meet is that their model should be able to adapt to dynamic changes in scene conditions. The designed methods must also remain sensitive to the emergence of new objects of interest despite this increased robustness to predictable dynamic scene behaviors. We propose two methods that introduce different modeling techniques to improve background appearance description in an illumination-invariant way, and that analyze local background persistence to improve the detection of temporarily stationary objects. We also introduce new feedback mechanisms used to adjust the hyperparameters of our methods based on the observed dynamics of the scene and the quality of the generated output.
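Feedback-driven hyperparameter adjustment of the kind described above can be sketched as a per-region rule that trades sensitivity against robustness. The rule, names and constants below are illustrative assumptions, not the thesis's actual mechanism:

```python
def adjust_threshold(threshold, blink_rate, dynamics,
                     lo=10.0, hi=60.0, step=1.0):
    """Toy feedback rule for a background-subtraction decision threshold:
    when the segmentation output flickers more than the scene's measured
    dynamics justify, raise the threshold to suppress false positives;
    otherwise lower it to regain sensitivity. Bounds keep the threshold
    in a sane operating range."""
    if blink_rate > dynamics:   # noisy output: be more conservative
        threshold += step
    else:                       # stable output: regain sensitivity
        threshold -= step
    return max(lo, min(hi, threshold))
```

Run once per frame and per region, such a loop lets each part of the scene settle on its own operating point instead of relying on one globally tuned threshold.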