
    Automated camera ranking and selection using video content and scene context

    When observing a scene with multiple cameras, an important problem is to automatically decide which camera feed should be shown, and when. The answer to this question is of interest for a number of applications and scenarios ranging from sports to surveillance. In this thesis we present a framework for ranking each video frame and camera across time and across the camera network, respectively, and use this ranking for automated video production. In the first stage, information from each camera view and from the objects in it is extracted and represented in a way that allows for object- and frame-ranking. First, objects are detected and ranked within and across camera views, taking into account both visual and contextual information related to each object. Then content ranking is performed based on the objects in the view and on camera-network-level information. We propose two novel techniques for content ranking, namely Routing Based Ranking (RBR) and Multivariate Gaussian based Ranking (MVG). RBR uses a rule-based framework in which weighted fusion of object- and frame-level information takes place, while MVG estimates the rank as a multivariate Gaussian distribution. Through experimental and subjective validation we demonstrate that the proposed content-ranking strategies allow the identification of the best camera at each time instant.

    The second part of the thesis focuses on the automatic generation of N-to-1 videos based on the ranked content. We demonstrate that in such production settings frequent inter-camera switching is undesirable, which motivates a compromise between selecting the best camera most of the time and minimising inter-camera switches, and we show that state-of-the-art techniques for this task are inadequate and fail in dynamic scenes. We propose three novel methods for automated camera selection. The first (gof) jointly optimises a cost function that depends on both view quality and inter-camera switching, so that a pleasing best-view video sequence can be composed. The other two methods (dbn and util) embed the selection decision into the ranking strategy. In dbn we model best-camera selection as a state sequence via a Directed Acyclic Graph (DAG) designed as a Dynamic Bayesian Network (DBN), which encodes contextual knowledge about the camera network and employs past information to minimise inter-camera switches. In comparison, util uses both past and future information in a Partially Observable Markov Decision Process (POMDP), in which the camera selection at a given time is influenced by past information and by its repercussions in the future. The performance of the proposed approaches is demonstrated on multiple real and synthetic multi-camera setups and compared with various baseline methods, with encouraging results; it is also validated through extensive subjective testing.
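    To make the two central ideas concrete, here is a minimal Python sketch, not the thesis's actual implementation: the feature set, weights and switch cost are invented for illustration. Frames are ranked by their likelihood under a multivariate Gaussian fitted to exemplar "good" frames (in the spirit of MVG), and a best-camera sequence is then chosen by a Viterbi-style dynamic program whose cost trades view quality against inter-camera switches (in the spirit of the joint-optimisation method gof).

```python
import numpy as np

def fit_mvg(good_frame_features):
    """Fit a multivariate Gaussian to features of exemplar 'good' frames."""
    mu = good_frame_features.mean(axis=0)
    cov = np.cov(good_frame_features, rowvar=False)
    return mu, np.linalg.inv(cov)

def mvg_rank(features, mu, cov_inv):
    """Rank a frame by its Mahalanobis closeness to the 'good frame' model."""
    d = features - mu
    return float(np.exp(-0.5 * d @ cov_inv @ d))

def select_cameras(quality, switch_cost=0.3):
    """Choose one camera per time step, maximising summed frame quality
    minus a penalty for every inter-camera switch (Viterbi-style DP).
    quality: (T, C) array of per-frame ranks for C cameras."""
    T, C = quality.shape
    score = quality[0].copy()
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        # transition score: staying is free, switching pays switch_cost
        trans = score[:, None] - switch_cost * (1 - np.eye(C))
        back[t] = trans.argmax(axis=0)
        score = trans.max(axis=0) + quality[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy usage with random features and ranks
rng = np.random.default_rng(0)
good = rng.normal(size=(50, 4))            # features of exemplar frames
mu, cov_inv = fit_mvg(good)
print(round(mvg_rank(rng.normal(size=4), mu, cov_inv), 3))
q = rng.random((6, 3))                     # per-frame ranks: 6 steps, 3 cameras
print(select_cameras(q))
```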

    QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios

    Concerns about the security of individuals have justified the increasing number of surveillance cameras deployed in both private and public spaces. However, contrary to popular belief, these devices are in most cases used solely for recording, rather than feeding intelligent analysis processes capable of extracting information about the observed individuals. Thus, even though video surveillance has already proved essential for solving multiple crimes, obtaining relevant details about the subjects that took part in a crime depends on the manual inspection of recordings. The current goal of the research community is therefore the development of automated surveillance systems capable of monitoring and identifying subjects in surveillance scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric recognition algorithms on data acquired in surveillance scenarios. In particular, we aim to design a visual surveillance system capable of acquiring biometric data at a distance (e.g., face, iris or gait) without requiring human intervention in the process, and to devise biometric recognition methods robust to the degradation factors resulting from the unconstrained acquisition process.

    Regarding the first goal, the analysis of the data acquired by typical surveillance systems shows that large acquisition distances significantly decrease the resolution of biometric samples, so that their discriminability is not sufficient for recognition purposes. In the literature, diverse works point to Pan Tilt Zoom (PTZ) cameras as the most practical way of acquiring high-resolution imagery at a distance, particularly when used in a master-slave configuration, in which the video acquired by a typical surveillance camera is analyzed to obtain regions of interest (e.g., car, person) that are subsequently imaged at high resolution by the PTZ camera. Several methods have already shown that this configuration can be used for acquiring biometric data at a distance. Nevertheless, these methods failed to provide effective solutions to the typical challenges of this strategy, restraining its use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development of a biometric data acquisition system based on the cooperation of a PTZ camera with a typical surveillance camera. The first is a camera calibration method capable of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ camera. The second is a camera scheduling method for determining, in real time, the sequence of acquisitions that maximizes the number of different targets observed while minimizing the cumulative transition time. To achieve the first goal of this thesis, both methods were combined with state-of-the-art approaches from the human monitoring field to develop a fully automated surveillance system capable of acquiring biometric data at a distance and without human cooperation, designated the QUIS-CAMPI system.

    The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis of the performance of state-of-the-art biometric recognition approaches shows that they attain almost ideal recognition rates on unconstrained data. However, this performance is incongruous with the recognition rates observed in surveillance scenarios.
Taking into account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at distances ranging from 5 to 40 meters and without human intervention in the acquisition process. This set makes it possible to objectively assess the performance of state-of-the-art biometric recognition methods on data that truly encompass the covariates of surveillance scenarios. As such, it was used to promote the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained by the nine methods specially designed for this competition. In addition, the data acquired by the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the development of methods robust to the covariates of surveillance scenarios. The first proposal is a method for detecting corrupted features in biometric signatures by means of a redundancy-analysis algorithm. The second is a caricature-based face recognition approach capable of enhancing recognition performance by automatically generating a caricature from a 2D photo. The experimental evaluation of these methods shows that both approaches help improve recognition performance on unconstrained data.
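    As a rough illustration of the two acquisition-side proposals (the thesis's actual calibration model and scheduler are not reproduced here; the quadratic basis, the greedy policy and all numbers below are assumptions), master-camera pixel coordinates can be regressed to PTZ pan/tilt angles from a few correspondences, and pending targets can be scheduled by repeatedly steering to the nearest unvisited one:

```python
import numpy as np

def poly_features(xy):
    """Quadratic polynomial basis in master-camera pixel coordinates."""
    x, y = xy[:, 0], xy[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)

def fit_pixel_to_pan_tilt(master_xy, pan_tilt):
    """Least-squares fit mapping master pixels -> PTZ (pan, tilt) angles."""
    coeffs, *_ = np.linalg.lstsq(poly_features(master_xy), pan_tilt, rcond=None)
    return coeffs

def schedule_targets(current_pt, targets_pt):
    """Greedy schedule: repeatedly steer to the nearest unvisited target,
    covering every target while keeping transition time low."""
    order, pos = [], np.asarray(current_pt, float)
    remaining = list(range(len(targets_pt)))
    while remaining:
        nxt = min(remaining, key=lambda i: np.linalg.norm(targets_pt[i] - pos))
        order.append(nxt)
        pos = targets_pt[nxt]
        remaining.remove(nxt)
    return order

# six invented pixel <-> pan/tilt correspondences for the fit
xy = np.array([[100, 80], [400, 90], [250, 300], [60, 420], [500, 400], [320, 200]], float)
pt = np.array([[-20, 5], [15, 4], [0, -10], [-25, -18], [22, -16], [8, -4]], float)
coeffs = fit_pixel_to_pan_tilt(xy, pt)
targets = poly_features(np.array([[150.0, 120], [480, 380], [90, 400]])) @ coeffs
print(schedule_targets([0.0, 0.0], targets))
```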

    Autonomous navigation for guide following in crowded indoor environments

    The requirements for assisted living are changing rapidly as the number of elderly patients over the age of 60 continues to increase. This rise places a high level of stress on nurse practitioners, who must care for more patients than they can manage. As this trend is expected to continue, new technology will be required to help care for patients. Mobile robots present an opportunity to alleviate the stress on nurse practitioners by monitoring and performing remedial tasks for elderly patients. To produce mobile robots able to perform these tasks, however, many challenges must be overcome. The hospital environment requires a high level of safety to prevent patient injury; any facility that uses mobile robots must therefore be able to ensure that no harm will come to patients in a care environment. This requires the robot to build a high level of understanding of the environment and of the people in close proximity to it.

    Hitherto, most mobile robots have used vision-based sensors or 2D laser range finders. 3D time-of-flight sensors have recently been introduced and provide dense 3D point clouds of the environment at real-time frame rates, giving mobile robots previously unavailable dense information in real time. In this thesis I investigate the use of time-of-flight cameras for mobile robot navigation in crowded environments. A unified framework allowing the robot to follow a guide through an indoor environment safely and efficiently is presented, and each component of the framework is analyzed in detail, with real-world scenarios illustrating its practical use.

    Time-of-flight cameras are relatively new sensors and therefore have inherent problems that must be overcome to obtain consistent and accurate data. I propose a novel and practical probabilistic framework that overcomes many of these problems by fusing multiple depth maps with color information, forming a reliable and consistent view of the world. For the robot to interact with the environment, contextual information is required. To this end, I propose a region-growing segmentation algorithm that groups points based on surface characteristics, namely surface normal and surface curvature. The segmentation process creates a distinct set of surfaces; however, only a limited amount of contextual information is then available for interaction. Therefore, a novel classifier based on spherical harmonics is proposed to differentiate people from all other objects. The ability to identify people allows the robot to find potential candidates to follow. For safe navigation, however, the robot must continuously track all visible objects to obtain positional and velocity information. A multi-object tracking system is investigated that tracks visible objects reliably using multiple cues, shape and color. The tracking system allows the robot to react to the dynamic nature of people by building an estimate of the motion flow, which provides the robot with the information necessary to determine where, and at what speed, it is safe to drive. In addition, a novel search strategy is proposed to allow the robot to recover a guide who has left the field of view. To achieve this, a search map is constructed, with areas of the environment ranked according to how likely they are to reveal the guide's true location; the robot can then approach the most likely search area to recover the guide.
    Finally, all of the components presented are combined to follow a guide through an indoor environment. The results achieved demonstrate the efficacy of the proposed components.
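    The segmentation stage can be illustrated with a small region-growing sketch over a point cloud with precomputed normals and curvatures. The thresholds, the brute-force neighbour search and the seeding rule below are assumptions for illustration, not the thesis's algorithm:

```python
import numpy as np

def region_grow(points, normals, curvature, radius=0.05,
                angle_thresh=np.deg2rad(10.0), curv_thresh=0.03):
    """Group points whose neighbours have similar surface normals;
    low-curvature (flat) points seed regions and extend the growing front."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    region = 0
    for seed in np.argsort(curvature):       # flattest points seed first
        if labels[seed] != -1:
            continue
        labels[seed] = region
        queue = [seed]
        while queue:
            i = queue.pop()
            dists = np.linalg.norm(points - points[i], axis=1)
            for j in np.where((dists < radius) & (labels == -1))[0]:
                cos = abs(float(np.dot(normals[i], normals[j])))
                if np.arccos(min(cos, 1.0)) < angle_thresh:
                    labels[j] = region
                    if curvature[j] < curv_thresh:  # smooth points extend the front
                        queue.append(j)
        region += 1
    return labels

# toy usage: 200 points on a flat plane collapse into one region
pts = np.random.default_rng(1).random((200, 3))
pts[:, 2] = 0.0
nrm = np.tile([0.0, 0.0, 1.0], (200, 1))
print(np.unique(region_grow(pts, nrm, np.zeros(200), radius=0.2)))
```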

    A Methodology for Extracting Human Bodies from Still Images

    Monitoring and surveillance of humans is one of the most prominent applications today, and it is expected to be part of many aspects of future life, for reasons including safety and assisted living. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and remains open. In this PhD dissertation we examine the problem from several perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject, and propose a maturity metric to evaluate them. Image segmentation is one of the most popular image-processing algorithms found in the field, and we propose a blind metric to evaluate segmentation results with respect to the activity in local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin and hand detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.
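    As a flavour of the skin-detection component, here is a classic fixed-threshold skin mask in YCbCr space, a common baseline rather than the dissertation's own detector (the thresholds are the widely used Chai-Ngan values):

```python
import numpy as np

def skin_mask_ycbcr(image_rgb):
    """Boolean skin mask via fixed thresholds on the Cb/Cr chrominance
    channels (77 < Cb < 127, 133 < Cr < 173)."""
    img = image_rgb.astype(np.float32)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)

# toy usage on a random 4x4 RGB image
img = np.random.default_rng(2).integers(0, 256, size=(4, 4, 3))
print(skin_mask_ycbcr(img))
```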

    Human and Group Activity Recognition from Video Sequences

    A good solution to human activity recognition enables a wide variety of useful applications, such as visual surveillance, vision-based Human-Computer Interaction (HCI) and gesture recognition. In this thesis, a graph-based approach to human activity recognition is proposed that models spatio-temporal features as contextual space-time graphs. In this method, spatio-temporal gradient cuboids are extracted at significant regions of activity, and feature graphs (gradient, space-time, local neighbours, immediate neighbours) are constructed using the similarity matrix. The Laplacian representation of the graph is utilised to reduce the computational complexity and to allow the use of traditional statistical classifiers.

    A second methodology is proposed to detect and localise abnormal activities in crowded scenes. This approach has two stages: training and testing. During the training stage, specific human activities are identified and characterised by modelling medium-term movement flow through streaklines; each streakline is formed by multiple optical flow vectors that represent and track local movement in the scene, and a dictionary of activities is recorded for a given scene. During the testing stage, the consistency of each observed activity with those in the dictionary is verified using the Kullback-Leibler (KL) divergence. Compared with the state of the art, the proposed methodology produces state-of-the-art results for localising anomalous activities.

    Finally, we propose an automatic group activity recognition approach that models the interdependencies of group activity features over time. We propose to model the group interdependencies in both the motion and location spaces; these spaces are extended to time-space and time-movement spaces and modelled using Kernel Density Estimation (KDE). The proposed methodology improves on state-of-the-art recognition results on group activity datasets.
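    Two of the ingredients lend themselves to compact sketches, assuming Gaussian similarities and discrete activity histograms (the parameters are invented for illustration): the normalised graph Laplacian, whose bottom eigenvectors serve as features for a conventional classifier, and the KL divergence used to test an observed activity against the training dictionary.

```python
import numpy as np

def laplacian_embedding(features, k=3, sigma=1.0):
    """Bottom eigenvectors of the normalised graph Laplacian built from a
    Gaussian similarity matrix over spatio-temporal feature vectors."""
    d2 = ((features[:, None] - features[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))                  # similarity matrix
    d = np.sqrt(W.sum(1))
    L = np.eye(len(W)) - W / d[:, None] / d[None]     # normalised Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1]                           # skip the trivial eigenvector

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two discrete activity histograms; a large value
    flags an observed activity as inconsistent with the dictionary."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

X = np.random.default_rng(3).random((10, 5))   # 10 toy cuboid descriptors
print(laplacian_embedding(X).shape)            # (10, 3)
print(kl_divergence(np.array([8., 1., 1.]), np.array([7., 2., 1.])))
```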

    Recognising high-level agent behaviour through observations in data scarce domains

    This thesis presents a novel method for performing multi-agent behaviour recognition without requiring large training corpora. The reduced need for data means that robust probabilistic recognition can be performed within domains where annotated datasets are traditionally unavailable (e.g. surveillance, defence). Human behaviours are composed of sequences of underlying activities that can be used as salient features. We do not assume that the exact temporal ordering of such features is necessary, and so can represent behaviours using an unordered “bag-of-features”. A weak temporal ordering is imposed during inference to match behaviours to observations, replacing the learnt model parameters used by competing methods. Our three-tier architecture comprises low-level video tracking, event analysis and high-level inference. High-level inference is performed using a new, cascading extension of the Rao-Blackwellised Particle Filter. Behaviours are recognised at multiple levels of abstraction and can contain a mixture of solo and multi-agent behaviour. We validate our framework using the PETS 2006 video surveillance dataset and our own video sequences, in addition to a large corpus of simulated data. We achieve a mean recognition precision of 96.4% on the simulated data and 89.3% on the combined video data. Our “bag-of-features” framework is able to detect when behaviours terminate and accurately explains agent behaviour despite significant quantities of low-level classification errors in the input; it can even detect agents who change their behaviour.
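    The unordered "bag-of-features" idea can be illustrated with a toy coverage score. The behaviour names, event labels and scoring rule below are invented; the thesis's cascading Rao-Blackwellised particle filter is considerably richer:

```python
from collections import Counter

def bag_match_score(behaviour_bag, observed_events):
    """Fraction of a behaviour's unordered feature bag covered by the
    observed activity stream (order ignored, multiplicities kept)."""
    need = Counter(behaviour_bag)
    seen = Counter(observed_events)
    covered = sum(min(seen[f], c) for f, c in need.items())
    return covered / sum(need.values())

behaviours = {                      # hypothetical behaviour definitions
    "leave_luggage": ["enter", "stop", "drop_bag", "walk_away"],
    "meet_and_walk": ["enter", "stop", "greet", "walk_together"],
}
obs = ["enter", "stop", "drop_bag", "loiter", "walk_away"]
for name, bag in behaviours.items():
    print(name, round(bag_match_score(bag, obs), 2))
```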

    Multiple object tracking with context awareness

    [no abstract]