Search CORE

33 research outputs found

Panoramic Background Modeling for PTZ Cameras with Competitive Learning Neural Networks

Author: Domínguez Enrique
Luque-Baena Rafael Marcos
López-Rubio Ezequiel
Molina-Cabello Miguel A.
Thurnhofer-Hemsi Karl
Publication venue
Publication date: 29/05/2017
Field of study

The construction of a model of the background of a scene still remains as a challenging task in video surveillance systems, in particular for moving cameras. This work presents a novel approach for constructing a panoramic background model based on competitive learning neural networks and a subsequent piecewise linear interpolation by Delaunay triangulation. The approach can handle arbitrary camera directions and zooms for a Pan-Tilt-Zoom (PTZ) camera-based surveillance system. After testing the proposed approach on several indoor sequences, the results demonstrate that the proposed method is effective and suitable to use for real-time video surveillance applications.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Repositorio Institucional Universidad de Málaga

Robust computational intelligence techniques for visual information processing

Author: Thurnhofer Hemsi Karl Khader
Publication venue: UMA Editorial
Publication date: 24/03/2021
Field of study

The third part is exclusively dedicated to the super-resolution of Magnetic Resonance Images. In one of these works, an algorithm based on the random shifting technique is developed. Besides, we studied noise removal and resolution enhancement simultaneously. To end, the cost function of deep networks has been modified by different combinations of norms in order to improve their training. Finally, the general conclusions of the research are presented and discussed, as well as the possible future research lines that are able to make use of the results obtained in this Ph.D. thesis.This Ph.D. thesis is about image processing by computational intelligence techniques. Firstly, a general overview of this book is carried out, where the motivation, the hypothesis, the objectives, and the methodology employed are described. The use and analysis of different mathematical norms will be our goal. After that, state of the art focused on the applications of the image processing proposals is presented. In addition, the fundamentals of the image modalities, with particular attention to magnetic resonance, and the learning techniques used in this research, mainly based on neural networks, are summarized. To end up, the mathematical framework on which this work is based on, ₚ-norms, is defined. Three different parts associated with image processing techniques follow. The first non-introductory part of this book collects the developments which are about image segmentation. Two of them are applications for video surveillance tasks and try to model the background of a scenario using a specific camera. The other work is centered on the medical field, where the goal of segmenting diabetic wounds of a very heterogeneous dataset is addressed. The second part is focused on the optimization and implementation of new models for curve and surface fitting in two and three dimensions, respectively. The first work presents a parabola fitting algorithm based on the measurement of the distances of the interior and exterior points to the focus and the directrix. The second work changes to an ellipse shape, and it ensembles the information of multiple fitting methods. Last, the ellipsoid problem is addressed in a similar way to the parabola

Repositorio Institucional Universidad de Málaga

A Deep Moving-camera Background Model

Author: Erez Guy
Freifeld Oren
Weber Ron Shapira
Publication venue
Publication date: 16/09/2022
Field of study

In video analysis, background models have many applications such as background/foreground separation, change detection, anomaly detection, tracking, and more. However, while learning such a model in a video captured by a static camera is a fairly-solved task, in the case of a Moving-camera Background Model (MCBM), the success has been far more modest due to algorithmic and scalability challenges that arise due to the camera motion. Thus, existing MCBMs are limited in their scope and their supported camera-motion types. These hurdles also impeded the employment, in this unsupervised task, of end-to-end solutions based on deep learning (DL). Moreover, existing MCBMs usually model the background either on the domain of a typically-large panoramic image or in an online fashion. Unfortunately, the former creates several problems, including poor scalability, while the latter prevents the recognition and leveraging of cases where the camera revisits previously-seen parts of the scene. This paper proposes a new method, called DeepMCBM, that eliminates all the aforementioned issues and achieves state-of-the-art results. Concretely, first we identify the difficulties associated with joint alignment of video frames in general and in a DL setting in particular. Next, we propose a new strategy for joint alignment that lets us use a spatial transformer net with neither a regularization nor any form of specialized (and non-differentiable) initialization. Coupled with an autoencoder conditioned on unwarped robust central moments (obtained from the joint alignment), this yields an end-to-end regularization-free MCBM that supports a broad range of camera motions and scales gracefully. We demonstrate DeepMCBM's utility on a variety of videos, including ones beyond the scope of other methods. Our code is available at https://github.com/BGU-CS-VIL/DeepMCBM .Comment: 26 paged, 5 figures. To be published in ECCV 202

arXiv.org e-Print Archive

QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios

Author: Neves João Carlos Raposo
Publication venue
Publication date: 01/06/2018
Field of study

The concerns about individuals security have justified the increasing number of surveillance cameras deployed both in private and public spaces. However, contrary to popular belief, these devices are in most cases used solely for recording, instead of feeding intelligent analysis processes capable of extracting information about the observed individuals. Thus, even though video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant details about the subjects that took part in a crime depends on the manual inspection of recordings. As such, the current goal of the research community is the development of automated surveillance systems capable of monitoring and identifying subjects in surveillance scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at designing a visual surveillance system capable of acquiring biometric data at a distance (e.g., face, iris or gait) without requiring human intervention in the process, as well as devising biometric recognition methods robust to the degradation factors resulting from the unconstrained acquisition process. Regarding the first goal, the analysis of the data acquired by typical surveillance systems shows that large acquisition distances significantly decrease the resolution of biometric samples, and thus their discriminability is not sufficient for recognition purposes. In the literature, diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring high-resolution imagery at a distance, particularly when using a master-slave configuration. In the master-slave configuration, the video acquired by a typical surveillance camera is analyzed for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged at high-resolution by the PTZ camera. Several methods have already shown that this configuration can be used for acquiring biometric data at a distance. Nevertheless, these methods failed at providing effective solutions to the typical challenges of this strategy, restraining its use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development of a biometric data acquisition system based on the cooperation of a PTZ camera with a typical surveillance camera. The first proposal is a camera calibration method capable of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ camera. The second proposal is a camera scheduling method for determining - in real-time - the sequence of acquisitions that maximizes the number of different targets obtained, while minimizing the cumulative transition time. In order to achieve the first goal of this thesis, both methods were combined with state-of-the-art approaches of the human monitoring field to develop a fully automated surveillance capable of acquiring biometric data at a distance and without human cooperation, designated as QUIS-CAMPI system. The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis of the performance of the state-of-the-art biometric recognition approaches shows that these approaches attain almost ideal recognition rates in unconstrained data. However, this performance is incongruous with the recognition rates observed in surveillance scenarios. Taking into account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a distance ranging from 5 to 40 meters and without human intervention in the acquisition process. This set allows to objectively assess the performance of state-of-the-art biometric recognition methods in data that truly encompass the covariates of surveillance scenarios. As such, this set was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained by the nine methods specially designed for this competition. In addition, the data acquired by the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the development of methods robust to the covariates of surveillance scenarios. The first proposal regards a method for detecting corrupted features in biometric signatures inferred by a redundancy analysis algorithm. The second proposal is a caricature-based face recognition approach capable of enhancing the recognition performance by automatically generating a caricature from a 2D photo. The experimental evaluation of these methods shows that both approaches contribute to improve the recognition performance in unconstrained data.A crescente preocupação com a segurança dos indivíduos tem justificado o crescimento do número de câmaras de vídeo-vigilância instaladas tanto em espaços privados como públicos. Contudo, ao contrário do que normalmente se pensa, estes dispositivos são, na maior parte dos casos, usados apenas para gravação, não estando ligados a nenhum tipo de software inteligente capaz de inferir em tempo real informações sobre os indivíduos observados. Assim, apesar de a vídeo-vigilância ter provado ser essencial na resolução de diversos crimes, o seu uso está ainda confinado à disponibilização de vídeos que têm que ser manualmente inspecionados para extrair informações relevantes dos sujeitos envolvidos no crime. Como tal, atualmente, o principal desafio da comunidade científica é o desenvolvimento de sistemas automatizados capazes de monitorizar e identificar indivíduos em ambientes de vídeo-vigilância. Esta tese tem como principal objetivo estender a aplicabilidade dos sistemas de reconhecimento biométrico aos ambientes de vídeo-vigilância. De forma mais especifica, pretende-se 1) conceber um sistema de vídeo-vigilância que consiga adquirir dados biométricos a longas distâncias (e.g., imagens da cara, íris, ou vídeos do tipo de passo) sem requerer a cooperação dos indivíduos no processo; e 2) desenvolver métodos de reconhecimento biométrico robustos aos fatores de degradação inerentes aos dados adquiridos por este tipo de sistemas. No que diz respeito ao primeiro objetivo, a análise aos dados adquiridos pelos sistemas típicos de vídeo-vigilância mostra que, devido à distância de captura, os traços biométricos amostrados não são suficientemente discriminativos para garantir taxas de reconhecimento aceitáveis. Na literatura, vários trabalhos advogam o uso de câmaras Pan Tilt Zoom (PTZ) para adquirir imagens de alta resolução à distância, principalmente o uso destes dispositivos no modo masterslave. Na configuração master-slave um módulo de análise inteligente seleciona zonas de interesse (e.g. carros, pessoas) a partir do vídeo adquirido por uma câmara de vídeo-vigilância e a câmara PTZ é orientada para adquirir em alta resolução as regiões de interesse. Diversos métodos já mostraram que esta configuração pode ser usada para adquirir dados biométricos à distância, ainda assim estes não foram capazes de solucionar alguns problemas relacionados com esta estratégia, impedindo assim o seu uso em ambientes de vídeo-vigilância. Deste modo, esta tese propõe dois métodos para permitir a aquisição de dados biométricos em ambientes de vídeo-vigilância usando uma câmara PTZ assistida por uma câmara típica de vídeo-vigilância. O primeiro é um método de calibração capaz de mapear de forma exata as coordenadas da câmara master para o ângulo da câmara PTZ (slave) sem o auxílio de outros dispositivos óticos. O segundo método determina a ordem pela qual um conjunto de sujeitos vai ser observado pela câmara PTZ. O método proposto consegue determinar em tempo-real a sequência de observações que maximiza o número de diferentes sujeitos observados e simultaneamente minimiza o tempo total de transição entre sujeitos. De modo a atingir o primeiro objetivo desta tese, os dois métodos propostos foram combinados com os avanços alcançados na área da monitorização de humanos para assim desenvolver o primeiro sistema de vídeo-vigilância completamente automatizado e capaz de adquirir dados biométricos a longas distâncias sem requerer a cooperação dos indivíduos no processo, designado por sistema QUIS-CAMPI. O sistema QUIS-CAMPI representa o ponto de partida para iniciar a investigação relacionada com o segundo objetivo desta tese. A análise do desempenho dos métodos de reconhecimento biométrico do estado-da-arte mostra que estes conseguem obter taxas de reconhecimento quase perfeitas em dados adquiridos sem restrições (e.g., taxas de reconhecimento maiores do que 99% no conjunto de dados LFW). Contudo, este desempenho não é corroborado pelos resultados observados em ambientes de vídeo-vigilância, o que sugere que os conjuntos de dados atuais não contêm verdadeiramente os fatores de degradação típicos dos ambientes de vídeo-vigilância. Tendo em conta as vulnerabilidades dos conjuntos de dados biométricos atuais, esta tese introduz um novo conjunto de dados biométricos (imagens da face e vídeos do tipo de passo) adquiridos pelo sistema QUIS-CAMPI a uma distância máxima de 40m e sem a cooperação dos sujeitos no processo de aquisição. Este conjunto permite avaliar de forma objetiva o desempenho dos métodos do estado-da-arte no reconhecimento de indivíduos em imagens/vídeos capturados num ambiente real de vídeo-vigilância. Como tal, este conjunto foi utilizado para promover a primeira competição de reconhecimento biométrico em ambientes não controlados. Esta tese descreve os protocolos de avaliação usados, assim como os resultados obtidos por 9 métodos especialmente desenhados para esta competição. Para além disso, os dados adquiridos pelo sistema QUIS-CAMPI foram essenciais para o desenvolvimento de dois métodos para aumentar a robustez aos fatores de degradação observados em ambientes de vídeo-vigilância. O primeiro é um método para detetar características corruptas em assinaturas biométricas através da análise da redundância entre subconjuntos de características. O segundo é um método de reconhecimento facial baseado em caricaturas automaticamente geradas a partir de uma única foto do sujeito. As experiências realizadas mostram que ambos os métodos conseguem reduzir as taxas de erro em dados adquiridos de forma não controlada

UBibliorum repositorio digital da ubi

Systems and Algorithms for Automated Collaborative Observation using Networked Robotic Cameras

Author: Xu Yiliang
Publication venue
Publication date
Field of study

The development of telerobotic systems has evolved from Single Operator Single Robot (SOSR) systems to Multiple Operator Multiple Robot (MOMR) systems. The relationship between human operators and robots follows the master-slave control architecture and the requests for controlling robot actuation are completely generated by human operators. Recently, the fast evolving advances in network and computer technologies and decreasing size and cost of sensors and robots enable us to further extend the MOMR system architecture to incorporate heterogeneous components such as humans, robots, sensors, and automated agents. The requests for controlling robot actuation are generated by all the participants. We term it as the MOMR++ system. However, to reach the best potential and performance of the system, there are many technical challenges needing to be addressed. In this dissertation, we address two major challenges in the MOMR++ system development. We first address the robot coordination and planning issue in the application of an autonomous crowd surveillance system. The system consists of multiple robotic pan-tilt-zoom (PTZ) cameras assisted with a fixed wide-angle camera. The wide-angle camera provides an overview of the scene and detects moving objects, which are required for close-up views using the PTZ cameras. When applied to the pedestrian surveillance application and compared to a previous work, the system achieves increasing number of observed objects by over 210% in heavy traffic scenarios. The key issue here is given the limited number (e.g., p (p > 0)) of PTZ cameras and many more (e.g., n (n >> p)) observation requests, how to coordinate the cameras to best satisfy all the requests. We formulate this problem as a new camera resource allocation problem. Given p cameras, n observation requests, and [epsilon] being approximation bound, we develop an approximation algorithm running in O(n/[epsilon]³ + p²/[epsilon]⁶) time, and an exact algorithm, when p = 2, running in O(n³) time. We then address the automatic object content analysis and recognition issue in the application of an autonomous rare bird species detection system. We set up the system in the forest near Brinkley, Arkansas. The camera monitors the sky, detects motions, and preserves video data for only those targeted bird species. During the one-year search, the system reduces the raw video data of 29.41TB to only 146.7MB (reduction rate 99.9995%). The key issue here is to automatically recognize the flying bird species. We verify the bird body axis dynamic information by an extended Kalman filter (EKF) and compare the bird dynamic state with the prior knowledge of the targeted bird species. We quantify the uncertainty in recognition due to the measurement uncertainty and develop a novel Probable Observation Data Set (PODS)-based EKF method. In experiments with real video data, the algorithm achieves 95% area under the receiver operating characteristic (ROC) curve. Through the exploration of the two MOMR++ systems, we conclude that the new MOMR++ system architecture enables much wider range of participants, enhances the collaboration and interaction between participants so that information can be exchanged in between, suppresses the chance of any individual bias or mistakes in the observation process, and further frees humans from the control/observation process by providing automatic control/observation. The new MOMR++ system architecture is a promising direction for future telerobtics advances

Texas A&M Repository

Real-Time, Multiple Pan/Tilt/Zoom Computer Vision Tracking and 3D Positioning System for Unmanned Aerial System Metrology

Author: Doyle Daniel D.
Publication venue: AFIT Scholar
Publication date: 26/12/2013
Field of study

The study of structural characteristics of Unmanned Aerial Systems (UASs) continues to be an important field of research for developing state of the art nano/micro systems. Development of a metrology system using computer vision (CV) tracking and 3D point extraction would provide an avenue for making these theoretical developments. This work provides a portable, scalable system capable of real-time tracking, zooming, and 3D position estimation of a UAS using multiple cameras. Current state-of-the-art photogrammetry systems use retro-reflective markers or single point lasers to obtain object poses and/or positions over time. Using a CV pan/tilt/zoom (PTZ) system has the potential to circumvent their limitations. The system developed in this paper exploits parallel-processing and the GPU for CV-tracking, using optical flow and known camera motion, in order to capture a moving object using two PTU cameras. The parallel-processing technique developed in this work is versatile, allowing the ability to test other CV methods with a PTZ system using known camera motion. Utilizing known camera poses, the object\u27s 3D position is estimated and focal lengths are estimated for filling the image to a desired amount. This system is tested against truth data obtained using an industrial system

AFTI Scholar (Air Force Institute of Technology)

Segmentación y detección de objetos en imágenes y vídeo mediante inteligencia computacional

Author: Molina Cabello Miguel Ángel
Publication venue: UMA Editorial
Publication date: 01/01/2018
Field of study

Finalmente, se exponen las conclusiones obtenidas tras la realización de esta tesis y unas posibles líneas futuras de investigación. Fecha de lectura de Tesis: 17 diciembre 2018.La presente tesis trata sobre el procesamiento y análisis de imágenes y video mediante sistemas informáticos. Primeramente se hace una introducción, especificando contexto, objetivos y metodología. Luego se muestran los antecedentes, los fundamentos de la videovigilancia, las dificultades existentes y diversos algoritmos del estado del arte, seguido de las principales características del aprendizaje profundo, transporte inteligente y sistemas con cámara PTZ, finalizando con la evaluación de métodos y distintos conjuntos de datos. Después se muestran tres partes. La primera comenta los estudios desarrollados que tratan sobre segmentación. Aquí se explican diferentes modelos desarrollados cuyo objetivo es la detección de objetos, tanto usando hardware genérico o especifico como en ámbitos específicos, o un estudio de cómo influye la reducción del tamaño de las imágenes al rendimiento de los algoritmos. La segunda parte describe los trabajos que utilizan una cámara PTZ. El primero trabajo hace un seguimiento del objeto más anómalo del escenario, siendo el propio sistema el que decide cuáles son anómalos y cuáles no; el segundo muestra un sistema que indica a la cámara los movimientos a realizar en función de la salida producida por un modelo de fondo no panorámico y mejorada con un gas neuronal creciente. La tercera parte trata sobre los estudios desarrollados con relación con el transporte inteligente, como es la clasificación de los vehículos que aparecen en secuencias de tráfico. El primer trabajo aplica técnicas tradicionales como segmentación y extracción de rasgos; el segundo utiliza segmentación y redes convolucionales, complementado con un estudio del redimensionado de imágenes para proveerlas en el formato necesario a cada red; y el tercero emplea un modelo que detecta y clasifica objetos, estimando posteriormente la contaminación generada por los vehículos

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

Deep learning algorithms for background subtraction and people detection

Author: Tezcan M. Ozan
Publication venue
Publication date: 27/09/2021
Field of study

Video cameras are commonly used today in surveillance and security, autonomous driving and flying, manufacturing and healthcare. While different applications seek different types of information from the video streams, detecting changes and finding people are two key enablers for many of them. This dissertation focuses on both of these tasks: change detection, also known as background subtraction, and people detection from overhead fisheye cameras, an emerging research topic. Background subtraction has been thoroughly researched to date and the top-performing algorithms are data-driven and supervised. Crucially, during training these algorithms rely on the availability of some annotated frames from the video being tested. Instead, we propose a novel, supervised background-subtraction algorithm for unseen videos based on a fully-convolutional neural network. The input to our network consists of the current frame and two background frames captured at different time scales along with their semantic segmentation maps. In order to reduce the chance of overfitting, we introduce novel temporal and spatio-temporal data-augmentation methods. We also propose a cross-validation training/evaluation strategy for the largest change-detection dataset, CDNet-2014, that allows a fair and video-agnostic performance comparison of supervised algorithms. Overall, our algorithm achieves significant performance gains over state of the art in terms of F-measure, recall and precision. Furthermore, we develop a real-time variant of our algorithm with performance close to that of the state of the art. Owing to their large field of view, fisheye cameras mounted overhead are becoming a surveillance modality of choice for large indoor spaces. However, due to their top-down viewpoint and unique optics, standing people appear radially oriented and radially distorted in fisheye images. Therefore, traditional people detection, tracking and recognition algorithms developed for standard cameras do not perform well on fisheye images. To address this, we introduce several novel people-detection algorithms for overhead fisheye cameras. Our first two algorithms address the issue of radial body orientation by applying a rotating-window approach. This approach leverages a state-of-the-art object-detection algorithm trained on standard images and applies additional pre- and post-processing to detect radially-oriented people. Our third algorithm addresses both the radial body orientation and distortion by applying an end-to-end neural network with a novel angle-aware loss function and training on fisheye images. This algorithm outperforms the first two approaches and is two orders of magnitude faster. Finally, we introduce three spatio-temporal extensions of the end-to-end approach to deal with intermittent misses and false detections. In order to evaluate the performance of our algorithms, we collected, annotated and made publicly available four datasets composed of overhead fisheye videos. We provide a detailed analysis of our algorithms on these datasets and show that they significantly outperform the current state of the art

Boston University Institutional Repository (OpenBU)