3 research outputs found

    An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos

    Videos represent the primary source of information for surveillance applications and are available in large amounts, but in most cases they contain little or no annotation for supervised learning. This article reviews the state-of-the-art deep learning based methods for video anomaly detection and categorizes them based on the type of model and the criteria of detection. We also perform simple studies to understand the different approaches and provide the criteria of evaluation for spatio-temporal anomaly detection. Comment: 15 pages, double column

    Visual Analysis of Extremely Dense Crowded Scenes

    Visual analysis of dense crowds is particularly challenging due to the large number of individuals, occlusions, clutter, and fewer pixels per person, which rarely occur in ordinary surveillance scenarios. This dissertation aims to address these challenges in images and videos of extremely dense crowds containing hundreds to thousands of humans. The goal is to tackle the fundamental problems of counting, detecting and tracking people in such images and videos using visual and contextual cues that are automatically derived from the crowded scenes. For counting in an image of an extremely dense crowd, we propose to leverage multiple sources of information to compute an estimate of the number of individuals present in the image. Our approach relies on sources such as low-confidence head detections, repetition of texture elements (using SIFT), and frequency-domain analysis to estimate counts, along with the confidence associated with observing individuals, in an image region. Furthermore, we employ a global consistency constraint on counts using a Markov Random Field, which accounts for disparity in counts in local neighborhoods and across scales. We tested this approach on crowd images with head counts ranging from 94 to 4543 and obtained encouraging results. Through this approach, we are able to count people in images of high-density crowds, unlike previous methods, which are only applicable to videos of low to medium density crowded scenes. However, the counting procedure outputs only a single number for a large patch or an entire image. With just the counts, it becomes difficult to measure the counting error for a query image with an unknown number of people. For this, we propose to localize humans by finding repetitive patterns in the crowd image. 
Starting with detections from an underlying head detector, we correlate them within the image after their selection through several criteria: in a pre-defined grid, locally, or at multiple scales by automatically finding the patches that are most representative of recurring patterns in the crowd image. Finally, the set of generated hypotheses is selected using binary integer quadratic programming with Special Ordered Set (SOS) Type 1 constraints. Human detection is another important problem in the analysis of crowded scenes, where the goal is to place a bounding box on the visible parts of individuals. Primarily applicable to images depicting medium to high density crowds containing several hundred humans, it is a crucial prerequisite for many other visual tasks, such as tracking, action recognition or detection of anomalous behaviors exhibited by individuals in a dense crowd. For detecting humans, we explore context in dense crowds in the form of a locally consistent scale prior, which captures the similarity in scale in local neighborhoods with smooth variation over the image. Using the scale and confidence of detections obtained from an underlying human detector, we infer scale and confidence priors using a Markov Random Field. In an iterative mechanism, the confidences of detections are modified to reflect consistency with the inferred priors, and the priors are updated based on the new detections. Occlusion reasoning is then performed on the final set of detections using Binary Integer Programming, where overlaps and relations between parts of individuals are encoded as linear constraints. Both human detection and occlusion reasoning in this approach are solved with local neighbor-dependent constraints, thereby respecting the inter-dependence between individuals that is characteristic of dense crowd analysis. In addition, we propose a mechanism to detect different combinations of body parts without requiring annotations for individual combinations. 
Once human detection and localization are performed, we use them for tracking people in dense crowds. Similar to the use of context as a scale prior for human detection, we exploit it in the form of motion concurrence for tracking individuals in dense crowds. The proposed method for tracking provides an alternative and complementary approach to methods that require modeling of crowd flow. At the same time, it is less likely to fail in the case of dynamic crowd flows and anomalies because it relies only minimally on previous frames. The approach begins with the automatic identification of prominent individuals in the crowd that are easy to track. Then, we use Neighborhood Motion Concurrence to model the behavior of individuals in a dense crowd, which predicts the position of an individual based on the motion of its neighbors. When the individual moves with the crowd flow, we use Neighborhood Motion Concurrence to predict motion, while leveraging five-frame instantaneous flow in the case of dynamically changing flow and anomalies. All these aspects are then embedded in a framework that imposes a hierarchy on the order in which the positions of individuals are updated. The results are reported on eight sequences of medium to high density crowds, and our approach performs on par with existing approaches without learning or modeling patterns of crowd flow. We experimentally demonstrate the efficacy and reliability of our algorithms by quantifying the performance of counting, localization, as well as human detection and tracking on new and challenging datasets containing hundreds to thousands of humans in a given scene.
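
    The abstract above describes predicting an individual's position from the motion of its neighbors. A minimal sketch of that idea follows. It is not the dissertation's formulation: the function name `predict_position`, the Gaussian distance weighting, and the `sigma` parameter are all illustrative assumptions.

```python
import math


def predict_position(position, neighbors, sigma=20.0):
    """Predict the next (x, y) of one person from the motion of neighbors.

    position:  current (x, y) of the person being tracked.
    neighbors: list of ((x, y), (vx, vy)) pairs for nearby tracked people.
    sigma:     assumed spatial scale in pixels; closer neighbors count more.

    The Gaussian weighting is an assumption made for illustration only.
    """
    weight_sum = 0.0
    vx_sum = 0.0
    vy_sum = 0.0
    for (nx, ny), (vx, vy) in neighbors:
        dist_sq = (nx - position[0]) ** 2 + (ny - position[1]) ** 2
        weight = math.exp(-dist_sq / (2.0 * sigma ** 2))
        weight_sum += weight
        vx_sum += weight * vx
        vy_sum += weight * vy
    if weight_sum == 0.0:
        # No usable neighbors: fall back to assuming no motion.
        return position
    # Move by the weighted mean velocity of the neighborhood.
    return (position[0] + vx_sum / weight_sum,
            position[1] + vy_sum / weight_sum)


if __name__ == "__main__":
    crowd = [((12.0, 10.0), (2.0, 0.0)), ((10.0, 15.0), (2.0, 0.0))]
    print(predict_position((10.0, 10.0), crowd))
```

    When every neighbor moves with the same velocity, the prediction simply shifts the person by that velocity, which matches the intuition of moving with the crowd flow.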

    Estudio e implementación de algoritmos para detección de anomalías en entornos de videovigilancia (Study and implementation of algorithms for anomaly detection in video surveillance environments)

    This Bachelor's thesis studies and develops techniques for analyzing video surveillance sequences in order to detect the anomalies present in them and, from that, to build a complete system that carries out this task. Based on the literature and other work on anomaly detection in video, the detector was designed as a series of processing stages, each of which was analyzed independently. Within this design, the most attention was given to feature selection and extraction, and the spatio-temporal features of intensity gradients were chosen for this work. From these features, a training and learning method was established, and two algorithms for modeling scene behavior, K-Means and Fuzzy C-Means, were implemented and analyzed, evolving from a system that detects anomalies based on distances to one based on degrees of membership in each class. The advantages and disadvantages of each are analyzed, and the best setting for both is found through a series of predefined tests. The results are studied on real video surveillance sequences from different environments and situations, for every sequence used and for each of the proposed models, thus proceeding from the particular to the general, motivated by the low similarity between the sequences. To conclude, the thesis verifies whether the stated objectives were met, draws conclusions about the features used and the algorithms implemented, and proposes a series of possible future improvements to the system.
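
    The contrast between a distance-based score (K-Means) and a membership-based score (Fuzzy C-Means) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the cluster centres are taken as already learned, the function names are hypothetical, and the membership expression is the standard Fuzzy C-Means formula with fuzzifier `m`.

```python
import math


def distances(x, centers):
    """Euclidean distance from feature vector x to every cluster centre."""
    return [math.dist(x, c) for c in centers]


def kmeans_score(x, centers):
    """Distance to the nearest centre; a larger value is more anomalous."""
    return min(distances(x, centers))


def fcm_memberships(x, centers, m=2.0):
    """Standard Fuzzy C-Means membership of x in each cluster (sums to 1).

    m is the fuzzifier (m > 1); m = 2.0 is a common default, assumed here.
    """
    d = distances(x, centers)
    if any(di == 0.0 for di in d):
        # x coincides with one or more centres: split full membership there.
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical learned centres
    sample = (1.0, 0.0)
    print(kmeans_score(sample, centers))
    print(fcm_memberships(sample, centers))
```

    A sample that is far from every centre gets a large distance score, while its memberships flatten towards equal values across the clusters; how either quantity is thresholded to flag an anomaly is left open here.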