517 research outputs found

    Spatio-temporal Texture Modelling for Real-time Crowd Anomaly Detection

    With rapidly increasing demand from the surveillance and security industries, crowd behaviour analysis has become one of the most actively pursued frontiers of video event detection in computer vision in recent years. This research investigates innovative crowd behaviour detection approaches based on statistical crowd features extracted from video footage. In this paper, a new crowd video anomaly detection algorithm is developed based on analysing the extracted spatio-temporal textures. The algorithm is designed for real-time applications: it relies on low-level statistical features and avoids complicated machine learning and recognition processes. In the experiments, the system proved to be a valid solution for detecting anomalous behaviours without strong assumptions about the nature of the crowds, for example, their subjects and density. The developed prototype shows improved adaptability and efficiency against the chosen benchmark systems.

    Online encoder-decoder anomaly detection using encoder-decoder architecture with novel self-configuring neural networks & pure linear genetic programming for embedded systems

    Recent anomaly detection techniques focus on neural networks and an encoder-decoder architecture. However, these techniques involve trade-offs when implemented in an embedded environment, such as heat management, power consumption and hardware costs. This paper presents two related new methods for anomaly detection in data sets gathered from an autonomous mini-vehicle with a CAN bus. The first method is, to the best of our knowledge, the first use of an encoder-decoder architecture for anomaly detection based on linear genetic programming (LGP). The second method uses a self-configuring neural network created with an evolutionary algorithm paradigm that learns both an architecture and weights suitable for embedded systems. Both approaches have the following advantages: they are inexpensive in resource use and can run on almost any embedded board, owing to the computational advantages of linear register machines. The proposed methods are also faster by at least one order of magnitude, a figure that includes both inference and complete training.
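Encoder-decoder anomaly detectors commonly score a sample by how poorly it can be reconstructed from its compressed representation. The sketch below illustrates only that general idea and is not the paper's LGP or self-configuring network: the encoder and decoder are a hand-picked linear projection (`DIRECTION` is an assumption), and the threshold rule (mean plus `k` standard deviations of the error on normal data) is likewise an illustrative choice.

```python
import math
from statistics import mean, pstdev

# Hypothetical 2-D -> 1-D -> 2-D linear encoder-decoder.
# DIRECTION is an assumed unit vector along which "normal" data varies.
DIRECTION = (1 / math.sqrt(2), 1 / math.sqrt(2))

def encode(x):
    """Project a 2-D sample onto the 1-D latent axis."""
    return x[0] * DIRECTION[0] + x[1] * DIRECTION[1]

def decode(z):
    """Map a latent value back to the 2-D input space."""
    return (z * DIRECTION[0], z * DIRECTION[1])

def reconstruction_error(x):
    """Euclidean distance between a sample and its reconstruction."""
    r = decode(encode(x))
    return math.hypot(x[0] - r[0], x[1] - r[1])

def fit_threshold(normal_samples, k=3.0):
    """Assumed rule: mean error on normal data plus k standard deviations."""
    errors = [reconstruction_error(x) for x in normal_samples]
    return mean(errors) + k * pstdev(errors)

def is_anomaly(x, threshold):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return reconstruction_error(x) > threshold

# Made-up "normal" samples lying close to the line y = x.
normal = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.05), (4.0, 4.1), (5.0, 4.95)]
threshold = fit_threshold(normal)
```

With these made-up samples, a point near the line such as `(3.0, 3.1)` stays below the threshold, while `(3.0, 0.0)` is flagged. A learned encoder-decoder would replace `encode` and `decode`; the scoring and thresholding steps would keep the same shape.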

    Urban Anomaly Analytics: Description, Detection, and Prediction

    Urban anomalies may result in loss of life or property if not handled properly. Automatically raising alerts for anomalies at an early stage, or even predicting anomalies before they happen, is of great value to populations. Recently, data-driven urban anomaly analysis frameworks have been emerging, which use urban big data and machine learning algorithms to detect and predict urban anomalies automatically. In this survey, we comprehensively review the state-of-the-art research on urban anomaly analytics. We first give an overview of four main types of urban anomalies: traffic anomalies, unexpected crowds, environmental anomalies, and individual anomalies. Next, we summarize various types of urban datasets obtained from diverse devices, i.e., trajectories, trip records, CDRs, urban sensors, event records, environment data, social media and surveillance cameras. Subsequently, a comprehensive survey of issues in detection and prediction techniques for urban anomalies is presented. Finally, research challenges and open problems are discussed. Peer reviewed.

    SCALABALE AND DISTRIBUTED METHODS FOR LARGE-SCALE VISUAL COMPUTING

    The objective of this research is to develop efficient, scalable, and distributed methods to meet the challenges of processing the immense and growing volume of visual data such as images and videos. The motivation stems from the fact that existing computer vision approaches are computation-intensive and can neither scale up to analyse large collections of data nor perform real-time inference on resource-constrained devices. Some of the issues encountered are: 1) increased computation time for high-level representation from low-level features, 2) increased training time for classification methods, and 3) carrying out analysis in real time on live video streams in a city-scale surveillance network. The issue of scalability can be addressed by model approximation and distributed implementation of computer vision algorithms, but existing scalable approaches suffer from high loss in model approximation and from communication overhead. In this thesis, our aim is to address some of these issues by proposing efficient methods for reducing the training time over large datasets in a distributed environment, and for real-time inference on resource-constrained devices by scaling up computation-intensive methods using model approximation. A scalable method, Fast-BoW, is presented for reducing the computation time of bag-of-visual-words (BoW) feature generation for both hard and soft vector quantization, with time complexities O(|h| log_2 k) and O(|h| k), respectively, where |h| is the size of the hash table used in the proposed approach and k is the vocabulary size. We replace the process of finding the closest cluster center with a softmax classifier, which improves the cluster boundaries over k-means and can be used for both hard and soft BoW encoding. To make the model compact and faster, the real-valued weights are quantized into integer weights that can be represented using only a few bits (2 to 8). 
Also, hashing is applied to the quantized weights to reduce the number of multiplications, which accelerates the entire process. Further, the effectiveness of the video representation is improved by exploiting the structural information among the various entities, or the same entity over time, which is generally ignored by the BoW representation. The interactions of the entities in a video are formulated as a graph of geometric relations among space-time interest points. The activities represented as graphs are recognized using an SVM with low-complexity graph kernels, namely, the random walk kernel (O(n^3)) and the Weisfeiler-Lehman kernel (O(n)). The use of graph kernels provides robustness to slight topological deformations, which may occur due to noise and viewpoint variation in the data. Further issues, such as the computation and storage of the large kernel matrix, are addressed using the Nyström method for kernel linearization. The second major contribution is in reducing the time taken to learn a kernel support vector machine (SVM) from large datasets using a distributed implementation while sustaining classification performance. We propose Genetic-SVM, which uses a distributed genetic algorithm to reduce the time taken to solve the SVM objective function. Further, data partitioning approaches achieve better speed-up than distributed algorithm approaches but invariably lead to a loss in classification accuracy, as global support vectors may not be chosen as local support vectors in their respective partitions. Hence, we propose DiP-SVM, a distribution-preserving kernel SVM in which the first- and second-order statistics of the entire dataset are retained in each of the partitions. This helps in obtaining local decision boundaries that agree with the global decision boundary, thereby reducing the chance of missing important global support vectors. Further, the task of combining the local SVMs hinders the training speed. 
To address this issue, we propose Projection-SVM, which uses subspace partitioning: a decision tree is constructed on a projection of the data along the direction of maximum variance to obtain smaller partitions of the dataset. On each of these partitions, a kernel SVM is trained independently, thereby reducing the overall training time. This also reduces the prediction time significantly. Another issue addressed is the recognition of traffic violations and incidents in real time in a city-scale surveillance scenario. The major challenges are accurate detection and real-time inference. Central computing infrastructures are unable to perform in real time due to the large network delay from the video sensor to the central computing server. We propose an efficient framework using edge computing for deploying large-scale visual computing applications, which reduces the latency and the communication overhead in a camera network. This framework is implemented for two surveillance applications, namely, detection of motorcyclists without a helmet and accident detection. An efficient cascade of convolutional neural networks (CNNs) is proposed for incrementally detecting motorcyclists and their helmets in both sparse and dense traffic. This cascade of CNNs shares a common representation in order to avoid extra computation and over-fitting. Vehicle accidents are modeled as unusual incidents. The deep representation is extracted using denoising stacked auto-encoders trained on spatio-temporal video volumes of normal traffic videos. The possibility of an accident is determined based on the reconstruction error and the likelihood of the deep representation. For the likelihood of the deep representation, an unsupervised model is trained using a one-class SVM. Also, the intersection points of the vehicles' trajectories are used to reduce the false alarm rate and increase the reliability of the overall system. 
Both approaches are evaluated on real traffic videos collected from the video surveillance network of Hyderabad city in India. The experiments on these videos demonstrate the efficacy of the proposed approaches.
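The abstract above describes replacing nearest-cluster-center lookup with a softmax classifier that serves both hard and soft BoW encoding. The sketch below illustrates only that step. It is not the Fast-BoW implementation: it omits the weight quantization and hashing, and the weights, biases, and descriptors are made-up values (in the thesis the classifier is learned).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def word_probabilities(descriptor, weights, biases):
    """Linear softmax classifier: one weight vector and bias per visual word."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, descriptor)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)

def bow_histogram(descriptors, weights, biases, soft=False):
    """Build a normalized BoW histogram from local descriptors.

    Hard encoding votes for the most probable word only;
    soft encoding spreads each vote according to the softmax output.
    """
    hist = [0.0] * len(weights)
    for d in descriptors:
        probs = word_probabilities(d, weights, biases)
        if soft:
            for j, p in enumerate(probs):
                hist[j] += p
        else:
            hist[probs.index(max(probs))] += 1.0
    n = len(descriptors)
    return [h / n for h in hist] if n else hist

# Hypothetical vocabulary of k = 3 words over 2-D descriptors.
weights = [(4.0, 0.0), (0.0, 4.0), (-4.0, -4.0)]
biases = [0.0, 0.0, 0.0]
descriptors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (-1.0, -1.0)]

hard = bow_histogram(descriptors, weights, biases)
soft = bow_histogram(descriptors, weights, biases, soft=True)
```

In this toy setup the hard histogram assigns two of the four descriptors to the first word and one to each of the others, while the soft histogram distributes a small share of each vote to neighbouring words. Evaluating all k scores per descriptor as done here costs O(k) dot products; the hashing and logarithmic search that the abstract credits for the stated complexities are not reproduced.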

    Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

    Video anomaly detection (VAD) has received increasing attention due to its potential applications. Its currently dominant tasks focus on detecting anomalies online at the frame level, which can be roughly interpreted as binary or multi-class event classification. However, a setup that builds relationships between complicated anomalous events and single labels, e.g., "vandalism", is superficial, since single labels are insufficient to characterize anomalous events. In reality, users tend to search for a specific video rather than a series of approximate videos. Therefore, retrieving anomalous events using detailed descriptions is practical and beneficial, but little research focuses on this. In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos across modalities, e.g., language descriptions and synchronous audio. Unlike current video retrieval, where videos are assumed to be temporally well-trimmed and of short duration, VAR is devised to retrieve long untrimmed videos that may be only partially relevant to the given query. To achieve this, we present two large-scale VAR benchmarks, UCFCrime-AR and XDViolence-AR, constructed on top of prevalent anomaly datasets. Meanwhile, we design a model called Anomaly-Led Alignment Network (ALAN) for VAR. In ALAN, we propose anomaly-led sampling to focus on key segments in long untrimmed videos. Then, we introduce an efficient pretext task to enhance semantic associations between fine-grained video-text representations. In addition, we leverage two complementary alignments to further match cross-modal content. Experimental results on the two benchmarks reveal the challenges of the VAR task and also demonstrate the advantages of our tailored method. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.