84 research outputs found

    Crowd Recognition System Based on Optical Flow Along with SVM classifier

    Get PDF
    The manuscript discusses about abnormalities in a crowded scenario. To prevent the mishap at a public place, there is no much mechanism which could prevent or alert the concerned authority about suspects in a crowd. Usually in a crowded scene, there are chances of some mishap like a terrorist attack or a crime. Our target is finding techniques to identify such activities and to possibly prevent them. If the crowd members exhibit abnormal behavior, we could identify and say that this particular person is a suspect and then the concerned authority would look into the matter. There are various methods to identify the abnormal behavior. The proposed approach is based on optical flow model. It has an ability to detect the sudden changes in motion of an individual among the crowd. First, the main region of motion is extracted by the help of motion heat map. Harris corner detector is used for extracting point of interest of extracted motion area. Based on the point of interest an optical flow is estimated here. After analyzing this optical flow model, a threshold value is fixed. Basically optical flow is an energy level of individual frame. The threshold value is forwarded to SVM classifier, which produces a better result with 99.71% accuracy. This approach is very useful in real time video surveillance system where a machine can monitor unwanted crowd activity.

    Discovery and recognition of motion primitives in human activities

    Get PDF
    We present a novel framework for the automatic discovery and recognition of motion primitives in videos of human activities. Given the 3D pose of a human in a video, human motion primitives are discovered by optimizing the `motion flux', a quantity which captures the motion variation of a group of skeletal joints. A normalization of the primitives is proposed in order to make them invariant with respect to a subject anatomical variations and data sampling rate. The discovered primitives are unknown and unlabeled and are unsupervisedly collected into classes via a hierarchical non-parametric Bayes mixture model. Once classes are determined and labeled they are further analyzed for establishing models for recognizing discovered primitives. Each primitive model is defined by a set of learned parameters. Given new video data and given the estimated pose of the subject appearing on the video, the motion is segmented into primitives, which are recognized with a probability given according to the parameters of the learned models. Using our framework we build a publicly available dataset of human motion primitives, using sequences taken from well-known motion capture datasets. We expect that our framework, by providing an objective way for discovering and categorizing human motion, will be a useful tool in numerous research fields including video analysis, human inspired motion generation, learning by demonstration, intuitive human-robot interaction, and human behavior analysis

    Automatic human behaviour anomaly detection in surveillance video

    Get PDF
    This thesis work focusses upon developing the capability to automatically evaluate and detect anomalies in human behaviour from surveillance video. We work with static monocular cameras in crowded urban surveillance scenarios, particularly air- ports and commercial shopping areas. Typically a person is 100 to 200 pixels high in a scene ranging from 10 - 20 meters width and depth, populated by 5 to 40 peo- ple at any given time. Our procedure evaluates human behaviour unobtrusively to determine outlying behavioural events, agging abnormal events to the operator. In order to achieve automatic human behaviour anomaly detection we address the challenge of interpreting behaviour within the context of the social and physical environment. We develop and evaluate a process for measuring social connectivity between individuals in a scene using motion and visual attention features. To do this we use mutual information and Euclidean distance to build a social similarity matrix which encodes the social connection strength between any two individuals. We de- velop a second contextual basis which acts by segmenting a surveillance environment into behaviourally homogeneous subregions which represent high tra c slow regions and queuing areas. We model the heterogeneous scene in homogeneous subgroups using both contextual elements. We bring the social contextual information, the scene context, the motion, and visual attention features together to demonstrate a novel human behaviour anomaly detection process which nds outlier behaviour from a short sequence of video. The method, Nearest Neighbour Ranked Outlier Clusters (NN-RCO), is based upon modelling behaviour as a time independent se- quence of behaviour events, can be trained in advance or set upon a single sequence. We nd that in a crowded scene the application of Mutual Information-based social context permits the ability to prevent self-justifying groups and propagate anomalies in a social network, granting a greater anomaly detection capability. Scene context uniformly improves the detection of anomalies in all the datasets we test upon. We additionally demonstrate that our work is applicable to other data domains. We demonstrate upon the Automatic Identi cation Signal data in the maritime domain. Our work is capable of identifying abnormal shipping behaviour using joint motion dependency as analogous for social connectivity, and similarly segmenting the shipping environment into homogeneous regions

    Crowd Scene Analysis in Video Surveillance

    Get PDF
    There is an increasing interest in crowd scene analysis in video surveillance due to the ubiquitously deployed video surveillance systems in public places with high density of objects amid the increasing concern on public security and safety. A comprehensive crowd scene analysis approach is required to not only be able to recognize crowd events and detect abnormal events, but also update the innate learning model in an online, real-time fashion. To this end, a set of approaches for Crowd Event Recognition (CER) and Abnormal Event Detection (AED) are developed in this thesis. To address the problem of curse of dimensionality, we propose a video manifold learning method for crowd event analysis. A novel feature descriptor is proposed to encode regional optical flow features of video frames, where adaptive quantization and binarization of the feature code are employed to improve the discriminant ability of crowd motion patterns. Using the feature code as input, a linear dimensionality reduction algorithm that preserves both the intrinsic spatial and temporal properties is proposed, where the generated low-dimensional video manifolds are conducted for CER and AED. Moreover, we introduce a framework for AED by integrating a novel incremental and decremental One-Class Support Vector Machine (OCSVM) with a sliding buffer. It not only updates the model in an online fashion with low computational cost, but also adapts to concept drift by discarding obsolete patterns. Furthermore, the framework has been improved by introducing Multiple Incremental and Decremental Learning (MIDL), kernel fusion, and multiple target tracking, which leads to more accurate and faster AED. In addition, we develop a framework for another video content analysis task, i.e., shot boundary detection. Specifically, instead of directly assessing the pairwise difference between consecutive frames over time, we propose to evaluate a divergence measure between two OCSVM classifiers trained on two successive frame sets, which is more robust to noise and gradual transitions such as fade-in and fade-out. To speed up the processing procedure, the two OCSVM classifiers are updated online by the MIDL proposed for AED. Extensive experiments on five benchmark datasets validate the effectiveness and efficiency of our approaches in comparison with the state of the art

    Analysis Of Behaviors In Crowd Videos

    Get PDF
    In this dissertation, we address the problem of discovery and representation of group activity of humans and objects in a variety of scenarios, commonly encountered in vision applications. The overarching goal is to devise a discriminative representation of human motion in social settings, which captures a wide variety of human activities observable in video sequences. Such motion emerges from the collective behavior of individuals and their interactions and is a significant source of information typically employed for applications such as event detection, behavior recognition, and activity recognition. We present new representations of human group motion for static cameras, and propose algorithms for their application to variety of problems. We first propose a method to model and learn the scene activity of a crowd using Social Force Model for the first time in the computer vision community. We present a method to densely estimate the interaction forces between people in a crowd, observed by a static camera. Latent Dirichlet Allocation (LDA) is used to learn the model of the normal activities over extended periods of time. Randomly selected spatio-temporal volumes of interaction forces are used to learn the model of normal behavior of the scene. The model encodes the latent topics of social interaction forces in the scene for normal behaviors. We classify a short video sequence of n frames as normal or abnormal by using the learnt model. Once a sequence of frames is classified as an abnormal, iii the regions of anomalies in the abnormal frames are localized using the magnitude of interaction forces. The representation and estimation framework proposed above, however, has a few limitations. This algorithm proposes to use a global estimation of the interaction forces within the crowd. It, therefore, is incapable of identifying different groups of objects based on motion or behavior in the scene. Although the algorithm is capable of learning the normal behavior and detects the abnormality, but it is incapable of capturing the dynamics of different behaviors. To overcome these limitations, we then propose a method based on the Lagrangian framework for fluid dynamics, by introducing a streakline representation of flow. Streaklines are traced in a fluid flow by injecting color material, such as smoke or dye, which is transported with the flow and used for visualization. In the context of computer vision, streaklines may be used in a similar way to transport information about a scene, and they are obtained by repeatedly initializing a fixed grid of particles at each frame, then moving both current and past particles using optical flow. Streaklines are the locus of points that connect particles which originated from the same initial position. This approach is advantageous over the previous representations in two aspects: first, its rich representation captures the dynamics of the crowd and changes in space and time in the scene where the optical flow representation is not enough, and second, this model is capable of discovering groups of similar behavior within a crowd scene by performing motion segmentation. We propose a method to distinguish different group behaviors such as divergent/convergent motion and lanes using this framework. Finally, we introduce flow potentials as a discriminative feature to iv recognize crowd behaviors in a scene. Results of extensive experiments are presented for multiple real life crowd sequences involving pedestrian and vehicular traffic. The proposed method exploits optical flow as the low level feature and performs integration and clustering to obtain coherent group motion patterns. However, we observe that in crowd video sequences, as well as a variety of other vision applications, the co-occurrence and inter-relation of motion patterns are the main characteristics of group behaviors. In other words, the group behavior of objects is a mixture of individual actions or behaviors in specific geometrical layout and temporal order. We, therefore, propose a new representation for group behaviors of humans using the interrelation of motion patterns in a scene. The representation is based on bag of visual phrases of spatio-temporal visual words. We present a method to match the high-order spatial layout of visual words that preserve the geometry of the visual words under similarity transformations. To perform the experiments we collected a dataset of group choreography performances from the YouTube website. The dataset currently contains four categories of group dances

    Semantic Spaces for Video Analysis of Behaviour

    Get PDF
    PhDThere are ever growing interests from the computer vision community into human behaviour analysis based on visual sensors. These interests generally include: (1) behaviour recognition - given a video clip or specific spatio-temporal volume of interest discriminate it into one or more of a set of pre-defined categories; (2) behaviour retrieval - given a video or textual description as query, search for video clips with related behaviour; (3) behaviour summarisation - given a number of video clips, summarise out representative and distinct behaviours. Although countless efforts have been dedicated into problems mentioned above, few works have attempted to analyse human behaviours in a semantic space. In this thesis, we define semantic spaces as a collection of high-dimensional Euclidean space in which semantic meaningful events, e.g. individual word, phrase and visual event, can be represented as vectors or distributions which are referred to as semantic representations. With the semantic space, semantic texts, visual events can be quantitatively compared by inner product, distance and divergence. The introduction of semantic spaces can bring lots of benefits for visual analysis. For example, discovering semantic representations for visual data can facilitate semantic meaningful video summarisation, retrieval and anomaly detection. Semantic space can also seamlessly bridge categories and datasets which are conventionally treated independent. This has encouraged the sharing of data and knowledge across categories and even datasets to improve recognition performance and reduce labelling effort. Moreover, semantic space has the ability to generalise learned model beyond known classes which is usually referred to as zero-shot learning. Nevertheless, discovering such a semantic space is non-trivial due to (1) semantic space is hard to define manually. Humans always have a good sense of specifying the semantic relatedness between visual and textual instances. But a measurable and finite semantic space can be difficult to construct with limited manual supervision. As a result, constructing semantic space from data is adopted to learn in an unsupervised manner; (2) It is hard to build a universal semantic space, i.e. this space is always contextual dependent. So it is important to build semantic space upon selected data such that it is always meaningful within the context. Even with a well constructed semantic space, challenges are still present including; (3) how to represent visual instances in the semantic space; and (4) how to mitigate the misalignment of visual feature and semantic spaces across categories and even datasets when knowledge/data are generalised. This thesis tackles the above challenges by exploiting data from different sources and building contextual semantic space with which data and knowledge can be transferred and shared to facilitate the general video behaviour analysis. To demonstrate the efficacy of semantic space for behaviour analysis, we focus on studying real world problems including surveillance behaviour analysis, zero-shot human action recognition and zero-shot crowd behaviour recognition with techniques specifically tailored for the nature of each problem. Firstly, for video surveillances scenes, we propose to discover semantic representations from the visual data in an unsupervised manner. This is due to the largely availability of unlabelled visual data in surveillance systems. By representing visual instances in the semantic space, data and annotations can be generalised to new events and even new surveillance scenes. Specifically, to detect abnormal events this thesis studies a geometrical alignment between semantic representation of events across scenes. Semantic actions can be thus transferred to new scenes and abnormal events can be detected in an unsupervised way. To model multiple surveillance scenes simultaneously, we show how to learn a shared semantic representation across a group of semantic related scenes through a multi-layer clustering of scenes. With multi-scene modelling we show how to improve surveillance tasks including scene activity profiling/understanding, crossscene query-by-example, behaviour classification, and video summarisation. Secondly, to avoid extremely costly and ambiguous video annotating, we investigate how to generalise recognition models learned from known categories to novel ones, which is often termed as zero-shot learning. To exploit the limited human supervision, e.g. category names, we construct the semantic space via a word-vector representation trained on large textual corpus in an unsupervised manner. Representation of visual instance in semantic space is obtained by learning a visual-to-semantic mapping. We notice that blindly applying the mapping learned from known categories to novel categories can cause bias and deteriorating the performance which is termed as domain shift. To solve this problem we employed techniques including semisupervised learning, self-training, hubness correction, multi-task learning and domain adaptation. All these methods in combine achieve state-of-the-art performance in zero-shot human action task. In the last, we study the possibility to re-use known and manually labelled semantic crowd attributes to recognise rare and unknown crowd behaviours. This task is termed as zero-shot crowd behaviours recognition. Crucially we point out that given the multi-labelled nature of semantic crowd attributes, zero-shot recognition can be improved by exploiting the co-occurrence between attributes. To summarise, this thesis studies methods for analysing video behaviours and demonstrates that exploring semantic spaces for video analysis is advantageous and more importantly enables multi-scene analysis and zero-shot learning beyond conventional learning strategies

    On Deep Machine Learning Methods for Anomaly Detection within Computer Vision

    Get PDF
    This thesis concerns deep learning approaches for anomaly detection in images. Anomaly detection addresses how to find any kind of pattern that differs from the regularities found in normal data and is receiving increasingly more attention in deep learning research. This is due in part to its wide set of potential applications ranging from automated CCTV surveillance to quality control across a range of industries. We introduce three original methods for anomaly detection applicable to two specific deployment scenarios. In the first, we detect anomalous activity in potentially crowded scenes through imagery captured via CCTV or other video recording devices. In the second, we segment defects in textures and demonstrate use cases representative of automated quality inspection on industrial production lines. In the context of detecting anomalous activity in scenes, we take an existing state-of-the-art method and introduce several enhancements including the use of a region proposal network for region extraction and a more information-preserving feature preprocessing strategy. This results in a simpler method that is significantly faster and suitable for real-time application. In addition, the increased efficiency facilitates building higher-dimensional models capable of improved anomaly detection performance, which we demonstrate on the pedestrian-based UCSD Ped2 dataset. In the context of texture defect detection, we introduce a method based on the idea of texture restoration that surpasses all state-of-the-art methods on the texture classes of the challenging MVTecAD dataset. In the same context, we additionally introduce a method that utilises transformer networks for future pixel and feature prediction. This novel method is able to perform competitive anomaly detection on most of the challenging MVTecAD dataset texture classes and illustrates both the promise and limitations of state-of-the-art deep learning transformers for the task of texture anomaly detection

    Algorithms for anomaly detection in video sequences through discriminative models

    Full text link
    Monitoring public areas with pedestrians is a task that has to be frequently accomplished by means of security systems. Nevertheless, manual detection of these anomalies is a tough task and it is easy to lose interesting events when many areas have to be attended. This is the main reason why the automated detection of these anomalies and interesting events in general has become an important source of research in the past years, specially in the eld of computer vision. Automated anomaly detection is still an open task even though that many methods have been proposed. One of the reasons is that a successful and accurate anomaly detection algorithm strongly depends on the context and the de nition of the anomalies to detect and the objects that produce them. The state of the art included in this work has been developed to make a complete study of all these aspects in detail, as well as a study of advantages and drawbacks of the main methods of the literature, helping to choose the best techniques and strategies for speci c surveillance scenarios. Since there is a great di culty to model every anomaly, we have decided to fashion the normality by means of Gaussian mixture models, which are relatively simple methods compared to others in the literature such as [1, 2], but that have shown potential at detecting anomalies. This can be observed on the methods proposed in [3] and [4]. We have decided to work at pixel level. Thus, to feed the model, discriminative descriptors are built based on a robust optical ow method, that has become the main source of motion and textural information of the scene. This fact makes this work di erent to other state-of-the-art approaches that work at pixel-level, whose optical ow is not capable to give such a detailed information of the scene. Finally, the evaluation of the nal algorithm is performed exhaustively from a baseline method, whose descriptor grows depending on the best results so far on a publicly available dataset. Detection results are compared with the state-of-the-art methods, concluding that our method is at the same level of the methods proposed in the literature

    Human and Group Activity Recognition from Video Sequences

    Get PDF
    A good solution to human activity recognition enables the creation of a wide variety of useful applications such as applications in visual surveillance, vision-based Human-Computer-Interaction (HCI) and gesture recognition. In this thesis, a graph based approach to human activity recognition is proposed which models spatio-temporal features as contextual space-time graphs. In this method, spatio-temporal gradient cuboids were extracted at significant regions of activity, and feature graphs (gradient, space-time, local neighbours, immediate neighbours) are constructed using the similarity matrix. The Laplacian representation of the graph is utilised to reduce the computational complexity and to allow the use of traditional statistical classifiers. A second methodology is proposed to detect and localise abnormal activities in crowded scenes. This approach has two stages: training and identification. During the training stage, specific human activities are identified and characterised by employing modelling of medium-term movement flow through streaklines. Each streakline is formed by multiple optical flow vectors that represent and track locally the movement in the scene. A dictionary of activities is recorded for a given scene during the training stage. During the testing stage, the consistency of each observed activity with those from the dictionary is verified using the Kullback-Leibler (KL) divergence. The anomaly detection of the proposed methodology is compared to state of the art, producing state of the art results for localising anomalous activities. Finally, we propose an automatic group activity recognition approach by modelling the interdependencies of group activity features over time. We propose to model the group interdependences in both motion and location spaces. These spaces are extended to time-space and time-movement spaces and modelled using Kernel Density Estimation (KDE). The recognition performance of the proposed methodology shows an improvement in recognition performance over state of the art results on group activity datasets
    corecore