
    Discovery and recognition of motion primitives in human activities

    We present a novel framework for the automatic discovery and recognition of motion primitives in videos of human activities. Given the 3D pose of a human in a video, human motion primitives are discovered by optimizing the 'motion flux', a quantity that captures the motion variation of a group of skeletal joints. A normalization of the primitives is proposed to make them invariant to a subject's anatomical variations and to the data sampling rate. The discovered primitives are initially unknown and unlabeled, and they are grouped into classes without supervision via a hierarchical non-parametric Bayes mixture model. Once the classes are determined and labeled, they are further analyzed to establish models for recognizing the discovered primitives. Each primitive model is defined by a set of learned parameters. Given new video data and the estimated pose of the subject appearing in the video, the motion is segmented into primitives, which are recognized with a probability given by the parameters of the learned models. Using our framework, we build a publicly available dataset of human motion primitives from sequences taken from well-known motion capture datasets. We expect that our framework, by providing an objective way of discovering and categorizing human motion, will be a useful tool in numerous research fields, including video analysis, human-inspired motion generation, learning by demonstration, intuitive human-robot interaction, and human behavior analysis.

    Weakly-Supervised Anomaly Detection in Surveillance Videos Based on Two-Stream I3D Convolution Network

    The widespread adoption of city surveillance systems has led to an increase in the use of surveillance videos for maintaining public safety and security. This thesis tackles the problem of detecting anomalous events in surveillance videos. The goal is to automatically identify abnormal events by learning from both normal and abnormal videos. Most previous work considers any deviation from learned normal patterns to be an anomaly, which may not always be valid, since the same activity can be normal or abnormal under different circumstances. To address this issue, the thesis uses Two-Stream Inflated 3D (I3D) Convolutional Networks to extract spatial and temporal video features and demonstrates that they outperform the 3D Convolutional Network (C3D) used as a feature extractor in prior work. To avoid annotating abnormal activities in training videos, a weakly supervised anomaly detection model is implemented based on the Multiple Instance Learning (MIL) framework. The model treats normal and abnormal videos as bags and video clips as instances, and it learns a ranking model that predicts high anomaly scores for video clips containing anomalies. The thesis further shows that the choice of input features, such as concatenating RGB and flow features, and a careful choice of optimization settings, such as the optimizer, can significantly improve the performance of the anomaly detection model on some evaluation metrics.
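The bag-and-instance ranking idea in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss, so the hinge form, the use of the maximum clip score per bag, the `margin` value, and the function name `mil_ranking_loss` are all assumptions made for illustration; they are not taken from the thesis.

```python
import numpy as np


def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Hinge-style ranking loss between two bags of clip scores.

    Assumption (not from the abstract): each bag is summarized by its
    highest-scoring clip, and the abnormal bag's top score should exceed
    the normal bag's top score by at least `margin`.
    """
    abnormal_scores = np.asarray(abnormal_scores, dtype=float)
    normal_scores = np.asarray(normal_scores, dtype=float)

    # Only video-level labels are needed: the most anomalous-looking clip
    # in each bag stands in for the whole video.
    top_abnormal = abnormal_scores.max()
    top_normal = normal_scores.max()

    # Zero loss once the abnormal bag outranks the normal bag by the margin.
    return float(max(0.0, margin - top_abnormal + top_normal))


# Hypothetical clip scores for one abnormal and one normal video.
abnormal_bag = [0.1, 0.9, 0.2]
normal_bag = [0.1, 0.2, 0.05]
loss = mil_ranking_loss(abnormal_bag, normal_bag)
```

Because the loss only compares the top-scoring clips, no clip-level annotation of the abnormal video is required, which is the sense in which the supervision is weak.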

    Multi-modal human aggression detection

    This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is the exploitation of complementary audio and video cues to disambiguate scene activity in real-life environments. On the video side, the system uses overlapping cameras to track persons in 3D and to extract features describing limb motion relative to the torso. On the audio side, it classifies instances of speech, screaming, singing, and kicking-object. The audio and video cues are fused with contextual cues (interaction, auxiliary objects); a Dynamic Bayesian Network (DBN) produces an estimate of the ambient aggression level. Our prototype system is validated on a realistic set of scenarios performed by professional actors at an actual train station to ensure a realistic audio and video noise setting. (C) 2015 Elsevier Inc. All rights reserved.