2,215 research outputs found

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework that highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method for a given application. Another advantage of the proposed organization is that it allows the newest approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task to date. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables

    Understanding Vehicular Traffic Behavior from Video: A Survey of Unsupervised Approaches

    Recent trends in automatic behavior analysis and understanding from infrastructure video are reviewed. Research has shifted away from high-resolution estimation of vehicle state and towards machine learning approaches that extract meaningful patterns from aggregates in an unsupervised fashion. These patterns represent priors on observable motion, which can be used to describe a scene, to answer behavior questions such as where a vehicle is going or how many vehicles are performing the same action, and to detect abnormal events. The review focuses on two main methods for scene description: trajectory clustering and topic modeling. Example applications that use these behavioral modeling techniques are also presented, along with the most popular public datasets for behavioral analysis. Discussion and comments on future directions in the field are also provided.

    Identity Retention of Multiple Objects under Extreme Occlusion Scenarios using Feature Descriptors

    Identity assignment and retention require multiple object detection and tracking, and play a vital role in behavior analysis and gait recognition. The objective of Multiple Object Tracking (MOT) is to detect and track objects and retain their identities across an image sequence. Occlusion is a major obstacle to identity retention: handling it while tracking a varying number of persons in a complex scene with a monocular camera remains a challenging task in real-world applications. This paper uses a Gaussian Mixture Model (GMM) and Hungarian Assignment (HA) for person detection and tracking. We propose an identity retention algorithm using Rotation, Scale and Translation (RST) invariant feature descriptors. In addition, a segmentation-based optimum demerge handling algorithm is proposed to retain proper identities under occlusion. The proposed approach is evaluated on standard surveillance dataset sequences. It achieves 97% object detection accuracy and 85% tracking accuracy on the PETS-S2.L1 sequence, and 69.7% accuracy and 72.3% precision on the Town Centre sequence.
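
As a rough illustration of the assignment step mentioned in this abstract, the sketch below matches tracks to detections by minimising a total cost. It is not the paper's implementation: the cost values are hypothetical, the function name `assign_tracks` is invented for this example, and the optimum is found by brute force over permutations rather than by the Hungarian algorithm itself, which reaches the same optimal matching far more efficiently.

```python
from itertools import permutations


def assign_tracks(cost):
    """Return (assignment, total) for the minimum-cost one-to-one matching.

    cost[t][d] is the cost of matching track t to detection d.
    Assumes a square matrix (same number of tracks and detections).
    Brute force is only practical for very small matrices.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[t][perm[t]] for t in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical distances between 3 existing tracks and 3 new detections.
cost = [
    [1.0, 2.0, 8.0],
    [1.5, 9.0, 8.0],
    [7.0, 6.0, 3.0],
]
assignment, total = assign_tracks(cost)
print(assignment, total)  # track index -> detection index, and total cost
```

In this example a greedy row-by-row choice would give track 0 its cheapest detection and force an expensive match on track 1, whereas the global optimum swaps them. This is the reason an optimal assignment method is preferred over greedy matching.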

    MoWLD: a robust motion image descriptor for violence detection

    © 2015, Springer Science+Business Media New York. Automatic violence detection from video is a hot topic for many video surveillance applications. However, there has been little success in designing an algorithm that can detect violence in surveillance videos with high performance. Existing methods typically apply the Bag-of-Words (BoW) model to local spatiotemporal descriptors. However, traditional spatiotemporal features are not discriminative enough, and the BoW model coarsely assigns each feature vector to only one visual word and therefore ignores the spatial relationships among the features. To tackle these problems, in this paper we propose a novel Motion Weber Local Descriptor (MoWLD) in the spirit of the well-known WLD and make it a powerful and robust descriptor for motion images. We extend the WLD spatial descriptions by adding a temporal component to the appearance descriptor, which implicitly captures local motion information as well as low-level image appearance information. To eliminate redundant and irrelevant features, non-parametric Kernel Density Estimation (KDE) is applied to the MoWLD descriptor. To obtain more discriminative features, we adopt a sparse coding and max pooling scheme to further process the selected MoWLDs. Experimental results on three benchmark datasets demonstrate the superiority of the proposed approach over the state of the art.
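
The max pooling step named in this abstract can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the sparse codes are made-up values, the function name `max_pool` is hypothetical, and pooling over absolute values is one common convention that the abstract does not confirm.

```python
def max_pool(codes):
    """Pool a set of sparse code vectors into one fixed-length feature.

    For each dictionary atom j, keep the largest absolute response
    across all descriptors (assumed convention, see lead-in).
    """
    if not codes:
        raise ValueError("at least one code vector is required")
    dim = len(codes[0])
    return [max(abs(code[j]) for code in codes) for j in range(dim)]


# Hypothetical sparse codes for three local descriptors over a 4-atom dictionary.
codes = [
    [0.0, 0.7, 0.0, -0.2],
    [0.3, 0.0, 0.0, 0.9],
    [0.0, -0.8, 0.0, 0.0],
]
video_feature = max_pool(codes)
print(video_feature)
```

Unlike the hard assignment in BoW, where each descriptor votes for a single visual word, each descriptor here can respond to several atoms, and the pooled vector has the same length regardless of how many descriptors the video produced.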

    An Unsupervised Framework for Online Spatiotemporal Detection of Activities of Daily Living by Hierarchical Activity Models

    Automatic detection and analysis of human activities captured by various sensors (e.g. a sequence of images captured by an RGB camera) play an essential role in various research fields in order to understand the semantic content of a captured scene. Earlier studies have focused mainly on the supervised classification problem, where a label is assigned to a given short clip. Nevertheless, in real-world scenarios, such as Activities of Daily Living (ADL), the challenge is to automatically browse long-term (days and weeks) streams of videos to identify segments whose semantics correspond to the model activities, together with their temporal boundaries. This paper proposes an unsupervised solution to this problem by generating hierarchical models that combine global trajectory information with local dynamics of the human body. Global information helps in modeling the spatiotemporal evolution of long-term activities and hence their spatial and temporal localization. Moreover, the local dynamic information incorporates complex local motion patterns of daily activities into the models. Our proposed method is evaluated using realistic datasets captured from observation rooms in hospitals and nursing homes. The experimental data on a variety of monitoring scenarios in hospital settings reveal how this framework can be exploited to provide timely diagnosis and medical interventions for cognitive disorders such as Alzheimer's disease. The obtained results show that our framework is a promising attempt capable of generating activity models without any supervision.