    3D Trajectories for Action Recognition

    Recent developments in affordable depth sensors open new possibilities for action recognition. Depth information improves skeleton detection, so many authors have focused on analyzing pose for action recognition. However, skeleton detection is still not robust and fails in more challenging scenarios, where the sensor is placed outside its optimal working range and serious occlusions occur. In this paper we investigate state-of-the-art methods designed for RGB videos that have proved their performance. We then extend current state-of-the-art algorithms to benefit from depth information without the need for skeleton detection. We propose two novel video descriptors: the first combines motion and 3D information, and the second improves performance on actions with a low movement rate. We validate our approach on the challenging MSR DailyActivity3D dataset.

    Localized Trajectories for 2D and 3D Action Recognition

    The Dense Trajectories concept is one of the most successful approaches in action recognition, suitable for scenarios involving a significant amount of motion. However, due to noise and background motion, many generated trajectories are irrelevant to the actual human activity and can potentially lead to performance degradation. In this paper, we propose Localized Trajectories as an improved version of Dense Trajectories where motion trajectories are clustered around human body joints provided by RGB-D cameras and then encoded by local Bag-of-Words. As a result, the Localized Trajectories concept provides an advanced discriminative representation of actions. Moreover, we generalize Localized Trajectories to 3D by using the depth modality. One of the main advantages of 3D Localized Trajectories is that they describe radial displacements that are perpendicular to the image plane. Extensive experiments and analysis were carried out on five different datasets
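
The idea of grouping trajectories around body joints and encoding each group with its own Bag-of-Words can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `localized_bow`, the nearest-joint assignment, the `radius` cut-off and the L1 normalisation are all assumptions made for the example, and the codebook is taken as given.

```python
import numpy as np

def localized_bow(traj_positions, traj_descriptors, joints, codebook, radius):
    """Hypothetical sketch of a per-joint (local) Bag-of-Words encoding.

    traj_positions   : (N, 2) mean image position of each trajectory
    traj_descriptors : (N, D) descriptor of each trajectory
    joints           : (J, 2) joint positions in the same image coordinates
    codebook         : (K, D) visual words (assumed to be learned beforehand)
    radius           : trajectories farther than this from every joint are dropped
    Returns a (J * K,) vector of concatenated, L1-normalised histograms.
    """
    n_joints, n_words = joints.shape[0], codebook.shape[0]

    # Distance from every trajectory to every joint, shape (N, J).
    d_joint = np.linalg.norm(traj_positions[:, None, :] - joints[None, :, :], axis=2)
    nearest_joint = d_joint.argmin(axis=1)
    keep = d_joint.min(axis=1) <= radius  # discard background trajectories

    # Nearest visual word for every trajectory descriptor, shape (N,).
    d_word = np.linalg.norm(traj_descriptors[:, None, :] - codebook[None, :, :], axis=2)
    word = d_word.argmin(axis=1)

    # One histogram per joint; np.add.at accumulates repeated index pairs.
    hist = np.zeros((n_joints, n_words))
    np.add.at(hist, (nearest_joint[keep], word[keep]), 1.0)

    # L1-normalise each joint's histogram, leaving empty ones at zero.
    sums = hist.sum(axis=1, keepdims=True)
    hist = np.divide(hist, sums, out=np.zeros_like(hist), where=sums > 0)
    return hist.ravel()
```

Dropping trajectories that are far from every joint is one simple way to suppress the background motion mentioned in the abstract; the clustering actually used in the paper may differ.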

    Action recognition based on sparse motion trajectories

    We present a method that extracts effective features from videos for human action recognition. The proposed method analyses the 3D volumes along the sparse motion trajectories of a set of interest points from the video scene. To represent human actions, we generate a Bag-of-Features (BoF) model based on the extracted features, and finally a support vector machine is used to classify human activities. Evaluation shows that the proposed features are discriminative and computationally efficient. Our method achieves state-of-the-art performance on the standard human action recognition benchmarks, namely the KTH and Weizmann datasets.
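
The Bag-of-Features step described above can be sketched as a codebook built by k-means, followed by a histogram of nearest-codeword assignments per video. This is only an illustration under assumptions: the function names `build_codebook` and `bof_histogram`, the plain Lloyd iterations and the fixed seed are choices made for the example, and the support vector machine that would consume the histograms is left out.

```python
import numpy as np

def build_codebook(features, k, n_iter=20, seed=0):
    """Hypothetical k-means (Lloyd) codebook over (N, D) local features."""
    rng = np.random.default_rng(seed)
    # Initialise centres with k distinct feature vectors.
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centre if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bof_histogram(features, centers):
    """L1-normalised histogram of nearest-codeword assignments for one video.

    Assumes the video contributes at least one feature.
    """
    d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    counts = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return counts / counts.sum()
```

Each video then becomes a fixed-length vector regardless of how many trajectories it produced, which is what allows a standard classifier such as an SVM to be trained on top.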

    3-D Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold

    Recognizing human actions in 3D video sequences is an important open problem that is currently at the heart of many research domains, including surveillance, natural interfaces and rehabilitation. However, the design and development of models for action recognition that are both accurate and efficient is a challenging task due to the variability of human pose, clothing and appearance. In this paper, we propose a new framework to extract a compact representation of a human action captured through a depth sensor and to enable accurate action recognition. The proposed solution builds on fitting a human skeleton model to the acquired data so as to represent the 3D coordinates of the joints and their change over time as a trajectory in a suitable action space. Thanks to this 3D joint-based framework, the proposed solution is capable of capturing both the shape and the dynamics of the human body simultaneously. The action recognition problem is then formulated as the problem of computing the similarity between the shapes of trajectories in a Riemannian manifold. Classification using kNN is finally performed on this manifold, taking advantage of Riemannian geometry in the open curve shape space. Experiments are carried out on four representative benchmarks to demonstrate the potential of the proposed solution in terms of accuracy and latency for low-latency action recognition. Comparative results with state-of-the-art methods are reported.
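
To make the idea of comparing trajectory shapes with a geodesic distance and classifying with kNN concrete, here is a minimal sketch. It assumes the square-root velocity representation, a common choice for open curves that the abstract itself does not name, and it omits the optimisation over rotations and re-parameterisations that a full shape-space distance requires. All function names are hypothetical, and curves are assumed to be uniformly sampled with the same number of samples.

```python
import numpy as np

def srvf(curve):
    """Unit-norm square-root velocity representation of a (T, D) trajectory.

    Assumes uniform sampling and at least one non-zero displacement.
    """
    v = np.diff(curve, axis=0)  # finite-difference velocity
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    q = v / np.sqrt(np.maximum(speed, 1e-12))
    return q / np.linalg.norm(q)  # scale normalisation: a point on a unit sphere

def preshape_distance(c1, c2):
    """Arc length on the unit sphere between two representations.

    No alignment over rotation or re-parameterisation is performed here.
    """
    inner = float(np.sum(srvf(c1) * srvf(c2)))
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))

def knn_classify(query, train_curves, train_labels, k=1):
    """Majority vote among the k training trajectories closest to the query."""
    d = np.array([preshape_distance(query, c) for c in train_curves])
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return labels[counts.argmax()]
```

Because each representation is normalised to unit norm, two trajectories that differ only in overall scale map to the same point, so their distance is zero up to rounding.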

    From Dense 2D to Sparse 3D Trajectories for Human Action Detection and Recognition

    Action recognition from RGB-D data

    In recent years, action recognition based on RGB-D data has attracted increasing attention. Unlike traditional 2D action recognition, RGB-D data contains additional depth and skeleton modalities, each with its own characteristics. This thesis presents seven novel methods that take advantage of the three modalities for action recognition. First, effective handcrafted features are designed and a frequent pattern mining method is employed to mine the most discriminative, representative and non-redundant features for skeleton-based action recognition. Second, to take advantage of powerful Convolutional Neural Networks (ConvNets), it is proposed to represent the spatio-temporal information carried in 3D skeleton sequences in three 2D images by encoding the joint trajectories and their dynamics into the color distribution of the images, and ConvNets are adopted to learn discriminative features for human action recognition. Third, for depth-based action recognition, three data augmentation strategies are proposed so that ConvNets can be applied to small training datasets. Fourth, to take full advantage of the 3D structural information offered by the depth modality and its insensitivity to illumination variations, three simple, compact yet effective image-based representations are proposed and ConvNets are adopted for feature extraction and classification. However, both of the previous two methods are sensitive to noise and cannot differentiate fine-grained actions well. Fifth, to deal with this issue, it is proposed to represent a depth map sequence as three pairs of structured dynamic images at the body, part and joint levels respectively through bidirectional rank pooling. The structured dynamic image preserves the spatio-temporal information, enhances the structure information across both body parts/joints and different temporal scales, and takes advantage of ConvNets for action recognition. Sixth, it is proposed to extract and use scene flow for action recognition from RGB and depth data. Last, to exploit the joint information in multi-modal features arising from heterogeneous sources (RGB, depth), it is proposed to cooperatively train a single ConvNet (referred to as c-ConvNet) on both RGB features and depth features, and to deeply aggregate the two modalities to achieve robust action recognition.
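
The second method, which turns a 3D skeleton sequence into three 2D images, can be illustrated with a simplified sketch. It is not the thesis's encoding: the function name `trajectory_maps`, the choice of the xy, xz and yz planes, the image size and the use of a single intensity channel that stores normalised time (instead of a color distribution) are all assumptions made to keep the example short.

```python
import numpy as np

def trajectory_maps(skeleton, size=64):
    """Hypothetical projection of a (T, J, 3) skeleton sequence onto three planes.

    Returns a (3, size, size) array with one image per plane (xy, xz, yz).
    Each pixel stores the normalised time in (0, 1] of the latest joint
    sample that landed on it, or 0 where no joint passed.
    """
    n_frames = skeleton.shape[0]

    # Rescale all coordinates to pixel indices in [0, size - 1].
    lo = skeleton.min(axis=(0, 1))
    span = np.maximum(skeleton.max(axis=(0, 1)) - lo, 1e-12)
    pix = np.round((skeleton - lo) / span * (size - 1)).astype(int)

    maps = np.zeros((3, size, size))
    planes = [(0, 1), (0, 2), (1, 2)]  # coordinate pairs used as (column, row)
    for t in range(n_frames):
        value = (t + 1) / n_frames  # later frames are drawn brighter
        for p, (a, b) in enumerate(planes):
            maps[p, pix[t, :, b], pix[t, :, a]] = value
    return maps
```

Images of this kind have a fixed size whatever the sequence length, which is what makes them usable as ConvNet inputs.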