6 research outputs found

    Modeling Spatial Layout of Features for Real World Scenario RGB-D Action Recognition

    Depth information improves skeleton detection, so skeleton-based methods are the most popular in RGB-D action recognition. However, skeleton detection has a limited working range in terms of distance and viewpoint, and most skeleton-based action recognition methods ignore the fact that the skeleton may be missing. Local points-of-interest (POIs) do not require skeleton detection, but they fail when too few POIs can be detected, e.g., when the amount of motion in the action is low, and most such methods ignore the spatial location of features. We cope with these problems by employing a people detector instead of a skeleton detector. We propose a method to encode the spatial layout of features inside the person bounding box, and we introduce a descriptor that encodes static information for actions with little motion. We validate our approach on three public datasets. The results show that our method is competitive with skeleton-based methods while requiring much simpler people detection instead of skeleton detection.
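The bounding-box layout encoding described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual descriptor: it simply histograms POIs over a grid inside the person bounding box, and all names and parameters are hypothetical.

```python
import numpy as np

def spatial_layout_descriptor(pois, bbox, grid=3):
    """Hypothetical sketch: encode each point-of-interest (POI) by the
    grid cell it falls into inside the person bounding box.
    pois : (N, 2) array of (x, y) image coordinates
    bbox : (x0, y0, x1, y1) person detection
    grid : the box is split into grid x grid cells
    Returns a (grid*grid,) normalised histogram of POI counts per cell."""
    x0, y0, x1, y1 = bbox
    # normalise POI coordinates to [0, 1) relative to the box
    nx = np.clip((pois[:, 0] - x0) / (x1 - x0), 0, 1 - 1e-9)
    ny = np.clip((pois[:, 1] - y0) / (y1 - y0), 0, 1 - 1e-9)
    # quantise into grid cell indices
    cx = (nx * grid).astype(int)
    cy = (ny * grid).astype(int)
    hist = np.zeros(grid * grid)
    np.add.at(hist, cy * grid + cx, 1.0)  # accumulate POIs per cell
    return hist / max(len(pois), 1)

pois = np.array([[12, 20], [18, 45], [30, 70]], dtype=float)
desc = spatial_layout_descriptor(pois, bbox=(10, 10, 40, 80), grid=3)
```

Because the coordinates are normalised to the detected box, the descriptor stays comparable across subjects at different distances, which is the point of using a people detector rather than a skeleton.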

    Deep-Temporal LSTM for Daily Living Action Recognition

    In this paper, we propose to improve the traditional use of RNNs by employing a many-to-many model for video classification. We analyze the importance of modeling spatial layout and temporal encoding for daily living action recognition. Many RGB methods focus only on short-term temporal information obtained from optical flow, while skeleton-based methods show that modeling long-term skeleton evolution improves action recognition accuracy. In this work, we propose a deep-temporal LSTM architecture which extends the standard LSTM and allows better encoding of temporal information. In addition, we propose to fuse 3D skeleton geometry with deep static appearance. We validate our approach on the publicly available CAD60, MSRDailyActivity3D, and NTU-RGB+D datasets, achieving competitive performance compared to the state-of-the-art.
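The many-to-many idea above (a prediction at every timestep, fused over the sequence, instead of only at the last step) can be sketched with a toy recurrent cell. This is not the paper's deep-temporal LSTM; it substitutes a plain tanh recurrence for the LSTM, and all shapes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy many-to-many recurrent classifier (illustrative, not the paper's model)
T, D, H, C = 8, 16, 32, 10          # frames, feature dim, hidden, classes
Wx = rng.normal(0, 0.1, (H, D))     # input-to-hidden weights
Wh = rng.normal(0, 0.1, (H, H))     # hidden-to-hidden weights
Wo = rng.normal(0, 0.1, (C, H))     # hidden-to-class-score weights

def many_to_many(frames):
    h = np.zeros(H)
    scores = []
    for x in frames:                 # one step per video frame
        h = np.tanh(Wx @ x + Wh @ h)
        scores.append(Wo @ h)        # a prediction at EVERY step...
    return np.mean(scores, axis=0)   # ...fused by temporal averaging

video = rng.normal(size=(T, D))      # stand-in per-frame features
logits = many_to_many(video)         # one score vector of shape (C,)
```

The contrast with the traditional many-to-one setup is the `scores.append` inside the loop: every frame contributes a classification, so early and middle parts of the action influence the result directly rather than only through the final hidden state.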

    Deep-Temporal LSTM for Daily Living Action Recognition

    In this paper, we propose to improve the traditional use of RNNs by employing a many-to-many model for video classification. We analyze the importance of modeling spatial layout and temporal encoding for daily living action recognition. Many RGB methods focus only on short-term temporal information obtained from optical flow, while skeleton-based methods show that modeling long-term skeleton evolution improves action recognition accuracy. In this work, we propose a deep-temporal LSTM architecture which extends the standard LSTM and allows better encoding of temporal information. In addition, we propose to fuse 3D skeleton geometry with deep static appearance. We validate our approach on the publicly available CAD60, MSRDailyActivity3D, and NTU-RGB+D datasets, achieving competitive performance compared to the state-of-the-art.

    A New Hybrid Architecture for Human Activity Recognition from RGB-D videos

    Activity recognition from RGB-D videos is still an open problem due to the large variety of actions. In this work, we propose a new architecture that mixes a high-level handcrafted strategy with machine learning techniques. We propose a novel two-level fusion strategy that combines features from different cues to address this large variety of actions. As similar actions are common in daily living activities, we also propose a mechanism for discriminating between similar actions. We validate our approach on four public datasets, CAD-60, CAD-120, MSRDailyActivity3D, and NTU-RGB+D, improving the state-of-the-art results on them.
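A two-level fusion of cues can be sketched as follows. This is a generic weighted score-level fusion under assumed cue names (appearance, motion, skeleton), not the paper's exact scheme, and the weights are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def two_level_fusion(cue_scores, weights):
    """Hypothetical two-level fusion sketch (not the paper's exact method):
    level 1: each cue produces its own class-score vector, turned into
             per-cue class probabilities;
    level 2: the per-cue probabilities are combined by a weighted sum."""
    probs = [softmax(s) for s in cue_scores]            # level 1
    fused = sum(w * p for w, p in zip(weights, probs))  # level 2
    return fused / sum(weights)

appearance = np.array([2.0, 0.5, 0.1])   # illustrative class scores per cue
motion     = np.array([0.3, 1.8, 0.2])
skeleton   = np.array([1.5, 0.4, 0.3])
fused = two_level_fusion([appearance, motion, skeleton],
                         weights=[1.0, 1.0, 2.0])
```

Up-weighting a more reliable cue (here the skeleton, with weight 2) lets one modality dominate when cues disagree, which is one plausible way a fusion stage can separate otherwise similar actions.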

    Spatio-Temporal Grids for Daily Living Action Recognition

    This paper addresses the recognition of short-term daily living actions from RGB-D videos. Existing approaches ignore spatio-temporal contextual relationships in the action videos, so we propose to exploit the spatial layout to better model appearance. To encode temporal information, we divide the action sequence into temporal grids. We address the challenge of subject invariance by clustering the appearance and velocity features to partition the temporal grids. We validate our approach on four public datasets. The results show that our method is competitive with the state-of-the-art.
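The temporal-grid idea can be sketched as below. For simplicity this sketch splits the sequence uniformly and mean-pools each grid; the paper instead partitions frames by clustering appearance and velocity features, which is what makes its grids subject-invariant. Names and shapes are illustrative.

```python
import numpy as np

def temporal_grid_descriptor(feats, k=4):
    """Sketch: divide a sequence of per-frame features (T, D) into k
    temporal grids and mean-pool each grid, then concatenate.
    Uniform splitting is shown; clustering-based partitioning (as in
    the paper) would assign frames to grids by feature similarity."""
    chunks = np.array_split(feats, k, axis=0)          # k temporal grids
    return np.concatenate([c.mean(axis=0) for c in chunks])

feats = np.arange(24, dtype=float).reshape(6, 4)       # T=6 frames, D=4
desc = temporal_grid_descriptor(feats, k=3)            # shape (3*4,) = (12,)
```

Pooling per grid rather than over the whole sequence preserves coarse temporal order (beginning / middle / end of the action) while staying robust to small timing variations.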

    Action Recognition based on a mixture of RGB and Depth based skeleton

    In this paper, we study how different skeleton extraction methods affect the performance of action recognition. As shown in previous work, skeleton information can be exploited for action recognition. Nevertheless, skeleton detection is itself a hard problem, and it is often difficult to obtain reliable skeleton information from videos. We compare two skeleton detection methods: the depth-map-based method used with the Kinect camera and an RGB-based method that uses deep convolutional neural networks. To balance the pros and cons of these skeleton detection methods with respect to the action recognition task, we propose a fusion of classifiers, each trained on one skeleton detection method. Such a fusion leads to a performance improvement. We validate our approach on CAD-60 and MSRDailyActivity3D, achieving state-of-the-art results.
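A fusion of two classifiers trained on different skeleton sources is commonly done at the score level; a minimal sketch, with an illustrative mixing weight and made-up probabilities:

```python
import numpy as np

def fuse_skeleton_classifiers(p_depth, p_rgb, alpha=0.5):
    """Sketch of late (score-level) fusion: one classifier trained on
    depth-based (Kinect) skeletons, one on RGB skeletons from a CNN
    detector; alpha is an illustrative mixing weight, not tuned."""
    return alpha * p_depth + (1 - alpha) * p_rgb

p_depth = np.array([0.7, 0.2, 0.1])   # per-class probabilities (depth model)
p_rgb   = np.array([0.4, 0.5, 0.1])   # per-class probabilities (RGB model)
fused = fuse_skeleton_classifiers(p_depth, p_rgb)
pred = int(np.argmax(fused))          # fused class prediction
```

The appeal of late fusion here is that each classifier can fail independently (e.g., the Kinect skeleton drops out at long range, the RGB skeleton degrades in low light), and averaging their scores softens either failure mode.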