14 research outputs found

    Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition

    Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty of acquiring and learning from large quantities of video data. Deep learning, although a breakthrough for image classification and promising for videos, has still not clearly superseded action recognition methods using hand-crafted features, even when trained on massive datasets. In this paper, we introduce hybrid video classification architectures based on carefully designed unsupervised representations of hand-crafted spatio-temporal features classified by supervised deep networks. As we show in our experiments on five popular benchmarks for action recognition, our hybrid model combines the best of both worlds: it is data efficient (trained on 150 to 10,000 short clips) and yet improves significantly on the state of the art, including recent deep models trained on millions of manually labelled images and videos.
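    The abstract does not spell out the encoding, but the general pattern it describes (an unsupervised, fixed-length encoding of hand-crafted local descriptors, later fed to a supervised classifier) can be sketched with a toy bag-of-features histogram. The codebook size, descriptor dimensions, and function names below are illustrative assumptions, not the paper's actual pipeline:

    ```python
    import numpy as np

    def bag_of_features(descriptors, codebook):
        """Quantize local descriptors against a codebook and return an
        L1-normalized histogram -- one fixed-length representation per clip."""
        # squared Euclidean distance from every descriptor to every codeword
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)                  # nearest codeword per descriptor
        hist = np.bincount(assign, minlength=len(codebook)).astype(float)
        return hist / hist.sum()

    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(4, 2))              # 4 codewords in a 2-D descriptor space
    descriptors = rng.normal(size=(100, 2))         # local motion descriptors of one clip
    h = bag_of_features(descriptors, codebook)      # vector a supervised network could consume
    ```

    In a hybrid architecture of this kind, only the second stage (the classifier consuming `h`) is trained with labels, which is what makes the approach data efficient.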

    Kernelized Multiview Projection for Robust Action Recognition

    Conventional action recognition algorithms adopt a single type of feature or a simple concatenation of multiple features. In this paper, we propose to better fuse and embed different feature representations for action recognition using a novel spectral coding algorithm called Kernelized Multiview Projection (KMP). Computing the kernel matrices from different features/views via time-sequential distance learning, KMP can encode different features with different weights to achieve a low-dimensional and semantically meaningful subspace in which the distribution of each view is sufficiently smooth and discriminative. More crucially, KMP is linear in the reproducing kernel Hilbert space, which makes it suitable for various practical applications. We demonstrate KMP's performance for action recognition on five popular action datasets, and the results are consistently superior to those of state-of-the-art techniques.
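    As a rough illustration of the multiview idea (not KMP's actual objective, which learns the view weights and the kernels via time-sequential distance learning), one can fuse per-view kernels with fixed weights and embed samples spectrally. All names, weights, and parameters below are hypothetical:

    ```python
    import numpy as np

    def rbf_kernel(X, gamma=0.5):
        """Gaussian (RBF) kernel matrix for one view's features."""
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    def fuse_and_project(views, weights, dim):
        """Fuse per-view kernels with fixed weights, then embed all samples
        using the top eigenvectors of the fused kernel (kernel-PCA style)."""
        K = sum(w * rbf_kernel(X) for w, X in zip(weights, views))
        vals, vecs = np.linalg.eigh(K)              # eigenvalues in ascending order
        top_vals = np.maximum(vals[-dim:], 0.0)     # guard tiny negative round-off
        return vecs[:, -dim:] * np.sqrt(top_vals)   # n_samples x dim embedding

    rng = np.random.default_rng(1)
    views = [rng.normal(size=(30, 5)),              # e.g. trajectory features
             rng.normal(size=(30, 8))]              # e.g. appearance features, same 30 clips
    emb = fuse_and_project(views, weights=[0.6, 0.4], dim=3)
    ```

    The point of the shared low-dimensional embedding is that views of very different dimensionality (5-D and 8-D here) end up in one common subspace where a single classifier can operate.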

    Dynamic behavior analysis via structured rank minimization

    Human behavior and affect are inherently dynamic phenomena involving the temporal evolution of patterns manifested through a multiplicity of non-verbal behavioral cues, including facial expressions, body postures and gestures, and vocal outbursts. A natural assumption for human behavior modeling is that a continuous-time characterization of behavior is the output of a linear time-invariant system when behavioral cues act as the input (e.g., continuous rather than discrete annotations of dimensional affect). Here we study the learning of such a dynamical system under real-world conditions, namely in the presence of noisy behavioral cue descriptors and possibly unreliable annotations, by employing structured rank minimization. To this end, a novel structured rank minimization method and its scalable variant are proposed. The generalizability of the proposed framework is demonstrated by conducting experiments on three distinct dynamic behavior analysis tasks, namely (i) conflict intensity prediction, (ii) prediction of valence and arousal, and (iii) tracklet matching. The attained results outperform those achieved by other state-of-the-art methods for these tasks and hence evidence the robustness and effectiveness of the proposed approach.
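    The paper proposes a dedicated structured rank minimization method; a much simpler stand-in for the same intuition (the output of a low-order LTI system has a low-rank Hankel matrix, so noisy cue signals can be cleaned by low-rank approximation) is a single truncated-SVD, Cadzow-style denoising step, sketched below under toy assumptions (signal, window size, and rank all chosen for illustration):

    ```python
    import numpy as np

    def hankel(x, rows):
        """Stack lagged copies of a 1-D series into a Hankel matrix, whose
        rank reflects the order of an underlying LTI system."""
        cols = len(x) - rows + 1
        return np.stack([x[i:i + cols] for i in range(rows)])

    def low_rank_denoise(x, rows=8, rank=2):
        """Project the Hankel matrix onto its top singular directions, then
        average the anti-diagonals back into a series (one Cadzow step)."""
        H = hankel(x, rows)
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        Hr = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-`rank` approximation
        out = np.zeros_like(x)
        cnt = np.zeros_like(x)
        for i in range(rows):                       # anti-diagonal averaging
            out[i:i + Hr.shape[1]] += Hr[i]
            cnt[i:i + Hr.shape[1]] += 1
        return out / cnt

    t = np.linspace(0, 4 * np.pi, 100)
    noisy = np.sin(t) + 0.1 * np.random.default_rng(2).normal(size=100)
    clean = low_rank_denoise(noisy)                 # a pure sinusoid has Hankel rank 2
    ```

    The paper's contribution is to do this kind of structured low-rank recovery as a principled optimization (with a scalable variant) rather than as a heuristic truncation step.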

    MHAD: Multi-human action dataset

    No full text
    This paper presents a framework for a multi-action recognition method. In this framework, we introduce a new approach to detect and recognize the actions of several persons within one scene. Considering the scarcity of related data, we also provide a new dataset involving many persons performing different actions in the same video. Our multi-action recognition method is based on a three-dimensional convolutional neural network (3DCNN) and involves a preprocessing phase that prepares the data for recognition by the 3DCNN model. The new data representation consists of extracting each person's sequence during its presence in the scene; each sequence is then analyzed to detect the actions in it. The experimental results proved to be accurate, efficient, and robust for real-time multi-human action recognition. This publication was made possible by NPRP Grant #NPRP8-140-2-065 from the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors.
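    The basic operation such a 3DCNN stacks is a three-dimensional convolution over a (time, height, width) volume, which responds to spatio-temporal patterns rather than single-frame appearance. A naive sketch of that operation (with a hand-picked frame-difference kernel, not the paper's learned architecture) is:

    ```python
    import numpy as np

    def conv3d_valid(video, kernel):
        """Naive 'valid'-mode 3-D correlation over a (time, height, width)
        volume -- the operation a 3DCNN stacks and learns kernels for."""
        T, H, W = video.shape
        t, h, w = kernel.shape
        out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                for k in range(out.shape[2]):
                    out[i, j, k] = (video[i:i + t, j:j + h, k:k + w] * kernel).sum()
        return out

    clip = np.random.default_rng(3).normal(size=(16, 32, 32))  # one person's cropped sequence
    temporal_edge = np.zeros((2, 3, 3))
    temporal_edge[0], temporal_edge[1] = 1.0, -1.0             # frame-difference filter
    resp = conv3d_valid(clip, temporal_edge)                   # large where motion occurs
    ```

    Because the kernel sums to zero along the time axis, a perfectly static clip produces zero response everywhere; that sensitivity to change over time is what a 2-D per-frame convolution cannot provide.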

    Human activity recognition-based path planning for autonomous vehicles

    No full text

    Human Activity Identification in Smart Daily Environments

    No full text
    Research in human activity recognition (HAR) benefits many applications, such as intelligent surveillance systems that track humans' abnormal activities. It can also be applied to robots to understand human activity, which improves smart-home efficiency and usability. This chapter aims to accurately recognize different sports types in the Sports Videos in the Wild (SVW) dataset by employing transfer learning. The dataset consists of noisy and similar classes shot in daily environments, not in controlled lab environments. Various methods have previously been used and developed for this purpose. Transfer learning is the process of reusing pretrained neural networks. Experimental results on different splits of the dataset, input sizes, and pretrained models show that an accuracy of 80.7% is achievable. In another experiment, we used the well-known UCF101 dataset, which is collected from YouTube, and trained a convolutional neural network (CNN) with batch normalization (BN). The achieved accuracy on the test set is around 91.2%. One application of the proposed system is to integrate it with a smart-home platform to identify the sports activities of individuals and track their progress.
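    The transfer-learning recipe described above (freeze a pretrained backbone, train only a small classifier head on the target data) can be sketched with a stand-in backbone. The random-projection "backbone", the toy labels, and all hyperparameters below are illustrative assumptions, not the chapter's actual models:

    ```python
    import numpy as np

    def extract_features(x, W):
        """Stand-in for a frozen pretrained backbone: a fixed nonlinear map.
        In a real system this would be a pretrained CNN's penultimate layer."""
        return np.maximum(x @ W, 0.0)              # ReLU features; W is never updated

    def train_linear_head(feats, labels, n_classes, lr=0.02, steps=500):
        """Fit only the classification head on frozen features by softmax
        gradient descent -- the core of the transfer-learning recipe."""
        Wh = np.zeros((feats.shape[1], n_classes))
        Y = np.eye(n_classes)[labels]              # one-hot targets
        for _ in range(steps):
            logits = feats @ Wh
            p = np.exp(logits - logits.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)
            Wh -= lr * feats.T @ (p - Y) / len(feats)
        return Wh

    rng = np.random.default_rng(4)
    W_backbone = rng.normal(size=(10, 32))         # "pretrained" weights, kept frozen
    X = rng.normal(size=(200, 10))                 # toy inputs standing in for video clips
    F = extract_features(X, W_backbone)
    # toy binary "activity" labels, linearly separable in feature space
    y = (F[:, :16].sum(axis=1) > F[:, 16:].sum(axis=1)).astype(int)
    Wh = train_linear_head(F, y, n_classes=2)
    acc = ((F @ Wh).argmax(axis=1) == y).mean()
    ```

    Because only the small head is trained, the approach needs far less labeled data than training a full CNN from scratch, which is why it suits datasets of moderate size such as SVW.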