    Improved Multi-resolution Analysis of the Motion Patterns in Video for Human Action Classification

    The automatic recognition of human actions in video is of great interest in many applications such as automated surveillance, content-based video summarization, video search, and indexing. The problem is challenging because the motion pattern of a given action, such as walking, varies widely across subjects, while similar motions, such as running and jogging, differ only slightly. This thesis makes three contributions in a discriminative bottom-up framework to improve the multi-resolution analysis of the motion patterns in video for better recognition of human actions. The first contribution is a novel approach for robust local motion feature detection in video. To this end, four different multi-resolution, temporally causal and asymmetric filters are introduced: log Gaussian, scale-derivative Gaussian, Poisson, and asymmetric sinc. The performance of these filters is compared with the widely used multi-resolution Gabor filter in a common framework for the detection of local salient motions. The features obtained from the asymmetric filtering are more precise and more robust under geometric deformations such as view change or affine transformations. Moreover, they provide higher classification accuracy when used with a standard bag-of-words representation of actions and a single discriminative classifier. The experimental results show that the asymmetric sinc performs best. The Poisson and the scale-derivative Gaussian perform better than the log Gaussian, which in turn performs better than the symmetric temporal Gabor filter. The second contribution is an efficient action representation. The observation is that the salient features at different spatial and temporal scales characterize different motion information. A multi-resolution analysis of the motion characteristics should therefore be representative of different actions, and a multi-resolution action signature provides a more discriminative video representation. The third contribution concerns the classification of different human actions. To this end, an ensemble of classifiers in a multiple classifier system (MCS) framework with a parallel topology is utilized. This framework can fully benefit from the multi-resolution characteristics of the motion patterns in human actions. The classifier combination concept of the MCS has then been extended to address two problems in the configuration of a recognition framework, namely the choice of distance metric for comparing the action representations and the size of the codebook by which an action is represented. This application of MCS at multiple stages of the recognition pipeline provides a multi-stage MCS framework that outperforms existing methods that use a single classifier. Based on the experimental results of the local feature detection and the action classification, the multi-stage MCS framework, which uses the multi-scale features obtained from the temporal asymmetric sinc filtering, is recommended for the task of human action recognition in video.
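The parallel classifier combination described in this abstract can be illustrated with a minimal sketch. The abstract does not state which combination rule the thesis uses, so the sum rule (averaging per-class scores) below is an assumption, and the names `combine_parallel` and `multi_stage` are hypothetical.

```python
def combine_parallel(member_scores):
    """Fuse per-class scores from classifiers that run in parallel.

    member_scores: list of dicts mapping action label -> score, one dict
    per member classifier (for example, one per temporal scale).
    The sum rule (plain averaging) is an assumption of this sketch.
    """
    labels = sorted(set().union(*member_scores))
    fused = {
        label: sum(s.get(label, 0.0) for s in member_scores) / len(member_scores)
        for label in labels
    }
    best = max(labels, key=lambda label: fused[label])
    return best, fused


def multi_stage(scores_by_config):
    """Second stage: fuse the fused scores of several configurations.

    scores_by_config: dict mapping a configuration name (for example a
    codebook size or a distance metric) -> list of per-scale score dicts.
    """
    stage_one = [combine_parallel(scores)[1] for scores in scores_by_config.values()]
    return combine_parallel(stage_one)


# Illustrative scores only, not results from the thesis.
per_scale = [
    {"walk": 0.6, "run": 0.4},
    {"walk": 0.3, "run": 0.7},
    {"walk": 0.7, "run": 0.3},
]
label, fused = combine_parallel(per_scale)
final_label, final_scores = multi_stage({"codebook_a": per_scale, "codebook_b": per_scale})
```

Averaging keeps every member's evidence in play, which is one simple way a combination stage could sidestep committing to a single codebook size or distance metric.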

    Histogram of Oriented Principal Components for Cross-View Action Recognition

    Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images, which are viewpoint dependent. In contrast, we directly process pointclouds for cross-view action recognition from unknown and unseen views. We propose the Histogram of Oriented Principal Components (HOPC) descriptor, which is robust to noise, viewpoint, scale and action speed variations. At a 3D point, HOPC is computed by projecting the three scaled eigenvectors of the pointcloud within its local spatio-temporal support volume onto the vertices of a regular dodecahedron. HOPC is also used for the detection of Spatio-Temporal Keypoints (STK) in 3D pointcloud sequences, so that view-invariant STK descriptors (or Local HOPC descriptors) are used for action recognition only at these key locations. We also propose a global descriptor computed from the normalized spatio-temporal distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the performance of our proposed descriptors against nine existing techniques on two cross-view and three single-view human action recognition datasets. The experimental results show that our techniques provide significant improvement over state-of-the-art methods.
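The projection step described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: scaling each eigenvector by its eigenvalue, clipping negative projections to zero, and L2-normalizing the result are assumptions, and the function names are hypothetical. The sign ambiguity of the eigenvectors is left unresolved here.

```python
import itertools

import numpy as np


def dodecahedron_directions():
    """Unit vectors pointing to the 20 vertices of a regular dodecahedron."""
    phi = (1 + 5 ** 0.5) / 2  # golden ratio
    verts = [list(v) for v in itertools.product((-1.0, 1.0), repeat=3)]
    for a, b in itertools.product((-1.0, 1.0), repeat=2):
        verts.append([0.0, a / phi, b * phi])
        verts.append([a / phi, b * phi, 0.0])
        verts.append([a * phi, 0.0, b / phi])
    verts = np.array(verts)
    return verts / np.linalg.norm(verts, axis=1, keepdims=True)


def hopc_like_descriptor(points):
    """Build a 60-dimensional descriptor from an (n, 3) array of points.

    `points` stands for the 3D points inside the local support volume,
    assumed here to be pooled from neighbouring frames.
    """
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    directions = dodecahedron_directions()
    parts = []
    for lam, vec in zip(eigvals, eigvecs.T):
        projection = directions @ (lam * vec)       # one value per vertex
        parts.append(np.maximum(projection, 0.0))   # assumption: drop negative bins
    descriptor = np.concatenate(parts)
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor


# Synthetic points, used only to show the call.
rng = np.random.default_rng(0)
descriptor = hopc_like_descriptor(rng.normal(size=(50, 3)))
```

Each of the three eigenvectors contributes 20 bins, one per dodecahedron vertex, which gives a descriptor of length 60 in this sketch.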

    Machine Analysis of Facial Expressions

    No abstract