    Multi-view Geometric Constraints For Human Action Recognition And Tracking

    Human actions are the essence of human life and a natural product of the human mind. The analysis of human activities by a machine has attracted the attention of many researchers, and it is important in a variety of domains including surveillance, video retrieval, human-computer interaction, and athlete performance investigation. This dissertation makes three major contributions to the automatic analysis of human actions. First, we conjecture that the relationship between the body joints of two actors in the same posture can be described by a 3D rigid transformation, which simultaneously captures different poses and various sizes and proportions. As a consequence of this conjecture, we show that there exists a fundamental matrix between the imaged positions of the body joints of two actors if they are in the same posture. Second, we propose a novel projection model for cameras moving at a constant velocity in 3D space, termed Galilean cameras, derive the corresponding Galilean fundamental matrix, and apply it to human action recognition. Third, we propose a novel use of the invariance of the ratio of areas under an affine transformation, combined with the epipolar geometry between two cameras, for 2D model-based tracking of human body joints.

    In the first part of the thesis, we propose an approach to matching human actions using semantic correspondences between human bodies. These correspondences provide geometric constraints between multiple anatomical landmarks (e.g., hands, shoulders, and feet) to match actions observed from different viewpoints and performed at different rates by actors of differing anthropometric proportions. The fact that the human body has approximately fixed anthropometric proportions allows an innovative use of the machinery of epipolar geometry to constrain the analysis of actions performed by people of different anthropometric sizes, while ensuring that changes in viewpoint do not affect matching. A novel measure, based on the rank of a matrix constructed only from image measurements of the locations of anatomical landmarks, is proposed to ensure that similar actions are accurately recognized. Finally, we describe how dynamic time warping can be used in conjunction with the proposed measure to match actions in the presence of nonlinear time warps. We demonstrate the versatility of our algorithm on a number of challenging sequences and applications, including action synchronization, odd-one-out detection, following the leader, and periodicity analysis.

    Next, we extend the conventional model of image projection to video captured by a camera moving at constant velocity; we term such a moving camera a Galilean camera. To that end, we derive the spacetime projection and develop the corresponding epipolar geometry between two Galilean cameras. Both perspective imaging and linear pushbroom imaging are specializations of the proposed model, and we show how six different fundamental matrices, including the classic fundamental matrix, the Linear Pushbroom (LP) fundamental matrix, and a fundamental matrix relating Epipolar Plane Images (EPIs), are related and can be directly recovered from a Galilean fundamental matrix. We provide linear algorithms for estimating the parameters of the mapping between videos in the case of planar scenes. To apply the fundamental matrix between Galilean cameras to human action recognition, we propose a measure with two important properties: the first makes it possible to recognize similar actions whose execution rates are linearly related, and the second allows actions to be recognized in video captured by Galilean cameras. Thus, the proposed algorithm guarantees that actions can be correctly matched despite changes in view, execution rate, and anthropometric proportions of the actor, even if the camera moves with constant velocity.

    Finally, we propose a novel 2D model-based approach for tracking human body parts during articulated motion. The human body is modeled as a 2D stick figure of thirteen body joints, and an action is considered a sequence of these stick figures. Given the locations of these joints in every frame of a model video and in the first frame of a test video, the joint locations are automatically estimated throughout the test video using two geometric constraints: first, the invariance of the ratio of areas under an affine transformation provides an initial estimate of the joint locations in the test video; second, the epipolar geometry between the two cameras refines these estimates. Using the estimated joint locations, the tracking algorithm determines the exact location of each landmark in the test video from the foreground silhouettes. The novelty of the proposed approach lies in the geometric formulation of human action models, the combination of two geometric constraints for body-joint prediction, and the handling of deviations in anthropometry, viewpoint, execution rate, and style of performance. The proposed approach does not require extensive training and can easily adapt to a wide variety of articulated actions.
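
    The rank-based measure lends itself to a compact sketch. The function below is a minimal illustration rather than the dissertation's exact formulation: it stacks the epipolar constraint rows built from corresponding imaged joint positions and returns the normalised smallest singular value, which approaches zero exactly when a fundamental matrix consistent with all correspondences exists, i.e., when the two actors are in the same posture. The function name and the normalisation are illustrative choices.

    import numpy as np

    def posture_similarity(joints_a, joints_b):
        """joints_a, joints_b: (J, 2) arrays of corresponding image points,
        J >= 9 (e.g., thirteen anatomical landmarks). Returns a score that
        is near zero when the correspondences admit a fundamental matrix."""
        a = np.hstack([joints_a, np.ones((len(joints_a), 1))])  # homogeneous
        b = np.hstack([joints_b, np.ones((len(joints_b), 1))])
        # One epipolar constraint row per joint: kron(x', x) . vec(F) = 0,
        # so the stacked matrix is rank deficient iff some F fits all joints.
        rows = np.stack([np.kron(bi, ai) for ai, bi in zip(a, b)])
        s = np.linalg.svd(rows, compute_uv=False)
        return s[-1] / s[0]  # normalised smallest singular value

    Per-frame scores of this kind can then be aligned with dynamic time warping, as described above, to match actions whose execution rates differ nonlinearly.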

    Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

    In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images into compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By classifying static facial expressions using a Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than standard classification with fully connected layers and softmax. In addition, we propose a new solution that models the temporal dynamics of facial expressions as deep trajectories on the SPD manifold. Extending the classification pipeline of covariance descriptors, we apply SVM with valid positive-definite kernels derived from global alignment to classify deep covariance trajectories. Through extensive experiments on the Oulu-CASIA, CK+, SFEW, and AFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition, outperforming many recent approaches. A preliminary version of this work appeared as: Otberdout N., Kacem A., Daoudi M., Ballihi L., Berretti S., "Deep Covariance Descriptors for Facial Expression Recognition," British Machine Vision Conference (BMVC 2018), Northumbria University, Newcastle, UK, September 3-6, 2018, p. 159.
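
    As a rough sketch of the static pipeline (all names and parameter values are assumptions, and DCNN feature extraction is assumed to happen elsewhere), a covariance descriptor can be computed from local deep features and classified with an SVM under a log-Euclidean Gaussian kernel, one standard family of valid positive-definite kernels on the SPD manifold:

    import numpy as np
    from sklearn.svm import SVC

    def covariance_descriptor(features, eps=1e-6):
        """features: (n, d) local DCNN feature vectors for one image.
        Returns a (d, d) covariance matrix, regularised to stay SPD."""
        c = np.cov(features, rowvar=False)
        return c + eps * np.eye(c.shape[0])

    def spd_log(mat):
        """Matrix logarithm of an SPD matrix via its eigendecomposition."""
        w, v = np.linalg.eigh(mat)
        return (v * np.log(w)) @ v.T

    def log_euclidean_gaussian_kernel(descs_a, descs_b, gamma=0.1):
        """Gaussian kernel on vectorised matrix logs of SPD descriptors."""
        la = np.stack([spd_log(d).ravel() for d in descs_a])
        lb = np.stack([spd_log(d).ravel() for d in descs_b])
        sq = ((la[:, None, :] - lb[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    # Usage with scikit-learn's precomputed-kernel interface:
    # K = log_euclidean_gaussian_kernel(train_descs, train_descs)
    # clf = SVC(kernel="precomputed").fit(K, train_labels)

    The dynamic variant described in the paper replaces single descriptors with trajectories of descriptors on the SPD manifold, compared through global-alignment kernels; that machinery is omitted here.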

    Significant Body Point Labeling and Tracking

    Uniscale and multiscale gait recognition in realistic scenario

    The performance of a gait recognition method is affected by numerous challenging factors that degrade its reliability as a behavioural biometric for subject identification in realistic scenarios. For effective visual surveillance, this thesis therefore presents five gait recognition methods that address various challenging factors to reliably identify a subject in realistic scenarios with low computational complexity. It presents a gait recognition method that analyses the spatio-temporal motion of a subject with statistical and physical parameters, using Procrustes shape analysis and elliptic Fourier descriptors (EFD). It introduces a part-based EFD analysis to achieve invariance to carrying conditions, and the use of physical parameters enables it to achieve invariance to across-day gait variation. Although the spatio-temporal deformation of a subject's shape in gait sequences provides better discriminative power than its kinematics, the inclusion of dynamic motion characteristics improves the identification rate. The thesis therefore presents a gait recognition method that combines the spatio-temporal shape and dynamic motion characteristics of a subject to achieve robustness against the maximum number of challenging factors compared to related state-of-the-art methods. A region-based gait recognition method that analyses a subject's shape in image and feature spaces is presented to achieve invariance to clothing variation and carrying conditions. To account for the arbitrary moving directions of a subject in a realistic scenario, a gait recognition method must be robust against variation in view; hence, the thesis presents a robust view-invariant multiscale gait recognition method. Finally, the thesis proposes a gait recognition method based on low-spatial- and low-temporal-resolution video sequences captured by a CCTV camera. The computational complexity of each method is analysed, and experimental analyses on public datasets demonstrate the efficacy of the proposed methods.
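
    As a small illustration of the shape-analysis ingredient, the sketch below assumes silhouette boundaries have already been extracted and resampled to a fixed number of points; the thesis's full method additionally uses elliptic Fourier descriptors and physical parameters, which are omitted here. Procrustes analysis factors out translation, scale, and rotation before scoring the residual shape difference:

    from scipy.spatial import procrustes

    def shape_distance(contour_a, contour_b):
        """contour_a, contour_b: (N, 2) arrays of silhouette boundary points
        sampled at the same N arc-length positions. Returns the Procrustes
        disparity (lower means more similar shapes)."""
        _, _, disparity = procrustes(contour_a, contour_b)
        return disparity

    # A gait signature can then be a sequence of per-frame shapes, compared
    # by averaging frame-wise disparities over a gait cycle.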

    Marker-less human body part detection, labelling and tracking for human activity recognition

    This thesis focuses on the development of a real-time, cost-effective, marker-less computer vision method for significant body point or part detection (i.e., the head, arms, shoulders, knees, and feet), labelling and tracking, and its application to activity recognition. The work comprises three parts: significant body point detection and labelling, significant body point tracking, and activity recognition. Implicit body models are proposed, based on human anthropometry, kinesiology, and human-vision-inspired criteria, to detect and label significant body points. The key idea of the proposed method is to fit knowledge from the implicit body models rather than fitting predefined models in order to detect and label significant body points. The advantages of this method are that it does not require manual annotation, an explicit fitting procedure, or a training (learning) phase, and it is applicable to humans with different anthropometric proportions. The experimental results show that the proposed method robustly detects and labels significant body points in various activities of two data sets of different (low and high) resolution. Furthermore, a Particle Filter with memory and feedback is proposed that combines temporal information of the previous observation and estimation with feedback to track significant body points under occlusion. In addition, to overcome the problem presented by the most occluded body part, the arm, a Motion Flow method is proposed. This method considers the human arm as a pendulum attached to the shoulder joint and defines conjectures to track it. The former method is invoked by default and the latter is used at the user's choice. The experimental results show that the two proposed methods, i.e., the Particle Filter and Motion Flow methods, robustly track significant body points in various activities of the above-mentioned data sets and also enhance the performance of significant body point detection. A hierarchical relaxed partitioning system is then proposed that employs features extracted from the significant body points for activity recognition when multiple overlaps exist in the feature space. The working principle of the proposed method is based on a relaxed hierarchy (postponing uncertain decisions) and a hierarchical strategy (grouping similar or confusing classes), partitioning each class at different levels of the hierarchy. The advantages of the proposed method lie in its real-time speed, ease of implementation and extension, and non-intensive training. The experimental results show that it acquires valuable features and outperforms relevant state-of-the-art methods, while remaining comparable to holistic and local feature approaches. In this context, the contribution of this thesis is three-fold: pioneering a method for automated human body part detection and labelling; developing methods for tracking human body parts under occlusion; and designing a method for robust and efficient human action recognition.
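
    A generic bootstrap particle filter conveys the flavour of the tracking stage; the proposed Particle Filter with memory and feedback adds mechanisms that this sketch does not reproduce, and the constant-velocity state model, noise level, and likelihood interface below are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def particle_filter_step(particles, weights, likelihood, noise=2.0):
        """particles: (N, 4) states (x, y, vx, vy); weights: (N,) normalised.
        likelihood: callable mapping (N, 2) positions to observation scores
        (e.g., from silhouette or appearance cues)."""
        # Resample in proportion to the weights (multinomial, for brevity).
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        # Predict: constant-velocity dynamics plus Gaussian diffusion.
        particles[:, :2] += particles[:, 2:]
        particles += rng.normal(0.0, noise, particles.shape)
        # Update: reweight by the observation likelihood and normalise.
        weights = likelihood(particles[:, :2])
        weights = weights / weights.sum()
        estimate = weights @ particles[:, :2]  # posterior-mean joint location
        return particles, weights, estimate

    The Motion Flow alternative would instead constrain the arm's state to a pendulum about the shoulder joint, which is not shown here.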

    3D human action recognition and motion analysis using selective representations

    With the advent of marker-based motion capture, attempts have been made to recognise and quantify attributes of "type", "content" and "behaviour" from motion data. Current work seeks quick and easy identification of human motion for use in multiple settings, such as healthcare and gaming, by using activity monitors, wearable technology and low-cost accelerometers. Yet analysing human motion and generating representative features that enable recognition and analysis in an efficient and comprehensive manner has proved elusive thus far. This thesis proposes practical solutions that are based on insights from clinicians and on attributes learned from motion capture data itself. This culminates in an application framework that learns the type, content and behaviour of human motion for recognition, quantitative clinical analysis and outcome measures. While marker-based motion capture has many uses, it also has major limitations that are explored in this thesis, not least in terms of hardware costs and practical utilisation. These drawbacks have led to the creation of depth sensors capable of providing a robust, accurate and low-cost means of detecting and tracking anatomical landmarks on the human body without physical markers, an advancement that has led researchers to develop low-cost solutions to important healthcare tasks, such as human motion analysis as a clinical aid in preventive care. In this thesis, a variety of obstacles in handling marker-less motion capture are identified and overcome by employing axis-angle parameterisations, applying transformations from Euler angles to exponential maps, and using appropriate distance measures between postures. While developing an efficient, usable and deployable application framework for clinicians, this thesis introduces techniques to recognise, analyse and quantify human motion in the context of identifying age-related change and mobility. The central theme of this thesis is the creation of discriminative representations of the human body using novel encoding and extraction approaches usable for both marker-based and marker-less motion capture data. The encoding of the human pose is modelled on spatial-temporal characteristics to generate a compact, efficient parameterisation, which allows for the detection of multiple known and unknown motions in real time. However, a major drawback exists in the context of benchmarking: the lack of a clinically valid and relevant dataset. Without a dataset of this type, it is difficult to validate algorithms aimed at healthcare applications. To this end, this thesis introduces a dataset that will enable the computer science community to benchmark healthcare-related algorithms.
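
    The parameterisation step can be made concrete with a small sketch (the rotation order and degree convention are assumptions; SciPy's Rotation class performs the conversions): per-joint Euler angles are mapped to exponential-map (axis-angle) vectors, and a posture distance is taken as the sum of geodesic angles between corresponding joint rotations.

    from scipy.spatial.transform import Rotation

    def to_exponential_map(euler_deg, order="ZXY"):
        """euler_deg: (J, 3) per-joint Euler angles in degrees. Returns
        (J, 3) rotation vectors (unit axis scaled by angle in radians)."""
        return Rotation.from_euler(order, euler_deg, degrees=True).as_rotvec()

    def posture_distance(pose_a_deg, pose_b_deg, order="ZXY"):
        """Sum over joints of the geodesic angle between the two rotations."""
        ra = Rotation.from_euler(order, pose_a_deg, degrees=True)
        rb = Rotation.from_euler(order, pose_b_deg, degrees=True)
        return float((ra.inv() * rb).magnitude().sum())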