11,505 research outputs found

    Activity-conditioned continuous human pose estimation for performance analysis of athletes using the example of swimming

    Get PDF
    In this paper we consider the problem of human pose estimation in real-world videos of swimmers. Swimming channels allow filming swimmers simultaneously above and below the water surface with a single stationary camera. These recordings can be used to quantitatively assess the athletes' performance. The quantitative evaluation, so far, requires manual annotations of body parts in each video frame. We therefore apply the concept of CNNs in order to automatically infer the required pose information. Starting with an off-the-shelf architecture, we develop extensions to leverage activity information - in our case the swimming style of an athlete - and the continuous nature of the video recordings. Our main contributions are threefold: (a) We apply and evaluate a fine-tuned Convolutional Pose Machine architecture as a baseline in our very challenging aquatic environment and discuss its error modes, (b) we propose an extension to input swimming style information into the fully convolutional architecture and (c) modify the architecture for continuous pose estimation in videos. With these additions we achieve reliable pose estimates with up to +16% more correct body joint detections compared to the baseline architecture.Comment: 10 pages, 9 figures, accepted at WACV 201

    Multi-set canonical correlation analysis for 3D abnormal gait behaviour recognition based on virtual sample generation

    Get PDF
    Small sample dataset and two-dimensional (2D) approach are challenges to vision-based abnormal gait behaviour recognition (AGBR). The lack of three-dimensional (3D) structure of the human body causes 2D based methods to be limited in abnormal gait virtual sample generation (VSG). In this paper, 3D AGBR based on VSG and multi-set canonical correlation analysis (3D-AGRBMCCA) is proposed. First, the unstructured point cloud data of gait are obtained by using a structured light sensor. A 3D parametric body model is then deformed to fit the point cloud data, both in shape and posture. The features of point cloud data are then converted to a high-level structured representation of the body. The parametric body model is used for VSG based on the estimated body pose and shape data. Symmetry virtual samples, pose-perturbation virtual samples and various body-shape virtual samples with multi-views are generated to extend the training samples. The spatial-temporal features of the abnormal gait behaviour from different views, body pose and shape parameters are then extracted by convolutional neural network based Long Short-Term Memory model network. These are projected onto a uniform pattern space using deep learning based multi-set canonical correlation analysis. Experiments on four publicly available datasets show the proposed system performs well under various conditions

    Learning to Refine Human Pose Estimation

    Full text link
    Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still exist a lot of difficult cases where even the state-of-the-art models fail to correctly localize all body joints. This motivates the need for an additional refinement step that addresses these challenging cases and can be easily applied on top of any existing method. In this work, we introduce a pose refinement network (PoseRefiner) which takes as input both the image and a given pose estimate and learns to directly predict a refined pose by jointly reasoning about the input-output space. In order for the network to learn to refine incorrect body joint predictions, we employ a novel data augmentation scheme for training, where we model "hard" human pose cases. We evaluate our approach on four popular large-scale pose estimation benchmarks such as MPII Single- and Multi-Person Pose Estimation, PoseTrack Pose Estimation, and PoseTrack Pose Tracking, and report systematic improvement over the state of the art.Comment: To appear in CVPRW (2018). Workshop: Visual Understanding of Humans in Crowd Scene and the 2nd Look Into Person Challenge (VUHCS-LIP

    Mirror, mirror on the wall, tell me, is the error small?

    Full text link
    Do object part localization methods produce bilaterally symmetric results on mirror images? Surprisingly not, even though state of the art methods augment the training set with mirrored images. In this paper we take a closer look into this issue. We first introduce the concept of mirrorability as the ability of a model to produce symmetric results in mirrored images and introduce a corresponding measure, namely the \textit{mirror error} that is defined as the difference between the detection result on an image and the mirror of the detection result on its mirror image. We evaluate the mirrorability of several state of the art algorithms in two of the most intensively studied problems, namely human pose estimation and face alignment. Our experiments lead to several interesting findings: 1) Surprisingly, most of state of the art methods struggle to preserve the mirror symmetry, despite the fact that they do have very similar overall performance on the original and mirror images; 2) the low mirrorability is not caused by training or testing sample bias - all algorithms are trained on both the original images and their mirrored versions; 3) the mirror error is strongly correlated to the localization/alignment error (with correlation coefficients around 0.7). Since the mirror error is calculated without knowledge of the ground truth, we show two interesting applications - in the first it is used to guide the selection of difficult samples and in the second to give feedback in a popular Cascaded Pose Regression method for face alignment.Comment: 8 pages, 9 figure

    FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

    Full text link
    Facial expression analysis based on machine learning requires large number of well-annotated data to reflect different changes in facial motion. Publicly available datasets truly help to accelerate research in this area by providing a benchmark resource, but all of these datasets, to the best of our knowledge, are limited to rough annotations for action units, including only their absence, presence, or a five-level intensity according to the Facial Action Coding System. To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D Facial Animation. One hundred and twenty-two participants, including children, young adults and elderly people, were recorded in real-world conditions. In addition, 99,356 frames were manually labeled using Expression Quantitative Tool developed by us to quantify 9 symmetrical FACS action units, 10 asymmetrical (unilateral) FACS action units, 2 symmetrical FACS action descriptors and 2 asymmetrical FACS action descriptors, and each action unit or action descriptor is well-annotated with a floating point number between 0 and 1. To provide a baseline for use in future research, a benchmark for the regression of action unit values based on Convolutional Neural Networks are presented. We also demonstrate the potential of our FEAFA dataset for 3D facial animation. Almost all state-of-the-art algorithms for facial animation are achieved based on 3D face reconstruction. We hence propose a novel method that drives virtual characters only based on action unit value regression of the 2D video frames of source actors.Comment: 9 pages, 7 figure

    Wing and body motion during flight initiation in Drosophila revealed by automated visual tracking

    Get PDF
    The fruit fly Drosophila melanogaster is a widely used model organism in studies of genetics, developmental biology and biomechanics. One limitation for exploiting Drosophila as a model system for behavioral neurobiology is that measuring body kinematics during behavior is labor intensive and subjective. In order to quantify flight kinematics during different types of maneuvers, we have developed a visual tracking system that estimates the posture of the fly from multiple calibrated cameras. An accurate geometric fly model is designed using unit quaternions to capture complex body and wing rotations, which are automatically fitted to the images in each time frame. Our approach works across a range of flight behaviors, while also being robust to common environmental clutter. The tracking system is used in this paper to compare wing and body motion during both voluntary and escape take-offs. Using our automated algorithms, we are able to measure stroke amplitude, geometric angle of attack and other parameters important to a mechanistic understanding of flapping flight. When compared with manual tracking methods, the algorithm estimates body position within 4.4±1.3% of the body length, while body orientation is measured within 6.5±1.9 deg. (roll), 3.2±1.3 deg. (pitch) and 3.4±1.6 deg. (yaw) on average across six videos. Similarly, stroke amplitude and deviation are estimated within 3.3 deg. and 2.1 deg., while angle of attack is typically measured within 8.8 deg. comparing against a human digitizer. Using our automated tracker, we analyzed a total of eight voluntary and two escape take-offs. These sequences show that Drosophila melanogaster do not utilize clap and fling during take-off and are able to modify their wing kinematics from one wingstroke to the next. Our approach should enable biomechanists and ethologists to process much larger datasets than possible at present and, therefore, accelerate insight into the mechanisms of free-flight maneuvers of flying insects

    Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling

    Get PDF
    We present a method for simultaneously estimating 3D human pose and body shape from a sparse set of wide-baseline camera views. We train a symmetric convolutional autoencoder with a dual loss that enforces learning of a latent representation that encodes skeletal joint positions, and at the same time learns a deep representation of volumetric body shape. We harness the latter to up-scale input volumetric data by a factor of 4×4 \times, whilst recovering a 3D estimate of joint positions with equal or greater accuracy than the state of the art. Inference runs in real-time (25 fps) and has the potential for passive human behaviour monitoring where there is a requirement for high fidelity estimation of human body shape and pose
    • …
    corecore