
    P-CNN: Pose-based CNN Features for Action Recognition

    This work targets human action recognition in video. While recent methods typically represent actions by statistics of local video features, here we argue for the importance of a representation derived from human pose. To this end, we propose a new Pose-based Convolutional Neural Network descriptor (P-CNN) for action recognition. The descriptor aggregates motion and appearance information along tracks of human body parts. We investigate different schemes of temporal aggregation and experiment with P-CNN features obtained for both automatically estimated and manually annotated human poses. We evaluate our method on the recent and challenging JHMDB and MPII Cooking datasets. For both datasets, our method shows consistent improvement over the state of the art. Comment: ICCV, December 2015, Santiago, Chile
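
    The abstract does not say which temporal aggregation schemes are compared. As an illustration only, the sketch below assumes per-dimension max and min pooling over the frames of a body-part track, applied both to the per-frame features and to their frame-to-frame differences. The per-frame CNN features are taken as given, and the function names and input layout (`aggregate_max_min`, `temporal_differences`, `pcnn_like_descriptor`, a dict of part name to streams) are hypothetical, not the authors' code.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def aggregate_max_min(frames: Sequence[Sequence[float]]) -> Vector:
    """Per-dimension max over time, followed by per-dimension min over time."""
    if not frames:
        raise ValueError("aggregation needs at least one frame")
    dims = len(frames[0])
    maxes = [max(frame[d] for frame in frames) for d in range(dims)]
    mins = [min(frame[d] for frame in frames) for d in range(dims)]
    return maxes + mins


def temporal_differences(frames: Sequence[Sequence[float]], step: int = 1) -> List[Vector]:
    """Differences between feature vectors `step` frames apart (a simple dynamic signal)."""
    return [
        [later - earlier for earlier, later in zip(frames[t], frames[t + step])]
        for t in range(len(frames) - step)
    ]


def pcnn_like_descriptor(part_tracks: Dict[str, List[Sequence[Sequence[float]]]]) -> Vector:
    """Concatenate static and dynamic aggregates for every part and every stream.

    Hypothetical layout: part_tracks maps a body-part name (e.g. "left_hand") to a
    list of streams, e.g. [appearance_frames, motion_frames]; each stream is a list
    of per-frame feature vectors and must contain at least two frames.
    """
    descriptor: Vector = []
    for part in sorted(part_tracks):  # fixed part order so descriptors are comparable
        for stream in part_tracks[part]:
            if len(stream) < 2:
                raise ValueError(f"stream for {part!r} needs at least two frames")
            descriptor += aggregate_max_min(stream)  # static aggregate
            descriptor += aggregate_max_min(temporal_differences(stream))  # dynamic aggregate
    return descriptor


if __name__ == "__main__":
    appearance = [[1.0, 2.0], [3.0, 0.0], [2.0, 5.0]]  # toy 2-D features over 3 frames
    print(pcnn_like_descriptor({"left_hand": [appearance]}))
    # [3.0, 5.0, 1.0, 0.0, 2.0, 5.0, -1.0, -2.0]
```

    Max and min pooling keep the descriptor length fixed regardless of track length, which is what allows tracks of different durations to be compared with a single classifier.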

    The DICEMAN description schemes for still images and video sequences

    To address the problem of visual content description, two Description Schemes (DSs) developed within the context of the European ACTS project DICEMAN are presented. The DSs, designed by analogy with well-known tools for document description, describe both the structure and the semantics of still images and video sequences. The overall structure of both DSs, including the various sub-DSs and descriptors (Ds) of which they are composed, is described. In each case, the hierarchical sub-DS for describing structure can be constructed using automatic (or semi-automatic) image/video analysis tools. The hierarchical sub-DSs for describing semantics, however, are constructed by a user. The integration of the two DSs into a video indexing application currently under development in DICEMAN is also briefly described. Peer Reviewed. Postprint (published version)

    Towards real-time body pose estimation for presenters in meeting environments

    This paper describes a computer vision-based approach to body pose estimation. The algorithm can be executed in real time and processes low-resolution, monocular image sequences. A silhouette is extracted and matched against a projection of a 16-DOF human body model. In addition, skin color is used to locate the hands and head. No detailed human body model is needed. We evaluate the approach both quantitatively, using synthetic image sequences, and qualitatively, on video test data of short presentations. The algorithm was developed for use in a meeting room, where the poses of a presenter have to be estimated. The results can be applied in the domain of virtual environments.
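
    The abstract does not state how the extracted silhouette is compared with the projected model. A minimal sketch, assuming the 16-DOF model has already been rendered into a binary mask for each candidate pose, is to pick the candidate whose mask has the highest intersection over union (IoU) with the silhouette. The rendering step, the IoU score, and the names `iou` and `best_pose` are all hypothetical stand-ins, not the paper's matching cost.

```python
from typing import Dict, Hashable, List, Tuple

Mask = List[List[int]]  # binary image: 1 = foreground pixel, 0 = background


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 1.0  # two empty masks count as identical


def best_pose(silhouette: Mask, projections: Dict[Hashable, Mask]) -> Tuple[Hashable, float]:
    """Return (pose, score) for the projected model mask that best overlaps the silhouette.

    Hypothetical input: projections maps a candidate pose id to the binary mask
    obtained by projecting the body model in that pose into the image.
    """
    if not projections:
        raise ValueError("need at least one candidate projection")
    scored = [(pose, iou(silhouette, mask)) for pose, mask in projections.items()]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    silhouette = [[0, 1, 1],
                  [0, 1, 1]]
    candidates = {
        "pose_a": [[0, 1, 1], [0, 1, 1]],
        "pose_b": [[1, 1, 0], [1, 1, 0]],
    }
    print(best_pose(silhouette, candidates))  # ('pose_a', 1.0)
```

    Scoring every candidate exhaustively is only a toy; a real-time system would likely need a much smaller or incrementally searched candidate set.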