P-CNN: Pose-based CNN Features for Action Recognition
This work targets human action recognition in video. While recent methods
typically represent actions by statistics of local video features, here we
argue for the importance of a representation derived from human pose. To this
end we propose a new Pose-based Convolutional Neural Network descriptor (P-CNN)
for action recognition. The descriptor aggregates motion and appearance
information along tracks of human body parts. We investigate different schemes
of temporal aggregation and experiment with P-CNN features obtained both for
automatically estimated and manually annotated human poses. We evaluate our
method on the recent and challenging JHMDB and MPII Cooking datasets. For both
datasets our method shows consistent improvement over the state of the art.
Comment: ICCV, December 2015, Santiago, Chile
Towards real-time body pose estimation for presenters in meeting environments
This paper describes a computer vision-based approach to body pose estimation.
The algorithm can be executed in real-time and processes low resolution,
monocular image sequences. A silhouette is extracted and matched against a
projection of a 16 DOF human body model. In addition, skin color is used to
locate hands and head. No detailed human body model is needed. We evaluate the
approach both quantitatively using synthetic image sequences and qualitatively
on video test data of short presentations. The algorithm is developed with the
aim of using it in the context of a meeting room where the poses of a presenter
have to be estimated. The results can be applied in the domain of virtual
environments.
The DICEMAN description schemes for still images and video sequences
To address the problem of visual content description, two Description Schemes (DSs), developed within the context of a European ACTS project known as DICEMAN, are presented. The DSs, designed based on an analogy with well-known tools for document description, describe both the structure and semantics of still images and video sequences. The overall structure of both DSs, including the various sub-DSs and descriptors (Ds) of which they are composed, is described. In each case, the hierarchical sub-DS for describing structure can be constructed using automatic (or semi-automatic) image/video analysis tools. The hierarchical sub-DSs for describing the semantics, however, are constructed by a user. The integration of the two DSs into a video indexing application currently under development in DICEMAN is also briefly described.
Peer Reviewed. Postprint (published version)