Search CORE

5,853 research outputs found

Descriptive temporal template features for visual motion recognition

Authors: Aggarwal, Bobick, Bradski, Cristianini, Davis, Farmer, Green, Hongying Meng, Meng, Moeslund, Nick Pears, Ogata, Stauffer
Publication venue: Elsevier BV
Publication date: 01/01/2009

In this paper, a human action recognition system is proposed. The system is based on new, descriptive 'temporal template' features in order to achieve high-speed recognition in real-time, embedded applications. The limitations of the well-known 'Motion History Image' (MHI) temporal template are addressed, and a new 'Motion History Histogram' (MHH) feature is proposed to capture more motion information in the video. MHH not only provides rich motion information but also remains computationally inexpensive. To further improve classification performance, we combine MHI and MHH into a low-dimensional feature vector, which is processed by a support vector machine (SVM). Experimental results show that our new representation achieves a significant improvement in human action recognition performance over existing comparable methods, which use 2D temporal-template-based representations.

Brunel University Research Archive
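The first abstract builds on the Motion History Image (MHI) and proposes the Motion History Histogram (MHH). Below is a minimal pure-Python sketch of both, assuming binary per-frame motion masks are already available. The MHI update follows the commonly used Bobick and Davis formulation: a pixel is set to τ when motion occurs and otherwise decays by one, floored at zero. For the MHH, the sketch assumes a per-pixel count of runs of i consecutive motion frames for i = 1..max_len. Two further choices are assumptions, not details from the abstract: runs longer than max_len go into the last bin, and the sequence is treated as zero-padded at both ends. The function names are likewise hypothetical.

```python
from typing import List

Mask = List[List[int]]  # one binary frame: 1 = motion at pixel (y, x), 0 = no motion


def motion_history_image(masks: List[Mask], tau: int) -> List[List[int]]:
    """MHI after the last frame: a pixel is set to tau on motion, else decays by 1 (floor 0)."""
    rows, cols = len(masks[0]), len(masks[0][0])
    history = [[0] * cols for _ in range(rows)]
    for mask in masks:
        for y in range(rows):
            for x in range(cols):
                if mask[y][x]:
                    history[y][x] = tau
                else:
                    history[y][x] = max(0, history[y][x] - 1)
    return history


def motion_history_histogram(masks: List[Mask], max_len: int) -> List[List[List[int]]]:
    """Per-pixel histogram of motion-run lengths (bin i-1 counts runs of length i).

    Assumptions (hypothetical choices for this sketch): the sequence is zero-padded
    at both ends, and runs longer than max_len are counted in the last bin.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    hist = [[[0] * max_len for _ in range(cols)] for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            run = 0
            # Appending a 0 closes any run that is still open at the last frame.
            for value in [mask[y][x] for mask in masks] + [0]:
                if value:
                    run += 1
                elif run > 0:
                    hist[y][x][min(run, max_len) - 1] += 1
                    run = 0
    return hist


# Usage: a single pixel (1x1 frames) observed over ten frames.
sequence = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
masks = [[[v]] for v in sequence]
print(motion_history_image(masks, tau=5))          # [[4]]
print(motion_history_histogram(masks, max_len=3))  # [[[1, 1, 1]]]
```

For this pixel, the MHI keeps only a trace of how recently motion occurred (the value 4, one frame after the last motion). The MHH records that one burst each of length 1, 2 and 3 took place. This illustrates the kind of additional motion information the abstract attributes to MHH.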

BodyNet: Volumetric Inference of 3D Human Body Shapes

Authors: A Newell, Catalin Ionescu, DJ Butler, F Bogo, FS Nooruddin, H Rhodin, IB Barbosa, J Nocedal, J Yang, ME Yumer, T Lewiner, Y. LeCun
Publication date: 18/08/2018

Human shape estimation is an important task for video editing, animation and the fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work, we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. As our experiments demonstrate, each of these components improves performance. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.
Comment: Appears in: European Conference on Computer Vision 2018 (ECCV 2018). 27 pages

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server
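The BodyNet abstract names two geometric training signals: a volumetric 3D loss and a multi-view re-projection loss. Below is a minimal pure-Python sketch of what such losses can look like. It is written under several assumptions that the abstract does not state: the volumetric loss is taken as voxel-wise binary cross-entropy on occupancy probabilities; the re-projection loss compares orthographic max-projections of the voxel grid, in a front view and a side view, with ground-truth silhouettes using the same cross-entropy; and the [depth][height][width] grid layout and all function names are hypothetical. It is not the paper's implementation.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # hypothetical layout: occupancy probabilities [depth][height][width]
Image = List[List[float]]


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one element; pred is clipped away from 0 and 1."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def volumetric_loss(pred: Grid, target: Grid) -> float:
    """Mean voxel-wise BCE between predicted and ground-truth occupancy grids."""
    losses = [bce(p, t)
              for pd, td in zip(pred, target)
              for ph, th in zip(pd, td)
              for p, t in zip(ph, th)]
    return sum(losses) / len(losses)


def front_projection(grid: Grid) -> Image:
    """Orthographic silhouette: max occupancy along the depth axis -> [height][width]."""
    depth, height, width = len(grid), len(grid[0]), len(grid[0][0])
    return [[max(grid[d][h][w] for d in range(depth)) for w in range(width)]
            for h in range(height)]


def side_projection(grid: Grid) -> Image:
    """Orthographic silhouette: max occupancy along the width axis -> [height][depth]."""
    depth, height, width = len(grid), len(grid[0]), len(grid[0][0])
    return [[max(grid[d][h][w] for w in range(width)) for d in range(depth)]
            for h in range(height)]


def reprojection_loss(pred: Grid, front_sil: Image, side_sil: Image) -> float:
    """Sum over the two views of the mean BCE between projection and silhouette."""
    total = 0.0
    views = ((front_projection(pred), front_sil), (side_projection(pred), side_sil))
    for proj, sil in views:
        vals = [bce(p, t) for pr, sr in zip(proj, sil) for p, t in zip(pr, sr)]
        total += sum(vals) / len(vals)
    return total


# Usage: a 2x2x2 grid with two occupied voxels.
grid = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]
print(front_projection(grid))  # [[1.0, 0.0], [0.0, 1.0]]
```

The max-projection is used here because a silhouette pixel is "on" whenever any voxel along its viewing ray is occupied. In a differentiable framework, this lets 2D silhouette supervision constrain the 3D prediction.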