
    Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

    To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body model (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de. Comment: To appear in CVPR 2019
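
    A minimal PyTorch sketch of the optimization-based fitting loop the abstract describes: project the body model's 3D joints into the image and minimize a confidence-weighted reprojection error plus priors. The body_model, camera, and prior callables stand in for the released SMPL-X model, a camera projection, and the learned pose prior; the names and parameter sizes here are illustrative assumptions, not the released smplx API.

        import torch

        def fit_smplx(keypoints_2d, conf, body_model, camera, prior, steps=100):
            # Parameters to optimize: body pose, shape, and facial expression.
            pose = torch.zeros(1, 63, requires_grad=True)
            betas = torch.zeros(1, 10, requires_grad=True)
            expression = torch.zeros(1, 10, requires_grad=True)
            opt = torch.optim.LBFGS([pose, betas, expression], max_iter=steps)

            def closure():
                opt.zero_grad()
                joints_3d = body_model(pose, betas, expression)  # (1, J, 3)
                joints_2d = camera(joints_3d)                    # (1, J, 2)
                # Confidence-weighted reprojection term over detected 2D features.
                data_term = (conf[..., None] * (joints_2d - keypoints_2d) ** 2).sum()
                # Learned pose prior and a shape regularizer keep the fit plausible
                # (the paper adds an interpenetration penalty, omitted here).
                loss = data_term + prior(pose) + 1e-2 * betas.pow(2).sum()
                loss.backward()
                return loss

            opt.step(closure)
            return pose.detach(), betas.detach(), expression.detach()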

    OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas.

    Recent work on depth estimation has so far focused only on projective images, ignoring 360° content, which is now increasingly and more easily produced. We show that monocular depth estimation models trained on traditional images produce sub-optimal results on omnidirectional images, showcasing the need for training directly on 360° datasets, which, however, are hard to acquire. In this work, we circumvent the challenges associated with acquiring high-quality 360° datasets with ground-truth depth annotations by re-using recently released large-scale 3D datasets and re-purposing them to 360° via rendering. This dataset, which is considerably larger than similar projective datasets, is publicly offered to the community to enable future research in this direction. We use this dataset to learn the task of depth estimation from 360° images in an end-to-end fashion. We show promising results on our synthesized data as well as on unseen realistic images.
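
    As a rough illustration of the re-purposing step, the sketch below (an assumption about the rendering setup, not the paper's released code) builds the per-pixel ray directions of an equirectangular panorama; rendering a 3D scene along these rays yields a 360° image together with a matching ground-truth depth map.

        import numpy as np

        def equirect_rays(width, height):
            # Pixel centers mapped to longitude [-pi, pi) and latitude [pi/2, -pi/2].
            u = (np.arange(width) + 0.5) / width
            v = (np.arange(height) + 0.5) / height
            lon = (u - 0.5) * 2.0 * np.pi
            lat = (0.5 - v) * np.pi
            lon, lat = np.meshgrid(lon, lat)
            # Unit ray directions; scene depth along each ray gives the 360° depth map.
            x = np.cos(lat) * np.sin(lon)
            y = np.sin(lat)
            z = np.cos(lat) * np.cos(lon)
            return np.stack([x, y, z], axis=-1)  # (height, width, 3)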

    Real Time Object Detection, Tracking, and Distance and Motion Estimation based on Deep Learning: Application to Smart Mobility

    In this paper, we introduce our object detection, localization, and tracking system for smart mobility applications such as road traffic and railway environments. First, object detection and tracking were carried out with two deep learning approaches: You Only Look Once (YOLO) v3 and the Single Shot Detector (SSD). A comparison between the two methods allows us to identify their applicability to the traffic environment; performance was evaluated in both road and railway environments. Second, object distance estimation based on the Monodepth algorithm was developed. This model is trained on a stereo image dataset, but its inference uses monocular images. The output is a disparity map, which we combine with the output of object detection. To validate our approach, we tested two models with different backbones, including VGG and ResNet, on two datasets: Cityscapes and KITTI. As the last step of our approach, we developed a new SSD-based method to analyse the behavior of pedestrians and vehicles by tracking their movements even when detections are missing on some images of a sequence. We developed an algorithm based on the coordinates of the output bounding boxes of the SSD algorithm; the objective is to determine whether the trajectory of a pedestrian or vehicle can lead to a dangerous situation. The whole development was tested in real vehicle traffic conditions in the Rouen city center, and with videos taken by cameras embedded along the Rouen tramway.
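
    A minimal sketch of the distance-estimation step, assuming a Monodepth-style disparity map and detector boxes in (x1, y1, x2, y2) pixel coordinates; the focal length and baseline are illustrative KITTI-like calibration values, not values from the paper.

        import numpy as np

        def object_distances(disparity, boxes, focal_px, baseline_m):
            distances = []
            for (x1, y1, x2, y2) in boxes:
                patch = disparity[int(y1):int(y2), int(x1):int(x2)]
                # Median disparity inside the box is robust to background pixels.
                d = float(np.median(patch))
                # Stereo relation: depth = focal * baseline / disparity.
                distances.append(focal_px * baseline_m / max(d, 1e-6))
            return distances

        # Example with a synthetic disparity map and a single detection box.
        disp = np.full((375, 1242), 40.0)  # KITTI-sized frame, uniform 40 px disparity
        print(object_distances(disp, [(100, 150, 200, 300)], 721.5, 0.54))
        # ~9.7 m for a 721.5 px focal length and a 0.54 m baseline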

    Coupled Depth Learning

    In this paper we propose a method for estimating depth from a single image using a coarse-to-fine approach. We argue that modeling the fine depth details is easier after a coarse depth map has been computed. We express a global (coarse) depth map of an image as a linear combination of a depth basis learned from training examples. The depth basis captures spatial and statistical regularities and reduces the problem of global depth estimation to the task of predicting the input-specific coefficients in the linear combination. This is formulated as a regression problem from a holistic representation of the image. Crucially, the depth basis and the regression function are coupled and jointly optimized by our learning scheme. We demonstrate that this results in a significant improvement in accuracy compared to direct regression of depth pixel values or approaches learning the depth basis disjointly from the regression function. The global depth estimate is then used as guidance by a local refinement method that introduces depth details that were not captured at the global level. Experiments on the NYUv2 and KITTI datasets show that our method outperforms the existing state of the art at a considerably lower computational cost for both training and testing. Comment: 10 pages, 3 figures, 4 tables with quantitative evaluation
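
    A toy PyTorch sketch of the coarse stage described above: a global depth map expressed as a linear combination of learned basis maps, with input-specific coefficients regressed from a holistic image feature. The module and its sizes are illustrative assumptions; the key point is that training the basis and the regressor under one loss couples them, unlike fitting the basis separately (e.g. by PCA) and regressing afterwards.

        import torch
        import torch.nn as nn

        class CoarseDepth(nn.Module):
            def __init__(self, feat_dim=512, n_basis=64, h=60, w=80):
                super().__init__()
                self.h, self.w = h, w
                # Depth basis: n_basis maps of size h x w, learned end to end.
                self.basis = nn.Parameter(0.01 * torch.randn(n_basis, h * w))
                # Regressor mapping a holistic feature to the basis coefficients.
                self.coeffs = nn.Linear(feat_dim, n_basis)

            def forward(self, feat):              # feat: (B, feat_dim)
                c = self.coeffs(feat)             # (B, n_basis) coefficients
                depth = c @ self.basis            # (B, h*w) linear combination
                return depth.view(-1, self.h, self.w)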