Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
To facilitate the analysis of human actions, interactions and emotions, we
compute a 3D model of human body pose, hand pose, and facial expression from a
single monocular image. To achieve this, we use thousands of 3D scans to train
a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with
fully articulated hands and an expressive face. Learning to regress the
parameters of SMPL-X directly from images is challenging without paired images
and 3D ground truth. Consequently, we follow the approach of SMPLify, which
estimates 2D features and then optimizes model parameters to fit the features.
We improve on SMPLify in several significant ways: (1) we detect 2D features
corresponding to the face, hands, and feet and fit the full SMPL-X model to
these; (2) we train a new neural network pose prior using a large MoCap
dataset; (3) we define a new interpenetration penalty that is both fast and
accurate; (4) we automatically detect gender and select the appropriate body model
(male, female, or neutral); (5) our PyTorch implementation achieves a speedup
of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to
both controlled images and images in the wild. We evaluate 3D accuracy on a new
curated dataset comprising 100 images with pseudo ground-truth. This is a step
towards automatic expressive human capture from monocular RGB data. The models,
code, and data are available for research purposes at
https://smpl-x.is.tue.mpg.de. Comment: To appear in CVPR 201
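The SMPLify-style strategy described above (detect 2D features, then optimize model parameters so the projected model matches them) can be illustrated with a minimal sketch. It is not SMPLify-X: the four-joint TEMPLATE, the rotation-plus-translation parameters, the normalized pinhole camera and the quadratic penalty standing in for the learned pose prior are all hypothetical simplifications, and finite-difference gradient descent with backtracking replaces whatever optimizer the authors use. What it does show is the general shape of the objective: a confidence-weighted 2D reprojection error plus a prior term.

```python
import math

# Hypothetical 3D joint positions (metres) in model coordinates; a stand-in for SMPL-X joints.
TEMPLATE = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.2)]


def model_joints(params):
    """Rotate the template about the y axis by params[0], then translate by params[1:4]."""
    angle, tx, ty, tz = params
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z + tx, y + ty, -s * x + c * z + tz) for x, y, z in TEMPLATE]


def project(points):
    """Pinhole projection to normalized image coordinates (focal length 1, assumed)."""
    return [(x / z, y / z) for x, y, z in points]


def energy(params, keypoints, confidences, prior_weight=1e-3):
    """Confidence-weighted 2D reprojection error plus a quadratic stand-in for a pose prior."""
    joints = model_joints(params)
    if any(z <= 1e-6 for _, _, z in joints):
        return math.inf  # a joint at or behind the camera: reject this parameter vector
    data = 0.0
    for (u, v), (ku, kv), w in zip(project(joints), keypoints, confidences):
        data += w * ((u - ku) ** 2 + (v - kv) ** 2)
    return data + prior_weight * params[0] ** 2


def numeric_grad(f, params, eps=1e-6):
    """Central finite differences, used here instead of autograd to stay dependency-free."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def fit(keypoints, confidences, init, iters=500):
    """Gradient descent with backtracking: each accepted step strictly lowers the energy."""
    def objective(p):
        return energy(p, keypoints, confidences)

    params, current = list(init), objective(init)
    for _ in range(iters):
        g = numeric_grad(objective, params)
        step = 1.0
        while step > 1e-12:
            candidate = [p - step * gi for p, gi in zip(params, g)]
            value = objective(candidate)
            if value < current:
                params, current = candidate, value
                break
            step *= 0.5
        else:
            break  # no descent step found: treat as converged
    return params, current
```

Low-confidence detections contribute less to the data term, so an occluded or uncertain keypoint pulls the fit less than a reliable one.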
OmniDepth: Dense Depth Estimation for Indoors Spherical Panoramas.
Recent work on depth estimation has so far focused only on projective images, ignoring 360° content, which is now produced increasingly often and more easily. We show that monocular depth estimation models trained on traditional images produce sub-optimal results on omnidirectional images, which demonstrates the need to train directly on 360° datasets; such datasets, however, are hard to acquire. In this work, we circumvent the challenges of acquiring high-quality 360° datasets with ground-truth depth annotations by reusing recently released large-scale 3D datasets and repurposing them for 360° via rendering. This dataset, which is considerably larger than similar projective datasets, is publicly offered to the community to enable future research in this direction. We use it to learn depth estimation from 360° images in an end-to-end fashion. We show promising results on our synthesized data as well as on unseen realistic images.
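Rendering a 360° training image from a 3D scene amounts to casting one ray per panorama pixel and recording where it hits. The sketch below assumes an equirectangular layout (longitude across the width, latitude down the height, y axis up) and replaces the large-scale 3D datasets with a hypothetical axis-aligned box room, so the ray-wall intersection has a closed form. It stores the Euclidean distance along each ray; whether the OmniDepth data uses this or another depth convention is not stated in the abstract.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map the centre of equirectangular pixel (u, v) to a unit direction (y up, z forward)."""
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi   # longitude in [-pi, pi)
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi  # latitude in (-pi/2, pi/2)
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))


def box_room_distance(direction, camera, room_min, room_max):
    """Distance from a camera inside an axis-aligned box to the first wall along direction."""
    best = math.inf
    for d, c, lo, hi in zip(direction, camera, room_min, room_max):
        if d > 1e-12:
            best = min(best, (hi - c) / d)   # moving towards the upper wall on this axis
        elif d < -1e-12:
            best = min(best, (lo - c) / d)   # moving towards the lower wall on this axis
    return best


def render_depth(width, height, camera, room_min, room_max):
    """Per-pixel ray distance for a full 360 x 180 degree equirectangular panorama."""
    return [[box_room_distance(pixel_to_ray(u, v, width, height), camera, room_min, room_max)
             for u in range(width)]
            for v in range(height)]
```

Because every pixel gets an exact distance from the scene geometry, rendering yields dense ground truth without any depth sensor, which is the point of repurposing existing 3D datasets.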
Real Time Object Detection, Tracking, and Distance and Motion Estimation based on Deep Learning: Application to Smart Mobility
In this paper, we introduce our object detection, localization and tracking system for smart mobility applications such as road traffic and railway environments. First, object detection and tracking were carried out with two deep learning approaches: You Only Look Once (YOLO) V3 and Single Shot Detector (SSD). Comparing the two methods allows us to identify their applicability in the traffic environment; their performance was evaluated in both road and railway environments. Second, object distance estimation based on the Monodepth algorithm was developed. This model is trained on a stereo image dataset, but its inference uses monocular images. Its output is a disparity map, which we combine with the output of object detection. To validate our approach, we tested two models with different backbones, VGG and ResNet, on two datasets: Cityscapes and KITTI. As the last step of our approach, we developed a new SSD-based method to analyse the behavior of pedestrians and vehicles by tracking their movements, even when they are not detected in some images of a sequence. This algorithm is based on the coordinates of the bounding boxes output by SSD. The objective is to determine whether the trajectory of a pedestrian or vehicle can lead to a dangerous situation. The whole system was tested in real vehicle traffic conditions in the Rouen city center, and with videos taken by embedded cameras along the Rouen tramway.
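Combining a disparity map with detector output can be sketched as follows, using the standard stereo relation depth = focal × baseline / disparity. The sketch assumes the disparity is already in pixels (if a network outputs disparity normalized by image width, it would first be multiplied by the width). The focal length, baseline, median pooling over the box, the constant-velocity extrapolation used to bridge frames with no detection, and the rectangular zone test are illustrative assumptions, not the authors' algorithm.

```python
import statistics


def box_distance(disparity, box, focal_px, baseline_m):
    """Distance in metres for one detection, from the median disparity inside its box.

    disparity: 2D list (rows of pixels) of disparities in pixels; 0 marks invalid pixels.
    box: (x1, y1, x2, y2) integer pixel bounds, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = box
    values = [disparity[y][x]
              for y in range(y1, y2)
              for x in range(x1, x2)
              if disparity[y][x] > 0]
    if not values:
        return None  # no valid disparity inside the box
    # The median is less sensitive than the mean to background pixels inside the box.
    return focal_px * baseline_m / statistics.median(values)


def box_center(box):
    """Centre (x, y) of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def extrapolate_center(prev_center, last_center, frames_ahead=1):
    """Constant-velocity guess of a box centre for frames where the detector misses."""
    vx = last_center[0] - prev_center[0]
    vy = last_center[1] - prev_center[1]
    return (last_center[0] + frames_ahead * vx, last_center[1] + frames_ahead * vy)


def enters_zone(center, zone):
    """True if a (predicted) centre lies inside a hypothetical danger zone (x1, y1, x2, y2)."""
    x, y = center
    x1, y1, x2, y2 = zone
    return x1 <= x <= x2 and y1 <= y <= y2
```

For example, a box whose valid pixels all have a disparity of 20 px, with a 700 px focal length and a 0.5 m baseline, gives 700 × 0.5 / 20 = 17.5 m.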
Coupled Depth Learning
In this paper we propose a method for estimating depth from a single image
using a coarse to fine approach. We argue that modeling the fine depth details
is easier after a coarse depth map has been computed. We express a global
(coarse) depth map of an image as a linear combination of a depth basis learned
from training examples. The depth basis captures spatial and statistical
regularities and reduces the problem of global depth estimation to the task of
predicting the input-specific coefficients in the linear combination. This is
formulated as a regression problem from a holistic representation of the image.
Crucially, the depth basis and the regression function are coupled and
jointly optimized by our learning scheme. We demonstrate that this results in a
significant improvement in accuracy compared to direct regression of depth
pixel values or approaches learning the depth basis disjointly from the
regression function. The global depth estimate is then used as guidance by a
local refinement method that introduces depth details that were not captured at
the global level. Experiments on the NYUv2 and KITTI datasets show that our
method outperforms the existing state-of-the-art at a considerably lower
computational cost for both training and testing. Comment: 10 pages, 3 Figures, 4 Tables with quantitative evaluation
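The coupling between the depth basis and the regression function can be illustrated with a toy linear model: the coarse depth is B (W f), where the columns of B are k basis depth maps and W maps a holistic feature vector f to the k coefficients. Both B and W are updated from the same squared-error loss at every step, which is what "jointly optimized" means in this sketch. The linear regressor, the synthetic data and gradient descent with backtracking are assumptions for illustration; the abstract does not specify the paper's features, regressor or optimizer, and the local refinement stage is not sketched.

```python
import random


def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def predict(B, W, f):
    """Coarse depth = B @ (W @ f): basis B (m x k) times coefficients regressed by W (k x p)."""
    return matvec(B, matvec(W, f))


def loss(B, W, feats, depths):
    """Mean squared reconstruction error over all training examples."""
    total = 0.0
    for f, d in zip(feats, depths):
        total += sum((di - pi) ** 2 for di, pi in zip(d, predict(B, W, f)))
    return total / len(feats)


def gradients(B, W, feats, depths):
    """Gradients of the loss with respect to both the basis B and the regressor W."""
    m, k, p, n = len(B), len(W), len(W[0]), len(feats)
    gB = [[0.0] * k for _ in range(m)]
    gW = [[0.0] * p for _ in range(k)]
    for f, d in zip(feats, depths):
        c = matvec(W, f)                                   # coefficients, length k
        r = [di - bi for di, bi in zip(d, matvec(B, c))]   # residual, length m
        btr = [sum(B[i][j] * r[i] for i in range(m)) for j in range(k)]  # B^T r
        for i in range(m):
            for j in range(k):
                gB[i][j] -= 2.0 * r[i] * c[j] / n
        for j in range(k):
            for q in range(p):
                gW[j][q] -= 2.0 * btr[j] * f[q] / n
    return gB, gW


def step(M, G, lr):
    return [[x - lr * g for x, g in zip(row, grow)] for row, grow in zip(M, G)]


def train_coupled(B, W, feats, depths, iters=1000, lr=0.1):
    """Update basis and regressor together; backtracking keeps every step a descent step."""
    current = loss(B, W, feats, depths)
    for _ in range(iters):
        gB, gW = gradients(B, W, feats, depths)
        eta = lr
        while eta > 1e-10:
            nB, nW = step(B, gB, eta), step(W, gW, eta)
            new = loss(nB, nW, feats, depths)
            if new < current:
                B, W, current = nB, nW, new
                break
            eta *= 0.5
        else:
            break  # no descent step found: treat as converged
    return B, W, current
```

A disjoint alternative would fix B first (for instance from the training depth maps alone) and fit only W afterwards; here the basis keeps adapting to what the regressor can actually predict.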