
16,087 research outputs found

Space-Time Joint Multi-layer Segmentation and Depth Estimation

Author: Guillemaut J-Y
Hilton A
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

Video-based segmentation and reconstruction techniques are predominantly extensions of techniques developed for the image domain treating each frame independently. These approaches ignore the temporal information contained in input videos which can lead to incoherent results. We propose a framework for joint segmentation and reconstruction which explicitly enforces temporal consistency by formulating the problem as an energy minimisation generalised to groups of frames. The main idea is to use optical flow in combination with a confidence measure to impose robust temporal smoothness constraints. Optimisation is performed using recent advances in the field of graph-cuts combined with practical considerations to reduce run-time and memory consumption. Experimental results with real sequences containing rapid motion demonstrate that the method is able to improve spatio-temporal coherence both in terms of segmentation and reconstruction without introducing any degradation in regions where optical flow fails due to fast motion.
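As a rough illustration of the idea in this abstract (optical flow combined with a confidence measure to impose temporal smoothness), the sketch below evaluates a confidence-weighted Potts penalty between two consecutive label maps. It is not the authors' energy: the function name `temporal_smoothness_energy`, the nearest-pixel flow lookup, and the Potts form of the penalty are all assumptions made for illustration, and the graph-cut optimisation itself is not shown.

```python
import numpy as np

def temporal_smoothness_energy(labels_t, labels_t1, flow, confidence):
    """Sum of confidence-weighted Potts penalties between flow-linked pixels.

    labels_t, labels_t1: (H, W) integer label maps for consecutive frames.
    flow: (H, W, 2) forward optical flow stored as (dx, dy) per pixel.
    confidence: (H, W) weights in [0, 1]; 0 switches the constraint off.
    All names and the penalty form are hypothetical, not from the paper.
    """
    h, w = labels_t.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Follow the flow vector to the nearest pixel in the next frame.
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose flow leaves the image contribute nothing.
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    xt = np.clip(xt, 0, w - 1)
    yt = np.clip(yt, 0, h - 1)
    # Potts penalty: 1 where the label changes along the flow vector.
    disagree = labels_t != labels_t1[yt, xt]
    return float(np.sum(confidence * inside * disagree))

labels_a = np.array([[0, 1], [0, 1]])
labels_b = np.array([[1, 1], [0, 0]])
zero_flow = np.zeros((2, 2, 2))
print(temporal_smoothness_energy(labels_a, labels_b, zero_flow, np.ones((2, 2))))
```

Setting the confidence to zero where the flow is unreliable removes the penalty there, which mirrors the abstract's claim that regions where optical flow fails are not degraded.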

CiteSeerX

Surrey Research Insight

Temporally coherent 4D reconstruction of complex dynamic scenes

Author: Guillemaut Jean-Yves
Hilton Adrian
Kim Hansung
Mustafa Armin
Publication date: 01/01/2016

This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dynamic objects. Temporal coherence is exploited to overcome visual ambiguities resulting in improved reconstruction of complex scenes. Robust joint segmentation and reconstruction of dynamic objects is achieved by introducing a geodesic star convexity constraint. Comparative evaluation is performed on a variety of unstructured indoor and outdoor dynamic scenes with hand-held cameras and multiple people. This demonstrates reconstruction of complete temporally coherent 4D scene models with improved nonrigid object segmentation and shape reconstruction.

Comment: To appear in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016. Video available at: https://www.youtube.com/watch?v=bm_P13_-Ds

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Surrey Research Insight

Learning to Reconstruct People in Clothing from a Single RGB Camera

Author: Alldieck T.
Bhatnagar B.
Magnor M.
Pons-Moll G.
Theobalt C.
Publication date: 01/01/2019

We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both bottom-up and top-down streams (one per view), allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach.

Dynamic Body VSLAM with Semantic Constraints

Author: Chari Visesh
Krishna K. Madhava
Reddy N. Dinesh
Singhal Prateek
Publication date: 27/04/2015

Image-based reconstruction of urban environments is a challenging problem that deals with optimization of a large number of variables, and has several sources of error, such as the presence of dynamic objects. Since most large-scale approaches make the assumption of observing static scenes, dynamic objects are relegated to the noise modeling section of such systems. This is an approach of convenience since the RANSAC-based framework used to compute most multiview geometric quantities for static scenes naturally confines dynamic objects to the class of outlier measurements. However, reconstructing dynamic objects along with the static environment helps us get a complete picture of an urban environment. Such understanding can then be used for important robotic tasks like path planning for autonomous navigation, obstacle tracking and avoidance, and other areas. In this paper, we propose a system for robust SLAM that works in both static and dynamic environments. To overcome the challenge of dynamic objects in the scene, we propose a new model to incorporate semantic constraints into the reconstruction algorithm. While some of these constraints are based on multi-layered dense CRFs trained over appearance as well as motion cues, other proposed constraints can be expressed as additional terms in the bundle adjustment optimization process that does iterative refinement of 3D structure and camera / object motion trajectories. We show results on the challenging KITTI urban dataset for accuracy of motion segmentation and reconstruction of the trajectory and shape of moving objects relative to ground truth. We are able to show average relative error reduction by a significant amount for moving object trajectory reconstruction relative to state-of-the-art methods like VISO 2, as well as standard bundle adjustment algorithms.

arXiv.org e-Print Archive

BodyNet: Volumetric Inference of 3D Human Body Shapes

Author: A Newell
Catalin Ionescu
DJ Butler
F Bogo
FS Nooruddin
H Rhodin
IB Barbosa
J Nocedal
J Yang
ME Yumer
T Lewiner
Y. LeCun
Publication date: 18/08/2018

Human shape estimation is an important task for video editing, animation and the fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of them results in performance improvement as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.

Comment: Appears in: European Conference on Computer Vision 2018 (ECCV 2018). 27 page
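To make loss terms (i) and (ii) from this abstract more concrete, here is a minimal NumPy sketch of one way a volumetric loss and a multi-view re-projection loss could be combined. It is an assumption-laden illustration rather than the BodyNet implementation: the helper names, the use of binary cross-entropy, the orthographic silhouettes obtained by taking a maximum along a grid axis, and the weight `weight_reproj` are all hypothetical, and the intermediate supervision terms in (iii) are left out.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def combined_shape_loss(pred_vox, gt_vox, weight_reproj=0.1):
    """Volumetric loss plus re-projection losses on two silhouettes.

    pred_vox: (X, Y, Z) predicted occupancy probabilities in [0, 1].
    gt_vox:   (X, Y, Z) ground-truth occupancy as 0/1 values.
    weight_reproj: hypothetical weight, not a value from the paper.
    """
    # (i) Per-voxel loss on the full occupancy grid.
    loss_vox = bce(pred_vox, gt_vox)
    # (ii) Assumed orthographic projection: a pixel is "on" if any voxel
    # along the viewing axis is occupied, so project with a max.
    loss_front = bce(pred_vox.max(axis=2), gt_vox.max(axis=2))
    loss_side = bce(pred_vox.max(axis=0), gt_vox.max(axis=0))
    return loss_vox + weight_reproj * (loss_front + loss_side)

gt = np.zeros((4, 4, 4))
gt[1:3, 1:3, 1:3] = 1.0                      # a small occupied cube
uncertain = np.full_like(gt, 0.5)            # uninformative prediction
print(combined_shape_loss(gt, gt) < combined_shape_loss(uncertain, gt))
```

The re-projection terms penalise errors that change the silhouette, so boundary voxels influence the total more than the per-voxel term alone would suggest.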

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server