326 research outputs found
U4D: Unsupervised 4D Dynamic Scene Understanding
We introduce the first approach to solve the challenging problem of
unsupervised 4D visual scene understanding for complex dynamic scenes with
multiple interacting people from multi-view video. Our approach simultaneously
estimates a detailed model that includes a per-pixel semantically and
temporally coherent reconstruction, together with instance-level segmentation
exploiting photo-consistency, semantic and motion information. We further
leverage recent advances in 3D pose estimation to constrain the joint semantic
instance segmentation and 4D temporally coherent reconstruction. This enables
per person semantic instance segmentation of multiple interacting people in
complex dynamic scenes. Extensive evaluation of the joint visual scene
understanding framework against state-of-the-art methods on challenging indoor
and outdoor sequences demonstrates a significant (approx 40%) improvement in
semantic segmentation, reconstruction and scene flow accuracy.Comment: To appear in IEEE International Conference in Computer Vision ICCV
201
Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when there are
variations in imaging conditions. It is known that albedo (reflectance) is
invariant to all kinds of illumination effects. Thus, using reflectance images
for semantic segmentation task can be favorable. Additionally, not only
segmentation may benefit from reflectance, but also segmentation may be useful
for reflectance computation. Therefore, in this paper, the tasks of semantic
segmentation and intrinsic image decomposition are considered as a combined
process by exploring their mutual relationship in a joint fashion. To that end,
we propose a supervised end-to-end CNN architecture to jointly learn intrinsic
image decomposition and semantic segmentation. We analyze the gains of
addressing those two problems jointly. Moreover, new cascade CNN architectures
for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as
single tasks. Furthermore, a dataset of 35K synthetic images of natural
environments is created with corresponding albedo and shading (intrinsics), as
well as semantic labels (segmentation) assigned to each object/scene. The
experiments show that joint learning of intrinsic image decomposition and
semantic segmentation is beneficial for both tasks for natural scenes. Dataset
and models are available at: https://ivi.fnwi.uva.nl/cv/intrinsegComment: ECCV 201
- …