Semantically Informed Multiview Surface Refinement
We present a method to jointly refine the geometry and semantic segmentation
of 3D surface meshes. Our method alternates between updating the shape and the
semantic labels. In the geometry refinement step, the mesh is deformed with
variational energy minimization, such that it simultaneously maximizes
photo-consistency and the compatibility of the semantic segmentations across a
set of calibrated images. Label-specific shape priors account for interactions
between the geometry and the semantic labels in 3D. In the semantic
segmentation step, the labels on the mesh are updated with MRF inference, such
that they are compatible with the semantic segmentations in the input images.
Also, this step includes prior assumptions about the surface shape of different
semantic classes. The priors induce a tight coupling, where semantic
information influences the shape update and vice versa. Specifically, we
introduce priors that favor (i) adaptive smoothing, depending on the class
label; (ii) straightness of class boundaries; and (iii) semantic labels that
are consistent with the surface orientation. The novel mesh-based
reconstruction is evaluated in a series of experiments with real and synthetic
data. We compare both to state-of-the-art, voxel-based semantic 3D
reconstruction, and to purely geometric mesh refinement, and demonstrate that
the proposed scheme yields improved 3D geometry as well as an improved semantic
segmentation.
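The alternating scheme described above can be sketched in miniature. The toy below uses a 1-D "height profile" in place of a triangle mesh; the label-dependent smoothing weights, the slope-based relabeling rule, and all names are illustrative assumptions, not the paper's variational/MRF formulation.

```python
# Toy alternation between a geometry step and a labeling step.
# Smoothing strength depends on the class label (adaptive smoothing prior),
# and labels are updated to agree with the local surface orientation.

def refine_geometry(heights, labels, smooth_weight={"ground": 0.8, "building": 0.2}):
    """One smoothing step; smoothing strength depends on the class label."""
    new = heights[:]
    for i in range(1, len(heights) - 1):
        w = smooth_weight[labels[i]]
        new[i] = (1 - w) * heights[i] + w * 0.5 * (heights[i - 1] + heights[i + 1])
    return new

def refine_labels(heights, labels):
    """Relabel points whose local slope contradicts their class (orientation prior)."""
    new = labels[:]
    for i in range(1, len(heights) - 1):
        slope = abs(heights[i + 1] - heights[i - 1]) / 2.0
        new[i] = "building" if slope > 0.5 else "ground"
    return new

heights = [0.0, 0.1, 1.5, 3.0, 3.1, 3.0]
labels = ["ground"] * 6
for _ in range(3):  # alternate the two steps, as in the paper
    labels = refine_labels(heights, labels)
    heights = refine_geometry(heights, labels)
```

Steep regions are relabeled "building" and smoothed gently, while flat regions keep the strong "ground" smoothing, mimicking the tight geometry/semantics coupling.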
Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when there are
variations in imaging conditions. It is known that albedo (reflectance) is
invariant to all kinds of illumination effects. Thus, using reflectance images
for the semantic segmentation task can be favorable. Additionally, not only may
segmentation benefit from reflectance, but segmentation may also be useful for
reflectance computation. Therefore, in this paper, the tasks of semantic
segmentation and intrinsic image decomposition are considered as a combined
process by exploring their mutual relationship in a joint fashion. To that end,
we propose a supervised end-to-end CNN architecture to jointly learn intrinsic
image decomposition and semantic segmentation. We analyze the gains of
addressing those two problems jointly. Moreover, new cascade CNN architectures
for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as
single tasks. Furthermore, a dataset of 35K synthetic images of natural
environments is created with corresponding albedo and shading (intrinsics), as
well as semantic labels (segmentation) assigned to each object/scene. The
experiments show that joint learning of intrinsic image decomposition and
semantic segmentation is beneficial for both tasks for natural scenes. Dataset
and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
Comment: ECCV 201
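The joint objective in this abstract can be sketched as a weighted sum of the two task losses, computed on the shared network's outputs. The L2/cross-entropy forms and the weights below are illustrative assumptions, not the paper's exact formulation, and the network itself is omitted.

```python
import math

def joint_loss(pred_albedo, gt_albedo, pred_logits, gt_class,
               w_intrinsic=1.0, w_seg=1.0):
    """Weighted sum of an intrinsic-decomposition loss and a segmentation loss."""
    # L2 reconstruction loss on the predicted albedo (reflectance) values
    l_intrinsic = sum((p - g) ** 2 for p, g in zip(pred_albedo, gt_albedo)) / len(gt_albedo)
    # softmax cross-entropy on one pixel's class logits
    exps = [math.exp(z) for z in pred_logits]
    l_seg = -math.log(exps[gt_class] / sum(exps))
    return w_intrinsic * l_intrinsic + w_seg * l_seg
```

Training on such a combined loss is what lets gradients from segmentation shape the intrinsic decomposition and vice versa.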
Lifting GIS Maps into Strong Geometric Context for Scene Understanding
Contextual information can have a substantial impact on the performance of
visual tasks such as semantic segmentation, object detection, and geometric
estimation. Data stored in Geographic Information Systems (GIS) offers a rich
source of contextual information that has been largely untapped by computer
vision. We propose to leverage such information for scene understanding by
combining GIS resources with large sets of unorganized photographs using
Structure from Motion (SfM) techniques. We present a pipeline to quickly
generate strong 3D geometric priors from 2D GIS data using SfM models aligned
with minimal user input. Given an image resectioned against this model, we
generate robust predictions of depth, surface normals, and semantic labels. We
show that the predicted geometry is substantially more accurate than other
single-image depth estimation methods. We then demonstrate the
utility of these contextual constraints for re-scoring pedestrian detections,
and use these GIS contextual features alongside object detection score maps to
improve a CRF-based semantic segmentation framework, boosting accuracy over
baseline models.
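The detection re-scoring idea can be sketched as follows: a pedestrian hypothesis is down-weighted when its apparent image height disagrees with the height predicted from the GIS-derived ground plane and camera geometry. The Gaussian agreement term and all parameters are illustrative stand-ins for the paper's contextual features.

```python
import math

def rescore(det_score, bbox_height_px, expected_height_px, sigma=0.3):
    """Scale the detector score by a Gaussian agreement term on bbox height.

    expected_height_px is a hypothetical estimate of how tall a pedestrian
    at that image location should appear, given the geometric prior.
    """
    ratio = bbox_height_px / expected_height_px
    geom = math.exp(-((ratio - 1.0) ** 2) / (2 * sigma ** 2))
    return det_score * geom
```

A detection at a plausible scale keeps its score; one half or twice the expected height is strongly suppressed.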
Semantically Derived Geometric Constraints for MVS Reconstruction of Textureless Areas
Conventional multi-view stereo (MVS) approaches based on photo-consistency measures are generally robust, yet often fail to compute valid per-pixel depth estimates in low-textured areas of the scene. In this study, a novel approach is proposed to tackle this challenge by introducing semantic priors into a PatchMatch-based MVS pipeline in order to increase confidence and support depth and normal map estimation. Semantic class labels on image pixels are used to impose class-specific geometric constraints during multi-view stereo, optimising depth estimation in weakly supported, textureless areas that are common in urban scenarios of building facades, indoor scenes, and aerial datasets. After detecting dominant shapes, e.g., planes, with RANSAC, an adjusted cost function is introduced that combines and weighs both photometric and semantic scores, thus propagating more accurate depth estimates. Being adaptive, it fills in apparent information gaps and smooths local roughness in problematic regions while preserving important details. Experiments on benchmark and custom datasets demonstrate the effectiveness of the presented approach.
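The adjusted cost hinted at above can be sketched as a confidence-weighted blend: well-textured pixels rely on photo-consistency, while textureless pixels fall back on agreement with the RANSAC-fitted, class-specific plane. The blending rule, the linear semantic term, and the weight `lam` are assumptions for illustration, not the paper's exact cost.

```python
def adjusted_cost(photo_cost, depth, plane_depth, texture_confidence, lam=5.0):
    """Blend photometric cost with a semantic plane-prior cost.

    texture_confidence in [0, 1]: 1 = well-textured (trust photo-consistency),
    0 = textureless (trust the class-specific plane depth).
    """
    semantic_cost = abs(depth - plane_depth)
    w = texture_confidence
    return w * photo_cost + (1 - w) * lam * semantic_cost
```

In a PatchMatch loop, candidate depths minimizing this cost would propagate plane-consistent estimates across textureless facades.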
Spatially Coherent Geometric Class Labeling of Images and Its Applications
Automatic scene analysis is an active research area and is useful in many applications such as robotics and automation, industrial manufacturing, architectural design, and multimedia. 3D structural information is one of the most important cues for scene analysis. In this thesis, we present a geometric labeling method to automatically extract rough 3D information from a single 2D image. Our method partitions an image scene into five geometric regions by labeling every image pixel as one of five geometric classes (namely, “bottom”, “left”, “center”, “right”, and “top”). We formulate the geometric labeling problem as an energy minimization problem and optimize the energy with a graph-cut-based algorithm. In our energy function, we address the spatial consistency of the geometric labels in the scene while preserving discontinuities along image intensity edges. We also incorporate ordering constraints in our energy function. Ordering constraints specify the possible relative positions of labels for neighboring pixels. For example, a pixel labeled “left” cannot be to the right of a pixel labeled “right”, and a pixel labeled “bottom” cannot be above a pixel labeled “top”. Ordering constraints arise naturally in real scenes. We observed that when ordering constraints are used, the commonly used graph-cut-based α-expansion is more likely to get stuck in local minima. To overcome this, we developed new graph-cut moves, which we call order-preserving moves. Unlike α-expansion, which acts on two labels in each move, order-preserving moves act on all labels. Although the global minimum is still not guaranteed, optimization with order-preserving moves is shown to perform significantly better than α-expansion. Experimental results show that it is possible to significantly increase the percentage of reasonably good labelings by promoting spatial consistency and incorporating ordering constraints.
It is also shown that order-preserving moves perform significantly better than the commonly used α-expansion when ordering constraints are used: there is a significant improvement in computational efficiency and optimality, and a modest improvement in pixel-labeling accuracy. We also demonstrate the usefulness of the extracted 3D structure information of a scene in applications such as novel view generation, virtual scene walk-through, semantic segmentation, scene synthesis, and scene text extraction. We also show how order-preserving moves can be applied to certain simple shape priors in graph-cut segmentation. Our geometric labeling method has the following main contributions: (i) We develop a new class of graph-cut moves called order-preserving moves, which perform significantly better than α-expansion when ordering constraints are used. (ii) We formulate the problem in a global optimization framework, addressing the spatial consistency of labels in a scene with an energy function that encourages spatial consistency between neighboring pixels while preserving discontinuities along image intensity edges. (iii) We incorporate relative ordering information about the labels in our energy function. (iv) We show that our ordering constraints can also be used in other applications such as object part segmentation. (v) We also show how the proposed order-preserving moves can be used for certain simple shape priors in graph-cut segmentation.
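The ordering constraints described in this abstract amount to pairwise energy terms that assign infinite cost to forbidden label arrangements. The sketch below shows only those pairwise terms (with an illustrative Potts smoothness cost elsewhere); the thesis minimizes a full MRF energy with graph-cut order-preserving moves, which are not reproduced here.

```python
INF = float("inf")

def pairwise_horizontal(label_left, label_right):
    """Cost for a horizontally adjacent pixel pair (left pixel, right pixel)."""
    if label_left == "right" and label_right == "left":
        return INF  # "left" cannot appear to the right of "right"
    return 0.0 if label_left == label_right else 1.0  # Potts smoothness otherwise

def pairwise_vertical(label_top, label_bottom):
    """Cost for a vertically adjacent pixel pair (upper pixel, lower pixel)."""
    if label_top == "bottom" and label_bottom == "top":
        return INF  # "bottom" cannot appear above "top"
    return 0.0 if label_top == label_bottom else 1.0
```

Any labeling containing an infinite-cost pair is excluded outright, which is exactly why moves that respect the ordering are needed during optimization.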
Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View
We present Im2Pano3D, a convolutional neural network that generates a dense
prediction of 3D structure and a probability distribution of semantic labels
for a full 360 panoramic view of an indoor scene when given only a partial
observation (<= 50%) in the form of an RGB-D image. To make this possible,
Im2Pano3D leverages strong contextual priors learned from large-scale synthetic
and real-world indoor scenes. To ease the prediction of 3D structure, we
propose to parameterize 3D surfaces with their plane equations and train the
model to predict these parameters directly. To provide meaningful training
supervision, we use multiple loss functions that consider both pixel level
accuracy and global context consistency. Experiments demonstrate that
Im2Pano3D is able to predict the semantics and 3D structure of the unobserved
scene with more than 56% pixel accuracy and less than 0.52m average distance
error, which is significantly better than alternative approaches.
Comment: Video summary: https://youtu.be/Au3GmktK-S
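The plane parameterization mentioned above has a simple closed form: if the network predicts plane parameters (n, d) with n·x = d for surface points x, then the depth along a camera ray r follows directly. The camera model and names below are illustrative, not the paper's exact parameterization.

```python
def depth_from_plane(normal, d, ray):
    """Return depth t such that the point t * ray lies on the plane n.x = d.

    normal: plane normal (nx, ny, nz); ray: unit viewing ray in camera frame.
    """
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        return None  # ray (nearly) parallel to the plane: depth undefined
    return d / denom
```

Predicting (n, d) instead of raw depth lets one plane hypothesis explain many pixels at once, which is why it eases the 3D structure prediction.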