Search CORE

10 research outputs found

Multi-View Stereo with Single-View Semantic Mesh Refinement

Author: Ciccone Marco
Matteucci Matteo
Romanoni Andrea
Visin Francesco
Publication venue
Publication date: 01/01/2017
Field of study

While 3D reconstruction is a well-established and widely explored research topic, semantic 3D reconstruction has only recently witnessed an increasing share of attention from the Computer Vision community. Semantic annotations allow in fact to enforce strong class-dependent priors, as planarity for ground and walls, which can be exploited to refine the reconstruction often resulting in non-trivial performance improvements. State-of-the art methods propose volumetric approaches to fuse RGB image data with semantic labels; even if successful, they do not scale well and fail to output high resolution meshes. In this paper we propose a novel method to refine both the geometry and the semantic labeling of a given mesh. We refine the mesh geometry by applying a variational method that optimizes a composite energy made of a state-of-the-art pairwise photo-metric term and a single-view term that models the semantic consistency between the labels of the 3D mesh and those of the segmented images. We also update the semantic labeling through a novel Markov Random Field (MRF) formulation that, together with the classical data and smoothness terms, takes into account class-specific priors estimated directly from the annotated mesh. This is in contrast to state-of-the-art methods that are typically based on handcrafted or learned priors. We are the first, jointly with the very recent and seminal work of [M. Blaha et al arXiv:1706.08336, 2017], to propose the use of semantics inside a mesh refinement framework. Differently from [M. Blaha et al arXiv:1706.08336, 2017], which adopts a more classical pairwise comparison to estimate the flow of the mesh, we apply a single-view comparison between the semantically annotated image and the current 3D mesh labels; this improves the robustness in case of noisy segmentations.Comment: {\pounds}D Reconstruction Meets Semantic, ICCV worksho

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Semantically Informed Multiview Surface Refinement

Author: Blaha Maros
Oswald Martin R.
Pollefeys Marc
Richard Audrey
Rothermel Mathias
Sattler Torsten
Schindler Konrad
Wegner Jan D.
Publication venue
Publication date: 01/01/2017
Field of study

We present a method to jointly refine the geometry and semantic segmentation of 3D surface meshes. Our method alternates between updating the shape and the semantic labels. In the geometry refinement step, the mesh is deformed with variational energy minimization, such that it simultaneously maximizes photo-consistency and the compatibility of the semantic segmentations across a set of calibrated images. Label-specific shape priors account for interactions between the geometry and the semantic labels in 3D. In the semantic segmentation step, the labels on the mesh are updated with MRF inference, such that they are compatible with the semantic segmentations in the input images. Also, this step includes prior assumptions about the surface shape of different semantic classes. The priors induce a tight coupling, where semantic information influences the shape update and vice versa. Specifically, we introduce priors that favor (i) adaptive smoothing, depending on the class label; (ii) straightness of class boundaries; and (iii) semantic labels that are consistent with the surface orientation. The novel mesh-based reconstruction is evaluated in a series of experiments with real and synthetic data. We compare both to state-of-the-art, voxel-based semantic 3D reconstruction, and to purely geometric mesh refinement, and demonstrate that the proposed scheme yields improved 3D geometry as well as an improved semantic segmentation

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

OctNetFusion: Learning Depth Fusion from Data

Author: Bischof Horst
Geiger Andreas
Riegler Gernot
Ulusoy Ali Osman
Publication venue
Publication date: 01/01/2017
Field of study

In this paper, we present a learning based approach to depth fusion, i.e., dense 3D reconstruction from multiple depth images. The most common approach to depth fusion is based on averaging truncated signed distance functions, which was originally proposed by Curless and Levoy in 1996. While this method is simple and provides great results, it is not able to reconstruct (partially) occluded surfaces and requires a large number frames to filter out sensor noise and outliers. Motivated by the availability of large 3D model repositories and recent advances in deep learning, we present a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps. Our learning based method significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression. By learning the structure of real world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill in gaps in the reconstruction. We demonstrate that our learning based approach outperforms both vanilla TSDF fusion as well as TV-L1 fusion on the task of volumetric fusion. Further, we demonstrate state-of-the-art 3D shape completion results.Comment: 3DV 2017, https://github.com/griegler/octnetfusio

arXiv.org e-Print Archive

MPG.PuRe

Semantic Segmentation of 3D Textured Meshes for Urban Scene Analysis

Author: Alliez Pierre
Lafarge Florent
Rouhani Mohammad
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

International audienceClassifying 3D measurement data has become a core problem in photogram-metry and 3D computer vision, since the rise of modern multiview geometry techniques, combined with affordable range sensors. We introduce a Markov Random Field-based approach for segmenting textured meshes generated via multi-view stereo into urban classes of interest. The input mesh is first partitioned into small clusters, referred to as superfacets, from which geometric and photometric features are computed. A random forest is then trained to predict the class of each superfacet as well as its similarity with the neighboring superfacets. Similarity is used to assign the weights of the Markov Random Field pairwise-potential and accounts for contextual information between the classes. The experimental results illustrate the efficacy and accuracy of the proposed framework

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Three-dimensional reconstruction and measurements of zebrafish larvae from high-throughput axial-view in vivo imaging

Author: Guo Y.
Spaink H.P.
Veneman W.J.
Verbeek F.J.
Publication venue: 'The Optical Society'
Publication date: 01/01/2017
Field of study

Animal science

Leiden University Scholary Publications

Recommended from our members

Inertial-aided Visual Perception of Geometry and Semantics

Author: Fei Xiaohan
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

We describe components of a visual perception system to understand the geometry and semantics of the three-dimensional scene by utilizing monocular cameras and inertial measurement units (IMUs). The use of the two sensor modalities is motivated by the wide availability of the camera-IMU sensor packages present in mobile devices from phones to cars, and their complementary sensing capabilities: IMUs can track the motion of the sensor platform over a short period of time accurately, and provide a scaled and gravity-aligned global reference frame, while cameras can capture rich photometric signatures of the scene, and provide relative motion constraints between images up to scale. We first show that visual 3D reconstruction can be improved by leveraging the global orientation frame -- easily inferred from inertials. In the gravity-aligned global orientation frame, a shape prior can be imposed in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state-of-the-art and illustrates the power of utilizing inertials in 3D reconstruction. The global reference provided by inertials is not only gravity-aligned but also scaled, which is exploited in depth completion: We describe a method to infer dense metric depth from camera motion and sparse depth as estimated using a visual-inertial odometry system. Unlike other scenarios using point clouds from lidar or structured light sensors, we have few hundreds to few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth using the image along with the sparse points. We use a predictive cross-modal criterion, akin to “self-supervision,” measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud. We also launch the first visual-inertial + depth dataset (dubbed ``VOID''), which we hope will foster additional exploration into combining the complementary strengths of visual and inertial sensors. To compare our method to prior work, we adopt the unsupervised KITTI depth completion benchmark, and show state-of-the-art performance on it.In addition to dense geometry, the camera-IMU sensor package can also be used to recover the semantics of the scene. We present two methods to augment a point cloud map with class-labeled objects represented in the form of either scaled and oriented bounding boxes or CAD models. The tradeoff of the two shape representation resides in their generality and capability to model detailed structures. While being more generic, 3D bounding boxes fail to model the details of the objects, whereas CAD models preserve the finest shape details but require more computation and are limited to previously seen objects. Nevertheless, both methods populate an unknown environment with 3D objects placed in a Euclidean reference frame inferred causally and on-line using monocular video along with inertial sensors. Besides, both methods include bottom-up and top-down components, whereby deep networks trained for detection provide likelihood scores for object hypotheses provided by a nonlinear filter, whose state serves as memory. We test our methods on KITTI and SceneNN datasets, and also introduce the VISMA dataset, which contains ground truth pose, point-cloud map, and object models, along with time-stamped inertial measurements.To reduce the drift of the visual-inertial SLAM system -- a building block of all the visual perception systems we have built, we introduce an efficient loop closure detection approach based on the idea of hierarchical pooling of image descriptors. We also open-sourced a full-fledged SLAM system equipped with mapping and loop closure capabilities. The code is publicly available at https://github.com/ucla-vision/xivo

eScholarship - University of California