PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View
3D plane recovery from a single image can usually be divided into several
subtasks of plane detection, segmentation, parameter estimation and possibly
depth estimation. Previous works tend to solve this task by either extending
the RCNN-based segmentation network or the dense pixel embedding-based
clustering framework. However, none of them attempts to integrate the above
related subtasks into a unified framework; instead, they are treated
separately and sequentially, which we suspect is a main source of the
performance limitations of existing approaches. Motivated by this observation
and the success of query-based
learning in enriching reasoning among semantic entities, in this paper, we
propose PlaneRecTR, a Transformer-based architecture, which for the first time
unifies all subtasks related to single-view plane recovery with a single
compact model. Extensive quantitative and qualitative experiments demonstrate
that our proposed unified learning achieves mutual benefits across subtasks,
obtaining a new state-of-the-art performance on public ScanNet and NYUv2-Plane
datasets. Codes are available at https://github.com/SJingjia/PlaneRecTR.
Comment: To be published in Proceedings of the IEEE International Conference
on Computer Vision (ICCV 2023). Camera-ready version. Codes:
https://github.com/SJingjia/PlaneRecTR , Video: https://youtu.be/YBB7totHGJ
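
To make the unified-query idea concrete, here is a minimal sketch of a DETR-style plane query head in PyTorch, in which one shared set of learned queries jointly predicts plane probability, plane parameters, a mask, and a per-plane depth map. All names, dimensions, and design details below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class PlaneQueryHead(nn.Module):
    def __init__(self, num_queries=20, dim=256):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # One shared query set feeds four task-specific heads, so detection,
        # parameter estimation, segmentation, and depth prediction are coupled.
        self.cls_head = nn.Linear(dim, 2)       # plane vs. non-plane logits
        self.param_head = nn.Linear(dim, 3)     # plane parameters (normal / offset)
        self.mask_embed = nn.Linear(dim, dim)   # dotted with pixel features -> mask
        self.depth_embed = nn.Linear(dim, dim)  # dotted with pixel features -> depth

    def forward(self, pixel_feats):
        # pixel_feats: (B, H*W, dim) flattened per-pixel embeddings.
        B = pixel_feats.shape[0]
        q = self.decoder(self.queries.weight.unsqueeze(0).expand(B, -1, -1),
                         pixel_feats)
        masks = torch.einsum("bqc,bpc->bqp", self.mask_embed(q), pixel_feats)
        depths = torch.einsum("bqc,bpc->bqp", self.depth_embed(q), pixel_feats)
        return self.cls_head(q), self.param_head(q), masks, depths

Composing the per-query depth maps with the predicted masks would then yield a full scene depth map, which is one way such a unified head lets every subtask supervise the same shared queries.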
RoSI: Recovering 3D Shape Interiors from Few Articulation Images
The vast majority of 3D models that appear in gaming and VR/AR, and those we
use to train geometric deep learning algorithms, are incomplete, since they
are modeled as surface meshes that lack interior structures. We present a
learning framework to recover the shape interiors (RoSI) of existing 3D models
with only their exteriors from multi-view and multi-articulation images. Given
a set of RGB images that capture a target 3D object in different articulated
poses, possibly from only a few views, our method infers the interior planes that
are observable in the input images. Our neural architecture is trained in a
category-agnostic manner and it consists of a motion-aware multi-view analysis
phase including pose, depth, and motion estimations, followed by interior plane
detection in images and 3D space, and finally multi-view plane fusion. In
addition, our method also predicts part articulations and is able to realize
and even extrapolate the captured motions on the target 3D object. We evaluate
our method by quantitative and qualitative comparisons to baselines and
alternative solutions, as well as testing on untrained object categories and
real image inputs to assess its generalization capabilities.
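
As a concrete illustration of the final multi-view plane fusion stage, the sketch below maps per-view plane hypotheses into a shared world frame and greedily merges near-duplicates. The plane parameterization (n . x = d), the camera convention (x_cam = R x_world + t), the thresholds, and all function names are assumptions made for illustration, not RoSI's actual procedure.

import numpy as np

def to_world(n_cam, d_cam, R, t):
    # A plane n . x = d in camera coordinates, with x_cam = R @ x_world + t,
    # becomes (R^T n) . x_world = d - n . t in world coordinates.
    return R.T @ n_cam, d_cam - n_cam @ t

def fuse_planes(detections, ang_thresh=np.deg2rad(10.0), off_thresh=0.05):
    """detections: iterable of (n_cam, d_cam, R, t), one entry per detected plane."""
    fused = []  # list of (unit normal, offset, number of merged detections)
    for n_cam, d_cam, R, t in detections:
        n_w, d_w = to_world(np.asarray(n_cam, float), float(d_cam),
                            np.asarray(R, float), np.asarray(t, float))
        for i, (n_f, d_f, c) in enumerate(fused):
            angle = np.arccos(np.clip(n_w @ n_f, -1.0, 1.0))
            if angle < ang_thresh and abs(d_w - d_f) < off_thresh:
                # Merge by running average, renormalizing the normal.
                n_new = c * n_f + n_w
                n_new /= np.linalg.norm(n_new)
                fused[i] = (n_new, (c * d_f + d_w) / (c + 1), c + 1)
                break
        else:
            fused.append((n_w, d_w, 1))
    return [(n, d) for n, d, _ in fused]

Greedy merging keeps the example short; clustering over all pairwise plane distances would be the more robust choice in practice.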
X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth Estimation with Cross-Task Distillation and Boundary Correction
Segmentation of planar regions from a single RGB image is a particularly
important task in the perception of complex scenes. To utilize both visual and
geometric properties in images, recent approaches often formulate the problem
as a joint estimation of planar instances and dense depth through feature
fusion mechanisms and geometric constraint losses. Despite promising results,
these methods do not consider cross-task feature distillation and perform
poorly in boundary regions. To overcome these limitations, we propose X-PDNet,
a framework for the multitask learning of plane instance segmentation and depth
estimation with improvements in the following two aspects. First, we
construct a cross-task distillation design that promotes early information
sharing between the two tasks, improving each of them. Second, we
highlight the limitations of using ground-truth boundaries to formulate a
boundary regression loss, and propose a novel method that exploits depth
information to support precise segmentation of boundary regions. Finally, we
manually annotate more than 3,000 images from the Stanford 2D-3D-Semantics
dataset and make them available for the evaluation of plane instance
segmentation. In our experiments, the proposed methods prove advantageous,
outperforming the baseline by large margins in quantitative results on the
ScanNet and Stanford 2D-3D-S datasets and demonstrating the effectiveness of
our proposals.
Comment: Accepted to BMVC 202
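
The following is a minimal sketch of what a cross-task distillation block could look like in PyTorch, assuming the segmentation and depth decoders exchange sigmoid-gated features at matching resolutions. The gating design and the class name are plausible stand-ins, not X-PDNet's exact module.

import torch
import torch.nn as nn

class CrossTaskDistill(nn.Module):
    """Bidirectional feature exchange between two task branches."""
    def __init__(self, channels):
        super().__init__()
        # Each branch learns a gate deciding how much of the other
        # branch's features to absorb at this stage of the decoder.
        self.gate_seg = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_dep = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, f_seg, f_dep):
        # Distill depth features into the segmentation branch and vice versa,
        # keeping a residual path so each task retains its own signal.
        return (f_seg + self.gate_seg(f_seg) * f_dep,
                f_dep + self.gate_dep(f_dep) * f_seg)

# Example: exchange 64-channel features at a 32x32 decoder resolution.
f_seg, f_dep = CrossTaskDistill(64)(torch.randn(1, 64, 32, 32),
                                    torch.randn(1, 64, 32, 32))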
3D detection of roof sections from a single satellite image and application to LOD2-building reconstruction
Reconstructing urban areas in 3D from satellite raster images has been a
long-standing and challenging goal of both academic and industrial research.
The rare methods that achieve this objective today at Level Of Details 2
(LOD2) rely on procedural approaches based on geometry, and need stereo
images and/or LIDAR data as input. Here we propose a method for urban 3D
reconstruction named KIBS (Keypoints Inference By Segmentation), which
comprises two novel features: i) a fully deep-learning-based approach for the
3D detection of roof sections, and ii) only a single (non-orthogonal)
satellite raster image as
model input. This is achieved in two steps: i) a Mask R-CNN model performs a
2D segmentation of the buildings' roof sections; then, after blending these
segmented pixels into the RGB satellite raster image, ii) a second, identical
Mask R-CNN model infers the heights-to-ground of the roof sections' corners
via panoptic segmentation, yielding a full 3D reconstruction of the buildings
and the city. We demonstrate the potential of the KIBS method by
reconstructing different urban areas in a few minutes, with a Jaccard index
for the 2D segmentation of individual roof sections of … and … on our two
data sets respectively, and a mean height error of such correctly segmented
pixels for the 3D reconstruction of … m and … m on our two data sets
respectively, hence within the LOD2 precision range.
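
For intuition, here is a rough sketch of the two-step inference flow using off-the-shelf torchvision Mask R-CNN models. The overlay-style blending and the omission of corner-height decoding are simplifying assumptions for illustration, not the paper's exact pipeline.

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Step i model segments roof sections; step ii model reads corner heights
# from the blended image (both randomly initialized here for the sketch).
seg_model = maskrcnn_resnet50_fpn(num_classes=2).eval()
height_model = maskrcnn_resnet50_fpn(num_classes=2).eval()

@torch.no_grad()
def reconstruct(image):
    # image: (3, H, W) float tensor in [0, 1], one satellite raster view.
    masks = seg_model([image])[0]["masks"]  # (N, 1, H, W) soft instance masks
    blended = image
    if len(masks):
        # Blend segmented roof pixels back into the RGB raster (simple overlay).
        blended = 0.5 * image + 0.5 * masks.amax(dim=0).expand_as(image)
    # The second pass would decode heights-to-ground of roof-section corners
    # from these detections; that decoding step is omitted here.
    return height_model([blended])[0]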