Holistically-Attracted Wireframe Parsing
This paper presents a fast and parsimonious parsing method to accurately and
robustly detect a vectorized wireframe in an input image with a single forward
pass. The proposed method is end-to-end trainable, consisting of three
components: (i) line segment and junction proposal generation, (ii) line
segment and junction matching, and (iii) line segment and junction
verification. For computing line segment proposals, a novel exact dual
representation is proposed which exploits a parsimonious geometric
reparameterization for line segments and forms a holistic 4-dimensional
attraction field map for an input image. Junctions can be treated as the
"basins" in the attraction field. The proposed method is thus called
Holistically-Attracted Wireframe Parser (HAWP). In experiments, the proposed
method is tested on two benchmarks, the Wireframe dataset and the YorkUrban
dataset. On both benchmarks, it obtains state-of-the-art performance in terms
of accuracy and efficiency. For example, on the Wireframe dataset, compared to
the previous state-of-the-art method L-CNN, it improves the challenging mean
structural average precision (msAP) by a large margin and achieves 29.5 FPS on
a single GPU. A systematic ablation study is performed to further justify the
proposed method.
Comment: Accepted by CVPR 2020
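To make the 4-dimensional reparameterization above more concrete, here is a minimal, hypothetical sketch (Python/NumPy, not the authors' code) of encoding a line segment relative to a pixel by a distance, the orientation of the perpendicular to the supporting line, and two angles toward the endpoints; the exact conventions of HAWP's attraction field may differ.

```python
import numpy as np

def hat_encode(pixel, e1, e2):
    """Illustrative encoding of segment (e1, e2) relative to `pixel` as
    (distance to line, normal angle, angle to e1, angle to e2)."""
    pixel, e1, e2 = (np.asarray(v, dtype=float) for v in (pixel, e1, e2))
    u = (e2 - e1) / np.linalg.norm(e2 - e1)   # unit direction of the segment
    foot = e1 + np.dot(pixel - e1, u) * u     # perpendicular foot point on the line
    n = foot - pixel                          # vector from the pixel to the line
    d = np.linalg.norm(n)                     # (1) distance to the supporting line
    theta = np.arctan2(n[1], n[0])            # (2) orientation of that vector

    def signed_angle(endpoint):               # signed angle from n to (endpoint - pixel)
        v = endpoint - pixel
        return np.arctan2(n[0] * v[1] - n[1] * v[0], np.dot(n, v))

    return np.array([d, theta, signed_angle(e1), signed_angle(e2)])  # (3), (4)

# usage: hat_encode((10, 5), (0, 0), (20, 0))  # pixel below a horizontal segment
```

Because a pixel location together with such a 4D vector determines the segment, an encoding of this kind can be decoded back into a vectorized line segment, which is the sense in which the representation is an exact dual one.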
Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning
This article presents Holistically-Attracted Wireframe Parsing (HAWP), a
method for geometric analysis of 2D images containing wireframes formed by line
segments and junctions. HAWP utilizes a parsimonious Holistic Attraction (HAT)
field representation that encodes line segments using a closed-form 4D
geometric vector field. The proposed HAWP consists of three sequential
components empowered by end-to-end and HAT-driven designs: (1) generating a
dense set of line segments from HAT fields and endpoint proposals from
heatmaps, (2) binding the dense line segments to sparse endpoint proposals to
produce initial wireframes, and (3) filtering false positive proposals through
a novel endpoint-decoupled line-of-interest aligning (EPD LOIAlign) module that
captures the co-occurrence between endpoint proposals and HAT fields for better
verification. Thanks to our novel designs, HAWPv2 shows strong performance in
fully supervised learning, while HAWPv3 excels in self-supervised learning,
achieving superior repeatability scores and efficient training (24 GPU hours on
a single GPU). Furthermore, HAWPv3 exhibits promising potential for wireframe
parsing on out-of-distribution images without requiring ground-truth wireframe labels.
Comment: Journal extension of arXiv:2003.01663; Accepted by IEEE TPAMI; Code
is available at https://github.com/cherubicxn/haw
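As a rough illustration of the binding step in component (2) above, the following hypothetical sketch snaps both ends of each dense line-segment proposal to the nearest sparse endpoint proposal and keeps only segments whose two ends both find a match; the function name and the threshold tau are illustrative and not taken from the HAWP code base.

```python
import numpy as np

def bind_segments(segments, junctions, tau=10.0):
    """segments: (N, 4) array of (x1, y1, x2, y2); junctions: (M, 2) array of (x, y)."""
    ends = segments.reshape(-1, 2, 2)                      # (N, 2 endpoints, xy)
    # pairwise distances between every endpoint and every junction proposal
    d = np.linalg.norm(ends[:, :, None, :] - junctions[None, None, :, :], axis=-1)
    nearest = d.argmin(axis=2)                             # (N, 2) closest junction index
    ok = d.min(axis=2).max(axis=1) < tau                   # both endpoints close enough
    snapped = junctions[nearest]                           # (N, 2, 2) snapped endpoints
    return snapped[ok].reshape(-1, 4), ok
```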
Volumetric Wireframe Parsing from Neural Attraction Fields
The primal sketch is a fundamental representation in Marr's vision theory,
which allows for parsimonious image-level processing from 2D to 2.5D
perception. This paper takes a further step by computing a 3D primal sketch of
wireframes from a set of images with known camera poses, in which we take the
2D wireframes in multi-view images as the basis to compute 3D wireframes in a
volumetric rendering formulation. In our method, we first propose NEural
Attraction (NEAT) Fields that parameterize the 3D line segments with
coordinate Multi-Layer Perceptrons (MLPs), enabling us to learn the 3D line
segments from 2D observations without requiring any explicit feature
correspondences across views. We then present a novel Global Junction
Perceiving (GJP) module to perceive meaningful 3D junctions from the NEAT
Fields of 3D line segments by optimizing a randomly initialized
high-dimensional latent array and a lightweight decoding MLP. Benefiting from
our explicit modeling of 3D junctions, we finally compute the primal sketch of
3D wireframes by attracting the queried 3D line segments to the 3D junctions,
significantly simplifying the computation paradigm of 3D wireframe parsing. In
experiments, we evaluate our approach on the DTU and BlendedMVS datasets and
obtain promising performance. As far as we know, our method is the first
approach to achieve high-fidelity 3D wireframe parsing without requiring
explicit matching.
Comment: Technical report; Video can be found at https://youtu.be/qtBQYbOpVp
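The following is a minimal sketch, assuming PyTorch, of what a coordinate MLP for a neural attraction field could look like: it maps a 3D query point to offsets toward the two endpoints of a nearby line segment. The width, depth, and output parameterization are assumptions for illustration and may differ from the paper's actual NEAT Fields.

```python
import torch
import torch.nn as nn

class AttractionFieldMLP(nn.Module):
    """Coordinate MLP mapping a 3D query point to a local line-segment encoding."""
    def __init__(self, hidden=256, depth=4):
        super().__init__()
        layers, dim = [], 3
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU(inplace=True)]
            dim = hidden
        layers += [nn.Linear(dim, 6)]        # offsets to the two 3D endpoints
        self.net = nn.Sequential(*layers)

    def forward(self, xyz):                  # xyz: (..., 3) query points
        offsets = self.net(xyz)
        # endpoints are recovered by adding the predicted offsets to the query point
        return xyz.unsqueeze(-2) + offsets.reshape(*xyz.shape[:-1], 2, 3)

# usage: field = AttractionFieldMLP(); endpoints = field(torch.rand(1024, 3))
```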
DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients
Line segments are ubiquitous in our human-made world and are increasingly
used in vision tasks. They are complementary to feature points thanks to their
spatial extent and the structural information they provide. Traditional line
detectors based on the image gradient are extremely fast and accurate, but lack
robustness in noisy images and challenging conditions. Their learned
counterparts are more repeatable and can handle challenging images, but at the
cost of a lower accuracy and a bias towards wireframe lines. We propose to
combine traditional and learned approaches to get the best of both worlds: an
accurate and robust line detector that can be trained in the wild without
ground truth lines. Our new line segment detector, DeepLSD, processes images
with a deep network to generate a line attraction field, before converting it
to a surrogate image gradient magnitude and angle, which is then fed to any
existing handcrafted line detector. Additionally, we propose a new optimization
tool to refine line segments based on the attraction field and vanishing
points. This refinement improves the accuracy of current deep detectors by a
large margin. We demonstrate the performance of our method on low-level line
detection metrics, as well as on several downstream tasks using multiple
challenging datasets. The source code and models are available at
https://github.com/cvg/DeepLSD.
Comment: Accepted at CVPR 2023
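As a rough sketch of the conversion described above, the snippet below turns a per-pixel line attraction field (distance to the nearest line and that line's orientation) into a surrogate gradient magnitude and angle that a handcrafted detector such as LSD could consume; the exponential decay is an assumption for illustration, not DeepLSD's exact formulation.

```python
import numpy as np

def surrogate_gradient(distance, line_angle, scale=2.0):
    """distance, line_angle: (H, W) arrays predicted by the network."""
    # strong "gradient" near lines, fading with distance from them
    magnitude = np.exp(-distance / scale)
    # image gradients are perpendicular to the line direction
    angle = np.mod(line_angle + np.pi / 2.0, np.pi)
    return magnitude, angle
```

The resulting magnitude and angle maps stand in for the image-derived gradients that a handcrafted line detector would normally compute from raw pixels.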
AirLine: Efficient Learnable Line Detection with Local Edge Voting
Line detection is widely used in many robotic tasks such as scene
recognition, 3D reconstruction, and simultaneous localization and mapping
(SLAM). Compared to points, lines can provide both low-level and high-level
geometrical information for downstream tasks. In this paper, we propose a novel
edge-based line detection algorithm, AirLine, which can be applied to various
tasks. In contrast to existing learnable endpoint-based methods which are
sensitive to the geometrical condition of environments, AirLine can extract
line segments directly from edges, resulting in a better generalization ability
for unseen environments. Also, to balance efficiency and accuracy, we introduce
a region-grow algorithm and a local edge voting scheme for line parameterization.
To the best of our knowledge, AirLine is one of the first learnable edge-based
line detection methods. Our extensive experiments show that it retains
state-of-the-art-level precision while running 3-80 times faster than other
learning-based methods, which is critical for low-power robots.
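To illustrate the idea of local edge voting for line parameterization, here is a generic, hypothetical sketch in which the edge pixels of one grown region vote through a total-least-squares (PCA) fit that yields the segment's direction and extent; it is not AirLine's implementation.

```python
import numpy as np

def fit_line_from_edges(edge_pixels):
    """edge_pixels: (N, 2) array of (x, y) positions from one grown region."""
    pts = np.asarray(edge_pixels, dtype=float)
    center = pts.mean(axis=0)
    # principal direction of the scattered edge pixels = line direction
    _, _, vt = np.linalg.svd(pts - center, full_matrices=False)
    direction = vt[0]
    # project the voting pixels onto that direction to get the segment extent
    t = (pts - center) @ direction
    return center + t.min() * direction, center + t.max() * direction
```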
CornerFormer: Boosting Corner Representation for Fine-Grained Structured Reconstruction
Structured reconstruction is a non-trivial dense prediction problem that
extracts structural information (e.g., building corners and edges) from a raster
image and then reconstructs it into a 2D planar graph. Compared with common
segmentation or detection problems, it relies heavily on the capability of
leveraging holistic geometric information for structural reasoning. Current
transformer-based approaches tackle this challenging problem in a two-stage
manner, detecting corners with a first model and classifying the proposed edges
(corner pairs) with a second one. However, they separate the two stages into
different models that share only the backbone encoder. Unlike the existing
modeling strategies, we present an enhanced corner representation method: 1) it
fuses knowledge between corner detection and edge prediction by sharing features
at different granularities; 2) corner candidates are proposed in four heatmap
channels with respect to their direction. Both qualitative and quantitative
evaluations demonstrate that our proposed method can better reconstruct
fine-grained structures, such as adjacent corners and tiny edges. Consequently,
it outperforms the state-of-the-art model by +1.9%@F-1 on Corner and
+3.0%@F-1 on Edge.
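As a hypothetical illustration of proposing corner candidates from direction-specific heatmap channels, the sketch below (assuming NumPy and SciPy are available) extracts local maxima independently in each of the four channels and tags them with that channel's direction index; the threshold and the non-maximum-suppression window are illustrative, not CornerFormer's settings.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def propose_corners(heatmaps, thresh=0.3, window=3):
    """heatmaps: (4, H, W) array, one channel per corner direction."""
    candidates = []
    for direction, h in enumerate(heatmaps):
        # keep pixels that are local maxima within `window` and above `thresh`
        peaks = (h == maximum_filter(h, size=window)) & (h > thresh)
        ys, xs = np.nonzero(peaks)
        candidates += [(x, y, direction, h[y, x]) for x, y in zip(xs, ys)]
    return candidates  # list of (x, y, direction, score)
```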