Holistically-Attracted Wireframe Parsing
This paper presents a fast and parsimonious parsing method to accurately and
robustly detect a vectorized wireframe in an input image with a single forward
pass. The proposed method is end-to-end trainable, consisting of three
components: (i) line segment and junction proposal generation, (ii) line
segment and junction matching, and (iii) line segment and junction
verification. For computing line segment proposals, a novel exact dual
representation is proposed which exploits a parsimonious geometric
reparameterization for line segments and forms a holistic 4-dimensional
attraction field map for an input image. Junctions can be treated as the
"basins" in the attraction field. The proposed method is thus called
Holistically-Attracted Wireframe Parser (HAWP). In experiments, the proposed
method is tested on two benchmarks, the Wireframe dataset, and the YorkUrban
dataset. On both benchmarks, it obtains state-of-the-art performance in terms
of accuracy and efficiency. For example, on the Wireframe dataset, compared to
the previous state-of-the-art method L-CNN, it improves the challenging mean
structural average precision (msAP) by a large margin and achieves 29.5 FPS on
a single GPU. A systematic ablation study is performed to further justify the
proposed method.

Comment: Accepted by CVPR 2020
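The attraction-field idea can be illustrated with a minimal sketch: for every pixel, encode the nearest line segment by the 2D displacement to its closest point plus the signed offsets from that point to the two endpoints, giving four channels per pixel. This is a conceptual approximation only; the function name and the exact choice of four channels here are illustrative assumptions, not the paper's precise reparameterization.

```python
import numpy as np

def attraction_field(segments, h, w):
    """For each pixel of an h-by-w image, store four numbers describing the
    nearest segment: (dx, dy) to the closest point on it, and the signed
    distances from that point to the two endpoints along the segment."""
    field = np.zeros((h, w, 4), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(np.float32)   # (h, w, 2), (x, y)
    best = np.full((h, w), np.inf, dtype=np.float32)
    for (x1, y1), (x2, y2) in segments:
        a = np.array([x1, y1], dtype=np.float32)
        b = np.array([x2, y2], dtype=np.float32)
        d = b - a
        sq_len = float(np.dot(d, d))
        # projection parameter of each pixel onto the segment, clamped to [0, 1]
        t = np.clip(((pts - a) @ d) / sq_len, 0.0, 1.0)
        foot = a + t[..., None] * d            # closest point on the segment
        disp = foot - pts                      # 2D displacement to the segment
        dist = np.linalg.norm(disp, axis=-1)
        mask = dist < best                     # keep only the nearest segment
        best[mask] = dist[mask]
        length = np.sqrt(sq_len)
        feat = np.concatenate(
            [disp,
             (-t * length)[..., None],         # signed offset to endpoint a
             ((1.0 - t) * length)[..., None]], # signed offset to endpoint b
            axis=-1)
        field[mask] = feat[mask]
    return field
```

In this toy version, pixels lying on a segment get zero displacement, and junctions (shared endpoints) are where the displacement vectors of neighbouring pixels converge, loosely matching the "basins" intuition.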
Neural Wireframe Renderer: Learning Wireframe to Image Translations
In architecture and computer-aided design, wireframes (i.e., line-based
models) are widely used as basic 3D models for design evaluation and fast
design iterations. However, unlike a full design file, a wireframe model lacks
critical information, such as detailed shape, texture, and materials, needed by
a conventional renderer to produce 2D renderings of the objects or scenes. In
this paper, we bridge the information gap by generating photo-realistic
rendering of indoor scenes from wireframe models in an image translation
framework. While existing image synthesis methods can generate visually
pleasing images for common objects such as faces and birds, these methods do
not explicitly model and preserve essential structural constraints in a
wireframe model, such as junctions, parallel lines, and planar surfaces. To
this end, we propose a novel model based on a structure-appearance joint
representation learned from both images and wireframes. In our model,
structural constraints are explicitly enforced by learning a joint
representation in a shared encoder network that must support the generation of
both images and wireframes. Experiments on a wireframe-scene dataset show that
our wireframe-to-image translation model significantly outperforms the
state-of-the-art methods in both visual quality and structural integrity of
generated images.

Comment: ECCV 2020
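The shared-encoder idea can be sketched minimally: one encoder feeds two decoders, and both reconstruction losses depend on the same latent code, so training pressure from the wireframe branch forces the code to retain structural information. The linear layers, dimensions, and squared-error losses below are illustrative stand-ins for the paper's convolutional architecture, not its actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; the real model is a deep convolutional network.
D_IN, D_LATENT, D_OUT = 8, 4, 8

W_enc = rng.normal(size=(D_IN, D_LATENT))   # shared encoder
W_img = rng.normal(size=(D_LATENT, D_OUT))  # image decoder
W_wf  = rng.normal(size=(D_LATENT, D_OUT))  # wireframe decoder

def forward(x):
    """Encode once, decode twice: the latent code z is shared by both tasks."""
    z = np.tanh(x @ W_enc)        # joint structure-appearance representation
    return z @ W_img, z @ W_wf    # image branch, wireframe branch

def joint_loss(x, img_target, wf_target):
    """Both reconstruction terms backpropagate into the same encoder weights,
    which is what couples appearance generation to structural constraints."""
    img_pred, wf_pred = forward(x)
    return (np.mean((img_pred - img_target) ** 2)
            + np.mean((wf_pred - wf_target) ** 2))
```

The design point the sketch captures is that neither decoder has private access to the input: anything the image decoder needs must survive in a code that can also reproduce the wireframe.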
Applications in Monocular Computer Vision using Geometry and Learning : Map Merging, 3D Reconstruction and Detection of Geometric Primitives
As the dream of autonomous vehicles moving around in our world comes closer, the problem of robust localization and mapping is essential to solve. In this inherently structured and geometric problem we also want the agents to learn from experience in a data-driven fashion. How modern neural network models can be combined with Structure from Motion (SfM) is an interesting research question, and this thesis studies some related problems in 3D reconstruction, feature detection, SfM and map merging.

In Paper I we study how a Bayesian Neural Network (BNN) performs in Semantic Scene Completion, where the task is to predict a semantic 3D voxel grid for the field of view of a single RGBD image. We propose an extended task and evaluate the benefits of the BNN when encountering new classes at inference time. It is shown that the BNN outperforms the deterministic baseline.

Papers II-III are about detection of points, lines and planes defining a Room Layout in an RGB image. Due to the repeated textures and homogeneous colours of indoor surfaces, it is not ideal to use only point features for Structure from Motion. The idea is to complement the point features by detecting a Wireframe – a connected set of line segments – which marks the intersections of planes in the Room Layout. Paper II concerns a task for detecting a Semantic Room Wireframe and implements a neural network model utilizing a Graph Convolutional Network module. The experiments show that the method is more flexible than previous Room Layout Estimation methods and performs better than previous Wireframe Parsing methods. Paper III takes the task closer to Room Layout Estimation by detecting a connected set of semantic polygons in an RGB image. The end-to-end trainable model is a combination of a Wireframe Parsing model and a Heterogeneous Graph Neural Network. We show promising results by outperforming state-of-the-art models for Room Layout Estimation using synthetic Wireframe detections. However, the joint Wireframe and Polygon detector requires further research to compete with the state-of-the-art models.

In Paper IV we propose minimal solvers for SfM with parallel cylinders. The problem may be reduced to estimating circles in 2D, and the paper contributes theory for the two-view relative motion and two-circle relative structure problem. Fast solvers are derived, and experiments show good performance in both simulation and on real data.

Papers V-VII cover the task of map merging: given a set of individually optimized point clouds with camera poses from an SfM pipeline, how can the solutions be effectively merged without completely resolving the Structure from Motion problem? Papers V-VI introduce an effective method for merging and show its effectiveness through experiments on real and simulated data. Paper VII considers the matching problem for point clouds and proposes minimal solvers that allow for deformation of each point cloud. Experiments show that the method robustly matches point clouds with drift in the SfM solution.
UI Layout Generation with LLMs Guided by UI Grammar
The recent advances in Large Language Models (LLMs) have stimulated interest
among researchers and industry professionals, particularly in their application
to tasks concerning mobile user interfaces (UIs). This position paper
investigates the use of LLMs for UI layout generation. Central to our
exploration is the introduction of UI grammar -- a novel approach we proposed
to represent the hierarchical structure inherent in UI screens. The aim of this
approach is to guide the generative capacities of LLMs more effectively and
improve the explainability and controllability of the process. Initial
experiments conducted with GPT-4 showed the promising capability of LLMs to
produce high-quality user interfaces via in-context learning. Furthermore, our
preliminary comparative study suggested the potential of the grammar-based
approach in improving the quality of generative results in specific aspects.Comment: ICML 2023 Workshop on AI and HC