38 research outputs found
A survey on deep geometry learning: from a representation perspective
Researchers have achieved great success in dealing with 2D images using deep learning. In recent years, 3D computer vision and geometric deep learning have gained ever more attention. Many advanced techniques for 3D shapes have been proposed for different applications. Unlike 2D images, which can be uniformly represented by a regular grid of pixels, 3D shapes have various representations, such as depth images, multi-view images, voxels, point clouds, meshes, implicit surfaces, etc. The performance achieved in different applications largely depends on the representation used, and there is no unique representation that works well for all applications. Therefore, in this survey, we review recent developments in deep learning for 3D geometry from a representation perspective, summarizing the advantages and disadvantages of different representations for different applications. We also present existing datasets in these representations and further discuss future research directions.
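The survey's point that the same shape admits several representations can be illustrated by converting between two of them. The sketch below turns a point cloud into a binary voxel occupancy grid; the function name and the unit-cube normalisation are illustrative choices, not taken from any of the surveyed methods.

```python
import numpy as np

def voxelize(points, resolution=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    A minimal sketch of one representation conversion; the normalisation
    scheme (fit the cloud into the unit cube) is an assumption for the demo.
    """
    pts = np.asarray(points, dtype=np.float64)
    # Normalise the cloud into the unit cube [0, 1)^3.
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    scale = (maxs - mins).max() + 1e-9
    pts = (pts - mins) / scale
    # Map coordinates to voxel indices and mark occupied cells.
    idx = np.clip((pts * resolution).astype(int), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# The 8 corners of a unit cube occupy exactly 8 voxels at any resolution.
corners = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)])
grid = voxelize(corners, resolution=4)
```

The conversion is lossy in one direction only: the grid discards exact coordinates, which is precisely the trade-off between memory-regular and geometry-precise representations that the survey discusses.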
Holistic indoor scene understanding, modelling and reconstruction from single images.
3D indoor scene understanding in computer vision refers to perceiving the semantic and geometric information in a 3D indoor environment from partial observations (e.g. images or depth scans). Semantics in a scene generally involves conceptual knowledge such as the room layout, object categories, and their interrelationships (e.g. support relationships). These scene semantics are usually coupled with object and room geometry for 3D scene understanding, for example, the layout plan (i.e. location of walls, ceiling and floor), the shapes of in-room objects, and the camera pose of the observer.

This thesis focuses on the problem of holistic 3D scene understanding from single images to model or reconstruct the indoor geometry with enriched scene semantics. This challenging task requires computers to perform on par with the human visual system in perceiving and understanding indoor contents from colour intensities. Existing works either focus on a sub-problem (e.g. layout estimation, 3D detection or object reconstruction), or address the entire problem with independent subtasks, while this thesis aims at an integrated and unified solution for semantic scene understanding and reconstruction. In this thesis, scene semantics and geometry are regarded as intertwined and complementary: understanding each part (semantics or geometry) helps to perceive the other, which enables joint scene understanding, modelling and reconstruction.

We start with the problem of semantic scene modelling. To estimate object semantics and shapes from a single image, a feasible scene modelling pipeline is proposed. It is backboned with fully convolutional networks to learn 2D semantics and geometry, and powered by top-down shape retrieval for object modelling. After this, we build a unified and more efficient visual system for semantic scene modelling. Scene semantics are divided into relational (i.e. support relationships) and non-relational (i.e. object segmentation & geometry, room layout) knowledge. A Relation Network is proposed to estimate the support relations between objects to guide the object modelling process.

Afterwards, we focus on the problem of holistic and end-to-end scene understanding and reconstruction. Instead of modelling scenes by top-down shape retrieval, this method bridges the gap between scene understanding and object mesh reconstruction, without relying on any external CAD repositories. Camera pose, room layout, object bounding boxes and meshes are predicted end-to-end from an RGB image with a single network architecture. Finally, we extend our work to a different input modality, single-view depth scans, to explore object reconstruction performance. A skeleton-bridged approach is proposed that predicts the meso-skeleton of a shape as an intermediate representation to guide surface reconstruction, outperforming prior art in shape completion.

Overall, this thesis provides a series of novel approaches towards holistic 3D indoor scene understanding, modelling and reconstruction. It aims at automatic 3D scene perception that enables machines to understand and predict 3D contents as human vision does, which we hope will advance the boundaries of 3D vision in machine perception, robotics and artificial intelligence.
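The support relations mentioned above ("object A rests on object B") can be framed as pairwise classification over detected objects. The toy sketch below is only an illustration of that framing, not the thesis's actual Relation Network: it scores a pair of 2D boxes with two hand-designed features (vertical contact and horizontal overlap) and hand-set logistic weights, all of which are assumptions for the demo.

```python
import numpy as np

def pair_features(a, b):
    """Features for 'does object a support object b?'.

    Boxes are (xmin, ymin, xmax, ymax) in image coordinates with y growing
    downward, so a supporting surface's top edge a[1] should touch the
    supported object's bottom edge b[3]. Feature design is illustrative.
    """
    contact = abs(a[1] - b[3])  # 0 when the boxes touch vertically
    # Fraction of b's horizontal extent that lies over a.
    overlap = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    width_b = max(b[2] - b[0], 1e-9)
    return np.array([contact, overlap / width_b])

def support_score(a, b, w=np.array([-0.5, 4.0]), bias=-1.0):
    """Logistic score in (0, 1); weights are hand-set, not learned."""
    z = w @ pair_features(a, b) + bias
    return 1.0 / (1.0 + np.exp(-z))

table = (0.0, 60.0, 100.0, 100.0)  # wide box, lower in the image
cup = (40.0, 40.0, 55.0, 60.0)     # small box resting on the table's top
```

In the thesis's setting, the hand-set weights would be replaced by a network learned from annotated support relations, and the resulting scores guide which retrieved shape is placed on which surface.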
Neural Wavelet-domain Diffusion for 3D Shape Generation
This paper presents a new approach for 3D shape generation, enabling direct
generative modeling on a continuous implicit representation in wavelet domain.
Specifically, we propose a compact wavelet representation with a pair of coarse
and detail coefficient volumes to implicitly represent 3D shapes via truncated
signed distance functions and multi-scale biorthogonal wavelets, and formulate
a pair of neural networks: a generator based on the diffusion model to produce
diverse shapes in the form of coarse coefficient volumes; and a detail
predictor to further produce compatible detail coefficient volumes for
enriching the generated shapes with fine structures and details. Both
quantitative and qualitative experimental results demonstrate the superiority
of our approach in generating diverse and high-quality shapes with complex
topology and structures, clean surfaces, and fine details, exceeding the 3D
generation capabilities of the state-of-the-art models.
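The coarse/detail split the abstract describes comes from a multi-scale wavelet decomposition of the signed-distance volume. As a hedged illustration of that split, the sketch below applies a single-level 3D Haar transform, the simplest orthogonal wavelet, whereas the paper uses multi-scale biorthogonal wavelets; it separates a volume into one coarse sub-volume ('LLL') and seven detail sub-volumes while preserving total energy.

```python
import numpy as np

def haar3d(vol):
    """One-level 3D Haar analysis: one coarse + seven detail sub-volumes.

    Haar is a stand-in here for the paper's biorthogonal filters; the
    coarse/detail structure of the output is the point of the demo.
    """
    s = 1.0 / np.sqrt(2.0)
    def split(x, axis):
        even = x.take(np.arange(0, x.shape[axis], 2), axis)
        odd = x.take(np.arange(1, x.shape[axis], 2), axis)
        return (even + odd) * s, (even - odd) * s  # low-pass, high-pass
    bands = {'': vol}
    for axis in range(3):  # filter along x, then y, then z
        bands = {k + t: v
                 for k, x in bands.items()
                 for t, v in zip('LH', split(x, axis))}
    return bands  # 'LLL' is the coarse volume; the other 7 hold details

rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 8, 8))
bands = haar3d(vol)
```

In the paper's pipeline the diffusion model would generate the coarse volume and a second network would predict compatible detail coefficients; here the decomposition merely shows what those two kinds of coefficient volumes contain.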
Differentiable SAR Renderer and SAR Target Reconstruction
Forward modeling of wave scattering and radar imaging mechanisms is the key
to information extraction from synthetic aperture radar (SAR) images. Like
inverse graphics in the optical domain, an inherently integrated forward-inverse
approach would be promising for SAR advanced information retrieval and target
reconstruction. This paper presents such an attempt at inverse graphics for
SAR imagery. A differentiable SAR renderer (DSR) is developed which
reformulates the mapping and projection algorithm of SAR imaging mechanism in
the differentiable form of probability maps. First-order gradients of the
proposed DSR are then analytically derived which can be back-propagated from
rendered image/silhouette to the target geometry and scattering attributes. A
3D inverse target reconstruction algorithm from SAR images is devised. Several
simulation and reconstruction experiments are conducted, including targets with
and without background, using both synthesized data and real measured inverse
SAR (ISAR) data acquired by ground radar. Results demonstrate the efficacy of the
proposed DSR and its inverse approach.
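The key idea in the abstract is that rewriting the rendering step as probability maps makes the rendered silhouette differentiable with respect to the geometry. The sketch below is a one-dimensional toy analogue of that idea, not DSR's actual formulation: a pixel's soft occupancy is one minus the probability that no scatterer hits it, and the analytic gradient with respect to scatterer positions is checked against finite differences. The sigmoid form and the sharpness parameter are assumptions for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pixel_prob(x, center=0.0, sharp=4.0):
    """Soft occupancy of one 'pixel' at `center`.

    Each scatterer x_i hits the pixel with probability
    sigmoid(sharp * (1 - |x_i - center|)); the pixel is covered if
    any scatterer hits, hence 1 - product of miss probabilities.
    """
    p_hit = sigmoid(sharp * (1.0 - np.abs(x - center)))
    return 1.0 - np.prod(1.0 - p_hit)

def pixel_prob_grad(x, center=0.0, sharp=4.0):
    """Analytic d pixel_prob / d x_i via the product rule."""
    d = x - center
    p_hit = sigmoid(sharp * (1.0 - np.abs(d)))
    # d p_hit_i / d x_i = -sharp * sign(d_i) * p_hit_i * (1 - p_hit_i)
    dp = -sharp * np.sign(d) * p_hit * (1.0 - p_hit)
    # Product of miss probabilities over all j != i.
    rest = np.prod(1.0 - p_hit) / (1.0 - p_hit)
    return rest * dp
```

Because the occupancy is smooth in the scatterer positions, the same chain rule lets a loss on the rendered image be back-propagated to geometry, which is the mechanism DSR exploits at the scale of full SAR images.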