86 research outputs found
Pointwise Convolutional Neural Networks
Deep learning with 3D data such as reconstructed point clouds and CAD models
has received great research interest recently. However, the capability of
using point clouds with convolutional neural networks has so far not been fully
explored. In this paper, we present a convolutional neural network for semantic
segmentation and object recognition with 3D point clouds. At the core of our
network is pointwise convolution, a new convolution operator that can be
applied at each point of a point cloud. Our fully convolutional network design,
while being surprisingly simple to implement, can yield competitive accuracy in
both semantic segmentation and object recognition tasks. Comment: 10 pages, 6 figures, 10 tables. Paper accepted to CVPR 2018.
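A minimal sketch of how a pointwise convolution over an unordered point cloud might look, assuming neighbours within a fixed radius are binned into the eight octants of each point's local frame with one weight matrix per bin (the binning scheme and all names here are illustrative, not the paper's exact formulation):

```python
import numpy as np

def pointwise_conv(points, feats, weights, radius=0.1):
    """Apply an illustrative pointwise convolution at every point.

    points:  (N, 3) xyz coordinates
    feats:   (N, C_in) per-point input features
    weights: (8, C_in, C_out) one weight matrix per octant bin
    """
    n, c_out = points.shape[0], weights.shape[2]
    out = np.zeros((n, c_out))
    for i in range(n):
        rel = points - points[i]                      # displacements to all points
        mask = np.linalg.norm(rel, axis=1) < radius   # neighbourhood membership
        bins = ((rel[mask] > 0) * [1, 2, 4]).sum(1)   # octant index in 0..7
        for b in range(8):
            sel = feats[mask][bins == b]
            if len(sel):
                out[i] += sel.mean(0) @ weights[b]    # average per bin, then project
    return out
```

Because the operator is defined per point, stacking such layers keeps the output aligned with the input points, which is what makes a fully convolutional design for segmentation straightforward.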
ShellNet: Efficient Point Cloud Convolutional Neural Networks using Concentric Shells Statistics
Deep learning with 3D data has progressed significantly since the
introduction of convolutional neural networks that can handle point order
ambiguity in point cloud data. While being able to achieve good accuracies in
various scene understanding tasks, previous methods often have low training
speed and complex network architectures. In this paper, we address these
problems by proposing an efficient end-to-end permutation invariant convolution
for point cloud deep learning. Our simple yet effective convolution operator
named ShellConv uses statistics from concentric spherical shells to define
representative features and resolve the point order ambiguity, allowing
traditional convolution to perform on such features. Based on ShellConv, we
further build an efficient neural network named ShellNet to directly consume
point clouds with larger receptive fields while using fewer layers. We
demonstrate the efficacy of ShellNet by producing state-of-the-art results on
object classification, object part segmentation, and semantic scene
segmentation while keeping the network very fast to train. Comment: International Conference on Computer Vision (ICCV) 2019, Oral.
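A rough sketch of a ShellConv-style operator for a single output point, assuming max-pooling as the per-shell statistic and a learned weight slice per shell (names, shapes, and the pooling choice are assumptions for illustration):

```python
import numpy as np

def shell_conv(neighbors, feats, center, shell_edges, kernel):
    """Summarise neighbours by concentric shells, then convolve across shells.

    neighbors:   (N, 3) neighbour coordinates
    feats:       (N, C_in) neighbour features
    center:      (3,) query point
    shell_edges: (S + 1,) increasing shell boundary radii
    kernel:      (S, C_in, C_out) one weight slice per shell
    """
    dist = np.linalg.norm(neighbors - center, axis=1)
    shell_idx = np.digitize(dist, shell_edges) - 1     # which shell each neighbour falls in
    out = np.zeros(kernel.shape[2])
    for s in range(kernel.shape[0]):
        sel = feats[shell_idx == s]
        if len(sel):
            out += sel.max(axis=0) @ kernel[s]         # permutation-invariant statistic, then project
    return out
```

Because the shells are ordered by radius, the per-shell statistic gives a fixed-length, order-independent description of the local neighbourhood that an ordinary convolution (here the per-shell weight slices) can consume.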
Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
The integration of neural rendering and the SLAM system recently showed
promising results in joint localization and photorealistic view reconstruction.
However, existing methods, fully relying on implicit representations, are so
resource-hungry that they cannot run on portable devices, which deviates from
the original intention of SLAM. In this paper, we present Photo-SLAM, a novel
SLAM framework with a hyper primitives map. Specifically, we simultaneously
exploit explicit geometric features for localization and learn implicit
photometric features to represent the texture information of the observed
environment. In addition to actively densifying hyper primitives based on
geometric features, we further introduce a Gaussian-Pyramid-based training
method to progressively learn multi-level features, enhancing photorealistic
mapping performance. Extensive experiments on monocular, stereo, and RGB-D
datasets show that our proposed system, Photo-SLAM, significantly outperforms
current state-of-the-art SLAM systems for online photorealistic mapping; for
example, PSNR is 30% higher and rendering speed is hundreds of times faster on
the Replica dataset. Moreover, Photo-SLAM can run at real-time speed on an
embedded platform such as the Jetson AGX Orin, showing its potential for
robotics applications. Comment: CVPR 2024. Code: https://github.com/HuajianUP/Photo-SLAM - Project
Page: https://huajianup.github.io/research/Photo-SLAM
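A minimal sketch of what a Gaussian-pyramid-based coarse-to-fine training schedule could look like, assuming the map is first supervised with downsampled views and gradually with finer ones (the level-switching rule and names are assumptions, not the released implementation):

```python
import cv2  # pyrDown performs Gaussian blur + 2x downsampling

def gaussian_pyramid(image, levels):
    """Level 0 is the full-resolution image; each further level halves the resolution."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid

def coarse_to_fine_targets(image, total_iters, levels=3):
    """Yield (iteration, level, target image), moving from the coarsest to the finest level."""
    pyramid = gaussian_pyramid(image, levels)
    for it in range(total_iters):
        level = levels - 1 - min(levels - 1, it * levels // total_iters)
        yield it, level, pyramid[level]   # caller renders at this scale and computes the loss
```

The intuition is that early low-resolution supervision settles the coarse appearance of the hyper primitives cheaply, and later full-resolution targets refine fine texture details.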
Locally Stylized Neural Radiance Fields
In recent years, there has been increasing interest in applying stylization
to 3D scenes from a reference style image, in particular to neural radiance
fields (NeRF). While performing stylization directly on NeRF guarantees
appearance consistency over arbitrary novel views, it is a challenging problem
to guide the transfer of patterns from the style image onto different parts of
the NeRF scene. In this work, we propose a stylization framework for NeRF based
on local style transfer. In particular, we use a hash-grid encoding to learn
the embedding of the appearance and geometry components, and show that the
mapping defined by the hash table allows us to control the stylization to a
certain extent. Stylization is then achieved by optimizing the appearance
branch while keeping the geometry branch fixed. To support local style
transfer, we propose a new loss function that utilizes a segmentation network
and bipartite matching to establish region correspondences between the style
image and the content images obtained from volume rendering. Our experiments
show that our method yields plausible stylization results with novel view
synthesis while offering flexible control through manipulating and
customizing the region correspondences. Comment: ICCV 2023.
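A compact sketch of how region correspondences via bipartite matching could be set up, assuming one descriptor per segmented region (e.g. a mean feature) on both sides; the segmentation and feature extraction are taken as given, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_regions(content_descriptors, style_descriptors):
    """Assign each content region a style region by minimum-cost bipartite matching.

    content_descriptors: (Nc, D) one descriptor per region in the rendered views
    style_descriptors:   (Ns, D) one descriptor per region in the style image
    """
    cost = np.linalg.norm(
        content_descriptors[:, None] - style_descriptors[None, :], axis=-1
    )
    rows, cols = linear_sum_assignment(cost)          # Hungarian-style assignment
    return dict(zip(rows.tolist(), cols.tolist()))    # content region -> style region
```

The resulting mapping is what a per-region style loss would consume; editing this dictionary by hand is one way to expose the kind of user control over stylization the abstract mentions.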
360Roam: Real-Time Indoor Roaming Using Geometry-Aware 360° Radiance Fields
Virtual tours based on sparse 360° images are widely used, but they hinder
smooth and immersive roaming experiences. The emergence of Neural Radiance
Fields (NeRF) has showcased significant progress in synthesizing novel views,
unlocking the potential for immersive scene exploration. Nevertheless, previous
NeRF works primarily focused on object-centric scenarios, resulting in
noticeable performance degradation when applied to outward-facing and
large-scale scenes due to limitations in scene parameterization. To achieve
seamless and real-time indoor roaming, we propose a novel approach using
geometry-aware radiance fields with adaptively assigned local radiance fields.
Initially, we employ multiple 360° images of an indoor scene to
progressively reconstruct explicit geometry in the form of a probabilistic
occupancy map, derived from a global omnidirectional radiance field.
Subsequently, we assign local radiance fields through an adaptive
divide-and-conquer strategy based on the recovered geometry. By incorporating
geometry-aware sampling and decomposition of the global radiance field, our
system effectively utilizes positional encoding and compact neural networks to
enhance rendering quality and speed. Additionally, the extracted floorplan of
the scene aids in providing visual guidance, contributing to a realistic
roaming experience. To demonstrate the effectiveness of our system, we curated
a diverse dataset of 360° images encompassing various real-life scenes,
on which we conducted extensive experiments. Quantitative and qualitative
comparisons against baseline approaches demonstrate the superior performance of
our system in large-scale indoor scene roaming.
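A simplified sketch of an adaptive divide-and-conquer assignment of local radiance fields from a probabilistic occupancy grid; the split rule, thresholds, and names are assumptions for illustration only:

```python
import numpy as np

def assign_local_fields(occupancy, max_voxels=20000, prob_thresh=0.5):
    """Split occupied space until each region is small enough for one local field.

    occupancy: (X, Y, Z) probabilistic occupancy grid with values in [0, 1]
    Returns a list of (lo, hi) index bounds, one per local radiance field.
    """
    occ = occupancy > prob_thresh
    regions = [(np.zeros(3, dtype=int), np.array(occ.shape))]
    leaves = []
    while regions:
        lo, hi = regions.pop()
        sub = occ[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        axis = int(np.argmax(hi - lo))                 # split along the longest axis
        mid = (lo[axis] + hi[axis]) // 2
        if sub.sum() <= max_voxels or mid == lo[axis]:
            if sub.any():
                leaves.append((lo, hi))                # one local radiance field per occupied leaf
            continue
        hi_left, lo_right = hi.copy(), lo.copy()
        hi_left[axis], lo_right[axis] = mid, mid
        regions += [(lo, hi_left), (lo_right, hi)]
    return leaves
```

Each returned region would get its own compact network, and a ray is routed to whichever local fields its samples fall in, which is what keeps per-field capacity small while still covering a large indoor scene.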
- …