Pointwise Convolutional Neural Networks
Deep learning with 3D data such as reconstructed point clouds and CAD models
has received great research interest recently. However, the capability of
using point clouds with convolutional neural networks has so far not been fully
explored. In this paper, we present a convolutional neural network for semantic
segmentation and object recognition with 3D point clouds. At the core of our
network is pointwise convolution, a new convolution operator that can be
applied at each point of a point cloud. Our fully convolutional network design,
while being surprisingly simple to implement, can yield competitive accuracy in
both semantic segmentation and object recognition tasks.
Comment: 10 pages, 6 figures, 10 tables. Paper accepted to CVPR 2018.
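As a rough illustration of the operator described above, the sketch below bins a point's neighbours into a 3x3x3 kernel grid centred at that point, averages the features falling into each cell, and applies per-cell weights. The function name, kernel shape, and binning details are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def pointwise_conv(points, feats, weights, radius=0.1, grid=3):
    """Simplified sketch of a pointwise convolution (hypothetical API).

    points:  (N, 3) xyz coordinates
    feats:   (N, C_in) per-point input features
    weights: (grid**3, C_in, C_out) one weight matrix per kernel cell
    For every point, neighbours within `radius` are binned into a
    grid x grid x grid kernel centred at that point; features in the
    same cell are averaged and multiplied by that cell's weights.
    """
    n, _ = feats.shape
    c_out = weights.shape[-1]
    out = np.zeros((n, c_out))
    cell = 2.0 * radius / grid
    for i in range(n):
        offsets = points - points[i]                      # neighbours relative to the centre point
        mask = np.all(np.abs(offsets) < radius, axis=1)   # cubic kernel support
        idx = ((offsets[mask] + radius) // cell).astype(int).clip(0, grid - 1)
        cell_id = idx[:, 0] * grid * grid + idx[:, 1] * grid + idx[:, 2]
        for k in np.unique(cell_id):
            cell_feat = feats[mask][cell_id == k].mean(axis=0)  # average features per cell
            out[i] += cell_feat @ weights[k]
    return out

# toy usage
pts = np.random.rand(128, 3).astype(np.float32)
x = np.random.rand(128, 8).astype(np.float32)
w = np.random.randn(27, 8, 16).astype(np.float32) * 0.1
print(pointwise_conv(pts, x, w).shape)  # (128, 16)
```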
TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation
In this paper, we conduct a study on the state-of-the-art methods for
text-to-image synthesis and propose a framework to evaluate these methods. We
consider syntheses where an image contains a single object or multiple objects. Our
study outlines several issues in the current evaluation pipeline: (i) for image
quality assessment, a commonly used metric, e.g., Inception Score (IS), is
often either miscalibrated for the single-object case or misused for the
multi-object case; (ii) for text relevance and object accuracy assessment,
there is an overfitting phenomenon in the existing R-precision (RP) and
Semantic Object Accuracy (SOA) metrics, respectively; (iii) for multi-object
case, many vital factors for evaluation, e.g., object fidelity, positional
alignment, counting alignment, are largely overlooked; (iv) the ranking of the
methods based on current metrics is highly inconsistent with real images. To
overcome these issues, we propose a combined set of existing and new metrics to
systematically evaluate the methods. For existing metrics, we offer an improved
version of IS named IS* by using temperature scaling to calibrate the
confidence of the classifier used by IS; we also propose a solution to mitigate
the overfitting issues of RP and SOA. For new metrics, we develop counting
alignment, positional alignment, object-centric IS, and object-centric FID
metrics for evaluating the multi-object case. We show that benchmarking with
our bag of metrics results in a highly consistent ranking among existing
methods that is well-aligned with human evaluation. As a by-product, we create
AttnGAN++, a simple but strong baseline for the benchmark by stabilizing the
training of AttnGAN using spectral normalization. We also release our toolbox,
named TISE, to advocate fair and consistent evaluation of text-to-image models.
Comment: Accepted to ECCV 2022; the TISE toolbox is available at
https://github.com/VinAIResearch/tise-toolbox.
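The calibration step behind IS* can be illustrated with a short sketch: temperature scaling divides the classifier logits by a temperature T before the softmax, and the Inception Score is then computed from the calibrated probabilities. The function below is a minimal stand-in, not the TISE toolbox API, and the temperature is simply passed in rather than fitted on held-out data.

```python
import numpy as np

def inception_score_star(logits, temperature=1.0, eps=1e-12):
    """Sketch of IS computed from classifier logits with temperature scaling.

    logits: (N, K) raw classifier outputs for N generated images.
    temperature: calibration temperature; T = 1 recovers the standard IS.
    """
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)                       # numerical stability
    p_yx = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)    # calibrated p(y|x)
    p_y = p_yx.mean(axis=0, keepdims=True)                     # marginal p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# toy usage: overconfident logits score lower once the temperature is raised
fake_logits = np.random.randn(1000, 10) * 8.0
print(inception_score_star(fake_logits, temperature=1.0))
print(inception_score_star(fake_logits, temperature=2.0))
```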
ShellNet: Efficient Point Cloud Convolutional Neural Networks using Concentric Shells Statistics
Deep learning with 3D data has progressed significantly since the
introduction of convolutional neural networks that can handle point order
ambiguity in point cloud data. While achieving good accuracy in various scene
understanding tasks, previous methods often suffer from slow training and
complex network architectures. In this paper, we address these
problems by proposing an efficient end-to-end permutation invariant convolution
for point cloud deep learning. Our simple yet effective convolution operator
named ShellConv uses statistics from concentric spherical shells to define
representative features and resolve the point order ambiguity, allowing
traditional convolution to perform on such features. Based on ShellConv we
further build an efficient neural network named ShellNet to directly consume
the point clouds with larger receptive fields while maintaining fewer layers. We
demonstrate the efficacy of ShellNet by producing state-of-the-art results on
object classification, object part segmentation, and semantic scene
segmentation while keeping the network very fast to train.
Comment: International Conference on Computer Vision (ICCV) 2019 Oral.
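A minimal sketch of the ShellConv idea follows: for each point, its nearest neighbours are split into concentric shells by distance, an order-invariant statistic (max pooling) is taken per shell, and a convolution is applied over the ordered shells. Shapes, neighbour counts, and the pooling choice here are illustrative assumptions, not the released ShellNet code.

```python
import numpy as np

def shell_conv(points, feats, weights, n_neighbors=32, n_shells=4):
    """Illustrative sketch of a concentric-shell convolution.

    points:  (N, 3), feats: (N, C_in)
    weights: (n_shells, C_in, C_out) -- one weight matrix per shell,
             i.e. a convolution over the ordered shells, applied per point.
    """
    n = points.shape[0]
    c_out = weights.shape[-1]
    out = np.zeros((n, c_out))
    for i in range(n):
        d = np.linalg.norm(points - points[i], axis=1)
        nn = np.argsort(d)[:n_neighbors]            # nearest neighbours (incl. the point itself)
        shells = np.array_split(nn, n_shells)       # inner to outer shells of equal size
        for s, idx in enumerate(shells):
            shell_feat = feats[idx].max(axis=0)     # order-invariant statistic per shell
            out[i] += shell_feat @ weights[s]
    return out

# toy usage
pts = np.random.rand(256, 3).astype(np.float32)
x = np.random.rand(256, 8).astype(np.float32)
w = np.random.randn(4, 8, 16).astype(np.float32) * 0.1
print(shell_conv(pts, x, w).shape)  # (256, 16)
```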
ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution
Existing 3D instance segmentation methods are dominated by the bottom-up
design -- a manually tuned algorithm that groups points into clusters, followed
by a refinement network. However, because they rely on the quality of the
clusters, these methods produce unreliable results when (1) nearby objects of
the same semantic class are packed together, or (2) large objects have loosely
connected regions. To address these limitations, we introduce ISBNet, a novel
cluster-free method that represents instances as kernels and decodes instance
masks via dynamic convolution. To efficiently generate high-recall and
discriminative kernels, we propose a simple strategy named Instance-aware
Farthest Point Sampling to sample candidates and leverage the local aggregation
layer inspired by PointNet++ to encode candidate features. Moreover, we show
that predicting and leveraging the 3D axis-aligned bounding boxes in the
dynamic convolution further boosts performance. Our method sets new
state-of-the-art results on ScanNetV2 (55.9), S3DIS (60.8), and STPLS3D (49.2)
in terms of AP and retains fast inference time (237ms per scene on ScanNetV2).
The source code and trained models are available at
https://github.com/VinAIResearch/ISBNet.
Comment: Accepted to CVPR 2023.
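For context, candidate sampling in point-based instance segmentation typically starts from farthest point sampling; the sketch below shows the standard algorithm. ISBNet's Instance-aware Farthest Point Sampling builds on this by also using per-point instance cues so that candidates cover distinct objects, and that extension (as well as the box-aware dynamic convolution) is not reproduced here.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Standard farthest point sampling over a (N, 3) point cloud.

    Returns the indices of n_samples points that greedily maximise the
    distance to the set of already selected points.
    """
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=int)
    dist = np.full(n, np.inf)
    chosen[0] = 0                                            # start from an arbitrary point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)                           # distance to the nearest chosen point
        chosen[i] = int(np.argmax(dist))                     # pick the farthest remaining point
    return chosen

# toy usage
pts = np.random.rand(2048, 3)
print(farthest_point_sampling(pts, 256).shape)  # (256,)
```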
GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes as Pseudo Labelers
Instance segmentation on 3D point clouds (3DIS) is a longstanding challenge
in computer vision, where state-of-the-art methods are mainly based on full
supervision. As annotating ground truth dense instance masks is tedious and
expensive, solving 3DIS with weak supervision has become more practical. In
this paper, we propose GaPro, a new instance segmentation method for 3D point clouds
using axis-aligned 3D bounding box supervision. Our two-step approach involves
generating pseudo labels from box annotations and training a 3DIS network with
the resulting labels. Additionally, we employ a self-training strategy to
further improve the performance of our method. We devise an effective Gaussian
Process to generate pseudo instance masks from the bounding boxes and resolve
ambiguities when they overlap, resulting in pseudo instance masks with their
uncertainty values. Our experiments show that GaPro outperforms previous weakly
supervised 3D instance segmentation methods and has competitive performance
compared to state-of-the-art fully supervised ones. Furthermore, we demonstrate
the robustness of our approach, where we can adapt various state-of-the-art
fully supervised methods to the weak supervision task by using our pseudo
labels for training. The source code and trained models are available at
https://github.com/VinAIResearch/GaPro.
Comment: Accepted to ICCV 2023.
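The deterministic part of turning box annotations into pseudo labels can be sketched as follows: points covered by exactly one box inherit that instance id, points covered by no box are background, and points covered by several boxes are ambiguous. GaPro resolves the ambiguous points with a Gaussian Process and keeps per-point uncertainty; that step, and the exact label encoding, are not reproduced in this illustrative sketch.

```python
import numpy as np

def box_pseudo_labels(points, boxes):
    """Sketch of box-to-mask pseudo labelling (deterministic part only).

    points: (N, 3); boxes: (M, 6) as (xmin, ymin, zmin, xmax, ymax, zmax).
    Returns per-point labels: box index, -1 for background, -2 for ambiguous.
    """
    lo, hi = boxes[:, None, :3], boxes[:, None, 3:]
    inside = np.all((points[None] >= lo) & (points[None] <= hi), axis=-1)  # (M, N)
    count = inside.sum(axis=0)
    labels = np.full(points.shape[0], -1, dtype=int)     # background by default
    unique = count == 1
    labels[unique] = inside[:, unique].argmax(axis=0)    # the single covering box
    labels[count > 1] = -2                               # ambiguous: left to the GP
    return labels

# toy usage: two overlapping boxes over random points
pts = np.random.rand(1000, 3)
bxs = np.array([[0.0, 0.0, 0.0, 0.6, 0.6, 0.6],
                [0.4, 0.4, 0.4, 1.0, 1.0, 1.0]])
print(np.unique(box_pseudo_labels(pts, bxs)))  # e.g. [-2 -1  0  1]
```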
Locally Stylized Neural Radiance Fields
In recent years, there has been increasing interest in applying stylization
to 3D scenes from a reference style image, in particular to neural radiance
fields (NeRF). While performing stylization directly on NeRF guarantees
appearance consistency over arbitrary novel views, it is a challenging problem
to guide the transfer of patterns from the style image onto different parts of
the NeRF scene. In this work, we propose a stylization framework for NeRF based
on local style transfer. In particular, we use a hash-grid encoding to learn
the embedding of the appearance and geometry components, and show that the
mapping defined by the hash table allows us to control the stylization to a
certain extent. Stylization is then achieved by optimizing the appearance
branch while keeping the geometry branch fixed. To support local style
transfer, we propose a new loss function that utilizes a segmentation network
and bipartite matching to establish region correspondences between the style
image and the content images obtained from volume rendering. Our experiments
show that our method yields plausible stylization results with novel view
synthesis while having flexible controllability via manipulating and
customizing the region correspondences.
Comment: ICCV 2023.
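The region-correspondence step can be illustrated with a small sketch: mean features of segmented regions in a rendered content view are matched to regions of the style image by solving a bipartite assignment over a cosine-distance cost. The feature definition and cost are assumptions for illustration; the paper's actual loss and segmentation network are not reproduced.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_regions(content_feats, style_feats):
    """Sketch of region correspondence via bipartite matching.

    content_feats: (Nc, D) mean features of segmented regions in a rendered view
    style_feats:   (Ns, D) mean features of segmented regions in the style image
    Returns a list of (content_region, style_region) index pairs.
    """
    c = content_feats / np.linalg.norm(content_feats, axis=1, keepdims=True)
    s = style_feats / np.linalg.norm(style_feats, axis=1, keepdims=True)
    cost = 1.0 - c @ s.T                        # cosine distance between regions
    rows, cols = linear_sum_assignment(cost)    # optimal one-to-one assignment
    return list(zip(rows.tolist(), cols.tolist()))

# toy usage: 5 content regions matched to 5 style regions
print(match_regions(np.random.rand(5, 64), np.random.rand(5, 64)))
```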