Differentiable algorithms with data-driven parameterization in 3D vision
This thesis is concerned with designing and analyzing efficient differentiable data flow for representations in the field of 3D vision and applying it to different 3D vision tasks. To this end, the topic is approached from the perspective of differentiable algorithms, a more general variant of Deep Learning, drawing on recently emerged tools from the field of differentiable programming. Contributions are made in the subfields of Graph Neural Networks (GNNs), differentiable matrix decompositions, and implicit neural functions, which serve as important building blocks for differentiable algorithms in 3D vision. The contributions include SplineCNN, a neural network consisting of operators for continuous convolution on irregularly structured data; Local Spatial Graph Transformers, a GNN to infer local surface orientations on point clouds; and a parallel GPU solver for the eigendecomposition of large batches of symmetric matrices. For all methods, efficient forward and backward GPU implementations are provided.
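The batched interface of such an eigendecomposition solver can be sketched on the CPU with NumPy, which accepts stacked symmetric matrices in a single call. This is only an illustration of the operation the thesis parallelizes on the GPU, not the thesis's solver itself:

```python
import numpy as np

# A sketch of batched symmetric eigendecomposition: many small symmetric
# matrices decomposed in one call (the thesis's GPU kernel is not shown).
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 3, 3))
A = A + A.transpose(0, 2, 1)          # symmetrize each 3x3 matrix

# numpy.linalg.eigh handles stacked symmetric matrices directly
eigvals, eigvecs = np.linalg.eigh(A)  # shapes: (1000, 3), (1000, 3, 3)

# verify A = V diag(w) V^T for every matrix in the batch
recon = eigvecs @ (eigvals[..., None] * eigvecs.transpose(0, 2, 1))
assert np.allclose(recon, A, atol=1e-8)
```

A differentiable version additionally requires a backward pass through the decomposition, which is where the custom GPU implementation mentioned in the abstract comes in.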
Building on these concept areas, two differentiable algorithms composed of such building blocks are introduced. The first, Differentiable Iterative Surface Normal Estimation, is an iterative algorithm for surface normal estimation on unstructured point clouds. The second, Group Equivariant Capsule Networks, is a version of capsule networks grounded in group theory for unsupervised pose estimation and, more generally, for inferring disentangled representations from 2D and 3D data.
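The classical, non-iterative building block that such normal estimators refine is PCA on a local neighborhood: the normal is the eigenvector of the neighborhood covariance with the smallest eigenvalue. The sketch below shows only this baseline step; the learned, iteratively reweighted variant from the thesis is not reproduced:

```python
import numpy as np

# Minimal sketch of PCA-based surface normal estimation for one local
# point-cloud neighborhood (the classical baseline, not the thesis's
# differentiable iterative method).
def pca_normal(neighbors: np.ndarray) -> np.ndarray:
    """Normal of a local patch = eigenvector of the covariance matrix
    belonging to the smallest eigenvalue."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, 0]                     # smallest-variance direction

# noisy samples from the z = 0 plane: the normal should be close to +-z
rng = np.random.default_rng(1)
pts = rng.uniform(-1, 1, (200, 3))
pts[:, 2] = 0.01 * rng.standard_normal(200)
n = pca_normal(pts)
assert abs(n[2]) > 0.99
```

Iterative methods improve on this by reweighting the neighbors across iterations so that outliers and points across sharp edges contribute less to the covariance.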
The thesis concludes that a favorable trade-off between efficiency, quality, and interpretability can be found by combining prior geometric knowledge about algorithms and data types with the representational power of Deep Learning.
Part-Whole Relational Few-Shot 3D Point Cloud Semantic Segmentation
The author wishes to extend sincere appreciation to Professor Lin Shi for the generous provision of equipment support, which significantly aided in the successful completion of this research. Furthermore, the author expresses gratitude to Associate Professor Ning Li and Teacher Wei Guan for their invaluable academic guidance and unwavering support. Their expertise and advice played a crucial role in shaping the direction and quality of this research.
Enhanced Capsule-based Networks and Their Applications
Current deep models have achieved human-like accuracy in many computer vision tasks, sometimes even surpassing humans. However, these deep models still suffer from significant weaknesses. To name a few, it is hard to interpret how they reach decisions, and it is easy to attack them with tiny perturbations.
A capsule, usually implemented as a vector, represents an object or object part. Capsule networks and GLOM consist of classic and generalized capsules respectively, where the difference is whether the capsule is limited to representing a fixed thing. Both models are designed to parse their input into a part-whole hierarchy as humans do, where each capsule corresponds to an entity of the hierarchy. That is, the first layer finds the lowest-level vision patterns, and the following layers assemble the larger patterns till the entire object, e.g., from nostril to nose, face, and person.
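The vector form of a classic capsule is commonly paired with the "squash" nonlinearity from the original capsule networks paper (Sabour et al., 2017): the vector's direction encodes an entity's pose, and its length, compressed into the unit ball, encodes the probability that the entity is present. A minimal sketch:

```python
import numpy as np

# Sketch of a capsule nonlinearity: "squash" maps any vector into the unit
# ball while preserving its direction, so length can act as a presence
# probability (Sabour et al., 2017; not specific to any method above).
def squash(s: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    norm2 = np.sum(s * s)
    return (norm2 / (1.0 + norm2)) * s / (np.sqrt(norm2) + eps)

v = squash(np.array([3.0, 4.0]))   # input vector of length 5
assert np.linalg.norm(v) < 1.0     # squashed length = 25/26, direction kept
```

A generalized capsule in GLOM, by contrast, is an embedding vector not tied to one fixed entity type, but the part-whole parsing idea is the same.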
This design gives capsule networks and GLOM the potential to solve the above problems of current deep models, by mimicking how humans overcome them with the part-whole hierarchy. However, their current implementations do not yet fully realize this potential and require further improvements, including intrinsic interpretability, guaranteed equivariance, robustness to adversarial attacks, a more efficient routing algorithm, compatibility with other models, etc.
In this dissertation, I first briefly introduce the motivations, essential ideas, and existing implementations of capsule networks and GLOM, then focus on addressing some limitations of these implementations. The improvements are briefly summarized as follows. First, a fast non-iterative routing algorithm is proposed for capsule networks, which facilitates their application to many tasks such as image classification and segmentation. Second, a new architecture, named Twin-Islands, is proposed based on GLOM, which achieves many desired properties such as equivariance, model interpretability, and adversarial robustness. Lastly, the essential idea of capsule networks and GLOM is re-implemented in a small group ensemble block, which can also be used alongside other types of neural networks, e.g., CNNs, on various tasks such as image classification, segmentation, and retrieval.
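For context on what a non-iterative routing algorithm replaces, the original dynamic routing of capsule networks repeatedly re-weights each lower capsule's vote by its agreement with the emerging output. This is a sketch of that classic iterative baseline (Sabour et al., 2017), not the dissertation's proposed algorithm:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    norm2 = np.sum(s * s, axis=axis, keepdims=True)
    return (norm2 / (1 + norm2)) * s / (np.sqrt(norm2) + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Classic iterative routing-by-agreement (baseline, for context).
    u_hat: (n_in, n_out, dim) prediction vectors from lower capsules.
    Returns output capsules of shape (n_out, dim)."""
    b = np.zeros(u_hat.shape[:2])                             # routing logits
    v = None
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted votes
        v = squash(s)                                         # output capsules
        b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update
    return v

rng = np.random.default_rng(2)
u_hat = rng.standard_normal((8, 4, 16))
v = dynamic_routing(u_hat)
assert v.shape == (4, 16)
```

The inner loop is exactly the cost a non-iterative routing scheme removes: the proposed algorithm computes the assignment in a single pass instead of `n_iters` rounds.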
You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors
In this paper, we propose a novel local descriptor-based framework, called You Only Hypothesize Once (YOHO), for the registration of two unaligned point clouds. In contrast to most existing local descriptors, which rely on a fragile local reference frame to gain rotation invariance, the proposed descriptor achieves rotation invariance through recent techniques of group equivariant feature learning, which brings more robustness to point density variations and noise. Meanwhile, the descriptor in YOHO also has a rotation-equivariant part, which enables us to estimate the registration from just one correspondence hypothesis. This property reduces the search space of feasible transformations and thus greatly improves both the accuracy and the efficiency of YOHO. Extensive experiments show that YOHO achieves superior performance with far fewer RANSAC iterations on four widely used datasets: the 3DMatch/3DLoMatch datasets, the ETH dataset, and the WHU-TLS dataset. Accepted by ACM Multimedia (MM) 2022. Project page: https://hpwang-whu.github.io/YOHO/
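YOHO's single-hypothesis mechanism relies on its rotation-equivariant descriptor part and is specific to the paper; what correspondence-based registration pipelines share is the underlying rigid-transform solver. A sketch of that standard step, the Kabsch/Procrustes least-squares fit of a rotation and translation to point correspondences:

```python
import numpy as np

# Sketch of the standard Kabsch solver used in correspondence-based
# registration (the generic step; YOHO's one-hypothesis trick is not shown).
def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ R @ src + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

# synthetic check: recover a known rotation about z and a translation
rng = np.random.default_rng(3)
src = rng.standard_normal((50, 3))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true
R, t = kabsch(src, dst)
assert np.allclose(R, R_true, atol=1e-8)
assert np.allclose(t, t_true, atol=1e-8)
```

RANSAC wraps a solver like this around sampled correspondence subsets; reducing the number of hypotheses that must be tried, as YOHO does, directly reduces the number of such solves.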
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
Rotation invariance is an important requirement for the analysis of 3D point clouds. In this paper, we present a learnable descriptor for rotation- and reflection-invariant 3D point cloud analysis based on the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we show the compatibility of the two approaches and apply steerable neurons in an end-to-end method, both of which constitute the technical novelty. In our approach, we perform TetraTransform -- which lifts the 3D input to an equivariant 4D representation constructed by the steerable neurons -- and extract deeper rotation-equivariant features using vector neurons. This integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, inexpensively increases the number of parameters by less than 0.0007%. Taking only points as input, TetraSphere sets a new state-of-the-art performance in classifying randomly rotated real-world object scans from the hardest subset of ScanObjectNN, even when trained on data without additional rotation augmentation. Additionally, TetraSphere demonstrates the second-best performance in segmenting parts of the synthetic ShapeNet, consistently outperforming the baseline VN-DGCNN. All in all, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space.
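The O(3)-invariance property the abstract targets covers both rotations and reflections, and it can be checked numerically. The sketch below does not reproduce TetraSphere's steerable or vector-neuron layers; it only illustrates the property itself with the simplest invariant descriptor, a normalized histogram of pairwise distances, verified under a random orthogonal transform that includes a reflection:

```python
import numpy as np

# Sketch of O(3)-invariance: pairwise distances are unchanged by any
# orthogonal transform, so a distance histogram is an invariant descriptor
# (a toy stand-in for a learned invariant network like TetraSphere).
def o3_invariant_descriptor(pts, bins=16):
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    hist, _ = np.histogram(d[np.triu_indices(len(pts), k=1)],
                           bins=bins, range=(0.0, 6.0))
    return hist / hist.sum()

rng = np.random.default_rng(4)
pts = rng.standard_normal((64, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
Q[:, 0] *= -1                                     # include a reflection
desc0 = o3_invariant_descriptor(pts)
desc1 = o3_invariant_descriptor(pts @ Q.T)
assert np.allclose(desc0, desc1)
```

Learned O(3)-invariant networks replace this fixed histogram with trainable equivariant features that are made invariant only at the end, which preserves far more shape information.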