29 research outputs found

    Differentiable algorithms with data-driven parameterization in 3D vision

    This thesis is concerned with designing and analyzing efficient differentiable data flow for representations in 3D vision and with applying it to different 3D vision tasks. The topic is approached from the perspective of differentiable algorithms, a more general variant of deep learning that uses recently emerged tools from differentiable programming. Contributions are made in the subfields of Graph Neural Networks (GNNs), differentiable matrix decompositions, and implicit neural functions, which serve as important building blocks for differentiable algorithms in 3D vision. They include SplineCNN, a neural network consisting of operators for continuous convolution on irregularly structured data; Local Spatial Graph Transformers, a GNN that infers local surface orientations on point clouds; and a parallel GPU solver for the eigendecomposition of large batches of symmetric matrices. For all methods, efficient forward and backward GPU implementations are provided. Building on these blocks, two differentiable algorithms are introduced. The first, Differentiable Iterative Surface Normal Estimation, is an iterative algorithm for surface normal estimation on unstructured point clouds. The second, Group Equivariant Capsule Networks, is a version of capsule networks grounded in group theory for unsupervised pose estimation and, more generally, for inferring disentangled representations from 2D and 3D data. The thesis concludes that a favorable trade-off between efficiency, quality, and interpretability can be found by combining prior geometric knowledge about algorithms and data types with the representational power of deep learning.
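    Differentiating through a batched symmetric eigendecomposition, as the thesis's GPU solver does, requires an analytic backward pass. The following minimal NumPy sketch (our illustration, not the thesis's CUDA implementation; the helper name `eigh_backward` is ours) shows the standard gradient formula such a solver applies to each matrix:

```python
import numpy as np

def eigh_backward(A, grad_evals, grad_evecs=None):
    """Backward pass for the symmetric eigendecomposition A = U diag(w) U^T.

    Standard analytic gradient used when differentiating through eigh;
    a batched GPU solver would evaluate this for many matrices in parallel.
    Assumes distinct eigenvalues when grad_evecs is given.
    """
    w, U = np.linalg.eigh(A)
    # Eigenvalue contribution: dL/dA = U diag(dL/dw) U^T.
    inner = np.diag(grad_evals).astype(float)
    if grad_evecs is not None:
        # Eigenvector contribution via F_ij = 1 / (w_j - w_i), zero diagonal.
        diff = w[None, :] - w[:, None]
        F = np.where(np.eye(len(w), dtype=bool), 0.0,
                     1.0 / np.where(diff == 0, 1.0, diff))
        inner = inner + F * (U.T @ grad_evecs)
    dA = U @ inner @ U.T
    return 0.5 * (dA + dA.T)  # project the gradient onto symmetric matrices
```

    As a sanity check of the formula, a loss L = sum(w**2) has eigenvalue gradient 2w, and the resulting matrix gradient is U diag(2w) U^T = 2A.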

    Part-Whole Relational Few-Shot 3D Point Cloud Semantic Segmentation

    The author wishes to extend sincere appreciation to Professor Lin Shi for the generous provision of equipment support, which significantly aided the successful completion of this research. The author also expresses gratitude to Associate Professor Ning Li and Teacher Wei Guan for their invaluable academic guidance and unwavering support; their expertise and advice played a crucial role in shaping the direction and quality of this research. Peer reviewed.

    Enhanced Capsule-based Networks and Their Applications

    Current deep models have achieved human-like accuracy in many computer vision tasks, sometimes even surpassing humans. However, these deep models still suffer from significant weaknesses: to name a few, it is hard to interpret how they reach decisions, and it is easy to attack them with tiny perturbations. A capsule, usually implemented as a vector, represents an object or object part. Capsule networks and GLOM consist of classic and generalized capsules, respectively, the difference being whether a capsule is limited to representing one fixed thing. Both models are designed to parse their input into a part-whole hierarchy as humans do, where each capsule corresponds to an entity of the hierarchy: the first layer finds the lowest-level vision patterns, and the following layers assemble ever larger patterns up to the entire object, e.g., from nostril to nose, face, and person. This design gives capsule networks and GLOM the potential to solve the above problems of current deep models by mimicking how humans overcome them with the part-whole hierarchy. However, their current implementations do not fully realize this potential and require further improvements, including intrinsic interpretability, guaranteed equivariance, robustness to adversarial attacks, a more efficient routing algorithm, and compatibility with other models. In this dissertation, I first briefly introduce the motivations, essential ideas, and existing implementations of capsule networks and GLOM, then focus on addressing some limitations of these implementations. The improvements are summarized as follows. First, a fast non-iterative routing algorithm is proposed for capsule networks, which facilitates their application in many tasks such as image classification and segmentation. Second, a new architecture, named Twin-Islands, is proposed based on GLOM, which achieves many desired properties such as equivariance, model interpretability, and adversarial robustness. Lastly, the essential idea of capsule networks and GLOM is re-implemented in a small group ensemble block, which can also be used alongside other types of neural networks, e.g., CNNs, on various tasks such as image classification, segmentation, and retrieval.
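    To make the notion of "non-iterative routing" concrete, the following is an illustrative one-pass scheme of our own (not the dissertation's algorithm): lower-level votes are weighted by their agreement with the consensus in a single step, instead of refining coupling coefficients over several dynamic-routing iterations:

```python
import numpy as np

def one_shot_routing(votes):
    """Illustrative non-iterative capsule routing for one higher-level capsule.

    votes: (n_in, dim) array, each row a lower-level capsule's prediction.
    Weights each vote by its cosine agreement with the mean vote, then
    squashes the result so its length lies in [0, 1) and encodes existence.
    """
    mean = votes.mean(axis=0)
    # Cosine agreement between each vote and the consensus direction.
    norms = np.linalg.norm(votes, axis=1) * np.linalg.norm(mean) + 1e-9
    agreement = (votes @ mean) / norms
    weights = np.exp(agreement) / np.exp(agreement).sum()  # softmax
    out = weights @ votes
    n = np.linalg.norm(out)
    return (n**2 / (1.0 + n**2)) * out / (n + 1e-9)  # capsule "squash"
```

    A single pass like this replaces the per-layer loop of dynamic routing, which is the source of the speedup such algorithms target.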

    You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors

    In this paper, we propose a novel local descriptor-based framework, called You Only Hypothesize Once (YOHO), for the registration of two unaligned point clouds. In contrast to most existing local descriptors, which rely on a fragile local reference frame to gain rotation invariance, the proposed descriptor achieves rotation invariance through recent techniques of group equivariant feature learning, which brings more robustness to point density and noise. Meanwhile, the descriptor in YOHO also has a rotation-equivariant part, which enables us to estimate the registration from just one correspondence hypothesis. This property reduces the search space of feasible transformations, thus greatly improving both the accuracy and the efficiency of YOHO. Extensive experiments show that YOHO achieves superior performance with far fewer RANSAC iterations on four widely used datasets: the 3DMatch/3DLoMatch datasets, the ETH dataset, and the WHU-TLS dataset. More details are shown on our project page: https://hpwang-whu.github.io/YOHO/.
    Comment: Accepted by ACM Multimedia (MM) 2022. Project page: https://hpwang-whu.github.io/YOHO
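    The one-hypothesis idea can be sketched as follows: once the rotation-equivariant part of the descriptor yields a rotation estimate R for a matched pair, a single point correspondence already determines the translation, since q = R p + t implies t = q - R p. Each correspondence thus proposes a complete rigid transform that can be verified directly (a minimal NumPy illustration of this geometry, not YOHO's code; function names are ours):

```python
import numpy as np

def one_hypothesis_transform(p, q, R):
    """Given a rotation estimate R (e.g., from rotation-equivariant
    descriptors) and one correspondence p -> q, recover the full rigid
    transform: q = R p + t  =>  t = q - R p."""
    t = q - R @ p
    return R, t

def count_inliers(src, dst, R, t, tau=0.05):
    """Verify a hypothesis by counting correspondences it explains
    within distance tau, as a RANSAC-style scorer would."""
    residual = np.linalg.norm((src @ R.T + t) - dst, axis=1)
    return int((residual < tau).sum())
```

    Because each correspondence yields one full hypothesis, the RANSAC loop only has to score candidates rather than sample minimal sets of three correspondences, which is where the reduction in iterations comes from.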

    TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis

    Rotation invariance is an important requirement for the analysis of 3D point clouds. In this paper, we present a learnable descriptor for rotation- and reflection-invariant 3D point cloud analysis based on the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we show the compatibility of the two approaches and apply steerable neurons in an end-to-end method, both of which constitute the technical novelty. In our approach, we perform TetraTransform, which lifts the 3D input to an equivariant 4D representation constructed by the steerable neurons, and extract deeper rotation-equivariant features using vector neurons. This integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, inexpensively increases the number of parameters by less than 0.0007%. Taking only points as input, TetraSphere sets a new state of the art in classifying randomly rotated real-world object scans from the hardest subset of ScanObjectNN, even when trained on data without additional rotation augmentation. Additionally, TetraSphere achieves the second-best performance in segmenting parts of the synthetic ShapeNet, consistently outperforming the baseline VN-DGCNN. All in all, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space.
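    The step from equivariant to invariant features can be illustrated with a standard construction used in the vector-neuron literature (our sketch, not TetraSphere's exact layer): if each row of a vector-list feature V transforms as v -> v R^T for R in O(3), then the Gram matrix V V^T is invariant, since V R^T (V R^T)^T = V R^T R V^T = V V^T for any orthogonal R, rotation or reflection alike:

```python
import numpy as np

def invariant_features(V):
    """O(3)-invariant readout from an equivariant vector-list feature.

    V: (k, 3) array whose rows transform as v -> v R^T under R in O(3).
    The Gram matrix of pairwise inner products is unchanged by any
    orthogonal R, so it is invariant to rotations and reflections.
    """
    return V @ V.T
```

    This is why stacking equivariant layers (steerable neurons, vector neurons) and reading out inner products yields a descriptor that needs no rotation augmentation at training time.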