3D Point Cloud Denoising via Deep Neural Network based Local Surface Estimation
We present a neural-network-based architecture for 3D point cloud denoising
called neural projection denoising (NPD). In our previous work, we proposed a
two-stage denoising algorithm, which first estimates reference planes and
then projects noisy points onto them. Since the estimated reference planes
are inevitably noisy, multiple projections are applied to stabilize the
denoising performance. The NPD algorithm uses a neural network to
estimate reference planes for points in noisy point clouds. With more
accurate reference-plane estimates, we are able to achieve better denoising
performance with only a single projection. To the best of our knowledge, NPD
is the first work to denoise 3D point clouds with deep learning techniques. To
conduct the experiments, we sample 40000 point clouds from the 3D data in
ShapeNet to train a network and sample 350 point clouds from the 3D data in
ModelNet10 to test. Experimental results show that our algorithm can estimate
normal vectors of points in noisy point clouds. Compared with five
competitive methods, the proposed algorithm achieves better denoising
performance and produces much smaller variances.
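The two-stage idea described above (fit a local reference plane, then project the noisy point onto it) can be sketched with a plain PCA plane fit in place of the paper's learned estimator; the function name, the neighborhood size k, and the synthetic data are illustrative assumptions, not from the paper:

```python
import numpy as np

def denoise_point(points, idx, k=16):
    """Project point `idx` onto a local reference plane fitted to its
    k nearest neighbors via PCA (classical stand-in for the learned estimator)."""
    p = points[idx]
    # k nearest neighbors by Euclidean distance (including the point itself)
    d = np.linalg.norm(points - p, axis=1)
    nbrs = points[np.argsort(d)[:k]]
    centroid = nbrs.mean(axis=0)
    # Plane normal = eigenvector of the smallest covariance eigenvalue
    cov = np.cov((nbrs - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]
    # One-time projection of the noisy point onto the plane (centroid, normal)
    return p - np.dot(p - centroid, normal) * normal

# Noisy samples of the plane z = 0
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, (200, 3))
pts[:, 2] = rng.normal(0, 0.01, 200)
print(denoise_point(pts, 0))
```

Projecting onto this estimated plane pulls the point back toward the underlying surface; the paper's contribution is replacing the PCA fit with a network that predicts the reference plane more accurately.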
Geometric Structure Extraction and Reconstruction
Geometric structure extraction and reconstruction is a long-standing problem in research communities including computer graphics, computer vision, and machine learning. Within different communities, it can be interpreted as different subproblems such as skeleton extraction from point clouds, surface reconstruction from multi-view images, or manifold learning from high-dimensional data. All these subproblems are building blocks of many modern applications, such as scene reconstruction for AR/VR, object recognition for robotic vision, and structural analysis for big data. Despite its importance, the extraction and reconstruction of geometric structure from real-world data are ill-posed, with the main challenges lying in the incompleteness, noise, and inconsistency of the raw input data. To address these challenges, three studies are conducted in this thesis: i) a new point set representation for shape completion, ii) a structure-aware data consolidation method, and iii) a data-driven deep learning technique for multi-view consistency. In addition to theoretical contributions, the proposed algorithms significantly improve the performance of several state-of-the-art geometric structure extraction and reconstruction approaches, as validated by extensive experimental results.
Gated networks: an inventory
Gated networks are networks that contain gating connections, in which the
outputs of at least two neurons are multiplied. Initially, gated networks were
used to learn relationships between two input sources, such as pixels from two
images. More recently, they have been applied to learning activity recognition
or multi-modal representations. The aims of this paper are threefold: 1) to
explain the basic computations in gated networks to the non-expert, while
adopting a standpoint that insists on their symmetric nature. 2) to serve as a
quick reference guide to the recent literature, by providing an inventory of
applications of these networks, as well as recent extensions to the basic
architecture. 3) to suggest future research directions and applications.
Comment: Unpublished manuscript, 17 pages.
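A minimal factored gated unit can illustrate the basic computation the inventory describes, where the outputs of two projections are multiplied element-wise before being mapped to the output; the weight names and sizes here are illustrative, not from the paper:

```python
import numpy as np

def gated_layer(x, y, Wx, Wy, Wout):
    """Factored gating: project both input sources onto a shared factor
    layer, multiply the factor activations element-wise, then map the
    multiplicative interaction to the output units."""
    fx = Wx @ x              # factor activations from input source 1
    fy = Wy @ y              # factor activations from input source 2
    return Wout @ (fx * fy)  # multiplicative (gating) interaction

rng = np.random.default_rng(1)
x, y = rng.normal(size=4), rng.normal(size=4)
Wx, Wy, Wout = (rng.normal(size=(8, 4)), rng.normal(size=(8, 4)),
                rng.normal(size=(3, 8)))
out = gated_layer(x, y, Wx, Wy, Wout)
print(out.shape)
```

Because the product of factor activations is commutative, swapping the two input sources together with their weight matrices leaves the output unchanged, which reflects the symmetric standpoint the paper insists on.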
Human motion convolutional autoencoders using different rotation representations
This research applies four different techniques of animation storage (Axis
Angle, Quaternions, Rotation Matrices, and Euler Angles) in order to determine
the advantages and disadvantages of each method, by training and evaluating
convolutional autoencoders that reconstruct and denoise parsed motion data.
The designed autoencoders provide a novel insight into the comparative performance
of these animation representation methods in an analogous architecture, making them
measurable under the same conditions and thus possible to evaluate with quantitative
metrics such as Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), as well
as qualitatively, through close observation of the naturalness and real-time
performance of the decoded full output sequences.
My results show that, qualitatively, the most accurate method for this purpose is
Quaternions, followed by Rotation Matrices and Euler Angles, with Axis Angles
producing the least accurate results. These results persist in decoding and in
simple encoding-decoding. Consistent denoising results were achieved across the
representations for sequences with up to 25% added Gaussian noise.
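The four representations compared above are interconvertible; as a minimal sketch, two of the standard conversions (axis angle to quaternion, and quaternion to rotation matrix) look as follows. These are textbook formulas, not the thesis's own code:

```python
import numpy as np

def axis_angle_to_quat(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def quat_to_matrix(q):
    """3x3 rotation matrix from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

# A 90-degree rotation about z maps the x axis to the y axis
R = quat_to_matrix(axis_angle_to_quat([0, 0, 1], np.pi / 2))
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))
```

The representations differ in dimensionality (3, 4, 9, and 3 values per joint rotation, respectively) and in continuity, which is one reason autoencoder reconstruction quality can vary across them even in the same architecture.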
Unsupervised Human Action Recognition with Skeletal Graph Laplacian and Self-Supervised Viewpoints Invariance
This paper presents a novel end-to-end method for the problem of
skeleton-based unsupervised human action recognition. We propose a new
architecture with a convolutional autoencoder that uses graph Laplacian
regularization to model the skeletal geometry across the temporal dynamics of
actions. Our approach is robust to viewpoint variations by including a
self-supervised gradient reverse layer that ensures generalization across
camera views. The proposed method is validated on NTU-60 and NTU-120
large-scale datasets in which it outperforms all prior unsupervised
skeleton-based approaches on the cross-subject, cross-view, and cross-setup
protocols. Although unsupervised, our learnable representation even allows our
method to surpass a few supervised skeleton-based action recognition methods.
The code is available at:
www.github.com/IIT-PAVIS/UHAR_Skeletal_Laplacia
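The graph Laplacian regularization mentioned above can be sketched on a toy skeleton; the 5-joint topology and the random per-joint features below are illustrative assumptions, not the paper's actual skeleton graph:

```python
import numpy as np

# Hypothetical 5-joint skeleton: joint 1 is a hub connected to 0, 2, 3, 4
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
n = 5

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0       # undirected adjacency (bones)

D = np.diag(A.sum(axis=1))        # degree matrix
L = D - A                         # combinatorial graph Laplacian

# The regularizer tr(F^T L F) equals the sum of squared feature
# differences across bones, penalizing non-smooth skeletal features.
F = np.random.default_rng(2).normal(size=(n, 3))   # per-joint features
smooth = np.trace(F.T @ L @ F)
edgewise = sum(np.sum((F[i] - F[j]) ** 2) for i, j in edges)
print(np.isclose(smooth, edgewise))
```

Adding such a term to the autoencoder loss encourages latent reconstructions whose joint features vary smoothly along the skeleton, which is how the Laplacian couples skeletal geometry with the temporal dynamics of actions.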