5 research outputs found
SMA-Net: Deep learning-based identification and fitting of CAD models from point clouds
Identification and fitting is an important task in reverse engineering and virtual/augmented reality. Compared to
traditional approaches, carrying out such tasks with deep learning-based methods offers much untapped potential.
This paper presents SMA-Net (Spatial Merge Attention Network), a novel deep learning-based, end-to-end, bottom-up
architecture specifically focused on fast identification and fitting of CAD models from point clouds. The network is
composed of three parts whose strengths are clearly highlighted: a voxel-based multi-resolution feature extractor,
a spatial merge attention mechanism and a multi-task head.
multi-task head. It is trained with both virtually-generated point clouds and as-scanned ones created from multiple instances
of CAD models, themselves obtained with randomly generated parameter values. Using this data generation pipeline, the
proposed approach is validated on two different data sets that have been made publicly available: a robot data set
for Industry 4.0 applications, and a furniture data set for virtual/augmented reality. Experiments show that this
reconstruction strategy achieves compelling and accurate results at very high speed, and that it is very robust on
real data obtained, for instance, with a laser scanner or a Kinect.
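The voxel-based multi-resolution feature extractor described above presumably starts by converting the input point cloud into occupancy grids at several resolutions. A minimal sketch of that first step, under the assumption of simple binary occupancy voxelization (the function name and grid sizes are illustrative, not taken from the paper):

```python
import numpy as np

def voxelize(points, resolution):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    `resolution` is the number of voxels per axis; the cloud is first
    normalised into the unit cube so the grid covers its bounding box.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # avoid divide-by-zero on flat axes
    idx = ((points - lo) / scale * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

# Multi-resolution input: the same cloud voxelized at several grid sizes.
cloud = np.random.default_rng(0).random((1000, 3))
pyramid = [voxelize(cloud, r) for r in (8, 16, 32)]
```

In the paper's pipeline the grids would then be fed to the learned feature extractor; the sketch only shows the data preparation.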
On the sample consensus robust estimation paradigm: comprehensive survey and novel algorithms with applications.
Master of Science in Statistics and Computer Science, University of KwaZulu-Natal, Durban, 2016.
This study begins with a comprehensive survey of existing variants of the Random Sample Consensus (RANSAC) algorithm; five new ones are then contributed. RANSAC, arguably the most popular robust estimation algorithm in computer vision, has limitations in accuracy, efficiency and repeatability. Research into techniques for overcoming these drawbacks has been active for about two decades. In the last one and a half decades, nearly every year has seen at least one variant published: more than ten in the last two years alone. However, many existing variants compromise two attractive properties of the original RANSAC: simplicity and generality. Some introduce new operations, resulting in loss of simplicity, while many of those that do not introduce new operations require problem-specific priors. In this way they trade off generality and introduce some complexity, as well as dependence on other steps of an application's workflow. Noting that these observations may explain the persisting trend of finding only the older, simpler variants in 'mainstream' computer vision software libraries, this work adopts an approach that preserves the two mentioned properties. Modification of the original algorithm is restricted to search strategy replacement alone, since many of RANSAC's drawbacks are consequences of the search strategy it adopts. A second constraint, serving to preserve generality, is that this 'ideal' strategy must require no problem-specific priors. Such a strategy is developed and reported in this dissertation. Another limitation, not yet overcome in the literature but successfully addressed in this study, is the inherent variability of RANSAC.
A few theoretical discoveries are presented, providing insights into the generic robust estimation problem. Notably, a theorem proposed as an original contribution of this research reveals insights that are foundational to the newly proposed algorithms. Experiments on both generic and computer-vision-specific data show that all proposed algorithms are generally more accurate and more consistent than RANSAC. Moreover, they are simpler in the sense that they do not require some of RANSAC's input parameters. Interestingly, although non-exhaustive in search like typical RANSAC-like algorithms, three of these new algorithms exhibit absolute non-randomness, a property not claimed by any existing variant. One of the proposed algorithms is fully automatic, eliminating all requirements for user-supplied input parameters. Two of the proposed algorithms are implemented as contributed alternatives to the homography estimation function provided in MATLAB's Computer Vision Toolbox, after being shown to improve on the performance of M-estimator Sample Consensus (MSAC). MSAC has been the choice in all releases of the toolbox, including the latest, 2015b. While this research is motivated by computer vision applications, the proposed algorithms, being generic, can be applied to any model-fitting problem in other scientific fields.
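For readers unfamiliar with the baseline being improved upon, the original RANSAC loop can be sketched for 2D line fitting as follows. This is a generic illustration of the hypothesize-and-verify search strategy the dissertation replaces, not one of the proposed algorithms:

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.1, rng=None):
    """Fit a 2D line y = a*x + b by random sample consensus.

    Repeatedly samples a minimal set (2 points), fits a candidate line,
    and keeps the model with the largest inlier count.
    """
    rng = np.random.default_rng(rng)
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:  # degenerate minimal sample (vertical line): skip
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int((residuals < threshold).sum())
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Synthetic data on y = 2x + 1, with 5 gross outliers.
x = np.linspace(0.0, 1.0, 50)
pts = np.column_stack([x, 2 * x + 1])
pts[:5, 1] += 10.0
(a, b), n_in = ransac_line(pts, threshold=0.05, rng=1)
```

Note the user-supplied iteration count and inlier threshold, and the run-to-run randomness of the sampling: exactly the kinds of inputs and variability the dissertation's variants aim to remove.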
Multiple View Texture Mapping: A Rendering Approach Designed for Driving Simulation
Simulation provides a safe and controlled environment ideal for human
testing [49, 142, 120]. Simulation of real environments has reached
new heights in terms of photo-realism; often, a team of professional
graphics artists would have to be hired to compete with modern commercial
simulators. Meanwhile, machine vision methods are currently
being developed that attempt to automatically produce geometrically
consistent and photo-realistic 3D models of real scenes [189, 139, 115,
19, 140, 111, 132], often requiring only a set of images of that scene.
A road engineer wishing to simulate the environment of a real road
for driving experiments could potentially use these tools.
This thesis develops a driving simulator that uses machine vision
methods to reconstruct a real road automatically. A computer graphics
method called projective texture mapping is applied to enhance
the photo-realism of the 3D models [144, 43]. This essentially creates
a virtual projector in the 3D environment that automatically assigns image
coordinates to a 3D model. These principles are demonstrated
using custom shaders developed for an OpenGL rendering pipeline.
Projective texture mapping presents several challenges, including
reverse projection and projection onto surfaces not immediately
in front of the projector [53]. A significant challenge was
the removal of dynamic foreground objects: 3D reconstruction systems
create 3D models from the static objects captured in images, and
dynamic objects are rarely reconstructed, so projectively texture
mapping images that include such dynamic objects can result in visual
artefacts. A workflow is developed to resolve this, resulting in videos
and 3D reconstructions of streets with no moving vehicles in the scene.
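The core of projective texture mapping, as described above, is treating a camera image as if it were cast by a projector: each 3D point is transformed into the projector's clip space and the result is remapped to texture coordinates. A minimal CPU-side sketch of that transform, including the usual rejection of reverse projection (the projector matrix below is a toy example, not taken from the thesis):

```python
import numpy as np

def project_to_texture(points, proj_view):
    """Map (N, 3) world points to [0, 1]^2 texture coordinates via a
    virtual projector, mimicking projective texture mapping.

    `proj_view` is the projector's combined 4x4 projection * view matrix.
    Points behind the projector (clip-space w <= 0) are flagged invalid,
    which is how reverse projection is typically rejected in a shader.
    """
    n = len(points)
    homo = np.hstack([points, np.ones((n, 1))])    # to homogeneous coords
    clip = homo @ proj_view.T                      # projector clip space
    w = clip[:, 3]
    valid = w > 0                                  # reject reverse projection
    ndc = np.zeros((n, 2))
    ndc[valid] = clip[valid, :2] / w[valid, None]  # perspective divide
    uv = ndc * 0.5 + 0.5                           # NDC [-1, 1] -> tex [0, 1]
    return uv, valid

# Toy perspective projector looking down -z (90 degree fov, w = -z):
proj = np.array([[1., 0.,  0.,  0. ],
                 [0., 1.,  0.,  0. ],
                 [0., 0., -1., -0.2],
                 [0., 0., -1.,  0. ]])
pts = np.array([[0., 0., -1.],   # directly in front of the projector
                [0., 0.,  1.]])  # behind it (reverse projection)
uv, valid = project_to_texture(pts, proj)
```

In the thesis this logic lives in the OpenGL shaders; the shader form additionally discards fragments occluded from the projector, which a per-point sketch like this cannot capture.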
The final simulator, using 3D reconstruction and projective texture
mapping, is then developed. The rendering camera is given a motion
model to enable human interaction. The final system is presented
and experimentally tested, and potential future work is discussed.