189 research outputs found

    Dense RGB-D SLAM and object localisation for robotics and industrial applications

    Dense reconstruction and object localisation are two critical steps in robotic and industrial applications. The former entails a joint estimation of camera egomotion and the structure of the surrounding environment, also known as Simultaneous Localisation and Mapping (SLAM), and the latter aims to locate the object in the reconstructed scenes. This thesis addresses the challenges of dense SLAM with RGB-D cameras and object localisation for robotic and industrial applications. Camera drift is a central issue in camera egomotion estimation: due to the accumulated error in camera pose estimation, the estimated camera trajectory is inaccurate and the reconstruction of the environment is inconsistent. This thesis analyses camera drift in SLAM under the probabilistic inference framework and proposes an online map fusion strategy with standard deviation estimation based on frame-to-model camera tracking. The camera pose is estimated by aligning the input image with the global map model, and the global map merges the information in the images by weighted fusion with standard deviation modelling. In addition, a pre-screening step is applied before map fusion to preclude the adverse effect of accumulated errors and noise on camera egomotion estimation. Experimental results indicated that the proposed method mitigates camera drift and improves the global consistency of camera trajectories. Another critical challenge for dense RGB-D SLAM in industrial scenarios is handling mechanical and plastic components that usually have reflective and shiny surfaces. Photometric alignment in frame-to-model camera tracking tends to fail on such objects due to the inconsistency between the intensity patterns of the images and the global map model. This thesis addresses this problem and proposes RSO-SLAM, a SLAM approach for reflective and shiny object reconstruction. RSO-SLAM adopts frame-to-model camera tracking and combines local photometric alignment with global geometric registration. This study revealed the effectiveness and excellent performance of the proposed RSO-SLAM on both plastic and metallic objects. In addition, a case study involving the cover of an electric vehicle battery with a metallic surface demonstrated the superior performance of the RSO-SLAM approach in the reconstruction of a common industrial product. With the reconstructed point cloud model of the object, the problem of object localisation is tackled in this thesis as point cloud registration. Iterative Closest Point (ICP) is arguably the best-known method for point cloud registration, but it is susceptible to sub-optimal convergence due to the multimodal solution space. This thesis proposes the Bees Algorithm (BA) enhanced with a Singular Value Decomposition (SVD) procedure for point cloud registration. SVD accelerates the local search of the BA, helping the algorithm to rapidly identify local optima, and it also enhances the precision of the obtained solutions. At the same time, the global outlook of the BA ensures adequate exploration of the whole solution space. Experimental results demonstrated the remarkable performance of the SVD-enhanced BA in terms of consistency and precision. Additional tests on noisy datasets demonstrated the robustness of the proposed procedure to imprecision in the models.
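
    A minimal sketch of the SVD step described above, assuming numpy and known point correspondences: this is not the thesis implementation, just the standard closed-form (Kabsch/Umeyama) rigid alignment that such a local search could call, with illustrative names of my own.

```python
import numpy as np

def svd_rigid_align(src, dst):
    """Closed-form least-squares rigid transform (R, t) mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Usage: refine a candidate pose found by the global (bee) search.
# aligned = src @ R.T + t
```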

    Neural representations for object capture and rendering

    Photometric stereo is a classical computer vision problem with applications ranging from gaming and VR/AR avatars to movie visual effects. These applications require a faithful reconstruction of an object in a new space, and thus a thorough understanding of the object's visual properties. With the advent of Neural Radiance Fields (NeRFs) in the early 2020s, we witnessed the incredible photorealism provided by the method and its potential beyond. However, original NeRFs do not provide any information about the material and lighting of the objects in focus. Therefore, we propose to tackle the multiview photometric stereo problem using an extension of NeRFs. We provide three novel contributions through this work. First, the Relightable NeRF model, an extension of the original NeRF in which appearance is conditioned on a point light source direction. It supports two use cases: learning from imagery captured under varying lighting, and relighting under arbitrary lighting conditions. Second, the Neural BRDF Fields, which extend the relightable NeRF by introducing explicit models for surface reflectance and shadowing. The parameters of the BRDF are learnable as a neural field, enabling spatially varying reflectance, and the local surface normal direction is learned as another neural field. We experiment with both a fixed BRDF (Lambertian) and a learnable (i.e. neural) reflectance model which guarantees a realistic BRDF by tying the neural network to physical BRDF properties. In addition, local shadowing is learned as a function of light source direction, enabling the reconstruction of cast shadows. Finally, the Neural Implicit Fields for Merging Monocular Photometric Stereo switch from NeRF's volume density function to a signed distance function representation. This provides a straightforward means to compute the surface normal direction and thus ties normal-based losses directly to the geometry. We use this representation to address the problem of merging the output of monocular photometric stereo methods into a single unified model: a neural SDF and a neural field capturing diffuse albedo, from which we can extract a textured mesh.
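
    As a rough illustration of the first contribution, the PyTorch sketch below conditions a NeRF-style appearance head on a point-light direction. It is schematic, not the authors' architecture: positional encoding, ray sampling, and volume rendering are omitted, and all names and layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class RelightableField(nn.Module):
    """Toy NeRF-style field: density from position only;
    colour conditioned on both view and light direction."""
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma = nn.Linear(hidden, 1)             # view/light independent
        self.rgb = nn.Sequential(
            nn.Linear(hidden + 3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())       # appearance depends on light

    def forward(self, x, view_dir, light_dir):
        h = self.trunk(x)                             # per-point features
        sigma = torch.relu(self.sigma(h))             # volume density
        rgb = self.rgb(torch.cat([h, view_dir, light_dir], dim=-1))
        return sigma, rgb

# Usage: sigma, rgb = RelightableField()(points, views, lights)
# with (N, 3) tensors; relighting = re-querying with a new light_dir.
```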

    Modelling the world in 3D : aspects of the acquisition, processing, management and analysis of spatial 3D data


    Improved non-contact 3D field and processing techniques to achieve macrotexture characterisation of pavements

    Macrotexture is required on pavements to provide skid resistance for vehicle safety in wet conditions. Increasingly, correlations between macrotexture measurements captured using non-contact techniques and tyre-pavement contact friction are being investigated in order to enable more robust and wide-scale measurement and monitoring of skid resistance. There is a notable scarcity of research into the respective accuracy of non-contact measurement techniques at these scales. This paper compares three techniques: a laser profile scanner, Structure from Motion (SfM) photogrammetry and Terrestrial Laser Scanning (TLS). We use spectral analysis, areal surface texture parameters and 2D cross-correlation analysis to evaluate the suitability of each approach for characterising and monitoring pavement macrotexture. The results show that SfM can produce successful measures of the areal root mean square height (Sq), which represents pavement texture depth and is positively correlated with skid resistance. Significant noise in the TLS data prevented agreement with the laser profiler, but we show that new filtering procedures result in significantly improved values for the peak density (Spd) and the arithmetic mean peak curvature (Spc), which together define the shape and distribution of the pavement aggregates forming macrotexture. However, filtering the TLS data involves a trade-off with vertical accuracy, thus altering the reliability of Sq. Finally, we show that the functional areal parameters Spd and Spc are sensitive to sample size. This means that pavement specimens of 150 mm × 150 mm or smaller, as used in laboratory or field observations, are inadequate for capturing the true value of areal surface texture parameters. The deployment of wider-scale approaches such as SfM and spectrally filtered TLS is required in order to successfully capture the functional areal parameters (Spc and Spd) for road surfaces.
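
    For concreteness, Sq is the RMS deviation of a levelled height map; the sketch below computes it with numpy, plus a deliberately simplified peak density. This is not the paper's processing chain: ISO 25178 Spd additionally prunes insignificant peaks (Wolf pruning), which is omitted here, and all names are mine.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def sq_rms_height(z):
    """Areal root mean square height Sq of a levelled height map z (metres)."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()                     # remove the mean plane offset
    return np.sqrt(np.mean(z ** 2))

def spd_peak_density(z, pixel_size):
    """Naive peak density (peaks per unit area): counts local maxima in a
    3x3 neighbourhood; a real Spd would prune insignificant peaks first."""
    peaks = (z == maximum_filter(z, size=3))
    area = z.size * pixel_size ** 2      # physical area of the sampled patch
    return peaks.sum() / area
```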

    Map-Based Localization for Unmanned Aerial Vehicle Navigation

    Unmanned Aerial Vehicles (UAVs) require precise pose estimation when navigating in indoor and GNSS-denied or GNSS-degraded outdoor environments. The possibility of crashing in these environments is high, as spaces are confined and contain many moving obstacles. There are many solutions for localization in GNSS-denied environments, using many different technologies. Common solutions involve setting up or using existing infrastructure, such as beacons, Wi-Fi, or surveyed targets. These solutions were avoided because the cost should be proportional to the number of users, not the coverage area. Heavy and expensive sensors, for example a high-end IMU, were also avoided. Given these requirements, a camera-based localization solution was selected for sensor pose estimation. Several camera-based localization approaches were investigated. Map-based localization methods were shown to be the most efficient because they close loops using a pre-existing map; thus the amount of data and the time spent collecting data are reduced, as there is no need to re-observe the same areas multiple times. This dissertation proposes a solution to the task of fully localizing a monocular camera onboard a UAV with respect to a known environment (i.e., it is assumed that a 3D model of the environment is available) for the purpose of UAV navigation in structured environments. Incremental map-based localization involves tracking a map through an image sequence. When the map is a 3D model, this task is referred to as model-based tracking. A by-product of the tracker is the relative 3D pose (position and orientation) between the camera and the object being tracked. State-of-the-art solutions advocate that tracking geometry is more robust than tracking image texture because edges are more invariant to changes in object appearance and lighting. However, model-based trackers have been limited to tracking small, simple objects in small environments. An assessment was performed on tracking larger, more complex building models in larger environments. A state-of-the-art model-based tracker called ViSP (Visual Servoing Platform) was applied to tracking outdoor and indoor buildings using a UAV's low-cost camera. The assessment revealed weaknesses at large scales. Specifically, ViSP failed when tracking was lost and needed to be manually re-initialized. Failure occurred when there was a lack of model features in the camera's field of view, and because of rapid camera motion. Experiments revealed that ViSP achieved positional accuracies similar to single point positioning solutions obtained from single-frequency (L1) GPS observations, with standard deviations around 10 metres. These errors were considered large, given that the geometric accuracy of the 3D model used in the experiments was 10 to 40 cm. The first contribution of this dissertation proposes to increase the performance of the localization system by combining ViSP with map-building incremental localization, also referred to as simultaneous localization and mapping (SLAM). Experimental results in both indoor and outdoor environments show that sub-metre positional accuracies were achieved, while the number of tracking losses throughout the image sequence was reduced. It is shown that by integrating model-based tracking with SLAM, not only does SLAM improve model tracking performance, but the model-based tracker also alleviates the computational expense of SLAM's loop-closing procedure to improve runtime performance.
    Experiments also revealed that ViSP was unable to handle occlusions when a complete 3D building model was used, resulting in large errors in its pose estimates. The second contribution of this dissertation is a novel map-based incremental localization algorithm that improves tracking performance and increases pose estimation accuracy relative to ViSP. The novelty of this algorithm is the implementation of an efficient matching process that identifies corresponding linear features between the UAV's RGB image data and a large, complex, and untextured 3D model. The proposed model-based tracker improved positional accuracies from 10 m (obtained with ViSP) to 46 cm in outdoor environments, and from an unattainable result using ViSP to 2 cm positional accuracies in large indoor environments. The main disadvantage of any incremental algorithm is that it requires the camera pose of the first frame, and initialization is often a manual process. The third contribution of this dissertation is a map-based absolute localization algorithm that automatically estimates the camera pose when no prior pose information is available. The method benefits from vertical line matching to accomplish a registration procedure of the reference model views with a set of initial input images via geometric hashing. Results demonstrate that sub-metre positional accuracies were achieved, and a proposed enhancement of conventional geometric hashing produced more correct matches: 75% of the correct matches were identified, compared to 11% with the conventional approach. Further, the number of incorrect matches was reduced by 80%.
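
    The dissertation matches linear features, but the underlying step of recovering a camera pose from model-to-image correspondences can be sketched with OpenCV's generic PnP solver. The points, intrinsics, and values below are invented purely for illustration and do not come from the dissertation.

```python
import numpy as np
import cv2

# Four hypothetical 3D model points (metres, coplanar building facade)
# and their corresponding pixel detections in the UAV image.
object_pts = np.array([[0, 0, 0], [4, 0, 0], [4, 0, 3], [0, 0, 3]], dtype=np.float64)
image_pts = np.array([[320, 410], [540, 400], [535, 180], [325, 190]], dtype=np.float64)

K = np.array([[800, 0, 320],        # assumed pinhole intrinsics
              [0, 800, 240],
              [0,   0,   1]], dtype=np.float64)
dist = np.zeros(5)                  # assume negligible lens distortion

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)          # camera pose relative to the model: (R, tvec)
print(ok, tvec.ravel())
```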

    Review of Automatic Processing of Topography and Surface Feature Identification LiDAR Data Using Machine Learning Techniques

    Machine Learning (ML) applications on Light Detection And Ranging (LiDAR) data have provided promising results, and this topic has thus been widely addressed in the literature in recent years. This paper reviews the essential and more recent studies in the topography and surface feature identification domain. Four aspects of the suggested approaches are analyzed and discussed: the input data, the concepts of point cloud structure for applying ML, the ML techniques used, and the applications of ML to LiDAR data. An overview is then provided to underline the advantages and disadvantages of this research axis. Despite the training-data labelling problem, the computational cost, and the undesirable shortcuts introduced by data downsampling, most of the proposed methods use supervised ML concepts to classify downsampled LiDAR data. Furthermore, despite occasionally highly accurate results, in most cases the results still require filtering. In fact, a considerable number of the adopted approaches use the same data structure concepts employed in image processing in order to profit from available software tools. Given that LiDAR point clouds represent rich 3D data, more effort is needed to develop specialized processing tools.
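
    To make the supervised-classification pattern described above concrete, here is a toy scikit-learn sketch: hand-crafted per-point features feeding a random forest. The features, labels, and data are fabricated for illustration and are not taken from the review.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def per_point_features(xyz, k=10):
    """Simple per-point features: absolute height plus local roughness
    (height dispersion over the k nearest neighbours)."""
    nn = NearestNeighbors(n_neighbors=k).fit(xyz)
    _, idx = nn.kneighbors(xyz)
    rough = xyz[idx][..., 2].std(axis=1)       # local height dispersion
    return np.column_stack([xyz[:, 2], rough])

# Fabricated "labelled tile": class 1 above 5 m, class 0 below (toy rule).
rng = np.random.default_rng(0)
xyz_train = rng.uniform(0, 10, (1000, 3))
labels = (xyz_train[:, 2] > 5).astype(int)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(per_point_features(xyz_train), labels)
pred = clf.predict(per_point_features(xyz_train))   # per-point class labels
```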

    Perceptually Optimized Visualization on Autostereoscopic 3D Displays

    The family of displays that aim to visualize a 3D scene with realistic depth is known as "3D displays". Due to technical limitations and design decisions, such displays create visible distortions, which are interpreted by the human visual system as artefacts. In the absence of a visual reference (e.g. when the original scene is not available for comparison), one can improve the perceived quality of the representations by making the distortions less visible. This thesis proposes a number of signal processing techniques for decreasing the visibility of artefacts on 3D displays. The visual perception of depth is discussed, and the properties (depth cues) of a scene which the brain uses for assessing an image in 3D are identified. Following the physiology of vision, a taxonomy of 3D artefacts is proposed. The taxonomy classifies the artefacts based on their origin and on the way they are interpreted by the human visual system. The principles of operation of the most popular types of 3D displays are explained. Based on these principles, 3D displays are modelled as a signal processing channel. The model is used to explain the process of introducing distortions and to identify which optical properties of a display are most relevant to the creation of artefacts. A set of optical properties for dual-view and multiview 3D displays is identified, and a methodology for measuring them is introduced. The measurement methodology allows one to derive the angular visibility and crosstalk of each display element without the need for precision measurement equipment. Based on the measurements, a methodology for creating a quality profile of 3D displays is proposed. The quality profile can be either simulated using the angular brightness function or directly measured from a series of photographs. A comparative study is presented, introducing measurement results on the visual quality and the sweet-spot positions of eleven 3D displays of different types. Knowing the sweet-spot position and the quality profile allows for easy comparison between 3D displays, and the shape and size of the passband allow the depth and textures of 3D content to be optimized for a given display. Based on knowledge of 3D artefact visibility and an understanding of the distortions introduced by 3D displays, a number of signal processing techniques for artefact mitigation are created. A methodology for creating anti-aliasing filters for 3D displays is proposed. For multiview displays, the methodology is extended towards so-called passband optimization, which addresses the Moiré, fixed-pattern-noise and ghosting artefacts characteristic of such displays. Additionally, the design of tuneable anti-aliasing filters is presented, along with a framework which allows the user to select the so-called 3D sharpness parameter according to his or her preferences. Finally, a set of real-time algorithms for viewpoint-based optimization is presented. These algorithms require active user-tracking, which is implemented as a combination of face- and eye-tracking. Once the observer position is known, the image on a stereoscopic display is optimized for the derived observation angle and distance. For multiview displays, the combination of precise light re-direction and less precise face-tracking is used to extend the head parallax. For some user-tracking algorithms, implementation details are given regarding execution on a mobile device or on a desktop computer with a graphics accelerator.
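
    The display-as-channel view above can be illustrated with the simplest possible leakage model: symmetric linear crosstalk between two views, and its inverse as a naive compensation. This toy numpy sketch is far simpler than the thesis's measured angular-visibility model; the mixing coefficient c is an assumption.

```python
import numpy as np

def dual_view_with_crosstalk(left, right, c):
    """Model a dual-view display where a fraction c of the unintended
    view leaks into each eye (symmetric linear crosstalk)."""
    seen_left = (1.0 - c) * left + c * right
    seen_right = (1.0 - c) * right + c * left
    return seen_left, seen_right

def compensate(left, right, c):
    """Invert the 2x2 leakage mixing per pixel (valid for c != 0.5 and
    while the compensated values stay within the display's dynamic range)."""
    Ainv = np.linalg.inv(np.array([[1.0 - c, c], [c, 1.0 - c]]))
    drive_left = Ainv[0, 0] * left + Ainv[0, 1] * right
    drive_right = Ainv[1, 0] * left + Ainv[1, 1] * right
    return drive_left, drive_right
```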

    Surfaces from the visual past : recovering high-resolution terrain data from historic aerial imagery for multitemporal landscape analysis

    Historic aerial images are invaluable sources of aid to archaeological research. Often collected with large-format, photogrammetric-quality cameras, these images are potential archives of multidimensional data that can be used to recover information about historic landscapes that have been lost to modern development. However, a lack of camera information for many historic images, coupled with physical degradation of their media, has often made it difficult to compute geometrically rigorous 3D content from such imagery. While advances in photogrammetry and computer vision over the last two decades have made possible the extraction of accurate and detailed 3D topographical data from high-quality digital images emanating from uncalibrated or unknown cameras, the target source material for these algorithms is normally digital content and thus not negatively affected by the passage of time. In this paper, we present refinements to a computer vision-based workflow for the extraction of 3D data from historic aerial imagery, using readily available software, specific image preprocessing techniques and in-field measurement observations to mitigate some shortcomings of archival imagery and improve the extraction of historical digital elevation models (hDEMs) for use in landscape archaeological research. We apply the developed method to a series of historic image sets and modern topographic data covering a period of over 70 years in western Sicily (Italy) and evaluate the outcome. The resulting series of hDEMs forms a temporal data stack which is compared with modern high-resolution terrain data using a geomorphic change detection approach, providing a quantification of landscape change through time, in extent and depth, and of the impact of this change on archaeological resources.
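
    Geomorphic change detection of the kind mentioned at the end is commonly a thresholded DEM of difference: subtract two co-registered DEMs and keep only cells whose change exceeds the propagated elevation error. A minimal numpy sketch under that assumption (parameter names are mine, not the paper's):

```python
import numpy as np

def dem_of_difference(dem_new, dem_old, sigma_new, sigma_old, k=1.96):
    """Difference two co-registered DEM grids and mask out change that is
    below the propagated level of detection (k ~ 95% confidence)."""
    dod = dem_new - dem_old
    lod = k * np.sqrt(sigma_new ** 2 + sigma_old ** 2)   # level of detection
    return np.where(np.abs(dod) > lod, dod, np.nan)      # NaN = no detectable change
```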

    Modeling and Simulation in Engineering

    This book provides an open platform for establishing and sharing knowledge developed by scholars, scientists, and engineers from all over the world about various applications of modeling and simulation in the design process of products in various engineering fields. The book consists of 12 chapters arranged in two sections (3D Modeling and Virtual Prototyping), reflecting the multidimensionality of applications related to modeling and simulation. Some of the most recent modeling and simulation techniques, as well as some of the most accurate and sophisticated software for treating complex systems, are applied. All the original contributions in this book are joined by the basic principle of a successful modeling and simulation process: as complex as necessary, and as simple as possible. The idea is to manipulate the simplifying assumptions in a way that reduces the complexity of the model (in order to enable real-time simulation) without compromising the precision of the results.

    A Novel Inpainting Framework for Virtual View Synthesis

    Multi-view imaging has stimulated significant research to enhance the user experience of free viewpoint video, allowing interactive navigation between views and the freedom to select a desired view to watch. This usually involves transmitting both textural and depth information captured from different viewpoints to the receiver, to enable the synthesis of an arbitrary view. In rendering these virtual views, perceptual holes can appear when regions hidden in the original view by a closer object become visible in the virtual view. To provide a high-quality experience, these holes must be filled in a visually plausible way, in a process known as inpainting. This is challenging because the missing information is generally unknown and the hole regions can be large. Recently, depth-based inpainting techniques have been proposed to address this challenge, and while these generally perform better than non-depth-assisted methods, they are not very robust and can produce perceptual artefacts. This thesis presents a new inpainting framework that innovatively exploits depth and textural self-similarity characteristics to construct subjectively enhanced virtual viewpoints. The framework makes three significant contributions to the field: i) the exploitation of view information to jointly inpaint textural and depth hole regions; ii) the introduction of the novel concept of self-similarity characterisation, which is combined with relevant depth information; and iii) an advanced self-similarity characterising scheme that automatically determines key spatial transform parameters for effective and flexible inpainting. The presented inpainting framework has been critically analysed and shown to provide superior performance, both perceptually and numerically, compared to existing techniques, especially in terms of lower visual artefacts. It provides a flexible, robust framework for developing new inpainting strategies for the next generation of interactive multi-view technologies.
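
    The thesis's own method is depth- and self-similarity-based; as a hedged illustration of where inpainting sits in the view-synthesis pipeline, the snippet below fills disocclusion holes in a rendered view using OpenCV's generic Telea inpainting as a stand-in, with hypothetical names.

```python
import cv2

def fill_disocclusions(virtual_view, hole_mask, radius=5):
    """Fill disocclusion holes left by warping a view to a new viewpoint.

    virtual_view: 8-bit BGR image rendered from the novel viewpoint.
    hole_mask:    8-bit single-channel mask, non-zero where no source
                  pixel projected (the perceptual holes).
    """
    return cv2.inpaint(virtual_view, hole_mask, radius, cv2.INPAINT_TELEA)
```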