
    Adaptive Methods for Point Cloud and Mesh Processing

    Point clouds and 3D meshes are widely used in applications ranging from games and virtual reality to autonomous vehicles. This dissertation proposes several approaches for removing noise from and calibrating noisy point cloud data, as well as methods for 3D mesh sharpening. Order statistic filters have proven very successful in image processing and other domains, and several variations originally proposed for image processing are extended here to point cloud filtering. A new adaptive vector median is also proposed for removing noise and outliers from noisy point cloud data. The major contributions of this research lie in four aspects: 1) four order statistic algorithms are extended and one adaptive filtering method is proposed for noisy point clouds, with improved results such as the preservation of significant features; these methods are applied to standard models, synthetic models, and real scenes; 2) the proposed method is accelerated on multicore processors using the Microsoft Parallel Patterns Library; 3) a new method for aerial LIDAR data filtering is proposed, with the objective of enabling automatic extraction of ground points from aerial LIDAR data with minimal human intervention; and 4) a novel method for mesh color sharpening using the discrete Laplace-Beltrami operator is proposed. Median and order statistics-based filters are widely used in signal and image processing because they remove outlier noise easily while preserving important features. This dissertation presents a wide range of results with the median filter, vector median filter, fuzzy vector median filter, adaptive mean, adaptive median, and adaptive vector median filter on point cloud data. The experiments show that large-scale noise is removed while important features of the point cloud are preserved, with reasonable computation time. Quantitative criteria (e.g., complexity, Hausdorff distance, and root mean squared error (RMSE)) as well as qualitative criteria (e.g., the perceived visual quality of the processed point cloud) are employed to assess the performance of the filters on data corrupted by different noise models. The adaptive vector median is further optimized for denoising and ground filtering of aerial LIDAR point cloud data, and is accelerated on multi-core CPUs using the Microsoft Parallel Patterns Library. In addition, this dissertation presents a new method for mesh color sharpening using the discrete Laplace-Beltrami operator, an approximation of second-order derivatives on irregular 3D meshes. The one-ring neighborhood is used to compute the Laplace-Beltrami operator, and the color of each vertex is updated by adding the Laplace-Beltrami operator of the vertex color, weighted by a factor, to its original value. Different discretizations of the Laplace-Beltrami operator have been proposed for geometric processing of 3D meshes; this work applies several of them to sharpening 3D mesh colors and compares their performance. Experimental results demonstrate the effectiveness of the proposed algorithms.
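To make the underlying filtering idea concrete, the following is a minimal sketch of a plain (non-adaptive) vector median filter applied over k-nearest-neighbour windows of a point cloud. It assumes numpy and scipy; the neighbourhood size k and the function name are illustrative, and this is not the dissertation's adaptive variant, which additionally adapts the filtering to local statistics.

```python
import numpy as np
from scipy.spatial import cKDTree

def vector_median_filter(points, k=8):
    """Replace each point by the vector median of its k-nearest-neighbour window.

    The vector median of a window is the member that minimises the sum of
    Euclidean distances to all other members, so the output is always one of
    the original samples, which helps preserve sharp features.
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)              # k nearest neighbours per point
    filtered = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        window = points[nbrs]                     # (k, 3) candidate vectors
        # pairwise distances inside the window
        d = np.linalg.norm(window[:, None, :] - window[None, :, :], axis=-1)
        filtered[i] = window[np.argmin(d.sum(axis=1))]
    return filtered
```

Because the filter always outputs one of the original samples in each window, outliers are rejected while edges and other significant features tend to survive better than under a mean filter, which is the property the abstract highlights.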

    A review on deep learning techniques for 3D sensed data classification

    Over the past decade, deep learning has driven rapid progress in 2D image understanding. Despite these advancements, techniques for automatic understanding of 3D sensed data, such as point clouds, are comparatively immature. However, with a range of important applications from indoor robot navigation to national-scale remote sensing, there is high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We then review the current main approaches, including RGB-D, multi-view, volumetric, and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion about the future of deep learning for 3D sensed data, using the literature to justify the areas where future research would be most valuable. Comment: 25 pages, 9 figures. Review paper.
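As a concrete illustration of the preprocessing behind the volumetric family of designs mentioned above, the sketch below converts an unordered point cloud into the kind of fixed-size occupancy grid a 3D CNN consumes. It uses numpy only; the grid resolution and function name are illustrative assumptions, not details taken from the review.

```python
import numpy as np

def voxelize(points, grid=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    Volumetric architectures feed such fixed-size grids to 3D CNNs instead
    of operating on the raw, unordered point set.
    """
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scale = (grid - 1) / np.maximum(maxs - mins, 1e-9)   # normalise to grid range
    idx = np.clip(((points - mins) * scale).astype(int), 0, grid - 1)
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0           # mark occupied voxels
    return vol
```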

    Radar and RGB-depth sensors for fall detection: a review

    This paper reviews recent works in the literature on the use of systems based on radar and RGB-Depth (RGB-D) sensors for fall detection, and discusses outstanding research challenges and trends related to this research field. Systems that reliably detect fall events and promptly alert carers and first responders have gained significant interest in the past few years in order to address the societal issue of an increasing number of elderly people living alone, with the associated risk of falls and their consequences in terms of health treatments, reduced well-being, and costs. The interest in radar and RGB-D sensors is related to their capability to enable contactless and non-intrusive monitoring, which is an advantage for practical deployment and users’ acceptance and compliance, compared with other sensor technologies such as video cameras or wearables. Furthermore, the possibility of combining and fusing information from these heterogeneous types of sensors is expected to improve the overall performance of practical fall detection systems. Researchers from different fields can benefit from the multidisciplinary knowledge and awareness of the latest developments in radar and RGB-D sensors that this paper discusses.

    Impact of Camera Viewing Angle for Estimating Leaf Parameters of Wheat Plants from 3D Point Clouds

    Estimation of plant canopy parameters using low-altitude imagery can help monitor the growth status of crops and is highly beneficial for various digital farming applications such as precision crop protection. However, extracting 3D canopy information from raw images requires studying the effect of the sensor viewing angle while taking into account the limitations of the mobile platform routes inside the field. The main objective of this research was to estimate wheat (Triticum aestivum L.) leaf parameters, including leaf length and width, from the 3D model representation of the plants. For this purpose, experiments with different camera viewing angles were conducted to find the optimum setup of a mono-camera system that would result in the best 3D point clouds. The angle-control analytical study was conducted on a four-row wheat plot with a row spacing of 0.17 m, with two seeding densities and growth stages as factors. Nadir and six oblique-view image datasets were acquired from the plot with 88% overlap and were then reconstructed into point clouds using Structure from Motion (SfM) and Multi-View Stereo (MVS) methods. Point clouds were first categorized into three classes: wheat canopy, soil background, and experimental plot. The wheat canopy class was then used to extract leaf parameters, which were compared with values from manual measurements. The comparison showed that (i) the multiple-view dataset provided the best estimation of leaf length and leaf width; (ii) among the single-view datasets, canopy and leaf parameters were best modeled with the camera angled vertically at -45° and horizontally at 0° (VA -45, HA 0); while (iii) in the nadir view, fewer underlying 3D points were obtained, with a missing-leaf rate of 70%. It was concluded that oblique imagery is a promising approach for effectively estimating a wheat canopy's 3D representation with SfM-MVS using a single-camera platform for crop monitoring. This study contributes to the improvement of proximal sensing platforms for crop health assessment.
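The abstract does not detail how the reconstructed point cloud was split into canopy and soil classes; the sketch below is only a simplified, hypothetical height-threshold separation of the kind often used as a first step in such pipelines (numpy only; the quantile and offset parameters are illustrative assumptions).

```python
import numpy as np

def split_canopy_soil(points, soil_quantile=0.05, canopy_offset=0.05):
    """Separate a plot point cloud into canopy and soil points by height.

    The soil level is estimated as a low quantile of the z coordinates;
    points more than `canopy_offset` metres above it are labelled canopy.
    """
    z = points[:, 2]
    soil_level = np.quantile(z, soil_quantile)     # approximate ground height
    canopy_mask = z > soil_level + canopy_offset
    return points[canopy_mask], points[~canopy_mask]
```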

    An investigation into common challenges of 3D scene understanding in visual surveillance

    Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise and ultimately predict and prevent abnormal behaviour, or even to reliably recognise objects, for example in order to detect left luggage or suspicious items. This thesis first aims to study the state of the art and identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. It then presents the application of a suite of 2D and highly novel 3D methodologies that go some way towards overcoming current limitations.
The methods presented here are based on the analysis of object features extracted directly from the geometry of the scene. They start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes, followed by an investigation into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine 2D and 3D data to obtain a better understanding of the scene, ultimately capturing what is happening within it in order to move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, the railway station environment is used as an example case representing typical real-world challenges; the principles can readily be extended elsewhere, such as to airports, motorways, households, shopping malls, etc. The context of this research, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1.
Common computer vision techniques such as VP detection, camera calibration, 3D reconstruction, and segmentation can be applied in an effort to extract meaning for video surveillance applications. According to the literature, these methods have been well researched, and their use is assessed in the context of current surveillance requirements in chapter 2. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges encountered in a general video surveillance context. This is largely due to issues such as variable lighting, weather conditions, shadows, and, in general, the complexity of real-world environments. Chapter 3 presents the research and contributions on these topics (methods to extract optimal features for a specific CCTV application), as well as their strengths and weaknesses, showing that the proposed algorithm obtains better results than most due to its specific design.
The comparison of current surveillance systems and methods from the literature shows, however, that 2D data are used almost universally. Indeed, both industrial systems and the research community have been intensively improving 2D feature extraction methods for as long as image analysis and scene understanding have been of interest, and this constant progress, together with the large variety of available techniques, makes 2D feature extraction almost effortless nowadays. Moreover, even if 2D data do not allow all challenges in video surveillance or other applications to be solved, they are still used as a starting stage towards scene understanding and image analysis. Chapter 4 therefore explores 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach is proposed to extract vanishing points from video surveillance environments. Segmentation techniques are also explored with the aim of determining how they can complement vanishing point detection and lead towards 3D data extraction and analysis.
In spite of the contributions above, 2D data are insufficient for all but the simplest applications aimed at understanding a scene, where the goal is robust detection of, say, left luggage or abnormal behaviour without significant a priori information about the scene geometry. More information is therefore required in order to design a more automated and intelligent algorithm that obtains richer information from the scene geometry and so a better understanding of what is happening within it. This can be achieved by using 3D data (in addition to 2D data), allowing objects to be “classified” and, from this, a map of functionality to be inferred, describing feasible and unfeasible object functionality in a given environment. Chapter 5 presents how 3D data can be beneficial for this task and the various solutions investigated to recover 3D data, as well as some preliminary work towards plane extraction.
It is apparent that VPs and planes give useful information about a scene's perspective and can assist in recovering 3D data within a scene. However, neither VPs nor plane detection techniques alone allow the recovery of more complex generic object shapes (for example, shapes composed of spheres, cylinders, etc.), and any simple model will suffer in the presence of non-Manhattan features, e.g. those introduced by the presence of an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it. Chapter 6 describes how photometric stereo allows 3D information to be recovered in order to obtain a better understanding of a scene, while also partially overcoming some current surveillance challenges, such as the difficulty of resolving fine detail, particularly at large standoff distances, and of isolating and recognising more complex objects in real scenes. Here, items of interest may be obscured by complex environmental factors that are subject to rapid change, making, for example, the detection of suspicious objects and behaviour highly problematic. Innovative use is made of an untapped latent capability offered within modern surveillance environments to introduce a form of environmental structuring, in order to achieve a richer form of data acquisition.
This chapter also explores the novel application of photometric stereo in such diverse applications, shows how the algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application. One of the most important aspects of this research is its applicability: while most of the research literature has been based on relatively simple structured environments, the approach here has been designed for real surveillance environments, such as railway stations, airports, and waiting rooms, where surveillance cameras may be fixed or may in future form part of a mobile, free-roaming robotic surveillance device that must continually reinterpret its changing environment. So, as mentioned previously, while the main focus has been railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and to motorway, shopping centre, street and home environments, all of which require a better understanding of the scene for security or safety purposes. Finally, chapter 7 presents a global conclusion and outlines future work.
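Since photometric stereo is central to the later chapters, the sketch below shows the classical Lambertian formulation it builds on: per-pixel surface normals and albedo are recovered by least squares from images taken under several known light directions. This is only a minimal illustration under the standard Lambertian, known-lights assumptions (numpy only; names are illustrative), not the thesis's adapted surveillance pipeline.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Recover per-pixel surface normals and albedo via classical photometric stereo.

    images : (m, h, w) array of grayscale images taken under m known,
             distant light directions (Lambertian surface assumed).
    lights : (m, 3) array of unit light-direction vectors.
    """
    m, h, w = images.shape
    I = images.reshape(m, -1)                       # (m, h*w) stacked intensities
    # Solve the per-pixel least-squares problem  lights @ g = I  in one call
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)  # (3, h*w)
    albedo = np.linalg.norm(G, axis=0)              # |g| is the albedo
    normals = np.divide(G, albedo, out=np.zeros_like(G), where=albedo > 1e-8)
    return normals.reshape(3, h, w), albedo.reshape(h, w)
```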

    Performance evaluation of depth completion neural networks for various RGB-D camera technologies

    RGB-D cameras are devices used these days in various fields that benefit from knowledge of depth in an image.
The most popular acquisition techniques include active stereoscopy, which triangulates two camera views, and structured-light cameras, which do the same with a camera image and a laser projector. Another popular technology that does not require triangulation, used in LiDAR cameras, is ToF (Time of Flight): depth is derived from the time taken to receive an emitted signal, such as an IR pulse, across the camera's field of view. The main difficulties encountered with RGB-D cameras stem from the image acquisition environment and from the camera characteristics themselves: poorly defined edges and variations in lighting conditions can lead to noisy or incomplete depth maps, which negatively impacts the performance of computer vision and robotics applications that rely on accurate depth information. Several depth enhancement techniques have been proposed in recent years, many of them making use of neural networks for depth completion. The goal of the depth completion task is to generate a dense depth prediction, continuous over the entire image, from the RGB image and the raw depth image acquired by the RGB-D sensor. Depth completion methods feed the RGB and sparse depth inputs through encoder-decoder architectures, with recent variants using refinement stages and additional information, such as semantic data, to improve accuracy and to better handle object edges and occluded items. However, the methods in use at this time rely on a small receptive field, as in CNNs and local spatial propagation networks; when the holes of invalid pixels in the depth map are too large, this limited receptive field produces incorrect predictions. In this thesis, a performance evaluation of the current depth completion state of the art on a real indoor scenario is proposed. Several RGB-D sensors were taken into account for the experimental evaluation, highlighting the pros and cons of the different camera technologies for depth measurement. The acquisitions were carried out in different environments and with cameras using different technologies, in order to analyse the quality of the depths obtained first directly from the cameras and then after applying the state-of-the-art depth completion networks. According to the findings of this thesis, state-of-the-art networks are not yet mature enough to be used in scenarios that are too dissimilar from those used by the respective authors. In particular, the following limitation was found: deep networks trained on outdoor scenes are not effective when analysing indoor scenes; in such cases, a straightforward approach based on morphological operators is more accurate.
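For illustration, the sketch below shows one simple morphological baseline of the kind the thesis contrasts with learned depth completion: invalid (zero) pixels in a raw depth map are filled by dilating nearby valid depths. It uses OpenCV and numpy; the kernel size, iteration count, and function name are assumptions for the sketch, not the thesis's exact pipeline.

```python
import cv2
import numpy as np

def fill_depth_morphological(depth, kernel_size=5, iterations=2):
    """Fill small holes (zero-valued pixels) in a raw depth map.

    Invalid pixels are assumed to be stored as 0. Grayscale dilation spreads
    valid neighbouring depths into the holes; original valid pixels are left
    untouched so only missing values are completed.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    filled = depth.copy()
    for _ in range(iterations):
        dilated = cv2.dilate(filled, kernel)   # propagate valid depths outwards
        holes = filled == 0
        filled[holes] = dilated[holes]         # write only into the invalid pixels
    return filled
```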

    The Use of Agricultural Robots in Orchard Management

    Book chapter summarizing recent research on agricultural robotics in orchard management, including robotic pruning, thinning, spraying, harvesting, and fruit transportation, as well as future trends. Comment: 22 pages.

    A comparison of terrestrial laser scanning and structure-from-motion photogrammetry as methods for digital outcrop acquisition

    Terrestrial laser scanning (TLS) has been used extensively in Earth Science for acquisition of digital outcrop data over the past decade. Structure-from-motion (SfM) photogrammetry has recently emerged as an alternative and competing technology. The real-world performance of these technologies for ground-based digital outcrop acquisition is assessed using outcrops from North East England and the United Arab Emirates. Both TLS and SfM are viable methods, although no single technology is universally best suited to all situations. There are a range of practical considerations and operating conditions where each method has clear advantages. In comparison to TLS, SfM benefits from being lighter, more compact, cheaper, more easily replaced and repaired, with lower power requirements. TLS in comparison to SfM provides intrinsically validated data and more robust data acquisition in a wide range of operating conditions. Data post-processing is also swifter. The SfM data sets were found to contain systematic inaccuracies when compared to their TLS counterparts. These inaccuracies are related to the triangulation approach of the SfM, which is distinct from the time-of-flight principle employed by TLS. An elaborate approach is required for SfM to produce comparable results to TLS under most circumstances.
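The paper's own accuracy assessment procedure is not detailed in the abstract; as a generic illustration of how systematic discrepancies between co-registered SfM and TLS clouds can be quantified, the sketch below computes nearest-neighbour cloud-to-cloud errors (numpy and scipy; the function name and the choice of RMSE and one-sided Hausdorff distance are assumptions for the sketch).

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_errors(sfm_points, tls_points):
    """Nearest-neighbour distances from an SfM cloud to a TLS reference cloud.

    Assumes both (N, 3) clouds are already co-registered in the same frame.
    Returns the RMSE and the one-sided Hausdorff distance (maximum error).
    """
    tree = cKDTree(tls_points)
    d, _ = tree.query(sfm_points, k=1)   # distance from each SfM point to closest TLS point
    return np.sqrt(np.mean(d ** 2)), d.max()
```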