37 research outputs found

    Robust statistical approaches for feature extraction in laser scanning 3D point cloud data

    Three-dimensional point cloud data acquired from mobile laser scanning systems commonly contain outliers and/or noise. The presence of outliers and noise means that most of the frequently used methods for feature extraction produce inaccurate and non-robust results. We investigate the problems outliers cause and how to accommodate them for automatic, robust feature extraction. This thesis develops algorithms for outlier detection, point cloud denoising, robust feature extraction, segmentation and ground surface extraction.

    Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification

    Logistic regression is well known to the data mining research community as a tool for modeling and classification. The presence of outliers is an unavoidable phenomenon in data analysis. Detection of outliers is important for increasing the accuracy of the required estimates and for reliable knowledge discovery from the underlying databases. Most existing outlier detection methods in regression analysis are based on the single-case deletion approach, which is inefficient in the presence of multiple outliers because of the well-known masking and swamping effects. To avoid these effects, the multiple-case deletion approach has been introduced. We propose a group deletion approach based diagnostic measure for identifying multiple influential observations in logistic regression. At the same time, we introduce a plotting technique that can classify data into outliers, high-leverage points, influential observations and regular observations. This paper has two objectives. First, it investigates the problems of outlier detection in logistic regression, proposes a new method that can find multiple influential observations, and classifies the types of outliers. Second, it shows the necessity of properly identifying outliers and influential observations as a prelude to reliable knowledge discovery from modeling and classification via logistic regression. We demonstrate the efficiency of our method, compare its performance with existing popular diagnostic methods, and explore the necessity of outlier detection for reliability and robustness in modeling and classification using real datasets.
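The group deletion idea above can be sketched with a toy example: flag all high-residual points at once and refit without them, so that one outlier cannot mask another. This is a minimal illustration, not the paper's diagnostic measure; the contaminated data, the Pearson-residual cutoff of 2.5 and the use of scikit-learn are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: a clean two-class problem plus a small group of mislabelled points
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[:5] = rng.normal(loc=2.5, size=(5, 2))  # far from the decision boundary...
y[:5] = 0                                 # ...but deliberately mislabelled

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]

# Pearson residuals: a large |r| flags a candidate outlier in the response
r = (y - p) / np.sqrt(p * (1 - p))
suspects = np.where(np.abs(r) > 2.5)[0]

# Group deletion: drop all suspects at once and refit, so one outlier
# cannot mask another (the single-case-deletion weakness described above)
mask = np.ones(len(y), dtype=bool)
mask[suspects] = False
clean_model = LogisticRegression().fit(X[mask], y[mask])
```

The refitted model can then be compared against the original fit to judge how strongly the flagged group influenced the estimates.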

    Robust segmentation in laser scanning 3D point cloud data

    Segmentation is one of the most important intermediate steps in point cloud data processing and understanding. Covariance-statistics-based local saliency features from Principal Component Analysis (PCA) are frequently used for point cloud segmentation. However, it is well known that PCA is sensitive to outliers, so segmentation results can be erroneous and unreliable. This paper investigates the problems of surface segmentation in laser scanning point cloud data. We propose a region-growing-based, statistically robust segmentation algorithm that uses a recently introduced fast Minimum Covariance Determinant (MCD) based robust PCA approach. Experiments on several real laser scanning datasets show that classical PCA gives unreliable and non-robust results, whereas the proposed robust PCA based method has an intrinsic ability to deal with noisy data and gives more accurate and robust results for planar and non-planar smooth surface segmentation.
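The core idea, replacing the classical covariance with an MCD-based robust scatter when deriving PCA saliency features, can be sketched for local normal estimation. This is a minimal illustration on invented data, not the paper's segmentation algorithm; scikit-learn's `MinCovDet` stands in for the fast MCD estimator.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def robust_normal(neighbors):
    """Local surface normal from a neighbourhood, using the MCD robust
    scatter matrix in place of the outlier-sensitive sample covariance."""
    mcd = MinCovDet(random_state=0).fit(neighbors)
    evals, evecs = np.linalg.eigh(mcd.covariance_)
    return evecs[:, 0]  # eigenvector of the smallest eigenvalue

# A roughly planar patch (z ~ 0) contaminated with a few gross outliers
rng = np.random.default_rng(1)
patch = np.c_[rng.uniform(-1, 1, (60, 2)), rng.normal(0.0, 0.01, 60)]
patch[:6, 2] += 5.0  # six points far above the plane

n_robust = robust_normal(patch)  # stays aligned with z despite the outliers

# For contrast: the classical covariance is dragged by the six outliers,
# so its smallest-eigenvalue direction no longer points along z
evals_c, evecs_c = np.linalg.eigh(np.cov(patch.T))
n_classical = evecs_c[:, 0]
```

A region-growing segmenter would compare such robust normals between neighbouring points to decide whether to merge them into one surface.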

    Diagnostic-robust statistical analysis for Local Surface Fitting in 3D Point Cloud Data

    Objectives: Surface reconstruction and fitting for geometric primitives and three-dimensional (3D) modeling is a fundamental task in photogrammetry and reverse engineering. However, it is impractical to acquire point cloud data without outliers/noise being present. The noise in the data acquisition process induces rough and uneven surfaces and reduces the precision/accuracy of the acquired model. This paper investigates the problem of local surface reconstruction and best fitting from unorganized, outlier-contaminated 3D point cloud data. Methods: Least Squares (LS), Principal Component Analysis (PCA) and RANSAC are the three most popular techniques for fitting planar surfaces to 2D and 3D data. All three methods are affected by outliers and do not give reliable and robust parameter estimates. In the statistics literature, robust techniques and outlier diagnostics are two complementary approaches, but either alone is insufficient for outlier detection and robust parameter estimation. We propose a diagnostic-robust statistical algorithm that uses both approaches in combination for fitting planar surfaces in the presence of outliers. Robust distance is used as a multivariate diagnostic technique for outlier detection, and robust PCA is used as an outlier-resistant technique for plane fitting. The robust distance is a robustification of the well-known Mahalanobis distance obtained by using the recently introduced high-breakdown Minimum Covariance Determinant (MCD) location and scatter estimates. Classical PCA measures data variability through the variance, and the corresponding directions are the latent vectors, which are sensitive to outlying observations. In contrast, robust PCA, which combines the 'projection pursuit' approach with a robust scatter matrix based on the MCD of the covariance matrix, is robust to outlying observations in the dataset.
    In addition, robust PCA produces graphical displays of orthogonal distance and score distance as by-products, which can detect outliers and aid robust fitting by applying robust PCA a second time in the final plane-fitting stage. In summary, the proposed method removes the outliers first and then fits the local surface in a robust way. Results and conclusions: We present a new diagnostic-robust statistical technique for local surface fitting in 3D point cloud data. The benefits of the new diagnostic-robust algorithm are demonstrated on an artificial dataset and several terrestrial mobile mapping laser scanning point cloud datasets. Comparative results show that the classical LS and PCA methods are very sensitive to outliers and fail to reliably fit planes. The RANSAC algorithm is not completely free from the effect of outliers and requires more processing time for large datasets. The proposed method smooths away noise and is significantly better and more efficient than the other three methods for local planar surface fitting, even in the presence of surface roughness. The method is applicable to 3D straight-line fitting as well and has great potential for local normal estimation and different types of surface fitting.
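The two-stage pipeline described above (robust-distance diagnostics, then a robust fit on the cleaned points) can be sketched as follows. This is a simplified illustration, not the authors' implementation: scikit-learn's `MinCovDet` supplies the MCD location/scatter for the robust distances, a chi-squared cutoff at the 97.5% quantile flags outliers, and plain PCA (via SVD) on the cleaned points stands in for the robust PCA fitting step.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

# Synthetic planar patch with 10% off-surface outliers
rng = np.random.default_rng(2)
pts = np.c_[rng.uniform(-1, 1, (100, 2)), rng.normal(0.0, 0.02, 100)]
pts[:10, 2] += rng.uniform(0.5, 1.0, 10)  # points lifted off the plane

# Diagnostic step: robust (MCD-based) Mahalanobis distances
mcd = MinCovDet(random_state=0).fit(pts)
rd = np.sqrt(mcd.mahalanobis(pts))        # robust distances
cutoff = np.sqrt(chi2.ppf(0.975, df=3))   # Gaussian-based threshold
inliers = pts[rd <= cutoff]

# Fitting step: PCA on the cleaned points; the singular vector of the
# smallest singular value is the estimated plane normal
centered = inliers - inliers.mean(axis=0)
normal = np.linalg.svd(centered, full_matrices=False)[2][-1]
```

With the lifted points removed by the diagnostic stage, the final fit recovers a normal close to the true z axis.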

    A discordance analysis in manual labelling of urban mobile laser scanning data used for deep learning based semantic segmentation

    Labelled point clouds are crucial for training supervised Deep Learning (DL) methods used for semantic segmentation. The objective of this research is to quantify discordances between the labels assigned by different people, in order to assess whether such discordances can influence the success rates of a DL based semantic segmentation algorithm. An urban point cloud covering 30 m of road length in Santiago de Compostela (Spain) was labelled twice by ten people. Discordances between individuals and between rounds, and their significance, were calculated. In addition, a ratio test for discordance and concordance was proposed. Results show that most points were labelled with the same class by all people; however, many points were labelled with two or more classes. The class curb presented 5.9% discordant points, and 3.2 discordances for each point with concordance by all people. In addition, the percentage of significant labelling differences for the class curb was 86.7% when comparing all people within the same round, and 100% when comparing the two rounds of each person. Analysing the semantic segmentation results of a DL based algorithm, PointNet++, the percentage of concordant points is related to the F-score (R2 = 0.765), showing that manual labelling has a significant impact on the results of DL-based semantic segmentation methods.
    Funding: Xunta de Galicia | Ref. ED481B-2019-061; Ministerio de Ciencia e Innovación | Ref. PID2019-105221RB-C43; Ministère de l’Economie of the G. D. of Luxembourg | Ref. SOLSTICE 2019-05-030-24; Universidade de Vigo/CISU
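The concordance/discordance bookkeeping behind such a study can be illustrated on a toy labelling matrix. All labels, class names and counts below are invented for the sketch and are unrelated to the Santiago de Compostela data.

```python
import numpy as np

# Hypothetical labels: 3 annotators x 8 points
labels = np.array([
    ["road", "road", "curb", "curb", "car", "road", "curb", "road"],
    ["road", "road", "curb", "road", "car", "road", "curb", "road"],
    ["road", "road", "road", "curb", "car", "road", "curb", "road"],
])

# A point is concordant when every annotator assigned it the same class
concordant = np.all(labels == labels[0], axis=0)
pct_concordant = 100.0 * concordant.mean()

# Class-wise discordance: among points any annotator called "curb",
# what share show disagreement?
is_curb = np.any(labels == "curb", axis=0)
curb_discordance = 100.0 * (~concordant[is_curb]).mean()
```

Such per-class percentages can then be correlated against per-class F-scores of a trained segmentation network, as the study does.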

    Resampling methods for a reliable validation set in deep learning based point cloud classification

    A validation data set plays a pivotal role in tuning a machine learning model trained in a supervised manner. Many existing algorithms select a part of the available data by random sampling to produce a validation set. However, this approach can be prone to overfitting. One should follow careful data splitting to obtain reliable training and validation sets that can produce a generalized model with good performance on unseen (test) data. Data splitting based on resampling techniques involves repeatedly drawing samples from the available data. Hence, resampling methods can give better generalization power to a model, because they can produce and use many training and/or validation sets. These techniques are computationally expensive, but with increasingly available high-performance computing facilities one can exploit them. Though a multitude of resampling methods exist, investigation of their influence on the generality of deep learning (DL) algorithms is limited due to their non-linear, black-box nature. This paper contributes by: (1) investigating the generalization capability of the four most popular resampling methods: k-fold cross-validation (k-CV), repeated k-CV (Rk-CV), Monte Carlo CV (MC-CV) and bootstrap, for creating training and validation data sets used for developing, training and validating DL based point cloud classifiers (e.g., PointNet; Qi et al., 2017a); (2) justifying Mean Square Error (MSE) as a statistically consistent estimator; and (3) exploring the use of MSE as a reliable performance metric for supervised DL. Experiments are performed on both synthetic and real-world aerial laser scanning (ALS) point clouds.
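The four resampling schemes compared above map directly onto standard library splitters. The sketch below uses scikit-learn's iterators with arbitrary split counts and seeds; the bootstrap is drawn by hand, with out-of-bag points serving as the validation set.

```python
import numpy as np
from sklearn.model_selection import KFold, RepeatedKFold, ShuffleSplit

X = np.arange(100).reshape(-1, 1)  # stand-in for per-point features

kcv = KFold(n_splits=5, shuffle=True, random_state=0)                # k-CV
rkcv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)        # Rk-CV
mccv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)      # MC-CV

splits = {name: [(tr, va) for tr, va in cv.split(X)]
          for name, cv in [("k-CV", kcv), ("Rk-CV", rkcv), ("MC-CV", mccv)]}

# Bootstrap: sample n points with replacement; out-of-bag points validate
rng = np.random.default_rng(0)
boot_train = rng.choice(len(X), size=len(X), replace=True)
boot_val = np.setdiff1d(np.arange(len(X)), boot_train)  # ~36.8% of points
```

Each (train, validation) pair would then feed one training run of the classifier, and the validation scores are aggregated across splits.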

    A TWO-STEP FEATURE EXTRACTION ALGORITHM: APPLICATION TO DEEP LEARNING FOR POINT CLOUD CLASSIFICATION

    Most deep learning (DL) methods that are not end-to-end use several multi-scale and multi-type hand-crafted features that make the network more complex, more computationally intensive and more vulnerable to overfitting. Furthermore, reliance on empirically based feature dimensionality reduction may lead to misclassification. In contrast, efficient feature management can reduce storage and computational complexity, build better classifiers, and improve overall performance. Principal Component Analysis (PCA) is a well-known dimension reduction technique that has been used for feature extraction. This paper presents a two-step PCA based feature extraction algorithm that employs a variant of feature-based PointNet (Qi et al., 2017a) for point cloud classification. It extends the PointNet framework for use on large-scale aerial LiDAR data and contributes by (i) developing a new feature extraction algorithm, (ii) exploring the impact of dimensionality reduction in feature extraction, and (iii) introducing a non-end-to-end PointNet variant for per-point classification in point clouds. This is demonstrated on aerial laser scanning (ALS) point clouds. The algorithm successfully reduces the dimension of the feature space without sacrificing performance, as benchmarked against the original PointNet algorithm. When tested on the well-known Vaihingen data set, the proposed algorithm achieves an Overall Accuracy (OA) of 74.64% using 9 input vectors and 14 shape features, whereas with the same 9 input vectors and only 5 PCs (principal components built from the 14 shape features) it achieves a higher OA of 75.36%, demonstrating the effect of efficient dimensionality reduction.
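The second step, compressing the hand-crafted shape features into a handful of principal components before they enter the network, can be sketched as follows. The random matrix stands in for real per-point shape features (linearity, planarity, sphericity, and so on); the 14-to-5 reduction mirrors the setting reported above, but the numbers are otherwise illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# Stand-in for 14 per-point hand-crafted shape features computed over
# local neighbourhoods of an ALS point cloud
shape_features = rng.normal(size=(1000, 14))

# Compress the 14 features into 5 principal components; these 5 PCs
# (plus the raw input vectors) would feed the PointNet-style classifier
pca = PCA(n_components=5).fit(shape_features)
pcs = pca.transform(shape_features)
```

Keeping only the leading components trades a small loss of feature variance for a smaller, less overfitting-prone network input.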

    Robust Approach for Urban Road Surface Extraction Using Mobile Laser Scanning Data

    Road surface extraction is crucial for 3D city analysis. Mobile laser scanning (MLS) is the most appropriate data acquisition system for the road environment because of its efficient vehicle-based, on-road scanning. Many methods are available for road pavement, curb and roadside-way extraction. Most of them use classical approaches that do not mitigate the problems caused by noise and outliers. In practice, laser scanning point clouds are not free from noise and outliers, and even a very small proportion of them can produce unreliable and non-robust results. A road surface usually consists of three key parts: road pavement, curb and roadside way. This paper investigates the problem of road surface extraction in the presence of noise and outliers, and proposes a robust algorithm for road pavement, curb, road divider/island and roadside-way extraction using MLS point clouds. The proposed algorithm employs robust statistical approaches to remove the consequences of noise and outliers. It consists of five sequential steps for separating road ground from non-ground surfaces and determining road-related components. Demonstrations on two different MLS data sets show that the new algorithm is efficient at extracting the road surface and classifying road pavement, curb, road divider/island and roadside way. In one experiment in this paper, extracting curb points, the results achieve a precision of 97.28%, a recall of 100% and a Matthews correlation coefficient of 0.986.
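Metrics of the kind reported above can be computed directly from a binary confusion matrix; the counts below are illustrative only, not the paper's curb experiment.

```python
def precision_recall_mcc(tp, fp, fn, tn):
    """Precision, recall and Matthews correlation coefficient (MCC)
    from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = (tp * tn - fp * fn) / den if den else 0.0
    return precision, recall, mcc

# Hypothetical curb-vs-rest counts for illustration
p, r, mcc = precision_recall_mcc(tp=90, fp=10, fn=0, tn=900)
```

MCC is a useful companion to precision and recall here because curb points are a small minority class, and MCC accounts for true negatives as well.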