
    Diagnostic-robust statistical analysis for Local Surface Fitting in 3D Point Cloud Data

    Objectives: Surface reconstruction and fitting of geometric primitives for three-dimensional (3D) modeling is a fundamental task in photogrammetry and reverse engineering. However, it is impractical to acquire point cloud data free of outliers and noise. Noise introduced in the data acquisition process produces rough, uneven surfaces and reduces the precision and accuracy of the acquired model. This paper investigates the problem of local surface reconstruction and best fitting from unorganized, outlier-contaminated 3D point cloud data. Methods: Least Squares (LS), Principal Component Analysis (PCA) and RANSAC are the three most popular techniques for fitting planar surfaces to 2D and 3D data. All three methods are affected by outliers and do not give reliable and robust parameter estimates. In the statistics literature, robust techniques and outlier diagnostics are complementary approaches, but neither alone is sufficient for outlier detection and robust parameter estimation. We propose a diagnostic-robust statistical algorithm that combines both approaches for fitting planar surfaces in the presence of outliers. Robust distance is used as a multivariate diagnostic technique for outlier detection, and robust PCA is used as an outlier-resistant technique for plane fitting. The robust distance is a robustification of the well-known Mahalanobis distance using the recently introduced high-breakdown Minimum Covariance Determinant (MCD) location and scatter estimates. Classical PCA measures data variability through the variance, and the corresponding directions are the latent vectors, which are sensitive to outlying observations. In contrast, robust PCA, which combines the 'projection pursuit' approach with a robust scatter matrix based on the MCD of the covariance matrix, is robust against outlying observations in the dataset.
In addition, robust PCA produces graphical displays of orthogonal distance and score distance as by-products, which can detect outliers and aid better robust fitting by applying robust PCA a second time in the final plane fitting stage. In summary, the proposed method removes the outliers first and then fits the local surface in a robust way. Results and conclusions: We present a new diagnostic-robust statistical technique for local surface fitting in 3D point cloud data. The benefits of the new diagnostic-robust algorithm are demonstrated on an artificial dataset and several terrestrial mobile mapping laser scanning point cloud datasets. Comparative results show that the classical LS and PCA methods are very sensitive to outliers and fail to fit planes reliably. The RANSAC algorithm is not completely free from the effect of outliers and requires more processing time for large datasets. The proposed method smooths away noise and is significantly better and more efficient than the other three methods for local planar surface fitting, even in the presence of roughness. The method is also applicable to 3D straight-line fitting and has great potential for local normal estimation and other types of surface fitting.
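The diagnose-then-fit idea in this abstract can be sketched in numpy. The sketch below is illustrative, not the paper's exact algorithm: robust distances come from a few C-step iterations of the kind used inside (Det)MCD, started deterministically from the points closest to the coordinatewise median, and the synthetic plane data and the chi-square cutoff are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# 80 inlier points near the plane z = 0.5x + 0.2y, plus 20% gross outliers
xy = rng.uniform(-1, 1, size=(80, 2))
z = 0.5 * xy[:, 0] + 0.2 * xy[:, 1] + rng.normal(0, 0.01, 80)
pts = np.vstack([np.column_stack([xy, z]),
                 rng.uniform(2, 4, size=(20, 3))])

n, p = pts.shape
h = (n + p + 1) // 2
# deterministic start: the h points closest to the coordinatewise median
d0 = np.linalg.norm(pts - np.median(pts, axis=0), axis=1)
subset = np.argsort(d0)[:h]
for _ in range(10):                  # C-steps: re-estimate, re-select
    mu = pts[subset].mean(axis=0)
    cov = np.cov(pts[subset].T)
    md2 = np.einsum('ij,jk,ik->i', pts - mu, np.linalg.inv(cov), pts - mu)
    subset = np.argsort(md2)[:h]

rd = np.sqrt(md2)                    # robust (MCD-style) distances
cutoff = np.sqrt(9.35)               # sqrt of chi-square(3) 97.5% quantile, a common choice
clean = pts[rd <= cutoff]

# plane fit on the cleaned points: PCA normal = direction of least variance
c = clean - clean.mean(axis=0)
normal = np.linalg.svd(c, full_matrices=False)[2][-1]
```

Because the raw MCD scatter is computed from the central half of the data, the distances are somewhat inflated without a consistency correction, so the cutoff here is only a rough flagging rule; the gross outliers are still separated from the inliers by orders of magnitude.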

    Challenges of Big Data Analysis

    Big Data bring new opportunities to modern society and new challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement error. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive a paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set, and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity; they can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.
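The spurious-correlation effect mentioned in this abstract is easy to reproduce numerically. The toy example below is my own, not from the article: with far more variables than samples, some purely independent predictor will correlate strongly with the response by chance alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5000
X = rng.normal(size=(n, p))          # predictors, mutually independent
y = rng.normal(size=n)               # response, independent of every column of X

# sample correlation of each column with y
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
corr = Xc.T @ yc / n
print(f"max |correlation| = {np.abs(corr).max():.2f}")
```

Even though the true correlation of every column with y is zero, the maximum absolute sample correlation is typically well above 0.5 at this dimensionality, which is why naive variable screening on Big Data picks up spurious relationships.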

    Robust statistical approaches for local planar surface fitting in 3D laser scanning data

    This paper proposes robust methods for local planar surface fitting in 3D laser scanning data. Searching through the literature revealed that many authors frequently used Least Squares (LS) and Principal Component Analysis (PCA) for point cloud processing without any treatment of outliers. It is known that LS and PCA are sensitive to outliers and can give inconsistent and misleading estimates. RANdom SAmple Consensus (RANSAC) is one of the best-known robust methods used for model fitting when noise and/or outliers are present. We concentrate on the recently introduced Deterministic Minimum Covariance Determinant estimator and robust PCA, and propose two variants of statistically robust algorithms for fitting planar surfaces to 3D laser scanning point cloud data. The performance of the proposed robust methods is demonstrated by qualitative and quantitative analysis on several synthetic and mobile laser scanning 3D data sets for different applications. Using simulated data, and comparisons with LS, PCA, RANSAC, variants of RANSAC and other robust statistical methods, we demonstrate that the new algorithms are significantly more efficient, faster, and produce more accurate fits and robust local statistics (e.g. surface normals), necessary for many point cloud processing tasks. Consider one example data set consisting of 100 points with 20% outliers representing a plane. The proposed methods, called DetRD-PCA and DetRPCA, produce bias angles (the angle between the planes fitted with and without outliers) of 0.20° and 0.24°, respectively, whereas LS, PCA and RANSAC produce much worse bias angles of 52.49°, 39.55° and 0.79°, respectively. In terms of speed, DetRD-PCA takes 0.033 s on average to fit a plane, approximately 6.5, 25.4 and 25.8 times faster than RANSAC and two other robust statistical methods, respectively.
The estimated robust surface normals and curvatures from the new methods have been used for plane fitting, sharp feature preservation and segmentation in 3D point clouds obtained from laser scanners. The results are significantly better, and more efficiently computed, than those obtained by existing methods.
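The "bias angle" metric quoted in this abstract (the angle between the planes fitted with and without the outliers) can be computed directly from PCA normals. The sketch below uses synthetic stand-in data, not the paper's data sets, and only illustrates how badly an unrobustified PCA fit tilts under 20% outlier contamination.

```python
import numpy as np

def pca_normal(pts):
    """Plane normal = direction of least variance (last right singular vector)."""
    c = pts - pts.mean(axis=0)
    return np.linalg.svd(c, full_matrices=False)[2][-1]

rng = np.random.default_rng(2)
xy = rng.uniform(-1, 1, size=(80, 2))
plane = np.column_stack([xy, 0.1 * xy[:, 0] + rng.normal(0, 0.01, 80)])
outliers = rng.uniform(1, 2, size=(20, 3))          # 20% gross outliers

n_clean = pca_normal(plane)                          # fit without outliers
n_dirty = pca_normal(np.vstack([plane, outliers]))   # fit with outliers
bias = np.degrees(np.arccos(np.clip(abs(n_clean @ n_dirty), 0.0, 1.0)))
print(f"PCA bias angle with 20% outliers: {bias:.1f} degrees")
```

On data like this the plain PCA bias angle is tens of degrees, which is the kind of failure the paper's robust variants are designed to avoid.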

    A procedure for robust estimation and diagnostics in regression

    We propose a new procedure for computing an approximation to regression estimates based on the minimization of a robust scale. The procedure can be applied with a large number of independent variables, where the usual methods based on resampling require infeasible or extremely costly computing time. An important advantage of the procedure is that it can be incorporated into any high-breakdown procedure and improve it with just a few seconds of computer time. The procedure minimizes the robust scale over a set of tentative parameter vectors. Each of these parameter vectors is obtained as follows. We represent each data point by the vector of changes in the least squares forecasts of that observation when each of the observations is deleted. The sets of possible outliers are then obtained as the extreme points of the principal components of these vectors, or as the set of points with large residuals. The good performance of the procedure allows the identification of multiple outliers while avoiding masking effects. The efficiency of the procedure for robust estimation and its power as an outlier detection tool are investigated in a simulation study and several examples.
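The deletion-forecast vectors underlying this representation can be computed cheaply from the hat matrix via the standard leave-one-out identity, change = H[i, j] * e_j / (1 - H[j, j]). The sketch below is a simplified variant, not the paper's procedure: it flags the points whose deletion vectors have the largest norm (essentially a DFFITS-type measure) rather than taking extremes of the principal components, and the data are my own.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, n)
y[:5] += 10.0                                # five gross outliers

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
e = y - H @ y                                # LS residuals
h = np.diag(H)
# column j: change in every fitted value when observation j is deleted
D = H * (e / (1 - h))
influence = np.linalg.norm(D, axis=0)        # size of each point's deletion vector
flagged = np.argsort(influence)[-5:]         # most influential points
```

Computing all n deletion vectors this way needs only one LS fit, which is what makes the procedure feasible when resampling-based high-breakdown methods are too expensive.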

    Factor PD-Clustering

    Factorial clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of the data and a clustering of the transformed data, optimizing a common criterion. Factor PD-clustering is based on Probabilistic Distance clustering (PD-clustering), an iterative, distribution-free, probabilistic clustering method. Factor PD-clustering makes a linear transformation of the original variables into a reduced number of orthogonal ones, using a criterion common with PD-clustering. It is demonstrated that the Tucker3 decomposition yields this transformation. Factor PD-clustering alternates a Tucker3 decomposition and a PD-clustering on the transformed data until convergence. This approach can significantly improve algorithm performance and allows working with large datasets, improving the stability and robustness of the method.
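The PD-clustering step that Factor PD-clustering alternates with the Tucker3 decomposition can be sketched as follows, using the probability and centre updates of Ben-Israel and Iyigun's probabilistic distance clustering. The synthetic data, the fixed iteration count, and the initialisation (one point near each cluster, offset so no centre coincides with a data point) are illustrative assumptions.

```python
import numpy as np

def pd_clustering(X, centers, iters=50):
    """Iterate the PD-clustering probability and centre updates."""
    for _ in range(iters):
        # d[i, k]: distance of point i to centre k (floored to avoid division by zero)
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-9
        # p[i, k] proportional to the product of distances to the OTHER centres,
        # so nearer centres receive higher probability
        prod = np.prod(d, axis=1, keepdims=True)
        p = prod / d
        p = p / p.sum(axis=1, keepdims=True)
        u = p ** 2 / d                           # weights for the centre update
        centers = (u.T @ X) / u.sum(axis=0)[:, None]
    return centers, p

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),   # cluster around (0, 0)
               rng.normal(5.0, 0.3, (100, 2))])  # cluster around (5, 5)
centers, p = pd_clustering(X, X[[0, -1]] + 0.1)
```

Unlike k-means, the method is distribution-free and assigns every point a full probability vector over clusters rather than a hard label, which is the property the factorial extension exploits.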

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
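One standard bridge between the algorithmic and statistical perspectives for the column-selection problem this chapter discusses is leverage-score sampling: columns are drawn with probabilities proportional to their statistical leverage in a low-rank basis. The numpy sketch below is a generic illustration, not the chapter's specific method; the rank parameter and the synthetic matrix are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
# a 50 x 200 matrix whose signal lives in the first 10 columns
A = rng.normal(size=(50, 200)) * 0.1
A[:, :10] += rng.normal(size=(50, 10)) * 3.0

k = 10
_, _, Vt = np.linalg.svd(A, full_matrices=False)
lev = (Vt[:k] ** 2).sum(axis=0)          # rank-k leverage score of each column
probs = lev / lev.sum()                  # leverage scores sum to k; normalise
picked = rng.choice(A.shape[1], size=10, replace=False, p=probs)
```

Because the leverage scores concentrate on the columns that carry the low-rank signal, the sampled set recovers the informative features with high probability, which is the statistical interpretation of what is otherwise a worst-case matrix algorithm.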

    Sensitivity of principal Hessian direction analysis

    We provide sensitivity comparisons for two competing versions of the dimension reduction method principal Hessian directions (pHd). These comparisons consider the effects of small perturbations on the estimation of the dimension reduction subspace via the influence function. We show that the two versions of pHd can behave completely differently in the presence of certain observational types. Our results also provide evidence that outliers in the traditional sense may or may not be highly influential in practice. Since influential observations may lurk within otherwise typical data, we consider the influence function in the empirical setting for the efficient detection of influential observations in practice. Comment: Published at http://dx.doi.org/10.1214/07-EJS064 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
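The abstract's point that a traditional outlier need not be influential has a simple empirical analogue in regression (a toy illustration of influence, not pHd itself): a large-residual point at the centre of the design barely moves the LS slope, while the same-sized residual at the edge moves it substantially.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 50)
y = 1.0 + 0.5 * x + rng.normal(0, 0.1, 50)

base = slope(x, y)
y_mid, y_edge = y.copy(), y.copy()
y_mid[25] += 10.0        # large-residual outlier near the centre of the x range
y_edge[-1] += 10.0       # same-sized outlier at the edge (high leverage)
shift_mid = abs(slope(x, y_mid) - base)
shift_edge = abs(slope(x, y_edge) - base)
```

The centre outlier is glaring by its residual yet nearly harmless to the slope estimate; this is why influence-function-based diagnostics, rather than residual size alone, are needed to find the observations that actually perturb the fit.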