7,954 research outputs found

    Robust Statistics

    The first example involves the real data given in Table 1, which are the results of an interlaboratory test. The boxplots are shown in Fig. 1, where the dotted line denotes the mean of the observations and the solid line the median. We note that only the results of Laboratories 1 and 3 lie below the mean, whereas all the remaining laboratories return larger values. In the case of the median, 7 of the readings coincide with the median, 24 readings are smaller and 24 are larger. A glance at Fig. 1 suggests that, in the absence of further information, Laboratories 1 and 3 should be treated as outliers. This is the course which we recommend, although the issues involved require careful thought. For the moment we note simply that the median is a robust statistic whereas the mean is not.
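The contrast between the mean and the median drawn in this abstract can be illustrated with a minimal sketch. The readings below are hypothetical values chosen for illustration, not the Table 1 data, and the helper name `summarize` is likewise an assumption.

```python
import statistics

# Hypothetical readings (NOT the Table 1 data): six laboratories agree
# closely, and the last two report clearly lower values.
readings = [10.1, 10.2, 10.0, 10.3, 10.1, 10.2, 4.0, 3.5]


def summarize(values):
    """Return (mean, median) of a sequence of numbers."""
    return statistics.mean(values), statistics.median(values)


# Location estimates with and without the two low readings.
mean_all, median_all = summarize(readings)
mean_clean, median_clean = summarize(readings[:-2])

print("mean shift:  ", abs(mean_clean - mean_all))
print("median shift:", abs(median_clean - median_all))
```

In this toy data the two low readings move the mean by well over one unit, while the median barely changes, which is the sense in which the median is called robust.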

    Partial robust M-regression.

    Partial Least Squares (PLS) is a standard statistical method in chemometrics. It can be considered as an incomplete, or 'partial', version of the Least Squares estimator of regression, applicable when high or perfect multicollinearity is present in the predictor variables. The Least Squares estimator is well known to be an optimal estimator for regression, but only when the error terms are normally distributed. In the absence of normality, and in particular when outliers are in the data set, other more robust regression estimators have better properties. In this paper a 'partial' version of M-regression estimators will be defined. If an appropriate weighting scheme is chosen, partial M-estimators become entirely robust to any type of outlying points, and are called Partial Robust M-estimators. It is shown that partial robust M-regression outperforms existing methods for robust PLS regression in terms of statistical precision and computational speed, while keeping good robustness properties. The method is applied to a data set consisting of EPXMA spectra of archaeological glass vessels. This data set contains several outliers, and the advantages of partial robust M-regression are illustrated. Applying partial robust M-regression yields much smaller prediction errors for noisy calibration samples than PLS. On the other hand, if the data follow a normal model perfectly, the loss in efficiency is very small.
    Keywords: Advantages; Applications; Calibration; Data; Distribution; Efficiency; Estimator; Least-squares; M-estimators; Methods; Model; Optimal; Ordinary least squares; Outliers; Partial least squares; Precision; Prediction; Projection-pursuit; Regression; Robust regression; Robustness; Simulation; Spectometric quantization; Squares; Studies; Variables; Yield
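To make the idea of M-regression more concrete, here is a minimal sketch of an ordinary (not partial) Huber M-estimator for a straight line, fitted by iteratively reweighted least squares. It is not the partial robust M-regression algorithm of the paper: there is no latent-variable step and no leverage weighting. The data, the function names, and the tuning constant `c = 1.345` are assumptions made for illustration.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line by iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from least squares
    for _ in range(iterations):
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute deviation, rescaled so that
        # it estimates the standard deviation under normal errors.
        med = statistics.median(residuals)
        scale = 1.4826 * statistics.median([abs(r - med) for r in residuals])
        if scale == 0.0:
            break
        # Huber weights: 1 for small residuals, decreasing for large ones.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r)
             for r in residuals]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Hypothetical data: y = 2 + 3x plus small noise, with one vertical outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05]
y = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, noise)]
y[9] += 30.0

ols_a, ols_b = weighted_line_fit(x, y, [1.0] * len(x))
rob_a, rob_b = huber_line_fit(x, y)
print("least squares slope:", ols_b)
print("Huber slope:        ", rob_b)
```

In this toy example the least squares slope is pulled well away from 3 by the single outlier, while the Huber fit stays close to it. Plain Huber weights of this kind protect only against vertical outliers, not against leverage points.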

    Evaluating the Differences of Gridding Techniques for Digital Elevation Models Generation and Their Influence on the Modeling of Stony Debris Flows Routing: A Case Study From Rovina di Cancia Basin (North-Eastern Italian Alps)

    Debris flows are among the most hazardous phenomena in mountain areas. To cope with debris flow hazard, it is common to delineate the risk-prone areas through routing models. The most important input to debris flow routing models are the topographic data, usually in the form of Digital Elevation Models (DEMs). The quality of DEMs depends on the accuracy, density, and spatial distribution of the sampled points; on the characteristics of the surface; and on the applied gridding methodology. Therefore, the choice of the interpolation method affects the realistic representation of the channel and fan morphology, and thus potentially the debris flow routing modeling outcomes. In this paper, we first investigate the performance of common interpolation methods (i.e., linear triangulation, natural neighbor, nearest neighbor, Inverse Distance to a Power, ANUDEM, Radial Basis Functions, and ordinary kriging) in building DEMs with the complex topography of a debris flow channel located in the Venetian Dolomites (North-eastern Italian Alps), by using small-footprint full-waveform Light Detection And Ranging (LiDAR) data. The investigation is carried out through a combination of statistical analysis of vertical accuracy, algorithm robustness, spatial clustering of vertical errors, and multi-criteria shape reliability assessment. After that, we examine the influence of the tested interpolation algorithms on the performance of a Geographic Information System (GIS)-based cell model for simulating stony debris flow routing. In detail, we investigate both the correlation between the uncertainty in DEM heights resulting from the gridding procedure and the uncertainty in the corresponding simulated erosion/deposition depths, and the effect of interpolation algorithms on simulated areas, erosion and deposition volumes, solid-liquid discharges, and channel morphology after the event.
    The comparison among the tested interpolation methods highlights that the ANUDEM and ordinary kriging algorithms are not suitable for building DEMs with complex topography. Conversely, the linear triangulation, the natural neighbor algorithm, and the thin-plate spline plus tension and completely regularized spline functions ensure the best trade-off between accuracy and shape reliability. Nevertheless, the evaluation of the effects of gridding techniques on debris flow routing modeling reveals that the choice of the interpolation algorithm does not significantly affect the model outcomes.
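Of the gridding methods listed in this abstract, Inverse Distance to a Power is the simplest to write down. The following is a minimal sketch of the textbook form of that estimator, not the configuration used in the study; the sample points, the function name `idw`, and the default `power=2.0` are assumptions made for illustration.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse-distance-to-a-power estimate of elevation at (x, y).

    samples is a list of (xi, yi, zi) tuples: planimetric coordinates and
    elevation of each sampled point.
    """
    numerator = 0.0
    denominator = 0.0
    for xi, yi, zi in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            # The query coincides with a sample: return its elevation.
            return zi
        weight = 1.0 / distance ** power
        numerator += weight * zi
        denominator += weight
    return numerator / denominator


# Hypothetical elevation samples at the corners of a 2 x 2 square.
samples = [(0.0, 0.0, 100.0), (2.0, 0.0, 110.0),
           (0.0, 2.0, 120.0), (2.0, 2.0, 130.0)]

# Interpolate onto a small regular grid, as a gridding routine would.
grid = [[idw(samples, gx, gy) for gx in (0.0, 1.0, 2.0)]
        for gy in (0.0, 1.0, 2.0)]
for row in grid:
    print(["%.1f" % z for z in row])
```

A larger `power` makes each grid node depend more strongly on its nearest samples. Production gridding tools typically also restrict the search to a neighborhood of each node, which this sketch omits.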

    Forecasting Performance Of Cascade Forward Back Propagation Neural Network For Data With Outliers

    In this research, a clustering-based neural network was developed with the aim of investigating and comparing its performance with that of other modeling techniques in the case of deviation from the assumption of a homoscedastic relationship in the dataset.

    Predicción de los efectos coligativos en el sistema Agua + NaCl mediante Machine Learning

    Graphs, tables. The use of traditional models such as the modified Debye-Hückel model, the Pitzer model, MSE (Mixed-Solvent Electrolyte), or e-NRTL (Non-Random Two Liquid - Electrolyte) for predicting colligative effects in the Water + NaCl system is challenging. While these models have shown good results in terms of predictions, their statistical and computational implementation has required significant effort. On the other hand, certain Machine Learning algorithms have been studied for phase equilibrium prediction in systems with dissolved electrolytes. In this study, the implementation of three Machine Learning algorithms (Neural Networks, Least Squares Support Vector Machines, and Regression Decision Trees) was evaluated for predicting the decrease in melting temperature and saturation pressure of the Water + NaCl system. The results were compared with the prediction provided by an empirical variant of the Debye-Hückel model. Zero mean, normality, and residual independence tests were conducted for all models to statistically evaluate the regression results. It was found that machine learning models have the potential to predict colligative effects in electrolyte solutions, particularly the Regression Decision Tree model, which met all the assumptions studied for both effects and proved to be a reliable prediction tool. Finally, it was demonstrated that, computationally, the machine learning models were straightforward to implement, and that their use in new property prediction studies is a promising research area.
    (Text taken from the source.) Master's degree: Magíster en Ingeniería - Ingeniería Química

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. Outliers may be instances of error or may indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and to discover interesting and useful knowledge about unusual events within numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.