56,051 research outputs found

    RIDGE LEAST ABSOLUTE DEVIATION PERFORMANCE IN ADDRESSING MULTICOLLINEARITY AND DIFFERENT LEVELS OF OUTLIER SIMULTANEOUSLY

    Get PDF
    If there is multicollinearity and outliers in the data, the inference about parameter estimation in the LS method will deviate due to the inefficiency of this method in estimating. To overcome these two problems simultaneously, it can be done using robust regression, one of which is ridge least absolute deviation method. This study aims to evaluate the performance of the ridge least absolute deviation method in surmounting multicollinearity in divers sample sizes and percentage of outliers using simulation data. The Monte Carlo study was designed in a multiple regression model with multicollinearity (Ļ=0.99) between variables  and  and outliers 10%, 20%, 30% on response variables with different sample sizes (n = 25, 50,75,100,200; =0, and Ī²=1 otherwise). The existence of multicollinearity in the data is done by calculating the correlation value between the independent variables and the VIF value. Outlier detection is done by using boxplot. Parameter estimation was carried out using the RLAD and LS methods. Furthermore, a comparison of the MSE values of the two methods is carried out to see which method is better in overcoming multicollinearity and outliers. The results showed that RLAD had a lower MSE than LS. This signifies that RLAD is more precise in estimating the regression coefficients for each sample size and various outlier levels studied

    Outlier Detection Using Nonconvex Penalized Regression

    Full text link
    This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the nn data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1L_1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1L_1 penalty corresponds to soft thresholding. We introduce a thresholding (denoted by Ī˜\Theta) based iterative procedure for outlier detection (Ī˜\Theta-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that Ī˜\Theta-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most O(np)O(np) (and sometimes much less) avoiding an O(np2)O(np^2) least squares estimate. We describe the connection between Ī˜\Theta-IPOD and MM-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression coefficients. A data-dependent choice can be made based on BIC. The tuned Ī˜\Theta-IPOD shows outstanding performance in identifying outliers in various situations in comparison to other existing approaches. This methodology extends to high-dimensional modeling with pā‰«np\gg n, if both the coefficient vector and the outlier pattern are sparse

    Detecting Outliers in Data with Correlated Measures

    Full text link
    Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant amount of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by outliers. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Different from existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects outliers simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as the state of the art outlier detectors. Our outlier detection method achieves better performances, demonstrating the robustness and generality of our method. Last, we report interesting case studies on some outliers that result from atypical events.Comment: 10 page

    A Parametric Framework for the Comparison of Methods of Very Robust Regression

    Full text link
    There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter Ī»\lambda that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of Ī»\lambda, the variance and squared bias of five estimators and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining knowledge and better understanding of the properties of robust estimators.Comment: Published in at http://dx.doi.org/10.1214/13-STS437 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications

    Get PDF
    Wireless sensor networks monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in wireless sensor networks (WSNs). The advantages and disadvantages of each proposed algorithm are evaluated against the corresponding problem. We also provide a comparative guide to aid WSN designers in developing suitable machine learning solutions for their specific application challenges.Comment: Accepted for publication in IEEE Communications Surveys and Tutorial

    Data-driven Soft Sensors in the Process Industry

    Get PDF
    In the last two decades Soft Sensors established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks which are related to process control. This paper discusses characteristics of the process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is put on the data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though yet not completely realised, potential. A comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques as well as a discussion of some open issues in the Soft Sensor development and maintenance and their possible solutions are the main contributions of this work
    • ā€¦
    corecore