45,977 research outputs found

    Identification of Outlying Observations with Quantile Regression for Censored Data

    Outlying observations, which deviate significantly from other measurements, may distort the conclusions of data analysis. Identifying outliers is therefore an important step toward obtaining reliable results. While there are many statistical outlier detection algorithms and software programs for uncensored data, few are available for censored data. In this article, we propose three outlier detection algorithms based on censored quantile regression. Two are modified versions of existing algorithms for uncensored or censored data, while the third is a newly developed algorithm designed to overcome the shortcomings of previous approaches. The performance of the three algorithms was investigated in simulation studies. In addition, real data from the SEER database, which contains a variety of data sets related to various cancers, are analyzed to illustrate the usefulness of our methodology. The algorithms are implemented in an R package, OutlierDC, which can be conveniently used in the R environment and is freely available from CRAN.

    Detecting Outliers in Data with Correlated Measures

    Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. To use such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that exploits the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating the robustness and generality of our method. Last, we report interesting case studies on some outliers that result from atypical events. Comment: 10 pages.
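    The idea of detecting outliers while fitting a regression on correlated measures can be illustrated with a minimal sketch. The code below is not the paper's model: it swaps in a simple iterative trimmed least-squares loop with a MAD-based residual threshold, and the trip data, the function names, and the cutoff `k` are all hypothetical choices made for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def trimmed_fit(xs, ys, k=3.0, max_iter=20):
    """Alternate between fitting on current inliers and re-flagging outliers.

    A point is flagged when its absolute residual exceeds k times a robust
    scale estimate (1.4826 * median absolute residual). Hypothetical helper,
    not taken from the paper.
    """
    inlier = [True] * len(xs)
    for _ in range(max_iter):
        slope, intercept = fit_line(
            [x for x, keep in zip(xs, inlier) if keep],
            [y for y, keep in zip(ys, inlier) if keep],
        )
        resid = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        scale = 1.4826 * statistics.median([abs(r) for r in resid])
        scale = max(scale, 1e-12)  # guard against a zero scale on exact fits
        new_inlier = [abs(r) <= k * scale for r in resid]
        if new_inlier == inlier:  # flags stopped changing: converged
            break
        inlier = new_inlier
    outliers = [i for i, keep in enumerate(inlier) if not keep]
    return slope, intercept, outliers

# Made-up trips: time is roughly 3 * distance + 2, plus small noise.
distance = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.2]
time = [3 * d + 2 + e for d, e in zip(distance, noise)]
time[3] = 60.0  # injected outlier: far too slow for the distance
time[7] = 5.0   # injected outlier: far too fast for the distance

slope, intercept, outliers = trimmed_fit(distance, time)
print(slope, intercept, outliers)
```

    On this toy data the loop flags the two injected trips and recovers a slope close to 3. A hard-threshold scheme like this one is only a stand-in for a model that represents the outliers explicitly.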

    Bayesian outlier detection in Capital Asset Pricing Model

    We propose a novel Bayesian optimisation procedure for outlier detection in the Capital Asset Pricing Model. We use a parametric product partition model to robustly estimate the systematic risk of an asset. We assume that the returns follow independent normal distributions and we impose a partition structure on the parameters of interest. The partition structure imposed on the parameters induces a corresponding clustering of the returns. We identify via an optimisation procedure the partition that best separates standard observations from the atypical ones. The methodology is illustrated with reference to a real data set, for which we also provide a microeconomic interpretation of the detected outliers.

    Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox

    Pearson’s correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab®-based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down-weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand.
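    The claim that a single outlier can badly distort Pearson's correlation can be checked with a small sketch. The example below does not use the toolbox and does not implement the percentage-bend or skipped correlations; as a simpler stand-in it compares Pearson's r with Spearman's rank correlation on made-up data, and all function and variable names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_clean = [2 * v + 1 for v in x]  # perfectly linear relationship
y_outlier = list(y_clean)
y_outlier[4] = 100                # one gross outlier in the middle

r_clean = pearson(x, y_clean)
r_pearson = pearson(x, y_outlier)
r_spearman = spearman(x, y_outlier)
print(r_clean, r_pearson, r_spearman)
```

    With this data, one contaminated point pulls Pearson's r from 1 down to below 0.2, while the rank-based estimate stays above 0.8. Rank correlation only limits the outlier's influence; methods that remove or down-weight outliers, as the abstract describes, address it more directly.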