Identification of Outlying Observations with Quantile Regression for Censored Data
Outlying observations, which significantly deviate from other measurements,
may distort the conclusions of data analysis. Therefore, identifying outliers
is one of the important problems that should be solved to obtain reliable
results. While there are many statistical outlier detection algorithms and
software programs for uncensored data, few are available for censored data. In
this article, we propose three outlier detection algorithms based on censored
quantile regression, two of which are modified versions of existing algorithms
for uncensored or censored data, while the third is a newly developed algorithm
to overcome the limitations of previous approaches. The performance of the three
algorithms was investigated in simulation studies. In addition, real data from
the SEER database, which contains a variety of data sets related to various
cancers, are analyzed to show the usefulness of our methodology. The
algorithms are implemented in an R package, OutlierDC, which can be
conveniently used in the R environment and is freely available from
CRAN.
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale
datasets. Such datasets can be extremely noisy and often contain a significant
number of outliers that result from sensor malfunction or human operation
faults. In order to utilize such data for real-world applications, it is
critical to detect outliers so that models built from these datasets will not
be skewed by outliers.
In this paper, we propose a new outlier detection method that utilizes the
correlations in the data (e.g., taxi trip distance vs. trip time). Unlike
existing outlier detection methods, we build a robust regression model
that explicitly models the outliers and detects outliers simultaneously with
the model fitting.
We validate our approach on real-world datasets against methods specifically
designed for each dataset as well as state-of-the-art outlier detectors.
Our outlier detection method achieves better performance, demonstrating the
robustness and generality of our method. Last, we report interesting case
studies on some outliers that result from atypical events.
Comment: 10 page
Bayesian outlier detection in Capital Asset Pricing Model
We propose a novel Bayesian optimisation procedure for outlier detection in
the Capital Asset Pricing Model. We use a parametric product partition model to
robustly estimate the systematic risk of an asset. We assume that the returns
follow independent normal distributions and we impose a partition structure on
the parameters of interest. The partition structure imposed on the parameters
induces a corresponding clustering of the returns. We identify via an
optimisation procedure the partition that best separates standard observations
from the atypical ones. The methodology is illustrated with reference to a real
data set, for which we also provide a microeconomic interpretation of the
detected outliers.
Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox
Pearson’s correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab(R)-based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped-correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down-weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand.
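The claim that a single outlier can badly distort Pearson's correlation is easy to check numerically. The sketch below uses made-up data and plain Python. For contrast it uses Spearman's rank correlation, which is a swapped-in example of a less outlier-sensitive measure and not one of the toolbox's methods (percentage-bend and skipped correlations).

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up, nearly linear data.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0, 9.2, 10.1]

# The same data plus one bivariate outlier.
x_out = x + [11]
y_out = y + [-40.0]

print("clean:        pearson=%.3f spearman=%.3f" % (pearson(x, y), spearman(x, y)))
print("with outlier: pearson=%.3f spearman=%.3f"
      % (pearson(x_out, y_out), spearman(x_out, y_out)))
```

On these numbers the added point turns a near-perfect positive Pearson coefficient negative, while the rank-based coefficient is reduced but stays positive.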