Search CORE

2 research outputs found

Outlier detection algorithms over fuzzy data with weighted least squares

Author: Kolev Kraszimir
Nikolova N
Rodríguez RM
Symes M
Tenekedjiev K
Toneva D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021
Field of study

In the classical leave-one-out procedure for outlier detection in regression analysis, we exclude an observation and then construct a model on the remaining data. If the difference between predicted and observed value is high we declare this value an outlier. As a rule, those procedures utilize single comparison testing. The problem becomes much harder when the observations can be associated with a given degree of membership to an underlying population, and the outlier detection should be generalized to operate over fuzzy data. We present a new approach for outlier detection that operates over fuzzy data using two inter-related algorithms. Due to the way outliers enter the observation sample, they may be of various order of magnitude. To account for this, we divided the outlier detection procedure into cycles. Furthermore, each cycle consists of two phases. In Phase 1, we apply a leave-one-out procedure for each non-outlier in the dataset. In Phase 2, all previously declared outliers are subjected to Benjamini–Hochberg step-up multiple testing procedure controlling the false-discovery rate, and the non-confirmed outliers can return to the dataset. Finally, we construct a regression model over the resulting set of non-outliers. In that way, we ensure that a reliable and high-quality regression model is obtained in Phase 1 because the leave-one-out procedure comparatively easily purges the dubious observations due to the single comparison testing. In the same time, the confirmation of the outlier status in relation to the newly obtained high-quality regression model is much harder due to the multiple testing procedure applied hence only the true outliers remain outside the data sample. The two phases in each cycle are a good trade-off between the desire to construct a high-quality model (i.e., over informative data points) and the desire to use as much data points as possible (thus leaving as much observations as possible in the data sample). The number of cycles is user defined, but the procedures can finalize the analysis in case a cycle with no new outliers is detected. We offer one illustrative example and two other practical case studies (from real-life thrombosis studies) that demonstrate the application and strengths of our algorithms. In the concluding section, we discuss several limitations of our approach and also offer directions for future research

University of Tasmania Open Access Repository

Semmelweis Repository

Outlier detection algorithms over fuzzy data with weighted least squares

Author: Kolev K
Nikolova N
Rodriguez RM
Symes M
Tenekedjiev K
Toneva D
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2021
Field of study

University of Tasmania Open Access Repository

Repository of the Academy's Library

Semmelweis Repository