21,356 research outputs found
A Rough Set Approach to Spatio-temporal Outlier Detection
Abstract. Detecting outliers that are grossly different from or inconsistent with the rest of a spatio-temporal dataset is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we address the outlier detection problem in spatio-temporal data and describe a rough set approach that finds the top outliers in an unlabeled spatio-temporal dataset. The proposed method, called Rough Outlier Set Extraction (ROSE), relies on a rough set theoretic representation of the outlier set using the rough set approximations, i.e. the lower and upper approximations. We also introduce a new set, called the Kernel set, a representative subset of the original dataset that is significant for outlier detection. Experimental results on real-world datasets demonstrate the method's superiority over results obtained by various clustering algorithms. It is also shown that the kernel set detects the same outlier set in much less computation time.
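The lower and upper approximations mentioned in the abstract above can be illustrated with a minimal sketch. This is the textbook rough set construction, not the ROSE algorithm itself, which the abstract does not specify. The function name `approximations`, the toy `records`, and the `suspected` set are all hypothetical.

```python
from collections import defaultdict


def approximations(objects, key, target):
    """Return the (lower, upper) rough approximations of `target`.

    objects: iterable of hashable object identifiers
    key:     function mapping an object to its attribute signature;
             objects with equal signatures are indiscernible
    target:  set of objects to approximate
    """
    # Group objects into equivalence classes of the indiscernibility relation.
    classes = defaultdict(set)
    for obj in objects:
        classes[key(obj)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy data: each record has a (region, time band) signature.
records = {
    "a": ("north", "night"),
    "b": ("north", "night"),
    "c": ("south", "day"),
    "d": ("south", "night"),
}
suspected = {"a", "c"}  # assumed candidate outlier set

lower, upper = approximations(records, lambda r: records[r], suspected)
print(sorted(lower))  # objects that certainly belong to the set
print(sorted(upper))  # objects that possibly belong to the set
```

Here "c" is in the lower approximation because nothing else shares its signature, while "a" only reaches the upper approximation because it is indiscernible from "b", which is not in the suspected set.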
Analisis dan Implementasi Rough Set Outlier Factor (RSetOF) untuk Deteksi Outlier [Analysis and Implementation of Rough Set Outlier Factor (RSetOF) for Outlier Detection]
ABSTRACT: An outlier is a data object whose characteristics differ from those of most of the data. Outliers often contain unexpected knowledge, so in many knowledge discovery applications finding outliers is more interesting than finding inliers. Outlier detection is a data mining function that aims to find the outliers in a dataset. Many outlier detection methods exist, but most have difficulty handling the scalability of the data. This scalability problem makes distance-based measures unsuitable for finding outliers in high-dimensional data. RSetOF (Rough Set Outlier Factor) is a method for detecting outliers in high-dimensional data based on the Non-Reduct concept from the Rough Set approach. An RSetOF value is computed for each data object from the Non-Reduct rules to determine whether the object is an outlier. RSetOF detects outliers with reasonably good accuracy in several test scenarios, based on the measurement parameters RSetOF value and top n outliers and on the evaluation parameters detection rate and false positive rate. Keywords: outlier, RSetOF, data mining, scalability
Application of Computational Intelligence Techniques to Process Industry Problems
In the last two decades there has been considerable progress in the computational
intelligence research field. The fruits of this research effort
are powerful techniques for pattern recognition, data mining, data modelling, etc.
These techniques achieve high performance on traditional data sets like the UCI
machine learning database. Unfortunately, data sources of this kind usually represent
clean data without problems such as outliers, missing values, feature co-linearity,
etc. that are common in real-life industrial data. The presence of faulty data samples can have
very harmful effects on the models: if such samples are presented during training,
they can either cause sub-optimal performance of the trained model or, in the worst
case, destroy the knowledge the model has learnt so far. For these reasons the application
of present modelling techniques to industrial problems has developed into a research
field of its own. Based on a discussion of the properties and issues of the data and the
state-of-the-art modelling techniques in the process industry, this paper presents a novel
unified approach to the development of predictive models in the process industry.
Online Bivariate Outlier Detection in Final Test Using Kernel Density Estimation
In parametric IC testing, outlier detection is applied to filter out potentially unreliable devices. Most outlier detection methods are used in an offline setting and hence are not applicable to Final Test, where immediate pass/fail decisions are required. Therefore, we developed a new bivariate online outlier detection method that is applicable to Final Test and makes no assumptions about a specific form of relation between the two test parameters. An acceptance region is constructed using kernel density estimation. We use a grid discretization to enable a fast outlier decision. The grid is updated after each accepted device, so the method can adapt to shifting measurements.
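The idea in the abstract above (a kernel density estimate stored on a grid, a fast lookup for the pass/fail decision, and an update after each accepted device) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the Gaussian kernel, the square grid, the fixed density threshold, and all names (`GridKDE`, `bandwidth`, `threshold`) are hypothetical choices.

```python
import math


class GridKDE:
    """Bivariate Gaussian KDE stored on an n x n grid over [lo, hi]^2."""

    def __init__(self, lo, hi, n, bandwidth, threshold):
        self.lo, self.n = lo, n
        self.h = bandwidth
        self.threshold = threshold      # assumed minimum density for acceptance
        self.step = (hi - lo) / (n - 1)
        self.sums = [[0.0] * n for _ in range(n)]  # unnormalised kernel sums
        self.count = 0

    def _index(self, v):
        # Nearest grid node, clamped to the grid boundary.
        i = round((v - self.lo) / self.step)
        return min(max(i, 0), self.n - 1)

    def density(self, x, y):
        # Constant-time lookup: this is what makes the decision fast.
        if self.count == 0:
            return 0.0
        s = self.sums[self._index(x)][self._index(y)]
        return s / (self.count * 2 * math.pi * self.h ** 2)

    def update(self, x, y):
        # Add one Gaussian kernel centred at (x, y) to every grid node.
        for i in range(self.n):
            gx = self.lo + i * self.step
            for j in range(self.n):
                gy = self.lo + j * self.step
                d2 = (gx - x) ** 2 + (gy - y) ** 2
                self.sums[i][j] += math.exp(-d2 / (2 * self.h ** 2))
        self.count += 1

    def test(self, x, y):
        # Accept if the estimated density is high enough, then adapt the grid.
        ok = self.density(x, y) >= self.threshold
        if ok:
            self.update(x, y)
        return ok


# Hypothetical warm-up: 50 devices whose two test parameters lie on a diagonal.
kde = GridKDE(lo=0.0, hi=10.0, n=41, bandwidth=0.5, threshold=0.01)
for k in range(50):
    t = k / 49
    kde.update(4.0 + 2.0 * t, 4.0 + 2.0 * t)

accepted = kde.test(5.0, 5.0)   # inside the dense region
rejected = kde.test(9.0, 1.0)   # far from every accepted device
print(accepted, rejected)
```

In this sketch only accepted devices update the grid, so a rejected measurement cannot widen the acceptance region, while a slow drift of good devices moves it gradually.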
Data-driven Soft Sensors in the Process Industry
In the last two decades Soft Sensors have established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks related to process control. This paper discusses characteristics of process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though not yet completely realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale
datasets. Such datasets can be extremely noisy and often contain a significant
number of outliers that result from sensor malfunction or human operation
faults. In order to utilize such data for real-world applications, it is
critical to detect outliers so that models built from these datasets will not
be skewed by outliers.
In this paper, we propose a new outlier detection method that utilizes the
correlations in the data (e.g., taxi trip distance vs. trip time). Unlike
existing outlier detection methods, we build a robust regression model
that explicitly models the outliers and detects outliers simultaneously with
the model fitting.
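One generic way to detect outliers while fitting the regression is to give every observation its own shift term and penalise those shifts so that most stay at zero. The sketch below uses that idea with alternating least squares and soft-thresholding. It is a stand-in technique, not the model proposed in this paper, whose details the abstract does not give. The penalty `lam`, the function names, and the synthetic distance/time data are assumptions.

```python
def soft_threshold(v, lam):
    """Shrink v towards zero by lam; values within [-lam, lam] become 0."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def robust_fit(xs, ys, lam, iters=50):
    """Alternate between fitting the line and estimating per-point shifts.

    Model (an assumption): y_i = slope * x_i + intercept + shift_i + noise,
    with an L1 penalty lam * |shift_i| that keeps most shifts at exactly zero.
    Points with a non-zero shift are reported as outliers.
    """
    shifts = [0.0] * len(xs)
    for _ in range(iters):
        # Fit the line to the data with the current outlier shifts removed.
        slope, intercept = fit_line(xs, [y - o for y, o in zip(ys, shifts)])
        # Re-estimate each shift from its residual.
        shifts = [soft_threshold(y - (slope * x + intercept), lam)
                  for x, y in zip(xs, ys)]
    return slope, intercept, shifts


# Synthetic example: trip time = 3 * distance + 2, with one corrupted trip.
distance = [float(d) for d in range(1, 11)]
time = [3.0 * d + 2.0 for d in distance]
time[4] += 40.0  # hypothetical sensor fault on the fifth trip

slope, intercept, shifts = robust_fit(distance, time, lam=2.0)
flagged = [i for i, o in enumerate(shifts) if o != 0.0]
print(slope, intercept, flagged)
```

Because the L1 penalty caps the influence of any single point at `lam`, the fitted line stays close to the clean trend even though the corrupted trip is never removed from the data.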
We validate our approach on real-world datasets against methods specifically
designed for each dataset as well as state-of-the-art outlier detectors.
Our outlier detection method achieves better performance, demonstrating the
robustness and generality of our method. Lastly, we report interesting case
studies on some outliers that result from atypical events.
Comment: 10 pages