21,356 research outputs found

    A Rough Set Approach to Spatio-temporal Outlier Detection

    Abstract. Detecting outliers that are grossly different from, or inconsistent with, the rest of a spatio-temporal dataset is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we address the outlier detection problem in spatio-temporal data and describe a rough set approach that finds the top outliers in an unlabeled spatio-temporal dataset. The proposed method, called Rough Outlier Set Extraction (ROSE), represents the outlier set in rough set terms through its lower and upper approximations. We also introduce a new set, called the Kernel set: a representative subset of the original dataset that is significant for outlier detection. Experimental results on real-world datasets show that ROSE outperforms various clustering algorithms. They also show that the Kernel set detects the same outlier set in much less computational time.
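    The abstract represents the outlier set through the rough set lower and upper approximations. As a hedged illustration of those two operators only (not of ROSE itself, whose scoring and Kernel set construction the abstract does not specify), the Python sketch below groups records into equivalence classes of identical values on chosen discretized attributes and derives both approximations of a candidate outlier set. The record layout, the attribute names `cell` and `hour`, and the function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, attributes):
    """Partition record ids into classes of records indiscernible on `attributes`."""
    classes = defaultdict(set)
    for rid, rec in records.items():
        classes[tuple(rec[a] for a in attributes)].add(rid)
    return list(classes.values())


def approximations(records, attributes, target):
    """Lower and upper rough set approximations of the id set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(records, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical discretized spatio-temporal records: grid cell and hour bucket.
records = {
    1: {"cell": "A", "hour": 1},
    2: {"cell": "A", "hour": 1},
    3: {"cell": "B", "hour": 2},
    4: {"cell": "C", "hour": 3},
}
candidates = {1, 3}  # hypothetical candidate outliers
lower, upper = approximations(records, ["cell", "hour"], candidates)
print(lower, upper, upper - lower)  # {3} {1, 2, 3} {1, 2}
```

    Records 1 and 2 are indiscernible on these attributes, so having only one of them among the candidates places the pair in the boundary region (upper minus lower) rather than in the lower approximation.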

    Analysis and Implementation of Rough Set Outlier Factor (RSetOF) for Outlier Detection

    ABSTRACT: An outlier is a data point whose characteristics differ from those of the bulk of the data. Outliers often contain unexpected knowledge; for this reason, in many knowledge discovery applications, finding outliers is more interesting than finding inliers in a dataset. Outlier detection is a data mining task that aims to find the outliers in a dataset. Many outlier detection methods exist, but most of them have difficulty scaling with the data: in high-dimensional data, distances between points become unsuitable for discovering outliers. RSetOF (Rough Set Outlier Factor) is a method for detecting outliers in high-dimensional datasets based on the Non-Reduct concept from the rough set approach. An RSetOF value is calculated for each record from rules derived from the Non-Reduct, and this value determines whether the record is an outlier. In several test scenarios that vary the RSetOF measurement parameters and the number n of top outliers, evaluated by detection rate and false positive rate, RSetOF detects outliers with reasonably good accuracy.
    Keywords: outlier, RSetOF, data mining, scalability
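    The abstract evaluates the top-n outliers by detection rate and false positive rate. A minimal sketch of that evaluation step follows, assuming the usual definitions (detection rate = flagged true outliers / all true outliers; false positive rate = flagged inliers / all inliers). The RSetOF score itself is not reproduced here: the scores in the usage example are made-up placeholders, and the function names are hypothetical.

```python
def top_n(scores, n):
    """Ids of the n highest-scoring records; ties are broken by id for determinism."""
    ranked = sorted(scores, key=lambda rid: (-scores[rid], rid))
    return set(ranked[:n])


def detection_rate_and_fpr(flagged, true_outliers, all_ids):
    """Detection rate = TP / |true outliers|; false positive rate = FP / |inliers|."""
    inliers = set(all_ids) - true_outliers
    tp = len(flagged & true_outliers)
    fp = len(flagged & inliers)
    dr = tp / len(true_outliers) if true_outliers else 0.0
    fpr = fp / len(inliers) if inliers else 0.0
    return dr, fpr


# Placeholder outlier scores (not real RSetOF values) and ground-truth labels.
scores = {"r1": 0.9, "r2": 0.8, "r3": 0.1, "r4": 0.05, "r5": 0.7}
truth = {"r1", "r3"}
flagged = top_n(scores, 2)
print(sorted(flagged))                                 # ['r1', 'r2']
print(detection_rate_and_fpr(flagged, truth, scores))  # (0.5, 0.3333333333333333)
```

    Raising n usually trades a higher detection rate for a higher false positive rate, which is why both measures are reported together.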

    Application of Computational Intelligence Techniques to Process Industry Problems

    In the last two decades there has been great progress in the field of computational intelligence. This research has produced powerful techniques for pattern recognition, data mining, data modelling, etc. These techniques achieve high performance on traditional datasets such as the UCI machine learning repository. Unfortunately, such data sources usually contain clean data, free of the problems common in real-life industrial data, such as outliers, missing values, and feature collinearity. Faulty data samples can seriously harm models: if they are presented during training, they can cause sub-optimal performance of the trained model or, in the worst case, destroy the knowledge the model has learnt so far. For these reasons, applying current modelling techniques to industrial problems has developed into a research field of its own. Building on a discussion of the properties and issues of process industry data and of state-of-the-art modelling techniques, this paper presents a novel unified approach to developing predictive models in the process industry.

    Online Bivariate Outlier Detection in Final Test Using Kernel Density Estimation

    In parametric IC testing, outlier detection is applied to filter out potentially unreliable devices. Most outlier detection methods are used in an offline setting and are therefore not applicable to Final Test, where immediate pass/fail decisions are required. We therefore developed a new bivariate online outlier detection method for Final Test that makes no assumptions about the specific form of the relation between two test parameters. An acceptance region is constructed using kernel density estimation, and a grid discretization enables fast outlier decisions. The grid is updated after each accepted device, so the method can adapt to shifting measurements.
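    The acceptance-region idea can be sketched as follows, under several assumptions not stated in the abstract: an isotropic Gaussian kernel with one fixed bandwidth, a square grid over a known measurement range, a fixed density threshold for acceptance, and rejection of any device that falls outside the grid. The class `OnlineKdeGrid` and its parameters are hypothetical; the point is that each pass/fail decision is a single grid lookup, and only accepted devices update the grid.

```python
import math


class OnlineKdeGrid:
    """Hypothetical sketch: bivariate Gaussian KDE maintained on a fixed square grid."""

    def __init__(self, x_range, y_range, cells, bandwidth, threshold):
        self.x0, x1 = x_range
        self.y0, y1 = y_range
        self.n = cells
        self.h = bandwidth
        self.threshold = threshold
        self.dx = (x1 - self.x0) / cells
        self.dy = (y1 - self.y0) / cells
        self.sums = [[0.0] * cells for _ in range(cells)]  # summed kernel values per cell
        self.count = 0  # number of devices in the estimate

    def _cell(self, x, y):
        """Grid cell containing (x, y), or None if the point is outside the grid."""
        i = math.floor((x - self.x0) / self.dx)
        j = math.floor((y - self.y0) / self.dy)
        if 0 <= i < self.n and 0 <= j < self.n:
            return i, j
        return None

    def _add(self, x, y):
        """Add one isotropic Gaussian kernel centred at (x, y), evaluated at every cell centre."""
        norm = 1.0 / (2.0 * math.pi * self.h ** 2)
        for i in range(self.n):
            cx = self.x0 + (i + 0.5) * self.dx
            for j in range(self.n):
                cy = self.y0 + (j + 0.5) * self.dy
                d2 = ((cx - x) ** 2 + (cy - y) ** 2) / self.h ** 2
                self.sums[i][j] += norm * math.exp(-0.5 * d2)
        self.count += 1

    def density(self, x, y):
        """KDE value at the centre of the cell holding (x, y); 0 outside the grid."""
        cell = self._cell(x, y)
        if cell is None or self.count == 0:
            return 0.0
        i, j = cell
        return self.sums[i][j] / self.count

    def seed(self, points):
        """Initialise the estimate from reference devices (assumed known-good)."""
        for x, y in points:
            self._add(x, y)

    def test_device(self, x, y):
        """Immediate pass/fail decision; only accepted devices update the grid."""
        accepted = self.density(x, y) >= self.threshold
        if accepted:
            self._add(x, y)
        return accepted


detector = OnlineKdeGrid((0, 10), (0, 10), cells=20, bandwidth=1.0, threshold=0.005)
detector.seed([(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2), (4.9, 4.8)])
print(detector.test_device(5.0, 5.0))  # True: inside the dense region, grid updated
print(detector.test_device(9.5, 0.5))  # False: far from all accepted devices
```

    In this sketch a decision is a constant-time lookup, while each accepted device costs one pass over all cells; a production version would likely restrict the update to cells within a few bandwidths of the new point.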

    Data-driven Soft Sensors in the Process Industry

    In the last two decades, Soft Sensors have established themselves as a valuable alternative to the traditional means of acquiring critical process variables, monitoring processes, and performing other tasks related to process control. This paper discusses the characteristics of process industry data that are critical for the development of data-driven Soft Sensors. These characteristics are common to many process industry fields, such as the chemical, bioprocess, and steel industries. The work focuses on data-driven Soft Sensors because of their growing popularity, demonstrated usefulness, and large, though not yet fully realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance, together with their possible solutions.

    Detecting Outliers in Data with Correlated Measures

    Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunctions or human operating errors. To use such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that exploits correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods designed specifically for each dataset as well as against state-of-the-art outlier detectors. Our method achieves better performance, demonstrating its robustness and generality. Finally, we report interesting case studies of outliers that result from atypical events. Comment: 10 pages
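    One common way to build a regression model that explicitly models the outliers is to give each observation its own outlier term o_i and penalise those terms with an L1 norm, so that most of them are exactly zero. The sketch below assumes that formulation for a single correlated pair (such as trip distance vs. trip time), minimising sum_i (y_i - a - b*x_i - o_i)^2 + lam * sum_i |o_i| by alternating an ordinary least-squares line fit with soft-thresholding of the residuals; points with nonzero o_i are reported as outliers. It is an illustrative stand-in, not necessarily the model used in the paper, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def soft_threshold(r, t):
    """argmin over o of (r - o)^2 + 2*t*|o|: shrink r toward zero by t, clipping at zero."""
    if r > t:
        return r - t
    if r < -t:
        return r + t
    return 0.0


def robust_line_with_outliers(xs, ys, lam, iters=50):
    """Alternately minimise sum (y - a - b*x - o)^2 + lam * sum |o| over (a, b) and o."""
    a, b = fit_line(xs, ys)  # start from plain OLS, i.e. all o_i = 0
    o = [0.0] * len(xs)
    for _ in range(iters):
        # o-step: each o_i absorbs the part of its residual beyond lam / 2
        o = [soft_threshold(y - (a + b * x), lam / 2) for x, y in zip(xs, ys)]
        # (a, b)-step: refit the line to the outlier-corrected targets
        a, b = fit_line(xs, [y - oi for y, oi in zip(ys, o)])
    outliers = [i for i, oi in enumerate(o) if oi != 0.0]
    return a, b, o, outliers


# Hypothetical trip data: minutes = 2 * km + 1, except one corrupted record at index 4.
km = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
minutes = [3, 5, 7, 9, 30, 13, 15, 17, 19, 21]
a, b, o, outliers = robust_line_with_outliers(km, minutes, lam=6.0)
print(outliers)                  # [4]
print(fit_line(km, minutes)[1])  # plain OLS slope, pulled down to about 1.88
print(b)                         # robust slope, about 1.98
```

    The penalty weight `lam` controls how large a residual must be before it is attributed to an outlier term: a larger value flags fewer points, while a value that is too small lets ordinary noise leak into the o_i, so in practice it would need tuning against the noise level of the data.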