21,356 research outputs found

A Rough Set Approach to Spatio-temporal Outlier Detection

Author: Alessia Albanese
Alfredo Petrosino
Sankar K Pal
Publication date: 01/01/2011

Abstract. Detecting outliers that are grossly different from or inconsistent with the rest of a spatio-temporal dataset is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we address the outlier detection problem in spatio-temporal data and describe a rough set approach that finds the top outliers in an unlabeled spatio-temporal dataset. The proposed method, called Rough Outlier Set Extraction (ROSE), relies on a rough set theoretic representation of the outlier set using the rough set approximations, i.e. the lower and upper approximations. We also introduce a new set, called the Kernel set, a representative subset of the original dataset that is significant for outlier detection. Experimental results on real-world datasets demonstrate its superiority over results obtained by various clustering algorithms. We also show that the kernel set is able to detect the same set of outliers in much less computational time.
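The lower and upper approximations that the abstract refers to can be illustrated with a minimal sketch. This is only the textbook rough set construction, not the ROSE method itself, whose details the abstract does not give. The function name `approximations`, the toy `readings` table and its attribute names are all hypothetical.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return the (lower, upper) rough set approximations of `target`.

    objects:    dict mapping object id -> dict of attribute values
    attributes: attribute names that define indiscernibility
    target:     set of object ids to approximate
    """
    # Objects with identical values on the chosen attributes are
    # indiscernible and fall into the same equivalence class.
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy table: "a" and "b" cannot be told apart.
readings = {
    "a": {"zone": "north", "level": "high"},
    "b": {"zone": "north", "level": "high"},
    "c": {"zone": "south", "level": "low"},
    "d": {"zone": "south", "level": "high"},
}
suspected = {"a", "d"}

lower, upper = approximations(readings, ["zone", "level"], suspected)
print(sorted(lower))          # ['d']
print(sorted(upper))          # ['a', 'b', 'd']
print(sorted(upper - lower))  # boundary region: ['a', 'b']
```

Objects in the lower approximation certainly belong to the target set, while those in the boundary region (upper minus lower) only possibly belong to it.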

CiteSeerX

Analysis and Implementation of Rough Set Outlier Factor (RSetOF) for Outlier Detection

Author: Andiny Ika Rukmana Sari
Publication venue: Universitas Telkom
Publication date: 01/01/2011

ABSTRACT: An outlier is a data item whose characteristics differ from those of the bulk of the data. Outliers often contain unexpected knowledge, so in many Knowledge Discovery applications finding outliers is more interesting than finding inliers in a dataset. Outlier detection is a data mining functionality that aims to find the outliers in a dataset. There are many methods for detecting outliers, but most of them have difficulty handling the scalability of the data. The scalability problem makes distance-based measures inappropriate for discovering outliers in high-dimensional data. RSetOF (Rough Set Outlier Factor) is a method for detecting outliers in high-dimensional datasets based on the Non-Reduct concept from the Rough Set approach. An RSetOF value is calculated for each data item from the Non-Reduct rules to determine whether the item is an outlier. RSetOF detects outliers with fairly good accuracy in several test scenarios, which vary the measurement parameters (RSetOF value and top-n outliers) and use detection rate and false positive rate as evaluation parameters.
Keywords: outlier, RSetOF, data mining, scalability

Open Library

Application of Computational Intelligence Techniques to Process Industry Problems

Author: Gabrys Bogdan
Kadlec Petr
Publication venue: EXIT Publishing House
Publication date: 01/01/2008

In the last two decades there has been considerable progress in the computational intelligence research field. The fruits of this research effort are powerful techniques for pattern recognition, data mining, data modelling, etc. These techniques achieve high performance on traditional data sets like the UCI machine learning database. Unfortunately, data sources of this kind usually represent clean data without problems such as outliers, missing values, feature co-linearity, etc., which are common in real-life industrial data. The presence of faulty data samples can have very harmful effects on the models: if presented during training, they can cause sub-optimal performance of the trained model or, in the worst case, destroy the knowledge the model has learnt so far. For these reasons the application of present modelling techniques to industrial problems has developed into a research field of its own. Based on a discussion of the properties and issues of the data and of the state-of-the-art modelling techniques in the process industry, this paper presents a novel unified approach to the development of predictive models in the process industry.

Bournemouth University Research Online

Online Bivariate Outlier Detection in Final Test Using Kernel Density Estimation

Author: Bossers H.C.M.
Hurink J.L.
Smit G.J.M.
Publication venue: IEEE Computer Society
Publication date: 01/01/2011

In parametric IC testing, outlier detection is applied to filter out potentially unreliable devices. Most outlier detection methods are used in an offline setting and hence are not applicable to Final Test, where immediate pass/fail decisions are required. Therefore, we developed a new bivariate online outlier detection method that is applicable to Final Test and makes no assumptions about the specific form of the relation between two test parameters. An acceptance region is constructed using kernel density estimation. We use a grid discretization to enable a fast outlier decision. The grid is updated after each accepted device, so the method is able to adapt to shifting measurements.
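The idea of a grid-discretized kernel density estimate that is updated after each accepted device can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the `bandwidth` and `threshold` values, the nearest-cell lookup and the class name `GridKDEDetector` are all hypothetical choices.

```python
import numpy as np


class GridKDEDetector:
    """Online bivariate outlier check on a fixed grid (illustrative only)."""

    def __init__(self, lo, hi, bins=101, bandwidth=0.5, threshold=0.01):
        # Grid coordinates along each of the two test parameters.
        self.xs = np.linspace(lo[0], hi[0], bins)
        self.ys = np.linspace(lo[1], hi[1], bins)
        self.gx, self.gy = np.meshgrid(self.xs, self.ys, indexing="ij")
        self.h = bandwidth
        self.threshold = threshold            # assumed minimum density
        self.kernel_sum = np.zeros((bins, bins))
        self.n = 0                            # accepted devices so far

    def update(self, x, y):
        """Add one Gaussian kernel, centred on (x, y), to every grid cell."""
        d2 = (self.gx - x) ** 2 + (self.gy - y) ** 2
        self.kernel_sum += np.exp(-0.5 * d2 / self.h**2) / (2 * np.pi * self.h**2)
        self.n += 1

    def density_at(self, x, y):
        """Look up the estimated density in the nearest grid cell."""
        i = int(np.abs(self.xs - x).argmin())
        j = int(np.abs(self.ys - y).argmin())
        return self.kernel_sum[i, j] / max(self.n, 1)

    def test(self, x, y):
        """Pass/fail decision; the grid adapts only on accepted devices."""
        accepted = bool(self.density_at(x, y) >= self.threshold)
        if accepted:
            self.update(x, y)
        return accepted


detector = GridKDEDetector(lo=(-5.0, -5.0), hi=(5.0, 5.0))

# Warm-up with a few known-good measurements (hypothetical values).
for dx in (-0.2, 0.0, 0.2):
    for dy in (-0.2, 0.0, 0.2):
        detector.update(dx, dy)

print(detector.test(0.1, -0.1))  # True: close to the accepted devices
print(detector.test(4.0, 4.0))   # False: far from every accepted device
```

The decision itself is a single array lookup, which is what makes a grid attractive when an immediate pass/fail result is needed; the cost of adding a kernel is paid only after a device has been accepted.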

University of Twente Research Information

Data-driven Soft Sensors in the Process Industry

Author: Bogdan Gabrys
Petr Kadlec
Sibylle Strandt
Publication venue: Elsevier BV
Publication date: 01/04/2009

In the last two decades Soft Sensors have established themselves as a valuable alternative to the traditional means of acquiring critical process variables, of process monitoring and of other tasks related to process control. This paper discusses characteristics of process industry data that are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The work focuses on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though not yet completely realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.

Crossref

Bournemouth University Research Online

Detecting Outliers in Data with Correlated Measures

Author: Kifer Daniel
Kuo Yu-Hsuan
Li Zhenhui
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/08/2018

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating the robustness and generality of our method. Lastly, we report interesting case studies on some outliers that result from atypical events.
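One common way to model outliers explicitly inside a regression is the mean-shift formulation, in which each observation gets its own shift term and a sparsity penalty keeps most shifts at zero. The sketch below illustrates that general technique only; the abstract does not specify the authors' model, so the formulation, the function name `mean_shift_regression`, the penalty `lam` and the toy data are all assumptions.

```python
import numpy as np


def mean_shift_regression(x, y, lam, iters=100):
    """Fit y ~ b0 + b1*x + s, where s is a sparse per-point outlier shift.

    Alternates between least squares on the shift-corrected targets and
    soft-thresholding of the residuals, which minimises
    0.5 * ||y - A @ beta - s||^2 + lam * ||s||_1 block by block.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), x])   # intercept and slope columns
    s = np.zeros(n)                        # start with no outliers
    for _ in range(iters):
        # Regression step: fit the line to the shift-corrected targets.
        beta, *_ = np.linalg.lstsq(A, y - s, rcond=None)
        # Outlier step: only residuals larger than lam become shifts.
        r = y - A @ beta
        s = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    return beta, s


# Hypothetical data: a clean line with one corrupted measurement.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[3] += 20.0

beta, s = mean_shift_regression(x, y, lam=1.0)
flagged = int(np.argmax(np.abs(s)))
print(flagged)  # 3
```

Because the shift vector is estimated together with the coefficients, the nonzero entries of `s` serve as the outlier flags and no separate detection pass over the residuals is needed.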

arXiv.org e-Print Archive

Crossref