Search CORE

6,289 research outputs found

Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy

Author: Zhang Ji
Publication venue
Publication date: 01/12/2008
Field of study

[Abstract]: Outlier detection is an important research problem in data mining that aims to discover useful abnormal and irregular patterns hidden in large data sets. Most existing outlier detection methods only deal with static data with relatively low dimensionality. Recently, outlier detection for high-dimensional stream data became a new emerging research problem. A key observation that motivates this research is that outliers in high-dimensional data are projected outliers, i.e., they are embedded in lower-dimensional subspaces. Detecting projected outliers from high-dimensional stream data is a very challenging task for several reasons. First, detecting projected outliers is difficult even for high-dimensional static data. The exhaustive search for the out-lying subspaces where projected outliers are embedded is a NP problem. Second, the algorithms for handling data streams are constrained to take only one pass to process the streaming data with the conditions of space limitation and time criticality. The currently existing methods for outlier detection are found to be ineffective for detecting projected outliers in high-dimensional data streams. In this thesis, we present a new technique, called the Stream Project Outlier deTector (SPOT), which attempts to detect projected outliers in high-dimensional data streams. SPOT employs an innovative window-based time model in capturing dynamic statistics from stream data, and a novel data structure containing a set of top sparse subspaces to detect projected outliers effectively. SPOT also employs a multi-objective genetic algorithm as an effective search method for finding the outlying subspaces where most projected outliers are embedded. The experimental results demonstrate that SPOT is efficient and effective in detecting projected outliers for high-dimensional data streams. The main contribution of this thesis is that it provides a backbone in tackling the challenging problem of outlier detection for high- dimensional data streams. SPOT can facilitate the discovery of useful abnormal patterns and can be potentially applied to a variety of high demand applications, such as for sensor network data monitoring, online transaction protection, etc

University of Southern Queensland ePrints

Density-based projected clustering of data streams

Author: Gaber M.
Hassani M.
Seidl T.
Spaus P.
Publication venue
Publication date: 01/01/2012
Field of study

Portsmouth University Research Portal (Pure)

Publikationsserver der RWTH Aachen University

Local Subspace-Based Outlier Detection using Global Neighbourhoods

Author: Bäck Thomas
van Leeuwen Matthijs
van Stein Bas
Publication venue
Publication date: 01/11/2016
Field of study

Outlier detection in high-dimensional data is a challenging yet important task, as it has applications in, e.g., fraud detection and quality control. State-of-the-art density-based algorithms perform well because they 1) take the local neighbourhoods of data points into account and 2) consider feature subspaces. In highly complex and high-dimensional data, however, existing methods are likely to overlook important outliers because they do not explicitly take into account that the data is often a mixture distribution of multiple components. We therefore introduce GLOSS, an algorithm that performs local subspace outlier detection using global neighbourhoods. Experiments on synthetic data demonstrate that GLOSS more accurately detects local outliers in mixed data than its competitors. Moreover, experiments on real-world data show that our approach identifies relevant outliers overlooked by existing methods, confirming that one should keep an eye on the global perspective even when doing local outlier detection.Comment: Short version accepted at IEEE BigData 201

arXiv.org e-Print Archive

Crossref

Unsupervised Learning for Online AbnormalityDetection in Smart Meter Data

Author: Aligholian Armin
Farajollahi Mohammad
Mohsenian-Rad Hamed
Publication venue: eScholarship, University of California
Publication date: 01/08/2019
Field of study

Crossref

eScholarship - University of California

Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

Author: Belhadi Asma
Cano Alberto
Djenouri Djamel
Djenouri Youcef
Lin Chun Wei
Publication venue: VCU Scholars Compass
Publication date: 01/01/2019
Field of study

Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the FDP databases, while the outliers are excluded from the FDP databases. Moreover, a k-nearest neighbor for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow

VCU Scholars Compass

NORA - Norwegian Open Research Archives