2 research outputs found

    Online Detection of Outliers for Data Streams

    In applications such as Web click analysis and environmental monitoring, data arrive as streams, each of which is an infinite sequence of data points with explicit or implicit timestamps. Such streams have special characteristics, including transiency, uncertainty, dynamic data distribution, multi-dimensionality, asynchronous data arrival, dynamic relationships, and schema heterogeneity of data from different sources. In those applications, outliers arise for many reasons, including human error, instrument error, catastrophe, and malicious behavior. Being able to detect outliers effectively is critical to many data management and mining tasks. However, not much research has been conducted on discovering outliers in data stream applications, especially those involving multi-dimensional, related, heterogeneous, and asynchronous streams. In this dissertation, two innovative outlier detection algorithms, Orion and Wadjet, which take all of these data stream characteristics into consideration, are presented. Orion is designed for applications where data come from a single stream. It looks for a projected dimension that reveals the outlier nature of multi-dimensional data points with the help of an evolutionary algorithm, and identifies a data point as an outlier if it resides in a low-density region in that dimension. Wadjet is designed for applications where data come from multiple, heterogeneous, and asynchronous streams. It has two phases: in the first phase, it processes each stream independently, as Orion does; in the second phase, it captures and continuously evaluates the cross-correlation, if any, among the data points from multiple streams, and identifies a data point as an outlier if its value does not conform to the captured cross-correlation. Extensive theoretical and empirical analyses have been conducted to evaluate the performance of Orion and Wadjet using real and synthetic datasets. The evaluation results show that both algorithms have better accuracy and execution time than the state-of-the-art techniques when applied to homogeneous data stream applications. The results also show that Wadjet is effective in detecting outliers in heterogeneous data streams, which cannot be handled by existing algorithms.
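
    The density test described for Orion can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the projection direction is assumed to be given (Orion searches for it with an evolutionary algorithm, which is omitted here), and the class name, sliding window, radius, and neighbor threshold are all hypothetical choices made only for illustration.

```python
from collections import deque


def project(point, direction):
    """Project a multi-dimensional point onto a direction (dot product)."""
    return sum(p * d for p, d in zip(point, direction))


class ProjectedDensityDetector:
    """Flags a point whose projected value falls in a sparse region.

    Assumption: density is approximated by counting recent projected
    values within `radius` of the new value. This is a stand-in for
    whatever density estimate the original work uses.
    """

    def __init__(self, direction, window=100, radius=1.0, min_neighbors=3):
        self.direction = direction
        self.radius = radius
        self.min_neighbors = min_neighbors
        self.values = deque(maxlen=window)  # sliding window of projections

    def update(self, point):
        x = project(point, self.direction)
        neighbors = sum(1 for v in self.values if abs(v - x) <= self.radius)
        # Do not flag anything until the window holds enough history.
        warmed_up = len(self.values) >= self.min_neighbors
        is_outlier = warmed_up and neighbors < self.min_neighbors
        self.values.append(x)
        return is_outlier
```

    Each call to `update` makes a single pass over the window, so the cost per point depends on the window size rather than on the length of the stream.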

    Incremental Outlier Detection in Data Streams Using Local Correlation Integral

    ACM SIGAPP
    In this paper, an incremental outlier detection technique capable of dealing with a large amount of data is presented and evaluated in the context of intrusion detection. The proposed method is based on the LOcal Correlation Integral (LOCI). The detection technique consists of two parts. The first part, named insertion, receives the sequence of input points and updates the Multi-granularity DEviation Factor (MDEF) of each point at intervals. The second part, named deletion, deletes one point or a batch of points. This technique is able to process streaming data in a single scan. Moreover, the number of updates in the incremental LOCI algorithm per insertion or deletion of a single data record does not depend on the total number of data records. Experimental results with real-life data sets show that the technique is capable of dealing with data streams and successfully detecting outliers. Copyright 2009 ACM
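
    The abstract does not define MDEF. As a rough illustration, the sketch below follows the definition commonly given for the original (batch) LOCI method: MDEF compares the number of neighbors of a point within radius alpha*r to the average of that count over all points within radius r. It recomputes everything from scratch and does not reproduce the paper's incremental insertion and deletion updates; the function names and the default alpha are assumptions.

```python
import math


def neighbors(points, p, radius):
    """All points within `radius` of p (p itself included if present)."""
    return [q for q in points if math.dist(p, q) <= radius]


def mdef(points, p, r, alpha=0.5):
    """Return (MDEF, sigma_MDEF) for p, which must be a member of `points`.

    Batch computation for illustration only; not the incremental scheme.
    """
    sampling = neighbors(points, p, r)  # sampling neighborhood of p
    counts = [len(neighbors(points, q, alpha * r)) for q in sampling]
    n_hat = sum(counts) / len(counts)   # average counting-neighborhood size
    sigma = math.sqrt(sum((c - n_hat) ** 2 for c in counts) / len(counts))
    n_p = len(neighbors(points, p, alpha * r))
    return 1 - n_p / n_hat, sigma / n_hat
```

    In the usual LOCI rule, a point is flagged when its MDEF exceeds a multiple of sigma_MDEF (three is the customary choice) at some radius r. A point inside a dense cluster gets an MDEF near or below zero, while an isolated point gets an MDEF approaching one.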