17,538 research outputs found

    NETS: Extremely fast outlier detection from a data stream via set-based processing

    Get PDF
    This paper addresses the problem of efficiently detecting outliers from a data stream as old data points expire from and new data points enter the window incrementally. The proposed method is based on a newly discovered characteristic of a data stream that the change in the locations of data points in the data space is typically very insignificant. This observation has led to the finding that the existing distance-based outlier detection algorithms perform excessive unnecessary computations that are repetitive and/or canceling out the effects. Thus, in this paper, we propose a novel set-based approach to detecting outliers, whereby data points at similar locations are grouped and the detection of outliers or inliers is handled at the group level. Specifically, a new algorithm NETS is proposed to achieve a remarkable performance improvement by realizing set-based early identification of outliers or inliers and taking advantage of the net effect between expired and new data points. Additionally, NETS is capable of achieving the same efficiency even for a high-dimensional data stream through two-level dimensional filtering. Comprehensive experiments using six real-world data streams show 5 to 25 times faster processing time than state-of-the-art algorithms with comparable memory consumption. We assert that NETS opens a new possibility to real-time data stream outlier detection

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    Get PDF
    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection

    Get PDF
    In this paper, we propose an novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm which includes dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty of the competitive outlier detection methods in specifying the key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. Such optimization fully considers the high-quality outliers it detects with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Get PDF
    Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the FDP databases, while the outliers are excluded from the FDP databases. Moreover, a k-nearest neighbor for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow

    Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma

    Full text link
    A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended feature, and tracking movement of feature through overlapping in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.Comment: 14 pages, 40 figure

    Halo Star Streams in the Solar Neighborhood

    Full text link
    We have assembled a sample of halo stars in the solar neighborhood to look for halo substructure in velocity and angular momentum space. Our sample includes red giants, RR Lyrae, and red horizontal branch stars within 2.5 kpc of the Sun with [Fe/H] less than -1.0. It was chosen to include stars with accurate distances, space velocities, and metallicities as well as well-quantified errors. We confirm the existence of the streams found by Helmi and coworkers, which we refer to as the H99 streams. These streams have a double-peaked velocity distribution in the z direction. We use the results of modeling of the H99 streams by Helmi and collaborators to test how one might use v_z velocity information and radial velocity information to detect kinematic substructure in the halo. We find that detecting the H99 streams with radial velocities alone would require a large sample. We use the velocity distribution of the H99 streams to estimate their age. From our model of the progenitor of the H99 streams, we determine that it was accreted between 6 and 9 Gyr ago. The H99 streams have [alpha/Fe] abundances similar to other halo stars in the solar neighborhood, suggesting that the gas that formed these stars were enriched mostly by Type II SNe. We have also discovered in angular momentum space two other possible substructures, which we refer to as the retrograde and prograde outliers. The retrograde outliers are likely to be halo substructure, but the prograde outliers are most likely part of the smooth halo. The retrograde outliers have significant structure in the v_phi direction and show a range of [alpha/Fe]. The methods presented in this paper can be used to exploit the kinematic information present in future large databases like RAVE, SDSSII/SEGUE, and Gaia.Comment: 46 pages, 13 figures, and 9 tables. Minor changes to text to match proofed version of the paper. Low resolution figures. High resolution version at http://www.astro.wisc.edu/~kepley/solar_streams.p
    corecore