16,563 research outputs found

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Get PDF
    Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the FDP databases, while the outliers are excluded from the FDP databases. Moreover, a k-nearest neighbor for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow

    In-Network Outlier Detection in Wireless Sensor Networks

    Full text link
    To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy usage,(3) only uses single hop communication thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance using simulation with real sensor data streams. Our results demonstrate that our approach is accurate and imposes a reasonable communication load and level of power consumption.Comment: Extended version of a paper appearing in the Int'l Conference on Distributed Computing Systems 200

    Local Subspace-Based Outlier Detection using Global Neighbourhoods

    Full text link
    Outlier detection in high-dimensional data is a challenging yet important task, as it has applications in, e.g., fraud detection and quality control. State-of-the-art density-based algorithms perform well because they 1) take the local neighbourhoods of data points into account and 2) consider feature subspaces. In highly complex and high-dimensional data, however, existing methods are likely to overlook important outliers because they do not explicitly take into account that the data is often a mixture distribution of multiple components. We therefore introduce GLOSS, an algorithm that performs local subspace outlier detection using global neighbourhoods. Experiments on synthetic data demonstrate that GLOSS more accurately detects local outliers in mixed data than its competitors. Moreover, experiments on real-world data show that our approach identifies relevant outliers overlooked by existing methods, confirming that one should keep an eye on the global perspective even when doing local outlier detection.Comment: Short version accepted at IEEE BigData 201

    Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma

    Full text link
    A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended feature, and tracking movement of feature through overlapping in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.Comment: 14 pages, 40 figure

    A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets

    Get PDF
    The term "outlier" can generally be defined as an observation that is significantly different from the other values in a data set. The outliers may be instances of error or indicate events. The task of outlier detection aims at identifying such outliers in order to improve the analysis of data and further discover interesting and useful knowledge about unusual events within numerous applications domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees to select the most suitable technique based on data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
    corecore