33,913 research outputs found

    A survey of outlier detection methodologies

    Get PDF
    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review

    Probabilistic Anomaly Detection in Natural Gas Time Series Data

    Get PDF
    This paper introduces a probabilistic approach to anomaly detection, specifically in natural gas time series data. In the natural gas field, there are various types of anomalies, each of which is induced by a range of causes and sources. The causes of a set of anomalies are examined and categorized, and a Bayesian maximum likelihood classifier learns the temporal structures of known anomalies. Given previously unseen time series data, the system detects anomalies using a linear regression model with weather inputs, after which the anomalies are tested for false positives and classified using a Bayesian classifier. The method can also identify anomalies of an unknown origin. Thus, the likelihood of a data point being anomalous is given for anomalies of both known and unknown origins. This probabilistic anomaly detection method is tested on a reported natural gas consumption data set

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Get PDF
    Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the FDP databases, while the outliers are excluded from the FDP databases. Moreover, a k-nearest neighbor for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow

    Robust Linear Spectral Unmixing using Anomaly Detection

    Full text link
    This paper presents a Bayesian algorithm for linear spectral unmixing of hyperspectral images that accounts for anomalies present in the data. The model proposed assumes that the pixel reflectances are linear mixtures of unknown endmembers, corrupted by an additional nonlinear term modelling anomalies and additive Gaussian noise. A Markov random field is used for anomaly detection based on the spatial and spectral structures of the anomalies. This allows outliers to be identified in particular regions and wavelengths of the data cube. A Bayesian algorithm is proposed to estimate the parameters involved in the model yielding a joint linear unmixing and anomaly detection algorithm. Simulations conducted with synthetic and real hyperspectral images demonstrate the accuracy of the proposed unmixing and outlier detection strategy for the analysis of hyperspectral images

    Log-based Anomaly Detection of CPS Using a Statistical Method

    Full text link
    Detecting anomalies of a cyber physical system (CPS), which is a complex system consisting of both physical and software parts, is important because a CPS often operates autonomously in an unpredictable environment. However, because of the ever-changing nature and lack of a precise model for a CPS, detecting anomalies is still a challenging task. To address this problem, we propose applying an outlier detection method to a CPS log. By using a log obtained from an actual aquarium management system, we evaluated the effectiveness of our proposed method by analyzing outliers that it detected. By investigating the outliers with the developer of the system, we confirmed that some outliers indicate actual faults in the system. For example, our method detected failures of mutual exclusion in the control system that were unknown to the developer. Our method also detected transient losses of functionalities and unexpected reboots. On the other hand, our method did not detect anomalies that were too many and similar. In addition, our method reported rare but unproblematic concurrent combinations of operations as anomalies. Thus, our approach is effective at finding anomalies, but there is still room for improvement

    One-Class Classification: Taxonomy of Study and Review of Techniques

    Full text link
    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
    • …
    corecore