33,913 research outputs found
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques, drawn from the full gamut of Computer Science and Statistics, are now used. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Probabilistic Anomaly Detection in Natural Gas Time Series Data
This paper introduces a probabilistic approach to anomaly detection, specifically in natural gas time series data. In the natural gas field, there are various types of anomalies, each of which is induced by a range of causes and sources. The causes of a set of anomalies are examined and categorized, and a Bayesian maximum likelihood classifier learns the temporal structures of known anomalies. Given previously unseen time series data, the system detects anomalies using a linear regression model with weather inputs, after which the anomalies are tested for false positives and classified using a Bayesian classifier. The method can also identify anomalies of an unknown origin. Thus, the likelihood of a data point being anomalous is given for anomalies of both known and unknown origins. This probabilistic anomaly detection method is tested on a reported natural gas consumption data set.
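The regression-based detection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single weather input (temperature), an ordinary least squares fit, and a fixed residual threshold in place of the Bayesian classification stage. The function names `fit_line` and `flag_residual_anomalies` and the `threshold` parameter are hypothetical.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least squares fit y ~ a + b*x for a single input variable."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def flag_residual_anomalies(x, y, threshold=3.0):
    """Return indices whose residual exceeds `threshold` residual standard deviations.

    Assumption: a fixed threshold stands in for the probabilistic
    classification step described in the abstract.
    """
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma = stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sigma]


# Illustrative data (made up): consumption falls linearly with temperature,
# with one injected spike at index 10.
temps = list(range(20))
gas = [100 - 3 * t for t in temps]
gas[10] += 50
print(flag_residual_anomalies(temps, gas))
```

A point flagged this way is only a candidate; in the approach the abstract describes, candidates are subsequently tested for false positives and classified.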
Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow
Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the incoming flow distribution probabilities: the inliers are stored to enrich the FDP databases, while the outliers are excluded from them. Moreover, a k-nearest neighbor approach to distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from the Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data; the results show that our approach outperforms the baseline algorithms for high-urban traffic flow.
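Distance-based k-nearest-neighbor outlier detection, the general technique this abstract builds on, can be sketched as follows. This is a generic illustration rather than the paper's adapted algorithm: plain numeric vectors stand in for flow distribution probabilities, the score is the distance to the k-th nearest neighbor, and the names `knn_outlier_scores` and `knn_outliers` and the values of `k` and `threshold` are assumptions.

```python
import math


def knn_outlier_scores(points, k=3):
    """Score each point by its Euclidean distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def knn_outliers(points, k=3, threshold=1.0):
    """Return indices of points whose k-NN distance exceeds `threshold`."""
    scores = knn_outlier_scores(points, k)
    return [i for i, s in enumerate(scores) if s > threshold]


# Illustrative data (made up): five clustered points and one distant point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(knn_outliers(pts, k=2, threshold=2.0))
```

In a streaming setting like the one the abstract describes, points that pass this check would be appended to the reference set and flagged points would be left out.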
Robust Linear Spectral Unmixing using Anomaly Detection
This paper presents a Bayesian algorithm for linear spectral unmixing of
hyperspectral images that accounts for anomalies present in the data. The model
proposed assumes that the pixel reflectances are linear mixtures of unknown
endmembers, corrupted by an additional nonlinear term modelling anomalies and
additive Gaussian noise. A Markov random field is used for anomaly detection
based on the spatial and spectral structures of the anomalies. This allows
outliers to be identified in particular regions and wavelengths of the data
cube. A Bayesian algorithm is proposed to estimate the parameters involved in
the model yielding a joint linear unmixing and anomaly detection algorithm.
Simulations conducted with synthetic and real hyperspectral images demonstrate
the accuracy of the proposed unmixing and outlier detection strategy for the
analysis of hyperspectral images.
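The observation model described in this abstract can be written out as an equation. The notation below is assumed for illustration and is not taken from the paper: y_n is the observed spectrum of pixel n, M is the matrix whose columns are the unknown endmembers, a_n holds the mixing coefficients of pixel n, r_n is the additional term modelling anomalies, and e_n is additive Gaussian noise with a covariance matrix Sigma that the abstract does not specify.

```latex
\mathbf{y}_n = \mathbf{M}\,\mathbf{a}_n + \mathbf{r}_n + \mathbf{e}_n,
\qquad \mathbf{e}_n \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad n = 1, \dots, N
```

Under this reading, a pixel and wavelength band is anomalous where the corresponding entry of r_n is nonzero, which is the quantity the Markov random field prior would act on.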
Log-based Anomaly Detection of CPS Using a Statistical Method
Detecting anomalies of a cyber physical system (CPS), which is a complex
system consisting of both physical and software parts, is important because a
CPS often operates autonomously in an unpredictable environment. However,
because of the ever-changing nature and lack of a precise model for a CPS,
detecting anomalies is still a challenging task. To address this problem, we
propose applying an outlier detection method to a CPS log. By using a log
obtained from an actual aquarium management system, we evaluated the
effectiveness of our proposed method by analyzing outliers that it detected. By
investigating the outliers with the developer of the system, we confirmed that
some outliers indicate actual faults in the system. For example, our method
detected failures of mutual exclusion in the control system that were unknown
to the developer. Our method also detected transient losses of functionalities
and unexpected reboots. On the other hand, our method did not detect anomalies
that were too numerous and too similar to one another. In addition, our method reported rare but
unproblematic concurrent combinations of operations as anomalies. Thus, our
approach is effective at finding anomalies, but there is still room for
improvement.
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers, since the
class boundary must be defined using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data, the
algorithms used and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
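The core idea of one-class classification, learning a boundary from positive examples only, can be illustrated with a very small sketch. It is one simple instance of the idea and not a method taken from the review: the boundary is a sphere around the centroid of the positive training points, and the names `fit_one_class` and `is_positive` and the `margin` parameter are hypothetical.

```python
import math
from statistics import mean


def fit_one_class(positives, margin=1.1):
    """Fit a centroid and radius using positive-class examples only.

    Assumption: the radius is the largest training distance to the
    centroid, inflated by `margin`.
    """
    dim = len(positives[0])
    centroid = tuple(mean(p[d] for p in positives) for d in range(dim))
    radius = margin * max(math.dist(p, centroid) for p in positives)
    return centroid, radius


def is_positive(model, point):
    """Accept a point if it lies within the learned radius of the centroid."""
    centroid, radius = model
    return math.dist(point, centroid) <= radius


# Illustrative data (made up): no negative examples are used for training.
model = fit_one_class([(0, 0), (0, 2), (2, 0), (2, 2)])
print(is_positive(model, (1, 1.5)), is_positive(model, (5, 5)))
```

Anything outside the sphere is treated as an outlier or novelty, which is why the abstract groups OCC with outlier and novelty detection.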