Search CORE

2,489 research outputs found

A survey of outlier detection methodologies

Author: Austin J.
Hodge V.J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004
Field of study

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review

CiteSeerX

Crossref

White Rose Research Online

On the Feasibility of Distinguishing Between Process Disturbances and Intrusions in Process Control Systems using Multivariate Statistical Process Control

Author: Camacho José
Garitano Iñaki
Iturbe Mikel
Uribeetxeberria Roberto
Zurutuza Urko
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Process Control Systems (PCSs) are the operat-ing core of Critical Infrastructures (CIs). As such, anomalydetection has been an active research field to ensure CInormal operation. Previous approaches have leveraged networklevel data for anomaly detection, or have disregarded theexistence of process disturbances, thus opening the possibility of mislabelling disturbances as attacks and vice versa. In thispaper we present an anomaly detection and diagnostic systembased on Multivariate Statistical Process Control (MSPC), thataims to distinguish between attacks and disturbances. For this end, we expand traditional MSPC to monitor process leveland controller level data. We evaluate our approach using the Tennessee-Eastman process. Results show that our approachcan be used to distinguish disturbances from intrusions to acertain extent and we conclude that the proposed approach canbe extended with other sources of data for improving results

arXiv.org e-Print Archive

Crossref

eBiltegia

New Methods for Network Traffic Anomaly Detection

Author: Babaie Tahereh Tara
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2014
Field of study

In this thesis we examine the efficacy of applying outlier detection techniques to understand the behaviour of anomalies in communication network traffic. We have identified several shortcomings. Our most finding is that known techniques either focus on characterizing the spatial or temporal behaviour of traffic but rarely both. For example DoS attacks are anomalies which violate temporal patterns while port scans violate the spatial equilibrium of network traffic. To address this observed weakness we have designed a new method for outlier detection based spectral decomposition of the Hankel matrix. The Hankel matrix is spatio-temporal correlation matrix and has been used in many other domains including climate data analysis and econometrics. Using our approach we can seamlessly integrate the discovery of both spatial and temporal anomalies. Comparison with other state of the art methods in the networks community confirms that our approach can discover both DoS and port scan attacks. The spectral decomposition of the Hankel matrix is closely tied to the problem of inference in Linear Dynamical Systems (LDS). We introduce a new problem, the Online Selective Anomaly Detection (OSAD) problem, to model the situation where the objective is to report new anomalies in the system and suppress know faults. For example, in the network setting an operator may be interested in triggering an alarm for malicious attacks but not on faults caused by equipment failure. In order to solve OSAD we combine techniques from machine learning and control theory in a unique fashion. Machine Learning ideas are used to learn the parameters of an underlying data generating system. Control theory techniques are used to model the feedback and modify the residual generated by the data generating state model. Experiments on synthetic and real data sets confirm that the OSAD problem captures a general scenario and tightly integrates machine learning and control theory to solve a practical problem

Sydney eScholarship

Tiresias: Online Anomaly Detection for Hierarchical Operational Network Data

Author: Caesar Matthew
Duffield Nick
Hong Chi-Yao
Wang Jia
Publication venue
Publication date: 01/01/2012
Field of study

Operational network data, management data such as customer care call logs and equipment system logs, is a very important source of information for network operators to detect problems in their networks. Unfortunately, there is lack of efficient tools to automatically track and detect anomalous events on operational data, causing ISP operators to rely on manual inspection of this data. While anomaly detection has been widely studied in the context of network data, operational data presents several new challenges, including the volatility and sparseness of data, and the need to perform fast detection (complicating application of schemes that require offline processing or large/stable data sets to converge). To address these challenges, we propose Tiresias, an automated approach to locating anomalous events on hierarchical operational data. Tiresias leverages the hierarchical structure of operational data to identify high-impact aggregates (e.g., locations in the network, failure modes) likely to be associated with anomalous events. To accommodate different kinds of operational network data, Tiresias consists of an online detection algorithm with low time and space complexity, while preserving high detection accuracy. We present results from two case studies using operational data collected at a large commercial IP network operated by a Tier-1 ISP: customer care call logs and set-top box crash logs. By comparing with a reference set verified by the ISP's operational group, we validate that Tiresias can achieve >94% accuracy in locating anomalies. Tiresias also discovered several previously unknown anomalies in the ISP's customer care cases, demonstrating its effectiveness

arXiv.org e-Print Archive

CiteSeerX