Isolation based anomaly detection: a re-examination

Abstract

Anomalies are instances that do not conform to the norm of a dataset. They are often indicators of interesting events such as deliberate human actions, system faults, or sudden changes in the environment, and detecting them can provide information about such events. Anomaly detection is therefore an important data mining task, used in application domains such as intrusion detection, fraud detection, detection of disease conditions, and fault diagnosis. With improvements in data collection and processing technologies, databases are growing explosively in both size and number of attributes. Such growth is challenging for anomaly detection approaches because of the efficiency required to handle datasets of this scale. iForest is a recently introduced anomaly detector which is unique in the literature because it uses an isolation mechanism to identify anomalies without any distance or density calculations. The core strength of iForest is its exceptional efficiency, which enables it to scale up to very large datasets. It has been shown to perform competitively with existing state-of-the-art anomaly detectors on datasets with several attributes. This thesis re-examines iForest to identify its strengths and weaknesses in different application settings. Three key weaknesses of iForest are identified: it is deficient in detecting local anomalies, anomalies masked by axis-parallel normal clusters, and anomalies in multi-modal datasets. A novel isolation method is designed around an alternative isolation mechanism, which uses nearest-neighbour distance to perform isolation and is intended to overcome the identified weaknesses of iForest. Subsequently, a hybrid isolation method that combines the proposed isolation mechanism with that of iForest is designed to harness the strengths of both.
Empirical evidence is provided to show that the proposed methods can overcome the identified weaknesses of iForest and that they also scale up efficiently to datasets of large size and with a large number of attributes. Their performance on benchmark datasets shows that the proposed methods are competitive with state-of-the-art anomaly detectors.
