Explaining Anomalies with Sapling Random Forests

Abstract

The main objective of anomaly detection algorithms is to find samples that deviate from the majority. Although a vast number of algorithms designed for this task already exist, almost none of them explain why a particular sample was labelled as an anomaly. To address this issue, we propose an algorithm called Explainer, which returns an explanation of a sample's differentness in disjunctive normal form (DNF), which is easy for humans to understand. Since Explainer treats anomaly detection algorithms as black boxes, it can be applied in many domains to simplify the investigation of anomalies. The core of Explainer is a set of specifically trained trees, which we call sapling random forests. Since their training is fast and memory-efficient, the whole algorithm is lightweight and applicable to large databases, data streams, and real-time problems. The correctness of Explainer is demonstrated on a wide range of synthetic and real-world datasets.
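To make the abstract's recipe concrete, here is a minimal Python sketch of the sapling-random-forest idea as we read it: given an anomaly flagged by any black-box detector, train several shallow trees ("saplings") to separate that anomaly from a random subsample of normal points, read the anomaly's root-to-leaf path in each tree as a conjunction of threshold tests, and take the union of those conjunctions as a DNF explanation. This is not the authors' reference implementation; the function name and parameters (explain_anomaly, n_saplings, subsample, max_depth) are illustrative assumptions.

```python
# Hypothetical sketch of a DNF explanation via shallow "sapling" trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def explain_anomaly(anomaly, normal_data, n_saplings=5, subsample=64,
                    max_depth=3, seed=0):
    """Return a DNF string explaining why `anomaly` differs from `normal_data`."""
    rng = np.random.default_rng(seed)
    conjunctions = []
    for _ in range(n_saplings):
        # Train one shallow tree on the anomaly vs. a small normal subsample.
        idx = rng.choice(len(normal_data),
                         size=min(subsample, len(normal_data)), replace=False)
        X = np.vstack([normal_data[idx], anomaly[None, :]])
        y = np.array([0] * len(idx) + [1])  # class 1 = the anomaly
        tree = DecisionTreeClassifier(max_depth=max_depth,
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X, y)

        # Collect the threshold tests on the anomaly's root-to-leaf path.
        node_ids = tree.decision_path(anomaly[None, :]).indices
        feat, thr = tree.tree_.feature, tree.tree_.threshold
        terms = []
        for node in node_ids:
            if feat[node] < 0:        # leaf node: no split test
                continue
            op = "<=" if anomaly[feat[node]] <= thr[node] else ">"
            terms.append(f"x[{feat[node]}] {op} {thr[node]:.3f}")
        if terms:
            conjunctions.append(" AND ".join(terms))

    # The union of per-tree conjunctions is the DNF explanation.
    return " OR ".join(f"({c})" for c in conjunctions)
```

Because each sapling only needs a tiny subsample and a small depth, training is fast and memory-efficient, which matches the abstract's claim that the approach scales to large databases and data streams.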
