Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs
Anomaly detection is an essential data-mining task for the security and
reliability of computer systems. Logs are a common and major data source for
anomaly detection methods in almost every computer system. They record a range
of significant events describing the runtime status of the system. Recent studies have
focused predominantly on one-class deep learning methods applied to predefined,
non-learnable numerical log representations. The main limitation is that these
models cannot learn log representations that capture the semantic
differences between normal and anomalous logs, which leads to poor generalization
to unseen logs. We propose Logsy, a classification-based method that learns log
representations discriminating between normal data from the system of
interest and anomaly samples from auxiliary log datasets, which are easily accessible via
the internet. The idea behind this approach to anomaly detection is that the
auxiliary dataset is sufficiently informative to enhance the representation of
the normal data, yet diverse enough to regularize against overfitting and improve
generalization. We propose an attention-based encoder model with a new
hyperspherical loss function, which enables learning compact log representations
that capture the intrinsic differences between normal and anomalous logs.
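The abstract does not state the loss formula. As a minimal sketch, assuming a hypersphere-classifier-style objective (normal representations are pulled toward a fixed center at the origin, while auxiliary "anomaly" representations are pushed away from it) and treating encoder outputs as plain lists of floats, the training labels, the loss, and the distance-based anomaly score could look like this. The function names, the 0/1 label convention, and the decision threshold are hypothetical, and the attention-based encoder itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label convention (assumption): 0 = normal log from the system
# of interest, 1 = "anomaly" sample taken from an auxiliary log dataset.
NORMAL, AUXILIARY = 0, 1


def build_training_set(target_logs: Sequence[str],
                       auxiliary_logs: Sequence[str]) -> List[Tuple[str, int]]:
    """Label target-system logs as normal and auxiliary logs as anomalies."""
    return ([(line, NORMAL) for line in target_logs]
            + [(line, AUXILIARY) for line in auxiliary_logs])


def squared_norm(z: Sequence[float]) -> float:
    """Squared Euclidean distance of a representation from the center (origin)."""
    return sum(v * v for v in z)


def hyperspherical_loss(reps: Sequence[Sequence[float]],
                        labels: Sequence[int],
                        eps: float = 1e-12) -> float:
    """Batch-mean loss: pull normal samples toward the center, push auxiliary ones away."""
    total = 0.0
    for z, y in zip(reps, labels):
        d = squared_norm(z)
        if y == NORMAL:
            # Penalize the squared distance from the center.
            total += d
        else:
            # -log(1 - exp(-d)) is large near the center and tends to 0 far away;
            # -expm1(-d) computes 1 - exp(-d) accurately for small d, and eps
            # keeps the logarithm finite when d == 0.
            total += -math.log(max(-math.expm1(-d), eps))
    return total / len(reps)


def anomaly_score(z: Sequence[float]) -> float:
    """At inference time, the distance from the center serves as the anomaly score."""
    return squared_norm(z)


def is_anomaly(z: Sequence[float], threshold: float) -> bool:
    """Flag a log whose representation lies beyond a (hypothetical) threshold."""
    return anomaly_score(z) > threshold


if __name__ == "__main__":
    data = build_training_set(["user login ok"], ["kernel panic: not syncing"])
    reps = [[0.1, 0.2], [2.0, 1.5]]  # made-up encoder outputs for illustration
    labels = [label for _, label in data]
    print(hyperspherical_loss(reps, labels))
```

Fixing the center at the origin is one simple design choice in this sketch: it gives the score a direct geometric meaning (distance from the normal region), so a single radius threshold separates normal from anomalous logs.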
Empirically, we show an average improvement of 0.25 in the F1 score compared
to previous methods. To investigate the properties of Logsy, we perform
additional experiments evaluating the effect of the auxiliary data
size, the influence of expert knowledge, and the quality of the learned log
representations. The results show that the learned representations boost the
performance of previous methods such as PCA, with a relative improvement of
28.2%.

Comment: 11 pages, 8 figures, Accepted at ICDM 2020: 20th IEEE International
Conference on Data Mining