Study on log event noise reduction by using Naive Bayes supervised machine learning

Spain, Marc

Authors: Marc Spain
Publication date: 1 December 2019

Abstract

This research investigates which Naive Bayes model best predicts Windows log events that can be considered noise, that is, events that contain no information about malicious activity. As the volume of log data generated by servers grows rapidly, large corporations and organizations find it increasingly difficult to analyze these logs for evidence of malicious activity in their environments. Fortune 200 and larger corporations today produce terabytes of log events daily, and this volume is growing at a rate that will soon reach petabytes. An estimated 80 to 90 percent of these log events could be classified as noise or as purely informational; they are not needed for finding evidence of malicious activity. Given a process that predicts with a reasonable degree of accuracy whether a log event is noise or non-noise, tools that analyze log events for malicious activity could filter out noise events and reduce the amount of data that must be processed. This research compares the Naive Bayes Bag of Words Multinomial, Multinomial TF-IDF, and Multi-Variate Bernoulli models, using feature word sets of different sizes, in predicting Windows noise log events.
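
To make the comparison described in the abstract concrete, the following is a minimal sketch of two of the named models, the bag-of-words Multinomial and the Multi-Variate Bernoulli Naive Bayes classifiers, with a feature word set limited to the k most frequent words. It is not the author's implementation: the log messages, labels, function names, and the add-one smoothing are all assumptions chosen for illustration, and the TF-IDF variant is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical toy log messages; not taken from the study's data set.
DOCS = [
    "service entered the running state",
    "service entered the stopped state",
    "time service synchronized with time source",
    "account failed to log on",
    "new account created by administrator",
    "audit log was cleared",
]
LABELS = ["noise", "noise", "noise", "non-noise", "non-noise", "non-noise"]


def tokenize(text):
    return text.lower().split()


def top_words(docs, k):
    """Feature word set: the k most frequent words across all documents."""
    freq = Counter(w for d in docs for w in tokenize(d))
    return {w for w, _ in freq.most_common(k)}


def train(docs, labels, vocab, binary=False):
    """Fit Naive Bayes with add-one smoothing (an assumption of this sketch).

    binary=False: multinomial model over word counts (bag of words).
    binary=True:  multi-variate Bernoulli model over word presence.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)
    log_prior = {c: math.log(n_docs[c] / len(docs)) for c in classes}
    counts = {c: Counter() for c in classes}
    for text, label in zip(docs, labels):
        words = [w for w in tokenize(text) if w in vocab]
        counts[label].update(set(words) if binary else words)
    prob = {}
    for c in classes:
        if binary:
            # P(word present | class): document frequency within the class.
            prob[c] = {w: (counts[c][w] + 1) / (n_docs[c] + 2) for w in vocab}
        else:
            # P(word | class): share of all word occurrences in the class.
            total = sum(counts[c].values())
            prob[c] = {w: (counts[c][w] + 1) / (total + len(vocab)) for w in vocab}
    return {"binary": binary, "vocab": vocab, "log_prior": log_prior, "prob": prob}


def predict(model, text):
    """Return the class with the highest log posterior score."""
    words = [w for w in tokenize(text) if w in model["vocab"]]
    present = set(words)
    scores = {}
    for c, prior in model["log_prior"].items():
        p = model["prob"][c]
        if model["binary"]:
            # Bernoulli scores every vocabulary word, present or absent.
            scores[c] = prior + sum(
                math.log(p[w]) if w in present else math.log(1 - p[w])
                for w in model["vocab"]
            )
        else:
            # Multinomial scores only the words that occur, once per occurrence.
            scores[c] = prior + sum(math.log(p[w]) for w in words)
    return max(scores, key=scores.get)


# Compare both models across feature word sets of different sizes.
for k in (5, 10, 50):
    vocab = top_words(DOCS, k)
    for binary in (False, True):
        model = train(DOCS, LABELS, vocab, binary=binary)
        name = "bernoulli" if binary else "multinomial"
        print(k, name, predict(model, "service entered the paused state"))
```

The two models differ in how they treat absent words: the Bernoulli model penalizes a class for every vocabulary word that is missing from the event, while the multinomial model ignores missing words and counts repeated ones. That difference is one reason the size of the feature word set can affect the models differently.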

Available Versions

SHAREOK repository

oai:shareok.org:11244/324912

Last time updated on 23/11/2020