Analyzing the influence of the sampling rate in the detection of malicious traffic on flow data

Abstract

Cyberattacks are a growing concern for companies and public administrations. The literature shows that analyzing network-layer traffic can reveal intrusion attempts. However, such detection usually requires inspecting every datagram in a computer network, so routers that handle a significant volume of traffic do not perform an in-depth analysis of every packet. Instead, they analyze traffic patterns based on network flows. Even gathering and analyzing flow data has a high computational cost, and routers therefore usually apply a sampling rate when generating flow data. Adjusting the sampling rate is a difficult problem: if it is too low, much information is lost and some cyberattacks may go undetected, but if it is too high, routers cannot cope with the load. This paper aims to characterize the influence of this parameter on different detection methods based on machine learning. To do so, we trained and tested malicious-traffic detection models using synthetic flow data gathered with several sampling rates. We then double-checked these models with flow data from the public BoT-IoT dataset and with actual flow data collected on RedCAYLE, the Castilla y León regional academic network.
