Data-Efficient, Federated Learning for Raw Network Traffic Detection

Abstract

Traditional machine learning (ML) models used for enterprise network intrusion detection systems (NIDS) typically rely on vast amounts of centralized data with expertly engineered features. Previous work, however, has shown the feasibility of using deep learning (DL) to detect malicious activity at the edge from raw network traffic payloads rather than engineered features, a capability that is necessary for tactical military environments. In the future Internet of Battlefield Things (IoBT), the military will operate in multiple environments with disconnected networks spread across the battlefield. These resource-constrained, data-limited networks require distributed, collaborative ML/DL models for inference that are continually trained both locally, using data from each separate tactical edge network, and globally, so that they learn to detect malicious activity represented across the multiple networks. Federated Learning (FL), a collaborative paradigm that updates and distributes a global model through aggregation of local model weights, provides a way to train ML/DL models for NIDS using what is learned on multiple edge devices across the disparate networks without sharing raw data. We develop and experiment with a data-efficient FL framework for intrusion detection in IoBT settings that uses only raw network traffic in restricted, resource-limited environments. Our results indicate that, regardless of the DL model architecture used on edge devices, the Federated Averaging FL algorithm achieved over 93% accuracy in detecting malicious payloads after only five episodes of FL training.
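
The weight-aggregation step that Federated Averaging performs can be illustrated with a minimal sketch. This is not the framework evaluated in this work; it assumes each edge device reports its model as a dictionary of flattened parameter lists together with its local sample count. The names `federated_average`, `client_weights`, and `client_sizes` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Assumed representation: layer name -> flattened list of parameters.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Combine local model weights into a global model.

    Each client's contribution is weighted by its share of the total
    number of local training samples. Only weights are exchanged; raw
    traffic never leaves the client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_weights: Weights = {}
    for name in client_weights[0]:
        averaged = [0.0] * len(client_weights[0][name])
        for weights, n_samples in zip(client_weights, client_sizes):
            share = n_samples / total
            for i, value in enumerate(weights[name]):
                averaged[i] += share * value
        global_weights[name] = averaged
    return global_weights


# Two hypothetical edge networks holding 1 and 3 local samples.
edge_a = {"dense": [1.0, 2.0]}
edge_b = {"dense": [3.0, 6.0]}
global_model = federated_average([edge_a, edge_b], [1, 3])
```

In one round of training under this sketch, the server would send `global_model` back to every edge device, each device would continue training on its own local traffic, and the aggregation would repeat.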
