Improving the Capabilities of Distributed Collaborative Intrusion Detection Systems using Machine Learning

Abstract

The impact of computer networks on modern society cannot be estimated. Arguably, computer networks are one of the core enablers of the contemporary world. Large computer networks are essential tools which drive our economy, critical infrastructure, education and entertainment. Due to their ubiquitousness and importance, it is reasonable to assume that security is an intrinsic aspect of their design. Yet, due to how networks developed, the security of this communication medium is still an outstanding issue. Proactive and reactive security mechanisms exist to cope with the security problems that arise when computer networks are used. Proactive mechanisms attempt to prevent malicious activity in a network. Prevention alone, however, is not sufficient: it is imprudent to assume that security cannot be bypassed. Reactive mechanisms are responsible for finding malicious activity that circumvents proactive security mechanisms. The most emblematic reactive mechanism for detecting intrusions in a network is known as a Network Intrusion Detection System (NIDS). Large networks represent immense attack surfaces where malicious actors can conceal their intentions by distributing their activities. A single NIDS needs to process massive quantities of traffic to discover malicious distributed activities. As individual NIDS have limited resources and a narrow monitoring scope, large networks need to employ multiple NIDS. Coordinating the detection efforts of NIDS is not a trivial task and, as a result, Collaborative Intrusion Detection System (CIDSs) were conceived. A CIDS is a group of NIDSs that collaborate to exchange information that enables them to detect distributed malicious activities. CIDSs may coordinate NIDSs using different communication overlays. From among the different communication overlays a CIDSs may use, a distributed one promises the most. Distributed overlays are scalable, dynamic, resilient and do not have a single point of failure. Distributed CIDSs, i.e., those using distributed overlays, are preferred in theory, yet not often deployed in practice. Several open issues exist that constraint the use of CIDSs in practice. In this thesis, we propose solutions to address some of the outstanding issues that prevent distributed CIDSs from becoming viable in practice. Our contributions rely on diverse Machine Learning (ML) techniques and concepts to solve these issues. The thesis is structured around five main contributions, each developed within a dedicated chapter. Our specific contributions are as follows. Dataset Generation We survey the intrusion detection research field to analyze and categorize the datasets that are used to develop, compare, and test NIDSs as well as CIDSs. From the defects we found in the datasets, we develop a classification of dataset defects. With our classification of dataset issues, we develop concepts to create suitable datasets for training and testing ML based NIDSs and CIDSs. With our concepts, we injects synthetic attacks into real background traffic. The generated attacks replicate the properties of the background traffic to make attacks as indistinguishable as they can be from real traffic. Intrusion Detection We develop an anomaly-based NIDS capable of overcoming some of the limitations that NIDSs have when they are used in large networks. Our anomaly-based NIDS leverages autoencoders and dropout to create models of normality that accurately describe the behavior of large networks. Our NIDS scales to the number of analyzed features, can learn adequate normality models even when anomalies are present in the learning data, operates in real time, and is accurate with only minimal false positives. Community Formation We formulate concepts to build communities of NIDSs, coined community-based CIDSs, that implement centralized ML algorithms in a distributed environment. Community-based CIDSs detect distributed attacks through the use of ensemble learning. Ensemble learning is used to combine local ML models created by different communities to detect network-wide attacks that individual communities would otherwise struggle to detect. Information Dissemination We design a dissemination strategy specific to CIDSs. The strategy enables NIDSs to efficiently disseminate information to discover and infer when similar network events take place, potentially uncovering distributed attacks. In contrast to other dissemination strategies, our strategy efficiently encodes, aggregates, correlates, and shares network features while minimizing network overhead. We use Sketches to aggregate data and Bayesian Networks to deduce new information from the aggregation process. Collusion Detection We devise an evidence-based trust mechanism that detects if the NIDSs of a CIDS are acting honestly, according to the goals of the CIDS, or dishonestly. The trust mechanism uses the reliability of the sensors and Bayesian-like estimators to compute trust scores. From the trust scores, our mechanism is designed to detect not only single dishonest NIDSs but multiple coalitions of dishonest ones. A coalition is a coordinated group of dishonest NIDSs that lie to boost their trust scores, and to reduce the trust scores of others outside the group

    Similar works