9 research outputs found
Network intrusion detection using genetic programming.
Master's Degree. University of KwaZulu-Natal, Pietermaritzburg. Network intrusion detection is a real-world problem that involves detecting intrusions on a computer network. Detecting whether a network connection is intrusive or non-intrusive is essentially a binary classification problem. However, intrusive connections can be categorised into a number of network attack classes, and the task of associating an intrusion with a particular attack class is multiclass classification.
A number of artificial intelligence techniques have been used for network intrusion detection, including Evolutionary Algorithms (EAs). This thesis investigates the application of three evolutionary algorithms, namely Genetic Programming (GP), Grammatical Evolution (GE) and Multi-Expression Programming (MEP), in the network intrusion detection domain. Grammatical evolution and multi-expression programming are considered variants of GP. This thesis compares the effectiveness of classifiers evolved by the three EAs on the publicly available KDD99 dataset. Furthermore, the effectiveness of a number of fitness functions is evaluated.
The findings indicate that binary classifiers evolved using standard genetic programming outperformed those evolved using grammatical evolution and multi-expression programming. For multiclass classifiers, the different fitness functions produced classifiers with different characteristics, with some classifiers achieving higher detection rates for specific network attack classes than for others. Classifiers evolved using multi-expression programming and genetic programming achieved higher detection rates than classifiers evolved using grammatical evolution.
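As an illustration of the binary-classification setup described above, the following sketch evaluates a GP-evolved expression as an intrusion classifier with a simple accuracy-based fitness function. The feature names, the expression, and the fitness choice are illustrative assumptions, not taken from the thesis.

```python
# Hypothetical sketch: evaluating a GP-evolved expression as a binary
# intrusion classifier. The feature names and expression are illustrative.

def evolved_classifier(conn):
    # An example expression a GP run might evolve over connection features.
    score = conn["src_bytes"] - 2.0 * conn["dst_bytes"] + conn["count"]
    return "intrusive" if score > 0 else "normal"

def fitness(classifier, labelled_connections):
    """Fraction of connections classified correctly (one possible fitness)."""
    correct = sum(
        1 for conn, label in labelled_connections
        if classifier(conn) == label
    )
    return correct / len(labelled_connections)

data = [
    ({"src_bytes": 500, "dst_bytes": 10, "count": 3}, "intrusive"),
    ({"src_bytes": 20, "dst_bytes": 300, "count": 1}, "normal"),
]
print(fitness(evolved_classifier, data))  # 1.0 on this toy data
```

In an actual GP run the expression itself would be the evolving individual, with fitness guiding selection across generations.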
Improving intrusion detection model prediction by threshold adaptation
This research was supported and funded by the Government of the Sultanate of Oman, represented by the Ministry of Higher Education and the Sultan Qaboos University. Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the accuracy of anomaly-based network intrusion detection systems (IDS) that are built using predictive models in a batch-learning setup. This work investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these intrusion detection models. Specifically, this research studied the adaptability of three well-known machine learning algorithms: C5.0, Random Forest and Support Vector Machine. Each algorithm's ability to adapt its prediction threshold was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. Multiple IDS datasets were used for the analysis, including a newly generated dataset (STA2018). This research demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation traffic have different statistical properties. Tests were undertaken to analyse the effects of feature selection and data balancing on model accuracy when different significant features were present in the traffic. The effects of threshold adaptation on improving accuracy were statistically analysed. Of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates.
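The threshold-adaptation idea can be sketched as follows: given scores from a batch-trained model and a labelled sample of the evaluated traffic, scan candidate thresholds and keep the one that maximises accuracy. The function names and the accuracy criterion are illustrative assumptions; the paper's actual procedure may differ.

```python
# A minimal sketch of prediction-threshold adaptation, assuming a model
# that outputs a probability score per flow (1 = attack, 0 = normal).

def best_threshold(scores, labels, candidates=None):
    """Pick the discriminating threshold that maximises accuracy
    on the evaluated traffic."""
    if candidates is None:
        candidates = sorted(set(scores))
    best_t, best_acc = 0.5, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Scores from a batch-trained model applied to traffic whose distribution
# has drifted: the default 0.5 cut-off is no longer optimal.
scores = [0.30, 0.35, 0.40, 0.45, 0.60]
labels = [0, 1, 1, 1, 1]
t, acc = best_threshold(scores, labels)
print(t, acc)  # adapted threshold 0.35, accuracy 1.0
```

The key design point is that the threshold is re-fitted to the evaluated traffic itself, while the underlying model stays fixed.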
Detecting exploit patterns from network packet streams
Network-based Intrusion Detection Systems (NIDS), e.g., Snort, Bro or NSM, try to detect malicious network activity such as Denial of Service (DoS) attacks and port scans by monitoring network traffic. Research from network traffic measurement has identified various patterns that exploits on today's Internet typically exhibit. However, there has so far been no significant attempt to design algorithms with provable guarantees for detecting exploit patterns from network traffic packets. In this work, we develop and apply data streaming algorithms to detect exploit patterns from network packet streams.
In network intrusion detection, it is necessary to analyze large volumes of data in an online fashion. Our work addresses scalable analysis of data under the following situations: (1) attack traffic can be stealthy in nature, meaning that detecting a few covert attackers might require checking traffic logs spanning days or even months; (2) traffic is multidimensional, and correlations between multiple dimensions may be important; and (3) traffic from multiple sources may sometimes need to be analyzed in a combined manner. Our algorithms offer provable bounds on resource consumption and approximation error. Our theoretical results are supported by experiments over real network traces and synthetic datasets.
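As one concrete example of a data-streaming algorithm with provable bounds of the kind described (not necessarily one of the authors' algorithms), the classic Misra-Gries frequent-items summary can flag source IPs that dominate a packet stream while using bounded memory:

```python
# Illustrative sketch: the Misra-Gries frequent-items summary.
# It keeps at most k-1 counters regardless of stream length, and any
# item whose true frequency exceeds n/k is guaranteed to survive,
# with per-item undercount at most n/k.

def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement all counters; drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

packets = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 2 + ["10.0.0.3", "10.0.0.4"]
print(misra_gries(packets, k=3))  # 10.0.0.1 survives as a heavy-hitter candidate
```

A second pass (or a stored sample) can then confirm exact counts for the surviving candidates, a common pattern in streaming detection.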
Advanced attack tree based intrusion detection
Computer network systems are constantly under attack or have to deal with attack attempts. The first step in any network's ability to fight against intrusive attacks is being able to detect intrusions as they occur. Intrusion Detection Systems (IDS) are therefore vital in any kind of network, just as antivirus software is a vital part of a computer system. As computer network intrusions grow in sophistication and complexity, most victim systems are compromised by sophisticated multi-step attacks. In order to provide advanced intrusion detection capability against multi-step attacks, it makes sense to adopt a rigorous and generalising view of tackling intrusion attacks. One direction towards achieving this goal is via modelling and, consequently, modelling-based detection.
An IDS with good detection capability is required: one able not only to detect higher-level attacks and describe the state of ongoing multi-step attacks, but also to determine whether a high-level attack has been achieved even when some of the modelled low-level attacks are missed by the detector. The absence of an alert may mean either that the corresponding low-level attack is not being conducted by the adversary, or that it is being conducted but evades detection.
This thesis presents an attack tree based intrusion detection approach to detect multi-step attacks. An advanced attack tree modelling technique, the Attack Detection Tree, is proposed to model multi-step attacks and facilitate intrusion detection. In addition, the notion of Quality of Detectability is proposed to describe the ongoing states of both intrusion and intrusion detection. Moreover, a detection uncertainty assessment mechanism is proposed that applies the measured evidence to handle uncertainty during the assessment process, so that the achievement of a high-level attack can be determined even if some modelled low-level incidents are missed.
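A minimal sketch of attack tree based detection follows, assuming a simple AND/OR tree over low-level incidents; the names and structure are illustrative assumptions, and the thesis's Attack Detection Tree is considerably more elaborate (e.g., it also handles missed low-level detections).

```python
# Illustrative sketch: an AND/OR attack tree where a high-level attack
# goal is inferred from the set of detected low-level incidents.

class Node:
    def __init__(self, name, gate=None, children=()):
        self.name, self.gate, self.children = name, gate, list(children)

    def achieved(self, detected):
        """Is this (sub)goal achieved, given the detected incidents?"""
        if not self.children:                 # leaf = low-level incident
            return self.name in detected
        results = [c.achieved(detected) for c in self.children]
        return all(results) if self.gate == "AND" else any(results)

# A toy multi-step intrusion: the root goal is achieved via either branch.
root = Node("compromise_host", "OR", [
    Node("remote_exploit", "AND",
         [Node("port_scan"), Node("buffer_overflow")]),
    Node("credential_theft", "AND",
         [Node("phishing"), Node("password_reuse")]),
])

print(root.achieved({"port_scan", "buffer_overflow"}))  # True
print(root.achieved({"phishing"}))                      # False
```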
Towards an information-theoretic framework for analyzing intrusion detection systems
IDS research still needs to strengthen its mathematical foundations and theoretic guidelines. In this paper, we build a formal framework, based on information theory, for analyzing and quantifying the effectiveness of an IDS. We first present a formal IDS model, then analyze it following an information-theoretic approach. We propose a set of information-theoretic metrics that quantitatively measure the effectiveness of an IDS in terms of feature representation capability, classification information loss, and overall intrusion detection capability. We establish a link relating these metrics, and prove a fundamental upper bound on the intrusion detection capability of an IDS. Our framework is a practical, data-trace-driven and evaluation-oriented theory. In addition to grounding IDS research in a mathematical theory for formal study, the framework provides practical guidelines for IDS fine-tuning, evaluation and design: the provided set of metrics facilitates static/dynamic fine-tuning of an IDS to achieve optimal operation, and offers a fine-grained means to evaluate IDS performance and improve IDS design. We conduct experiments to demonstrate the utility of our framework in practice.
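In the spirit of such a framework, one capability metric can be sketched as the mutual information between the ground truth X (intrusion or not) and the IDS decision Y (alert or not), normalised by the entropy of X. The definitions below are an illustrative reconstruction, not necessarily the paper's exact formulation.

```python
# Sketch of an information-theoretic detection-capability metric,
# computed from confusion-matrix counts.

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def detection_capability(tp, fn, fp, tn):
    """I(X; Y) / H(X): 1.0 for a perfect IDS, 0.0 if alerts carry
    no information about intrusions."""
    n = tp + fn + fp + tn
    px = [(tp + fn) / n, (fp + tn) / n]          # P(intrusion), P(normal)
    py = [(tp + fp) / n, (fn + tn) / n]          # P(alert), P(no alert)
    pxy = [tp / n, fn / n, fp / n, tn / n]       # joint distribution
    mutual_info = entropy(px) + entropy(py) - entropy(pxy)
    return mutual_info / entropy(px)

# A perfect detector scores 1.0; a good-but-imperfect one scores below it.
print(detection_capability(tp=100, fn=0, fp=0, tn=900))
print(detection_capability(tp=90, fn=10, fp=5, tn=895))
```

Normalising by H(X) makes the metric comparable across datasets with different intrusion base rates, which plain accuracy is not.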
Evaluating Host Intrusion Detection Systems
Host Intrusion Detection Systems (HIDSs) are critical tools needed to provide in-depth security to computer systems. Quantitative metrics for HIDSs are necessary for comparing HIDSs or determining the optimal operational point of a HIDS. While HIDSs and Network Intrusion Detection Systems (NIDSs) greatly differ, similar evaluations have been performed on both types of IDSs by assessing metrics associated with the classification algorithm (e.g., true positives, false positives). This dissertation motivates the necessity of additional characteristics to better describe the performance and effectiveness of HIDSs.
The proposed additional characteristics are the ability to collect data where an attack manifests (visibility), the ability of the HIDS to resist attacks in the event of an intrusion (attack resiliency), the ability to detect attacks in a timely manner (efficiency), and the ability of the HIDS to avoid interfering with the normal functioning of the system under supervision (transparency). For each characteristic, we propose corresponding quantitative evaluation metrics.
To measure the effect of visibility on the detection of attacks, we introduce the probability of attack manifestation and metrics related to data quality (i.e., relevance of the data regarding the attack to be detected). The metrics were applied empirically to evaluate filesystem data, which is the data source for many HIDSs.
To evaluate attack resiliency we introduce the probability of subversion, which we estimate by measuring the isolation between the HIDS and the system under supervision. Additionally, we provide methods to evaluate time delays for efficiency, and performance overhead for transparency. The proposed evaluation methods are then applied to compare two HIDSs.
Finally, we show how to integrate the proposed measurements into a cost framework. First, mapping functions are established to link the operational costs of the HIDS with the metrics proposed for efficiency and transparency. We then show how the number of attacks detected by the HIDS depends not only on detection accuracy but also on the evaluation results for visibility and attack resiliency.
Architecture of a system for detecting anomalies in network traffic based on entropy analysis
With the steady increase in reliance on computer networks in all aspects of life, computers and other connected devices have become more vulnerable to attacks, exposing them to many major threats, especially in recent years. Different systems exist to protect networks from these threats, such as firewalls, antivirus programs, and data encryption, but it is still hard to provide complete protection for networks and their systems against attacks, which grow more sophisticated over time. That is why intrusion detection systems (IDS) need to be deployed on a large scale as a second line of defence for computer and network systems, alongside other network security techniques. The main objective of intrusion detection systems is to monitor network traffic and detect internal and external attacks.
Intrusion detection systems are an important focus of studies today, because most protection systems, no matter how good they are, can fail when new (previously unknown) types of intrusions emerge. Most existing techniques detect network intrusions by collecting information about known types of attacks (so-called signature-based IDS) and using it to recognize any attempted attack on data or resources. The major problem of this approach is its inability to detect previously unknown attacks, even when these attacks differ only slightly from known ones (the so-called zero-day attack). It is also powerless to detect encryption-related attacks. On the other hand, detecting deviations from conventional behavior (anomaly-based IDS) overcomes the abovementioned limitations. Many scientific studies have aimed to build modern, smart systems to detect both known and unknown intrusions. In this research, an architecture is introduced that applies a new anomaly-based IDS detection technique based on entropy.
Network behavior analysis relies on profiling legitimate network behavior in order to efficiently detect anomalous traffic deviations that indicate security threats. Entropy-based detection techniques are attractive due to their simplicity and applicability to real-time network traffic, with no need to train the system on labelled data. Although the NetFlow protocol provides only a basic set of information about network communications, it is very useful for identifying zero-day attacks and suspicious behavior in the traffic structure. Nevertheless, the challenge of combining limited NetFlow information with the simplicity of the entropy-based approach is to provide an efficient and sensitive mechanism that detects a wide range of anomalies, including those of small intensity.
However, a recent study of generic entropy-based anomaly detection reports its vulnerability to deception: spoofed data can be introduced to mask the abnormality. Furthermore, the majority of approaches for further classification of anomalies rely on machine learning, which brings additional complexity.
Previously highlighted shortcomings and limitations of these approaches open up a space for the
exploration of new techniques and methodologies for the detection of anomalies in network traffic in
order to isolate security threats, which will be the main subject of the research in this thesis.
This research addresses all these issues by providing a systematic methodology whose main novelty lies in anomaly detection and classification based on the entropy of flow counts and behavior features extracted from the basic data obtained via the NetFlow protocol.
Two new approaches are proposed to address these concerns. Firstly, an effective protection mechanism against entropy deception is derived from the study of changes in several entropy types, such as Shannon, Rényi, and Tsallis entropies, as well as from measuring the number of distinct elements in a feature distribution as a new detection metric. The suggested method improves the reliability of entropy approaches.
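A minimal sketch of these measures, computed over a feature distribution such as flow counts per destination port, is given below; the parameter values and example data are illustrative, not taken from the thesis.

```python
# Sketch of the entropy measures compared in the thesis, plus the
# distinct-elements metric, over a feature-count distribution.

from math import log2

def shannon(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def renyi(counts, alpha=2.0):
    n = sum(counts)
    return log2(sum((c / n) ** alpha for c in counts if c)) / (1 - alpha)

def tsallis(counts, q=2.0):
    n = sum(counts)
    return (1 - sum((c / n) ** q for c in counts if c)) / (q - 1)

def distinct_elements(counts):
    """Number of distinct active elements, used alongside entropy."""
    return sum(1 for c in counts if c)

# Flows per destination port: a flat profile vs. one dominated by a
# single port (e.g. during a flood), which collapses the entropy values.
normal = [25, 25, 25, 25]
attack = [97, 1, 1, 1]
print(shannon(normal), shannon(attack))
print(renyi(normal), tsallis(normal), distinct_elements(attack))
```

Tracking several entropy types together is what makes deception harder: spoofed traffic crafted to restore one entropy value tends to disturb the others, or the distinct-element count.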
Secondly, an anomaly classification technique was added to the existing entropy-based anomaly detection system. Entropy-based anomaly classification methods were presented and effectively validated by tests based on a multivariate analysis of the entropy changes of several features, as well as aggregation over complex feature combinations.
Through an analysis of the most prominent security attacks, generalized network traffic behavior
models were developed to describe various communication patterns. Based on a multivariate analysis of
the entropy changes by anomalies in each of the modelled classes, anomaly classification rules were
proposed and verified through the experiments. The concept of the behavior features is generalized, while
the proposed data partitioning provides greater efficiency in real-time anomaly detection. The practicality
of the proposed architecture for implementing an effective anomaly detection and classification system in a general real-world network environment is demonstrated using experimental data.
Anomaly-based network intrusion detection enhancement by prediction threshold adaptation of binary classification models
Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the performance (accuracy) of anomaly-based network Intrusion Detection Systems (IDS) that are built using predictive models in a batch-learning setup. This thesis investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these intrusion detection models. Specifically, this thesis studied the adaptability of three well-known Machine Learning algorithms: C5.0, Random Forest, and Support Vector Machine. The ability of these algorithms to adapt their prediction thresholds was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. A new dataset (STA2018) was generated for this thesis and used for the analysis.
This thesis has demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation (test) traffic have different statistical properties. Further investigation was undertaken to analyse the effects of feature selection and data balancing processes on a model's accuracy when evaluation traffic with different significant features was used. The effects of threshold adaptation on reducing the accuracy degradation of these models were statistically analysed. The results showed that, of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates.
This thesis then extended the analysis to apply threshold adaptation to sampled traffic subsets, using different sample sizes, sampling strategies and label error rates. This investigation showed the robustness of the Random Forest algorithm in identifying the best threshold: it needed a sample of only 0.05% of the original evaluation traffic to identify a discriminating threshold whose overall accuracy reached nearly 90% of that of the optimal threshold. This research was supported and funded by the Government of the Sultanate of Oman, represented by the Ministry of Higher Education and the Sultan Qaboos University.