FastPacket: Towards Pre-trained Packets Embedding based on FastText for next-generation NIDS
New attacks are increasingly used by attackers every day, but many of them are
not detected by Intrusion Detection Systems (IDS), as most IDS ignore raw packet
information and only consider basic statistical information extracted
from PCAP files. Using networking programs to extract fixed statistical
features from packets is useful, but may not be enough to meet today's
challenges. We think that it is time to utilize big data and deep learning for
automatic dynamic feature extraction from packets. It is time to get inspired
by deep learning pre-trained models in computer vision and natural language
processing, so that security deep learning solutions will have their own
pre-trained models on big datasets to be used in future research. In this paper, we
propose a new approach for embedding packets based on character-level
embeddings, inspired by the success of FastText on text data. We call this approach
FastPacket. Results are measured on subsets of the CIC-IDS-2017 dataset, but we
expect promising results from models pre-trained on big data. We suggest building
a pre-trained FastPacket model on the MAWI big dataset and making it available to
the community, similar to FastText, in order to outperform currently used NIDS and
to start a new era of packet-level NIDS that can better detect complex attacks.
Comment: arXiv admin note: text overlap with arXiv:2209.1396
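As a rough illustration of the character-level idea described in the abstract above, the sketch below renders packet bytes as hexadecimal "words" and splits each word into FastText-style character n-grams with boundary markers. The function names, the two-byte word size, and the n-gram range are illustrative assumptions; the abstract does not specify how FastPacket tokenizes packets.

```python
def packet_to_words(payload: bytes, bytes_per_word: int = 2) -> list[str]:
    """Split raw packet bytes into fixed-size hex 'words' (hypothetical tokenization)."""
    return [
        payload[i:i + bytes_per_word].hex()
        for i in range(0, len(payload), bytes_per_word)
    ]


def char_ngrams(word: str, n_min: int = 3, n_max: int = 4) -> list[str]:
    """Return FastText-style character n-grams of a word wrapped in < and >."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


if __name__ == "__main__":
    # First four bytes of an example IPv4 header (version/IHL, TOS, total length).
    header = bytes.fromhex("45000028")
    for word in packet_to_words(header):
        print(word, char_ngrams(word))
```

In FastText, a word vector is built by summing the vectors of its character n-grams, which lets unseen words share parameters with seen ones. Whether FastPacket aggregates n-grams in the same way is not stated in the abstract.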
Are Public Intrusion Datasets Fit for Purpose: Characterising the State of the Art in Intrusion Event Datasets
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.
In recent years cybersecurity attacks have caused major disruption and information loss for online organisations, with high profile incidents in the news. One of the key challenges in advancing the state of the art in intrusion detection is the lack of representative datasets. These datasets typically contain millions of time-ordered events (e.g. network packet traces, flow summaries, log entries), which are subsequently analysed to identify abnormal behavior and specific attacks [1]. Generating realistic datasets has historically required expensive networked assets, specialised traffic generators, and considerable design preparation. Even with advances in virtualisation it remains challenging to create and maintain a representative environment.
Major improvements are needed in the design, quality and availability of datasets, to assist researchers in developing advanced detection techniques. With the emergence of new technology paradigms, such as intelligent transport and autonomous vehicles, it is also likely that new classes of threat will emerge [2]. Given the rate of change in threat behavior [3], datasets quickly become obsolete, and some of the most widely cited datasets date back over two decades. Older datasets have limited value: they are often heavily filtered and anonymised, with unrealistic event distributions and opaque design methodology.
The relative scarcity of Intrusion Detection System (IDS) datasets is compounded by the lack of a central registry, and inconsistent information on provenance. Researchers may also find it hard to locate datasets or understand their relative merits. In addition, many datasets rely on simulation, originating from academic or government institutions. The publication process itself often creates conflicts, with the need to de-identify sensitive information in order to meet regulations such as the General Data Protection Regulation (GDPR) [4]. A final issue for researchers is the lack of standardised metrics with which to compare dataset quality.
In this paper we attempt to classify the most widely used public intrusion datasets, providing references to archives and associated literature. We illustrate their relative utility and scope, highlighting the threat composition, formats, special features, and associated limitations. We identify best practice in dataset design, and describe potential pitfalls of designing anomaly detection techniques based on data that may be either inappropriate, or compromised due to unrealistic threat coverage. The contributions made in this paper are expected to facilitate continuous research and development for effectively combating the constantly evolving cyber threat landscape.
An architecture for a network traffic anomaly detection system based on entropy analysis
With the steady increase in reliance on computer networks in all aspects of life, computers and
other connected devices have become more vulnerable to attacks, which exposes them to many major
threats, especially in recent years. There are different systems to protect networks from these threats such
as firewalls, antivirus programs, and data encryption, but it is still hard to provide complete protection
for networks and their systems from the attacks, which are increasingly sophisticated with time. That is
why it is required to use intrusion detection systems (IDS) on a large scale to be the second line of defense
for computer and network systems along with other network security techniques. The main objective of
intrusion detection systems is to monitor network traffic and detect internal and external attacks.
Intrusion detection systems represent an important focus of studies today, because most
protection systems, no matter how good they are, can fail due to the emergence of new
(unknown/predefined) types of intrusions. Most of the existing techniques, so-called signature-based
IDS, detect network intrusions by collecting information about known types of attacks and using it to
recognize any attempted attack on data or resources. The major problem of this approach is its inability
to detect previously unknown attacks (so-called zero-day attacks), even if these attacks differ only
slightly from the known ones. Also, it is powerless to detect encryption-related attacks. On the other hand,
detecting deviations from normal behavior (anomaly-based IDS) overcomes the
abovementioned limitations. Many scientific studies have aimed to build modern and smart systems to
detect both known and unknown intrusions. In this research, an architecture that applies a new technique
for IDS using an anomaly-based detection method based on entropy is introduced.
Network behavior analysis relies on the profiling of legitimate network behavior in order to
efficiently detect anomalous traffic deviations that indicate security threats. Entropy-based detection
techniques are attractive due to their simplicity and applicability in real-time network traffic, with no
need to train the system with labelled data. Although the NetFlow protocol provides only a
basic set of information about network communications, it is very beneficial for identifying zero-day
attacks and suspicious behavior in traffic structure. Nevertheless, the challenge associated with limited
NetFlow information combined with the simplicity of the entropy-based approach is providing an
efficient and sensitive mechanism to detect a wide range of anomalies, including those of small intensity.
However, a recent study of generic entropy-based anomaly detection reports its
vulnerability to deception, in which spoofed data is introduced to mask the abnormality. Furthermore, the majority
of approaches for further classification of anomalies rely on machine learning, which brings additional
complexity.
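To make the entropy-based detection idea concrete, the following minimal sketch computes the Shannon entropy of one flow feature (for example, destination port) in consecutive time windows and flags windows whose entropy deviates strongly from a baseline. The function names, the baseline length, the threshold `k`, and the `min_sigma` floor are illustrative assumptions, not parameters taken from the thesis.

```python
import math
from collections import Counter
from statistics import mean, stdev


def shannon_entropy(values) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_windows(windows, baseline_n=3, k=3.0, min_sigma=0.1):
    """Return indices of windows whose entropy is more than k standard
    deviations away from the mean of the first baseline_n windows."""
    entropies = [shannon_entropy(w) for w in windows]
    baseline = entropies[:baseline_n]
    mu = mean(baseline)
    # Floor the deviation so a perfectly flat baseline does not flag noise.
    sigma = max(stdev(baseline), min_sigma)
    return [i for i, h in enumerate(entropies) if abs(h - mu) > k * sigma]


if __name__ == "__main__":
    normal = [80, 443, 80, 53]           # typical destination ports in a window
    scan = [21, 22, 23, 25, 80, 110, 143, 443]  # many distinct ports, scan-like
    print(flag_windows([normal, normal, normal, normal, scan]))
```

No labelled data is needed here: the baseline is simply the recent history of the same feature, which is the property the text above highlights.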
Previously highlighted shortcomings and limitations of these approaches open up a space for the
exploration of new techniques and methodologies for the detection of anomalies in network traffic in
order to isolate security threats, which will be the main subject of the research in this thesis.
This research addresses all these issues by providing a systematic methodology with the main
novelty in anomaly detection and classification based on the entropy of flow count and behavior features
extracted from the basic data obtained by the NetFlow protocol.
Two new approaches are proposed to address these concerns. Firstly, an effective protection
mechanism against entropy deception is derived from the study of changes in several entropy types, such
as Shannon, Rényi, and Tsallis entropies, as well as the measurement of the number of distinct elements
in a feature distribution as a new detection metric. The suggested method improves the reliability of
entropy approaches.
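The entropy types named above have standard definitions, and the sketch below computes them side by side together with the number of distinct elements for one feature distribution. It uses natural logarithms; the function names and the choice of order parameters are illustrative assumptions, and how the thesis combines these measures into a detection rule is not specified in this abstract.

```python
import math
from collections import Counter


def probabilities(values) -> list[float]:
    """Empirical probabilities of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon(p) -> float:
    """Shannon entropy: -sum(p_i * ln p_i)."""
    return -sum(x * math.log(x) for x in p)


def renyi(p, alpha: float) -> float:
    """Renyi entropy of order alpha: ln(sum(p_i^alpha)) / (1 - alpha)."""
    if alpha == 1:
        return shannon(p)  # limiting case as alpha approaches 1
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)


def tsallis(p, q: float) -> float:
    """Tsallis entropy of order q: (1 - sum(p_i^q)) / (q - 1)."""
    if q == 1:
        return shannon(p)  # limiting case as q approaches 1
    return (1 - sum(x ** q for x in p)) / (q - 1)


def distinct_count(values) -> int:
    """Number of distinct elements in the feature distribution."""
    return len(set(values))


if __name__ == "__main__":
    src_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    p = probabilities(src_ips)
    print(shannon(p), renyi(p, 2), tsallis(p, 2), distinct_count(src_ips))
```

Because the Rényi and Tsallis orders weight frequent and rare values differently, tracking several measures and the distinct count at once makes it harder for spoofed traffic to keep all of them at their normal levels simultaneously.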
Secondly, an anomaly classification technique is introduced to the existing entropy-based
anomaly detection system. Entropy-based anomaly classification methods are presented and
confirmed by tests based on a multivariate analysis of the entropy changes of several features as well as
aggregation by complex feature combinations.
Through an analysis of the most prominent security attacks, generalized network traffic behavior
models were developed to describe various communication patterns. Based on a multivariate analysis of
the entropy changes caused by anomalies in each of the modelled classes, anomaly classification rules were
proposed and verified through the experiments. The concept of the behavior features is generalized, while
the proposed data partitioning provides greater efficiency in real-time anomaly detection. The practicality
of the proposed architecture for the implementation of an effective anomaly detection and classification
system in a general real-world network environment is demonstrated using experimental data.