69 research outputs found
Benchmark-Based Reference Model for Evaluating Botnet Detection Tools Driven by Traffic-Flow Analytics
Botnets are among the most recurrent cyber-threats. They take advantage of the wide
heterogeneity of endpoint devices at the Edge of emerging communication environments to
enable fraud and other adversarial tactics, including malware, data
leaks or denial of service. There have been significant research advances in the development of
accurate botnet detection methods underpinned by supervised analysis, but assessing the accuracy
and performance of such detection methods requires a clear evaluation model in order to
enforce proper defensive strategies. To contribute to the mitigation of botnets, this paper
introduces a novel evaluation scheme grounded on supervised machine learning algorithms that
enable the detection and discrimination of different botnet families in real operational
environments. The proposal relies on observing, understanding and inferring the behavior of each
botnet family based on network indicators measured at flow level. The evaluation
methodology comprises six phases that allow building a detection model against botnet-related
malware distributed through the network, for which five supervised classifiers were instantiated
for further comparison: Decision Tree, Random Forest, Gaussian Naive Bayes,
Support Vector Machine and K-Neighbors. The experimental validation was performed on two public
datasets of real botnet traffic: CIC-AWS-2018 and ISOT HTTP Botnet. Given the heterogeneity of
the datasets, optimizing the analysis with the Grid Search algorithm improved the classification
results of the instantiated algorithms. An exhaustive evaluation demonstrated the
adequacy of the proposal and indicated that Random Forest and Decision Tree models are the
most suitable for detecting different botnet specimens among the chosen algorithms. They exhibited
higher precision rates whilst analyzing a large number of samples with less processing time. The
variety of testing scenarios was assessed in depth and reported to set baseline results for future
benchmark analysis targeted on flow-based behavioral patterns.
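The Grid Search step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's pipeline: the toy flow features, the hand-written k-nearest-neighbours classifier, and the names `knn_predict` and `grid_search` are all assumptions made for illustration. The sketch tries every combination in a small hyperparameter grid and keeps the one with the best validation accuracy.

```python
from collections import Counter
from itertools import product


def distance(a, b, metric):
    """Distance between two feature vectors under the chosen metric."""
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5  # euclidean


def knn_predict(train, point, k, metric):
    """Majority vote among the k training flows closest to `point`."""
    nearest = sorted(train, key=lambda row: distance(row[0], point, metric))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, valid, **params):
    """Fraction of validation flows classified correctly."""
    hits = sum(knn_predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)


def grid_search(train, valid, grid):
    """Exhaustively score every hyperparameter combination in `grid`."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(train, valid, **params)
        if score > best_score:  # keep the first combination on ties
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical flows: (duration in seconds, packets per second) -> label.
train = [((0.2, 900.0), "bot"), ((0.3, 850.0), "bot"), ((0.25, 950.0), "bot"),
         ((5.0, 20.0), "benign"), ((4.0, 35.0), "benign"), ((6.0, 15.0), "benign")]
valid = [((0.28, 880.0), "bot"), ((4.5, 25.0), "benign")]

grid = {"k": [1, 3], "metric": ["euclidean", "manhattan"]}
best_params, best_score = grid_search(train, valid, grid)
print(best_params, best_score)
```

In practice a library implementation with cross-validation would replace the single validation split used here.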
Utilising Deep Learning techniques for effective zero-day attack detection
Machine Learning (ML) and Deep Learning (DL) have been used for building Intrusion Detection Systems (IDS). The increase in both the number and sheer variety of new cyber-attacks poses a tremendous challenge for IDS solutions that rely on a database of historical attack signatures. Therefore, the industrial pull for robust IDS capable of flagging zero-day attacks is growing. Current outlier-based zero-day detection research suffers from high false-negative rates, which limits its practical use and performance. This paper proposes an autoencoder implementation to detect zero-day attacks. The aim is to build an IDS model with high recall while keeping the miss rate (false negatives) to an acceptable minimum. Two well-known IDS datasets are used for evaluation: CICIDS2017 and NSL-KDD. To demonstrate the efficacy of our model, we compare its results against a One-Class Support Vector Machine (SVM). The manuscript highlights the performance of a One-Class SVM when zero-day attacks are distinctive from normal behaviour. The proposed model benefits greatly from the encoding-decoding capabilities of autoencoders. The results show that autoencoders are well suited to detecting complex zero-day attacks. The results demonstrate a zero-day detection accuracy of 89% to 99% for the NSL-KDD dataset and 75% to 98% for the CICIDS2017 dataset. Finally, the paper outlines the observed trade-off between recall and fallout.
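The trade-off between recall and fallout noted at the end of this abstract can be made concrete with a small sketch. It assumes, as is common for autoencoder-based detectors, that a sample is flagged when its reconstruction error exceeds a threshold. The error values, labels, and the function name `recall_and_fallout` are invented for illustration and do not come from the paper.

```python
def recall_and_fallout(errors, is_attack, threshold):
    """Flag a sample as an attack when its reconstruction error exceeds
    `threshold`, then return (recall, fallout).

    recall  = TP / (TP + FN)   share of attacks that are caught
    fallout = FP / (FP + TN)   share of benign samples wrongly flagged
    """
    tp = fn = fp = tn = 0
    for err, attack in zip(errors, is_attack):
        flagged = err > threshold
        if attack and flagged:
            tp += 1
        elif attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical reconstruction errors: four benign samples, then four attacks.
errors = [0.02, 0.05, 0.08, 0.30, 0.12, 0.45, 0.07, 0.60]
is_attack = [False, False, False, False, True, True, True, True]

for threshold in (0.05, 0.10, 0.40):
    recall, fallout = recall_and_fallout(errors, is_attack, threshold)
    print(f"threshold={threshold:.2f} recall={recall:.2f} fallout={fallout:.2f}")
```

Lowering the threshold catches more attacks (higher recall, lower miss rate) but also flags more benign traffic (higher fallout), which is the tension the abstract describes.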
Towards Enhancement of Machine Learning Techniques Using CSE-CIC-IDS2018 Cybersecurity Dataset
In machine learning, balanced datasets play a crucial role in the bias observed towards classification and prediction. The CSE-CIC IDS datasets published in 2017 and 2018 have both attracted considerable scholarly attention towards research in intrusion detection systems. Recent work published using this dataset indicates little attention paid to the imbalance of the dataset. The study presented in this paper sets out to explore the degree to which imbalance has been treated and provide a taxonomy of the machine learning approaches developed using these datasets. A survey of published works related to these datasets was done to deliver a combined qualitative and quantitative methodological approach for our analysis towards deriving a taxonomy.
The research presented here confirms that the impact of bias due to imbalanced datasets is rarely addressed. This finding supports further research and development of supervised machine learning techniques that reduce bias in classification or prediction due to these imbalanced datasets. This study's experiment trains the model on CSE-CIC-IDS2018 using the train and test split function from the scikit-learn library. Among the many machine learning algorithms presented in the literature, three classification-based supervised ML techniques are used in our study: 1) KNN, 2) Random Forest (RF) and 3) Logistic Regression (LR). This experiment also determines how each of the dataset's 67 preprocessed features affects the ML model's performance. Feature drop selection is performed in two ways, independent and group drop. Experimental results generate the threshold values for each classifier and performance metric values such as accuracy, precision, recall, and F1-score. Also, results are generated from the comparison of manual feature drop methods. A considerable drop is observed under group drop for most of the classifiers.
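The independent and group feature-drop comparison described above can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for KNN, RF and LR, the three-feature toy data replaces the 67 preprocessed features, and the names `fit_predict`, `scores` and `drop_study` are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def fit_predict(train_X, train_y, test_X, keep):
    """Nearest-centroid stand-in classifier using only feature indices in `keep`."""
    def project(row):
        return [row[i] for i in keep]

    cents = {
        label: centroid([project(x) for x, y in zip(train_X, train_y) if y == label])
        for label in sorted(set(train_y))
    }

    def predict(row):
        p = project(row)
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cents[c])))

    return [predict(row) for row in test_X]


def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def drop_study(train_X, train_y, test_X, test_y, drops):
    """Retrain and score once per entry in `drops` (name -> set of dropped indices)."""
    n_features = len(train_X[0])
    results = {}
    for name, dropped in drops.items():
        keep = [i for i in range(n_features) if i not in dropped]
        results[name] = scores(test_y, fit_predict(train_X, train_y, test_X, keep))
    return results


# Toy data: features 0 and 1 separate the classes, feature 2 is noise.
train_X = [[10, 8, 3], [12, 9, 5], [1, 2, 4], [2, 1, 6]]
train_y = [1, 1, 0, 0]  # 1 = attack, 0 = benign
test_X = [[11, 9, 4], [1, 1, 6], [9, 7, 5], [3, 2, 4]]
test_y = [1, 0, 1, 0]

results = drop_study(train_X, train_y, test_X, test_y, {
    "none": set(),
    "drop_f0": {0},            # independent drop of a single feature
    "drop_group_f0_f1": {0, 1},  # group drop of two features together
})
for name, metrics in results.items():
    print(name, metrics)
```

On this toy data, dropping one informative feature leaves the scores unchanged because the other still separates the classes, while dropping both as a group degrades every metric.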
Performance Evaluation of Apache Spark MLlib Algorithms on an Intrusion Detection Dataset
The increase in the use of the Internet and web services and the advent of
the fifth generation of cellular network technology (5G) along with
ever-growing Internet of Things (IoT) data traffic will grow global internet
usage. To ensure the security of future networks, machine learning-based
intrusion detection and prevention systems (IDPS) must be implemented to detect
new attacks, and big data parallel processing tools can be used to handle a
huge collection of training data in these systems. In this paper, Apache Spark,
a general-purpose and fast cluster computing platform, is used for processing
and training on a large volume of network traffic feature data. In this work, the
most important features of the CSE-CIC-IDS2018 dataset are used for
constructing machine learning models, and then the most popular machine learning
approaches, namely Logistic Regression, Support Vector Machine (SVM), three
different Decision Tree Classifiers, and the Naive Bayes algorithm, are used to
train the model using up to eight worker nodes. Our Spark cluster
contains seven machines acting as worker nodes and one machine configured as
both a master and a worker. We use the CSE-CIC-IDS2018 dataset to evaluate the
overall performance of these algorithms on Botnet attacks, and distributed
hyperparameter tuning is used to find the best single decision tree parameters.
We have achieved up to 100% accuracy using features selected by the learning
method in our experiments.
Comment: Journal of Computing and Security (Isfahan University, Iran), Vol. 9,
No.1, 202
Applications of Artificial Intelligence in IT security
The objective of this work is to explore the intrusion detection problem and create simple rules for detecting specific intrusions. The intrusions are explored in the realistic CSE-CIC-IDS2018 dataset. First, the dataset is analyzed by computing appropriate statistics and visualizing the data. For the data visualization, various dimensionality reduction methods are tested. After the analysis, the data are normalized and prepared for training. The training process focuses on feature selection and finding the best model for the intrusion detection problem. The feature selection is also used for creating rules. The rules are extracted from an ensemble of Decision Trees. At the end of this work, the rules are compared to the best model. The experiments demonstrate that the simple rules are able to achieve results similar to those of the best model and can be used in a rule-based intrusion detection system or be deployed as a simple model.
Department of Theoretical Computer Science and Mathematical Logic, Faculty of Mathematics and Physic
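As a rough illustration of how rules can be read off a decision tree, the sketch below walks one tree from root to leaves and turns each path into an if-then rule. The thesis extracts rules from an ensemble; this sketch handles a single tree, and the nested-dict tree layout, feature names and thresholds are all hypothetical.

```python
def extract_rules(node, conditions=()):
    """Return one 'if ... then ...' rule per leaf of a binary decision tree.

    An internal node is {"feature", "threshold", "left", "right"}, where the
    left branch means feature <= threshold. A leaf is {"label": ...}.
    """
    if "label" in node:
        body = " and ".join(conditions) or "always"
        return [f"if {body} then {node['label']}"]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


# Hypothetical tree over two made-up flow features.
tree = {
    "feature": "fwd_pkts_per_s", "threshold": 500,
    "left": {"label": "benign"},
    "right": {
        "feature": "flow_duration_s", "threshold": 1.0,
        "left": {"label": "attack"},
        "right": {"label": "benign"},
    },
}

rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

For an ensemble, the same function could be applied to each tree and the resulting rules merged or filtered. How the thesis does that is not stated in the abstract.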