1,095 research outputs found
Deep Learning-Based Intrusion Detection Methods for Computer Networks and Privacy-Preserving Authentication Method for Vehicular Ad Hoc Networks
The incidence of computer network intrusions has significantly increased over the last decade, partially attributed to a thriving underground cyber-crime economy and the widespread availability of advanced tools for launching such attacks. To counter these attacks, researchers in both academia and industry have turned to machine learning (ML) techniques to develop Intrusion Detection Systems (IDSes) for computer networks. However, many of the datasets use to train ML classifiers for detecting intrusions are not balanced, with some classes having fewer samples than others. This can result in ML classifiers producing suboptimal results. In this dissertation, we address this issue and present better ML based solutions for intrusion detection. Our contributions in this direction can be summarized as follows:
Balancing Data Using Synthetic Data to detect intrusions in Computer Networks: In the past, researchers addressed the issue of imbalanced data in datasets by using over-sampling and under-sampling techniques. In this study, we go beyond such traditional methods and utilize a synthetic data generation method called Con- ditional Generative Adversarial Network (CTGAN) to balance the datasets and in- vestigate its impact on the performance of widely used ML classifiers. To the best of our knowledge, no one else has used CTGAN to generate synthetic samples for balancing intrusion detection datasets. We use two widely used publicly available datasets and conduct extensive experiments and show that ML classifiers trained on these datasets balanced with synthetic samples generated by CTGAN have higher prediction accuracy and Matthew Correlation Coefficient (MCC) scores than those trained on imbalanced datasets by 8% and 13%, respectively.
Deep Learning approach for intrusion detection using focal loss function: To overcome the data imbalance problem for intrusion detection, we leverage the specialized loss function, called focal loss, that automatically down-weighs easy ex- amples and focuses on the hard negatives by facilitating dynamically scaled-gradient updates for training ML models effectively. We implement our approach using two well-known Deep Learning (DL) neural network architectures. Compared to training DL models using cross-entropy loss function, our approach (training DL models using focal loss function) improved accuracy, precision, F1 score, and MCC score by 24%, 39%, 39%, and 60% respectively.
Efficient Deep Learning approach to detect Intrusions using Few-shot Learning: To address the issue of imbalance the datasets and develop a highly effective IDS, we utilize the concept of few-shot learning. We present a Few-Shot and Self-Supervised learning framework, called FS3, for detecting intrusions in IoT networks. FS3 works in three phases. Our approach involves first pretraining an encoder on a large-scale external dataset in a selfsupervised manner. We then employ few-shot learning (FSL), which seeks to replicate the encoder’s ability to learn new patterns from only a few training examples. During the encoder training us- ing a small number of samples, we train them contrastively, utilizing the triplet loss function. The third phase introduces a novel K-Nearest neighbor algorithm that sub- samples the majority class instances to further reduce imbalance and improve overall performance. Our proposed framework FS3, utilizing only 20% of labeled data, out- performs fully supervised state-of-the-art models by up to 42.39% and 43.95% with respect to the metrics precision and F1 score, respectively.
The rapid evolution of the automotive industry and advancements in wireless com- munication technologies will result in the widespread deployment of Vehicular ad hoc networks (VANETs). However, despite the network’s potential to enable intelligent and autonomous driving, it also introduces various attack vectors that can jeopardize its security. In this dissertation, we present efficient privacy-preserving authenticated message dissemination scheme in VANETs.
Conditional Privacy-preserving Authentication and Message Dissemination Scheme using Timestamp based Pseudonyms: To authenticate a message sent by a vehicle using its pseudonym, a certificate of the pseudonym signed by the central authority is generally utilized. If a vehicle is found to be malicious, certificates associated with all the pseudonyms assigned to it must be revoked. Certificate revocation lists (CRLs) should be shared with all entities that will be corresponding with the vehicle. As each vehicle has a large pool of pseudonyms allocated to it, the CRL can quickly grow in size as the number of revoked vehicles increases. This results in high storage overheads for storing the CRL, and significant authentication overheads as the receivers must check their CRL for each message received to verify its pseudonym. To address this issue, we present a timestamp-based pseudonym allocation scheme that reduces the storage overhead and authentication overhead by streamlining the CRL management process
A Multi-Stage Classification Approach for IoT Intrusion Detection Based on Clustering with Oversampling
This research received no external funding. The APC is funded by Prince Sultan UniversityThe authors would like to acknowledge the support of Prince Sultan University
for paying the Article Processing Charges (APC) of this publication.Intrusion detection of IoT-based data is a hot topic and has received a lot of interests from researchers and practitioners since the security of IoT networks is crucial. Both supervised and unsupervised learning methods are used for intrusion detection of IoT networks. This paper proposes an approach of three stages considering a clustering with reduction stage, an oversampling stage, and a classification by a Single Hidden Layer Feed-Forward Neural Network (SLFN) stage. The novelty of the paper resides in the technique of data reduction and data oversampling for generating useful and balanced training data and the hybrid consideration of the unsupervised and supervised methods for detecting the intrusion activities. The experiments were evaluated in terms of accuracy, precision, recall, and G-mean and divided into four steps: measuring the effect of the data reduction with clustering, the evaluation of the framework with basic classifiers, the effect of the oversampling technique, and a comparison with basic classifiers. The results show that SLFN classification technique and the choice of Support Vector Machine and Synthetic Minority Oversampling Technique (SVM-SMOTE) with a ratio of 0.9 and the k value of 3 for k-means++ clustering technique give better results than other values and other classification techniques.Prince Sultan Universit
Machine Learning Based Network Vulnerability Analysis of Industrial Internet of Things
It is critical to secure the Industrial Internet of Things (IIoT) devices
because of potentially devastating consequences in case of an attack. Machine
learning and big data analytics are the two powerful leverages for analyzing
and securing the Internet of Things (IoT) technology. By extension, these
techniques can help improve the security of the IIoT systems as well. In this
paper, we first present common IIoT protocols and their associated
vulnerabilities. Then, we run a cyber-vulnerability assessment and discuss the
utilization of machine learning in countering these susceptibilities. Following
that, a literature review of the available intrusion detection solutions using
machine learning models is presented. Finally, we discuss our case study, which
includes details of a real-world testbed that we have built to conduct
cyber-attacks and to design an intrusion detection system (IDS). We deploy
backdoor, command injection, and Structured Query Language (SQL) injection
attacks against the system and demonstrate how a machine learning based anomaly
detection system can perform well in detecting these attacks. We have evaluated
the performance through representative metrics to have a fair point of view on
the effectiveness of the methods
Deep learning with focal loss approach for attacks classification
The rapid development of deep learning improves the detection and classification of attacks on intrusion detection systems. However, the unbalanced data issue increases the complexity of the architecture model. This study proposes a novel deep learning model to overcome the problem of classifying multi-class attacks. The deep learning model consists of two stages. The pre-tuning stage uses automatic feature extraction with a deep autoencoder. The second stage is fine-tuning using deep neural network classifiers with fully connected layers. To reduce imbalanced class data, the feature extraction was implemented using the deep autoencoder and improved focal loss function in the classifier. The model was evaluated using 3 loss functions, including cross-entropy, weighted cross-entropy, and focal losses. The results could correct the class imbalance in deep learning-based classifications. Attack classification was achieved using automatic extraction with the focal loss on the CSE-CIC-IDS2018 dataset is a high-quality classifier with 98.38% precision, 98.27% sensitivity, and 99.82% specificity
Application of advanced machine learning techniques to early network traffic classification
The fast-paced evolution of the Internet is drawing a complex context which
imposes demanding requirements to assure end-to-end Quality of Service. The
development of advanced intelligent approaches in networking is envisioning
features that include autonomous resource allocation, fast reaction against
unexpected network events and so on. Internet Network Traffic Classification
constitutes a crucial source of information for Network Management, being decisive
in assisting the emerging network control paradigms. Monitoring traffic flowing
through network devices support tasks such as: network orchestration, traffic
prioritization, network arbitration and cyberthreats detection, amongst others.
The traditional traffic classifiers became obsolete owing to the rapid Internet
evolution. Port-based classifiers suffer from significant accuracy losses due to port
masking, meanwhile Deep Packet Inspection approaches have severe user-privacy
limitations. The advent of Machine Learning has propelled the application of
advanced algorithms in diverse research areas, and some learning approaches have
proved as an interesting alternative to the classic traffic classification approaches.
Addressing Network Traffic Classification from a Machine Learning perspective
implies numerous challenges demanding research efforts to achieve feasible
classifiers. In this dissertation, we endeavor to formulate and solve important
research questions in Machine-Learning-based Network Traffic Classification. As a
result of numerous experiments, the knowledge provided in this research constitutes
an engaging case of study in which network traffic data from two different
environments are successfully collected, processed and modeled.
Firstly, we approached the Feature Extraction and Selection processes providing our
own contributions. A Feature Extractor was designed to create Machine-Learning
ready datasets from real traffic data, and a Feature Selection Filter based on fast
correlation is proposed and tested in several classification datasets. Then, the
original Network Traffic Classification datasets are reduced using our Selection
Filter to provide efficient classification models. Many classification models based on
CART Decision Trees were analyzed exhibiting excellent outcomes in identifying
various Internet applications. The experiments presented in this research comprise
a comparison amongst ensemble learning schemes, an exploratory study on Class
Imbalance and solutions; and an analysis of IP-header predictors for early traffic
classification. This thesis is presented in the form of compendium of JCR-indexed
scientific manuscripts and, furthermore, one conference paper is included.
In the present work we study a wide number of learning approaches employing the
most advance methodology in Machine Learning. As a result, we identify the
strengths and weaknesses of these algorithms, providing our own solutions to
overcome the observed limitations. Shortly, this thesis proves that Machine
Learning offers interesting advanced techniques that open prominent prospects in
Internet Network Traffic Classification.Departamento de TeorÃa de la Señal y Comunicaciones e IngenierÃa TelemáticaDoctorado en TecnologÃas de la Información y las Telecomunicacione
A Comparative Analysis of Machine Learning Models for Banking News Extraction by Multiclass Classification With Imbalanced Datasets of Financial News: Challenges and Solutions
Online portals provide an enormous amount of news articles every day. Over the years, numerous studies have concluded that news events have a significant impact on forecasting and interpreting the movement of stock prices. The creation of a framework for storing news-articles and collecting information for specific domains is an important and untested problem for the Indian stock market. When online news portals produce financial news articles about many subjects simultaneously, finding news articles that are important to the specific domain is nontrivial. A critical component of the aforementioned system should, therefore, include one module for extracting and storing news articles, and another module for classifying these text documents into a specific domain(s). In the current study, we have performed extensive experiments to classify the financial news articles into the predefined four classes Banking, Non-Banking, Governmental, and Global. The idea of multi-class classification was to extract the Banking news and its most correlated news articles from the pool of financial news articles scraped from various web news portals. The news articles divided into the mentioned classes were imbalanced. Imbalance data is a big difficulty with most classifier learning algorithms. However, as recent works suggest, class imbalances are not in themselves a problem, and degradation in performance is often correlated with certain variables relevant to data distribution, such as the existence in noisy and ambiguous instances in the adjacent class boundaries. A variety of solutions to addressing data imbalances have been proposed recently, over-sampling, down-sampling, and ensemble approach. We have presented the various challenges that occur with data imbalances in multiclass classification and solutions in dealing with these challenges. The paper has also shown a comparison of the performances of various machine learning models with imbalanced data and data balances using sampling and ensemble techniques. From the result, it’s clear that the performance of Random Forest classifier with data balances using the over-sampling technique SMOTE is best in terms of precision, recall, F-1, and accuracy. From the ensemble classifiers, the Balanced Bagging classifier has shown similar results as of the Random Forest classifier with SMOTE. Random forest classifier's accuracy, however, was 100% and it was 99% with the Balanced Bagging classifier
A taxonomy of network threats and the effect of current datasets on intrusion detection systems
As the world moves towards being increasingly dependent on computers and automation, building secure applications, systems and networks are some of the main challenges faced in the current decade. The number of threats that individuals and businesses face is rising exponentially due to the increasing complexity of networks and services of modern networks. To alleviate the impact of these threats, researchers have proposed numerous solutions for anomaly detection; however, current tools often fail to adapt to ever-changing architectures, associated threats and zero-day attacks. This manuscript aims to pinpoint research gaps and shortcomings of current datasets, their impact on building Network Intrusion Detection Systems (NIDS) and the growing number of sophisticated threats. To this end, this manuscript provides researchers with two key pieces of information; a survey of prominent datasets, analyzing their use and impact on the development of the past decade’s Intrusion Detection Systems (IDS) and a taxonomy of network threats and associated tools to carry out these attacks. The manuscript highlights that current IDS research covers only 33.3% of our threat taxonomy. Current datasets demonstrate a clear lack of real-network threats, attack representation and include a large number of deprecated threats, which together limit the detection accuracy of current machine learning IDS approaches. The unique combination of the taxonomy and the analysis of the datasets provided in this manuscript aims to improve the creation of datasets and the collection of real-world data. As a result, this will improve the efficiency of the next generation IDS and reflect network threats more accurately within new datasets
A taxonomy of network threats and the effect of current datasets on intrusion detection systems
As the world moves towards being increasingly dependent on computers and automation, building secure applications, systems and networks are some of the main challenges faced in the current decade. The number of threats that individuals and businesses face is rising exponentially due to the increasing complexity of networks and services of modern networks. To alleviate the impact of these threats, researchers have proposed numerous solutions for anomaly detection; however, current tools often fail to adapt to ever-changing architectures, associated threats and zero-day attacks. This manuscript aims to pinpoint research gaps and shortcomings of current datasets, their impact on building Network Intrusion Detection Systems (NIDS) and the growing number of sophisticated threats. To this end, this manuscript provides researchers with two key pieces of information; a survey of prominent datasets, analyzing their use and impact on the development of the past decade's Intrusion Detection Systems (IDS) and a taxonomy of network threats and associated tools to carry out these attacks. The manuscript highlights that current IDS research covers only 33.3% of our threat taxonomy. Current datasets demonstrate a clear lack of real-network threats, attack representation and include a large number of deprecated threats, which together limit the detection accuracy of current machine learning IDS approaches. The unique combination of the taxonomy and the analysis of the datasets provided in this manuscript aims to improve the creation of datasets and the collection of real-world data. As a result, this will improve the efficiency of the next generation IDS and reflect network threats more accurately within new datasets
Network Intrusion Detection with Limited Labeled Data
With the increasing dependency of daily life over computer networks, the
importance of these networks security becomes prominent. Different intrusion
attacks to networks have been designed and the attackers are working on
improving them. Thus the ability to detect intrusion with limited number of
labeled data is desirable to provide networks with higher level of security. In
this paper we design an intrusion detection system based on a deep neural
network. The proposed system is based on self-supervised contrastive learning
where a huge amount of unlabeled data can be used to generate informative
representation suitable for various downstream tasks with limited number of
labeled data. Using different experiments, we have shown that the proposed
system presents an accuracy of 94.05% over the UNSW-NB15 dataset, an
improvement of 4.22% in comparison to previous method based on self-supervised
learning. Our simulations have also shown impressive results when the size of
labeled training data is limited. The performance of the resulting Encoder
Block trained on UNSW-NB15 dataset has also been tested on other datasets for
representation extraction which shows competitive results in downstream tasks
IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
With the wide spread of sensors and smart devices in recent years, the data
generation speed of the Internet of Things (IoT) systems has increased
dramatically. In IoT systems, massive volumes of data must be processed,
transformed, and analyzed on a frequent basis to enable various IoT services
and functionalities. Machine Learning (ML) approaches have shown their capacity
for IoT data analytics. However, applying ML models to IoT data analytics tasks
still faces many difficulties and challenges, specifically, effective model
selection, design/tuning, and updating, which have brought massive demand for
experienced data scientists. Additionally, the dynamic nature of IoT data may
introduce concept drift issues, causing model performance degradation. To
reduce human efforts, Automated Machine Learning (AutoML) has become a popular
field that aims to automatically select, construct, tune, and update machine
learning models to achieve the best performance on specified tasks. In this
paper, we conduct a review of existing methods in the model selection, tuning,
and updating procedures in the area of AutoML in order to identify and
summarize the optimal solutions for every step of applying ML algorithms to IoT
data analytics. To justify our findings and help industrial users and
researchers better implement AutoML approaches, a case study of applying AutoML
to IoT anomaly detection problems is conducted in this work. Lastly, we discuss
and classify the challenges and research directions for this domain.Comment: Published in Engineering Applications of Artificial Intelligence
(Elsevier, IF:7.8); Code/An AutoML tutorial is available at Github link:
https://github.com/Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytic
- …