169 research outputs found

    Reduction of False Positives in Intrusion Detection Based on Extreme Learning Machine with Situation Awareness

    Protecting computer networks from intrusions is more important than ever for our privacy, economy, and national security. Seemingly a month does not pass without news of a major data breach involving sensitive personal identity, financial, medical, trade secret, or national security data. Democratic processes can now be potentially compromised through breaches of electronic voting systems. As ever more devices, including medical machines, automobiles, and control systems for critical infrastructure are increasingly networked, human life is also more at risk from cyber-attacks. Research into Intrusion Detection Systems (IDSs) began several decades ago and IDSs are still a mainstay of computer and network protection and continue to evolve. However, detecting previously unseen, or zero-day, threats is still an elusive goal. Many commercial IDS deployments still use misuse detection based on known threat signatures. Systems utilizing anomaly detection have shown great promise to detect previously unseen threats in academic research. But their success has been limited in large part due to the excessive number of false positives that they produce. This research demonstrates that false positives can be better minimized, while maintaining detection accuracy, by combining Extreme Learning Machine (ELM) and Hidden Markov Models (HMM) as classifiers within the context of a situation awareness framework. This research was performed using the University of New South Wales - Network Based 2015 (UNSW-NB15) data set which is more representative of contemporary cyber-attack and normal network traffic than older data sets typically used in IDS research. It is shown that this approach provides better results than either HMM or ELM alone and with a lower False Positive Rate (FPR) than other comparable approaches that also used the UNSW-NB15 data set

    Intrusion detection for industrial control systems

    Industrial Control Systems (ICS) are rapidly shifting from closed local networks to remotely accessible networks. This shift has created a need for strong cybersecurity anomaly and intrusion detection for these systems; however, due to the complexity and diversity of ICSs, well-defined and reliable anomaly and intrusion detection systems are still being developed. Machine learning approaches for anomaly and intrusion detection on the network level may provide general protection that can be applied to any ICS. This paper explores two machine learning applications for classifying the attack label of the UNSW-NB15 dataset. The UNSW-NB15 is a benchmark dataset that was created from general network communications and includes labels for normal behavior and attack vectors. A baseline was created using K-Nearest Neighbors (kNN) due to its mathematical simplicity. Once the baseline was created, a feed-forward artificial neural network known as a Multi-Layer Perceptron (MLP) was implemented for comparison due to its ease of reuse for running in a production environment. The experimental results show that both kNN and MLPs are effective approaches for identifying malicious network traffic, although both still need to be further refined and improved before implementation on a real-world production scale

    Advancing IoT Security with Tsetlin Machines: A Resource-Efficient Anomaly Detection Approach

    The number of IoT devices is rapidly increasing, and the nature of the devices leaves them vulnerable to attacks. As of today there are no general security solutions that meet the requirements of running with limited resources on devices with a large variety of use cases. Traditional AI models are able to classify and distinguish between benign and malicious network traffic. However, they require more resources than IoT devices can provide, and cannot train on-chip once deployed. This thesis introduces the Tsetlin Machine as a potential solution to this problem. As a binary, propositional logic model, the Tsetlin Machine is compatible with hardware and can perform predictions in near real-time on limited resources, making it a suitable candidate for intrusion detection in IoT devices. To assess the viability of the Tsetlin Machine as an IDS, we developed custom data loaders for the benchmark datasets: CIC-IDS2017, KDD99, NSL-KDD, UNSW-NB15, and UNSW-Bot-IoT. We ran hyperparameter searches and numerous experiments to determine the performance of the Tsetlin Machine on each dataset. We discovered that preprocessing the data by converting each data value to a 32-bit binary number and imposing an upper bound on class sizes was an effective strategy. Furthermore, we compared the performance of the Tsetlin Machine against various classifiers from the scikit-learn library and lazy predict. The results show that the Tsetlin Machine's performance was on par with, if not superior to, other machine learning models, indicating its potential as a reliable method for anomaly detection in IoT devices. However, future work is required to determine its viability in a real-life setting, running on limited resources and classifying real-time data
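The preprocessing described in the abstract above (turning each value into a 32-bit binary number and capping class sizes) could be sketched roughly as follows. This is only an illustration under stated assumptions: values are treated as non-negative integers and clipped to the 32-bit range, and the helper names `to_bits`, `encode_row`, and `cap_class_sizes` are hypothetical. The abstract does not say how the thesis handled floats, negative values, or which samples were kept when capping.

```python
def to_bits(value, width=32):
    """Encode one value as a list of `width` bits, most significant bit first.

    Assumption: the value is truncated to an int and clipped to [0, 2**width - 1].
    """
    clipped = max(0, min(int(value), (1 << width) - 1))
    return [(clipped >> i) & 1 for i in range(width - 1, -1, -1)]


def encode_row(row, width=32):
    """Concatenate the bit encodings of every feature in a row."""
    bits = []
    for value in row:
        bits.extend(to_bits(value, width))
    return bits


def cap_class_sizes(samples, labels, max_per_class):
    """Keep at most `max_per_class` samples per label, in original order.

    Assumption: the first samples seen are kept; a random subsample is equally plausible.
    """
    counts = {}
    kept_samples, kept_labels = [], []
    for sample, label in zip(samples, labels):
        if counts.get(label, 0) < max_per_class:
            counts[label] = counts.get(label, 0) + 1
            kept_samples.append(sample)
            kept_labels.append(label)
    return kept_samples, kept_labels


if __name__ == "__main__":
    rows = [[5, 80], [1, 443], [7, 22]]
    labels = ["normal", "attack", "normal"]
    encoded = [encode_row(r) for r in rows]
    capped, capped_labels = cap_class_sizes(encoded, labels, max_per_class=1)
    print(len(encoded[0]), capped_labels)
```

Each feature becomes 32 Boolean inputs, which matches the kind of binary input a propositional logic model consumes; the cap limits how far majority classes can dominate training.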

    Feature Selection in UNSW-NB15 and KDDCUP’99 datasets

    Machine learning and data mining techniques have been widely used in recent years to improve network intrusion detection. These techniques make it possible to automate anomaly detection in network traffic. One of the major problems that researchers face is the lack of published data available for research purposes. The KDD’99 dataset was used by researchers for over a decade even though it suffered from some reported shortcomings and was criticized by a few researchers. In 2009, Tavallaee M. et al. proposed a new dataset (NSL-KDD), extracted from the KDD’99 dataset, to improve on it so that it could be used for research in anomaly detection. The UNSW-NB15 dataset is the latest published dataset, created in 2015 for research purposes in intrusion detection. This research analyses the features included in the UNSW-NB15 dataset by employing machine learning techniques and identifies the significant features (addressing the curse of high dimensionality) by which intrusion detection can be improved in network systems. The irrelevant and redundant features are therefore omitted from the dataset, resulting not only in faster training and testing but also in lower resource consumption, while maintaining high detection rates. A subset of features is proposed in this study, and the findings are compared with previous work on feature selection in the KDD’99 dataset

    Analysis of Theoretical and Applied Machine Learning Models for Network Intrusion Detection

    Network Intrusion Detection System (IDS) devices play a crucial role in the realm of network security. These systems generate alerts for security analysts by performing signature-based and anomaly-based detection on malicious network traffic. However, there are several challenges when configuring and fine-tuning these IDS devices for high accuracy and precision. Machine learning utilizes a variety of algorithms and unique dataset input to generate models for effective classification. These machine learning techniques can be applied to IDS devices to classify and filter anomalous network traffic. This combination of machine learning and network security provides improved automated network defense by developing highly-optimized IDS models that utilize unique algorithms for enhanced intrusion detection. Machine learning models can be trained using a combination of machine learning algorithms, network intrusion datasets, and optimization techniques. This study sought to identify which variation of these parameters yielded the best-performing network intrusion detection models, measured by their accuracy, precision, recall, and F1 score metrics. Additionally, this research aimed to validate theoretical models’ metrics by applying them in a real-world environment to see if they perform as expected. This research utilized a quantitative experimental study design to organize a two-phase approach to train and test a series of machine learning models for network intrusion detection by utilizing Python scripting, the scikit-learn library, and Zeek IDS software. The first phase involved optimizing and training 105 machine learning models by testing a combination of seven machine learning algorithms, five network intrusion datasets, and three optimization methods. These 105 models were then fed into the second phase, where the models were applied in a machine learning IDS pipeline to observe how the models performed in an implemented environment. 
The results of this study identify which algorithms, datasets, and optimization methods generate the best-performing models for network intrusion detection. This research also showcases the need to utilize various algorithms and datasets, since no individual algorithm or dataset consistently achieved high metric scores independent of other training variables. Additionally, this research indicates that optimization during model development is highly recommended; however, there may not be a need to test multiple optimization methods, since they did not typically impact the yielded models’ overall categorization of success or failure. Lastly, this study’s results strongly indicate that theoretical machine learning models will most likely perform significantly worse when applied in an implemented IDS ML pipeline environment. This study can be utilized by other industry professionals and research academics in the fields of information security and machine learning to generate better highly-optimized models for their work environments or experimental research
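The four metrics named in the abstract above (accuracy, precision, recall, and F1 score) all follow from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the example counts are made up for illustration and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. A ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts: 13 attacks among 100 flows, 8 of them detected.
    for name, value in classification_metrics(tp=8, fp=2, tn=85, fn=5).items():
        print(f"{name}: {value:.3f}")
```

On imbalanced traffic, accuracy can stay high while recall is poor, which is one reason studies like this one report all four metrics together.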

    Deep Learning-Based Intrusion Detection Methods for Computer Networks and Privacy-Preserving Authentication Method for Vehicular Ad Hoc Networks

    The incidence of computer network intrusions has significantly increased over the last decade, partially attributed to a thriving underground cyber-crime economy and the widespread availability of advanced tools for launching such attacks. To counter these attacks, researchers in both academia and industry have turned to machine learning (ML) techniques to develop Intrusion Detection Systems (IDSes) for computer networks. However, many of the datasets used to train ML classifiers for detecting intrusions are not balanced, with some classes having fewer samples than others. This can result in ML classifiers producing suboptimal results. In this dissertation, we address this issue and present better ML-based solutions for intrusion detection. Our contributions in this direction can be summarized as follows: Balancing Data Using Synthetic Data to detect intrusions in Computer Networks: In the past, researchers addressed the issue of imbalanced data in datasets by using over-sampling and under-sampling techniques. In this study, we go beyond such traditional methods and utilize a synthetic data generation method called Conditional Generative Adversarial Network (CTGAN) to balance the datasets and investigate its impact on the performance of widely used ML classifiers. To the best of our knowledge, no one else has used CTGAN to generate synthetic samples for balancing intrusion detection datasets. We use two widely used publicly available datasets, conduct extensive experiments, and show that ML classifiers trained on these datasets balanced with synthetic samples generated by CTGAN have higher prediction accuracy and Matthews Correlation Coefficient (MCC) scores than those trained on imbalanced datasets by 8% and 13%, respectively. 
Deep Learning approach for intrusion detection using focal loss function: To overcome the data imbalance problem for intrusion detection, we leverage a specialized loss function, called focal loss, that automatically down-weights easy examples and focuses on the hard negatives by facilitating dynamically scaled-gradient updates for training ML models effectively. We implement our approach using two well-known Deep Learning (DL) neural network architectures. Compared to training DL models using the cross-entropy loss function, our approach (training DL models using the focal loss function) improved accuracy, precision, F1 score, and MCC score by 24%, 39%, 39%, and 60%, respectively. Efficient Deep Learning approach to detect Intrusions using Few-shot Learning: To address the issue of imbalanced datasets and develop a highly effective IDS, we utilize the concept of few-shot learning. We present a Few-Shot and Self-Supervised learning framework, called FS3, for detecting intrusions in IoT networks. FS3 works in three phases. Our approach involves first pretraining an encoder on a large-scale external dataset in a self-supervised manner. We then employ few-shot learning (FSL), which seeks to replicate the encoder’s ability to learn new patterns from only a few training examples. During the encoder training using a small number of samples, we train them contrastively, utilizing the triplet loss function. The third phase introduces a novel K-Nearest Neighbor algorithm that subsamples the majority class instances to further reduce imbalance and improve overall performance. Our proposed framework FS3, utilizing only 20% of labeled data, outperforms fully supervised state-of-the-art models by up to 42.39% and 43.95% with respect to the metrics precision and F1 score, respectively. The rapid evolution of the automotive industry and advancements in wireless communication technologies will result in the widespread deployment of Vehicular ad hoc networks (VANETs). 
However, despite the network’s potential to enable intelligent and autonomous driving, it also introduces various attack vectors that can jeopardize its security. In this dissertation, we present an efficient privacy-preserving authenticated message dissemination scheme for VANETs. Conditional Privacy-preserving Authentication and Message Dissemination Scheme using Timestamp based Pseudonyms: To authenticate a message sent by a vehicle using its pseudonym, a certificate of the pseudonym signed by the central authority is generally utilized. If a vehicle is found to be malicious, certificates associated with all the pseudonyms assigned to it must be revoked. Certificate revocation lists (CRLs) should be shared with all entities that will be corresponding with the vehicle. As each vehicle has a large pool of pseudonyms allocated to it, the CRL can quickly grow in size as the number of revoked vehicles increases. This results in high storage overheads for storing the CRL, and significant authentication overheads, as receivers must check their CRL for each message received to verify its pseudonym. To address this issue, we present a timestamp-based pseudonym allocation scheme that reduces the storage overhead and authentication overhead by streamlining the CRL management process
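The focal loss contribution in the abstract above can be illustrated with a small sketch. It assumes the commonly used binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The defaults gamma = 2 and alpha = 0.25 and the function names are illustrative assumptions; the abstract does not state which settings or architectures the dissertation used.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def binary_cross_entropy(p, y):
    """Cross-entropy for one example; p is the predicted P(class 1), y is 0 or 1."""
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example (assumed standard binary form).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples; alpha weights class 1 and (1 - alpha) weights class 0.
    """
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare an easy positive (p = 0.9) with a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        ce = binary_cross_entropy(p, 1)
        fl = binary_focal_loss(p, 1)
        print(f"p={p}: cross-entropy={ce:.4f}, focal={fl:.6f}, ratio={fl / ce:.4f}")
```

With gamma = 0 and alpha = 0.5 the expression reduces to half the cross-entropy; a larger gamma shifts more of the total loss toward hard, misclassified examples, which is the property that matters for rare attack classes.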