
    Detection and Explanation of Distributed Denial of Service (DDoS) Attack Through Interpretable Machine Learning

    Distributed denial of service (DDoS) is a network-based attack in which the attacker aims to overwhelm the victim server. The attacker floods the server by sending enormous numbers of network packets in a distributed manner, beyond the server's capacity, thus disrupting its normal service. In this dissertation, we focus on building intelligent detectors that can learn with little human interaction and detect DDoS attacks accurately. Machine learning (ML) has shown promising results across many technologies, including cybersecurity, and provides intelligence when applied to Intrusion Detection Systems (IDSs). Moreover, among state-of-the-art ML-based IDSs, ensemble classifiers (combinations of classifiers) outperform single classifiers. We have therefore implemented both supervised and unsupervised ensemble frameworks to build IDSs with better DDoS detection accuracy and fewer false alarms than existing approaches. Our experiments on popular benchmark datasets such as NSL-KDD, UNSW-NB15, and CICIDS2017 achieved detection accuracy of up to 99.1% with a false positive rate as low as 0.01%. As feature selection is a mandatory preprocessing phase in ML classification, we have designed several feature selection techniques for better DDoS detection accuracy, fewer false positive alarms, and shorter training times. Initially, we implemented an ensemble feature selection (FS) framework that combines most well-known FS methods and yields better outcomes than any single FS method. The goal of this dissertation is not only to detect DDoS attacks precisely but also to provide explanations for these detections. Interpretable machine learning (IML) techniques are used to explain a detected DDoS attack in terms of the contributions of the corresponding features. We have also implemented a novel IML-based feature selection approach that finds optimal features, which are then used to retrain our models; the retrained models outperform those obtained with general feature selection. Moreover, we have developed an explainer model using IML that identifies detected DDoS attacks with proper explanations based on feature contributions. The contribution of this dissertation is fivefold, with the ultimate goal of detecting the most frequent DDoS attacks in cybersecurity. To detect DDoS attacks, we first used ensemble machine learning classification with both supervised and unsupervised classifiers. For better performance, we then implemented and applied two feature selection approaches, an ensemble feature selection framework and an IML-based feature selection approach, both individually and in combination with the supervised ensemble framework. Furthermore, we added explanations for the detected DDoS attacks using explainer models built with the LIME and SHAP IML methods. To build trustworthy explainer models, a detailed survey was conducted on interpretable machine learning methods and their associated tools. We applied the designed framework in various domains, such as smart grid and NLP-based IDS, to verify its efficacy and its ability to perform as a generic model.
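    The abstract above summarizes the approach without implementation detail. As a minimal, illustrative sketch of how a supervised ensemble detector with SHAP-guided feature selection and retraining might be wired together (not the dissertation's actual code), the snippet below uses scikit-learn and the shap library; the dataset file name, label column, number of retained features, and classifier choices are assumptions.

```python
# Illustrative sketch only: supervised ensemble DDoS detector with SHAP-guided
# feature selection and retraining. The CSV path, "label" column, and the
# number of retained features are hypothetical placeholders.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("cicids2017_ddos_subset.csv")        # hypothetical preprocessed file
X, y = df.drop(columns=["label"]), df["label"]        # label: 1 = DDoS, 0 = benign
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Heterogeneous supervised ensemble combined by soft voting.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=10, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)

# SHAP-based feature selection: rank features by mean |SHAP| for the attack class.
explainer = shap.TreeExplainer(ensemble.named_estimators_["rf"])
sv = explainer.shap_values(X_te)
attack_sv = sv[1] if isinstance(sv, list) else sv[..., 1]   # shap return type differs by version
importance = np.abs(attack_sv).mean(axis=0)
top_features = X.columns[np.argsort(importance)[::-1][:20]]

# Retrain on the selected subset and report hold-out accuracy.
ensemble.fit(X_tr[top_features], y_tr)
print("accuracy on selected features:", ensemble.score(X_te[top_features], y_te))
```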

    HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs)

    Machine learning (ML) is crucial in network anomaly detection for proactive threat hunting, reducing detection and response times significantly. However, challenges in model training, maintenance, and frequent false positives impact its acceptance and reliability. Explainable AI (XAI) attempts to mitigate these issues, allowing cybersecurity teams to assess AI-generated alerts with confidence, but has seen limited acceptance from incident responders. Large Language Models (LLMs) present a solution through discerning patterns in extensive information and adapting to different functional requirements. We present HuntGPT, a specialized intrusion detection dashboard that applies a Random Forest classifier trained on the KDD99 dataset and integrates XAI frameworks such as SHAP and LIME for user-friendly, intuitive model interaction; combined with GPT-3.5 Turbo, it delivers detected threats in an understandable format. The paper delves into the system's architecture, components, and technical accuracy, assessed through Certified Information Security Manager (CISM) Practice Exams and evaluated for response quality across six metrics. The results demonstrate that conversational agents, supported by LLMs and integrated with XAI, provide robust, explainable, and actionable AI solutions in intrusion detection, enhancing user understanding and the interactive experience.
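    HuntGPT's dashboard internals are not reproduced here; the following hedged sketch only illustrates the chain the abstract describes, a Random Forest alert explained with SHAP and summarized by GPT-3.5 Turbo. The prompt wording, feature handling, and the helper function are hypothetical, and the snippet assumes the openai and shap packages plus an OPENAI_API_KEY environment variable.

```python
# Hedged sketch of a HuntGPT-style chain: Random Forest alert -> SHAP
# attribution -> LLM summary for the analyst. Prompt text, feature handling,
# and the helper function are illustrative assumptions.
import numpy as np
import shap
from openai import OpenAI
from sklearn.ensemble import RandomForestClassifier


def explain_alert(rf: RandomForestClassifier, feature_names, sample: np.ndarray) -> str:
    """Return a plain-language explanation for one flagged connection record."""
    explainer = shap.TreeExplainer(rf)
    sv = explainer.shap_values(sample.reshape(1, -1))
    attack_sv = sv[1][0] if isinstance(sv, list) else sv[0, :, 1]   # per-feature attributions
    top = sorted(zip(feature_names, attack_sv), key=lambda t: abs(t[1]), reverse=True)[:5]

    prompt = (
        "An intrusion detector flagged a network connection as an attack. "
        "The most influential features and their SHAP contributions were: "
        + ", ".join(f"{name}={val:+.3f}" for name, val in top)
        + ". Explain the likely threat to a SOC analyst in two sentences."
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```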

    Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research

    This survey presents a comprehensive review of the current literature on Explainable Artificial Intelligence (XAI) methods for cyber security applications. Due to the rapid development of Internet-connected systems and Artificial Intelligence in recent years, Artificial Intelligence, including Machine Learning and Deep Learning, has been widely utilized in cyber security fields such as intrusion detection, malware detection, and spam filtering. However, although Artificial Intelligence-based approaches for the detection and defense of cyber attacks and threats are more advanced and efficient than conventional signature-based and rule-based strategies, most Machine Learning-based and Deep Learning-based techniques are deployed in a “black-box” manner, meaning that security experts and customers are unable to explain how such procedures reach particular conclusions. The deficiencies in transparency and interpretability of existing Artificial Intelligence techniques decrease human users’ confidence in the models used for defense against cyber attacks, especially as cyber attacks become increasingly diverse and complicated. Therefore, it is essential to apply XAI in the establishment of cyber security models to create more explainable models while maintaining high accuracy, allowing human users to comprehend, trust, and manage the next generation of cyber defense mechanisms. Although there are papers reviewing Artificial Intelligence applications in cyber security and a vast literature on applying XAI in fields such as healthcare, financial services, and criminal justice, there are surprisingly no survey articles that concentrate on XAI applications in cyber security. The motivation behind this survey is therefore to bridge that research gap by presenting a detailed and up-to-date survey of XAI approaches applicable to issues in the cyber security field. Our work is the first to propose a clear roadmap for navigating the XAI literature in the context of cyber security applications.

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity by various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time, for enhanced efficiency and quality of service delivery as well as improved safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting a model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges such as complex latent patterns, concept drift, and overfitting, which may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end users expect transparency and verifiability in order to trust a model and the outcomes it produces. Also, pointing users to the most anomalous or malicious regions of a time series and the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through three essential phases: behavior prediction, inference, and interpretation. The first step focuses on devising a time series model that yields high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify observations and post-prune unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building situational awareness platforms and open new perspectives in a variety of domains such as cybersecurity and health.
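    The thesis describes its prediction, inference, and interpretation pipeline only at a high level. The sketch below is a deliberately simple stand-in for the prediction and scoring phases, not the thesis's actual models: a rolling-mean forecaster with a robust z-score over its residuals; the window size and alarm threshold are assumptions.

```python
# Simple stand-in for stream anomaly scoring: rolling-mean forecast plus a
# robust z-score over the prediction error. Window size and threshold are
# illustrative assumptions.
from collections import deque

import numpy as np


class StreamAnomalyScorer:
    def __init__(self, window: int = 50, threshold: float = 4.0):
        self.history = deque(maxlen=window)   # recent observations for the forecast
        self.errors = deque(maxlen=window)    # recent absolute prediction errors
        self.threshold = threshold

    def update(self, x: float) -> bool:
        """Ingest one observation; return True if it should be flagged as anomalous."""
        if len(self.history) < self.history.maxlen:
            self.history.append(x)
            return False
        error = abs(x - float(np.mean(self.history)))   # naive rolling-mean predictor
        flagged = False
        if len(self.errors) == self.errors.maxlen:      # score only after errors warm up
            med = float(np.median(self.errors))
            mad = float(np.median([abs(e - med) for e in self.errors]))
            flagged = (error - med) / (1.4826 * mad + 1e-9) > self.threshold
        self.errors.append(error)
        self.history.append(x)
        return flagged


scorer = StreamAnomalyScorer()
stream = np.concatenate([np.random.normal(0.0, 1.0, 200), [9.0]])
flags = [scorer.update(float(v)) for v in stream]
print("last observation flagged:", flags[-1])
```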

    Explainable Intrusion Detection Systems using white box techniques

    Artificial Intelligence (AI) has found increasing application in various domains, revolutionizing problem-solving and data analysis. However, in decision-sensitive areas such as Intrusion Detection Systems (IDS), trust and reliability are vital, posing challenges for traditional black box AI systems. These black box IDS, while accurate, lack transparency, making it difficult to understand the reasons behind their decisions. This dissertation explores the concept of eXplainable Intrusion Detection Systems (X-IDS) and addresses the issue of trust in X-IDS. It examines the limitations of common black box IDS and the complexities of explainability methods, leading to the fundamental question of whether to trust explanations generated by black box explainer modules. To address these challenges, this dissertation presents the concept of white box explanations, which are innately explainable. While white box algorithms are typically simpler and more interpretable, they often sacrifice accuracy. However, this work uses white box Competitive Learning (CL), which can achieve accuracy competitive with black box IDS. We introduce Rule Extraction (RE) as another white box technique that can be applied to explain black box IDS. It involves training decision trees on the inputs, weights, and outputs of black box models, resulting in human-readable rulesets that serve as global model explanations. These white box techniques offer the benefits of accuracy and trustworthiness, which are challenging to achieve simultaneously. This work aims to address gaps in the existing literature, including the need for highly accurate white box IDS, a methodology for understanding explanations, small testing datasets, and comparisons between white box and black box models. To achieve these goals, the study employs CL and eclectic RE algorithms. CL models offer innate explainability and high accuracy in IDS applications, while eclectic RE enhances trustworthiness. The contributions of this dissertation include a novel X-IDS architecture featuring Self-Organizing Map (SOM) models that adhere to DARPA’s guidelines for explainable systems, an extended X-IDS architecture incorporating three CL-based algorithms, and a hybrid X-IDS architecture combining a Deep Neural Network (DNN) predictor with a white box eclectic RE explainer. These architectures create more explainable, trustworthy, and accurate X-IDS systems, paving the way for enhanced AI solutions in decision-sensitive domains.
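    As a concrete illustration of the rule extraction idea described above (training a decision tree on a black-box model's inputs and outputs to obtain a global, human-readable explanation), the following sketch uses scikit-learn; the synthetic data, the MLP standing in for the DNN, and the tree depth are assumptions rather than the dissertation's actual setup.

```python
# Minimal sketch of eclectic rule extraction: fit a decision-tree surrogate to
# the predictions of a black-box neural network and print a human-readable
# ruleset. Synthetic data stands in for an IDS dataset such as NSL-KDD.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Black-box predictor (a stand-in for the DNN used in the hybrid X-IDS).
black_box = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
black_box.fit(X, y)

# Train the surrogate on the black box's *outputs*, not the ground-truth labels,
# so the extracted rules describe the model's own decision logic.
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0)
surrogate.fit(X, black_box.predict(X))

print("surrogate fidelity:", surrogate.score(X, black_box.predict(X)))
print(export_text(surrogate, feature_names=[f"f{i}" for i in range(10)]))
```

    The fidelity score here measures how closely the extracted ruleset mirrors the black box, the property the dissertation ties to trustworthiness; it is distinct from accuracy against the true labels.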

    iSee: a case-based reasoning platform for the design of explanation experiences.

    Explainable Artificial Intelligence (XAI) is an emerging field within Artificial Intelligence (AI) that has provided many methods enabling humans to understand and interpret the outcomes of AI systems. However, deciding on the best explanation approach for a given AI problem is currently a challenging decision-making task. This paper presents the iSee project, which aims to address some of the XAI challenges by providing a unifying platform where personalized explanation experiences are generated using Case-Based Reasoning. An explanation experience includes the proposed solution to a particular explainability problem and its corresponding evaluation, provided by the end user. The ultimate goal is to provide an open catalog of explanation experiences that can be transferred to other scenarios where trustworthy AI is required.
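    The iSee platform itself is not described at the code level; the toy sketch below only illustrates the case-based reasoning retrieval step the abstract alludes to, where a past, well-rated explanation experience is reused for a similar new problem. The case attributes, similarity rule, and example cases are entirely hypothetical.

```python
# Toy sketch of case-based retrieval of explanation strategies: each past
# "explanation experience" is a case (problem attributes + the explainer used +
# a user evaluation), and a new query reuses the most similar well-rated case.
# Attributes, cases, and the similarity rule are purely illustrative.
from dataclasses import dataclass


@dataclass
class ExplanationCase:
    model_type: str        # e.g. "tree_ensemble", "neural_net"
    data_type: str         # e.g. "tabular", "image", "text"
    audience: str          # e.g. "domain_expert", "end_user"
    explainer: str         # solution part of the case
    user_rating: float     # evaluation provided by the end user (0..1)


CASE_BASE = [
    ExplanationCase("tree_ensemble", "tabular", "domain_expert", "SHAP", 0.9),
    ExplanationCase("neural_net", "image", "end_user", "Grad-CAM", 0.8),
    ExplanationCase("neural_net", "tabular", "domain_expert", "LIME", 0.7),
]


def retrieve(query: ExplanationCase) -> ExplanationCase:
    """Return the case matching the query on the most attributes, best rating first."""
    def similarity(case: ExplanationCase):
        matches = sum(
            getattr(case, attr) == getattr(query, attr)
            for attr in ("model_type", "data_type", "audience")
        )
        return (matches, case.user_rating)
    return max(CASE_BASE, key=similarity)


query = ExplanationCase("tree_ensemble", "tabular", "domain_expert", explainer="", user_rating=0.0)
print("suggested explainer:", retrieve(query).explainer)
```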

    Addressing Pragmatic Challenges in Utilizing AI for Security of Industrial IoT

    Industrial control systems (ICSs) are an essential part of every nation's critical infrastructure and have long been utilized to supervise industrial machines and processes. Today's ICSs are substantially different from the information technology (IT) devices of a decade ago. The integration of Internet of Things (IoT) technology has made them more efficient and optimized, improved automation, and increased quality and compliance. They are now a sub-part (and arguably the most critical part) of the IoT domain, called the industrial IoT (IIoT). In the past, to secure ICSs from malicious outside attacks, these systems were isolated from the outside world. However, recent advances, increased connectivity with corporate networks, and the use of Internet communications to transmit information more conveniently have introduced the possibility of cyber-attacks against these systems. Due to the sensitive nature of industrial applications, security is the foremost concern. We discuss why, despite the exceptional performance of artificial intelligence (AI) and machine learning (ML), industry leaders still have a hard time utilizing these models in practice as standalone units. The goal of this dissertation is to address some of these challenges and help pave the way for smarter and more modern security solutions in these systems. Specifically, we focus on data scarcity for the AI, the black-box nature of the AI, and the high computational load of the AI. Industrial companies almost never release their network data because they are obligated to follow confidentiality laws and user privacy restrictions. Hence, real-world IIoT datasets are not available for security research, and we face a data scarcity challenge in the IIoT security research community. In this domain, researchers usually have to resort to commercial or public datasets that are not specific to it. In our work, we have developed a real-world testbed that resembles an actual industrial plant, emulating a popular industrial system used in water treatment processes, so that we could collect datasets containing realistic traffic for our research. Several characteristics of IIoT networks are unique to them; we have conducted an extensive study to identify them and incorporate them into the design. We have gathered information on relevant cyber-attacks in IIoT systems and run them against the system to collect realistic datasets containing both normal and attack traffic, analogous to real industrial network traffic. The communication protocols of these systems are also specific to them; we have implemented one of the most popular ones in our dataset. Another attribute that distinguishes the security of these systems from others is imbalanced data: the number of attack samples is significantly lower than the enormous amount of normal traffic that flows through the system daily. We have made sure our datasets comply with all these specific attributes of an IIoT. Another challenge we address here is the “black box” nature of learning models, which creates hurdles in generating adequate trust in their decisions; thus, they are seldom utilized as standalone units in high-risk IIoT applications. Explainable AI (XAI) has gained increasing interest in recent years to help with this problem. However, most of the research works done so far focus on image applications or are very slow.
    For applications such as IIoT security, we deal with numerical data, and low latency is of utmost importance. In this dissertation, we propose a universal XAI model named Transparency Relying Upon Statistical Theory (TRUST). TRUST is model-agnostic, high-performing, and suitable for numerical applications. We demonstrate its superiority over another popular XAI model in terms of speed and the ability to successfully explain the AI's behavior. When dealing with IoT technology, especially the industrial IoT, we face a massive amount of data streaming to and from the IoT devices. In addition, the availability and reliability constraints of industrial systems require them to operate at a fast pace and avoid creating any bottleneck in the system. The high computational load of complex AI models can become a burden when handling large volumes of data and may produce results more slowly than required. In this dissertation, we utilize distributed computing in the form of an edge/cloud structure to address these problems. We propose Anomaly Detection using Distributed AI (ADDAI), which can easily span out geographically to cover a large number of IoT sources. Due to its distributed nature, it guarantees critical IIoT requirements such as high speed, robustness against a single point of failure, low communication overhead, privacy, and scalability. We formulate the communication cost, which is minimized, and quantify the improvement in performance.
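    Neither TRUST nor ADDAI is specified here in enough detail to reproduce; the sketch below only illustrates the general edge/cloud pattern the abstract motivates, where each edge node runs a lightweight local detector and forwards compact summaries rather than raw traffic, keeping communication overhead low. The use of Isolation Forest, the summary fields, and the thresholds are assumptions for illustration.

```python
# Illustrative sketch (not the actual ADDAI design): edge nodes score local
# traffic windows and send only small summaries to a cloud aggregator.
# Detector choice, summary fields, and thresholds are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest


class EdgeNode:
    def __init__(self, local_data: np.ndarray):
        # Lightweight local detector trained on traffic seen at this node.
        self.model = IsolationForest(random_state=0).fit(local_data)

    def summarize(self, window: np.ndarray) -> dict:
        """Score a local traffic window and return only a compact summary."""
        scores = -self.model.score_samples(window)        # higher = more anomalous
        return {"max_score": float(scores.max()), "n_flagged": int((scores > 0.6).sum())}


def cloud_aggregate(summaries: list, score_threshold: float = 0.6) -> bool:
    """Raise a plant-wide alert if any edge node reports strong anomalies."""
    return any(s["max_score"] > score_threshold and s["n_flagged"] > 0 for s in summaries)


rng = np.random.default_rng(0)
nodes = [EdgeNode(rng.normal(size=(500, 8))) for _ in range(3)]
windows = [rng.normal(size=(50, 8)) for _ in range(2)] + [rng.normal(5.0, 1.0, size=(50, 8))]
print("alert:", cloud_aggregate([n.summarize(w) for n, w in zip(nodes, windows)]))
```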

    An energy-efficient and trustworthy unsupervised anomaly detection framework (EATU) for IIoT

    Many anomaly detection techniques have been adopted by the Industrial Internet of Things (IIoT) to improve self-diagnosis efficiency and infrastructure security. However, they are usually computationally hungry and operate as “black boxes”. Thus, it becomes important to ensure that detection is not only accurate but also energy-efficient and trustworthy. In this paper, we propose an Energy-efficient And Trustworthy Unsupervised anomaly detection framework (EATU) for IIoT. The framework consists of two levels of feature extraction: 1) autoencoder-based feature extraction and 2) Efficient DeepExplainer-based explainable feature selection. We propose an Efficient DeepExplainer model based on perturbation-focused sampling, which demonstrates the best computational efficiency among state-of-the-art explainable models. With the important features selected by the Efficient DeepExplainer, the rationale for why an anomaly detection decision was made is given, enhancing the trustworthiness of the detection as well as improving the accuracy of anomaly detection. Three real-world IIoT datasets with high-dimensional features are used to validate the effectiveness of the proposed framework. Extensive experimental results demonstrate that, in comparison with the state of the art, our framework offers improved accuracy, trustworthiness (in terms of correctness and stability of the explanation), and energy efficiency (in terms of wall-clock time and resource usage). This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), the National Key Research and Development Program of China, and the National Natural Science Foundation of China.
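    EATU's Efficient DeepExplainer is not reproduced here. As a hedged sketch of the two-level idea, autoencoder-based scoring followed by SHAP-based selection of the features that drive the anomaly score, the snippet below uses the model-agnostic shap.KernelExplainer over the reconstruction error as a simple stand-in; network sizes, sample counts, and the number of retained features are assumptions.

```python
# Illustrative sketch only: autoencoder anomaly scoring with SHAP-based
# feature selection. The paper's Efficient DeepExplainer is replaced by the
# model-agnostic shap.KernelExplainer; all dimensions are assumptions.
import numpy as np
import shap
import tensorflow as tf

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(1000, 20)).astype("float32")   # stand-in for IIoT sensor features

# Small autoencoder trained on normal traffic only.
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(20),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_normal, X_normal, epochs=10, batch_size=64, verbose=0)


def reconstruction_error(X: np.ndarray) -> np.ndarray:
    """Per-sample anomaly score: mean squared reconstruction error."""
    return np.mean((autoencoder.predict(X, verbose=0) - X) ** 2, axis=1)


# Explain which input features drive the anomaly score, then keep the top ones.
background = shap.sample(X_normal, 50)
explainer = shap.KernelExplainer(reconstruction_error, background)
shap_values = explainer.shap_values(X_normal[:20], nsamples=100)
importance = np.abs(shap_values).mean(axis=0)
selected = np.argsort(importance)[::-1][:10]
print("selected feature indices:", selected)
```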