588 research outputs found

    Intrusion detection using probabilistic graphical models

    Modern computer systems are plagued by security vulnerabilities and flaws on many levels. Attackers discover and exploit these vulnerabilities for various intrusion purposes, such as eavesdropping, data modification, identity spoofing, password-based attacks, and denial-of-service attacks. The security of our computer systems and data is always at risk because of the open nature of the internet. With the rapid growth of internet applications, intrusion detection and prevention have become increasingly important research topics for protecting networked systems, such as web servers, database servers, and cloud servers, from threats. In this thesis, we attempt to build more efficient intrusion detection systems through three approaches, each taking a different perspective and addressing a different situation. Firstly, we propose Bayesian Model Averaging of Bayesian Network (BNMA) classifiers for intrusion detection. In this work, we compare our BNMA classifier with the Bayesian Network classifier and the Naive Bayes classifier, both of which have been shown in the literature to detect intrusions with reasonable accuracy and efficiency. The experimental results show that BNMA can be more efficient and reliable than its competitors, i.e., the Bayesian Network classifier and the Naive Bayes classifier, for all sizes of training dataset. The advantage of BNMA is more pronounced when the training dataset is small. Secondly, we introduce the Situational Data Model as a method for collecting datasets to train intrusion detection models. Unlike static features such as those in the KDD CUP 99 data, which were collected without time stamps, Situational Data are collected in chronological sequence. Therefore, they can capture not only the dependency relationships among different features, but also the relationships among values collected over time for the same feature. The experimental results show that the intrusion detection model trained on the Situational Dataset outperforms the one trained on action-only sequences. Thirdly, we introduce the Situation Aware with Conditional Random Fields Intrusion Detection System (SA-CRF-IDS). The SA-CRF-IDS is trained with the probabilistic graphical model Conditional Random Fields (CRF) on the Situational Dataset. The experimental results show that, on the non-Situational dataset, the CRF outperforms the hidden Markov model (HMM) with significantly better detection accuracy and a better ROC curve. On the other hand, the two training methods perform very similarly when the Situational Dataset is adopted.
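The idea of Bayesian model averaging mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's BNMA implementation: the function names, the log-evidence values, and the per-model class probabilities are all hypothetical, and a uniform prior over models is assumed. Each model's prediction is weighted by its posterior probability given the training data, and the weighted predictions are summed.

```python
import math

def posterior_model_weights(log_evidences):
    """Convert per-model log marginal likelihoods into posterior weights.

    Assumes a uniform prior over models (an assumption of this sketch).
    """
    top = max(log_evidences)  # subtract the max for numerical stability
    unnormalized = [math.exp(le - top) for le in log_evidences]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def bma_predict(model_probs, weights):
    """Average the models' class distributions, weighted by posterior weight."""
    classes = model_probs[0].keys()
    return {
        c: sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in classes
    }

# Hypothetical log evidences for three candidate network structures.
log_evidences = [-120.0, -121.0, -125.0]

# Hypothetical class probabilities each model assigns to one connection record.
model_probs = [
    {"normal": 0.9, "attack": 0.1},
    {"normal": 0.6, "attack": 0.4},
    {"normal": 0.2, "attack": 0.8},
]

weights = posterior_model_weights(log_evidences)
averaged = bma_predict(model_probs, weights)
predicted_label = max(averaged, key=averaged.get)
```

Because the weights sum to one, the averaged output is itself a probability distribution over the classes, and models with low evidence contribute little to the final decision.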

    Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

    Research into combining data mining and machine learning technology with web-based education systems (known as educational data mining, or EDM) is becoming imperative for enhancing the quality of education beyond traditional methods. With the worldwide growth of Information and Communication Technology (ICT), data are becoming available in large volumes, at high velocity, and in extensive variety. In this thesis, four popular data mining methods are run on Apache Spark, using large datasets from Online Cognitive Learning Systems, to explore the scalability and efficiency of Spark. Datasets of various sizes are tested on Spark MLlib with different running configurations and parameter tunings. The thesis presents useful strategies for allocating computing resources and tuning parameters to take full advantage of Apache Spark's in-memory processing for data mining and machine learning tasks. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.

    TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS

    With the exponential growth of network-based applications globally, there has been a transformation in organizations' business models. Furthermore, cost reductions in both computing devices and internet access have led people to become more technology dependent. Consequently, due to inordinate use of computer networks, new risks have emerged. Therefore, improving the speed and accuracy of security mechanisms has become crucial. Although abundant new security tools have been developed, the rapid growth of malicious activities continues to be a pressing issue, as ever-evolving attacks continue to create severe threats to network security. Classical security techniques (for instance, firewalls) are used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such intrusive network activities. Machine Learning is one of the practical approaches to intrusion detection; it learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature. Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. A number of such datasets, such as DARPA, KDDCUP, and NSL-KDD, have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. According to several studies, many of these datasets are outdated and unreliable. Furthermore, some of these datasets suffer from a lack of traffic diversity and volume, do not cover the variety of attacks, have anonymized packet information and payloads that cannot reflect current trends, or lack feature sets and metadata. This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensions (namely, feature selection, sensitivity to hyper-parameter selection, and class imbalance problems) that are inherent to intrusion detection. It also produces a new reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal traffic and different classes of attacks, and reflects current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation summarizing the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issues of detection accuracy and false alarm rate in intrusion detection systems.