588 research outputs found

Intrusion detection using probabilistic graphical models

Author: Xiao Liyuan
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016

Modern computer systems are plagued by security vulnerabilities and flaws on many levels. Attackers discover and exploit these vulnerabilities for purposes such as eavesdropping, data modification, identity spoofing, password-based attacks, and denial-of-service attacks. The security of our computer systems and data is always at risk because of the open nature of the internet. With the rapid growth of internet applications, intrusion detection and prevention have become increasingly important research topics for protecting networked systems, such as Web servers, database servers, and cloud servers, from threats. In this thesis, we attempt to build more efficient intrusion detection systems through three approaches, each taken from a different perspective and suited to a different situation. First, we propose Bayesian Model Averaging of Bayesian Network (BNMA) classifiers for intrusion detection. We compare our BNMA classifier with the Bayesian Network classifier and the Naive Bayes classifier, both of which have been shown in the literature to detect intrusions with reasonable accuracy and efficiency. The experimental results show that BNMA can be more efficient and reliable than both competitors across all training dataset sizes, and its advantage is more pronounced when the training dataset is small. Second, we introduce the Situational Data Model as a method for collecting datasets to train intrusion detection models. Unlike the static features of the KDD CUP 99 data, which were collected without time stamps, Situational Data are collected in chronological sequence. They can therefore capture not only the dependencies among different features but also the relationships among values of the same feature collected over time. The experimental results show that an intrusion detection model trained on the Situational Dataset outperforms one trained on action-only sequences. Third, we introduce the Situation Aware with Conditional Random Fields Intrusion Detection System (SA-CRF-IDS), which is trained with the probabilistic graphical model Conditional Random Fields (CRF) over the Situational Dataset. The experimental results show that, on the non-Situational dataset, the CRF outperforms the HMM with significantly better detection accuracy and a better ROC curve. On the other hand, the two training methods perform very similarly when the Situational Dataset is adopted.
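As a rough illustration of the model-averaging idea behind BNMA, the sketch below weights each candidate model's class probabilities by that model's posterior probability. It is a generic Bayesian model averaging calculation, not the thesis's implementation: the function names, the uniform prior over models, and every number are hypothetical and chosen only for illustration.

```python
import math


def model_posteriors(log_marginal_likelihoods, log_priors=None):
    """Return P(model | data) for each model, computed stably in log space.

    Assumption: a uniform prior over models when none is supplied.
    """
    n = len(log_marginal_likelihoods)
    if log_priors is None:
        log_priors = [math.log(1.0 / n)] * n
    scores = [ll + lp for ll, lp in zip(log_marginal_likelihoods, log_priors)]
    top = max(scores)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def bma_predict(per_model_class_probs, posteriors):
    """Average the per-model class probabilities, weighted by model posterior."""
    classes = per_model_class_probs[0].keys()
    return {
        c: sum(w * probs[c] for probs, w in zip(per_model_class_probs, posteriors))
        for c in classes
    }


# Hypothetical log marginal likelihoods of three candidate network structures.
log_ml = [-120.0, -121.0, -125.0]

# Hypothetical class probabilities each model assigns to one connection record.
preds = [
    {"normal": 0.9, "attack": 0.1},
    {"normal": 0.6, "attack": 0.4},
    {"normal": 0.2, "attack": 0.8},
]

weights = model_posteriors(log_ml)
averaged = bma_predict(preds, weights)
print(weights)
print(averaged)
```

Because the weights are spread over several structures instead of committing to a single best one, a poorly supported structure contributes little to the final prediction. That is one plausible reading of why averaging might help most when training data are scarce.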

Digital Repository @ Iowa State University (ISU)

Recommended from our members

Ensemble learning of model hyperparameters and spatiotemporal data for calibration of low-cost PM2.5 sensors.

Author: Bhanu Bir
Day Rong-Fuh
Tsai Chih-Chun
Tung Ching-Ying
Yin Peng-Yeng
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

The PM2.5 air quality index (AQI) measurements from government-built supersites are accurate but cannot provide dense coverage of monitoring areas. Low-cost PM2.5 sensors can be deployed as a fine-grained internet-of-things (IoT) network to complement government facilities. Calibrating low-cost sensors against high-accuracy supersites is thus essential. Moreover, the imputation of missing values in the training data may affect the calibration result, the best calibration performance requires hyperparameter optimization, and the factors affecting PM2.5 concentrations, such as climate, geographical landscape, and anthropogenic activities, are uncertain in both spatial and temporal dimensions. In this paper, an ensemble learning approach for imputation method selection, calibration model hyperparameterization, and spatiotemporal training data composition is proposed. Three government supersites in central Taiwan are chosen for the deployment of low-cost sensors, and hourly PM2.5 measurements are collected for 60 days for the experiments. Three optimizers, the Sobol sequence, Nelder-Mead, and particle swarm optimization (PSO), are compared across various versions of the ensembles. The best calibration results are obtained with PSO, and the improvement ratios with respect to R2, RMSE, and NME are 4.92%, 52.96%, and 56.85%, respectively.
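To make the reported improvement ratios concrete, the following sketch computes R2, RMSE, and NME for a sensor before and after calibration, then the relative improvement of each. The abstract does not define these quantities, so the formulas here are assumptions: NME is taken as the sum of absolute errors divided by the sum of reference values, and the improvement ratio as the relative change in the favourable direction. All readings are made-up values, not data from the paper.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination of predictions against reference values."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nme(obs, pred):
    """Normalized mean error (assumed definition: sum|pred - obs| / sum(obs))."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / sum(obs)


def improvement_ratio(before, after, higher_is_better=False):
    """Relative improvement (assumed definition, not taken from the paper)."""
    change = (after - before) if higher_is_better else (before - after)
    return change / abs(before)


# Hypothetical hourly PM2.5 values: supersite reference, raw and calibrated sensor.
reference = [10.0, 20.0, 30.0, 40.0]
raw = [14.0, 26.0, 33.0, 47.0]
calibrated = [11.0, 19.0, 31.0, 39.0]

r2_gain = improvement_ratio(
    r_squared(reference, raw), r_squared(reference, calibrated), higher_is_better=True
)
rmse_gain = improvement_ratio(rmse(reference, raw), rmse(reference, calibrated))
nme_gain = improvement_ratio(nme(reference, raw), nme(reference, calibrated))
print(r2_gain, rmse_gain, nme_gain)
```

R2 is treated as higher-is-better while RMSE and NME are lower-is-better, which is why the helper takes a direction flag.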

eScholarship - University of California

Combining univariate approaches for ensemble change detection in multivariate data

Author: Faithfull William
Kuncheva Ludmila
Rodriguez Juan
Publication date: 01/01/2019

Bangor University Research Portal

Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

Author: Zhang Jian
Publication date: 27/08/2018

Research into combining data mining and machine learning technology with web-based education systems (known as educational data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of information and communication technology (ICT), data are becoming available in significantly large volumes, at high velocity, and in extensive variety. In this thesis, four popular data mining methods are run on Apache Spark, using large datasets from Online Cognitive Learning Systems, to explore the scalability and efficiency of Spark. Datasets of various volumes are tested on Spark MLlib with different running configurations and parameter tunings. The thesis presents useful strategies for allocating computing resources and tuning parameters to take full advantage of the in-memory system of Apache Spark when conducting data mining and machine learning tasks. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.

YorkSpace

TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS

Author: Mahfouz Ahmed Mosbah Elsaeed
Publication venue: University of Memphis Digital Commons
Publication date: 01/01/2021

With the exponential growth of network-based applications globally, there has been a transformation in organizations' business models. Furthermore, the falling cost of both computational devices and internet access has led people to become more technology dependent. Consequently, the heavy use of computer networks has given rise to new risks. Therefore, improving the speed and accuracy of security mechanisms has become crucial. Although abundant new security tools have been developed, the rapid growth of malicious activities continues to be a pressing issue, as ever-evolving attacks continue to create severe threats to network security. Classical security techniques (for instance, firewalls) are used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such intrusive network activities. Machine Learning is one of the practical approaches to intrusion detection that learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature. Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. A number of such datasets, including DARPA, KDDCUP, and NSL-KDD, have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. According to several studies, many of these datasets are outdated and unreliable. Furthermore, some of these datasets lack traffic diversity and volume, do not cover the variety of attacks, have anonymized packet information and payloads that cannot reflect current trends, or lack a feature set and metadata. This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensions, namely feature selection, sensitivity to hyper-parameter selection, and class imbalance problems, that are inherent to intrusion detection. It also produces a new reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal traffic and different classes of attacks, and reflects current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation summarizing the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issue of detection accuracy and false alarm rate in intrusion detection systems.

University of Memphis Digital Commons