    A Review on Various Methods of Intrusion Detection System

    Intrusion detection is an essential business segment and, because of the demand for it, a dynamic area of research and development. Modern intrusion detection systems are still limited by time sensitivity. The main requirement is to develop a system capable of handling large volumes of network data so that attacks are detected more accurately and proactively. Research conducted on the KDDCUP99 dataset identified a different set of attributes for each of the four major attack types. Without reducing the number of features, detecting attack patterns in the data is more difficult, whether for rule generation, forecasting, or classification. The goal of this research is to present a new method that compares the proportions of correctly and incorrectly classified instances together with the features chosen. Data mining is used to clean, classify, and examine large amounts of network data; because such a large volume of network traffic requires processing, we use data mining techniques. Techniques such as clustering, classification, and association rules are proving useful for analyzing network traffic. This paper surveys data mining techniques applied to intrusion detection systems for the effective identification of both known and unknown attack patterns, thereby helping users develop secure information systems.
    Keywords: IDS, Data Mining, Machine Learning, Clustering, Classification
    DOI: 10.7176/CEIS/11-1-02
    Publication date: January 31st 2020
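The survey's stated goal involves comparing the proportions of correctly and incorrectly classified instances. A minimal sketch of that bookkeeping follows, both overall and per class. The function names are hypothetical, and the labels are toy values that only borrow the usual KDD Cup 99 categories (normal traffic plus the DoS, Probe, R2L, and U2R attack types); they are not results from the paper.

```python
from collections import Counter


def classification_proportions(y_true, y_pred):
    """Return (correct, incorrect) as fractions of all instances."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    n = len(y_true)
    return correct / n, (n - correct) / n


def per_class_detection(y_true, y_pred):
    """Fraction of each true class that was classified correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical toy labels: "normal" plus the four KDD Cup 99 attack categories.
y_true = ["normal", "dos", "probe", "dos", "r2l", "u2r"]
y_pred = ["normal", "dos", "dos", "dos", "normal", "u2r"]
print(classification_proportions(y_true, y_pred))
print(per_class_detection(y_true, y_pred))
```

Four of the six toy instances are correct, so the overall proportions are 2/3 and 1/3. The per-class view additionally shows that the single probe and r2l instances were both missed, which the overall figure hides.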

    A performance study of anomaly detection using entropy method

    An experiment was performed to study the entropy method for an anomaly detection system. The study used real data generated by the distributed sensor network at the Intel Berkeley Research Laboratory. The experimental results were compared with those of the elliptical method and analyzed on two-dimensional data sets acquired from temperature and humidity sensors across 52 microcontrollers. Using binary classification to determine the upper and lower boundaries for each sensor series, the entropy method was shown to detect more out-of-range sensor nodes than the elliptical method. It can be argued that the better result stems mainly from a limitation of the elliptical approach, which requires some correlation between the two sensor series, whereas the entropy approach treats each sensor series independently. This matters in the present case because the two sensor series are not correlated with each other.
    Comment: Proceedings of the International Conference on Computer, Control, Informatics and its Applications (2017) pp. 137-14
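The abstract does not specify how the entropy is computed or how the boundaries are set, so the following is only a sketch under stated assumptions. Each node's readings are binned into fixed-width bins, the Shannon entropy of that histogram is computed, and a node is flagged when its entropy falls outside mean ± k·std across all nodes. The bin widths, `k`, and all function names are hypothetical. What the sketch does reflect from the abstract is that each sensor series is scored on its own, so no correlation between temperature and humidity is required.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(readings, bin_width):
    """Shannon entropy (bits) of readings discretized into fixed-width bins."""
    counts = Counter(math.floor(r / bin_width) for r in readings)
    n = len(readings)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def out_of_range_nodes(series_by_node, bin_width, k=2.0):
    """Flag nodes whose entropy falls outside mean +/- k * std over all nodes.

    One sensor series is scored at a time, so no correlation with any other
    series is needed (unlike a covariance-based elliptical boundary).
    """
    entropy = {node: shannon_entropy(r, bin_width) for node, r in series_by_node.items()}
    values = list(entropy.values())
    mu, sd = mean(values), pstdev(values)
    lower, upper = mu - k * sd, mu + k * sd
    return {node for node, h in entropy.items() if not lower <= h <= upper}


def flag_nodes(temperature, humidity, temp_bin=0.5, hum_bin=1.0, k=2.0):
    """Treat the temperature and humidity series independently; union the flags."""
    return (out_of_range_nodes(temperature, temp_bin, k)
            | out_of_range_nodes(humidity, hum_bin, k))
```

An elliptical boundary would instead fit one joint covariance over both series, which is the dependence on correlation that the authors identify as the weakness of that approach.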

    Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

    Over the last decade it has become commonplace to evaluate machine learning techniques for network-based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has been criticized in the literature, and it is out of date. Therefore, some researchers question the validity of findings based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature; in some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value of the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method, and data preprocessing, are identified and found to affect the results significantly. These findings also enable a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it for lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well-known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) because of class imbalance. However, this has not previously been recognized as an issue in intrusion detection. This thesis reports on an empirical investigation demonstrating that class imbalance causes the poor detection of some classes of intrusion reported in the literature.
An alternative approach to training ANNs is proposed in this thesis: using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When evaluation functions calculate fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained while maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are due not to limitations of the ANNs but to the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control over the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers: striving to create a single best classifier with the highest accuracy may yield an unfruitful classification trade-off, as this thesis demonstrates clearly. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA) that treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also used to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how it affects the performance of the resulting ensembles. This is key to explaining why some classifier combinations fail to give fruitful solutions.
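The thesis evolves ANN weights with a GA whose fitness is computed proportionally to each class, and extends this to a MOGA with one objective per class. The sketch below covers only the scoring side; the GA itself is omitted. It contrasts plain accuracy with the mean of per-class detection rates, one plausible reading of the class-proportional fitness (an assumption, not necessarily the thesis's exact formula). It then shows the non-dominated (Pareto) filtering a MOGA would apply to per-class rate vectors. All names and data are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Plain accuracy: dominated by whichever class has the most instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_rates(y_true, y_pred):
    """Detection rate of every class, in sorted label order."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return tuple(hits[c] / totals[c] for c in sorted(totals))


def class_balanced_fitness(y_true, y_pred):
    """Mean per-class rate: every class counts equally, however rare it is."""
    rates = per_class_rates(y_true, y_pred)
    return sum(rates) / len(rates)


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the non-dominated per-class rate vectors (all objectives maximized)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical imbalanced toy data: 95 normal connections, 5 rare attacks.
y_true = ["normal"] * 95 + ["u2r"] * 5
always_normal = ["normal"] * 100
print(accuracy(y_true, always_normal), class_balanced_fitness(y_true, always_normal))
```

A classifier that labels everything as normal scores 0.95 accuracy but only 0.5 under the class-balanced fitness, because it detects none of the rare attacks. That is the bias towards the major class that the thesis avoids. Passing per-class rate vectors such as `(0.99, 0.10)`, `(0.90, 0.60)`, and `(0.80, 0.55)` to `pareto_front` drops the last one, which the second dominates, and leaves the user a set of trade-offs to choose from.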

    Improved hybrid teaching learning based optimization-jaya and support vector machine for intrusion detection systems

    Most currently existing intrusion detection systems (IDS) use machine learning algorithms to detect network intrusion, and such algorithms have recently been widely adopted to enhance IDS performance. While the effectiveness of some machine learning algorithms in detecting certain types of network intrusion has been established, no single method currently achieves consistent results across multiple attack types. Hence, the detection of network attacks on computer systems has remained a relevant field of research for some time. The support vector machine (SVM) is one of the most powerful machine learning algorithms, with excellent learning performance. However, SVM suffers from problems such as high false positive alert rates and low detection rates for rare but dangerous attacks; feature selection and parameter optimization are therefore important for improving its performance. The aim of this work is to develop an improved optimization method for IDS that is efficient and effective in subset feature selection and parameter optimization. To achieve this goal, an improved Teaching Learning-Based Optimization (ITLBO) algorithm was proposed for subset feature selection, while an improved parallel Jaya (IPJAYA) algorithm was proposed for searching for the best values of the SVM parameters (C, gamma). The resulting hybrid classifier, ITLBO-IPJAYA-SVM, was developed to improve the efficiency of network intrusion detection on data sets that contain multiple types of attacks. The performance of the proposed approach was evaluated on the NSL-KDD and CICIDS intrusion detection datasets, and the results show that it performs excellently when processing large datasets.
The results also showed that the SVM optimization algorithm achieved accuracies of 0.9823 on the NSL-KDD dataset and 0.9817 on the CICIDS dataset, higher than the accuracy of most existing paradigms for classifying network intrusion detection datasets. In conclusion, this work has presented an improved optimization algorithm that can increase the accuracy of IDSs in detecting various types of network attack.
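To illustrate how a Jaya-style search over the SVM parameters (C, gamma) can work, here is a minimal sketch of the basic Jaya update. It is not the paper's improved parallel variant (IPJAYA), and it does not include the ITLBO feature selection. Each candidate moves toward the current best solution and away from the current worst, and a move is kept only if it improves the score. The search runs over (log10 C, -log10 gamma). The objective `stand_in_cv_score` is a hypothetical smooth surrogate; a real setup would train and cross-validate an SVM at each point.

```python
import random


def jaya(objective, bounds, pop_size=20, iterations=200, seed=0):
    """Maximize `objective` over box `bounds` with the basic Jaya update."""
    rng = random.Random(seed)

    def clip(value, d):
        lo, hi = bounds[d]
        return min(max(value, lo), hi)

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(iterations):
        best = pop[max(range(pop_size), key=scores.__getitem__)]
        worst = pop[min(range(pop_size), key=scores.__getitem__)]
        for i, x in enumerate(pop):
            # Move toward the best solution and away from the worst one.
            cand = [
                clip(x[d] + rng.random() * (best[d] - abs(x[d]))
                     - rng.random() * (worst[d] - abs(x[d])), d)
                for d in range(len(bounds))
            ]
            score = objective(cand)
            if score > scores[i]:  # greedy acceptance
                pop[i], scores[i] = cand, score
    i_best = max(range(pop_size), key=scores.__getitem__)
    return pop[i_best], scores[i_best]


def stand_in_cv_score(params):
    """Hypothetical smooth surrogate for cross-validated SVM accuracy.

    params = (log10 C, -log10 gamma); the peak at C = 10, gamma = 0.01 is
    invented for illustration.
    """
    log_c, neg_log_gamma = params
    return -((log_c - 1.0) ** 2 + (neg_log_gamma - 2.0) ** 2)


(log_c, neg_log_gamma), score = jaya(stand_in_cv_score, bounds=[(0.0, 3.0), (0.0, 5.0)])
print(f"C = {10 ** log_c:.3g}, gamma = {10 ** -neg_log_gamma:.3g}, score = {score:.4f}")
```

Searching in log space keeps C and gamma on comparable scales. The chosen bounds keep both coordinates non-negative, so the absolute values in the standard Jaya update do not distort the step.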

    Feature Space Modeling for Accurate and Efficient Learning From Non-Stationary Data

    A non-stationary dataset is one whose statistical properties, such as the mean, variance, correlation, and probability distribution, change over a specific interval of time. In contrast, a stationary dataset is one whose statistical properties remain constant over time. Apart from its volatile statistical properties, non-stationary data poses other challenges, such as time and memory management under limited computational resources. These challenges are caused mostly by recent advances in data collection technologies that generate a variety of data at an alarming pace and volume. Additionally, when the collected data is complex, managing the complexity that emerges from its dimensionality and heterogeneity can pose another challenge for effective computational learning. The problem is to enable accurate and efficient learning from non-stationary data in a continuous fashion over time while managing the critical challenges of time, memory, concept change, and complexity simultaneously. Feature space modeling is one of the most effective solutions to this problem. For non-stationary data, selecting relevant features is even more critical than for stationary data, because reducing the feature dimension ensures the best use of computational resources and lets data mining algorithms achieve higher accuracy and efficiency. In this dissertation, we investigated a variety of feature space modeling techniques to improve the overall performance of data mining algorithms. In particular, we built a Relief-based feature subset selection method combined with data complexity analysis to improve classification performance on ovarian cancer image data collected in a non-stationary batch mode. We also collected time series health sensor data in a streaming environment and applied a feature space transformation using Singular Value Decomposition (SVD).
This reduced the dimensionality of the feature space, resulting in better accuracy and efficiency of the Density Ratio Estimation method in identifying potential change points in the data over time. We also built an unsupervised feature space model using matrix factorization and Lasso Regression, which was successfully deployed in conjunction with Relative Density Ratio Estimation to address botnet attacks in a non-stationary environment. The Relief-based feature model improved the accuracy of the Fuzzy Forest classifier by 16%. For the change detection framework, we observed a 9% improvement in accuracy with the PCA feature transformation. With the unsupervised feature selection model, at malicious traffic ratios of 2% and 5%, the proposed botnet detection framework achieved on average 20% better accuracy than a One-Class Support Vector Machine (OSVM) and 25% better accuracy than an Autoencoder. All these results demonstrate the effectiveness of these feature space models. The fundamental theme that recurs throughout this dissertation is modeling an efficient feature space to improve both the accuracy and the efficiency of selected data mining models. Every contribution in this dissertation has subsequently been employed successfully to capitalize on those advantages in solving real-world problems. Our work bridges concepts from multiple disciplines in effective and surprising ways, leading to new insights, new frameworks, and ultimately a cross-fertilization of diverse fields such as mathematics, statistics, and data mining.
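The dissertation reports that an SVD-based feature space transformation reduced dimensionality before change-point detection, and it also cites a PCA transformation for the change detection framework. A minimal NumPy sketch of a truncated SVD projection follows. Because the columns are mean-centered first, it is equivalent to PCA. The number of components `k` and the toy data are hypothetical, and the density ratio estimation step is not shown.

```python
import numpy as np


def svd_reduce(X, k):
    """Project the rows of X onto its top-k right singular vectors.

    Columns are mean-centered first, which makes this equivalent to PCA.
    Returns the reduced data and the fraction of variance it retains.
    """
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    retained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return Xc @ vt[:k].T, retained


# Hypothetical sensor features: the third column is the sum of the first two,
# so two components capture essentially all of the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X = np.column_stack([base, base.sum(axis=1)])
Z, retained = svd_reduce(X, k=2)
print(Z.shape, round(retained, 6))
```

The retained-variance fraction, computed from the squared singular values, is a common way to choose `k`: keep the smallest number of components that preserves most of the variance.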