40 research outputs found

    Automatic Dataset Labelling and Feature Selection for Intrusion Detection Systems

    Get PDF
    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Correctly labelled datasets are commonly required. Three particular scenarios are highlighted, which showcase this need. When using supervised Intrusion Detection Systems (IDSs), these systems need labelled datasets to be trained. Also, the real nature of the analysed datasets must be known when evaluating the efficiency of the IDSs when detecting intrusions. Another scenario is the use of feature selection that works only if the processed datasets are labelled. In normal conditions, collecting labelled datasets from real networks is impossible. Currently, datasets are mainly labelled by implementing off-line forensic analysis, which is impractical because it does not allow real-time implementation. We have developed a novel approach to automatically generate labelled network traffic datasets using an unsupervised anomaly based IDS. The resulting labelled datasets are subsets of the original unlabelled datasets. The labelled dataset is then processed using a Genetic Algorithm (GA) based approach, which performs the task of feature selection. The GA has been implemented to automatically provide the set of metrics that generate the most appropriate intrusion detection results

    ANALYZING BIG DATA WITH DECISION TREES

    Get PDF
    ANALYZING BIG DATA WITH DECISION TREE

    Study of Tree Base Data Mining Algorithms for Network Intrusion Detection

    Get PDF
    Internet growth has increased rapidly due to which number of network attacks have been increased. This emphasis importance of network intrusion detection systems (IDS) for securing the network. It is the process of monitoring and analyzing network traffic for detecting security violations many researcher suggested data mining technique such as classification, clustering ,pattern matching and rule induction for developing an effective intrusion detection system. In order to detect the intrusion, the network traffic can be classified into normal and anomalous. In this paper we have evaluated tree base classification algorithms namely J48, Hoeffding tree, Random Forest, Random Tree, REPTree. The comparison of these tree based classification algorithms is presented in this paper based upon their performance metrics using 10 fold cross validation and KDD- CUP test dataset. This study shows that random forest and J48 are the best suitable tree base algorithms

    An intrusion detection system based on polynomial feature correlation analysis

    Full text link
    © 2017 IEEE. This paper proposes an anomaly-based Intrusion Detection System (IDS), which flags anomalous network traffic with a distance-based classifier. A polynomial approach was designed and applied in this work to extract hidden correlations from traffic related statistics in order to provide distinguishing features for detection. The proposed IDS was evaluated using the well-known KDD Cup 99 data set. Evaluation results show that the proposed system achieved better detection rates on KDD Cup 99 data set in comparison with another two state-of-the-art detection schemes. Moreover, the computational complexity of the system has been analysed in this paper and shows similar to the two state-of-the-art schemes

    Empirical study of automatic dataset labelling

    Get PDF
    Correctly labelled dataseis are commonly required. Three particular scenarios are highlighted, which showcase this need. One of these scenarios is when using supervised Intrusion Detection Systems (TDSs). These systems need labelled datasets for their training process. Also, the real nature of analysed datasets must be known when evaluating the efficiency of IDSs detecting intrusions. The third scenario is the use of feature selection that works only if the processed datasets are labelled. In normal conditions, collecting labelled datasets from real communication networks is impossible. In a previous work we developed a novel approach to automatically generate labelled network traffic datasets using an unsupervised anomaly based IDS. The approach was empirically proven to be an efficient unsupervised labelling approach. It was evaluated using a single dataset. This paper extends our previous work by using a greater number of datasets, gathered from a real IEEE 802.11 network testbed. The datasets are comprised of different wireless-specific attacks. This paper also proposes a new and more precise method to calculate the boundary threshold, used in the labelling process

    Hybrid intelligent approach for network intrusion detection

    Get PDF
    In recent years, computer networks are broadly used, and they have become very complicated. A lot of sensitive information passes through various kinds of computer devices, ranging from minicomputers to servers and mobile devices. These occurring changes have led to draw the conclusion that the number of attacks on important information over the network systems is increasing with every year. Intrusion is the main threat to the network. It is defined as a series of activities aimed for exposing the security of network systems in terms of confidentiality, integrity and availability, as a result; intrusion detection is extremely important as a part of the defense. Hence, there must be substantial improvement in network intrusion detection techniques and systems. Due to the prevailing limitations of finding novel attacks, high false detection, and accuracy in previous intrusion detection approaches, this study has proposed a hybrid intelligent approach for network intrusion detection based on k-means clustering algorithm and support vector machine classification algorithm. The aim of this study is to reduce the rate of false alarm and also to improve the detection rate, comparing with the existing intrusion detection approaches. In the present study, NSL-KDD intrusion dataset has been used for training and testing the proposed approach. In order to improve classification performance, some steps have been taken beforehand. The first one is about unifying the types and filtering the dataset by data transformation. Then, a features selection algorithm is applied to remove irrelevant and noisy features for the purpose of intrusion. Feature selection has decreased the features from 41 to 21 features for intrusion detection and later normalization method is employed to perform and reduce the differences among the data. Clustering is the last step of processing before classification has been performed, using k-means algorithm. Under the purpose of classification, support vector machine have been used. After training and testing the proposed hybrid intelligent approach, the results of performance evaluation have shown that the proposed network intrusion detection has achieved high accuracy and low false detection rate. The accuracy is 96.025 percent and the false alarm is 3.715 percent

    Decision Tree Classification Model In Water Supply Network

    Full text link
    With the service life of water supply network (WSN) growth, the growing phenomenon of aging pipe network has become exceedingly serious. As urban water supply network is hidden underground asset, it is difficult for monitoring staff to make a direct classification towards the faults of pipe network by means of the modern detecting technology. In this paper, based on the basic property data (e.g. diameter, material, pressure, distance to pump, distance to tank, load, etc.) of water supply network, decision tree algorithm (C4.5) has been carried out to classify the specific situation of water supply pipeline. Part of the historical data was used to establish a decision tree classification model, and the remaining historical data was used to validate this established model. Adopting statistical methods were used to access the decision tree model including basic statistical method, Receiver Operating Characteristic (ROC) and Recall-Precision Curves (RPC). These methods has been successfully used to assess the accuracy of this established classification model of water pipe network. The purpose of classification model was to classify the specific condition of water pipe network. It is important to maintain the pipeline according to the classification results including asset unserviceable (AU), near perfect condition (NPC) and serious deterioration (SD). Finally, this research focused on pipe classification which plays a significant role in maintaining water supply networks in the future

    Success Factors of Donation-Based Crowdfunding Campaigns: A Machine Learning Approach

    Get PDF
    Crowdfunding has emerged as an alternative mechanism to traditional financing mechanisms in which individuals solicit financial capital or donation from the crowd. The success factors of crowdfunding are not well-understood, particularly for donation-based crowdfunding platforms. This study identifies key drivers of donation-based crowdfunding campaign success using a machine learning approach. Based on an analysis of crowdfunding campaigns from Gofundme.com, we show that our models were able to predict the average daily amount received at a high level of accuracy using variables available at the beginning of the campaign and the number of days it had been posted. In addition, Facebook and Twitter shares and the number of likes, improved the accuracy of the models. Among the six machine learning algorithms we used, support vector machine (SVM) performs the best in predicting campaign success
    corecore