7,759 research outputs found

    Unsupervised feature selection method for intrusion detection system

    © 2015 IEEE. This paper considers the feature selection problem for data classification in the absence of data labels. It first proposes an unsupervised feature selection algorithm, the Extended Laplacian score (EL for short), which enhances the Laplacian score method. EL selects features in two phases. In the first phase, the Laplacian score algorithm is applied to select the features that have the best locality-preserving power. In the second phase, EL applies a Redundancy Penalization (RP) technique based on mutual information to eliminate redundancy among the selected features. This technique is an enhancement over Battiti's MIFS: unlike MIFS, it does not require a user-defined parameter such as beta to select from the candidate feature set. The final selected subset is then used to build an Intrusion Detection System. The effectiveness and feasibility of the proposed detection system are evaluated on three well-known intrusion detection datasets: KDD Cup 99, NSL-KDD, and Kyoto 2006+. The evaluation results confirm that our feature selection approach performs better than the Laplacian score method in terms of classification accuracy.
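The first phase named in this abstract, the standard Laplacian score, can be sketched as follows. This is a minimal illustration of the classic score only, not the paper's EL method and not its Redundancy Penalization step. The function name `laplacian_score`, the parameters `n_neighbors` and `t`, and the choice of a k-nearest-neighbour graph with heat-kernel weights are assumptions made for this example.

```python
import numpy as np


def laplacian_score(X, n_neighbors=5, t=1.0):
    """Return one Laplacian score per column of X (lower = better locality preservation).

    Hypothetical helper for illustration; graph construction choices are assumptions.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour

    # k-nearest-neighbour graph with heat-kernel weights exp(-d^2 / t).
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nn.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)  # connect i and j if either is a neighbour of the other

    d = S.sum(axis=1)   # degree of each sample
    L = np.diag(d) - S  # graph Laplacian

    # Remove the degree-weighted mean of every feature.
    Xc = X - (d @ X) / d.sum()

    num = np.sum(Xc * (L @ Xc), axis=0)         # f^T L f for each feature
    den = np.sum(Xc * d[:, None] * Xc, axis=0)  # f^T D f for each feature

    scores = np.full(X.shape[1], np.inf)  # constant features get the worst score
    valid = den > 1e-12
    scores[valid] = num[valid] / den[valid]
    return scores
```

Sorting the returned scores in ascending order (for example with `np.argsort`) ranks features from most to least locality-preserving; a redundancy filter such as the one the abstract describes would then operate on that ranking.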

    Anomaly-Based Intrusion Detection System

    Anomaly-based network intrusion detection plays a vital role in protecting networks against malicious activities. In recent years, data mining techniques have gained importance in addressing network security issues. Intrusion detection systems (IDS) aim to identify intrusions with a low false alarm rate and a high detection rate. Although classification-based data mining techniques are popular, they are not effective at detecting unknown attacks. Unsupervised learning methods have received closer attention for network IDS, but they remain ineffective at detecting dynamic intrusion activities. Recent contributions in the literature focus on machine learning techniques for building anomaly-based intrusion detection systems, which extract knowledge during the training phase. Although existing intrusion detection techniques address attack types such as DoS, Probe, U2R, and R2L, reducing the false alarm rate remains challenging. Most network IDS depend on the deployed environment. Hence, developing a system that is independent of the deployed environment and has a fast and appropriate feature selection method is a challenging issue. The exponential growth of zero-day attacks underscores the need for security mechanisms that can accurately detect previously unknown attacks, which is another challenging task. In this work, an attempt is made to develop a generic meta-heuristic scale for both known and unknown attacks with a high detection rate and a low false alarm rate by adopting efficient feature optimization techniques.
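The two evaluation measures this abstract names can be made concrete with a short sketch. It uses the usual confusion-matrix definitions, detection rate = TP / (TP + FN) and false alarm rate = FP / (FP + TN), with 1 meaning "attack" and 0 meaning "normal". The function name and label encoding are assumptions for illustration; the abstract does not say how the authors computed these figures.

```python
def detection_and_false_alarm_rate(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for binary labels (1 = attack, 0 = normal).

    Hypothetical helper; the label encoding is an assumption of this sketch.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1  # attack correctly flagged
        elif truth and not pred:
            fn += 1  # attack missed
        elif not truth and pred:
            fp += 1  # normal traffic wrongly flagged
        else:
            tn += 1  # normal traffic correctly passed

    # Guard against empty classes instead of dividing by zero.
    detection_rate = tp / (tp + fn) if tp + fn else float("nan")
    false_alarm_rate = fp / (fp + tn) if fp + tn else float("nan")
    return detection_rate, false_alarm_rate
```

Reporting both numbers together matters because a detector can trivially raise its detection rate by flagging everything, at the cost of a false alarm rate near 1.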

    A Study on Feature Analysis and Ensemble-based Intrusion Detection Scheme using CICIDS-2017 dataset

    University of Technology Sydney. Faculty of Engineering and Information Technology. One of the primary security research challenges faced by traditional IDS methods is their inability to handle large volumes of network data and to detect modern cyber-attacks with high detection accuracy and low false alarms. Hence, there is a need for efficient and reliable IDS schemes that can tackle this ever-changing cybersecurity paradigm. Machine learning techniques are therefore becoming very popular in the design of modern intrusion detection systems. Several supervised and unsupervised machine learning techniques have been used in the literature; however, IDS classification efficiency is affected by noisy data in high-dimensional datasets. Feature selection plays a significant role here: it eliminates redundant and noisy data, and selecting an optimal feature subset reduces the dimensionality of IDS datasets. Research has shown that an IDS built on multiple classifiers performs far better than one built on a single classifier, which motivated us to develop an ensemble-based intrusion detection model. Lastly, the benchmark IDS datasets currently used to evaluate IDS schemes are outdated and do not represent modern-day attacks. The CICIDS-2017 dataset, offered by the University of New Brunswick, is the latest publicly available dataset for intrusion detection. However, very few research studies have used this dataset while also focusing on optimal feature selection. The dataset has good potential as a future benchmark for intrusion detection because it covers modern-day system setups and threat profiles, and it can remove the dependency on outdated IDS datasets. There is a need to benchmark the performance of modern IDS datasets using machine learning ensemble-based classifiers.
    This thesis aims to address these issues by proposing a new intrusion detection framework that uses an ensemble-based feature selection method to generate a low-dimensional feature subset and an ensemble-based intrusion detection model to benchmark performance on the CICIDS-2017 dataset. The proposed scheme benefits the research community because it combines the latest available IDS dataset with ensemble techniques for both feature selection and intrusion detection.

    Automatic Dataset Labelling and Feature Selection for Intrusion Detection Systems

    The file attached to this record is the author's final peer-reviewed version. The Publisher's final version can be found by following the DOI link. Correctly labelled datasets are commonly required, and three scenarios in particular showcase this need. First, supervised Intrusion Detection Systems (IDSs) need labelled datasets for training. Second, the real nature of the analysed datasets must be known when evaluating how efficiently an IDS detects intrusions. Third, some feature selection methods work only if the processed datasets are labelled. In normal conditions, collecting labelled datasets from real networks is impossible. Currently, datasets are mainly labelled through off-line forensic analysis, which is impractical because it does not allow real-time implementation. We have developed a novel approach to automatically generate labelled network traffic datasets using an unsupervised anomaly-based IDS. The resulting labelled datasets are subsets of the original unlabelled datasets. The labelled dataset is then processed using a Genetic Algorithm (GA) based approach, which performs feature selection. The GA has been implemented to automatically provide the set of metrics that generates the most appropriate intrusion detection results.

    Data mining based cyber-attack detection
