Building an Effective Intrusion Detection System using Unsupervised Feature Selection in Multi-objective Optimization Framework
Intrusion Detection Systems (IDS) are developed to protect networks by
detecting attacks. This paper proposes an unsupervised feature
selection technique for analyzing the network data. The search capability of
the non-dominated sorting genetic algorithm (NSGA-II) has been employed for
optimizing three different objective functions based on information-theoretic
measures, including mutual information, standard deviation, and
information gain, to identify a subset of mutually exclusive, high-variance
features. Finally, the Pareto-optimal front of feature
subsets is obtained, and these subsets are used to build
classification systems with popular machine learning models such as
support vector machines, decision trees, and the k-nearest neighbour (k=5)
classifier. We evaluated the algorithm on the KDD-99,
NSL-KDD, and Kyoto 2006+ datasets. The experimental results on the KDD-99
dataset show that the decision tree outperforms the other classifiers
evaluated. The proposed system achieves its best results of 99.78% accuracy,
a 99.27% detection rate, and a false alarm rate of 0.2%, which are better than
all previous results for the KDD dataset. We achieved an accuracy of 99.83% on
a 20% test split of the NSL-KDD dataset and 99.65% accuracy with 10-fold
cross-validation on the Kyoto dataset. The most attractive characteristic of the
proposed scheme is that no labeled information is used when selecting the
feature subset, and different feature quality measures are
optimized simultaneously within the multi-objective optimization framework.
Comment: 3 figures
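The Pareto-optimal front mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's NSGA-II implementation: it only shows the non-dominated filtering step, and it assumes that all three objectives are to be maximized. The function names and the objective values for the candidate feature subsets are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective and
    strictly better on at least one (assumption: all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical objective values (three objectives) for four candidate subsets.
candidate_scores = [
    (0.9, 0.2, 0.5),
    (0.8, 0.1, 0.4),  # dominated by the first candidate on all objectives
    (0.3, 0.9, 0.6),
    (0.5, 0.5, 0.7),
]

front = pareto_front(candidate_scores)
print(front)  # → [0, 2, 3]
```

Each index in `front` corresponds to a feature subset that could then be passed to a classifier for evaluation, as the abstract describes.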
Supervised Feature Selection Techniques in Network Intrusion Detection: a Critical Review
Machine Learning (ML) techniques are becoming an invaluable support for
network intrusion detection, especially in revealing anomalous flows, which
often hide cyber-threats. Typically, ML algorithms are used to
classify or recognize data traffic on the basis of statistical features such as
inter-arrival times, packet length distribution, and mean number of flows.
Dealing with the vast diversity and number of features that typically
characterize data traffic is a hard problem. This results in the following
issues: i) the presence of so many features leads to lengthy training processes
(particularly when features are highly correlated), while prediction accuracy
does not improve proportionally; ii) some of the features may introduce bias
during the classification process, particularly those that bear little relation
to the data traffic to be classified. By reducing the feature
space and retaining only the most significant features, Feature Selection (FS)
becomes a crucial pre-processing step in network management and, specifically,
for the purposes of network intrusion detection. In this review paper, we
complement other surveys in multiple ways: i) evaluating more recent datasets
(more up to date than the obsolete KDD 99) by means of a Python-based procedure
designed from scratch; ii) providing a synopsis of the most credited FS approaches
in the field of intrusion detection, including Multi-Objective Evolutionary
techniques; iii) assessing various experimental analyses such as feature
correlation, time complexity, and performance. Our comparisons offer useful
guidelines to network/security managers who are considering the incorporation
of ML concepts into network intrusion detection, where trade-offs between
performance and resource consumption are crucial.
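The point about highly correlated features lengthening training without improving accuracy can be illustrated with a simple correlation filter. This is a sketch under stated assumptions, not the procedure used in the review: the function name `drop_correlated`, the 0.95 threshold, and the synthetic data are all hypothetical.

```python
from typing import List

import numpy as np


def drop_correlated(X: np.ndarray, threshold: float = 0.95) -> List[int]:
    """Greedily keep columns of X, skipping any column whose absolute Pearson
    correlation with an already-kept column exceeds `threshold`.
    Assumes no column is constant (a constant column yields NaN correlations)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept: List[int] = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic data: column 1 is a linear function of column 0, column 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + 1.0, b])

kept = drop_correlated(X)
print(kept)  # → [0, 2]
```

The filter is unsupervised and order-dependent (earlier columns win ties), so it serves only as a pre-processing illustration; the supervised FS approaches surveyed in the review additionally use class labels.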