TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System
Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained on historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. A hybrid feature selection technique comprising three methods, i.e., particle swarm optimization, ant colony algorithm, and genetic algorithm, is used to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which notably outperform other classification techniques recently proposed in the literature. Results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. Such a test has not usually been considered in IDS research thus far and, therefore, adds value to the experimental results achieved by the proposed classifier.
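Bagging, one of the two meta learners named above, trains each base model on a bootstrap resample (drawn with replacement) of the training set and combines the models by majority vote. The sketch below illustrates only that generic idea under stated assumptions: it uses a hypothetical single-feature threshold stump in place of a REPT tree, the toy data are made up, and it does not reproduce the rotation-forest level or the hybrid feature selection of TSE-IDS.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical base learner: a one-feature threshold "stump" standing in for a
# real tree learner such as REPT (an assumption made for illustration only).
# Tuple layout: (feature index, threshold, label if <= threshold, label if > threshold)
Stump = Tuple[int, float, int, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Pick the feature/threshold split with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best: Stump = (0, float("inf"), majority, majority)  # constant fallback
    best_err = sum(label != majority for label in y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(l != l_lab for l in left) + sum(r != r_lab for r in right)
            if err < best_err:
                best, best_err = (f, t, l_lab, r_lab), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_bagging(X, y, n_estimators: int = 25, seed: int = 0) -> List[Stump]:
    """Train each stump on its own bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_bagging(ensemble: List[Stump], row: Sequence[float]) -> int:
    """Combine the stumps' predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


# Toy usage: one feature, class 0 below 5 and class 1 above it (made-up data).
X = [[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_bagging(X, y)
```

An odd number of estimators avoids tied votes in the binary case; in a real IDS setting the stump would be replaced by a stronger base learner.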
A Comparative Study of Genetic Algorithm and Particle Swarm Optimisation for Dendritic Cell Algorithm
Dendritic cell algorithm (DCA) is a class of artificial immune systems that was originally developed for anomaly detection in networked systems and later used as a general binary classifier. Conventionally, the DCA goes through four phases in its life cycle: feature categorisation into artificial signals, context detection of data items, context assignment, and finally labelling of data items as either abnormal or normal. During the context detection phase, the DCA requires users to manually predefine the parameters used by its weighted function to process the signals and data items. Because manually derived parameters cannot guarantee that the optimal set of weights is used, research attention has turned to optimising these parameters. This paper reports a systematic comparative study between genetic algorithm (GA) and particle swarm optimisation (PSO) for parameter optimisation of the DCA. To evaluate the performance of GA-DCA and PSO-DCA, twelve publicly available datasets from the UCI Machine Learning Repository were employed. The results, based on computational time, classification accuracy, sensitivity, F-measure, and precision, show that GA-DCA overall outperforms PSO-DCA on most of the datasets.
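To make the optimisation setting concrete, here is a minimal global-best PSO sketch. It assumes the DCA weights can be scored by some error function to be minimised; `toy_error` and `TARGET` are hypothetical stand-ins for such a fitness (for example, one minus classification accuracy) and are not the paper's objective. The coefficient values are common textbook defaults, not those used in the study.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 30,
    iterations: int = 200,
    w: float = 0.7298,    # inertia weight
    c1: float = 1.49618,  # cognitive (personal-best) coefficient
    c2: float = 1.49618,  # social (global-best) coefficient
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO; returns (best position, best objective value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, and pull to swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical fitness: squared distance to an arbitrary "ideal" weight vector.
TARGET = [0.5, 2.0, 1.0]


def toy_error(weights: Sequence[float]) -> float:
    return sum((x - t) ** 2 for x, t in zip(weights, TARGET))


best, best_val = pso_minimize(toy_error, dim=3, bounds=(0.0, 3.0))
```

A GA-based counterpart would replace the velocity update with selection, crossover, and mutation over the same weight vectors; the fitness interface can stay identical, which is what makes a side-by-side comparison like the one in the study possible.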
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
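As a rough illustration of what a linear, deterministic, and order-invariant initializer can look like, here is a simplified variance-partitioning seeding procedure in the spirit of Su and Dy's Var-Part method. It is a sketch under assumptions: it repeatedly splits the cluster with the largest sum of squared errors (SSE) at the mean of that cluster's highest-variance dimension until k clusters exist, then returns their centroids; details may differ from the variants evaluated in the chapter, and the helper names are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _sse(points: List[Point]) -> float:
    """Sum of squared distances from the points to their centroid."""
    c = _mean(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def var_part_init(data: List[Point], k: int) -> List[List[float]]:
    """Deterministic k-means seeding by repeated variance-based splits."""
    clusters: List[List[Point]] = [list(data)]
    while len(clusters) < k:
        # Pick the cluster with the largest SSE (ties go to the first one).
        idx = max(range(len(clusters)), key=lambda i: _sse(clusters[i]))
        target = clusters[idx]
        if _sse(target) == 0.0:
            raise ValueError("not enough distinct points for k clusters")
        centroid = _mean(target)
        # Split along the dimension with the largest variance, at its mean.
        dims = len(centroid)
        var = [sum((p[d] - centroid[d]) ** 2 for p in target) for d in range(dims)]
        d_star = max(range(dims), key=lambda d: var[d])
        left = [p for p in target if p[d_star] <= centroid[d_star]]
        right = [p for p in target if p[d_star] > centroid[d_star]]
        clusters[idx:idx + 1] = [left, right]
    return [_mean(c) for c in clusters]


# Usage: two well-separated groups; shuffling the input gives the same seeds.
seeds = var_part_init([[0, 0], [0, 1], [10, 0], [10, 1]], k=2)
shuffled = var_part_init([[10, 1], [0, 0], [10, 0], [0, 1]], k=2)
```

Each split touches every point a constant number of times, so the whole procedure runs in time linear in the number of points for fixed k and dimensionality. Because it depends only on sums over points and a fixed left/right split rule, its output does not change with the order of the input (up to floating-point rounding), so no repeated random restarts are needed.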
A Novel Evolutionary Swarm Fuzzy Clustering Approach for Hyperspectral Imagery
In land cover assessment, classes often change gradually from one to another, so it is difficult to draw sharp boundaries between different classes of interest. To overcome this issue and model such conditions, fuzzy techniques that resemble human reasoning have been proposed as alternatives. Fuzzy C-means is the most common fuzzy clustering technique, but it relies on a local search mechanism and its convergence rate is rather slow, especially for high-dimensional problems (e.g., processing hyperspectral images). To address these shortcomings, a new approach is proposed: fuzzy C-means optimized by fractional order Darwinian particle swarm optimization. In addition, to speed up clustering, the histogram of image intensities is used instead of the raw image data. Furthermore, the proposed clustering approach is combined with support vector machine classification to accurately classify hyperspectral images. The new classification framework is applied to two well-known hyperspectral data sets: Indian Pines and Salinas. Experimental results confirm that the proposed swarm-based clustering approach groups hyperspectral images accurately and in a time-efficient manner compared to other existing clustering techniques.
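The histogram speed-up can be sketched as follows: fuzzy C-means runs over the distinct gray levels, with each level's contribution weighted by its pixel count, so each iteration costs time proportional to the number of occupied levels rather than the number of pixels. This is plain FCM with a fixed, evenly spaced initialization; it does not include the fractional order Darwinian PSO that the paper uses to optimize the clustering, and the function name and histogram below are hypothetical.

```python
from typing import Dict, List


def fcm_histogram(hist: Dict[int, int], c: int, m: float = 2.0,
                  iterations: int = 100, tol: float = 1e-6) -> List[float]:
    """Fuzzy C-means on gray levels, weighting each level by its pixel count."""
    levels = sorted(level for level, count in hist.items() if count > 0)
    counts = [hist[level] for level in levels]
    lo, hi = levels[0], levels[-1]
    # Deterministic start: centers spread evenly over the intensity range.
    if c > 1:
        centers = [lo + (hi - lo) * j / (c - 1) for j in range(c)]
    else:
        centers = [(lo + hi) / 2]
    exp = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        memberships = []
        for x in levels:
            dists = [abs(x - v) for v in centers]
            zeros = dists.count(0.0)
            if zeros:  # level coincides with a center: crisp membership
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((d_j / d_k) ** exp for d_k in dists) for d_j in dists]
            memberships.append(row)
        # Center update: weighted mean of levels with weights count * u ** m.
        new_centers = []
        for j in range(c):
            w = [n * row[j] ** m for n, row in zip(counts, memberships)]
            new_centers.append(sum(wi * x for wi, x in zip(w, levels)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Made-up histogram with two intensity modes around 10 and 200.
hist = {8: 5, 9: 10, 10: 20, 11: 10, 12: 5,
        198: 5, 199: 10, 200: 20, 201: 10, 202: 5}
centers = fcm_histogram(hist, c=2)
```

For hyperspectral data this would be applied per band (or to some derived intensity), and the local-search behaviour of this fixed-point iteration is exactly the weakness that a swarm-based optimizer is meant to mitigate.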
- …