
    Two new approaches to feature selection with harmony search


    TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System

    Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. A hybrid feature selection technique comprising three methods, i.e., particle swarm optimization, ant colony algorithm, and genetic algorithm, is utilized to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, which remarkably outperform other classification techniques recently proposed in the literature. Results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. Such testing has not usually been considered in IDS research thus far and therefore adds value to the experimental results achieved by the proposed classifier.

    A Comparative Study of Genetic Algorithm and Particle Swarm optimisation for Dendritic Cell Algorithm

    Dendritic cell algorithm (DCA) is a class of artificial immune systems that was originally developed for anomaly detection in networked systems and later used as a general binary classifier. Conventionally, in its life cycle, the DCA goes through four phases: feature categorisation into artificial signals, context detection of data items, context assignment, and finally labeling of data items as either abnormal or normal. During the context detection phase, the DCA requires users to manually pre-define the parameters used by its weighted function to process the signals and data items. Because the manual derivation of the parameters of the DCA cannot guarantee that the optimal set of weights is used, research attention has been attracted to the optimisation of the parameters. This paper reports a systematic comparative study between genetic algorithm (GA) and particle swarm optimisation (PSO) on parameter optimisation for DCA. In order to evaluate the performance of GA-DCA and PSO-DCA, twelve publicly available datasets from the UCI machine learning repository were employed. The performance results based on computational time, classification accuracy, sensitivity, F-measure, and precision show that the GA-DCA overall outperforms PSO-DCA for most of the datasets.

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
    Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
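To make the three properties named in the abstract above concrete (linear in the number of points, deterministic, order-invariant), here is a minimal Python sketch of a maximin-style seeding. It is an illustration only and is not claimed to be one of the six methods the chapter evaluates; the function name `maximin_init` and the tie-breaking rule are assumptions introduced here.

```python
def maximin_init(points, k):
    """Pick k initial centers with a maximin rule (illustrative sketch).

    Deterministic: no random numbers are used.
    Order-invariant: ties are broken by coordinates, never by list position.
    Linear in n: each of the k rounds makes one pass over the points.
    """
    pts = [tuple(p) for p in points]
    if not 1 <= k <= len(pts):
        raise ValueError("k must be between 1 and the number of points")

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Assumption: seed with the point of largest squared norm.
    first = max(pts, key=lambda p: (sum(x * x for x in p), p))
    centers = [first]

    # nearest[i] = squared distance from pts[i] to its closest chosen center.
    nearest = [sqdist(p, first) for p in pts]

    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        idx = max(range(len(pts)), key=lambda i: (nearest[i], pts[i]))
        chosen = pts[idx]
        centers.append(chosen)
        nearest = [min(d, sqdist(p, chosen)) for d, p in zip(nearest, pts)]

    return centers
```

Because the result depends only on the set of points, shuffling the input list returns the same centers, so a single k-means run from these centers is reproducible without repeated restarts.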

    Feature Selection for Fuzzy Models


    A Novel Evolutionary Swarm Fuzzy Clustering Approach for Hyperspectral Imagery

    In land cover assessment, classes often gradually change from one to another. Therefore, it is difficult to allocate sharp boundaries between different classes of interest. To overcome this issue and model such conditions, fuzzy techniques that resemble human reasoning have been proposed as alternatives. Fuzzy C-means is the most common fuzzy clustering technique, but its concept is based on a local search mechanism and its convergence rate is rather slow, especially considering high-dimensional problems (e.g., in processing of hyperspectral images). Here, in order to address those shortcomings of hard approaches, a new approach is proposed, i.e., fuzzy C-means optimized by fractional order Darwinian particle swarm optimization. In addition, to speed up the clustering process, the histogram of image intensities is used during the clustering process instead of the raw image data. Furthermore, the proposed clustering approach is combined with support vector machine classification to accurately classify hyperspectral images. The new classification framework is applied to two well-known hyperspectral data sets: Indian Pines and Salinas. Experimental results confirm that the proposed swarm-based clustering approach can group hyperspectral images accurately in a time-efficient manner compared to other existing clustering techniques.
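The histogram speed-up mentioned in the abstract above can be sketched as follows: plain fuzzy C-means runs over the occupied intensity levels, each weighted by its pixel count, instead of over every pixel. This is a hedged illustration of standard fuzzy C-means only; it does not include the fractional order Darwinian particle swarm optimization step, and the function name `fcm_histogram`, the evenly spaced starting centers, and the default parameters are assumptions made here.

```python
def fcm_histogram(hist, n_clusters, m=2.0, max_iter=200, tol=1e-6):
    """Plain fuzzy C-means on a 1-D intensity histogram (illustrative sketch).

    hist[g] is the number of pixels with intensity g. Work per iteration
    scales with the number of occupied levels, not the number of pixels.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    levels = [g for g, count in enumerate(hist) if count > 0]
    if n_clusters < 1 or len(levels) < n_clusters:
        raise ValueError("need at least n_clusters occupied intensity levels")

    # Assumption: start from centers evenly spaced over the occupied range.
    lo, hi = levels[0], levels[-1]
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]

    power = 2.0 / (m - 1.0)
    memberships = {}
    for _ in range(max_iter):
        # Membership step: u_gk is proportional to d_gk ** (-2 / (m - 1)).
        for g in levels:
            dists = [abs(g - c) for c in centers]
            zeros = [d == 0.0 for d in dists]
            if any(zeros):
                # A level sitting exactly on a center belongs fully to it.
                share = 1.0 / sum(zeros)
                memberships[g] = [share if z else 0.0 for z in zeros]
            else:
                inv = [d ** -power for d in dists]
                total = sum(inv)
                memberships[g] = [v / total for v in inv]

        # Center step: mean of levels weighted by pixel count times u ** m.
        new_centers = []
        for k in range(n_clusters):
            weights = [hist[g] * memberships[g][k] ** m for g in levels]
            weighted_sum = sum(w * g for w, g in zip(weights, levels))
            new_centers.append(weighted_sum / sum(weights))

        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    return centers, memberships
```

For an 8-bit band, each iteration touches at most 256 levels regardless of image size, which is the source of the speed-up; the returned memberships map each occupied intensity level to its degree of membership in every cluster.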