
    Strengthening intrusion detection system for adversarial attacks: Improved handling of imbalance classification problem

    Most defence mechanisms, such as a network-based intrusion detection system (NIDS), are often sub-optimal for detecting unseen malicious patterns. In response, a number of studies attempt to empower a machine-learning-based NIDS to improve its ability to recognize adversarial attacks. Along this line of research, the present work focuses on non-payload connections at the TCP stack level, which is generalized and applicable to different network applications. As a complement to the recently published investigation that searches for the most informative feature space for classifying obfuscated connections, the problem of class imbalance is examined herein. In particular, a multiple-clustering-based undersampling framework is proposed to determine the set of cluster centroids that best represents the majority class, whose size is reduced to be on par with that of the minority. Initially, a pool of centroids is created using the concept of ensemble clustering, which aims to obtain a collection of accurate and diverse clusterings; the final set of representatives is then selected from this pool. Three different objective functions are formed for this optimization-driven process, leading to three variants: FF-Majority, FF-Minority, and FF-Overall. Based on a thorough evaluation on a published dataset, four classification models, and different settings, these new methods often exhibit better predictive performance than their baseline, the single-clustering undersampling counterpart, and state-of-the-art techniques. Parameter analysis and implications for an extreme case are also provided as a guideline for future applications.
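    The core idea of cluster-centroid undersampling can be illustrated with a short sketch. The following is a minimal, hypothetical example using scikit-learn's KMeans: it reduces the majority class to as many centroids as there are minority instances, which corresponds to the single-clustering baseline the abstract mentions rather than the proposed FF-Majority, FF-Minority, or FF-Overall variants.

```python
# Minimal sketch of centroid-based undersampling (single-clustering baseline).
# Assumption: binary labels with the majority class given explicitly; this is not
# the ensemble-clustering / optimization procedure described in the paper.
import numpy as np
from sklearn.cluster import KMeans

def centroid_undersample(X, y, majority_label=0, random_state=0):
    maj_mask = (y == majority_label)
    X_maj, X_min = X[maj_mask], X[~maj_mask]
    # Replace the majority class by k cluster centroids, with k = minority size.
    km = KMeans(n_clusters=len(X_min), n_init=10, random_state=random_state).fit(X_maj)
    X_bal = np.vstack([km.cluster_centers_, X_min])
    y_bal = np.concatenate([np.full(len(X_min), majority_label), y[~maj_mask]])
    return X_bal, y_bal
```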

    An Under-Sampled Approach for Handling Skewed Data Distribution using Cluster Disjuncts

    Data mining and knowledge discovery aim to uncover hidden and valuable knowledge from data sources. Traditional knowledge discovery algorithms are bottlenecked by the wide range of data sources now available. Class imbalance is one such problem arising from the data source: examples of one class in a training data set vastly outnumber examples of the other class(es). Researchers have rigorously studied several techniques to alleviate the class imbalance problem, including resampling algorithms and feature selection approaches. In this paper, we present a new hybrid framework, dubbed Majority Under-sampling based on Cluster Disjunct (MAJOR_CD), for learning from skewed training data. This algorithm provides a simpler and faster alternative by using the cluster disjunct concept. We conduct experiments using twelve UCI data sets from various application domains, five algorithms for comparison, and six evaluation metrics. The empirical study suggests that MAJOR_CD is effective in addressing the class imbalance problem.
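    In contrast to the centroid-based sketch above, a cluster-disjunct-style undersampler can keep real majority instances by sampling from each majority-class cluster in proportion to its size. The sketch below is a hypothetical analogue of that idea, not the published MAJOR_CD algorithm; the number of clusters and the sampling rule are assumptions.

```python
# Minimal sketch: undersample the majority class by sampling real instances from
# each majority cluster (disjunct) proportionally to cluster size.
import numpy as np
from sklearn.cluster import KMeans

def per_cluster_undersample(X_maj, n_keep, n_clusters=10, random_state=0):
    rng = np.random.default_rng(random_state)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X_maj)
    kept = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        # Keep a share of each disjunct proportional to its size (at least one instance).
        take = max(1, round(n_keep * len(idx) / len(X_maj)))
        kept.append(rng.choice(idx, size=min(take, len(idx)), replace=False))
    return X_maj[np.concatenate(kept)]
```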

    Improving the Anomaly Detection by Combining PSO Search Methods and J48 Algorithm

    Feature selection techniques are used to find the most important and relevant features in a dataset. In this study, feature selection is therefore used to improve the performance of anomaly detection. Many feature selection techniques have been developed and implemented on the NSL-KDD dataset. However, with the rapid growth of traffic on networks where more applications, devices, and protocols participate, traffic data have become complex and heterogeneous, which contributes to security issues and makes the NSL-KDD dataset no longer reliable for this purpose. A detection model must also be able to recognize novel attack types in complex network datasets, so a robust analysis technique for larger and more complex datasets is required to cope with the increase of security issues in big data networks. This study proposes particle swarm optimization (PSO) search as the feature selection method. To contribute to feature-analysis knowledge, combinations of PSO search with other search methods are also examined in the experiments. To overcome the limitations of the NSL-KDD dataset, the experiments use the CICIDS2017 dataset, and the J48 classification algorithm is used to validate the selected features. The detection performance of the PSO search method combined with J48 is examined and compared with other feature selection approaches and a previous study. The proposed technique successfully finds the important features of the dataset and improves detection performance to 99.89% accuracy; compared with the previous study, it achieves better accuracy, TPR, and FPR.
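    Wrapper-style feature selection of this kind can be sketched with a binary PSO whose fitness is the cross-validated accuracy of a decision tree. The example below is a simplified, hypothetical illustration: scikit-learn's DecisionTreeClassifier stands in for Weka's J48, and the swarm size, inertia, and acceleration coefficients are assumed values rather than the paper's configuration.

```python
# Minimal sketch of binary PSO feature selection with a decision-tree wrapper.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def bpso_feature_selection(X, y, n_particles=20, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = (rng.random((n_particles, d)) > 0.5).astype(float)  # 0/1 feature masks
    vel = rng.uniform(-1, 1, (n_particles, d))

    def fitness(mask):
        sel = mask.astype(bool)
        if not sel.any():
            return 0.0
        clf = DecisionTreeClassifier(random_state=seed)  # stand-in for J48 (C4.5)
        return cross_val_score(clf, X[:, sel], y, cv=3).mean()

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        # Velocity update with assumed inertia 0.7 and acceleration 1.5 / 1.5.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        # Sigmoid transfer function turns velocities into bit-flip probabilities.
        pos = (rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return pbest[pbest_fit.argmax()].astype(bool)  # selected-feature mask
```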

    Oblique Decision Tree Algorithm with Minority Condensation for Class Imbalanced Problem

    In recent years, a significant issue in classification has been handling datasets that contain an imbalanced number of instances in each class. Classifier modification is one of the well-known techniques for dealing with this issue. In this paper, an effective classification model based on an oblique decision tree is enhanced to work with imbalanced datasets; the result is called the oblique minority condensed decision tree (OMCT). Initially, it selects the best axis-parallel hyperplane based on the decision tree algorithm, using the minority entropy of instances within the minority inner fence. Then it perturbs this hyperplane along each axis to improve its minority entropy. Finally, it stochastically perturbs the hyperplane to escape local solutions. In the experimental results, OMCT significantly outperforms six state-of-the-art decision tree algorithms, namely CART, C4.5, OC1, AE, DCSM and ME, on 18 real-world UCI datasets in terms of precision, recall and F1 score. Moreover, the decision trees produced by OMCT are significantly smaller than those of the other methods.
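    The deterministic part of such hyperplane refinement resembles OC1-style coordinate perturbation: fix all but one coefficient of an oblique split w·x + b > 0 and keep the candidate value that lowers the split's impurity. The sketch below uses ordinary class entropy as the impurity; the paper's minority-entropy and inner-fence criteria are not reproduced here, and the candidate-generation scheme is an assumption.

```python
# Minimal sketch of axis-wise perturbation of an oblique split w.x + b > 0,
# scored with ordinary entropy (not the paper's minority-entropy criterion).
import numpy as np

def entropy(labels):
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def split_impurity(X, y, w, b):
    left = X @ w + b > 0
    n = len(y)
    return (left.sum() / n) * entropy(y[left]) + ((~left).sum() / n) * entropy(y[~left])

def perturb_axis(X, y, w, b, axis, candidates):
    # Try candidate values for one coefficient; keep the one with the lowest impurity.
    best_w, best_score = w.copy(), split_impurity(X, y, w, b)
    for value in candidates:
        trial = w.copy()
        trial[axis] = value
        score = split_impurity(X, y, trial, b)
        if score < best_score:
            best_w, best_score = trial, score
    return best_w, best_score
```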

    Learning from Imbalanced Multi-label Data Sets by Using Ensemble Strategies

    Multi-label classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Problems of this type are ubiquitous in everyday life; for example, a movie can be categorized as action, crime, and thriller. Most multi-label classification algorithms are designed for balanced data and do not work well on imbalanced data, yet in real applications most datasets are imbalanced. We therefore focus on improving multi-label classification performance on imbalanced datasets. In this paper, a state-of-the-art multi-label classification algorithm called IBLR_ML, which combines the k-nearest neighbour and logistic regression algorithms, is employed. The logistic regression part of this algorithm is combined with two ensemble learning algorithms, bagging and boosting; the resulting approach is called IB-ELR. In this paper, for the first time, the ensemble bagging method with a stable learner as the base learner and imbalanced data sets as the training data is examined. Finally, the proposed methods are implemented in Java and evaluated; experimental results show their effectiveness. Keywords: Multi-label classification, Imbalanced data set, Ensemble learning, Stable algorithm, Logistic regression, Bagging, Boosting.
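    As a rough analogue of combining logistic regression with bagging in a multi-label setting, the sketch below wraps a bagged logistic regression in scikit-learn's binary-relevance style MultiOutputClassifier. It is a hypothetical illustration on synthetic data, omits the instance-based (k-NN) component of IBLR_ML, and is not the paper's Java implementation of IB-ELR.

```python
# Minimal sketch: binary relevance with a bagged logistic-regression base learner.
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=500, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# One bagged ensemble of logistic regressions per label.
model = MultiOutputClassifier(
    BaggingClassifier(LogisticRegression(max_iter=1000), n_estimators=10, random_state=0)
)
model.fit(X_tr, Y_tr)
print("micro-F1:", f1_score(Y_te, model.predict(X_te), average="micro"))
```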