6 research outputs found

    Improving anomalous rare attack detection rate for intrusion detection system using support vector machine and genetic programming

    A commonly addressed problem in intrusion detection system (IDS) research that uses the NSL-KDD dataset is improving the detection rate for rare attacks. However, some rare attacks are hard for the IDS model to recognize because their patterns are entirely missing from the training set, which reduces the rare-attack detection rate. Such attacks can be defined as anomalous rare attacks, and the problem has hardly been addressed in the IDS literature. Hence, in this letter, we propose a new classifier, based on support vector machine (SVM) and genetic programming (GP), to improve the detection rate for anomalous attacks. In the experimental results, our classifier, GPSVM, achieved a higher detection rate on the anomalous rare attacks without a significant reduction in overall accuracy. This is because the GPSVM optimization task ensures that accuracy is balanced between classes without reducing the generalization property of SVM

    A new classification model for a class imbalanced data set using genetic programming and support vector machines: case study for wilt disease classification

    A class-imbalanced data set is one in which the classes are not evenly distributed. In such cases, most standard classifiers fail to recognize examples that belong to the minority class. Hence, several methods have been proposed to solve this problem, such as resampling, modifying the classifier's optimization problem, or introducing a new optimization task on top of the classifier. This work proposes a new optimization task based on genetic programming, built on top of a support vector machine, to improve the classification rate for the minority class without a significant reduction in accuracy. Experiments on the wilt disease data set show that the new classifier, support vector based on genetic programming machine, gives a more balanced accuracy between classes than various other classification techniques for the imbalanced classification problem
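The gap between overall accuracy and per-class accuracy mentioned in the abstract above can be illustrated with a minimal sketch. The function names and the toy labels are hypothetical and do not come from the paper. The sketch only shows why a classifier that ignores the minority class can still score high on plain accuracy.

```python
def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical toy data: 95 majority examples (0) and 5 minority examples (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 100

print(accuracy(y_true, always_majority))           # 0.95
print(balanced_accuracy(y_true, always_majority))  # 0.5
```

The degenerate classifier reaches 95% accuracy while recognizing no minority example, which is the failure mode that a balanced-accuracy objective is meant to expose.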

    A new classifier based on combination of genetic programming and support vector machine in solving imbalanced classification problem

    In supervised learning, a class-imbalanced data set is one in which the class distribution is not uniform among the classes. Many classifiers fail to properly identify patterns that belong to the minority class because most of them are built to minimize the error rate. Hence, a biased classification model is to be expected, since high accuracy can always be achieved on the majority class alone. There are two kinds of methods for dealing with the imbalanced classification problem: data-level and algorithmic-level methods. Data-level methods address the problem by making both classes equal in number. However, changing the distribution of both classes violates the original class distribution of the data. Algorithmic-level methods, in contrast, introduce a new optimization task to improve the minority class classification rate without changing the data characteristics. Nevertheless, the optimization task requires specific care to prevent the classification model from overfitting. Therefore, a new classifier based on genetic programming (GP) and support vector machine (SVM) is proposed in this thesis to solve the imbalanced classification problem without changing the data properties. The idea is to use GP to optimize the SVM decision function such that the minority class classification rate is increased without sacrificing the accuracy rate for both classes. In addition, the classifier is optimized to have a good generalization property. The key components of the new classifier are a new kernel method, a new learning metric, and a new optimization algorithm for optimizing the SVM decision function. The proposed classifier is called Support Vector Genetic Programming Machine (SVGPM).
To evaluate the performance of SVGPM against current methods for imbalanced classification, three experiments are conducted: on selected standard class-imbalanced benchmark data sets, on an intrusion detection system (IDS) data set, and on a remote sensing data set. In the first experiment, the SVGPM performance is compared against SVM and cost-sensitive SVM, owing to the superiority of SVM in dealing with the imbalanced classification problem. The second experiment evaluates SVGPM on detecting anomalous rare attacks in a network intrusion data set, where its performance is compared against current methods for developing a prediction model for IDS. In the third experiment, SVGPM is evaluated on the wilt disease data set from a remote sensing study, to identify wilt-diseased trees in high-resolution images. The SVGPM performance is compared against previously proposed methods for mapping the regions covered by wilt-diseased trees in Japan. The experiments show that SVGPM gives a very good classification rate for the minority class without sacrificing the accuracy rate for both classes. This is because, in the training stage, the optimization task introduced in SVGPM ensures that each minority class example is generalized into one learning concept and that the classification rates for the majority and minority classes are similar

    Change-Oriented Summarization of Temporal Scholarly Document Collections by Semantic Evolution Analysis

    The number of scholarly publications has increased dramatically over the last decades. For anyone new to a particular scientific domain, it is not easy to understand the major trends and significant changes that the domain has undergone over time. Temporal summarization and related approaches should therefore be useful for making sense of temporal scholarly collections. In this paper we demonstrate an approach to analyzing a dataset of research papers by providing a high-level overview of the important changes that occurred in it over time. The novelty of our approach lies in the adaptation of methods used for semantic term evolution analysis. However, rather than analyzing the semantic evolution of single words independently, we estimate common semantic drifts shared by groups of semantically converging words. As an example dataset we study the ACL Anthology Reference Corpus, which spans from 1974 to 2015 and contains 22,878 scholarly articles

    Multiclass Classification Method in Handheld Based Smartphone Gait Identification

    Gait identification has been widely used in many types of research and applications. Since gait identification involves many people and therefore many classes, using a single classifier is not a good option: the dataset may contain overlapping class boundaries and, moreover, most classifiers are built for binary problems. This paper discusses the application of multiclass methods such as one-vs-all (OvA), one-vs-one (OvO) and random correction code (RCC) to handheld smartphone gait signals for person identification. The mapping uses J48 as the base classifier. The result is then compared with a single J48 as the benchmark. Finally, the best multiclass method is compared with a few machine learning classifiers in order to assess its capability. The results show that OvO and RCC increase accuracy compared with a single classifier. Among the classifiers used in the multiclass mapping method, J48 yields the best accuracy score
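As a minimal sketch of how the one-vs-all and one-vs-one mappings named above split a multiclass problem into binary ones, the following uses hypothetical helper names and toy person labels. It only builds the binary sub-problems. It does not train J48 or any other classifier, and it does not cover RCC.

```python
from itertools import combinations


def one_vs_all_tasks(labels):
    """One binary task per class: 1 for that class, 0 for all the others."""
    classes = sorted(set(labels))
    return [(c, [1 if y == c else 0 for y in labels]) for c in classes]


def one_vs_one_tasks(labels):
    """One binary task per unordered class pair, keeping only the indices
    of the examples that belong to either class of the pair."""
    classes = sorted(set(labels))
    tasks = []
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        tasks.append(((a, b), idx))
    return tasks


# Hypothetical toy labels: six gait samples from four people.
labels = ["p1", "p2", "p3", "p4", "p1", "p3"]

ova = one_vs_all_tasks(labels)
ovo = one_vs_one_tasks(labels)

print(len(ova))  # 4 binary problems, one per person
print(len(ovo))  # 6 binary problems, one per pair of people
```

For k classes, OvA produces k binary problems over the full data set, while OvO produces k(k-1)/2 smaller problems, each restricted to the examples of two classes.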