
    Bi-Directional Feature Fixation-Based Particle Swarm Optimization for Large-Scale Feature Selection

    Feature selection, which aims to improve classification accuracy and reduce the size of the selected feature subset, is an important but challenging optimization problem in data mining. Particle swarm optimization (PSO) has shown promising performance in tackling feature selection problems, but still faces challenges in dealing with large-scale feature selection in Big Data environments because of the large search space. Hence, this article proposes a bi-directional feature fixation (BDFF) framework for PSO and provides a novel idea to reduce the search space in large-scale feature selection. BDFF uses two opposite search directions to guide particles to adequately search for feature subsets of different sizes. Based on the two search directions, BDFF can fix the selection states of some features and then focus on the others when updating particles, thus narrowing the large search space. In addition, a self-adaptive strategy is designed to help the swarm concentrate on the more promising search direction in different stages of evolution and achieve a balance between exploration and exploitation. Experimental results on 12 widely used public datasets show that BDFF can improve the performance of PSO on large-scale feature selection and obtain smaller feature subsets with higher classification accuracy.
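The idea of fixing some selection states and updating only the rest can be illustrated with a toy sketch. This is not the BDFF algorithm: the abstract does not say how features are chosen for fixation, so the `fixed` mapping, the `flip_prob` update rule, and both function names below are hypothetical placeholders.

```python
import random
from typing import Dict, List


def update_with_fixation(position: List[int], flip_prob: float,
                         fixed: Dict[int, int], rng: random.Random) -> List[int]:
    """Update a binary particle while leaving fixed features untouched.

    `fixed` maps a feature index to its forced state (1 = selected, 0 = dropped).
    Free bits are flipped with probability `flip_prob` (an assumed toy rule).
    """
    new_position = []
    for j, bit in enumerate(position):
        if j in fixed:
            new_position.append(fixed[j])      # state is fixed, no search here
        elif rng.random() < flip_prob:
            new_position.append(1 - bit)       # free bit: flip
        else:
            new_position.append(bit)           # free bit: keep
    return new_position


def free_search_space(n_features: int, fixed: Dict[int, int]) -> int:
    """Number of subsets still reachable once some states are fixed."""
    return 2 ** (n_features - len(fixed))


rng = random.Random(0)
position = [0, 1, 0, 1, 1, 0]
fixed = {0: 1, 3: 0}  # hypothetical: feature 0 always kept, feature 3 always dropped
updated = update_with_fixation(position, 0.5, fixed, rng)
print(updated, free_search_space(len(position), fixed))
```

With two of six features fixed, the particle searches 2^4 = 16 subsets instead of 2^6 = 64, which is the sense in which fixation narrows the search space.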

    Binary Competitive Swarm Optimizer Approaches For Feature Selection

    Feature selection is known to be an NP-hard combinatorial problem in which the number of possible feature subsets increases exponentially with the number of features. As the number of features grows, exhaustive search becomes impractical. In addition, a feature set normally includes irrelevant and redundant information alongside the relevant information. Therefore, in this paper, binary variants of a competitive swarm optimizer are proposed for wrapper feature selection. The proposed approaches are used to select a subset of significant features for classification purposes. The binary versions are obtained by employing S-shaped and V-shaped transfer functions, which allow the search agents to move in the binary search space. The proposed approaches are tested on 15 benchmark datasets collected from the UCI machine learning repository, and the results are compared with other conventional feature selection methods. Our results demonstrate the capability of the proposed binary version of the competitive swarm optimizer in terms of both high classification performance and low computational cost.
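As a rough illustration of how transfer functions map a continuous velocity to a binary position, here is a minimal sketch. The abstract does not say which S-shaped or V-shaped functions the paper uses, so the sigmoid and |tanh| below are assumed, commonly used representatives, and the function names are illustrative only.

```python
import math
import random


def s_shaped(v: float) -> float:
    """Sigmoid transfer: maps a velocity to a probability in (0, 1)."""
    v = max(-60.0, min(60.0, v))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-v))


def v_shaped(v: float) -> float:
    """|tanh| transfer: maps the velocity magnitude to a probability in [0, 1]."""
    return abs(math.tanh(v))


def update_bit_s(v: float, rng: random.Random) -> int:
    """S-shaped rule: set the bit to 1 with probability s_shaped(v)."""
    return 1 if rng.random() < s_shaped(v) else 0


def update_bit_v(bit: int, v: float, rng: random.Random) -> int:
    """V-shaped rule: flip the current bit with probability v_shaped(v)."""
    return 1 - bit if rng.random() < v_shaped(v) else bit


rng = random.Random(0)
velocity = [2.0, -2.0, 0.0, 4.0]
position = [0, 1, 1, 0]
new_s = [update_bit_s(v, rng) for v in velocity]
new_v = [update_bit_v(b, v, rng) for b, v in zip(position, velocity)]
print(new_s, new_v)
```

The two families differ in what the probability controls: the S-shaped rule sets the bit regardless of its previous value, while the V-shaped rule decides whether to flip it, so a zero velocity leaves the bit unchanged.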

    Comparative between optimization feature selection by using classifiers algorithms on spam email

    Spam mail has become a rising phenomenon in a world that has recently witnessed high growth in the volume of emails. This indicates the need to develop an effective spam filter. At present, classification algorithms for text mining are used to classify emails. This paper describes and evaluates the effectiveness of three popular classifiers combined with optimization-based feature selection methods, such as genetic algorithm, harmony search, particle swarm optimization, and simulated annealing. The research compares the K-Nearest Neighbor (KNN), Naïve Bayesian (NB), and Support Vector Machine (SVM) classifiers on spam classification without feature selection, and also enhances the reliability of feature selection by proposing optimization-based feature selection to reduce the number of unimportant features.

    Improving the Anomaly Detection by Combining PSO Search Methods and J48 Algorithm

    Feature selection techniques are used to find the most important and relevant features in a dataset. In this study, a feature selection technique is used to improve the performance of anomaly detection. Many feature selection techniques have been developed and implemented on the NSL-KDD dataset. However, with the rapid growth of network traffic, in which more applications, devices, and protocols participate, traffic data has become complex and heterogeneous, which contributes to security issues. This makes the NSL-KDD dataset no longer reliable for this purpose. The detection model must also be able to recognize novel types of attack in complex network datasets. A robust analysis technique for more complex and larger datasets is therefore required to cope with the increase in security issues in big data networks. This study proposes particle swarm optimization (PSO) search methods as a feature selection method. As a contribution to feature analysis knowledge, the experiments examine combinations of PSO search methods with other search methods. To overcome the limitations of the NSL-KDD dataset, the experiments use the CICIDS2017 dataset. The J48 classification algorithm is used to validate the features selected by the proposed technique. The detection performance of the PSO search method combined with J48 is examined and compared with other feature selection methods and a previous study. The proposed technique successfully finds the important features of the dataset, which improves detection performance, with 99.89% accuracy. Compared with the previous study, the proposed technique achieves better accuracy, TPR, and FPR.

    COMPARATIVE ANALYSIS OF PARTICLE SWARM OPTIMIZATION ALGORITHMS FOR TEXT FEATURE SELECTION

    With the rapid growth of the Internet, more and more natural language text documents are available in electronic format, making automated text categorization a must in most fields. Due to the high dimensionality of text categorization tasks, feature selection is needed before executing document classification. There are basically two kinds of feature selection approaches: the filter approach and the wrapper approach. The wrapper approach requires a search algorithm for feature subsets and an evaluation algorithm for assessing the fitness of the selected feature subset. In this work, I focus on the comparison between two wrapper approaches. Both approaches use Particle Swarm Optimization (PSO) as the search algorithm. The first is a PSO-based K-Nearest Neighbors (KNN) algorithm, while the second is a PSO-based Rocchio algorithm. Three datasets are used in this study. The results show that BPSO-KNN is slightly better in classification results than BPSO-Rocchio, while BPSO-Rocchio has a far shorter computation time than BPSO-KNN.
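The evaluation half of a wrapper approach can be sketched as a fitness function that scores a binary feature mask by running a classifier on the selected columns. The sketch below is an assumption-laden toy, not the paper's setup: it uses leave-one-out 1-NN on made-up numeric data, and the weight `alpha` and the function names are hypothetical. In a full wrapper, a search algorithm such as PSO would propose the masks.

```python
from typing import Sequence


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     mask: Sequence[int]) -> float:
    """Leave-one-out accuracy of 1-NN using only features where mask == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue  # hold out the sample being classified
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha: float = 0.9) -> float:
    """Weighted sum of accuracy and subset compactness (alpha is assumed)."""
    size_term = 1.0 - sum(mask) / len(mask)
    return alpha * loo_1nn_accuracy(X, y, mask) + (1.0 - alpha) * size_term


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 5.1], [1.2, -3.1]]
y = [0, 0, 0, 1, 1, 1]
for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, round(fitness(X, y, mask), 3))
```

Because the classifier is rerun for every candidate mask, the cost of the evaluation algorithm dominates the running time of a wrapper, which is consistent with the computation-time gap the abstract reports between the KNN and Rocchio variants.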

    Feature Selection with Harmony Search for Classification: A Review

    In the area of data mining, feature selection is an important task for classification and dimensionality reduction. Feature selection is the process of choosing the most relevant features in a dataset. If the dataset contains irrelevant features, they will affect not only the training of the classifier but also the accuracy of the model. Good classification accuracy is achieved when the model correctly predicts the class labels. This paper gives a general review of feature selection with the Harmony Search (HS) algorithm for classification in various applications. According to the review, feature selection with the HS algorithm performs well compared to other metaheuristic algorithms such as Genetic Algorithm (GA) and Particle Swarm Optimization (PSO).

    Improving gender classification with feature selection in forensic anthropology

    Gender classification is one of the most vital tasks in real-world problems, especially in death investigations. Developing a biological profile of an individual is a crucial step in the forensic anthropology process for identifying gender. Forensic anthropologists rely on skeletal remains to produce a biological profile. Different parts of the skeleton contain different features that contribute to gender classification. However, not all features contribute to gender classification, and some lead to low classification accuracy. Therefore, a feature selection method is applied to identify the most significant features for gender classification. This paper presents the implementation of two feature selection approaches, Particle Swarm Optimization (PSO) and the Harmony Search (HS) algorithm, using three different datasets: the Goldman Osteometric Dataset, the Osteological Collection, and the George Murray Black Collection. All three datasets contain 4081 samples of metric measurements and have been classified using a Back Propagation Neural Network (BPNN) and a Naïve Bayes classifier. The main scope of this paper is to identify the effect of feature selection on gender classification. The results show that the accuracy of gender classification for every dataset increased when feature selection was applied. Among all the skeleton parts in this experiment, the clavicle achieved the highest increase in accuracy, from 89.76% to 96.06% for the PSO algorithm and 96.32% for HS.

    A novel approach to data mining using simplified swarm optimization

    Data mining has become an increasingly important approach to deal with the rapid growth of data collected and stored in databases. In data mining, data classification and feature selection are considered the two main factors that drive people when making decisions. However, existing traditional data classification and feature selection techniques used in data management are no longer enough for such massive data. This deficiency has prompted the need for a new intelligent data mining technique based on stochastic population-based optimization that could discover useful information from data. In this thesis, a novel Simplified Swarm Optimization (SSO) algorithm is proposed as a rule-based classifier and for feature selection. SSO is a simplified Particle Swarm Optimization (PSO) that has a self-organising ability to emerge in highly distributed control problem space, and is flexible, robust and cost effective to solve complex computing environments. The proposed SSO classifier has been implemented to classify audio data. To the author’s knowledge, this is the first time that SSO and PSO have been applied for audio classification. Furthermore, two local search strategies, named Exchange Local Search (ELS) and Weighted Local Search (WLS), have been proposed to improve SSO performance. SSO-ELS has been implemented to classify the 13 benchmark datasets obtained from the UCI repository database. Meanwhile, SSO-WLS has been implemented in Anomaly-based Network Intrusion Detection System (A-NIDS). In A-NIDS, a novel hybrid SSO-based Rough Set (SSORS) for feature selection has also been proposed. The empirical analysis showed promising results with high classification accuracy rate achieved by all proposed techniques over audio data, UCI data and KDDCup 99 datasets. Therefore, the proposed SSO rule-based classifier with local search strategies has offered a new paradigm shift in solving complex problems in data mining which may not be able to be solved by other benchmark classifiers.

    Optimising decision trees using multi-objective particle swarm optimisation

    Copyright © 2009 Springer-Verlag Berlin Heidelberg. The final publication is available at link.springer.com. Book title: Swarm Intelligence for Multi-objective Problems in Data Mining. Summary: Although conceptually quite simple, decision trees are still among the most popular classifiers applied to real-world problems. Their popularity is due to a number of factors – core among these is their ease of comprehension, robust performance and fast data processing capabilities. Additionally, feature selection is implicit within the decision tree structure. This chapter introduces the basic ideas behind decision trees, focusing on decision trees which only consider a rule relating to a single feature at a node (therefore making recursive axis-parallel slices in feature space to form their classification boundaries). The use of particle swarm optimization (PSO) to train near optimal decision trees is discussed, and PSO is applied both in a single objective formulation (minimizing misclassification cost), and multi-objective formulation (trading off misclassification rates across classes). Empirical results are presented on popular classification data sets from the well-known UCI machine learning repository, and PSO is demonstrated as being fully capable of acting as an optimizer for trees on these problems. Results additionally support the argument that multi-objectification of a problem can improve uni-objective search in classification problems.
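To make the single-feature-per-node structure and the two objective formulations concrete, here is a minimal sketch. The tuple encoding of a tree, the toy data, and the function names are assumptions for illustration. The chapter's actual tree encoding and cost function are not given in the summary, and the optimizer itself is omitted.

```python
from typing import Dict, Sequence, Tuple, Union

# A tree is a class label (leaf) or (feature_index, threshold, left, right).
Tree = Union[int, Tuple[int, float, "Tree", "Tree"]]


def predict(tree: Tree, x: Sequence[float]) -> int:
    """Follow single-feature threshold tests until a leaf label is reached."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def misclassification_rate(tree: Tree, X, y) -> float:
    """Single objective: fraction of samples the tree labels incorrectly."""
    wrong = sum(predict(tree, x) != label for x, label in zip(X, y))
    return wrong / len(y)


def per_class_error(tree: Tree, X, y) -> Dict[int, float]:
    """Multi-objective view: one error rate per class, to be traded off."""
    errors = {}
    for c in sorted(set(y)):
        members = [x for x, label in zip(X, y) if label == c]
        errors[c] = sum(predict(tree, x) != c for x in members) / len(members)
    return errors


X = [[1.0, 1.0], [2.0, 3.0], [3.0, 1.5], [4.0, 3.5]]
y = [0, 0, 1, 1]
candidate = (0, 2.5, 0, (1, 2.0, 1, 0))  # two axis-parallel splits
stump = (0, 2.5, 0, 1)                   # one axis-parallel split
print(misclassification_rate(candidate, X, y), per_class_error(candidate, X, y))
print(misclassification_rate(stump, X, y))
```

An optimizer such as PSO would search over the feature indices, thresholds, and leaf labels of such trees, scoring each candidate with one of the two objective functions.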