Efficient Feature Subset Selection Algorithm for High Dimensional Data
The feature selection approach addresses the dimensionality problem by removing irrelevant and redundant features. Existing feature selection algorithms take considerable time to obtain a feature subset for high-dimensional data. This paper proposes a feature selection algorithm based on information gain measures for high-dimensional data, termed IFSA (Information gain based Feature Selection Algorithm), to produce an optimal feature subset in efficient time and improve the computational performance of learning algorithms. The IFSA algorithm works in two stages: first, a filter is applied to the dataset; second, a small feature subset is produced using the information gain measure. Extensive experiments compare the proposed algorithm with other methods using two different classifiers (Naive Bayes and IBk) on microarray and text datasets. The results demonstrate that IFSA not only produces a compact feature subset in efficient time but also improves classifier performance.
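As an illustration of the general idea described above (not the authors' IFSA code, which is not included here), the following sketch ranks features by an information-gain-style score and then checks the effect on a Naive Bayes classifier. It uses scikit-learn's mutual_info_classif as the relevance measure and the breast cancer dataset purely as a stand-in; both are assumptions for demonstration.

```python
# Hedged sketch: information-gain-like feature ranking + Naive Bayes evaluation.
# mutual_info_classif plays the role of the information gain score; the dataset
# and k=10 are illustrative choices, not values from the paper.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Keep the k features with the highest estimated mutual information with the class.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)

# Compare classifier accuracy on all features vs. the selected subset.
full = cross_val_score(GaussianNB(), X, y, cv=5).mean()
reduced = cross_val_score(GaussianNB(), X_reduced, y, cv=5).mean()
print(f"all features: {full:.3f}  selected subset: {reduced:.3f}")
```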
Correlation based feature selection with clustering for high dimensional data
Feature selection is an essential technique for reducing the dimensionality problem in data mining tasks. Traditional feature selection algorithms fail to scale to large feature spaces. This paper proposes a new method to address the dimensionality problem in which clustering is integrated with a correlation measure to produce a good feature subset. First, irrelevant features are eliminated using the k-means clustering method; then, non-redundant features are selected from each cluster using a correlation measure. The proposed method is evaluated on microarray and text datasets, and the results are compared with other well-known feature selection methods using a Naïve Bayes classifier. To verify the accuracy of the proposed method with different numbers of relevant features, a percentage-wise criterion is used. The experimental results demonstrate the efficiency and accuracy of the proposed method. Keywords: Clustering, Feature selection, Correlation, Dimensionality reduction
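A minimal sketch of the cluster-then-correlate pattern the abstract describes follows, under the assumption that features are grouped with k-means and one representative per cluster is kept by its correlation with the class label. The function name, the choice of Pearson correlation, and the cluster count are illustrative, not the paper's exact algorithm.

```python
# Hedged sketch of clustering-plus-correlation feature selection:
# cluster the feature columns with k-means, then keep from each cluster
# the feature most correlated (in absolute value) with the class label.
import numpy as np
from sklearn.cluster import KMeans

def cluster_correlation_select(X, y, n_clusters=10, random_state=0):
    """X: (n_samples, n_features) array, y: class labels. Returns selected column indices."""
    # Cluster the features (columns), treating each feature's sample profile as a point.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X.T)
    selected = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        if members.size == 0:
            continue
        # Relevance proxy: absolute Pearson correlation with the label.
        corrs = np.nan_to_num(
            [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in members])
        selected.append(int(members[np.argmax(corrs)]))
    return sorted(selected)
```

In this sketch the cluster step stands in for redundancy removal (features in the same cluster behave similarly), and the correlation step stands in for relevance filtering.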
Improving the anomaly detection by combining PSO search methods and J48 algorithm
Feature selection techniques are used to find the most important and relevant features in a dataset. In this study, feature selection was therefore used to improve the performance of anomaly detection. Many feature selection techniques have been developed and evaluated on the NSL-KDD dataset. However, with the rapid growth of network traffic, in which more applications, devices, and protocols participate, traffic data have become complex and heterogeneous, contributing to security issues; this makes the NSL-KDD dataset no longer reliable for the task. The detection model must also be able to recognize novel attack types in complex network datasets. A robust analysis technique for larger and more complex datasets is therefore required to cope with the growing security issues in big-data networks. This study proposes particle swarm optimization (PSO) search methods for feature selection. To contribute to feature analysis knowledge, combinations of PSO search methods with other search methods are examined in the experiments. To overcome the limitations of the NSL-KDD dataset, the experiments use the CICIDS2017 dataset. To validate the selected features, the J48 classification algorithm is used. The detection performance of the PSO search method combined with J48 is examined and compared with other feature selection methods and with a previous study. The proposed technique successfully finds the important features of the dataset, improving detection performance to 99.89% accuracy. Compared with the previous study, the proposed technique achieves better accuracy, TPR, and FPR. Keywords: Anomaly Detection, CICIDS2017
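To make the wrapper idea concrete, the sketch below shows a binary PSO that searches over feature subsets and scores each subset with a decision tree. It is an assumed, simplified illustration: scikit-learn's DecisionTreeClassifier stands in for Weka's J48 (C4.5), and the particle counts, inertia, and acceleration constants are generic defaults, not the study's PSO Search configuration.

```python
# Hedged sketch of wrapper feature selection with binary PSO.
# DecisionTreeClassifier is a stand-in for J48; all hyperparameters are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def pso_feature_selection(X, y, n_particles=20, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pos = rng.random((n_particles, n_feat))   # continuous positions in [0, 1]
    vel = np.zeros((n_particles, n_feat))

    def fitness(mask):
        # Cross-validated accuracy of a decision tree on the selected columns.
        if not mask.any():
            return 0.0
        clf = DecisionTreeClassifier(random_state=seed)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    masks = pos > 0.5                          # threshold positions to feature subsets
    scores = np.array([fitness(m) for m in masks])
    pbest_pos, pbest_score = pos.copy(), scores.copy()
    g = int(scores.argmax())
    gbest_pos, gbest_score = pos[g].copy(), scores[g]
    w, c1, c2 = 0.7, 1.5, 1.5                  # inertia and acceleration coefficients

    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n_feat))
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        masks = pos > 0.5
        scores = np.array([fitness(m) for m in masks])
        improved = scores > pbest_score        # update personal bests
        pbest_pos[improved], pbest_score[improved] = pos[improved], scores[improved]
        if scores.max() > gbest_score:         # update global best
            g = int(scores.argmax())
            gbest_pos, gbest_score = pos[g].copy(), scores[g]

    return gbest_pos > 0.5, gbest_score        # selected feature mask, CV accuracy
```

The returned mask can then be used to train and evaluate the final classifier on the reduced dataset, which mirrors the select-then-validate workflow the abstract describes.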