4 research outputs found

    Software Fault Prediction using Bio-Inspired Algorithms to Select the Features to be employed: An Empirical Study

    In the recent past, bio-inspired algorithms have received significant attention in software fault prediction, where they are used to select the most relevant features of a dataset with the aim of increasing the prediction accuracy of estimation techniques. The earliest and most widely investigated algorithms are Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). More recently, researchers have analyzed other algorithms inspired by nature. In this paper, we consider GA and PSO as baseline/benchmark algorithms and evaluate their performance against seven recently employed bio-inspired algorithms and metaheuristics, namely Ant Colony Optimization, Bat Search, Bee Search, Cuckoo Search, Harmony Search, Multi-Objective Evolutionary Algorithm, and Tabu Search, for feature selection in software fault prediction. We present experiments with seven open source datasets and three estimation techniques: Random Forest, Support Vector Regression, and Linear Regression. We found that the recently introduced algorithms do not always outperform the earlier ones.

    Feature selection of unbalanced breast cancer data using particle swarm optimization

    Breast cancer is one of the significant causes of death among women around the globe. Therefore, high accuracy in cancer prediction models is vital to improving patients' treatment quality and survivability rate. In this work, we present a new method, the improved balancing particle swarm optimization (IBPSO) algorithm, to predict the stage of breast cancer using unbalanced surveillance, epidemiology, and end results (USEER) data. The work contributes in two directions. First, we design and implement an improved particle swarm optimization (IPSO) algorithm that avoids local minima while reducing the dimensionality of the USEER data. The improvement comes primarily through employing the cross-over ability of the genetic algorithm as a fitness function while using the correlation-based function to guide the selection task to a minimal feature subset of USEER sufficient to describe the universe. Second, we develop an improved synthetic minority over-sampling technique (ISMOTE) that avoids the overfitting problem while efficiently balancing USEER. ISMOTE generates new objects based on the average of the two objects with the smallest and largest distance from the centroid object of the minority class. The experiments and analysis show that the proposed IBPSO is feasible and effective and outperforms other state-of-the-art methods in minimizing the features, with an accuracy of 98.45%.
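The ISMOTE generation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the centroid is the per-feature mean of the minority class and that distance is Euclidean, neither of which the abstract states. The function names `centroid` and `synthetic_from_extremes` are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature mean of the points (assumed meaning of 'centroid object')."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]


def synthetic_from_extremes(minority: Sequence[Sequence[float]]) -> List[float]:
    """Average the minority samples nearest to and farthest from the centroid.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    c = centroid(minority)
    nearest = min(minority, key=lambda p: math.dist(p, c))
    farthest = max(minority, key=lambda p: math.dist(p, c))
    return [(a + b) / 2.0 for a, b in zip(nearest, farthest)]


# Toy minority class with three 2-D samples (made-up values).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
new_sample = synthetic_from_extremes(minority_samples)
```

The sketch produces a single synthetic object. How the method produces enough objects to balance the classes is not described in the abstract and is left out here.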

    Selecting Root Exploit Features Using Flying Animal-Inspired Decision

    Malware is an application that executes malicious activities on a computer system, including mobile devices. Among all types of malware, root exploits cause the most damage because they are able to run in stealthy mode. A root exploit compromises the nucleus of the operating system, known as the kernel, to bypass the Android security mechanisms. Once it attacks and resides in the kernel, it is able to install other types of malware on Android devices. In order to detect root exploits, it is important to investigate their features so that machine learning can predict them accurately. This study proposes flying animal-inspired methods ((1) bat, (2) firefly, and (3) bee) to search automatically for the exclusive features, and then utilizes these flying animal-inspired decision features to improve the machine learning prediction. Furthermore, a boosting method (AdaBoost) strengthens the multilayer perceptron (MLP) into a stronger classifier. The evaluation showed that the best result came from bee search, which recorded 91.48 percent accuracy, an 82.2 percent true positive rate, and a 0.1 percent false positive rate.
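For readers unfamiliar with the three metrics reported in this abstract, the sketch below shows how accuracy, true positive rate, and false positive rate are conventionally computed from confusion-matrix counts. The function name `detection_rates` and the counts are hypothetical and are not taken from the study.

```python
from typing import Tuple


def detection_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (accuracy, true positive rate, false positive rate)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)  # share of malicious samples that were detected
    fpr = fp / (fp + tn)  # share of benign samples that were wrongly flagged
    return accuracy, tpr, fpr


# Made-up counts for illustration only.
accuracy, tpr, fpr = detection_rates(tp=8, fp=1, tn=9, fn=2)
```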

    Mutable composite firefly algorithm for gene selection in microarray based cancer classification

    Cancer classification is critical due to the strenuous effort required in cancer treatment and the rising cancer mortality rate. Recent trends in high-throughput technologies have led to discoveries of biomarkers that have contributed successfully to cancer-related issues. A computational approach for gene selection based on microarray data analysis has been applied in many cancer classification problems. However, the existing hybrid approaches with metaheuristic optimization algorithms in feature selection (specifically in gene selection) are not generalized enough to efficiently classify most cancer microarray data while maintaining a small set of genes. This leads to a trade-off problem between classification accuracy and gene subset size. Hence, this study proposed to modify the Firefly Algorithm (FA) along with the Correlation-based Feature Selection (CFS) filter for the gene selection task. An improved FA was proposed to overcome FA's slow convergence by generating mutable size solutions for the firefly population. In addition, a composite position update strategy was designed for the mutable size solutions. The proposed strategy was to balance FA exploration and exploitation in order to address the local optima problem. The proposed hybrid algorithm, known as the CFS-Mutable Composite Firefly Algorithm (CFS-MCFA), was evaluated on cancer microarray data for biomarker selection, with a Support Vector Machine (SVM) as the classifier. Evaluation was performed based on two metrics: classification accuracy and size of feature set. The results showed that the CFS-MCFA-SVM algorithm outperforms benchmark methods in terms of classification accuracy and gene subset size. In particular, 100 percent accuracy was achieved on all four datasets with only a few biomarkers (between one and four). This result indicates that the proposed algorithm is one of the competitive alternatives in feature selection, which in turn contributes to the analysis of microarray data.
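As background on the CFS filter named in this abstract, the sketch below computes the standard CFS merit score for a feature subset, which rewards features that correlate with the class and penalizes features that correlate with each other. It assumes the usual formulation of CFS. Whether the paper uses exactly this form is not stated in the abstract, and the function name `cfs_merit` is hypothetical.

```python
import math


def cfs_merit(k: int, mean_feature_class_corr: float, mean_feature_feature_corr: float) -> float:
    """Standard CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    k is the subset size, r_cf the mean feature-class correlation,
    and r_ff the mean feature-feature correlation.
    """
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator


# Four fully redundant features score no better than one of them alone.
redundant = cfs_merit(k=4, mean_feature_class_corr=0.5, mean_feature_feature_corr=1.0)
single = cfs_merit(k=1, mean_feature_class_corr=0.5, mean_feature_feature_corr=0.0)
```

In this toy comparison both scores are equal, which shows why a search guided by this merit tends toward small, non-redundant subsets.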