609 research outputs found

    Software defect prediction framework based on hybrid metaheuristic optimization methods

    A software defect is an error, failure, or fault in software that produces an incorrect or unexpected result. Software defects are expensive in both quality and cost. Accurate prediction of defect-prone software modules can focus testing effort, reduce costs, and improve software quality. Classification algorithms are a popular machine learning approach to software defect prediction. Unfortunately, software defect prediction remains a largely unsolved problem. Comparison and benchmarking results for machine learning classifiers indicate that accuracy is generally poor and that no particular classifier performs best on all datasets. Two main problems affect classification performance in software defect prediction: noisy attributes and imbalanced class distributions in the datasets, and the difficulty of selecting optimal classifier parameters. This study proposes a software defect prediction framework that combines metaheuristic optimization methods for feature selection and parameter optimization with meta-learning methods for handling class imbalance, with the aim of improving the accuracy of classification models. The specific research contributions of this thesis are: 1) a comparison framework of classification models for software defect prediction, known as CF-SDP, 2) a hybrid genetic algorithm based feature selection and bagging technique for software defect prediction, known as GAFS+B, 3) a hybrid particle swarm optimization based feature selection and bagging technique for software defect prediction, known as PSOFS+B, and 4) a hybrid genetic algorithm based neural network parameter optimization and bagging technique for software defect prediction, known as NN-GAPO+B. Ten classification algorithms were selected for this study. The selection aims at achieving a balance between established classification algorithms used in software defect prediction. The proposed framework and methods are evaluated using state-of-the-art datasets from the NASA metric data repository. The results indicate that the proposed methods (GAFS+B, PSOFS+B and NN-GAPO+B) make an impressive improvement in software defect prediction performance. GAFS+B and PSOFS+B significantly affected the performance of classifiers that suffer from class imbalance, such as C4.5 and CART. GAFS+B and PSOFS+B also outperformed existing software defect prediction frameworks on most datasets. Based on the conducted experiments, logistic regression performs best on most of the NASA MDP datasets, with or without feature selection. The proposed methods also identified the relevant features in software defect prediction. The top ten most relevant features include branch count metrics, decision density, the Halstead level metric of a module, number of operands contained in a module, maintenance severity, number of blank LOC, Halstead volume, number of unique operands contained in a module, total number of LOC, and design density

    A Hybrid Binary Whale Optimization Algorithm (WOA) Based on Taper-Shaped Transfer Functions for Software Defect Prediction

    Reliability is one of the key factors used to gauge software quality. Software defect prediction (SDP) is one of the most important factors affecting the measurement of software reliability. Additionally, the high dimensionality of the features has a direct effect on the accuracy of SDP models. The objective of this paper is to propose a hybrid binary whale optimization algorithm (BWOA) based on taper-shaped transfer functions for solving feature selection and dimension reduction problems, with a KNN classifier, as a new software defect prediction method. The values of the real vector that represents the individual encoding are converted to a binary vector using four types of taper-shaped transfer functions, to enhance the performance of BWOA and reduce the dimension of the search space. The performance of the suggested method (T-BWOA-KNN) was evaluated on eleven standard software defect prediction datasets from the PROMISE and NASA repositories using the K-Nearest Neighbor (KNN) classifier. Seven evaluation metrics were used to assess the effectiveness of the suggested method. The experimental results show that T-BWOA-KNN produced promising results compared with other methods, including ten methods from the literature and four types of T-BWOA with the KNN classifier. In addition, the results are compared with other methods from the literature in terms of the average number of selected features (SF) and accuracy rate (ACC) using the Kendall W test. In summary, the paper proposes a new hybrid software defect prediction method, T-BWOA-KNN, that addresses the feature selection problem, and the experiments show that it produced promising performance compared with other methods on most datasets
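The conversion from a real-valued position vector to a binary feature mask described in this abstract can be sketched roughly as follows. This is only an illustration: the power-shaped function `taper_transfer`, its `bound` and `power` parameters, and the stochastic thresholding rule are assumptions made for the example and are not taken from the paper, whose four transfer functions may differ.

```python
import random


def taper_transfer(x, bound=6.0, power=0.5):
    """Hypothetical power-shaped transfer function mapping a real value to [0, 1].

    `bound` and `power` are illustrative assumptions, not values from the paper.
    """
    ratio = min(abs(x), bound) / bound
    return ratio ** power


def binarize(position, rng, bound=6.0, power=0.5):
    """Turn a real-valued position vector into a 0/1 feature mask.

    A feature is kept (1) when a uniform random draw falls below the
    transfer-function value for that coordinate.
    """
    return [
        1 if rng.random() < taper_transfer(x, bound, power) else 0
        for x in position
    ]


rng = random.Random(0)
mask = binarize([0.0, 6.0, -6.0, 1.5], rng)
```

In a wrapper approach such as the one described, a mask like this would select the columns passed to the classifier (KNN in the abstract), and the classifier's score would serve as the fitness of the candidate solution.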

    Towards Optimized K Means Clustering using Nature-inspired Algorithms for Software Bug Prediction

    In today's software development environment, the necessity of providing quality software products has undoubtedly remained the largest difficulty. As a result, early software bug prediction in the development phase is critical for lowering maintenance costs and improving overall software performance. Clustering is a well-known unsupervised method for data classification and for finding related patterns hidden in a dataset

    Hybrid Metaheuristics for Classification Problems

    Many classification problems, including real-world ones, require solutions that are both highly accurate and fast to compute. Because of the practical importance of many classification problems (such as crime detection), many algorithms have been developed to tackle them. For years, metaheuristics (MHs) have been successfully used for solving classification problems. Recently, hybrid metaheuristics have been successfully used for many real-world optimization problems such as flight scheduling and load balancing in telecommunication networks. This chapter investigates the use of this new interdisciplinary field for classification problems. Moreover, it demonstrates the forms of metaheuristic hybridization as well as the design of a new hybrid metaheuristic

    Integration of Bagging and Greedy Forward Selection for Software Defect Prediction Using Naive Bayes

    Software quality is established during inspection and testing. If software defects are found during inspection or testing, repairing them takes time and money; the estimated cost of fixing defective software reaches 60 billion per year. Naive Bayes is a simple classification algorithm that performs well and is easy to apply, and many studies have used it for software defect prediction, that is, for determining which software falls into the defective and non-defective categories. The NASA MDP dataset is a public dataset that is widely used in research: 64.79% of software defect prediction studies use it. Its weaknesses are class imbalance, because the majority class is non-defective and the minority class is defective, and high dimensionality or irrelevant features, which can lower the performance of a software defect prediction model. Class imbalance in the NASA MDP dataset is handled with an ensemble method (bagging), which is one of the ensemble methods for mitigating class imbalance. High-dimensional data and features that make no contribution are handled with greedy forward selection as the feature selection method. The highest AUC values obtained in this study were 0.713 for naive Bayes without feature selection, 0.941 for naive Bayes with greedy forward selection, and 0.923 for naive Bayes with greedy forward selection and bagging. However, judged by average rank, naive Bayes with greedy forward selection and bagging is the best model for software defect prediction, with an average rank of 2.550
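Greedy forward selection, as named in this abstract, can be sketched in a few lines. The version below is a generic illustration, not the study's implementation: the `score` callable, the `min_gain` stopping threshold, and the toy scoring function are all assumptions. In a setting like the one described, `score` would plausibly be something like the cross-validated AUC of a naive Bayes model trained on the chosen feature subset.

```python
def greedy_forward_selection(n_features, score, min_gain=1e-9):
    """Add one feature at a time, always the one that improves `score` most.

    `score` takes a list of feature indices and returns a number where
    higher is better. Selection stops when no candidate improves the
    current best score by more than `min_gain`.
    """
    selected = []
    best = score(selected)
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Evaluate every remaining feature added to the current subset.
        trials = [(score(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(trials)
        if top_score - best <= min_gain:
            break
        selected.append(top_feature)
        best = top_score
    return selected, best


def toy_score(subset):
    """Hypothetical stand-in for a model-based score such as AUC.

    Features 1 and 3 help; every other feature costs a small penalty.
    """
    useful = {1: 0.2, 3: 0.1}
    return 0.5 + sum(useful.get(f, -0.01) for f in subset)


selected, best = greedy_forward_selection(5, toy_score)
```

Because each step commits to the single best addition, the search evaluates far fewer subsets than an exhaustive search, at the price of possibly missing feature combinations that only help together.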

    A study of methods for constructing classifier ensembles and their applications

    Artificial intelligence is concerned with creating computer systems that behave intelligently. Within this area, machine learning studies the creation of systems that learn by themselves. One type of machine learning is supervised learning, in which the system is given both the inputs and the expected output and learns from these data. A system of this type is called a classifier. Sometimes, in the set of examples the system learns from, the number of examples of one type is much larger than the number of examples of another type. When this happens, the datasets are said to be imbalanced. The combination of several classifiers is called an "ensemble", and it often gives better results than any of its individual members. One of the keys to making ensembles work well is diversity. This thesis focuses on the development of new ensemble construction algorithms, centered on techniques for increasing diversity and on imbalanced problems. Additionally, these techniques are applied to the solution of several industrial problems. Ministerio de Economía y Competitividad, project TIN-2011-2404

    Search based software engineering: Trends, techniques and applications

    © ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below. In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large complex problem spaces with multiple competing and conflicting objectives. This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied and highlights gaps in the literature and avenues for further research. EPSRC and E

    Software Fault Prediction using Bio-Inspired Algorithms to Select the Features to be employed: An Empirical Study

    In the recent past, bio-inspired algorithms have received significant attention in software fault prediction, where they can be used to select the most relevant features of a dataset in order to increase the prediction accuracy of estimation techniques. The earliest and most widely investigated algorithms are Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). More recently, researchers have analyzed other algorithms inspired by nature. In this paper, we consider GA and PSO as baseline/benchmark algorithms and evaluate their performance against seven recently employed bio-inspired algorithms and metaheuristics, namely Ant Colony Optimization, Bat Search, Bee Search, Cuckoo Search, Harmony Search, Multi-Objective Evolutionary Algorithm, and Tabu Search, for feature selection in software fault prediction. We present experiments with seven open source datasets and three estimation techniques: Random Forest, Support Vector Regression, and Linear Regression. We found that the recently introduced algorithms do not always outperform the earlier ones