38 research outputs found

    Heterogeneous Cross-Project Defect Prediction using Encoder and Transfer Learning

    Heterogeneous cross-project defect prediction (HCPDP) aims to predict defects in new software projects using defect data from previous projects whose metrics differ in part from those of the target project. Most existing methods capture only linear relationships in the software defect features and datasets. They also use multiple defect datasets from different projects as source datasets. In this paper, we propose a novel method called heterogeneous cross-project defect prediction using encoder and transfer learning (ETL). ETL uses encoders to extract the important features from the source and target datasets. To minimize negative transfer during transfer learning, we use an augmented dataset that contains pseudo-labels and the source dataset, and we train the model on very limited data. To evaluate ETL, we used 16 datasets from four publicly available software defect projects. We compared the proposed method with four HCPDP methods, namely EGW, HDP_KS, CTKCCA and EMKCA, and one WPDP method from the existing literature. On average, the proposed method outperforms the baseline methods in terms of PD, PF, F1-score, G-mean and AUC.
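The threshold-based measures named in this abstract (PD, PF, F1-score, G-mean) can all be derived from a confusion matrix. The sketch below is illustrative only: the function name and the counts are hypothetical, and G-mean is assumed to be the geometric mean of PD and 1 - PF, a definition common in defect prediction work that the abstract itself does not spell out. AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math

def defect_metrics(tp, fp, tn, fn):
    """Compute PD, PF, F1-score and G-mean from confusion-matrix counts.

    The "positive" class is "defective". Zero denominators yield 0.0.
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0          # probability of detection (recall)
    pf = fp / (fp + tn) if (fp + tn) else 0.0          # probability of false alarm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * pd / (precision + pd) if (precision + pd) else 0.0
    g_mean = math.sqrt(pd * (1.0 - pf))                # assumed definition, see lead-in
    return {"PD": pd, "PF": pf, "F1": f1, "G-mean": g_mean}

# Hypothetical counts, for illustration only.
metrics = defect_metrics(tp=30, fp=10, tn=50, fn=10)
```

Higher is better for PD, F1 and G-mean, while lower is better for PF.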

    Integrasi Bagging Dan Greedy Forward Selection Pada Prediksi Cacat Software Dengan Menggunakan Naive Bayes [Integrating Bagging and Greedy Forward Selection in Software Defect Prediction Using Naive Bayes]

    Software quality is determined during inspection and testing. If defects are found during inspection or testing, fixing them takes time and money; the estimated cost of repairing defective software reaches 60 billion per year. Naïve Bayes is a simple classification algorithm that performs well and is easy to apply, and many studies have used it for software defect prediction, that is, for determining which software falls into the defective and non-defective categories. The NASA MDP dataset is public and widely used: 64.79% of software defect prediction studies use it. Its weaknesses are class imbalance, because the majority class is non-defective and the minority class is defective, and high dimensionality or irrelevant features, both of which can degrade the performance of a defect prediction model. Class imbalance in the NASA MDP dataset is handled with bagging, an ensemble method used to mitigate class imbalance. High dimensionality and non-contributing features are handled with greedy forward selection. The highest AUC values obtained were 0.713 for Naïve Bayes without feature selection, 0.941 for Naïve Bayes with greedy forward selection, and 0.923 for Naïve Bayes with greedy forward selection and bagging. Judged by mean rank, however, Naïve Bayes with greedy forward selection and bagging is the best model for software defect prediction, with a mean rank of 2.550.
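As a rough illustration of greedy forward selection in general (not the study's implementation), the sketch below starts from an empty feature set and repeatedly adds the single feature that most improves a score, stopping when no addition helps. The `toy_score` function, its weights, and the feature names are hypothetical stand-ins; in practice the score would be something like the cross-validated AUC of a Naïve Bayes model.

```python
def greedy_forward_selection(features, score):
    """Add one feature at a time, keeping the addition that raises `score` the most.

    `score` is any callable that maps a list of feature names to a number.
    Selection stops as soon as no remaining feature improves the score.
    """
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        # Evaluate every remaining feature as the next addition.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best

# Hypothetical scoring function: two informative features, a small cost per feature.
USEFUL = {"loc": 0.2, "cyclomatic": 0.1}

def toy_score(subset):
    return 0.5 + sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)

chosen, chosen_score = greedy_forward_selection(
    ["loc", "comments", "cyclomatic", "blank_lines"], toy_score
)
```

Because the search is greedy, it evaluates far fewer subsets than an exhaustive search but can miss combinations of features that are only useful together.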

    Resampling Logistic Regression Untuk Penanganan Ketidakseimbangan Class Pada Prediksi Cacat Software [Resampling Logistic Regression for Handling Class Imbalance in Software Defect Prediction]

    High-quality software supports a company's business processes effectively and efficiently, and shows no defects during testing, inspection, and implementation. Fixing software after delivery and implementation costs far more than fixing it during development. Software testing consumes more than 50% of development cost. A software defect testing model is needed to reduce this cost. Currently, no software defect prediction model is generally applicable in use. Logistic Regression is the most effective and efficient model for software defect prediction. Its weakness is that it is prone to underfitting on class-imbalanced datasets, which results in low accuracy. The NASA MDP dataset is commonly used in software defect prediction. One characteristic of software defect prediction datasets, NASA MDP included, is class imbalance. To handle class imbalance in software defect datasets, this study proposes a resampling method. Experiments compare the performance of Logistic Regression before and after resampling is applied. Further experiments compare the proposed method with other classifiers: Naïve Bayes, Linear Discriminant Analysis, C4.5, Random Forest, Neural Network, and k-Nearest Neighbor. The results show that Logistic Regression with resampling achieves higher accuracy than Logistic Regression without resampling, and also higher accuracy than the other classifiers. From these results, it can be concluded that resampling is effective in addressing class imbalance in software defect prediction with the Logistic Regression algorithm.
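The abstract does not say which resampling scheme is used. As one possible reading, the sketch below shows plain random oversampling: minority-class rows are duplicated at random until every class matches the size of the largest one. The function name, seed, and toy data are hypothetical, and this is not presented as the study's procedure.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())            # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):          # nothing to add for the majority class
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data: 6 non-defective (0) modules and 2 defective (1) modules.
X = [[10, 1], [12, 2], [9, 1], [11, 1], [13, 2], [10, 2], [40, 9], [35, 8]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

Resampling is normally applied to the training split only, so that the evaluation data keeps its original class distribution.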

    A Bibliometric Survey on the Reliable Software Delivery Using Predictive Analysis

    Delivering a reliable software product is a fairly complex process that requires proper coordination among the teams involved in planning, execution, and testing. Most of the development time and software budget is spent finding and fixing bugs. Rework and side-effect costs, caused by inherent bugs in the modified code, are mostly not visible in the planned estimates; they affect the software delivery timeline and increase cost. Advances in artificial intelligence make it possible to predict probable defects by classifying software code changes, helping the development team make rational decisions. Optimizing software cost and improving software quality are top priorities for an industry that must remain profitable in a competitive market. Hence, there is a strong need to improve software delivery quality by minimizing defects and keeping reasonable control over predicted defects. This paper presents a bibliometric study of reliable software delivery using predictive analysis, based on 450 documents selected from the Scopus database with keywords such as software defect prediction, machine learning, and artificial intelligence. The study covers the years 2010 to 2021. The survey shows that software defect prediction has received considerable attention from researchers. There are great possibilities to predict and improve overall software product quality using artificial intelligence techniques.

    Class Imbalance Reduction and Centroid based Relevant Project Selection for Cross Project Defect Prediction

    Cross-Project Defect Prediction (CPDP) is the process of predicting defects in a target project using information from other projects. This can help developers prioritize their testing efforts and find flaws. Transfer Learning (TL) has frequently been used in CPDP to improve prediction performance by reducing the disparity in data distribution between the source and target projects. Software Defect Prediction (SDP) is a common research topic in software engineering and plays a critical role in software quality assurance. In this paper, we use Centroid-based PF-SMOTE to balance the datasets, addressing the cross-project class imbalance problem, and centroid-based relevant data selection for Cross-Project Defect Prediction. These methods compute the mean of all attributes in each dataset and then the difference between the means of the datasets. For experimentation, the open-source software defect datasets AEEM, Re-Link, and NASA are considered.
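The idea of comparing datasets by the mean of their attributes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: it assumes all datasets share the same attributes in the same order and on comparable scales, and it uses Euclidean distance between centroids as the "difference", which the abstract does not specify. The project names and values are hypothetical.

```python
import math

def centroid(dataset):
    """Return the per-attribute mean of a dataset given as a list of equal-length rows."""
    n = len(dataset)
    return [sum(column) / n for column in zip(*dataset)]

def rank_sources_by_centroid(sources, target):
    """Rank candidate source projects by centroid distance to the target, closest first."""
    target_centroid = centroid(target)
    distances = {
        name: math.dist(centroid(rows), target_centroid)  # Euclidean distance (assumption)
        for name, rows in sources.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])

# Hypothetical projects described by two metrics each.
target_project = [[1.0, 2.0], [3.0, 4.0]]
source_projects = {
    "project_a": [[2.0, 3.0], [2.0, 3.0]],
    "project_b": [[10.0, 10.0], [12.0, 14.0]],
}
ranking = rank_sources_by_centroid(source_projects, target_project)
```

The source projects at the top of the ranking are the ones whose average metric profile is closest to the target's, which makes them natural candidates for training data.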

    Identification of Software Bugs by Analyzing Natural Language-Based Requirements Using Optimized Deep Learning Features

    © 2024 Tech Science Press. All rights reserved. This is an open access article distributed under the Creative Commons Attribution License; to view a copy of the license, see https://creativecommons.org/licenses/by/4.0/. Software project outcomes depend heavily on natural language requirements, which often lead to diverse interpretations and to issues such as ambiguities and incomplete or faulty requirements. Researchers are exploring machine learning to predict software bugs, but a more precise and general approach is needed. Accurate bug prediction is crucial for software evolution and user training, which has prompted investigation into deep and ensemble learning methods. However, these studies do not generalize well and are not efficient when extended to other datasets. Therefore, this paper proposes a hybrid approach that combines multiple techniques to explore their effectiveness on bug identification problems. Feature selection is used to reduce the dimensionality and redundancy of features and keep only the relevant ones; transfer learning is used to train and test the model on different datasets to analyze how much of the learning carries over to other datasets; and an ensemble method is used to explore the increase in performance from combining multiple classifiers in a model. Four National Aeronautics and Space Administration (NASA) and four Promise datasets are used in the study, which shows an increase in the model's performance, with better Area Under the Receiver Operating Characteristic Curve (AUC-ROC) values, when different classifiers are combined. The results indicate that a combination of techniques such as those used in this study (feature selection, transfer learning, and ensemble methods) helps optimize software bug prediction models and provides a high-performing, useful end model. Peer reviewed.