2 research outputs found

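Comparison of Feature Representations for Categorization and Priority Prediction of Reports in a Bug Tracking System (original title: PERBANDINGAN REPRESENTASI FITUR PADA KATEGORISASI DAN PREDIKSI PRIORITAS LAPORAN DALAM BUG TRACKING SYSTEM)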

Author: Daffa Almer Fauzan
Publication date: 12/01/2023

A Bug Tracking System (BTS) is software used during the software maintenance stage to keep the history of, and track, reports concerning change requests, fixes for defects and failures, and technical support within the software development life cycle. The category and priority of a report in a BTS can be assigned automatically using a machine learning model. The algorithm used in this research is Logistic Regression. The objective of the research is to identify an appropriate feature representation, taking into account the contextual text features and the characteristics of the dataset source, when dealing with the class imbalance problem. Class imbalance arises when the class labels, whether for category or priority, have unequal numbers of samples, which degrades the model's ability to predict the class labels with relatively few samples. The feature representations compared are TF-IDF, TF-IDF with SMOTE and its variations (ADASYN and BorderlineSMOTE), TF-IDF with Word2Vec (CBOW and skip-gram), and TF-CHI with Word2Vec (CBOW and skip-gram). The results show that a model represented by TF-CHI with Word2Vec (CBOW) achieves the largest improvements: up to 51% in precision, 29% in recall, 35% in F-score, and 21% in accuracy. However, TF-IDF with SMOTE and its variations can serve as an alternative when an anomaly occurs in TF-CHI, namely the formation of a cluster that contains most or all of the class labels.
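To make two of the techniques named in the abstract more concrete, the following is a minimal, standard-library sketch of TF-IDF weighting and of SMOTE-style oversampling. It is not the author's implementation: the function names `tf_idf` and `smote_like`, the toy bug reports, and the unsmoothed IDF formula are all assumptions chosen for illustration, and the two functions are shown separately (the TF-IDF output is a sparse dict, while the oversampler expects dense lists of numbers).

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Assumes `minority` holds at least two dense vectors of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy reports, tokenized by hand.
reports = [["crash", "on", "login"], ["crash", "on", "save"], ["typo", "in", "label"]]
weights = tf_idf(reports)

# Hypothetical minority-class feature vectors.
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
extra = smote_like(rare_class, n_new=5)
```

In the toy corpus, a term that appears in a single report ("login") receives a higher weight than one shared by two reports ("crash"), and every synthetic point lies on a line segment between two existing minority samples, which is the core idea behind SMOTE-style rebalancing.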

Repository UPI

Transfer Learning based Low Shot Classifier for Software Defect Prediction

Authors: Bhupendra Kumar Sharma, Sanjay Kumar Dubey, Vikas Suhag
Publication venue: Universitas Airlangga
Publication date: 01/11/2023

Background: The rapid growth and increasing complexity of software applications make it challenging to maintain software quality within constraints of time and resources. This challenge led to the emergence of a field of study known as Software Defect Prediction (SDP), which focuses on predicting defects in advance, thereby reducing costs and improving productivity in the software industry. Objective: This study aimed to address data distribution disparities when applying transfer learning in multi-project scenarios, and to mitigate performance issues resulting from data scarcity in SDP. Methods: The proposed approach, the Transfer Learning based Low Shot Classifier (TLLSC), combined transfer learning and low shot learning to create an SDP model. This model was designed for application both in new projects and in those with minimal historical defect data. Results: Experiments were conducted using standard datasets from projects within the National Aeronautics and Space Administration (NASA) and Software Research Laboratory (SOFTLAB) repositories. TLLSC showed an average increase in F1-Measure of 31.22%, 27.66%, and 27.54% for projects AR3, AR4, and AR5, respectively. These results surpassed those from Transfer Component Analysis (TCA+), Canonical Correlation Analysis (CCA+), and Kernel Canonical Correlation Analysis plus (KCCA+). Conclusion: The comparison between TLLSC and the state-of-the-art algorithms TCA+, CCA+, and KCCA+ from the existing literature consistently showed that TLLSC performed better in terms of F1-Measure. Keywords: Just-in-time, Defect Prediction, Deep Learning, Transfer Learning, Low Shot Learning
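Since the abstract reports its results as F1-Measure, a short sketch of how that metric is usually computed for a binary defect label may help. The function name `f1_measure` and the example labels are hypothetical and do not come from the paper or its datasets.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    if tp == 0:
        # No correct positive predictions: the score is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 marks a defective module, 0 a clean one.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
score = f1_measure(actual, predicted)
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so the F1-Measure is also 2/3. Because the metric ignores true negatives, it is a common choice for defect data in which defective modules are rare.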

Directory of Open Access Journals