32 research outputs found

    Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

    Full text link
    Software is highly contextual. While there are cross-cutting 'global' lessons, individual software projects exhibit many 'local' properties. This data heterogeneity makes drawing local conclusions from global data dangerous. A key research challenge is to construct locally accurate prediction models that are informed by global characteristics and data volumes. Previous work has tackled this problem using clustering and transfer learning approaches, which identify locally similar characteristics. This paper applies a simpler approach known as Bayesian hierarchical modelling. We show that hierarchical modelling supports cross-project comparisons while preserving local context. To demonstrate the approach, we conduct a conceptual replication of an existing study on setting software metric thresholds. Our emerging results show our hierarchical model reduces model prediction error by up to 50% compared to a global approach. Comment: Short paper, published at MSR '18: the 15th International Conference on Mining Software Repositories, May 28-29, 2018, Gothenburg, Sweden.
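    As a rough illustration of the partial pooling at the heart of hierarchical modelling, the sketch below computes empirical-Bayes shrinkage estimates of a per-project metric threshold. It is a simplification of the paper's full Bayesian model, and the project names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-project samples of a size metric (e.g., file LOC).
projects = {
    "proj_a": rng.lognormal(4.0, 0.5, size=200),
    "proj_b": rng.lognormal(4.6, 0.5, size=30),   # small project: little local data
    "proj_c": rng.lognormal(4.2, 0.5, size=500),
}

# Work on the log scale so a normal model is roughly plausible.
logs = {p: np.log(v) for p, v in projects.items()}

mu_global = np.mean(np.concatenate(list(logs.values())))  # global mean
tau2 = np.var([v.mean() for v in logs.values()])          # between-project variance
sigma2 = np.mean([v.var() for v in logs.values()])        # within-project variance

for p, v in logs.items():
    n = len(v)
    # Precision-weighted compromise between the local and global means:
    # projects with little data are "shrunk" toward the global estimate.
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)
    mu_post = w * v.mean() + (1 - w) * mu_global
    # A per-project threshold, here the modelled 90th percentile.
    threshold = np.exp(mu_post + 1.2816 * np.sqrt(sigma2))
    print(f"{p}: shrinkage weight={w:.2f}, threshold ~ {threshold:.0f}")
```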

    Parameter tuning in KNN for software defect prediction: an empirical analysis

    Get PDF
    Software Defect Prediction (SDP) provides insights that can help software teams allocate their limited resources when developing software systems. It predicts likely defective modules and helps avoid the pitfalls associated with such modules. However, these insights may be inaccurate and unreliable if the parameters of SDP models are not properly tuned. In this study, the effect of parameter tuning on the k-nearest neighbor (k-NN) algorithm in SDP was investigated: specifically, the impact of varying and selecting the optimal k value, the influence of distance weighting, and the impact of the distance function on k-NN. An experiment was designed to investigate this problem in SDP over 6 software defect datasets. The experimental results revealed that the k value should be greater than 1 (the default), as the average RMSE of k-NN when k > 1 (0.2727) is lower than when k = 1 (0.3296). In addition, the predictive performance of k-NN with distance weighting improved by 8.82% and 1.7% based on AUC and accuracy, respectively. In terms of the distance function, k-NN models based on the DILCA distance function performed better than those based on the Euclidean distance function (the default). Hence, we conclude that parameter tuning has a positive effect on the predictive performance of k-NN in SDP.
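    The study's tuning procedure can be approximated with scikit-learn's grid search. The snippet below is a sketch on synthetic data; since DILCA is not available in scikit-learn, Manhattan distance stands in as a second, built-in distance function.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for a software defect dataset (metrics -> defective?).
X, y = make_classification(n_samples=500, n_features=20, weights=[0.8, 0.2],
                           random_state=42)

# Search over the three parameters the study varies: k, distance weighting,
# and the distance function.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```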

    Comparison of Feature Representations for Report Categorization and Priority Prediction in a Bug Tracking System

    Get PDF
    A Bug Tracking System (BTS) is software used in the maintenance phase of the software development life cycle; it stores the history of, and tracks, reports concerning modification requests, defect and failure fixes, and technical support. The category and priority of a report in a BTS can be assigned automatically using a machine learning model; the algorithm used in this research is Logistic Regression. The objective of this research is to identify an appropriate feature representation, taking into account contextual text features and the characteristics of the dataset, when dealing with the class imbalance problem. Class imbalance arises when the class labels, whether based on category or priority, have unbalanced counts, which degrades the model's ability to predict the labels with relatively few instances. The feature representations compared are TF-IDF; TF-IDF with SMOTE and its variants (ADASYN and BorderlineSMOTE); TF-IDF with Word2Vec (CBOW and skip-gram); and TF-CHI with Word2Vec (CBOW and skip-gram). The results show that the model using TF-CHI with Word2Vec (CBOW) improves precision by up to 51%, recall by up to 29%, F-score by up to 35%, and accuracy by up to 21%. However, TF-IDF with SMOTE and its variants can serve as an alternative when an anomaly occurs in TF-CHI, namely the formation of a single cluster containing most or all class labels.
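    A minimal sketch of one of the compared pipelines, TF-IDF with SMOTE feeding Logistic Regression, using imbalanced-learn. The bug report texts and labels are invented for illustration; TF-CHI and the Word2Vec variants are not shown.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical bug report summaries with imbalanced priority labels.
reports = [
    "crash on startup when config file is missing",
    "typo in settings dialog label",
    "data loss after failed sync",
    "minor UI misalignment on small screens",
    "documentation link returns 404",
    "confusing tooltip text on export button",
]
priority = ["high", "low", "high", "low", "low", "low"]

# TF-IDF features, SMOTE oversampling of the minority class, then the
# Logistic Regression classifier used in the study. imblearn's Pipeline
# applies SMOTE only during fit, never at prediction time.
clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("smote", SMOTE(k_neighbors=1, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(reports, priority)
print(clf.predict(["app crashes when opening corrupted file"]))
```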

    The Devil is in the Tails: How Long-Tailed Code Distributions Impact Large Language Models

    Get PDF
    Learning-based techniques, especially advanced Large Language Models (LLMs) for code, have gained considerable popularity in various software engineering (SE) tasks. However, most existing works focus on designing better learning-based models and pay less attention to the properties of datasets. Learning-based models, including popular LLMs for code, heavily rely on data, and the data's properties (e.g., data distribution) can significantly affect their behavior. We conducted an exploratory study on the distribution of SE data and found that such data usually follow a skewed, long-tailed distribution in which a small number of classes have an extensive collection of samples while a large number of classes have very few. We investigate three distinct SE tasks and analyze the impact of the long-tailed distribution on the performance of LLMs for code. Our experimental results reveal that the long-tailed distribution has a substantial impact on the effectiveness of LLMs for code. Specifically, LLMs for code perform between 30.0% and 254.0% worse on data samples associated with infrequent labels than on data samples with frequent labels. Our study provides a better understanding of the effects of long-tailed distributions on popular LLMs for code, along with insights for the future development of SE automation.
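    A head-versus-tail comparison of this kind can be reproduced in a few lines: bucket labels by frequency and score each bucket separately. The labels, predictions, and cut-off below are hypothetical.

```python
from collections import Counter

# Hypothetical (label, correct?) pairs from a model evaluated on a
# long-tailed SE task such as API recommendation.
predictions = [
    ("List.add", True), ("List.add", True), ("List.add", False),
    ("List.add", True), ("Map.get", True), ("Map.get", False),
    ("Channel.close", False), ("Semaphore.acquire", False),
]

freq = Counter(label for label, _ in predictions)
# A simple head/tail split: frequent labels are "head", rare labels are
# "tail". The cut-off here is illustrative.
head = {label for label, n in freq.items() if n >= 3}

for group, keep in [("head", True), ("tail", False)]:
    hits = [ok for label, ok in predictions if (label in head) == keep]
    print(group, f"accuracy={sum(hits) / len(hits):.2f}")
```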

    Deep Incremental Learning of Imbalanced Data for Just-In-Time Software Defect Prediction

    Full text link
    This work stems from three observations on prior Just-In-Time Software Defect Prediction (JIT-SDP) models. First, prior studies treat the JIT-SDP problem solely as a classification problem. Second, prior JIT-SDP studies do not consider that class-balancing preprocessing may change the underlying characteristics of software changeset data. Third, only a single source of concept drift, class imbalance evolution, is addressed in prior JIT-SDP incremental learning models. We propose an incremental learning framework for JIT-SDP called CPI-JIT. First, in addition to a classification modeling component, the framework includes a time-series forecast modeling component in order to learn temporally interdependent relationships in the changesets. Second, the framework features a purposefully designed over-sampling balancing technique based on SMOTE and principal curves, called SMOTE-PC, which preserves the underlying distribution of software changeset data. Within this framework, we propose an incremental deep neural network model called DeepICP. Via an evaluation on multiple software projects, we show that: 1) SMOTE-PC improves the model's predictive performance; 2) for some software projects it can be beneficial for defect prediction to harness the temporally interdependent relationships of software changesets; and 3) principal curves summarize the underlying distribution of changeset data and reveal a new source of concept drift, which the DeepICP model is designed to adapt to.
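    SMOTE-PC's starting point, SMOTE-style interpolation between minority-class neighbours, can be sketched as follows. The principal-curve projection step that distinguishes SMOTE-PC is deliberately omitted, and the changeset features are synthetic.

```python
import numpy as np

def smote_like(minority, n_new, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between each
    sample and one of its k nearest minority neighbours (vanilla SMOTE).
    SMOTE-PC additionally projects samples onto principal curves to
    preserve the data's underlying distribution; that step is omitted."""
    rng = rng or np.random.default_rng()
    # Pairwise distances within the minority class only.
    d = np.linalg.norm(minority[:, None] - minority[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # random point on the segment between i and j
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

# Hypothetical changeset feature vectors.
minority = np.random.default_rng(0).normal(size=(20, 4))
print(smote_like(minority, n_new=10, k=3).shape)  # (10, 4)
```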

    complexFuzzy: A novel clustering method for selecting training instances of cross-project defect prediction

    Get PDF
    Over the last decade, researchers have investigated to what extent cross-project defect prediction (CPDP) offers advantages over traditional defect prediction settings. These works do not take the training and testing data of defect prediction from the same project; instead, dissimilar projects are employed. Selecting proper training data plays an important role in the success of CPDP. In this study, a novel clustering method named complexFuzzy is presented for selecting the training data of CPDP. The method determines membership values with the help of metrics that can be considered indicators of complexity. First, CPDP combinations are created on 29 different datasets. Subsequently, complexFuzzy is evaluated by considering the cluster centers of the datasets and comparing performance measures including the area under the curve (AUC) and F-measure. The method is superior to five comparison algorithms in terms of the distance of cluster centers and prediction performance.
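    complexFuzzy's details are specific to the paper, but the underlying idea of soft membership values can be illustrated with standard fuzzy c-means, shown below on synthetic complexity metrics. This is not the authors' algorithm, only the classical scheme it builds on.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: each instance gets a membership value in
    [0, 1] for every cluster rather than a hard assignment."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)  # memberships sum to 1 per row
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=-1) + 1e-12
        # Standard update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return u, centers

# Hypothetical complexity metrics (e.g., WMC, RFC, LOC) for candidate
# training instances; memberships could then drive training-set selection.
X = np.vstack([np.random.default_rng(1).normal(0, 1, (30, 3)),
               np.random.default_rng(2).normal(4, 1, (30, 3))])
u, centers = fuzzy_cmeans(X, c=2)
print(u[:3].round(2))
```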

    Does class size matter? An in-depth assessment of the effect of class size in software defect prediction

    Full text link
    In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have shown that size cannot simply be controlled for or ignored when building prediction models. As such, the question remains whether, and when, to control for class size. This study provides a new in-depth examination of the impact of class size on the relationship between OO metrics and software defects or defect-proneness. We assess the impact of class size on the number of defects and on defect-proneness in software systems by employing regression-based mediation (with bootstrapping) and moderation analyses to investigate the direct and indirect effects of class size in count and binary defect prediction. Our results show that the size effect is not always significant for all metrics. Of the seven OO metrics we investigated, size consistently has a significant mediation impact only on the relationship between Coupling Between Objects (CBO) and defects/defect-proneness, and a potential moderation impact on the relationship between Fan-out and defects/defect-proneness. Based on our results we make three recommendations. One, we encourage researchers and practitioners to examine the impact of class size for the specific data they have in hand, using the proposed statistical mediation/moderation procedures. Two, we encourage empirical studies to investigate the indirect effects of possible additional variables in their models when relevant. Three, the statistical procedures adopted in this study could be used in other empirical software engineering research to investigate the influence of potential mediators/moderators. Comment: Accepted to Empirical Software Engineering (to appear).
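    The mediation analysis the study applies can be sketched with two ordinary least-squares regressions and a percentile bootstrap: the indirect effect of a metric on defects through size is the product of the metric-to-size and size-to-defects coefficients. The data below are simulated, not the study's.

```python
import numpy as np

def indirect_effect(metric, size, defects):
    """Baron-Kenny style mediation: a = effect of the metric on size,
    b = effect of size on defects controlling for the metric; the
    indirect (mediated) effect is a * b."""
    ones = np.ones_like(metric)
    a = np.linalg.lstsq(np.column_stack([ones, metric]), size, rcond=None)[0][1]
    b = np.linalg.lstsq(np.column_stack([ones, metric, size]), defects, rcond=None)[0][2]
    return a * b

rng = np.random.default_rng(0)
n = 300
cbo = rng.normal(size=n)                      # hypothetical CBO values
size = 0.6 * cbo + rng.normal(size=n)         # size partly driven by coupling
defects = 0.5 * size + 0.2 * cbo + rng.normal(size=n)

# Percentile bootstrap over the indirect effect.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot.append(indirect_effect(cbo[idx], size[idx], defects[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect 95% CI: [{lo:.3f}, {hi:.3f}]")  # excludes 0 -> mediation
```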