
    Evaluating defect prediction approaches: a benchmark and an extensive comparison

    Reliably predicting software defects is one of the holy grails of software engineering. Researchers have devised and implemented a plethora of defect/bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches. We present a benchmark for defect prediction, in the form of a publicly available dataset consisting of several software systems, and provide an extensive comparison of well-known bug prediction approaches, together with novel approaches we devised. We evaluate the performance of the approaches using different performance indicators: classification of entities as defect-prone or not, and ranking of the entities, with and without taking into account the effort to review an entity. We performed three sets of experiments aimed at (1) comparing the approaches across different systems, (2) testing whether the differences in performance are statistically significant, and (3) investigating the stability of approaches across different learners. Our results indicate that, while some approaches perform better than others in a statistically significant manner, external validity in defect prediction is still an open problem, as generalizing results to different contexts/learners proved to be a partially unsuccessful endeavor.
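
    As an illustration of the two evaluation views the abstract distinguishes, here is a minimal sketch (with invented toy data) of classifying entities as defect-prone at a fixed threshold versus ranking them by predicted defect density under a review-effort budget:

```python
# Illustrative sketch only: data, threshold, and budget are invented.
# (predicted_score, lines_of_code, actual_defects) per entity.
entities = [
    (0.9, 120, 3), (0.7, 800, 1), (0.6, 50, 2),
    (0.4, 300, 0), (0.2, 40, 1), (0.1, 500, 0),
]

# View 1: binary classification at a fixed score threshold.
threshold = 0.5
tp = sum(1 for s, _, d in entities if s >= threshold and d > 0)
fp = sum(1 for s, _, d in entities if s >= threshold and d == 0)
fn = sum(1 for s, _, d in entities if s < threshold and d > 0)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# View 2: effort-aware ranking -- rank by predicted defect density
# (score per LOC) and count defects found within the first 20% of
# total LOC reviewed.
ranked = sorted(entities, key=lambda e: e[0] / e[1], reverse=True)
total_loc = sum(loc for _, loc, _ in entities)
total_defects = sum(d for _, _, d in entities)
budget = 0.2 * total_loc
spent, found = 0, 0
for score, loc, defects in ranked:
    if spent + loc > budget:
        break
    spent += loc
    found += defects
effort_aware_recall = found / total_defects
```

    On this toy data the threshold classifier misses one defective entity, while the effort-aware ranking recovers most defects within the 20% review budget, which is why the two indicator families can rank approaches differently.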

    On the feasibility of automated prediction of bug and non-bug issues

    Context: Issue tracking systems are used to track and describe tasks in the development process, e.g., requested feature improvements or reported bugs. However, past research has shown that the reported issue types often do not match the description of the issue. Objective: We want to understand the overall maturity of the state of the art of issue type prediction with the goal to predict if issues are bugs, and to evaluate if we can improve existing models by incorporating manually specified knowledge about issues. Method: We train different models for the title and description of the issue to account for the difference in structure between these fields, e.g., the length. Moreover, we manually detect issues whose description contains a null pointer exception, as these are strong indicators that issues are bugs. Results: Our approach performs best overall, but not significantly different from an approach from the literature based on the fastText classifier from Facebook AI Research. The small improvements in prediction performance are due to the structural information about the issues we used. We found that using information about the content of issues in the form of null pointer exceptions is not useful. We demonstrate the usefulness of issue type prediction through the example of labelling bugfixing commits. Conclusions: Issue type prediction can be a useful tool if the use case tolerates either a certain amount of missed bug reports or too many issues being predicted as bugs.
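
    The kind of text classifier compared in this line of work can be approximated, in spirit, by a tiny bag-of-words model. The sketch below is a plain naive Bayes stand-in (not the paper's fastText-based model) trained on a few invented issue titles:

```python
import math
from collections import Counter

# Toy training data: (issue title, label). All examples are invented.
train = [
    ("NullPointerException when saving file", "bug"),
    ("App crashes on startup", "bug"),
    ("Add dark mode support", "non-bug"),
    ("Improve documentation for API", "non-bug"),
]

def tokenize(text):
    return text.lower().split()

# Per-label word counts and label frequencies.
counts = {"bug": Counter(), "non-bug": Counter()}
label_totals = Counter()
for title, label in train:
    counts[label].update(tokenize(title))
    label_totals[label] += 1

vocab = {w for c in counts.values() for w in c}

def predict(title):
    # Multinomial naive Bayes with add-one (Laplace) smoothing,
    # computed in log space to avoid underflow.
    best, best_lp = None, float("-inf")
    for label in counts:
        lp = math.log(label_totals[label] / len(train))
        total = sum(counts[label].values())
        for w in tokenize(title):
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

    A real system would train separate models of this kind for titles and descriptions, as the abstract describes, since those fields differ in length and structure.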

    Active Learning of Discriminative Subgraph Patterns for API Misuse Detection

    A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while techniques have been proposed to detect such misuses, studies have shown that they fail to detect misuses reliably while reporting many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage: many approaches conflate a usage pattern's frequency with its correctness. Due to the variety of alternative usage patterns that may be uncommon but correct, anomaly detection-based techniques have limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), reformulating API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. While still incorporating frequency information, through limited human supervision we reduce the reliance on the assumption relating frequency and correctness. The principles of active learning are incorporated to shift human attention away from the most frequent patterns; instead, ALP samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API misuse benchmark, and a new dataset that we constructed from real-world software projects.
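
    The active-learning component can be illustrated independently of the subgraph mining. The following sketch shows plain pool-based uncertainty sampling, with invented pattern scores and a hypothetical oracle standing in for the human labeler; it is not the actual ALP sampling strategy:

```python
# Toy pool: pattern id -> the model's current P(misuse). Invented numbers.
pool = {"p1": 0.95, "p2": 0.55, "p3": 0.10, "p4": 0.48, "p5": 0.80}

def most_uncertain(unlabeled):
    # Uncertainty sampling: query the pattern whose predicted
    # probability is closest to 0.5.
    return min(unlabeled, key=lambda p: abs(pool[p] - 0.5))

# Hypothetical human labels: 1 = genuine misuse, 0 = correct usage.
oracle = {"p1": 0, "p2": 1, "p3": 1, "p4": 0, "p5": 0}

labeled = {}
unlabeled = set(pool)
for _ in range(3):  # a labeling budget of 3 queries
    p = most_uncertain(unlabeled)
    labeled[p] = oracle[p]
    unlabeled.remove(p)
    # (a real system would retrain the classifier here and
    # recompute the probabilities in `pool`)
```

    Note how the confidently scored patterns (p1, p3) are never queried: the budget is spent where the model is least certain, which is the point of shifting human attention away from the most frequent (and usually easiest) patterns.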

    Extended Rate, more GFUN

    We present a software package that guesses formulae for sequences of, for example, rational numbers or rational functions, given the first few terms. We implement an algorithm due to Bernhard Beckermann and George Labahn, together with some enhancements to render our package efficient. Thus we extend and complement Christian Krattenthaler's program Rate, the parts of Bruno Salvy and Paul Zimmermann's GFUN concerned with guessing, the univariate case of Manuel Kauers' Guess.m, and Manuel Kauers' and Christoph Koutschan's qGeneratingFunctions.m.
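
    A far simpler cousin of such guessing tools can be sketched with Newton's forward-difference formula, which recovers a polynomial closed form in the binomial basis from the first few terms. This is only an illustration of the "guess from initial terms" idea; the package itself implements the much more general Beckermann-Labahn algorithm:

```python
from fractions import Fraction

def guess_polynomial(terms):
    # Build the forward-difference table; its top-left entries are the
    # coefficients of the sequence in the binomial basis C(n, k):
    #   f(n) = sum_k coeffs[k] * C(n, k)
    row = [Fraction(t) for t in terms]
    coeffs = []
    while row:
        coeffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return coeffs

def evaluate(coeffs, n):
    # Evaluate sum_k coeffs[k] * C(n, k) with exact rational arithmetic.
    total, binom = Fraction(0), Fraction(1)
    for k, c in enumerate(coeffs):
        total += c * binom
        binom = binom * (n - k) / (k + 1)
    return total

# Guess a formula for 0, 1, 4, 9, 16 and extrapolate the next term.
c = guess_polynomial([0, 1, 4, 9, 16])
```

    For the squares, the table yields coefficients [0, 1, 2, 0, 0], i.e. f(n) = n + 2*C(n, 2) = n^2, so the guess extrapolates correctly beyond the given terms.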

    FairMask: Better Fairness via Model-based Rebalancing of Protected Attributes

    Context: Machine learning software can generate models that inappropriately discriminate against specific protected social groups (e.g., groups based on gender, ethnicity, etc.). Motivated by those results, software engineering researchers have proposed many methods for mitigating those discriminatory effects. While those methods are effective in mitigating bias, few of them can explain the root cause of the bias. Objective: We aim at better detection and mitigation of algorithmic discrimination in machine learning software. Method: Here we propose xFAIR, a model-based extrapolation method that is capable of both mitigating bias and explaining its cause. In our xFAIR approach, protected attributes are represented by models learned from the other independent variables (and these models offer extrapolations over the space between existing examples). We then use the extrapolation models to relabel protected attributes seen later in testing data or at deployment time. Our approach aims to offset the biased predictions of the classification model by rebalancing the distribution of protected attributes. Results: The experiments of this paper show that, without compromising (original) model performance, xFAIR can achieve significantly better group and individual fairness (as measured by different metrics) than benchmark methods. Moreover, when compared to another instance-based rebalancing method, our model-based approach shows faster runtime and thus better scalability. Conclusion: Algorithmic decision bias can be removed via extrapolation that smooths away outlier points. As evidence for this, our proposed xFAIR outperforms two state-of-the-art fairness algorithms on both fairness and performance metrics.
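
    The core relabeling idea can be sketched as follows. This is not the actual xFAIR implementation, just a toy model-based version: a 1-nearest-neighbor predictor (over invented data) learns the protected attribute from the other features, and test rows are relabeled with its output before reaching the downstream classifier:

```python
# Toy rows: (feature1, feature2, protected). All values invented.
train = [(1.0, 2.0, 0), (1.1, 1.9, 0), (5.0, 6.0, 1), (5.2, 5.8, 1)]

def predict_protected(f1, f2):
    # Model of the protected attribute learned from the OTHER features;
    # here a 1-nearest-neighbor lookup stands in for the learned model.
    return min(train, key=lambda r: (r[0] - f1) ** 2 + (r[1] - f2) ** 2)[2]

def relabel(rows):
    # Overwrite the observed protected attribute with the model's
    # extrapolated value, smoothing away outlier combinations.
    return [(f1, f2, predict_protected(f1, f2)) for f1, f2, _ in rows]

# Observed protected values disagree with what the features suggest;
# relabeling replaces them with the model-based values.
test = [(1.05, 2.1, 1), (5.1, 5.9, 0)]
rebalanced = relabel(test)
```

    The design choice here mirrors the abstract: because the protected attribute is replaced by a function of the remaining features, the downstream classifier can no longer exploit its raw, possibly biased distribution.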

    Weighted KNN Using Grey Relational Analysis as a Solution for Missing Values in Cross-Domain Software Defect Prediction

    Software defect prediction plays an important role in detecting the components most prone to defects. Several studies have tried to improve the accuracy of software defect prediction so that resources (people, cost, and time) can be managed better. However, previous work built defect prediction models for a single domain only; merged, cross-domain datasets had not been handled. This research improves the defect prediction model so that it can handle merged (cross-domain) datasets whose constituent datasets have different numbers of features. To balance the number of features across datasets, the missing values that arise from merging cross-domain datasets are filled in; a weighted KNN method was developed for this imputation. The completed datasets are then classified using naive Bayes and random forest. This research also investigates which feature sets are relevant for detecting defects through a comparative analysis of feature selection methods. For evaluation, seven NASA public MDP (Modular toolkit for Data Preprocessing) datasets were used. The results show that, on imbalanced data, naive Bayes combined with information gain (IG) or symmetric uncertainty (SU) feature selection produced the best balance value, 0.4975, while on balanced data, random forest combined with gain ratio (GR) feature selection produced the best balance value, 0.7795. In general, the classification results on each of the seven individual NASA MDP datasets are relatively close to the cross-domain result of 0.4975, which even exceeds the result for the PC2 dataset, 0.4033.
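
    The imputation step can be illustrated with plain inverse-distance weighted KNN. The paper's method additionally uses Grey Relational Analysis to weight neighbors; the sketch below substitutes simple Euclidean inverse-distance weights, and all data is invented (None marks a missing value):

```python
import math

rows = [
    [1.0, 2.0, 3.0],
    [1.2, 2.1, 3.2],
    [5.0, 6.0, 9.0],
    [1.1, 2.05, None],  # third feature is missing
]

def impute(rows, k=2):
    # Fill each missing cell with the inverse-distance weighted average
    # of that feature over the k nearest rows that do observe it.
    filled = [list(r) for r in rows]
    for row in filled:
        for j, v in enumerate(row):
            if v is None:
                def dist(other):
                    # Distance over features both rows have observed.
                    return math.sqrt(sum(
                        (a - b) ** 2
                        for a, b in zip(row, other)
                        if a is not None and b is not None))
                donors = [r for r in rows if r[j] is not None]
                donors.sort(key=dist)
                nearest = donors[:k]
                weights = [1.0 / (dist(r) + 1e-9) for r in nearest]
                row[j] = (sum(w * r[j] for w, r in zip(weights, nearest))
                          / sum(weights))
    return filled

completed = impute(rows)
```

    On this toy data the two nearby rows dominate the estimate, so the missing third feature lands between their values (about 3.1) rather than being pulled toward the distant outlier row. After imputation, every dataset in the merged collection has the same feature count, which is what the subsequent naive Bayes and random forest classification requires.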