3,857 research outputs found
Towards Automated Performance Bug Identification in Python
Context: Software performance is a critical non-functional requirement,
appearing in many fields such as mission critical applications, financial, and
real time systems. In this work we focused on early detection of performance
bugs; our software under study was a real time system used in the
advertisement/marketing domain.
Goal: Find a simple and easy to implement solution, predicting performance
bugs.
Method: We built several models using four machine learning methods, commonly
used for defect prediction: C4.5 Decision Trees, Na\"{\i}ve Bayes, Bayesian
Networks, and Logistic Regression.
Results: Our empirical results show that a C4.5 model, using lines of code
changed, file's age and size as explanatory variables, can be used to predict
performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that
reducing the number of changes delivered on a commit, can decrease the chance
of performance bug injection.
Conclusions: We believe that our approach can help practitioners to eliminate
performance bugs early in the development cycle. Our results are also of
interest to theoreticians, establishing a link between functional bugs and
(non-functional) performance bugs, and explicitly showing that attributes used
for prediction of functional bugs can be used for prediction of performance
bugs
Bayesian Hierarchical Modelling for Tailoring Metric Thresholds
Software is highly contextual. While there are cross-cutting `global'
lessons, individual software projects exhibit many `local' properties. This
data heterogeneity makes drawing local conclusions from global data dangerous.
A key research challenge is to construct locally accurate prediction models
that are informed by global characteristics and data volumes. Previous work has
tackled this problem using clustering and transfer learning approaches, which
identify locally similar characteristics. This paper applies a simpler approach
known as Bayesian hierarchical modeling. We show that hierarchical modeling
supports cross-project comparisons, while preserving local context. To
demonstrate the approach, we conduct a conceptual replication of an existing
study on setting software metrics thresholds. Our emerging results show our
hierarchical model reduces model prediction error compared to a global approach
by up to 50%.Comment: Short paper, published at MSR '18: 15th International Conference on
Mining Software Repositories May 28--29, 2018, Gothenburg, Swede
Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference
Mutation analysis can effectively capture the dependency between source code
and test results. This has been exploited by Mutation Based Fault Localisation
(MBFL) techniques. However, MBFL techniques suffer from the need to expend the
high cost of mutation analysis after the observation of failures, which may
present a challenge for its practical adoption. We introduce SIMFL (Statistical
Inference for Mutation-based Fault Localisation), an MBFL technique that allows
users to perform the mutation analysis in advance against an earlier version of
the system. SIMFL uses mutants as artificial faults and aims to learn the
failure patterns among test cases against different locations of mutations.
Once a failure is observed, SIMFL requires either almost no or very small
additional cost for analysis, depending on the used inference model. An
empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL
can successfully localise up to 103 faults at the top, and 152 faults within
the top five, on par with state-of-the-art alternatives. The cost of mutation
analysis can be further reduced by mutation sampling: SIMFL retains over 80% of
its localisation accuracy at the top rank when using only 10% of generated
mutants, compared to results obtained without sampling
Software quality and reliability prediction using Dempster -Shafer theory
As software systems are increasingly deployed in mission critical applications, accurate quality and reliability predictions are becoming a necessity. Most accurate prediction models require extensive testing effort, implying increased cost and slowing down the development life cycle. We developed two novel statistical models based on Dempster-Shafer theory, which provide accurate predictions from relatively small data sets of direct and indirect software reliability and quality predictors. The models are flexible enough to incorporate information generated throughout the development life-cycle to improve the prediction accuracy.;Our first contribution is an original algorithm for building Dempster-Shafer Belief Networks using prediction logic. This model has been applied to software quality prediction. We demonstrated that the prediction accuracy of Dempster-Shafer Belief Networks is higher than that achieved by logistic regression, discriminant analysis, random forests, as well as the algorithms in two machine learning software packages, See5 and WEKA. The difference in the performance of the Dempster-Shafer Belief Networks over the other methods is statistically significant.;Our second contribution is also based on a practical extension of Dempster-Shafer theory. The major limitation of the Dempsters rule and other known rules of evidence combination is the inability to handle information coming from correlated sources. Motivated by inherently high correlations between early life-cycle predictors of software reliability, we extended Murphy\u27s rule of combination to account for these correlations. When used as a part of the methodology that fuses various software reliability prediction systems, this rule provided more accurate predictions than previously reported methods. In addition, we proposed an algorithm, which defines the upper and lower bounds of the belief function of the combination results. To demonstrate its generality, we successfully applied it in the design of the Online Safety Monitor, which fuses multiple correlated time varying estimations of convergence of neural network learning in an intelligent flight control system
Software Defect Prediction Based on Classication Rule Mining
There has been rapid growth of software development. Due to various causes, the software comes with many defects. In Software development process, testing of software is the main phase which reduces the defects of the software. If a developer or a tester can predict the software defects properly then, it reduces the cost, time and eort. In this paper, we show a comparative analysis of software defect prediction based on classifcation rule mining. We propose a scheme for this process and we choose different classication algorithms. Showing the comparison of predictions in software defects analysis. This evaluation analyzes the prediction performance of competing learning schemes for given historical data sets(NASA MDP Data Set). The result of this scheme evaluation shows that we have to choose different classifer rule for different data set
A Review of Metrics and Modeling Techniques in Software Fault Prediction Model Development
This paper surveys different software fault predictions progressed through different data analytic techniques reported in the software engineering literature. This study split in three broad areas; (a) The description of software metrics suites reported and validated in the literature. (b) A brief outline of previous research published in the development of software fault prediction model based on various analytic techniques. This utilizes the taxonomy of analytic techniques while summarizing published research. (c) A review of the advantages of using the combination of metrics. Though, this area is comparatively new and needs more research efforts
- …