26 research outputs found

    Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

    Software is highly contextual. While there are cross-cutting 'global' lessons, individual software projects exhibit many 'local' properties. This data heterogeneity makes drawing local conclusions from global data dangerous. A key research challenge is to construct locally accurate prediction models that are informed by global characteristics and data volumes. Previous work has tackled this problem using clustering and transfer learning approaches, which identify locally similar characteristics. This paper applies a simpler approach known as Bayesian hierarchical modeling. We show that hierarchical modeling supports cross-project comparisons, while preserving local context. To demonstrate the approach, we conduct a conceptual replication of an existing study on setting software metrics thresholds. Our emerging results show our hierarchical model reduces model prediction error compared to a global approach by up to 50%.
    Comment: Short paper, published at MSR '18: 15th International Conference on Mining Software Repositories, May 28-29, 2018, Gothenburg, Sweden
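
    The idea of a local estimate informed by global data can be illustrated with a minimal partial-pooling sketch. This is not the paper's model: it assumes a normal-normal setup with known within-project variance (`sigma2`) and between-project variance (`tau2`), and it plugs in the grand mean as the global mean. The function name, parameter names, and data are hypothetical.

```python
def partial_pool(project_values, sigma2, tau2):
    """Shrink each project's mean toward the global mean.

    Assumptions (illustrative only): normal-normal model with known
    within-project variance sigma2 and between-project variance tau2;
    the grand mean of all observations stands in for the global mean.
    """
    all_values = [v for vals in project_values.values() for v in vals]
    global_mean = sum(all_values) / len(all_values)

    pooled = {}
    for name, vals in project_values.items():
        n = len(vals)
        local_mean = sum(vals) / n
        # Precision (inverse variance) of the local mean and of the global prior.
        precision_local = n / sigma2
        precision_global = 1.0 / tau2
        # Precision-weighted average of the local and global means.
        pooled[name] = (
            precision_local * local_mean + precision_global * global_mean
        ) / (precision_local + precision_global)
    return pooled


# Hypothetical metric values for two projects; "b" has only one observation.
estimates = partial_pool({"a": [10, 10, 10, 10], "b": [20]}, sigma2=4.0, tau2=1.0)
```

    Under these assumptions, a project with few observations (here "b") is pulled further toward the global mean than a project with many, which is the sense in which local estimates borrow strength from global data.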

    How reliable are systematic reviews in empirical software engineering?

    BACKGROUND – the systematic review is becoming a more commonly employed research instrument in empirical software engineering. Before undue reliance is placed on the outcomes of such reviews it would seem useful to consider the robustness of the approach in this particular research context. OBJECTIVE – the aim of this study is to assess the reliability of systematic reviews as a research instrument. In particular we wish to investigate the consistency of process and the stability of outcomes. METHOD – we compare the results of two independent reviews undertaken with a common research question. RESULTS – the two reviews find similar answers to the research question, although the means of arriving at those answers vary. CONCLUSIONS – in addressing a well-bounded research question, groups of researchers with similar domain experience can arrive at the same review outcomes, even though they may do so in different ways. This provides evidence that, in this context at least, the systematic review is a robust research method.

    Comparative analysis of meta-analysis methods: when to use which?

    Background: Several meta-analysis methods can be used to quantitatively combine the results of a group of experiments, including the weighted mean difference (WMD), statistical vote counting (SVC), the parametric response ratio (RR) and the non-parametric response ratio (NPRR). The software engineering community has focused on the weighted mean difference method. However, other meta-analysis methods have distinct strengths, such as being usable when variances are not reported. There are as yet no guidelines to indicate which method is best for use in each case. Aim: Compile a set of rules that SE researchers can use to ascertain which aggregation method is best for use in the synthesis phase of a systematic review. Method: Monte Carlo simulation varying the number of experiments in the meta-analyses, the number of subjects that they include, their variance and effect size. We empirically calculated the reliability and statistical power in each case. Results: WMD is generally reliable if the variance is low, whereas its power depends on the effect size and number of subjects per meta-analysis; the reliability of RR is generally unaffected by changes in variance, but it does require more subjects than WMD to be powerful; NPRR is the most reliable method, but it is not very powerful; SVC behaves well when the effect size is moderate, but is less reliable with other effect sizes. Detailed tables of results are annexed. Conclusions: Before undertaking statistical aggregation in software engineering, it is worthwhile checking whether there is any appreciable difference in the reliability and power of the methods. If there is, software engineers should select the method that optimizes both parameters.
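
    To make the contrast between aggregation methods concrete, here is a minimal sketch of two of them. It assumes a fixed-effect, inverse-variance weighted mean difference and a simplified, unweighted log response ratio; the response ratio variants evaluated in the paper may be defined differently, and the function names and study data are hypothetical.

```python
import math


def weighted_mean_difference(studies):
    """Fixed-effect pooled mean difference with inverse-variance weights.

    Each study is a tuple (mean_t, sd_t, n_t, mean_c, sd_c, n_c) for the
    treatment and control groups. Requires standard deviations.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for mean_t, sd_t, n_t, mean_c, sd_c, n_c in studies:
        diff = mean_t - mean_c
        # Variance of the difference between two independent group means.
        variance = sd_t ** 2 / n_t + sd_c ** 2 / n_c
        weight = 1.0 / variance
        weighted_sum += weight * diff
        total_weight += weight
    return weighted_sum / total_weight


def mean_log_response_ratio(studies):
    """Unweighted mean of ln(mean_t / mean_c) across studies.

    Simplified illustration: it uses only the group means, so it can be
    computed even when variances are not reported.
    """
    log_ratios = [math.log(s[0] / s[3]) for s in studies]
    return sum(log_ratios) / len(log_ratios)


# Hypothetical experiments: (mean_t, sd_t, n_t, mean_c, sd_c, n_c).
experiments = [(10.0, 2.0, 20, 8.0, 2.0, 20), (12.0, 4.0, 10, 11.0, 4.0, 10)]
wmd = weighted_mean_difference(experiments)
lrr = mean_log_response_ratio(experiments)
```

    In this sketch the weighted mean difference leans toward the first, more precise experiment, while the response ratio treats both experiments equally and ignores the reported standard deviations.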

    Cross-Dataset Design Discussion Mining

    Being able to identify software discussions that are primarily about design, which we call design mining, can improve documentation and maintenance of software systems. Existing design mining approaches have good classification performance using natural language processing (NLP) techniques, but the conclusion stability of these approaches is generally poor. A classifier trained on a given dataset of software projects has so far not worked well on different artifacts or different datasets. In this study, we replicate and synthesize these earlier results in a meta-analysis. We then apply recent work in transfer learning for NLP to the problem of design mining. However, for our datasets, these deep transfer learning classifiers perform no better than less complex classifiers. We conclude by discussing some reasons behind the transfer learning approach to design mining.
    Comment: accepted for SANER 2020, Feb, London, ON. 12 pages. Replication package: https://doi.org/10.5281/zenodo.359012

    Understanding the Role and Methods of Meta-Analysis in IS Research

    Four methods for reviewing a body of research literature - narrative review, descriptive review, vote-counting, and meta-analysis - are compared. Meta-analysis as a formalized, systematic review method is discussed in detail in terms of its history, current status, advantages, common analytic methods, and recent developments

    Reporting experiments to satisfy professionals' information needs

    Although the aim of empirical software engineering is to provide evidence for selecting the appropriate technology, it appears that there is a lack of recognition of this work in industry. Results from empirical research only rarely seem to find their way to company decision makers. If information relevant for software managers is provided in reports on experiments, such reports can be considered as a source of information for them when they are faced with making decisions about the selection of software engineering technologies. To bridge this communication gap between researchers and professionals, we propose characterizing the information needs of software managers in order to show empirical software engineering researchers which information is relevant for decision-making and thus enable them to make this information available. We empirically investigated decision makers' information needs to identify which information they need to judge the appropriateness and impact of a software technology. We empirically developed a model that characterizes these needs. To ensure that researchers provide relevant information when reporting results from experiments, we extended existing reporting guidelines accordingly. We performed an experiment to evaluate our model with regard to its effectiveness. Software managers who read an experiment report according to the proposed model judged the technology's appropriateness significantly better than those reading a report about the same experiment that did not explicitly address their information needs. Our research shows that information regarding a technology, the context in which it is supposed to work, and most importantly, the impact of this technology on development costs and schedule as well as on product quality is crucial for decision makers.

    Understanding replication of experiments in software engineering: a classification

    Context: Replication plays an important role in experimental disciplines. There are still many uncertainties about how to proceed with replications of SE experiments. Should replicators reuse the baseline experiment materials? How much liaison should there be among the original and replicating experimenters, if any? What elements of the experimental configuration can be changed for the experiment to be considered a replication rather than a new experiment? Objective: To improve our understanding of SE experiment replication, in this work we propose a classification which is intended to provide experimenters with guidance about what types of replication they can perform. Method: The research approach followed is structured according to the following activities: (1) a literature review of experiment replication in SE and in other disciplines, (2) identification of typical elements that compose an experimental configuration, (3) identification of different replication purposes and (4) development of a classification of experiment replications for SE. Results: We propose a classification of replications which provides experimenters in SE with guidance about what changes they can make in a replication and, based on these, what verification purposes such a replication can serve. The proposed classification helped to accommodate opposing views within a broader framework; it is capable of accounting for replications ranging from less similar to more similar to the baseline experiment. Conclusion: The aim of replication is to verify results, but different types of replication serve special verification purposes and afford different degrees of change. Each replication type helps to discover particular experimental conditions that might influence the results. The proposed classification can be used to identify changes in a replication and, based on these, understand the level of verification.

    Do software models based on the UML aid in source-code comprehensibility? Aggregating evidence from 12 controlled experiments

    In this paper, we present the results of long-term research conducted in order to study the contribution made by software models based on the Unified Modeling Language (UML) to the comprehensibility of Java source-code deprived of comments. We have conducted 12 controlled experiments in different experimental contexts and on different sites with participants with different levels of expertise (i.e., Bachelor’s, Master’s, and PhD students and software practitioners from Italy and Spain). A total of 333 observations were obtained from these experiments. The UML models in our experiments were those produced in the analysis and design phases. The models produced in the analysis phase were created with the objective of abstracting the environment in which the software will work (i.e., the problem domain), while those produced in the design phase were created with the goal of abstracting implementation aspects of the software (i.e., the solution/application domain). Source-code comprehensibility was assessed with regard to correctness of understanding, time taken to accomplish the comprehension tasks, and efficiency as regards accomplishing those tasks. In order to study the global effect of UML models on source-code comprehensibility, we aggregated results from the individual experiments using a meta-analysis. We made every effort to account for the heterogeneity of our experiments when aggregating the results obtained from them. The overall results suggest that the use of UML models affects the comprehensibility of source-code, when it is deprived of comments. Indeed, models produced in the analysis phase might reduce source-code comprehensibility, while increasing the time taken to complete comprehension tasks. That is, browsing source code and this kind of model together negatively impacts the time taken to complete comprehension tasks without having a positive effect on the comprehensibility of source code.
    One plausible justification for this is that the UML models produced in the analysis phase focus on the problem domain. That is, models produced in the analysis phase say nothing about source code and there should be no expectation that they would, in any way, be beneficial to comprehensibility. On the other hand, UML models produced in the design phase improve source-code comprehensibility. One possible justification for this result is that models produced in the design phase are more focused on implementation details. Therefore, although the participants had more material to read and browse, this additional effort was paid back in the form of an improved comprehension of source code.