Search CORE

165,887 research outputs found

Robust Statistical Methods for Empirical Software Engineering

Author: A Agresti
A Arcuri
A Vargha
AF Tappenden
Amnart Pohthong
Barbara Kitchenham
BL Welch
D Budgen
D Budgen
David Budgen
DE Stout
DM Erceg-Hurn
DW Zimmerman
DW Zimmerman
E Brunner
GEP Box
GS Mudholkar
HC Kraemer
J Demšar
Jacky Keung
JT Behrens
JW Cohen
JW Cohen
K Dejaeger
KK Yuen
L Acion
L Madeyski
L Madeyski
L Madeyski
Lech Madeyski
LK John
M El-Attar
M El-Attar
M Jureczko
MG Akritas
MG Akritas
MW Lipsey
N Cliff
NM Razali
P Shrout
PA Whigham
Pearl Brereton
PH Ramsey
R Bergmann
RB D’Agostino
RJ Grissom
RM Price
RR Wilcox
RR Wilcox
Shirley Gibbs
SL Braver
SS Shapiro
Stuart Charters
T Dybå
T Micceri
T Tian
TC Urdan
VB Kampenes
W Conover
W Viechtbauer
WR Shadish
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Recommended from our members

Predicting with sparse data

Author: Cartwright MH
Shepperd MJ
Publication venue: IEEE Computer Society
Publication date: 01/01/2001
Field of study

It is well known that effective prediction of project cost related factors is an important aspect of software engineering. Unfortunately, despite extensive research over more than 30 years, this remains a significant problem for many practitioners. A major obstacle is the absence of reliable and systematic historic data, yet this is a sine qua non for almost all proposed methods: statistical, machine learning or calibration of existing models. In this paper we describe our sparse data method (SDM) based upon a pairwise comparison technique and Saaty's Analytic Hierarchy Process (AHP). Our minimum data requirement is a single known point. The technique is supported by a software tool known as DataSalvage. We show, for data from two companies, how our approach — based upon expert judgement — adds value to expert judgement by producing significantly more accurate and less biased results. A sensitivity analysis shows that our approach is robust to pairwise comparison errors. We then describe the results of a small usability trial with a practising project manager. From this empirical work we conclude that the technique is promising and may help overcome some of the present barriers to effective project prediction

Brunel University Research Archive

Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

Author: Feldt Robert
Furia Carlo A.
Gren Lucas
Huang Ziwei
Neto Francisco Gomes de Oliveira
Torkar Richard
Publication venue
Publication date: 01/01/2019
Field of study

Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers and in the second phase of our method, we conducted a more extensive semi-automatic classification of papers spanning the years 2001--2015 and 5,196 papers. Results from both review steps was used to: i) identify and analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in usage of specific statistical methods (e.g., nonparametric tests and effect size measures) and, ii) develop a conceptual model for a statistical analysis workflow with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard to report practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and in the practitioner's context.Comment: journal submission, 34 pages, 8 figure

arXiv.org e-Print Archive

Chalmers Research

Using blind analysis for software engineering experiments

Author: Efron B.
Hutton J.
Kirsopp C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/04/2015
Field of study

Context: In recent years there has been growing concern about conflicting experimental results in empirical software engineering. This has been paralleled by awareness of how bias can impact research results. Objective: To explore the practicalities of blind analysis of experimental results to reduce bias. Method : We apply blind analysis to a real software engineering experiment that compares three feature weighting approaches with a na ̈ıve benchmark (sample mean) to the Finnish software effort data set. We use this experiment as an example to explore blind analysis as a method to reduce researcher bias. Results: Our experience shows that blinding can be a relatively straightforward procedure. We also highlight various statistical analysis decisions which ought not be guided by the hunt for statistical significance and show that results can be inverted merely through a seemingly inconsequential statistical nicety (i.e., the degree of trimming). Conclusion: Whilst there are minor challenges and some limits to the degree of blinding possible, blind analysis is a very practical and easy to implement method that supports more objective analysis of experimental results. Therefore we argue that blind analysis should be the norm for analysing software engineering experiments

Crossref

Brunel University Research Archive

How reliable are systematic reviews in empirical software engineering?

Author: Kitchenham BA
MacDonell SG
Mendes E
Shepperd MJ
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2010
Field of study

BACKGROUND – the systematic review is becoming a more commonly employed research instrument in empirical software engineering. Before undue reliance is placed on the outcomes of such reviews it would seem useful to consider the robustness of the approach in this particular research context. OBJECTIVE – the aim of this study is to assess the reliability of systematic reviews as a research instrument. In particular we wish to investigate the consistency of process and the stability of outcomes. METHOD – we compare the results of two independent reviews under taken with a common research question. RESULTS – the two reviews ﬁnd similar answers to the research question, although the means of arriving at those answers vary. CONCLUSIONS – in addressing a well-bounded research question, groups of researchers with similar domain experience can arrive at the same review outcomes, even though they may do so in different ways. This provides evidence that, in this context at least, the systematic review is a robust research method

Crossref

AUT Scholarly Commons

Brunel University Research Archive

Recommended from our members

Data sets and data quality in software engineering

Author: Liebchen G
Shepperd M J
Publication venue: ACM Press
Publication date: 01/01/2008
Field of study

OBJECTIVE - to assess the extent and types of techniques used to manage quality within software engineering data sets. We consider this a particularly interesting question in the context of initiatives to promote sharing and secondary analysis of data sets. METHOD - we perform a systematic review of available empirical software engineering studies. RESULTS - only 23 out of the many hundreds of studies assessed, explicitly considered data quality. CONCLUSIONS - first, the community needs to consider the quality and appropriateness of the data set being utilised; not all data sets are equal. Second, we need more research into means of identifying, and ideally repairing, noisy cases. Third, it should become routine to use sensitivity analysis to assess conclusion stability with respect to the assumptions that must be made concerning noise levels

Brunel University Research Archive

The consistency of empirical comparisons of regression and analogy-based software project cost prediction

Author: Mair CM
Shepperd M J
Publication venue: IEEE Computer Society
Publication date: 01/01/2005
Field of study

OBJECTIVE - to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques as these are commonly used. METHOD – we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS – our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against analogy-based methods. CONCLUSIONS – we confirm the lack of consistency in the findings and argue that this inconsistent pattern from 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than just: “What is the best prediction system?

Brunel University Research Archive

Evaluating prediction systems in software project estimation

Author: MacDonell S
Shepperd M
Publication venue: 'Elsevier BV'
Publication date: 14/06/2012
Field of study

This is the Pre-print version of the Article - Copyright @ 2012 ElsevierContext: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and provide a more formal foundation to interpret results with a particular focus on continuous prediction systems. Method: A new framework is proposed for evaluating competing prediction systems based upon (1) an unbiased statistic, Standardised Accuracy, (2) testing the result likelihood relative to the baseline technique of random ‘predictions’, that is guessing, and (3) calculation of effect sizes. Results: Previously published empirical evaluations of prediction systems are re-examined and the original conclusions shown to be unsafe. Additionally, even the strongest results are shown to have no more than a medium effect size relative to random guessing. Conclusions: Biased accuracy statistics such as MMRE are deprecated. By contrast this new empirical validation framework leads to meaningful results. Such steps will assist in performing future meta-analyses and in providing more robust and usable recommendations to practitioners.Martin Shepperd was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/H050329

AUT Scholarly Commons

Brunel University Research Archive