19 research outputs found

    The relevance of application domains in empirical findings

    Get PDF
    The term 'software ecosystem' refers to a collection of software systems that are related in some way. Researchers have been using different levels of aggregation to define an ecosystem: grouping them by a common named project (e.g., the Apache ecosystem); or considering all the projects contained in online repositories (e.g., the GoogleCode ecosystem). In this paper we propose a definition of ecosystem based on application domains: software systems are in the same ecosystem if they share the same application domain, as described by a similar technological scope, context or objective. As an example, all projects implementing networking capabilities to trade Bitcoin and other virtual currencies can be considered as part of the same "cryp-tocurrency" ecosystem. Utilising a sample of 100 Java software systems, we derive their application domains using the Latent Dirichlet Allocation (LDA) approach. We then evaluate a suite of object-oriented metrics per ecosystem, and test a null hypothesis: 'the OO metrics of all ecosystems come from the same population'. Our results show that the null hypothesis is rejected for most of the metrics chosen: the ecosystems that we extracted, based on application domains, show different structural properties. From the point of view of the interested stakeholders, this could mean that the health of a software system depends on domain-dependent factors, that could be common to the projects in the same domain-based ecosystem

    An empirical analysis of source code metrics and smart contract resource consumption

    Get PDF
    A smart contract (SC) is a programme stored in the Ethereum blockchain by a contract‐creation transaction. SC developers deploy an instance of the SC and attempt to execute it in exchange for a fee, paid in Ethereum coins (Ether). If the computation needed for their execution turns out to be larger than the effort proposed by the developer (i.e., the gasLimit ), their client instantiation will not be completed successfully. In this paper, we examine SCs from 11 Ethereum blockchain‐oriented software projects hosted on GitHub.com, and we evaluate the resources needed for their deployment (i.e., the gasUsed ). For each of these contracts, we also extract a suite of object‐oriented metrics, to evaluate their structural characteristics. Our results show a statistically significant correlation between some of the object‐oriented (OO) metrics and the resources consumed on the Ethereum blockchain network when deploying SCs. This result has a direct impact on how Ethereum developers engage with a SC: evaluating its structural characteristics, they will be able to produce a better estimate of the resources needed to deploy it. Other results show specific source code metrics to be prioritised based on application domains when the projects are clustered based on common themes

    Application domains in the Research Papers at BENEVOL: a retrospective

    Get PDF
    Research on empirical software engineering has increasingly used the data that is made available in online repositories , specifically Free/Libre/Open Source Software projects (FLOSS). The latest trends for researchers is to gather "as much data as possible" to (i) prevent bias in the representation of a small sample, (ii) work with a sample as close as the population itself, and (iii) showcase the performance of existing or new tools in treating vast amount of data. The effects of harvesting enormous amounts of data have been only marginally considered so far: data could be corrupted; repositories could be forked; and developer identities could be duplicated. In this paper we posit that there is a fundamental flaw in harvesting large amounts of data, and when generalising the conclusions: the application domain, or context, of the analysed systems must be the primary factor for the cluster sampling of FLOSS projects. This paper presents two contributions: first, we analyse a collection of 100 BENEVOL papers that appeared showing whether (and how much) FLOSS data has been harvested, and how many times the authors flagged an issue in their different application domains. Second, we discuss the implications of using 'application domain' as the clustering factor in FLOSS sampling, and the generalisations within and outside the clusters

    The role and value of replication in empirical software engineering results

    Get PDF
    Context: Concerns have been raised from many quarters regarding the reliability of empirical research findings and this includes software engineering. Replication has been proposed as an important means of increasing confidence. Objective: We aim to better understand the value of replication studies, the level of confirmation between replication and original studies, what confirmation means in a statistical sense and what factors modify this relationship. Method: We perform a systematic review to identify relevant replication experimental studies in the areas of (i) software project effort prediction and (ii) pair programming. Where sufficient details are provided we compute prediction intervals. Results: Our review locates 28 unique articles that describe replications of 35 original studies that address 75 research questions. Of these 10 are external, 15 internal and 3 internal-same-article replications. The odds ratio of internal to external (conducted by independent researchers) replications of obtaining a ‘confirmatory’ result is 8.64. We also found incomplete reporting hampered our ability to extract estimates of effect sizes. Where we are able to compute replication prediction intervals these were surprisingly large. Conclusion: We show that there is substantial evidence to suggest that current approaches to empirical replications are highly problematic. There is a consensus that replications are important, but there is a need for better reporting of both original and replicated studies. Given the low power and incomplete reporting of many original studies, it can be unclear the extent to which a replication is confirmatory and to what extent it yields additional knowledge to the software engineering community. We recommend attention is switched from replication research to meta-analysis
    corecore