The relevance of application domains in empirical findings
The term 'software ecosystem' refers to a collection of software systems that are related in some way. Researchers have used different levels of aggregation to define an ecosystem: grouping systems by a common named project (e.g., the Apache ecosystem), or considering all the projects contained in online repositories (e.g., the GoogleCode ecosystem). In this paper we propose a definition of ecosystem based on application domains: software systems are in the same ecosystem if they share the same application domain, as described by a similar technological scope, context or objective. As an example, all projects implementing networking capabilities to trade Bitcoin and other virtual currencies can be considered part of the same "cryptocurrency" ecosystem. Utilising a sample of 100 Java software systems, we derive their application domains using the Latent Dirichlet Allocation (LDA) approach. We then evaluate a suite of object-oriented (OO) metrics per ecosystem, and test a null hypothesis: 'the OO metrics of all ecosystems come from the same population'. Our results show that the null hypothesis is rejected for most of the metrics chosen: the ecosystems that we extracted, based on application domains, show different structural properties. From the point of view of the interested stakeholders, this could mean that the health of a software system depends on domain-dependent factors that could be common to the projects in the same domain-based ecosystem.
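The domain-derivation step described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the projects, their one-line text descriptions, and the number of topics are invented placeholders, and each project is assumed to be represented by a single document (e.g., its README or identifier terms).

```python
# Sketch: derive application domains ("ecosystems") from project text via LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical projects and illustrative text representations.
project_docs = {
    "bitcoinj": "bitcoin wallet transaction peer network blockchain",
    "web3j":    "ethereum transaction contract blockchain network",
    "guava":    "collections cache immutable list map utility",
    "fastutil": "collections primitive list map array utility",
}

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(project_docs.values())

# Two topics stand in for two application domains.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_dist = lda.fit_transform(X)  # one topic distribution per project

# Assign each project to its dominant topic: projects sharing a dominant
# topic fall into the same domain-based ecosystem.
ecosystems = {name: int(dist.argmax())
              for name, dist in zip(project_docs, topic_dist)}
```

With the ecosystems in hand, the study's per-ecosystem metric comparison can then proceed on the resulting clusters.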
Recommended from our members
Lexical content as a cooperation aide: a study based on Java software
Collaborative development is a paradigm shift in software development. Loosely coupled developers coordinate their work via distributed versioning systems (SVN, Git, and others), code reviews and priority-led bug tracking systems. This development approach allows many different developers to contribute additional source code to the same source artifact. This article focuses on the lexical content of the source code produced in a collaborative environment. The lexical content is described as the 'dictionary' of the key terms contained within a source artifact. We posit that the lexical content of a Java class will increase as more developers add content to the same class. We analyse the 100 top-ranked GitHub applications (at the time of the sampling) written in Java. Each of their classes is reduced to its lexical content, and its size (in LOCs) is recorded, as well as the number of different developers who contributed to its source code. Our results show that (i) the lexical content of Java classes is bounded in size, (ii) more developers make the size of the lexical content larger, and (iii) the lexical content of a system's classes might increase with more developers, depending on its application domain. The implications for practitioners are twofold: (i) classes with a large set of lexical content should be split into multiple classes, to minimise the need for further maintenance; and (ii) classes developed by many developers should adhere to specific guidelines so that their lexical content does not increase boundlessly. We tested our results in a tailored case study and confirmed our findings: larger-than-threshold class corpora tend to deteriorate class cohesion.
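The reduction of a class to its lexical content can be sketched as below. This is an assumption-laden illustration, not the paper's tooling: the 'dictionary' is taken to be the set of distinct identifier terms after splitting camelCase and dropping Java keywords, and the sample class is invented.

```python
# Sketch: reduce a Java class to its lexical content (set of key terms).
import re

JAVA_KEYWORDS = {"public", "private", "class", "void", "int", "return",
                 "new", "static", "final", "if", "else", "for", "while"}

def lexical_content(source: str) -> set:
    # Strip string literals and comments so only code identifiers remain.
    source = re.sub(r'"(?:\\.|[^"\\])*"', " ", source)
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)               # line comments
    terms = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        # Split camelCase identifiers into their constituent terms.
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident):
            part = part.lower()
            if part not in JAVA_KEYWORDS:
                terms.add(part)
    return terms

java_class = '''
public class OrderProcessor {
    private int orderCount;          // number of processed orders
    public void processOrder() { orderCount = orderCount + 1; }
}
'''
print(sorted(lexical_content(java_class)))
# → ['count', 'order', 'process', 'processor']
```

The size of this set, tracked against the number of distinct contributors to the class, is the kind of pairing the study analyses.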
An empirical analysis of source code metrics and smart contract resource consumption
A smart contract (SC) is a programme stored in the Ethereum blockchain by a contract-creation transaction. SC developers deploy an instance of the SC and attempt to execute it in exchange for a fee, paid in Ethereum coins (Ether). If the computation needed for their execution turns out to be larger than the effort proposed by the developer (i.e., the gasLimit), their client instantiation will not be completed successfully.
In this paper, we examine SCs from 11 Ethereum blockchain-oriented software projects hosted on GitHub.com, and we evaluate the resources needed for their deployment (i.e., the gasUsed). For each of these contracts, we also extract a suite of object-oriented metrics, to evaluate their structural characteristics.
Our results show a statistically significant correlation between some of the object-oriented (OO) metrics and the resources consumed on the Ethereum blockchain network when deploying SCs. This result has a direct impact on how Ethereum developers engage with an SC: by evaluating its structural characteristics, they will be able to produce a better estimate of the resources needed to deploy it. Further results suggest that specific source code metrics should be prioritised based on application domain when the projects are clustered around common themes.
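A correlation test in the spirit of this study can be sketched as follows. The metric values and gasUsed figures below are invented placeholders, not the paper's data; lines of code stands in for the suite of OO metrics.

```python
# Sketch: rank correlation between an OO metric and deployment cost (gasUsed).
from scipy.stats import spearmanr

# One entry per hypothetical smart contract: size metric and gas consumed.
loc      = [120, 340, 95, 510, 220, 430, 150, 610]
gas_used = [250_000, 820_000, 310_000, 1_200_000,
            540_000, 990_000, 400_000, 1_450_000]

rho, p_value = spearmanr(loc, gas_used)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.5f}")
```

A high rho with a small p-value, as in this toy data, is the shape of evidence behind the claim that structural characteristics help estimate deployment resources.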
Application domains in the Research Papers at BENEVOL: a retrospective
Research on empirical software engineering has increasingly used the data made available in online repositories, specifically Free/Libre/Open Source Software (FLOSS) projects. The latest trend for researchers is to gather "as much data as possible" to (i) prevent the bias of a small, unrepresentative sample, (ii) work with a sample as close as possible to the population itself, and (iii) showcase the performance of existing or new tools in treating vast amounts of data. The effects of harvesting enormous amounts of data have been only marginally considered so far: data could be corrupted, repositories could be forked, and developer identities could be duplicated. In this paper we posit that there is a fundamental flaw in harvesting large amounts of data and generalising the conclusions: the application domain, or context, of the analysed systems must be the primary factor for the cluster sampling of FLOSS projects. This paper presents two contributions: first, we analyse a collection of 100 papers that appeared at BENEVOL, showing whether (and how much) FLOSS data has been harvested, and how many times the authors flagged an issue in their different application domains. Second, we discuss the implications of using 'application domain' as the clustering factor in FLOSS sampling, and the generalisations within and outside the clusters.
The effect of multiple developers on structural attributes: a study based on Java software
Context: Long-term software projects employ different software developers who collaborate on shared artifacts. The accumulation of changes pushed by different developers leaves traces on the underlying code, which have an effect on its future maintainability and even its reuse.
Objective: This study focuses on how the changes made by different developers might have an impact on the code: we investigate whether the work of multiple developers, and their experience, has a visible effect on the structural metrics of the underlying code.
Method: We consider nine object-oriented (OO) attributes and measure them in a GitHub sample containing the top 200 'forked' projects. For each of their classes, we evaluate the number of distinct developers contributing to its source code, and their experience in the project.
Results: We show that the presence of multiple developers working on the same class has a visible effect on the chosen OO metrics, and often in the opposite direction to what the guidelines for each attribute suggest. We also show how the relative experience of developers in a project plays an important role in the distribution of those metrics, and the future maintenance of the Java classes.
Conclusions: Our results show how distributed development has an effect on the structural attributes of a software system, and how the experience of developers plays a fundamental role in that effect. We also identify workarounds and best practices in four applied case studies.
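The counting step in the method above can be sketched as below. The commit records are invented placeholders; in practice they would come from the project's version-control history (e.g., parsed `git log` output).

```python
# Sketch: count distinct developers contributing to each class file.
from collections import defaultdict

# Hypothetical commit records: (author, files changed in the commit).
commits = [
    ("alice", ["src/Order.java", "src/Cart.java"]),
    ("bob",   ["src/Order.java"]),
    ("carol", ["src/Order.java", "src/User.java"]),
    ("alice", ["src/Cart.java"]),   # repeat author: still one developer
]

authors_per_file = defaultdict(set)
for author, files in commits:
    for path in files:
        authors_per_file[path].add(author)

developer_count = {path: len(authors)
                   for path, authors in authors_per_file.items()}
print(developer_count)
# → {'src/Order.java': 3, 'src/Cart.java': 1, 'src/User.java': 1}
```

These per-class counts are then set against the per-class OO metric values to look for the effects the study reports.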
Detecting Java software similarities by using different clustering techniques
Background: Research on empirical software engineering has increasingly been conducted by analysing and measuring vast amounts of software systems. Hundreds, thousands and even millions of systems have been (and are) considered by researchers, and often within the same study, in order to test theories, demonstrate approaches or run prediction models. A much less investigated aspect is whether the collected metrics might be context-specific, or whether systems should be better analysed in clusters.
Objective: The objectives of this study are (i) to define a set of clustering techniques that might be used to group similar software systems, and (ii) to evaluate whether a suite of well-known object-oriented metrics is context-specific, i.e., whether its values differ across the defined clusters.
Method: We group software systems based on three different clustering techniques, and we collect the values of the metrics suite in each cluster. We then test whether the clusters are statistically different from each other, using the Kolmogorov-Smirnov (KS) hypothesis test.
Results: Our results show that, for two of the techniques used, the KS null hypothesis (i.e., the clusters come from the same population) is rejected for most of the metrics chosen: the clusters that we extracted, based on application domains, show statistically different structural properties.
Conclusions: The implications for researchers can be profound: metrics and their interpretation might be more sensitive to context than acknowledged so far, and application domains represent a promising filter to cluster similar systems.
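The hypothesis test in the method above can be sketched as follows. The metric values are synthetic, generated so that the two clusters plainly differ; the cluster labels and the choice of WMC as the metric are illustrative assumptions.

```python
# Sketch: two-sample Kolmogorov-Smirnov test on one OO metric (e.g., WMC)
# across two domain-based clusters of classes.
import random
from scipy.stats import ks_2samp

random.seed(0)
cluster_a = [random.gauss(10, 3) for _ in range(200)]  # e.g., utility libraries
cluster_b = [random.gauss(18, 5) for _ in range(200)]  # e.g., web frameworks

stat, p_value = ks_2samp(cluster_a, cluster_b)
if p_value < 0.05:
    print("reject H0: the clusters come from different populations")
```

Repeating this test per metric and per pair of clusters yields the pattern of rejections the study reports.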
A framework for a decision support system to optimize cloud-hosted services for multitenancy isolation
The role and value of replication in empirical software engineering results
Context: Concerns have been raised from many quarters regarding the reliability of empirical research findings, and this includes software engineering. Replication has been proposed as an important means of increasing confidence.
Objective: We aim to better understand the value of replication studies, the level of confirmation between replication and original studies, what confirmation means in a statistical sense, and what factors modify this relationship.
Method: We perform a systematic review to identify relevant replication experimental studies in the areas of (i) software project effort prediction and (ii) pair programming. Where sufficient details are provided, we compute prediction intervals.
Results: Our review locates 28 unique articles that describe replications of 35 original studies addressing 75 research questions. Of these, 10 are external replications (conducted by independent researchers), 15 internal, and 3 internal-same-article replications. The odds ratio of an internal replication obtaining a 'confirmatory' result, relative to an external one, is 8.64. We also found that incomplete reporting hampered our ability to extract estimates of effect sizes. Where we were able to compute replication prediction intervals, these were surprisingly large.
Conclusion: We show that there is substantial evidence to suggest that current approaches to empirical replications are highly problematic. There is a consensus that replications are important, but there is a need for better reporting of both original and replicated studies. Given the low power and incomplete reporting of many original studies, the extent to which a replication is confirmatory, and the extent to which it yields additional knowledge to the software engineering community, can be unclear. We recommend that attention be switched from replication research to meta-analysis.
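The odds-ratio comparison mentioned in the results can be sketched as below. The 2x2 counts are invented placeholders chosen to give a similar order of magnitude, not the review's actual data.

```python
# Sketch: odds ratio of confirmatory outcomes, group A vs. group B.
def odds_ratio(confirm_a, refute_a, confirm_b, refute_b):
    """Odds of confirmation in group A relative to group B."""
    return (confirm_a / refute_a) / (confirm_b / refute_b)

# Hypothetical counts: internal replications vs. external replications.
internal_confirm, internal_refute = 24, 6
external_confirm, external_refute = 4, 8

print(round(odds_ratio(internal_confirm, internal_refute,
                       external_confirm, external_refute), 2))  # → 8.0
```

An odds ratio well above 1, as here, indicates that internal replications confirm their originals far more often than external ones, which is the asymmetry the review highlights.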