805 research outputs found
Similarities, challenges and opportunities of wikipedia content and open source projects
Copyright © 2012 John Wiley & Sons, Ltd. Several years of research and evidence have demonstrated that Open Source Software (OSS) portals often contain a large number of software projects that simply do not evolve, developed by relatively small communities, struggling to attract a sustained number of contributors. These portals have started to
increasingly act as a storage for abandoned projects, and researchers and practitioners should try and point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the requested expertise, many projects reliant on content and contributions by users undergo a similar evolution, and follow similar patterns: when a project fails to attract contributors, it appears to be not evolving, or abandoned. Far from a negative finding, even those projects could provide valuable content that should be harvested and identified based on common characteristics: by using the attributes of “usefulness” and “modularity” we isolate valuable content in both Wikipedia pages and OSS projects
A Quantitative Analysis of Open Source Software Code Quality: Insights from Metric Distributions
Code quality is a crucial construct in open-source software (OSS) with three
dimensions: maintainability, reliability, and functionality. To accurately
measure them, we divide 20 distinct metrics into two types: 1) threshold-type
metrics that influence code quality in a monotonic manner; 2)
non-threshold-type metrics that lack a monotonic relationship to evaluate. We
propose a distribution-based method to provide scores for metrics, which
demonstrates great explainability on OSS adoption. Our empirical analysis
includes more than 36,460 OSS projects and their raw metrics from SonarQube and
CK. Our work contributes to the understanding of the multi-dimensional
construct of code quality and its metric measurements
An empirical study of package coupling in Java open-source
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Excessive coupling between object-oriented classes in systems is generally acknowledged as harmful and is recognised as a maintenance problem that can result in a higher propensity for faults in systems and a 'stored up' future problem. Characterising and understanding coupling at different levels of abstraction is therefore important for both the project manager and the developer, both of whom have a vested interest in software quality. In this thesis, coupling trends are empirically investigated over multiple versions of seven Java open-source systems (OSS). The first investigation explores the trends in longitudinal changes to open-source systems given by six coupling metrics. Coupling trends are then explored from the perspective of: the relationship between removed classes and their coupling with other classes in the same package; the relationships between coupling and 'warnings' in packages and the time interval between versions in Java OSS; the relationship between some of these coupling metrics is also explored. Finally, the existence of an 80/20 rule for the coupling metrics is inspected. Results suggest that developer activity comprises a set of high and low periods (a 'peak and trough' effect) evident as a system evolves. Findings also demonstrate that the addition of coupling may have beneficial effects on a system, particularly if it is added as new functionality through the Java package feature. The fan-in and fan-out coupling metrics reveal particular features and exhibit a wide range of traits in the classes depending on their high or low values; finally, we reveal that fan-in is the only metric that appears consistently to exhibit an 80/20 (Pareto) relationship
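As a rough illustration of the 80/20 inspection mentioned in this abstract, the sketch below computes the share of a metric's total that is held by the top 20% of classes. It is not the thesis's procedure: the function name `top_share`, the rounding rule for the cut-off, and the fan-in values are all assumptions made for the example.

```python
from typing import Sequence


def top_share(values: Sequence[float], top_fraction: float = 0.2) -> float:
    """Return the share of the total held by the largest `top_fraction` of items."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = sum(values)
    if total == 0:
        return 0.0
    ranked = sorted(values, reverse=True)
    # Keep at least one item so very small samples still produce a result.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Hypothetical fan-in counts for ten classes (illustrative, not data from the thesis).
fan_in = [40, 24, 5, 3, 2, 2, 1, 1, 1, 1]
share = top_share(fan_in)
print(f"Top 20% of classes hold {share:.0%} of total fan-in")
```

A share close to 0.8 for the top 20% would be consistent with a Pareto-like distribution; a share close to 0.2 would indicate an even spread.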
"My GitHub Sponsors profile is live!" Investigating the Impact of Twitter/X Mentions on GitHub Sponsors
GitHub Sponsors was launched in 2019, enabling donations to open-source
software developers to provide financial support, as per GitHub's slogan:
"Invest in the projects you depend on". However, a 2022 study on GitHub
Sponsors found that only two-fifths of developers who were seeking sponsorship
received a donation. The study found that, other than internal actions (such as
offering perks to sponsors), developers had advertised their GitHub Sponsors
profiles on social media, such as Twitter (also known as X). Therefore, in this
work, we investigate the impact of tweets that contain links to GitHub Sponsors
profiles on sponsorship, as well as their reception on Twitter/X. We further
characterize these tweets to understand their context and find that (1) such
tweets have the impact of increasing the number of sponsors acquired, (2)
compared to other donation platforms such as Open Collective and Patreon,
GitHub Sponsors has significantly fewer interactions but is more visible on
Twitter/X, and (3) developers tend to contribute more to open-source software
during the week of posting such tweets. Our findings are the first step toward
investigating the impact of social media on obtaining funding to sustain
open-source software
IDENTIFICATION AND QUANTIFICATION OF VARIABILITY MEASURES AFFECTING CODE REUSABILITY IN OPEN SOURCE ENVIRONMENT
Open source software (OSS) is one of the emerging areas in software engineering, and
is gaining the interest of the software development community. OSS was started as a
movement, and for many years software developers contributed to it as their hobby
(non commercial purpose). Now, OSS components are being reused in CBSD
(commercial purpose). More recently, software engineering researchers have
envisioned the use of OSS in SPL, thus bringing it into a new arena. Being
an emerging research area, it demands exploratory study to explore the dimensions of
this phenomenon. Furthermore, there is a need to assess the reusability of OSS which
is the focal point of these disciplines (CBSE, SPL, and OSS). In this research, a mixed
method based approach is employed which is specifically 'partially mixed sequential
dominant study'. It involves both qualitative (interviews) and quantitative phases
(survey and experiment). During the qualitative phase seven respondents were
involved, the survey sample size was 396, and three experiments were conducted. The
main contribution of this study is the exploration of the phenomenon 'reuse of
OSS in reuse intensive software development'. The findings include 7 categories and
39 dimensions. One of the dimensions, factors affecting reusability, was carried to the
quantitative phase (survey and experiment). On the basis of the findings, a
reusability attribute model was proposed at class and package level. Variability is one
of the newly identified attributes of reusability. A comprehensive theoretical analysis
of variability implementation mechanisms is conducted to propose metrics for its
assessment. The reusability attribute model is validated by statistical analysis of 103
classes and 77 packages. An evolutionary reusability analysis of two open source
software systems was conducted, in which different versions of the software were analyzed for their
reusability. The results show a positive correlation between variability and reusability
at package level and validate the other identified attributes. The results would be
helpful to conduct further studies in this area
Who is the Real Hero? Measuring Developer Contribution via Multi-dimensional Data Integration
Proper incentives are important for motivating developers in open-source
communities, which is crucial for keeping the development of open-source
software healthy. To provide such incentives, an accurate and objective
developer contribution measurement method is needed. However, existing methods
rely heavily on manual peer review, lacking objectivity and transparency. The
metrics of some automated works about effort estimation use only syntax-level
or even text-level information, such as changed lines of code, which lack
robustness. Furthermore, some works about identifying core developers provide
only a qualitative understanding without a quantitative score or have some
project-specific parameters, which makes them not practical in real-world
projects. To this end, we propose CValue, a multidimensional information
fusion-based approach to measure developer contributions. CValue extracts both
syntax and semantic information from the source code changes in four
dimensions: modification amount, understandability, inter-function and
intra-function impact of modification. It fuses the information to produce the
contribution score for each of the commits in the projects. Experimental
results show that CValue outperforms other approaches by 19.59% on 10
real-world projects with manually labeled ground truth. We validated and proved
that the performance of CValue, which takes 83.39 seconds per commit, is
acceptable to be applied in real-world projects. Furthermore, we performed a
large-scale experiment on 174 projects and detected 2,282 developers having
inflated commits. Of these, 2,050 developers did not make any syntax
contribution; and 103 were identified as bots
Identification-method research for open-source software ecosystems
In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework
Analysing BitTorrent's seeding strategies
BitTorrent is a typical peer-to-peer (P2P) file distribution application that has gained tremendous popularity in recent years. A considerable amount of research exists regarding BitTorrent’s choking algorithm, which has proved to be effective in preventing freeriders. However, the effect of the seeding strategy on the resistance to freeriders in BitTorrent has been largely overlooked. In addition, a category of selfish leechers (termed exploiters), who leave the overlay immediately after completion, has never been taken into account in previous research. In this paper two popular seeding strategies, the Original Seeding Strategy (OSS) and the Time-based Seeding Strategy (TSS), are chosen and we study via mathematical models and simulation their effects on freeriders and exploiters in BitTorrent networks. The mathematical model is verified and we discover that both freeriders and exploiters impact system performance, regardless of the seeding strategy that is employed. However, a selfish-leechers threshold is identified; once the threshold is exceeded, we find that TSS outperforms OSS, that is, TSS reduces the negative impact of selfish leechers more effectively than OSS. Based on these results we discuss the choice of seeding strategy and speculate as to how more effective BitTorrent-based file distribution applications can be built