Search CORE

805 research outputs found

Recommended from our members

Similarities, challenges and opportunities of wikipedia content and open source projects

Author: Capiluppi A
Publication venue: 'Wiley'
Publication date: 04/09/2012
Field of study

Copyright @ 2012 John Wiley & Sons, Ltd.Several years of research and evidence have demonstrated that Open Source Software (OSS) portals often contain a large amount of software projects that simply do not evolve, developed by relatively small communities, struggling to attract a sustained number of contributors. These portals have started to increasingly act as a storage for abandoned projects, and researchers and practitioners should try and point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the requested expertise, many projects reliant on content and contributions by users undergo a similar evolution, and follow similar patterns: when a project fails to attract contributors, it appears to be not evolving, or abandoned. Far from a negative finding, even those projects could provide valuable content that should be harvested and identified based on common characteristics: by using the attributes of “usefulness” and “modularity” we isolate valuable content in both Wikipedia pages and OSS projects

Brunel University Research Archive

A Quantitative Analysis of Open Source Software Code Quality: Insights from Metric Distributions

Author: Chen Bichao
Guo Yekai
He Yuejiang
Jin Siyuan
Li Ziyuan
Xia Yong
Zhang Mianmian
Zhu Bing
Publication venue
Publication date: 22/07/2023
Field of study

Code quality is a crucial construct in open-source software (OSS) with three dimensions: maintainability, reliability, and functionality. To accurately measure them, we divide 20 distinct metrics into two types: 1) threshold-type metrics that influence code quality in a monotonic manner; 2) non-threshold-type metrics that lack a monotonic relationship to evaluate. We propose a distribution-based method to provide scores for metrics, which demonstrates great explainability on OSS adoption. Our empirical analysis includes more than 36,460 OSS projects and their raw metrics from SonarQube and CK. Our work contributes to the understanding of the multi-dimensional construct of code quality and its metric measurements

arXiv.org e-Print Archive

Recommended from our members

An empirical study of package coupling in Java open-source

Author: Mubarak Asma
Publication venue: Brunel University, School of Information Systems, Computing and Mathematics
Publication date: 01/01/2010
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Excessive coupling between object-oriented classes in systems is generally acknowledged as harmful and is recognised as a maintenance problem that can result in a higher propensity for faults in systems and a „stored up‟ future problem. Characterisation and understanding coupling at different levels of abstraction is therefore important for both the project manager and developer both of whom have a vested interest in software quality. In this Thesis, coupling trends are empirically investigated over multiple versions of seven Java open-source systems (OSS). The first investigation explores the trends in longitudinal changes to open-source systems given by six coupling metrics. Coupling trends are then explored from the perspective of: the relationship between removed classes and their coupling with other classes in the same package; the relationships between coupling and 'warnings’ in packages and the time interval between versions in Java OSS; the relationship between some of these coupling metrics are also explored. Finally, the existence of an 80/20 rule for the coupling metrics is inspected. Results suggest that developer activity comprises a set of high and low periods (peak and trough‟ effect) evident as a system evolves. Findings also demonstrate that addition of coupling may have beneficial effects on a system, particularly if they are added as new functionality through the package Java feature. The fan-in and fan-out coupling metrics reveal particular features and exhibited a wide range of traits in the classes depending on their high or low values; finally, we revealed that one metric (fan-in) is the only metric that appears consistently to exhibit an 80/20 (Pareto) relationship

Brunel University Research Archive

"My GitHub Sponsors profile is live!" Investigating the Impact of Twitter/X Mentions on GitHub Sponsors

Author: Fan Youmei
Hata Hideaki
Matsumoto Kenichi
Treude Christoph
Xiao Tao
Publication venue
Publication date: 05/01/2024
Field of study

GitHub Sponsors was launched in 2019, enabling donations to open-source software developers to provide financial support, as per GitHub's slogan: "Invest in the projects you depend on". However, a 2022 study on GitHub Sponsors found that only two-fifths of developers who were seeking sponsorship received a donation. The study found that, other than internal actions (such as offering perks to sponsors), developers had advertised their GitHub Sponsors profiles on social media, such as Twitter (also known as X). Therefore, in this work, we investigate the impact of tweets that contain links to GitHub Sponsors profiles on sponsorship, as well as their reception on Twitter/X. We further characterize these tweets to understand their context and find that (1) such tweets have the impact of increasing the number of sponsors acquired, (2) compared to other donation platforms such as Open Collective and Patreon, GitHub Sponsors has significantly fewer interactions but is more visible on Twitter/X, and (3) developers tend to contribute more to open-source software during the week of posting such tweets. Our findings are the first step toward investigating the impact of social media on obtaining funding to sustain open-source software

arXiv.org e-Print Archive

IDENTIFICATION AND QUANTIFICATION OF VARIABILITY MEASURES AFFECTING CODE REUSABILITY IN OPEN SOURCE ENVIRONMENT

Author: E-AMIN FAZAL
Publication venue
Publication date: 01/03/2012
Field of study

Open source software (OSS) is one of the emerging areas in software engineering, and is gaining the interest of the software development community. OSS was started as a movement, and for many years software developers contributed to it as their hobby (non commercial purpose). Now, OSS components are being reused in CBSD (commercial purpose). However, recently, the use of OSS in SPL is envisioned recently by software engineering researchers, thus bringing it into a new arena. Being an emerging research area, it demands exploratory study to explore the dimensions of this phenomenon. Furthermore, there is a need to assess the reusability of OSS which is the focal point of these disciplines (CBSE, SPL, and OSS). In this research, a mixed method based approach is employed which is specifically 'partially mixed sequential dominant study'. It involves both qualitative (interviews) and quantitative phases (survey and experiment). During the qualitative phase seven respondents were involved, sample size of survey was 396, and three experiments were conducted. The main contribution of this study is results of exploration of the phenomenon 'reuse of OSS in reuse intensive software development'. The findings include 7 categories and 39 dimensions. One of the dimension factors affecting reusability was carried to the quantitative phase (survey and experiment). On basis of the findings, proposal for reusability attribute model was presented at class and package level. Variability is one of the newly identified attribute of reusability. A comprehensive theoretical analysis of variability implementation mechanisms is conducted to propose metrics for its assessment. The reusability attribute model is validated by statistical analysis of I 03 classes and 77 packages. An evolutionary reusability analysis of two open source software was conducted, where different versions of software are analyzed for their reusability. The results show a positive correlation between variability and reusability at package level and validate the other identified attributes. The results would be helpful to conduct further studies in this area

UTPedia

Automated Repair of Security Vulnerabilities using Coverage-guided Fuzzing

Author: João Fernando da Costa Meireles Barbosa
Publication venue
Publication date: 14/07/2021
Field of study

Repositório Aberto da Universidade do Porto

Who is the Real Hero? Measuring Developer Contribution via Multi-dimensional Data Integration

Author: Liu Chengwei
Liu Yang
Sun Yuqiang
Xu Zhengzi
Zhang Yiran
Publication venue
Publication date: 17/08/2023
Field of study

Proper incentives are important for motivating developers in open-source communities, which is crucial for maintaining the development of open-source software healthy. To provide such incentives, an accurate and objective developer contribution measurement method is needed. However, existing methods rely heavily on manual peer review, lacking objectivity and transparency. The metrics of some automated works about effort estimation use only syntax-level or even text-level information, such as changed lines of code, which lack robustness. Furthermore, some works about identifying core developers provide only a qualitative understanding without a quantitative score or have some project-specific parameters, which makes them not practical in real-world projects. To this end, we propose CValue, a multidimensional information fusion-based approach to measure developer contributions. CValue extracts both syntax and semantic information from the source code changes in four dimensions: modification amount, understandability, inter-function and intra-function impact of modification. It fuses the information to produce the contribution score for each of the commits in the projects. Experimental results show that CValue outperforms other approaches by 19.59% on 10 real-world projects with manually labeled ground truth. We validated and proved that the performance of CValue, which takes 83.39 seconds per commit, is acceptable to be applied in real-world projects. Furthermore, we performed a large-scale experiment on 174 projects and detected 2,282 developers having inflated commits. Of these, 2,050 developers did not make any syntax contribution; and 103 were identified as bots

arXiv.org e-Print Archive

Identification-method research for open-source software ecosystems

Author: Liao Zhifang
Liu Hui
Liu Shengzong
Wang Ningwei
Zhang Qi
Zhang Yan
Publication venue: 'MDPI AG'
Publication date: 01/02/2019
Field of study

In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

ResearchOnline@GCU

Analysing BitTorrent's seeding strategies

Author: Chen Xinuo
Jarvis Stephen A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

BitTorrent is a typical peer-to-peer (P2P) file distribution application that has gained tremendous popularity in recent years. A considerable amount of research exists regarding BitTorrent’s choking algorithm, which has proved to be effective in preventing freeriders. However, the effect of the seeding strategy on the resistance to freeriders in BitTorrent has been largely overlooked. In addition to this, a category of selfish leechers (termed exploiters), who leave the overlay immediately after completion, has never been taken into account in the previous research. In this paper two popular seeding strategies, the Original Seeding Strategy (OSS) and the Time- based Seeding Strategy (TSS), are chosen and we study via mathematical models and simulation their effects on freeriders and exploiters in BitTorrent networks. The mathematical model is verified and we discover that both freeriders and exploiters impact on system performance, despite the seeding strategy that is employed. However, a selfish-leechers threshold is identified; once the threshold is exceeded, we find that TSS outperforms OSS – that is, TSS reduces the negative impact of selfish lechers more effectively than OSS. Based on these results we discuss the choice of seeding strategy and speculate as to how more effective BitTorrent-based file distribu- tion applications can be built

Crossref

University of Birmingham Research Portal

Warwick Research Archives Portal Repository