1,549 research outputs found
We Don't Need Another Hero? The Impact of "Heroes" on Software Development
A software project has "Hero Developers" when 80% of contributions are
delivered by 20% of the developers. Are such heroes a good idea? Are too many
heroes bad for software quality? Is it better to have more/less heroes for
different kinds of projects? To answer these questions, we studied 661 open
source projects from Public open source software (OSS) Github and 171 projects
from an Enterprise Github.
We find that hero projects are very common. In fact, as projects grow in
size, nearly all project become hero projects. These findings motivated us to
look more closely at the effects of heroes on software development. Analysis
shows that the frequency to close issues and bugs are not significantly
affected by the presence of project type (Public or Enterprise). Similarly, the
time needed to resolve an issue/bug/enhancement is not affected by heroes or
project type. This is a surprising result since, before looking at the data, we
expected that increasing heroes on a project will slow down howfast that
project reacts to change. However, we do find a statistically significant
association between heroes, project types, and enhancement resolution rates.
Heroes do not affect enhancement resolution rates in Public projects. However,
in Enterprise projects, the more heroes increase the rate at which project
complete enhancements.
In summary, our empirical results call for a revision of a long-held truism
in software engineering. Software heroes are far more common and valuable than
suggested by the literature, particularly for medium to large Enterprise
developments. Organizations should reflect on better ways to find and retain
more of these heroesComment: 8 pages + 1 references, Accepted to International conference on
Software Engineering - Software Engineering in Practice, 201
GitHub: Factors Influencing Project Activity Levels
Open source software projects typically extend the capabilities of their software by incorporating code contributions from a diverse cross-section of developers. This GitHub structural path modelling study captures the current top 100 JavaScript projects in operation for at least one year or more. It draws on three theories (information integration, planned behavior, and social translucence) to help frame its comparative path approach, and to show ways to speed the collaborative development of GitHub OSS projects. It shows a project’s activity level increases with: (1) greater responder-group collaborative efforts, (2) increased numbers of major critical project version releases, and (3) the generation of further commits. However, the generation of additional forks negatively impacts overall project activity levels
Open source software GitHub ecosystem: a SEM approach
Open source software (OSS) is a collaborative effort. Getting affordable high-quality software with less probability of errors or fails is not far away. Thousands of open-source projects (termed repos) are alternatives to proprietary software development. More than two-thirds of companies are contributing to open source. Open source technologies like OpenStack, Docker and KVM are being used to build the next generation of digital infrastructure. An iconic example of OSS is 'GitHub' - a successful social site. GitHub is a hosting platform that host repositories (repos) based on the Git version control system.
GitHub is a knowledge-based workspace. It has several features that facilitate user communication and work integration. Through this thesis I employ data extracted from GitHub, and seek to better understand the OSS ecosystem, and to what extent each of its deployed elements affects the successful development of the OSS ecosystem. In addition, I investigate a repo's growth over different time periods to test the changing behavior of the repo. From our observations developers do not follow one development methodology when developing, and growing their project, and such developers tend to cherry-pick from differing available software methodologies.
GitHub API remains the main OSS location engaged to extract the metadata for this thesis's research. This extraction process is time-consuming - due to restrictive access limitations (even with authentication). I apply Structure Equation Modelling (termed SEM) to investigate the relative path relationships between the GitHub- deployed OSS elements, and I determine the path strength contributions of each element to determine the OSS repo's activity level.
SEM is a multivariate statistical analysis technique used to analyze structural relationships. This technique is the combination of factor analysis and multiple regression analysis. It is used to analyze the structural relationship between measured variables and/or latent constructs.
This thesis bridges the research gap around longitude OSS studies. It engages large sample-size OSS repo metadata sets, data-quality control, and multiple programming language comparisons. Querying GitHub is not direct (nor simple) yet querying for all valid repos remains important - as sometimes illegal, or unrepresentative outlier repos (which may even be quite popular) do arise, and these then need to be removed from each initial OSS's language-specific metadata set.
Eight top GitHub programming languages, (selected as the most forked repos) are separately engaged in this thesis's research. This thesis observes these eight metadata sets of GitHub repos. Over time, it measures the different repo contributions of the deployed elements of each metadata set.
The number of stars-provided to the repo delivers a weaker contribution to its software development processes. Sometimes forks work against the repo's progress by generating very minor negative total effects into its commit (activity) level, and by sometimes diluting the focus of the repo's software development strategies. Here, a fork may generate new ideas, create a new repo, and then draw some original repo developers off into this new software development direction, thus retarding the original repo's commit (activity) level progression.
Multiple intermittent and minor version releases exert lesser GitHub JavaScript repo commit (or activity) changes because they often involve only slight OSS improvements, and because they only require minimal commit/commits contributions. More commit(s) also bring more changes to documentation, and again the GitHub OSS repo's commit (activity) level rises.
There are both direct and indirect drivers of the repo's OSS activity. Pulls and commits are the strongest drivers. This suggests creating higher levels of pull requests is likely a preferred prime target consideration for the repo creator's core team of developers.
This study offers a big data direction for future work. It allows for the deployment of more sophisticated statistical comparison techniques. It offers further indications around the internal and broad relationships that likely exist between GitHub's OSS big data. Its data extraction ideas suggest a link through to business/consumer consumption, and possibly how these may be connected using improved repo search algorithms that release individual business value components
The Effect of Knowledge Sharing on Open Source Contribution: A Multi-platform Perspective
Open source software (OSS) community plays a key role in contemporary software development. However, there is a need to better understand the factors which influence individuals’ voluntary contribution on open source platforms. In this paper, we investigate how different types of knowledge sharing affect an individuals’ contribution towards open source projects. We further refine knowledge sharing taxonomy by classifying explicit knowledge sharing into two sub-types – strong explicit knowledge sharing and weak explicit knowledge sharing, depending on the extent of interpersonal interaction required for knowledge transfer. In this paper, we take a multi-platform perspective – we collect data from GitHub – the biggest online platform to host open source software development, and Gitter – an open source instant messaging and chat room application designed for developers. We map the user identities across these two platforms. We analyze monthly panel data for the year 2017 consisting of 3,695 individuals. The results demonstrate that both strong and weak explicit knowledge sharing have positive relationship with open source contribution. Moreover, the tacit knowledge sharing positively moderates these relationships. Our paper extends the theoretical understanding of different knowledge sharing types and their inter-relationship, and their respective impact on contribution. Our findings have important implications for the OSS community, and especially help OSS platform designers get a better understanding of the symbiosis between different OSS platforms
Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects
Pull Requests (PRs) that are neither progressed nor resolved clutter the list
of PRs, making it difficult for the maintainers to manage and prioritize
unresolved PRs. To automatically track, follow up, and close such inactive PRs,
Stale bot was introduced by GitHub. Despite its increasing adoption, there are
ongoing debates on whether using Stale bot alleviates or exacerbates the
problem of inactive PRs. To better understand if and how Stale bot helps
projects in their pull-based development workflow, we perform an empirical
study of 20 large and popular open-source projects. We find that Stale bot can
help deal with a backlog of unresolved PRs as the projects closed more PRs
within the first few months of adoption. Moreover, Stale bot can help improve
the efficiency of the PR review process as the projects reviewed PRs that ended
up merged and resolved PRs that ended up closed faster after the adoption.
However, Stale bot can also negatively affect the contributors as the projects
experienced a considerable decrease in their number of active contributors
after the adoption. Therefore, relying solely on Stale bot to deal with
inactive PRs may lead to decreased community engagement and an increased
probability of contributor abandonment.Comment: Manuscript submitted to ACM Transactions on Software Engineering and
Methodolog
Coordinating Interdependencies in an Open Source Software Project: A Replication of Lindberg, et al.
The current study is a full replication (conceptual and empirical) of “Coordinating Interdependencies in Online Communities: A Study of an Open Source Software Project” Lindberg et al (2016), which addresses the question of how OSS communities address unresolved interdependencies. Following the original study, we analyze project development data, archived in the GitHub repository, for the OSS project Rubinius. The analysis explores relationships among development and developer interdependencies as well as activity and order variation. Further, we extend the original study by examining the core relationships in the original study and investigating the external generalizability of the results by replicating the analysis on three analogous OSS projects: JRuby, mruby, and RubyMotion. These offer an opportunity to evaluate the generalizability of the original study to projects of different sizes and amount of activity, yet similar otherwise to the project in the original study. Another extension is the use of an additional control variable, length of activity sequence, which proves to have substantial implications of the study’s focal relationships. We find that three out of the four projects we analyze support the findings of the original study as it pertains to four relationships in the original study: order variation and developer interdependencies, activity variation and developer interdependencies, order variation and development interdependencies, and development and developer interdependencies. We also discuss the implications of our findings, especially in cases where the replication results differ from those in the original study and offer suggestions for future research that can help advance this stream of research
Analysis of Textual and Non-Textual Sources of Sentiment in Github
Github is a collaborative platform that is used primarily for the development of software. In order to gain more insight into how teams work on Github, we wish to analyze the sentiment content available via communication on the platform.
In order to do so, we first use existing sentiment analysis classifiers and compare the Github data to other social networks, Twitter and Reddit. By identifying that users are able to provide reactions to other users posts on Github, we use this as an indicator or label of sentiment information. Using this we first investigate whether repeated user interaction has an impact on sentiment and find that it is positively correlated to the amount of prior interaction as well as the directness of interaction. We also investigate if metrics corresponding to a user's status or power in a project correlate with positive sentiment received and find that it does.
We then build sentiment classifiers using both textual and non textual information, both which outperform the generic sentiment scorer systems. In addition we show that a sentiment classifier built using only non-textual information can perform at a comparable level to that of a text-based classifier, indicating that there is significant sentiment information contained in non-textual information in the Github network
Modeling User-Affected Software Properties for Open Source Software Supply Chains
Background: Open Source Software development community relies heavily on users of the software and contributors outside of the core developers to produce top-quality software and provide long-term support. However, the relationship between a software and its contributors in terms of exactly how they are related through dependencies and how the users of a software affect many of its properties are not very well understood.
Aim: My research covers a number of aspects related to answering the overarching question of modeling the software properties affected by users and the supply chain structure of software ecosystems, viz. 1) Understanding how software usage affect its perceived quality; 2) Estimating the effects of indirect usage (e.g. dependent packages) on software popularity; 3) Investigating the patch submission and issue creation patterns of external contributors; 4) Examining how the patch acceptance probability is related to the contributors\u27 characteristics. 5) A related topic, the identification of bots that commit code, aimed at improving the accuracy of these and other similar studies was also investigated.
Methodology: Most of the Research Questions are addressed by studying the NPM ecosystem, with data from various sources like the World of Code, GHTorrent, and the GiHub API. Different supervised and unsupervised machine learning models, including Regression, Random Forest, Bayesian Networks, and clustering, were used to answer appropriate questions.
Results: 1) Software usage affects its perceived quality even after accounting for code complexity measures. 2) The number of dependents and dependencies of a software were observed to be able to predict the change in its popularity with good accuracy. 3) Users interact (contribute issues or patches) primarily with their direct dependencies, and rarely with transitive dependencies. 4) A user\u27s earlier interaction with the repository to which they are contributing a patch, and their familiarity with related topics were important predictors impacting the chance of a pull request getting accepted. 5) Developed BIMAN, a systematic methodology for identifying bots.
Conclusion: Different aspects of how users and their characteristics affect different software properties were analyzed, which should lead to a better understanding of the complex interaction between software developers and users/ contributors
- …