
    We Don't Need Another Hero? The Impact of "Heroes" on Software Development

    A software project has "Hero Developers" when 80% of contributions are delivered by 20% of the developers. Are such heroes a good idea? Are too many heroes bad for software quality? Is it better to have more or fewer heroes for different kinds of projects? To answer these questions, we studied 661 open source projects from Public open source software (OSS) GitHub and 171 projects from an Enterprise GitHub. We find that hero projects are very common. In fact, as projects grow in size, nearly all projects become hero projects. These findings motivated us to look more closely at the effects of heroes on software development. Analysis shows that the frequency of closing issues and bugs is not significantly affected by project type (Public or Enterprise). Similarly, the time needed to resolve an issue/bug/enhancement is not affected by heroes or project type. This is a surprising result since, before looking at the data, we expected that increasing heroes on a project would slow down how fast that project reacts to change. However, we do find a statistically significant association between heroes, project types, and enhancement resolution rates. Heroes do not affect enhancement resolution rates in Public projects. However, in Enterprise projects, more heroes increase the rate at which projects complete enhancements. In summary, our empirical results call for a revision of a long-held truism in software engineering. Software heroes are far more common and valuable than suggested by the literature, particularly for medium to large Enterprise developments. Organizations should reflect on better ways to find and retain more of these heroes. Comment: 8 pages + 1 references, Accepted to International conference on Software Engineering - Software Engineering in Practice, 201
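The 80/20 definition in the abstract above can be illustrated with a minimal sketch. It assumes that "contributions" are measured as commit counts per developer and that the top 20% of developers is rounded up to a whole person; the function name `is_hero_project` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def is_hero_project(commits_by_dev, dev_percent=20, contribution_percent=80):
    """Return True if the top `dev_percent`% of developers deliver at least
    `contribution_percent`% of all commits (assumed proxy for contributions)."""
    counts = sorted(commits_by_dev.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return False  # no contributions, so no heroes
    # Assumption: round the size of the top group up, with at least one developer.
    top_n = max(1, math.ceil(len(counts) * dev_percent / 100))
    # Integer comparison avoids floating-point rounding at the threshold.
    return sum(counts[:top_n]) * 100 >= contribution_percent * total


# Hypothetical data: one developer out of five delivers 90 of 100 commits.
hero = is_hero_project({"a": 90, "b": 4, "c": 3, "d": 2, "e": 1})
flat = is_hero_project({"a": 20, "b": 20, "c": 20, "d": 20, "e": 20})
```

With this sample data the first project meets the threshold and the second does not. Other contribution measures (lines changed, issues closed) would slot into the same check.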

    GitHub: Factors Influencing Project Activity Levels

    Open source software projects typically extend the capabilities of their software by incorporating code contributions from a diverse cross-section of developers. This GitHub structural path modelling study captures the current top 100 JavaScript projects that have been in operation for at least one year. It draws on three theories (information integration, planned behavior, and social translucence) to help frame its comparative path approach, and to show ways to speed the collaborative development of GitHub OSS projects. It shows that a project’s activity level increases with: (1) greater responder-group collaborative efforts, (2) increased numbers of major critical project version releases, and (3) the generation of further commits. However, the generation of additional forks negatively impacts overall project activity levels.

    Open source software GitHub ecosystem: a SEM approach

    Open source software (OSS) is a collaborative effort. Affordable, high-quality software with a lower probability of errors or failures is within reach. Thousands of open-source projects (termed repos) are alternatives to proprietary software development. More than two-thirds of companies are contributing to open source. Open source technologies like OpenStack, Docker and KVM are being used to build the next generation of digital infrastructure. An iconic example of OSS is GitHub, a successful social site. GitHub is a hosting platform that hosts repositories (repos) based on the Git version control system. GitHub is a knowledge-based workspace. It has several features that facilitate user communication and work integration. Through this thesis I employ data extracted from GitHub, and seek to better understand the OSS ecosystem, and to what extent each of its deployed elements affects the successful development of the OSS ecosystem. In addition, I investigate a repo's growth over different time periods to test the changing behavior of the repo. From my observations, developers do not follow one development methodology when developing and growing their project; instead, they tend to cherry-pick from the available software methodologies. The GitHub API remains the main source used to extract the metadata for this thesis's research. This extraction process is time-consuming due to restrictive access limitations (even with authentication). I apply Structural Equation Modelling (termed SEM) to investigate the relative path relationships between the GitHub-deployed OSS elements, and I determine the path strength contributions of each element to the OSS repo's activity level. SEM is a multivariate statistical analysis technique that combines factor analysis and multiple regression analysis. It is used to analyze the structural relationships between measured variables and/or latent constructs. This thesis bridges the research gap around longitudinal OSS studies. It engages large sample-size OSS repo metadata sets, data-quality control, and multiple programming language comparisons. Querying GitHub is neither direct nor simple, yet querying for all valid repos remains important, as illegal or unrepresentative outlier repos (which may even be quite popular) sometimes arise, and these then need to be removed from each initial language-specific OSS metadata set. Eight top GitHub programming languages (selected as the most forked repos) are separately engaged in this thesis's research. This thesis observes these eight metadata sets of GitHub repos. Over time, it measures the different repo contributions of the deployed elements of each metadata set. The number of stars given to a repo makes a weaker contribution to its software development processes. Sometimes forks work against the repo's progress by generating very minor negative total effects on its commit (activity) level, and by sometimes diluting the focus of the repo's software development strategies. Here, a fork may generate new ideas, create a new repo, and then draw some original repo developers off into this new software development direction, thus retarding the original repo's commit (activity) level progression. Multiple intermittent and minor version releases exert lesser changes on a GitHub JavaScript repo's commit (activity) level because they often involve only slight OSS improvements, and because they only require minimal commit contributions. More commits also bring more changes to documentation, and again the GitHub OSS repo's commit (activity) level rises. There are both direct and indirect drivers of the repo's OSS activity. Pulls and commits are the strongest drivers. This suggests that generating more pull requests is likely a prime target for the repo creator's core team of developers. This study offers a big data direction for future work. It allows for the deployment of more sophisticated statistical comparison techniques. It offers further indications around the internal and broad relationships that likely exist between GitHub's OSS big data. Its data extraction ideas suggest a link through to business/consumer consumption, and possibly how these may be connected using improved repo search algorithms that release individual business value components.

    The Effect of Knowledge Sharing on Open Source Contribution: A Multi-platform Perspective

    The open source software (OSS) community plays a key role in contemporary software development. However, there is a need to better understand the factors which influence individuals’ voluntary contribution on open source platforms. In this paper, we investigate how different types of knowledge sharing affect an individual’s contribution towards open source projects. We further refine the knowledge sharing taxonomy by classifying explicit knowledge sharing into two sub-types, strong explicit knowledge sharing and weak explicit knowledge sharing, depending on the extent of interpersonal interaction required for knowledge transfer. We take a multi-platform perspective: we collect data from GitHub, the biggest online platform to host open source software development, and Gitter, an open source instant messaging and chat room application designed for developers. We map the user identities across these two platforms. We analyze monthly panel data for the year 2017 consisting of 3,695 individuals. The results demonstrate that both strong and weak explicit knowledge sharing have a positive relationship with open source contribution. Moreover, tacit knowledge sharing positively moderates these relationships. Our paper extends the theoretical understanding of different knowledge sharing types, their inter-relationship, and their respective impact on contribution. Our findings have important implications for the OSS community, and especially help OSS platform designers get a better understanding of the symbiosis between different OSS platforms.

    Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects

    Pull Requests (PRs) that are neither progressed nor resolved clutter the list of PRs, making it difficult for the maintainers to manage and prioritize unresolved PRs. To automatically track, follow up, and close such inactive PRs, Stale bot was introduced by GitHub. Despite its increasing adoption, there are ongoing debates on whether using Stale bot alleviates or exacerbates the problem of inactive PRs. To better understand if and how Stale bot helps projects in their pull-based development workflow, we perform an empirical study of 20 large and popular open-source projects. We find that Stale bot can help deal with a backlog of unresolved PRs as the projects closed more PRs within the first few months of adoption. Moreover, Stale bot can help improve the efficiency of the PR review process as the projects reviewed PRs that ended up merged and resolved PRs that ended up closed faster after the adoption. However, Stale bot can also negatively affect the contributors as the projects experienced a considerable decrease in their number of active contributors after the adoption. Therefore, relying solely on Stale bot to deal with inactive PRs may lead to decreased community engagement and an increased probability of contributor abandonment. Comment: Manuscript submitted to ACM Transactions on Software Engineering and Methodolog
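The idea of tracking and closing inactive PRs can be pictured with a small sketch. This is not Stale bot's implementation: the function `classify_pr`, the threshold names, and the 60-day and 7-day values are assumptions chosen for illustration, and the logic collapses the bot's label-then-wait behaviour into a single elapsed-idle-time check.

```python
from datetime import datetime, timedelta


def classify_pr(last_activity, now, days_until_stale=60, days_until_close=7):
    """Classify a PR by idle time (hypothetical thresholds, in days)."""
    idle = now - last_activity
    # Idle past both windows: the PR would be closed.
    if idle >= timedelta(days=days_until_stale + days_until_close):
        return "close"
    # Idle past the first window only: the PR would be marked stale.
    if idle >= timedelta(days=days_until_stale):
        return "stale"
    return "active"


# Hypothetical timestamps for three PRs, evaluated on 2024-01-01.
now = datetime(2024, 1, 1)
recent = classify_pr(datetime(2023, 12, 25), now)   # idle 7 days
lagging = classify_pr(datetime(2023, 11, 1), now)   # idle 61 days
dormant = classify_pr(datetime(2023, 10, 1), now)   # idle 92 days
```

Shorter windows clear a backlog faster but leave returning contributors less time, which is the trade-off the study examines.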

    Coordinating Interdependencies in an Open Source Software Project: A Replication of Lindberg, et al.

    The current study is a full replication (conceptual and empirical) of “Coordinating Interdependencies in Online Communities: A Study of an Open Source Software Project” by Lindberg et al. (2016), which addresses the question of how OSS communities address unresolved interdependencies. Following the original study, we analyze project development data, archived in the GitHub repository, for the OSS project Rubinius. The analysis explores relationships among development and developer interdependencies as well as activity and order variation. Further, we extend the original study by examining its core relationships and investigating the external generalizability of the results by replicating the analysis on three analogous OSS projects: JRuby, mruby, and RubyMotion. These offer an opportunity to evaluate the generalizability of the original study to projects of different sizes and amounts of activity, yet otherwise similar to the project in the original study. Another extension is the use of an additional control variable, length of activity sequence, which proves to have substantial implications for the study’s focal relationships. We find that three out of the four projects we analyze support the findings of the original study as they pertain to four relationships: order variation and developer interdependencies, activity variation and developer interdependencies, order variation and development interdependencies, and development and developer interdependencies. We also discuss the implications of our findings, especially in cases where the replication results differ from those in the original study, and offer suggestions for future research that can help advance this stream of research.

    Analysis of Textual and Non-Textual Sources of Sentiment in Github

    GitHub is a collaborative platform that is used primarily for the development of software. In order to gain more insight into how teams work on GitHub, we wish to analyze the sentiment content available via communication on the platform. To do so, we first use existing sentiment analysis classifiers and compare the GitHub data to two other social networks, Twitter and Reddit. Because users are able to provide reactions to other users' posts on GitHub, we use these reactions as an indicator or label of sentiment information. Using this, we first investigate whether repeated user interaction has an impact on sentiment and find that sentiment is positively correlated with the amount of prior interaction as well as the directness of interaction. We also investigate whether metrics corresponding to a user's status or power in a project correlate with positive sentiment received and find that they do. We then build sentiment classifiers using both textual and non-textual information, both of which outperform the generic sentiment scorer systems. In addition, we show that a sentiment classifier built using only non-textual information can perform at a comparable level to that of a text-based classifier, indicating that there is significant sentiment information contained in non-textual information in the GitHub network.

    Modeling User-Affected Software Properties for Open Source Software Supply Chains

    Background: The Open Source Software development community relies heavily on users of the software and contributors outside of the core developers to produce top-quality software and provide long-term support. However, the relationship between a piece of software and its contributors, in terms of exactly how they are related through dependencies and how the users of the software affect many of its properties, is not very well understood. Aim: My research covers a number of aspects related to answering the overarching question of modeling the software properties affected by users and the supply chain structure of software ecosystems, viz. 1) Understanding how software usage affects its perceived quality; 2) Estimating the effects of indirect usage (e.g. dependent packages) on software popularity; 3) Investigating the patch submission and issue creation patterns of external contributors; 4) Examining how the patch acceptance probability is related to the contributors' characteristics; 5) A related topic, the identification of bots that commit code, aimed at improving the accuracy of these and other similar studies, was also investigated. Methodology: Most of the Research Questions are addressed by studying the NPM ecosystem, with data from various sources like the World of Code, GHTorrent, and the GitHub API. Different supervised and unsupervised machine learning models, including Regression, Random Forest, Bayesian Networks, and clustering, were used to answer appropriate questions. Results: 1) Software usage affects its perceived quality even after accounting for code complexity measures. 2) The number of dependents and dependencies of a piece of software were observed to be able to predict the change in its popularity with good accuracy. 3) Users interact (contribute issues or patches) primarily with their direct dependencies, and rarely with transitive dependencies. 4) A user's earlier interaction with the repository to which they are contributing a patch, and their familiarity with related topics, were important predictors of the chance of a pull request getting accepted. 5) Developed BIMAN, a systematic methodology for identifying bots. Conclusion: Different aspects of how users and their characteristics affect different software properties were analyzed, which should lead to a better understanding of the complex interaction between software developers and users/contributors.