396 research outputs found

    Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

    The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also perform permutation feature importance and SHAP analyses to understand the importance and impact of different features on the predicted response latencies. Our best-performing models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors, compared to a no-skill classifier across the projects. Our findings indicate that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses. Comment: Manuscript submitted to IEEE Transactions on Software Engineering (TSE)
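To make the "improvement over a no-skill classifier" figures in the abstract above concrete, the following is a minimal sketch of how such a relative improvement could be computed. It assumes a binary setting in which a no-skill classifier scores 0.5 in AUC-ROC and the positive-class prevalence in AUC-PR; the abstract does not state the paper's exact procedure, and the function names `no_skill_baselines` and `relative_improvement` are hypothetical.

```python
from typing import Dict, Sequence


def no_skill_baselines(labels: Sequence[int]) -> Dict[str, float]:
    """Baseline scores of a classifier with no skill (binary labels, 1 = positive).

    Assumption: AUC-ROC of a no-skill classifier is 0.5, and its AUC-PR
    equals the fraction of positive examples in the data.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    positives = sum(1 for y in labels if y == 1)
    return {"auc_roc": 0.5, "auc_pr": positives / len(labels)}


def relative_improvement(score: float, baseline: float) -> float:
    """Percentage improvement of a model score over a baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (score - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Illustrative labels only: one positive out of four examples.
    baselines = no_skill_baselines([1, 0, 0, 0])
    # A model AUC-ROC of 0.665 is a made-up value chosen for illustration.
    print(relative_improvement(0.665, baselines["auc_roc"]))
```

Under these assumptions, an AUC-ROC of 0.665 would correspond to roughly a 33% improvement over the 0.5 baseline; the value 0.665 is illustrative and not taken from the paper.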

    Nudge: Accelerating Overdue Pull Requests Towards Completion

    Pull requests are a key part of the collaborative software development and code review process today. However, pull requests can also slow down the software development process when the reviewer(s) or the author do not actively engage with the pull request. In this work, we design an end-to-end service, Nudge, for accelerating overdue pull requests towards completion by reminding the author or the reviewer(s) to engage with their overdue pull requests. First, we use models based on effort estimation and machine learning to predict the completion time for a given pull request. Second, we use activity detection to reduce false positives. Lastly, we use dependency determination to understand the blocker of the pull request and nudge the appropriate actor (author or reviewer(s)). We also perform a correlation analysis to understand the statistical relationship between pull request completion times and various pull request and developer related attributes. Nudge has been deployed on 147 repositories at Microsoft since 2019. We perform a large-scale evaluation based on the implicit and explicit feedback we received from sending Nudge notifications on 8,500 pull requests. We observe a significant reduction in completion time, by over 60%, for pull requests that were nudged, thus increasing the efficiency of the code review process and accelerating pull request progression.

    Integration of Fault Localization into your GitHub Repository

    With the increased complexity and scale of software, there is a strong demand for techniques that guide software engineers to locate faults with as little human intervention as possible. The purpose of this dissertation is to look into the usage of Spectrum-based Fault Localization approaches to help discover faults in Java programs, as well as the use of bots in the software development lifecycle. Spectrum-based Fault Localization techniques were chosen from the research area of software fault localization because of their low execution costs and popularity. Three tools (GZoltar, FLACOCO, and Jaguar) stood out as the top choices for spectrum-based fault localization in Java according to the research, and even though all produced comparable outcomes, GZoltar was preferred. A GitHub Action was created that, when integrated with GZoltar, allows analysis of Spectrum-based Fault Localization reports in any Java repository on GitHub. The outcome is a detailed report of potentially faulty lines of code, customizable by the user. This Action was evaluated on both a sample repository and several open-source projects. While integration was successful with the sample repository, limitations of GZoltar hinder its integration with most open-source projects, highlighting the need for updates and further compatibility testing.
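As an illustration of the spectrum-based fault localization idea mentioned in the abstract above, here is a minimal sketch that ranks lines by the Ochiai suspiciousness score, a commonly used SBFL formula. The abstract does not say which formula the GitHub Action reports, so the choice of Ochiai is an assumption, and the function names `ochiai` and `rank_lines` and the input shapes are hypothetical rather than part of GZoltar's API.

```python
import math
from typing import Dict, List, Set, Tuple


def ochiai(ef: int, ep: int, nf: int) -> float:
    """Ochiai score: ef / sqrt((ef + nf) * (ef + ep)).

    ef: failing tests that execute the line
    ep: passing tests that execute the line
    nf: failing tests that do not execute the line
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_lines(
    coverage: Dict[str, Set[str]], outcomes: Dict[str, bool]
) -> List[Tuple[str, float]]:
    """Rank line ids from most to least suspicious.

    coverage maps a test name to the set of line ids it executed;
    outcomes maps a test name to True if the test passed.
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    lines = set().union(*coverage.values())
    scores = {}
    for line in lines:
        ef = sum(1 for t, cov in coverage.items() if line in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if line in cov and outcomes[t])
        scores[line] = ochiai(ef, ep, total_failed - ef)
    # Sort by descending score, breaking ties by line id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up coverage data: one failing test (t1) and two passing tests.
    coverage = {"t1": {"a", "b"}, "t2": {"a"}, "t3": {"b", "c"}}
    outcomes = {"t1": False, "t2": True, "t3": True}
    for line, score in rank_lines(coverage, outcomes):
        print(line, round(score, 3))
```

Lines executed mostly by failing tests score close to 1, while lines never executed by a failing test score 0, which is what lets a report list the potentially faulty lines first.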

    Modeling User-Affected Software Properties for Open Source Software Supply Chains

    Background: The Open Source Software development community relies heavily on users of the software and contributors outside of the core developers to produce top-quality software and provide long-term support. However, the relationship between a software package and its contributors, in terms of exactly how they are related through dependencies and how the users of the software affect many of its properties, is not very well understood. Aim: My research covers a number of aspects related to answering the overarching question of modeling the software properties affected by users and the supply chain structure of software ecosystems, viz. 1) Understanding how software usage affects its perceived quality; 2) Estimating the effects of indirect usage (e.g. dependent packages) on software popularity; 3) Investigating the patch submission and issue creation patterns of external contributors; 4) Examining how the patch acceptance probability is related to the contributors' characteristics; 5) A related topic, the identification of bots that commit code, aimed at improving the accuracy of these and other similar studies, was also investigated. Methodology: Most of the Research Questions are addressed by studying the NPM ecosystem, with data from various sources like the World of Code, GHTorrent, and the GitHub API. Different supervised and unsupervised machine learning models, including Regression, Random Forest, Bayesian Networks, and clustering, were used to answer appropriate questions. Results: 1) Software usage affects its perceived quality even after accounting for code complexity measures. 2) The number of dependents and dependencies of a software package was observed to predict the change in its popularity with good accuracy. 3) Users interact (contribute issues or patches) primarily with their direct dependencies, and rarely with transitive dependencies. 4) A user's earlier interaction with the repository to which they are contributing a patch, and their familiarity with related topics, were important predictors of the chance of a pull request getting accepted. 5) Developed BIMAN, a systematic methodology for identifying bots. Conclusion: Different aspects of how users and their characteristics affect different software properties were analyzed, which should lead to a better understanding of the complex interaction between software developers and users/contributors.

    Pull request latency explained: an empirical overview

    Pull request latency evaluation is an essential application of effort evaluation in the pull-based development scenario. It can help reviewers sort the pull request queue, remind developers about the review processing time, speed up the review process, and accelerate software development. There is a lack of work that systematically organizes the factors that affect pull request latency. Also, there is no related work discussing the differences and variations in characteristics in different scenarios and contexts. In this paper, we collected relevant factors through a literature review. Then we assessed their relative importance in five scenarios and six different contexts using a mixed-effects linear regression model. The most important factors differ across scenarios. The length of the description is most important when pull requests are submitted. The existence of comments is most important when closing pull requests, when using CI tools, and when the contributor and the integrator are different. When comments exist, the latency of the first comment is the most important. Meanwhile, the influence of factors may change in different contexts. For example, the number of commits in a pull request has a more significant impact on pull request latency when closing than when submitting, due to changes in contributions brought about by the review process. Both human and bot comments are positively correlated with pull request latency. In contrast, the bot's first comments are more strongly correlated with latency, but the number of comments is less correlated. Future research and tool implementation need to consider the impact of different contexts. Researchers can conduct related studies based on our publicly available datasets and replication scripts.

    Nip it in the Bud: Moderation Strategies in Open Source Software Projects and the Role of Bots

    Much of our modern digital infrastructure relies critically upon open source software. The communities responsible for building this cyberinfrastructure require maintenance and moderation, which is often supported by volunteer efforts. Moderation, as a non-technical form of labor, is a necessary but often overlooked task that maintainers undertake to sustain the community around an OSS project. This study examines the various structures and norms that support community moderation, describes the strategies moderators use to mitigate conflicts, and assesses how bots can play a role in assisting these processes. We interviewed 14 practitioners to uncover existing moderation practices and ways that automation can provide assistance. Our main contributions include a characterization of moderated content in OSS projects, moderation techniques, as well as perceptions of and recommendations for improving the automation of moderation tasks. We hope that these findings will inform the implementation of more effective moderation practices in open source communities.