Mitigating Turnover with Code Review Recommendation: Balancing Expertise, Workload, and Knowledge Distribution
Developer turnover is inevitable on software projects and leads to knowledge loss, a reduction in productivity, and an increase in defects. Mitigation strategies to deal with turnover tend to disrupt and increase workloads for developers. In this work, we suggest that through code review recommendation we can distribute knowledge and mitigate turnover with minimal impact on the development process. We evaluate review recommenders in terms of three outcomes: ensuring expertise during review (Expertise), reducing the review workload of the core team (CoreWorkload), and reducing the files at risk to turnover (FaR). We find that prior work that assigns reviewers based on file ownership concentrates knowledge on a small group of core developers, increasing the risk of knowledge loss from turnover by up to 65%. We propose learning- and retention-aware review recommenders that, when combined, reduce the risk of turnover by 29% but unacceptably reduce the overall expertise during reviews by 26%. We develop the Sophia recommender, which suggests experts when none of the files under review are hoarded by developers but distributes knowledge when files are at risk. In this way, we are able to simultaneously increase expertise during review (ΔExpertise of 6%), with a negligible impact on workload (ΔCoreWorkload of 0.09%), and reduce the files at risk (ΔFaR of -28%). Sophia is integrated into GitHub pull requests, allowing developers to select an appropriate expert or “learner” based on the context of the review. We release the Sophia bot as well as the code and data for replication purposes.
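To make the files-at-risk measure concrete, here is a minimal sketch. It assumes a file counts as at risk when at most one still-active developer knows it; the abstract does not spell out the exact definition, so that threshold, the `files_at_risk` helper, and the sample data are illustrative assumptions rather than the authors' implementation.

```python
from typing import Dict, Set


def files_at_risk(knowledge: Dict[str, Set[str]], active: Set[str]) -> Set[str]:
    """Return files known by at most one still-active developer.

    Assumption: "at risk" means zero or one active knower; the paper's
    exact definition may differ.
    """
    return {
        path
        for path, knowers in knowledge.items()
        if len(knowers & active) <= 1
    }


# Hypothetical project state: file -> developers who know it.
knowledge = {
    "core/db.py": {"alice"},
    "core/api.py": {"alice", "bob"},
    "docs/guide.md": {"carol", "dave"},
}
active = {"alice", "bob", "carol"}  # dave has already left

at_risk = files_at_risk(knowledge, active)
print(sorted(at_risk))
```

Re-running the function with `active - {"alice"}` shows how a single departure can put every file in this toy project at risk, which is the effect a knowledge-spreading recommender tries to dampen.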
Factoring Expertise, Workload, and Turnover into Code Review Recommendation
Developer turnover is inevitable on software projects and leads to knowledge
loss, a reduction in productivity, and an increase in defects. Mitigation
strategies to deal with turnover tend to disrupt and increase workloads for
developers. In this work, we suggest that through code review recommendation we
can distribute knowledge and mitigate turnover while more evenly distributing
review workload.
We conduct historical analyses to understand the natural concentration of
review workload and the degree of knowledge spreading that is inherent in code
review. Even though review workload is highly concentrated, we show that code
review naturally spreads knowledge, thereby reducing the files at risk to
turnover.
Using simulation, we evaluate existing code review recommenders and develop
novel recommenders to understand their impact on the level of expertise during
review, the workload of reviewers, and the files at risk to turnover. Our
simulations use seeded random replacement of reviewers to allow us to compare
the reviewer recommenders without the confounding variation of different
reviewers being replaced for each recommender.
Combining recommenders, we develop the SofiaWL recommender that suggests
experts with low active review workload when none of the files under review are
known by only one developer. In contrast, when knowledge is concentrated on one
developer, it sends the review to other reviewers to spread knowledge. For the
projects we study, we are able to globally increase expertise during reviews
(+3%), reduce workload concentration (-12%), and reduce the files at risk (-28%).
We make our scripts and data available in our replication package. Developers
can optimize for a particular outcome measure based on the needs of their
project, or use our GitHub bot to automatically balance the outcomes.
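The switching rule described for SofiaWL can be sketched as follows. This is a rough illustration under stated assumptions, not the released bot: the `recommend_reviewer` function, the expertise-divided-by-workload score, and the "knows the fewest files under review" learner ranking are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set


def recommend_reviewer(
    files: List[str],
    author: str,
    knowledge: Dict[str, Set[str]],  # file -> developers who know it
    expertise: Dict[str, float],     # developer -> expertise score for this change
    open_reviews: Dict[str, int],    # developer -> number of active reviews
) -> str:
    candidates = [dev for dev in expertise if dev != author]

    # Knowledge is concentrated if any file is known by exactly one developer.
    concentrated = any(len(knowledge.get(f, set())) == 1 for f in files)

    if concentrated:
        # Spread knowledge: prefer whoever would learn the most new files,
        # breaking ties in favour of the lighter review load (assumed ranking).
        def new_files(dev: str) -> int:
            return sum(dev not in knowledge.get(f, set()) for f in files)

        return max(candidates, key=lambda d: (new_files(d), -open_reviews.get(d, 0)))

    # Otherwise suggest an expert, discounted by active workload (assumed score).
    return max(candidates, key=lambda d: expertise[d] / (1 + open_reviews.get(d, 0)))


knowledge = {"a.py": {"alice", "bob"}, "b.py": {"alice"}}
expertise = {"alice": 0.9, "bob": 0.6, "carol": 0.1}
open_reviews = {"alice": 5, "bob": 1, "carol": 0}

expert_pick = recommend_reviewer(["a.py"], "dave", knowledge, expertise, open_reviews)
learner_pick = recommend_reviewer(["a.py", "b.py"], "dave", knowledge, expertise, open_reviews)
print(expert_pick, learner_pick)
```

In the first call no file is known by only one developer, so the lightly loaded expert is chosen; in the second, `b.py` is known only by alice, so the review goes to someone who does not yet know the files.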
Topic-based integrator matching for pull request
Pull requests (PRs) are the main method for code contributions from external
contributors on GitHub. PR review is an essential part of open source software
development for maintaining software quality. Matching a new PR with an
appropriate integrator makes PR review more effective. However, PR-to-integrator
matching is currently done manually on GitHub. To make this
process more efficient, we propose a Topic-based Integrator Matching Algorithm
(TIMA) to predict highly relevant collaborators (the core developers) as the
integrators for incoming PRs. TIMA takes full advantage of the textual semantics
of PRs. To define the relationships between topics and collaborators, TIMA
builds a relation matrix between topics and collaborators. According to the
relevance between topics and collaborators, TIMA matches suitable
collaborators as the PR integrator.
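A topic-to-collaborator relation matrix of the kind described can be sketched in a few lines. The abstract does not say how topics are extracted or how the matrix is weighted, so this example assumes each PR already has a topic distribution (for instance from a topic model) and simply sums topic weights per integrator; `build_relation_matrix` and `rank_integrators` are hypothetical names.

```python
from typing import Dict, List, Tuple


def build_relation_matrix(
    history: List[Tuple[List[float], str]], n_topics: int
) -> Dict[str, List[float]]:
    """Accumulate topic weights of past PRs for the collaborator who integrated them."""
    relation: Dict[str, List[float]] = {}
    for topics, integrator in history:
        row = relation.setdefault(integrator, [0.0] * n_topics)
        for k, weight in enumerate(topics):
            row[k] += weight
    return relation


def rank_integrators(pr_topics: List[float], relation: Dict[str, List[float]]) -> List[str]:
    """Rank collaborators by the dot product of PR topics and their matrix row."""
    scores = {
        collaborator: sum(p * r for p, r in zip(pr_topics, row))
        for collaborator, row in relation.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical history: (topic distribution of a past PR, who integrated it).
history = [
    ([0.9, 0.1], "alice"),
    ([0.8, 0.2], "alice"),
    ([0.1, 0.9], "bob"),
]
relation = build_relation_matrix(history, n_topics=2)
ranking = rank_integrators([0.2, 0.8], relation)
print(ranking)
```

Summing raw weights favours collaborators with a longer history; normalising each row is one possible variation if that bias is unwanted.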
Improving Code Reviewer Recommendation: Accuracy, Latency, Workload, and Bystanders
Code review ensures that a peer engineer manually examines the code before it
is integrated and released into production. At Meta, we develop a wide range of
software at scale, from social networking to software development
infrastructure, such as calendar and meeting tools to continuous integration.
We are constantly improving our code review system, and in this work we
describe a series of experiments that were conducted across tens of thousands
of engineers and hundreds of thousands of reviews.
We build upon the recommender that has been in production since 2018,
RevRecV1. We found that reviewers were being assigned based on prior authorship
of files. We reviewed the literature for successful features and experimented
with them with RevRecV2 in production. The most important feature in our new
model was the familiarity of the author and reviewer; we saw an overall
improvement in accuracy of 14 percentage points.
Prior research has shown that reviewer workload is skewed. To balance
workload, we divide the reviewer score from RevRecV2 by each candidate
reviewer's workload. We experimented with multiple types of workload to develop
RevRecWL. We find that reranking candidate reviewers by workload often leads to
reviewers with lower workload being selected by authors.
The bystander effect can occur when a team of reviewers is assigned the
review. We mitigate the bystander effect by randomly assigning one of the
recommended reviewers. Having an individual who is responsible for the review
reduces the time taken for reviews by 11%.
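The two mechanisms described above, dividing the recommender score by workload and randomly assigning a single reviewer, can be illustrated with a short sketch. The function names, the sample scores, and the guard that treats a workload of zero as one are assumptions for this example; the abstract does not say which workload measure is used or how zero workload is handled.

```python
import random
from typing import Dict, List


def rerank_by_workload(scores: Dict[str, float], workload: Dict[str, int]) -> List[str]:
    """Divide each candidate's score by their workload and sort descending.

    Assumption: workloads below 1 are clamped to 1 to avoid division by zero.
    """
    adjusted = {
        reviewer: score / max(workload.get(reviewer, 0), 1)
        for reviewer, score in scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)


def assign_one(recommended: List[str], rng: random.Random) -> str:
    """Pick a single responsible reviewer at random from the recommended list."""
    return rng.choice(recommended)


# Hypothetical model scores and open-review counts.
scores = {"ann": 0.9, "ben": 0.6, "cy": 0.5}
workload = {"ann": 6, "ben": 2, "cy": 0}

ranked = rerank_by_workload(scores, workload)
assignee = assign_one(ranked[:2], random.Random(0))
print(ranked, assignee)
```

Passing an explicit `random.Random` instance keeps the assignment reproducible in tests while leaving the production choice of seed open.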
Nudge: Accelerating Overdue Pull Requests Towards Completion
Pull requests are a key part of collaborative software development and
code review process today. However, pull requests can also slow down the
software development process when the reviewer(s) or the author do not actively
engage with the pull request. In this work, we design an end-to-end service,
Nudge, for accelerating overdue pull requests towards completion by reminding
the author or the reviewer(s) to engage with their overdue pull requests.
First, we use models based on effort estimation and machine learning to predict
the completion time for a given pull request. Second, we use activity detection
to reduce false positives. Lastly, we use dependency determination to
understand the blocker of the pull request and nudge the appropriate
actor (author or reviewer(s)). We also perform a correlation analysis to understand
the statistical relationship between the pull request completion times and
various pull request and developer related attributes. Nudge has been deployed
on 147 repositories at Microsoft since 2019. We do a large scale evaluation
based on the implicit and explicit feedback we received from sending the Nudge
notifications on 8,500 pull requests. We observe a significant reduction in
completion time, by over 60%, for pull requests that were nudged, thus
increasing the efficiency of the code review process and accelerating pull
request progression.
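The three stages described above (completion-time prediction, activity detection, and dependency determination) can be combined into a single decision, sketched below. This is an illustration only: the `PullRequest` fields, the 24-hour quiet period, and the idea that the blocking party is precomputed as `waiting_on` are assumptions, and the prediction model itself is outside the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    created_at: datetime
    last_activity_at: datetime
    predicted_duration: timedelta  # assumed output of a completion-time model
    waiting_on: str                # assumed result of dependency determination


def nudge_target(
    pr: PullRequest,
    now: datetime,
    quiet_period: timedelta = timedelta(hours=24),  # hypothetical threshold
) -> Optional[str]:
    """Return who to remind ("author" or "reviewer"), or None if no nudge is due."""
    overdue = now - pr.created_at > pr.predicted_duration
    recently_active = now - pr.last_activity_at < quiet_period
    if not overdue or recently_active:
        # Recent activity suppresses the nudge to limit false positives.
        return None
    return pr.waiting_on


now = datetime(2024, 1, 10, 12, 0)
stalled = PullRequest(datetime(2024, 1, 1), datetime(2024, 1, 5), timedelta(days=3), "reviewer")
active = PullRequest(datetime(2024, 1, 1), datetime(2024, 1, 10, 9, 0), timedelta(days=3), "author")
fresh = PullRequest(datetime(2024, 1, 9), datetime(2024, 1, 9), timedelta(days=3), "author")

print(nudge_target(stalled, now), nudge_target(active, now), nudge_target(fresh, now))
```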