Personalized First Issue Recommender for Newcomers in Open Source Projects
Many open source projects provide good first issues (GFIs) to attract and
retain newcomers. Although several automated GFI recommenders have been
proposed, existing recommenders are limited to recommending generic GFIs
without considering differences between individual newcomers. However, we
observe mismatches between generic GFIs and the diverse background of
newcomers, resulting in failed attempts, discouraged onboarding, and delayed
issue resolution. To address this problem, we assume that personalized first
issues (PFIs) for newcomers could help reduce the mismatches. To justify the
assumption, we empirically analyze 37 newcomers and their first issues resolved
across multiple projects. We find that the first issues resolved by the same
newcomer share similarities in task type, programming language, and project
domain. These findings underscore the need for a PFI recommender to improve
over state-of-the-art approaches. For that purpose, we identify features that
influence newcomers' personalized selection of first issues by analyzing the
relationship between possible features of the newcomers and the characteristics
of the newcomers' chosen first issues. We find that the expertise preference,
OSS experience, activeness, and sentiment of newcomers drive their personalized
choice of the first issues. Based on these findings, we propose a Personalized
First Issue Recommender (PFIRec), which employs LambdaMART to rank candidate
issues for a given newcomer by leveraging the identified influential features.
We evaluate PFIRec using a dataset of 68,858 issues from 100 GitHub projects.
The evaluation results show that PFIRec outperforms existing first issue
recommenders, potentially doubling the probability that the top recommended
issue is suitable for a specific newcomer and reducing a newcomer's
unsuccessful attempts to identify suitable first issues by one-third, in the
median.
Comment: The 38th IEEE/ACM International Conference on Automated Software
Engineering (ASE 2023)
Simplifying Deep-Learning-Based Model for Code Search
To accelerate software development, developers frequently search and reuse
existing code snippets from a large-scale codebase, e.g., GitHub. Over the
years, researchers have proposed many information retrieval (IR) based models
for code search, which match keywords in the query with code text. However,
they fail to bridge the semantic gap between query and code. To address this
challenge, Gu
et al. proposed a deep-learning-based model named DeepCS. It jointly embeds
method code and natural language description into a shared vector space, where
methods related to a natural language query are retrieved according to their
vector similarities. However, DeepCS' working process is complicated and
time-consuming. To overcome this issue, we propose a simplified model,
CodeMatcher, that leverages the IR technique but maintains many features of
DeepCS. Generally, CodeMatcher combines query keywords in their original order,
performs a fuzzy search on the name and body strings of methods, and returns
the best-matched methods with the longer sequence of used keywords. We verified its
effectiveness on a large-scale codebase with about 41k repositories.
Experimental results showed the simplified model CodeMatcher outperforms DeepCS
by 97% in terms of MRR (a widely used accuracy measure for code search), and it
is over 66 times faster than DeepCS. Moreover, compared with the
state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by
73%. We also observed that fusing the advantages of IR-based and
deep-learning-based models is promising because they complement each other
by nature, and that improving the quality of method naming helps code search,
since method names play an important role in connecting query and code.
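The matching idea and the MRR measure described in this abstract can be illustrated with a minimal sketch. This is not the CodeMatcher implementation: the greedy in-order keyword matching, the function names, and the three sample methods are all assumptions made here for illustration only.

```python
def ordered_keyword_hits(keywords, text):
    """Count query keywords found in `text` in their original order.

    Greedy simplification (an assumption, not the paper's algorithm):
    each keyword is searched after the end of the previous match, and
    keywords that cannot be found are skipped.
    """
    text = text.lower()
    pos = 0
    hits = 0
    for kw in keywords:
        idx = text.find(kw.lower(), pos)
        if idx != -1:
            hits += 1
            pos = idx + len(kw)
    return hits


def search(query, methods, top_k=3):
    """Rank methods by how many query keywords they match in order.

    `methods` maps a method name to its body string (hypothetical data).
    Ties are broken alphabetically by method name.
    """
    keywords = query.split()
    scored = []
    for name, body in methods.items():
        hits = ordered_keyword_hits(keywords, name + " " + body)
        if hits > 0:
            scored.append((hits, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR: average of 1/rank of the relevant item, 0 if it is missing."""
    total = 0.0
    for results, target in zip(ranked_lists, relevant):
        if target in results:
            total += 1.0 / (results.index(target) + 1)
    return total / len(ranked_lists)


# Hypothetical toy codebase, for illustration only.
METHODS = {
    "readFileToString": "return new String(Files.readAllBytes(path))",
    "writeStringToFile": "Files.write(path, s.getBytes())",
    "parseJson": "return mapper.readTree(text)",
}

if __name__ == "__main__":
    results = search("read file string", METHODS)
    print(results)
    print(mean_reciprocal_rank([results], ["readFileToString"]))
```

In this toy setup, a method whose name contains more of the query keywords in order ranks higher, which mirrors the abstract's observation that method naming matters for connecting query and code.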
How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?
Although the open source model bears many advantages in software development,
open source projects are always hard to sustain. Previous research on open
source sustainability mainly focuses on projects that have already reached a
certain level of maturity (e.g., with communities, releases, and downstream
projects). However, limited attention is paid to the development of
(sustainable) open source projects in their infancy, and we believe an
understanding of early sustainability determinants is crucial for project
initiators, incubators, newcomers, and users.
In this paper, we aim to explore the relationship between early participation
factors and long-term project sustainability. We leverage a novel methodology
combining the Blumberg model of performance and machine learning to predict the
sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost
model based on early participation (first three months of activity) in 290,255
GitHub projects and we interpret the model using LIME. We quantitatively show
that early participants have a positive effect on a project's future sustained
activity if they have prior experience in OSS project incubation and
demonstrate concentrated focus and steady commitment. Participation from
non-code contributors and detailed contribution documentation also promote
a project's sustained activity. Compared with individual projects, building a
community that consists of more experienced core developers and more active
peripheral developers is important for organizational projects. This study
provides unique insights into the incubation and recognition of sustainable
open source projects, and our interpretable prediction approach can also offer
guidance to open source project initiators and newcomers.
Comment: The 31st ACM Joint European Software Engineering Conference and
Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)
Including Everyone, Everywhere: Understanding Opportunities and Challenges of Geographic Gender-Inclusion in OSS
The gender gap is a significant concern facing the software industry as development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be with little to no research reflective of the Open Source Software (OSS) process and the communities in which software is now commonly developed? Our study presents a multi-region geographical analysis of gender inclusion on GitHub. This mixed-methods approach includes quantitatively investigating differences in gender inclusion in projects across geographic regions and investigating these trends over time using data from contributions to 21,456 project repositories. We also qualitatively examine the unique experiences of developers contributing to these projects through a survey that is strategically targeted to developers in various regions worldwide. Our findings indicate that gender diversity is low across all parts of the world, with no substantial difference across regions. However, there has been statistically significant improvement in diversity worldwide since 2014, with certain regions such as Africa improving at a faster pace. We also find that most motivations and barriers to contributions (e.g., lack of resources to contribute and poor working environment) were shared across regions; however, some insightful differences, such as how to make projects more inclusive, did arise. From these findings, we derive and present implications for tools that can foster inclusion in open source software communities and empower contributions from everyone, everywhere.
The Role of Mentoring and Project Characteristics for Onboarding in Open Source Software Projects
Context: Onboarding is a process that helps newcomers become integrated members of their organisation. Successful onboarding programs can result in increased performance in conventional organisations, but there is little guidance on how to onboard new developers in Open Source Software (OSS) projects. Goal: In this study, we examine how mentoring and project characteristics influence the effectiveness and efficiency of the onboarding process. We study a collaboration program involving a total of nine Open Source Software projects and more than 120 students from different universities around the world as part of Facebook's Education Modernization Program. Method: We use quantitative measurements of source code repositories, issue tracking systems, and discussion fora to examine how newcomers become contributing members of their OSS projects. Results: We found that developers receiving deliberate onboarding support through mentoring were more active at an earlier stage than developers entering projects through conventional means. Also, we found that project size and lifetime influenced onboarding. Conclusion: Empirical decision support can contribute to a more effective onboarding process in OSS projects. Mentor support in critical stages can accelerate the process, but project maturity is also a significant factor that increases the effect of onboarding.
Peer reviewed
It Takes a Village: Open Source Software Sustainability
This Guidebook is designed to serve as a practical reference source to help open source software programs serving cultural and scientific heritage organizations plan for long-term sustainability, ensuring that commitment and resources will be available at levels sufficient for the software to remain viable and effective as long as it is needed.
One of the most significant themes of this Guidebook is that sustainability is not a linear process, with set beginning and end points. Program sustainability shifts and evolves over time across a number of phases and facets. The phases speak to where a program is in its lifecycle: getting started, growing, or stable but not static. The facets describe the different components of sustainability, each of which is critical to overall program health, but may have different timelines, goals, and resource needs. The facets deemed most critical by the Guidebook’s authors and contributors are: Governance, Technology, Resources (Financial and Human), and Community Engagement.
Sections of the Guidebook will: define the phases and facets of sustainability; identify goals, characteristics, and common roadblocks for each phase in each facet; provide guidance for moving an OSS program to the next phase in a given facet, with the understanding that the same program may be in different phases along different facets of sustainability; and highlight case studies and additional resources to help a program’s research and decision-making process.
The Guidebook is intended for a broad audience. While certain paths may be of more interest than others, we would recommend reading through each of the facets before returning to the one that aligns most closely with a specific role, e.g., governance for a program manager, technology for a technical lead, engagement for a community manager, or resources for an administrator. The worksheet in Appendix A can help identify the specific phase a program is in along each facet.
The open source landscape is wide and varied. Bringing open source programs serving cultural and scientific heritage together under one shared umbrella can provide us all with the power to better advocate for our needs, develop shared sustainability strategies, and provide our communities with the information needed to assess and contribute to the sustainability of the programs they depend on.