Although the open source model bears many advantages in software development,
open source projects are always hard to sustain. Previous research on open
source sustainability mainly focuses on projects that have already reached a
certain level of maturity (e.g., with communities, releases, and downstream
projects). However, limited attention is paid to the development of
(sustainable) open source projects in their infancy, and we believe an
understanding of early sustainability determinants is crucial for project
initiators, incubators, newcomers, and users.
In this paper, we aim to explore the relationship between early participation
factors and long-term project sustainability. We leverage a novel methodology
combining the Blumberg model of performance and machine learning to predict the
sustainability of 290,255 GitHub projects. Specificially, we train an XGBoost
model based on early participation (first three months of activity) in 290,255
GitHub projects and we interpret the model using LIME. We quantitatively show
that early participants have a positive effect on project's future sustained
activity if they have prior experience in OSS project incubation and
demonstrate concentrated focus and steady commitment. Participation from
non-code contributors and detailed contribution documentation also promote
project's sustained activity. Compared with individual projects, building a
community that consists of more experienced core developers and more active
peripheral developers is important for organizational projects. This study
provides unique insights into the incubation and recognition of sustainable
open source projects, and our interpretable prediction approach can also offer
guidance to open source project initiators and newcomers.Comment: The 31st ACM Joint European Software Engineering Conference and
Symposium on the Foundations of Software Engineering (ESEC/FSE 2023