Background: Open source software has an increasing importance in modern
software development. However, there is also a growing concern on the
sustainability of such projects, which are usually managed by a small number of
developers, frequently working as volunteers. Aims: In this paper, we propose
an approach to identify GitHub projects that are not actively maintained. Our
goal is to alert users about the risks of using these projects and possibly
motivate other developers to assume the maintenance of the projects. Method: We
train machine learning models to identify unmaintained or sparsely maintained
projects, based on a set of features about project activity (commits, forks,
issues, etc). We empirically validate the model with the best performance with
the principal developers of 129 GitHub projects. Results: The proposed machine
learning approach has a precision of 80%, based on the feedback of real open
source developers; and a recall of 96%. We also show that our approach can be
used to assess the risks of projects becoming unmaintained. Conclusions: The
model proposed in this paper can be used by open source users and developers to
identify GitHub projects that are not actively maintained anymore.Comment: Accepted at 12th International Symposium on Empirical Software
Engineering and Measurement (ESEM), 10 pages, 201