Build systems are an essential part of modern software engineering projects.
As software projects change continuously, it is crucial to understand how the
build system changes because neglecting its maintenance can lead to expensive
build breakage. Recent studies have investigated the (co-)evolution of build
configurations and reasons for build breakage, but they did this only on a
coarse grained level. In this paper, we present BUILDDIFF, an approach to
extract detailed build changes from MAVEN build files and classify them into 95
change types. In a manual evaluation of 400 build changing commits, we show
that BUILDDIFF can extract and classify build changes with an average precision
and recall of 0.96 and 0.98, respectively. We then present two studies using
the build changes extracted from 30 open source Java projects to study the
frequency and time of build changes. The results show that the top 10 most
frequent change types account for 73% of the build changes. Among them, changes
to version numbers and changes to dependencies of the projects occur most
frequently. Furthermore, our results show that build changes occur frequently
around releases. With these results, we provide the basis for further research,
such as for analyzing the (co-)evolution of build files with other artifacts or
improving effort estimation approaches. Furthermore, our detailed change
information enables improvements of refactoring approaches for build
configurations and improvements of models to identify error-prone build files.Comment: Accepted at the International Conference of Mining Software
Repositories (MSR), 201