39,526 research outputs found
The Co-Evolution of Test Maintenance and Code Maintenance through the lens of Fine-Grained Semantic Changes
Automatic testing is a widely adopted technique for improving software
quality. Software developers add, remove and update test methods and test
classes as part of the software development process as well as during the
evolution phase, following the initial release. In this work we conduct a large
scale study of 61 popular open source projects and report the relationships we
have established between test maintenance, production code maintenance, and
semantic changes (e.g, statement added, method removed, etc.). performed in
developers' commits.
We build predictive models, and show that the number of tests in a software
project can be well predicted by employing code maintenance profiles (i.e., how
many commits were performed in each of the maintenance activities: corrective,
perfective, adaptive). Our findings also reveal that more often than not,
developers perform code fixes without performing complementary test maintenance
in the same commit (e.g., update an existing test or add a new one). When
developers do perform test maintenance, it is likely to be affected by the
semantic changes they perform as part of their commit.
Our work is based on studying 61 popular open source projects, comprised of
over 240,000 commits consisting of over 16,000,000 semantic change type
instances, performed by over 4,000 software engineers.Comment: postprint, ICSME 201
Extracting Build Changes with BUILDDIFF
Build systems are an essential part of modern software engineering projects.
As software projects change continuously, it is crucial to understand how the
build system changes because neglecting its maintenance can lead to expensive
build breakage. Recent studies have investigated the (co-)evolution of build
configurations and reasons for build breakage, but they did this only on a
coarse grained level. In this paper, we present BUILDDIFF, an approach to
extract detailed build changes from MAVEN build files and classify them into 95
change types. In a manual evaluation of 400 build changing commits, we show
that BUILDDIFF can extract and classify build changes with an average precision
and recall of 0.96 and 0.98, respectively. We then present two studies using
the build changes extracted from 30 open source Java projects to study the
frequency and time of build changes. The results show that the top 10 most
frequent change types account for 73% of the build changes. Among them, changes
to version numbers and changes to dependencies of the projects occur most
frequently. Furthermore, our results show that build changes occur frequently
around releases. With these results, we provide the basis for further research,
such as for analyzing the (co-)evolution of build files with other artifacts or
improving effort estimation approaches. Furthermore, our detailed change
information enables improvements of refactoring approaches for build
configurations and improvements of models to identify error-prone build files.Comment: Accepted at the International Conference of Mining Software
Repositories (MSR), 201
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable python software that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.Comment: MSR 2019, 12 pages, 10 figure
- …