git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable Python tool that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.
Comment: MSR 2019, 12 pages, 10 figures
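The network described in the abstract above (directed, weighted, time-stamped links from an editor to the original author of the edited code) can be illustrated with a minimal sketch. This is not git2net's API: the `CoEdit` record, the `build_coediting_network` function, the choice of edited line counts as edge weights, and the skipping of self-edits are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class CoEdit(NamedTuple):
    """Hypothetical record of one co-editing event (not a git2net type)."""
    editor: str           # developer who made the change
    original_author: str  # developer who originally wrote the edited lines
    timestamp: int        # Unix time of the editing commit
    lines: int            # number of lines edited


def build_coediting_network(events):
    """Aggregate events into directed, weighted, time-stamped edges.

    Returns a dict keyed by (editor, original_author). Each value holds the
    total number of edited lines (the weight, an assumed choice) and the
    list of timestamps at which the edits happened.
    """
    edges = defaultdict(lambda: {"weight": 0, "timestamps": []})
    for ev in events:
        if ev.editor == ev.original_author:
            continue  # assumption: edits to one's own code create no link
        edge = edges[(ev.editor, ev.original_author)]
        edge["weight"] += ev.lines
        edge["timestamps"].append(ev.timestamp)
    return dict(edges)
```

Keeping the timestamps on each edge, rather than only the aggregated weight, is what allows the network to be sliced into time windows later.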
Mining developer communication data streams
This paper explores the concepts of modelling a software development project
as a process that results in the creation of a continuous stream of data. In
terms of the Jazz repository used in this research, one aspect of that stream
of data would be developer communication. Such data can be used to create an
evolving social network characterized by a range of metrics. This paper
presents the application of data stream mining techniques to identify the most
useful metrics for predicting build outcomes. Results are presented from
applying the Hoeffding Tree classification method used in conjunction with the
Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results
indicate that only a small number of the available metrics considered have any
significance for predicting the outcome of a build.
We Don't Need Another Hero? The Impact of "Heroes" on Software Development
A software project has "Hero Developers" when 80% of contributions are
delivered by 20% of the developers. Are such heroes a good idea? Are too many
heroes bad for software quality? Is it better to have more or fewer heroes for
different kinds of projects? To answer these questions, we studied 661 open
source projects from Public open source software (OSS) GitHub and 171 projects
from an Enterprise GitHub.
We find that hero projects are very common. In fact, as projects grow in
size, nearly all projects become hero projects. These findings motivated us to
look more closely at the effects of heroes on software development. Analysis
shows that the frequency with which issues and bugs are closed is not
significantly affected by the presence of heroes or by project type (Public or
Enterprise). Similarly, the
time needed to resolve an issue/bug/enhancement is not affected by heroes or
project type. This is a surprising result since, before looking at the data, we
expected that increasing heroes on a project would slow down how fast that
project reacts to change. However, we do find a statistically significant
association between heroes, project types, and enhancement resolution rates.
Heroes do not affect enhancement resolution rates in Public projects. However,
in Enterprise projects, more heroes increase the rate at which projects
complete enhancements.
In summary, our empirical results call for a revision of a long-held truism
in software engineering. Software heroes are far more common and valuable than
suggested by the literature, particularly for medium to large Enterprise
developments. Organizations should reflect on better ways to find and retain
more of these heroes.
Comment: 8 pages + 1 references, Accepted to International Conference on
Software Engineering - Software Engineering in Practice, 201
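The 80/20 definition of a hero project given in the abstract above can be written as a short check. This is only a sketch under stated assumptions: contributions are counted as commits per author, the function name `is_hero_project` and its parameters are hypothetical, and the paper's actual measurement procedure may differ.

```python
import math
from collections import Counter


def is_hero_project(commit_authors, top_share=0.2, contribution_share=0.8):
    """Check the 80/20 rule on a list with one author name per commit.

    Returns True when the most active `top_share` of developers account for
    at least `contribution_share` of all commits. Counting commits as
    "contributions" is an assumption of this sketch.
    """
    counts = Counter(commit_authors)
    if not counts:
        return False  # no commits, so no heroes
    ranked = sorted(counts.values(), reverse=True)
    # Size of the top group, at least one developer; round() guards
    # against floating-point noise before taking the ceiling.
    n_top = max(1, math.ceil(round(top_share * len(ranked), 9)))
    return sum(ranked[:n_top]) / sum(ranked) >= contribution_share
```

For example, a project with five developers in which one person wrote 17 of 21 commits satisfies the rule, while a project whose commits are spread evenly across its five developers does not.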
Beyond Surveys: Analyzing Software Development Artifacts to Assess Teaching Efforts
This Innovative Practice Full Paper presents an approach of using software
development artifacts to gauge student behavior and the effectiveness of
changes to curriculum design. There is an ongoing need to adapt university
courses to changing requirements and shifts in industry. As an educator it is
therefore vital to have access to methods with which to ascertain the effects
of curriculum design changes. In this paper, we present our approach of
analyzing software repositories in order to gauge student behavior during
project work. We evaluate this approach in a case study of a university
undergraduate software development course teaching agile development
methodologies. Surveys revealed positive attitudes towards the course and the
change of employed development methodology from Scrum to Kanban. However,
surveys were not usable to ascertain the degree to which students had adapted
their workflows and whether they had done so in accordance with course goals.
Therefore, we analyzed students' software repository data, which represents
information that can be collected by educators to reveal insights into learning
successes and detailed student behavior. We analyze the software repositories
created during the last five courses, and evaluate differences in workflows
between Kanban and Scrum usage.
On the evolution and impact of architectural smells—an industrial case study
Architectural smells (AS) are notorious for their long-term impact on the maintainability and evolvability of software systems. The majority of research work has investigated this topic by mining software repositories of open source Java systems, making it hard to generalise the findings and apply them to an industrial context and other programming languages. To address this research gap, we conducted an embedded multiple-case case study, in collaboration with a large industry partner, to study how AS evolve in industrial embedded systems. We detect and track AS in 9 C/C++ projects with over 30 releases for each project that span over two years of development, with over 20 million lines of code in the last release alone. In addition to these quantitative results, we also interview 12 of the developers and architects working on these projects, collecting over six hours of qualitative data about the usefulness of AS analysis and the issues they experienced while maintaining and evolving artefacts affected by AS. Our quantitative findings show how individual smell instances evolve over time, how long they typically survive within the system, how they overlap with instances of other smell types, and finally what the introduction order of smell types is when they overlap. Our qualitative findings, in turn, provide insights on the effects of AS on the long-term maintainability and evolvability of the system, supported by several excerpts from our interviews. Practitioners also mention which parts of the AS analysis actually provide actionable insights that they can use to plan refactoring activities.