5 research outputs found
Gitana: a SQL-based Git Repository Inspector
Software development projects are notoriously complex and difficult to deal with. Several support tools such as issue tracking, code review and Source Control Management (SCM) systems have been introduced in the past decades to ease development activities. While such tools efficiently track the evolution of a given aspect of the project (e.g., bug reports), they provide just a partial view of the project and often lack advanced querying mechanisms, limiting themselves to command line or simple GUI support. This is particularly true for projects that rely on Git, the most popular SCM system today. In this paper, we propose a conceptual schema for Git and an approach that, given a Git repository, exports its data to a relational database in order to (1) promote data integration with other existing SCM tools and (2) enable writing queries on Git data using standard SQL syntax. To ensure efficiency, our approach comes with an incremental propagation mechanism that refreshes the database content with the latest modifications. We have implemented our approach in Gitana, an open-source tool available on GitHub.
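The idea of exporting Git data to a relational database and querying it with SQL can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `developer` and `commits` tables, their columns, the sample rows, and the helper names are hypothetical and are not Gitana's actual schema or API. The `propagate` helper only hints at incremental propagation by skipping commits that are already stored.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE developer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE commits (
            sha TEXT PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES developer(id),
            authored_at TEXT NOT NULL,
            message TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO developer VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    # Made-up sample commits, standing in for data exported from a repository.
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?, ?)",
        [
            ("a1", 1, "2016-01-03", "Initial import"),
            ("b2", 2, "2016-01-04", "Fix parser bug"),
            ("c3", 1, "2016-01-05", "Add tests"),
        ],
    )
    return conn


def commits_per_author(conn):
    """Standard SQL over the exported data: commit count per developer."""
    rows = conn.execute(
        """
        SELECT d.name, COUNT(*) AS n
        FROM commits c JOIN developer d ON d.id = c.author_id
        GROUP BY d.name
        ORDER BY n DESC, d.name
        """
    )
    return rows.fetchall()


def propagate(conn, new_commits):
    """Insert only commits not yet stored; return how many rows were added."""
    before = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO commits VALUES (?, ?, ?, ?)", new_commits
    )
    after = conn.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
    return after - before
```

Once the data is in relational form, questions that would otherwise need custom scripting over `git log` output reduce to ordinary joins and aggregates, and the same tables could be joined with data from other tools.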
Mega Software Engineering
Technical Report of Software Engineering Lab in Osaka Univ. SEL-Sep-22-200
Synchronous development in open-source projects: A higher-level perspective
Mailing lists are a major communication channel for supporting developer coordination in open-source software projects. In a recent study, researchers explored temporal relationships (e.g., synchronization) between developer activities on source code
and on the mailing list, relying on simple heuristics of developer collaboration (e.g.,
co-editing files) and developer communication (e.g., sending e-mails to the mailing
list). We propose two methods for studying synchronization between collaboration
and communication activities from a higher-level perspective, which captures the
complex activities and views of developers more precisely than the rather technical
perspective of previous work. On the one hand, we explore developer collaboration
at the level of features (not files), which are higher-level concepts of the domain and
not mere technical artifacts. On the other hand, we lift the view of developer communication from a message-based model, which treats each e-mail individually, to
a conversation-based model, which is semantically richer due to grouping e-mails
that represent conceptually related discussions. By means of an empirical study, we
investigate whether the different abstraction levels affect the observed relationship
between commit activity and e-mail communication using state-of-the-art time series analysis. For this purpose, we analyze a combined history of 40 years of data
for three highly active and widely deployed open-source projects: QEMU, BusyBox,
and OpenSSL. Overall, we found evidence that a higher-level view on the coordination of developers leads to identifying a stronger statistical dependence between the
technical activities of developers than a less abstract and rather technical view.
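The step from a message-based to a conversation-based model can be sketched as follows. The abstract does not say how e-mails are grouped, so this sketch assumes, purely for illustration, that a conversation is a reply chain reconstructed from a hypothetical `in_reply_to` field; the function name and the record layout are likewise assumptions, not the authors' method.

```python
from collections import defaultdict


def group_into_conversations(emails):
    """Group e-mails into conversations keyed by the root message id.

    `emails` is a list of dicts with an 'id' and an optional 'in_reply_to'.
    A reply to an unknown message starts its own conversation.
    """
    parent = {e["id"]: e.get("in_reply_to") for e in emails}

    def root(msg_id):
        seen = set()  # guards against malformed, cyclic reply chains
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    conversations = defaultdict(list)
    for e in emails:
        conversations[root(e["id"])].append(e["id"])
    return dict(conversations)
```

Under the message-based view each of these e-mails would count as a separate communication event; under the conversation-based view the unit of analysis becomes the group, which is what a subsequent time series comparison against commit activity would then count.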
Evidence-based Software Process Recovery
Developing a large software system involves many complicated, varied, and
inter-dependent tasks, and these tasks are typically implemented using a
combination of defined processes, semi-automated tools, and ad hoc
practices. Stakeholders in the development process --- including software
developers, managers, and customers --- often want to be able to track the
actual practices being employed within a project. For example, a customer
may wish to be sure that the process is ISO 9000 compliant, a manager may
wish to track the amount of testing that has been done in the current
iteration, and a developer may wish to determine who has recently been
working on a subsystem that has had several major bugs appear in it.
However, extracting the software development processes from an existing
project is expensive if one must rely upon manual inspection of artifacts
and interviews of developers and their managers. Previously, researchers
have suggested the live observation and instrumentation of a project to
allow for more measurement, but this is costly, invasive, and also requires
a live running project.
In this work, we propose an approach that we call software process
recovery that is based on after-the-fact analysis of various kinds of
software development artifacts. We use a variety of supervised and
unsupervised techniques from machine learning, topic analysis, natural
language processing, and statistics on software repositories such as version
control systems, bug trackers, and mailing list archives. We show how we can
combine all of these methods to recover process signals that we map back to
software development processes such as the Unified Process. The Unified
Process has been visualized using a time-line view that shows effort per
parallel discipline occurring across time. This visualization is called the
Unified Process diagram. We use this diagram as inspiration to produce
Recovered Unified Process Views (RUPV) that are a concrete version of this
theoretical Unified Process diagram. We then validate these methods using
case studies of multiple open source software systems.
Process-Centric Analytical Processing of Version Control Data
This paper introduces a novel approach to enabling analytical processing of project data. The approach exploits source code repositories for information about project evolution. Furthermore, this paper proposes a new perspective on analyzing version control data. It takes up a process-centric viewpoint, addresses related analysis problems such as the collaboration of programmers, and proposes metrics for them. The research has yielded an implementation of the approach, which comprises visualizations that assist in examining the evolution of the software process.