Visual querying and analysis of large software repositories
We present a software framework for mining software repositories. Our extensible framework enables the integration of data extraction from repositories with data analysis and interactive visualization. We demonstrate the applicability of the framework by presenting several case studies performed on industry-sized software repositories. In each study we use the framework to answer one or more software engineering questions about a specific project. We then validate the answers by comparing them with existing project documentation, by interviewing domain experts, and by analyzing the source code in detail. The results show that our framework can be used both to support case studies on mining software repository techniques and to build end-user tools for software maintenance.
Mining Architectural Information: A Systematic Mapping Study
Context: Mining Software Repositories (MSR) has become an essential activity
in software development. Mining architectural information to support
architecting activities, such as architecture understanding and recovery, has
received significant attention in recent years. However, a comprehensive
understanding of the state of research on mining architectural information is
still lacking. Objective: This work aims to identify, analyze, and
synthesize the literature on mining architectural information in software
repositories in terms of architectural information and sources mined,
architecting activities supported, approaches and tools used, and challenges
faced. Method: A Systematic Mapping Study (SMS) has been conducted on the
literature published between January 2006 and November 2021. Results: Across the 79
primary studies finally selected, 8 categories of architectural information
have been mined, of which architectural description is the most frequently mined;
12 architecting activities can be supported by the
mined architectural information, of which architecture understanding is the
most supported; 81 approaches and 52 tools were proposed and employed
in mining architectural information; and 4 types of challenges in mining
architectural information were identified. Conclusions: This SMS provides
researchers with promising future directions and helps practitioners become aware of
which approaches and tools can be used to mine which architectural information
from which sources to support various architecting activities.
Comment: 68 pages, 5 images, 15 tables, Manuscript submitted to a Journal
(2022)
A Benchmark Study on Sentiment Analysis for Software Engineering Research
A recent research trend has emerged to identify developers' emotions by
applying sentiment analysis to the content of communication traces left in
collaborative development environments. Trying to overcome the limitations
posed by using off-the-shelf sentiment analysis tools, researchers recently
started to develop their own tools for the software engineering domain. In this
paper, we report a benchmark study to assess the performance and reliability of
three sentiment analysis tools specifically customized for software
engineering. Furthermore, we offer a reflection on the open challenges, as they
emerge from a qualitative analysis of misclassified texts.
Comment: Proceedings of the 15th International Conference on Mining Software
Repositories (MSR 2018)
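The abstract does not list the metrics used in the benchmark. As a minimal sketch, assuming a three-class polarity scheme (positive, neutral, negative) and a manually labeled gold standard, a tool's performance can be summarized with per-class precision, recall, and F1, and its agreement with the gold labels with Cohen's kappa. The label set, the toy data, and the function names below are illustrative assumptions, not the paper's setup.

```python
from collections import Counter

# Assumed label set; real benchmarks may use different polarity classes.
LABELS = ("positive", "neutral", "negative")


def per_class_prf(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall, f1)} for one classifier's output."""
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_prf(gold, predicted, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: agreement between two labelings beyond chance."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # If both labelings use one identical class, agreement is trivially perfect.
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


# Toy data: hypothetical gold labels vs. one tool's predictions.
gold = ["positive", "neutral", "negative", "negative"]
pred = ["positive", "neutral", "neutral", "negative"]
print(round(macro_f1(gold, pred), 3))     # 0.778
print(round(cohen_kappa(gold, pred), 3))  # 0.636
```

Macro-averaging weights each polarity class equally, which keeps a tool from looking good merely by predicting the majority (often neutral) class; kappa additionally discounts the agreement expected by chance.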
Mining software repositories to support software evolution
Software evolution represents a major phase in the development life cycle of software systems. In recent years, software evolution has been recognized as one of the most important and challenging areas in the field of software engineering. Studies even show that 65-80% of a system's lifetime is spent on maintenance and evolution activities. Software repositories, such as versioning and bug tracking systems, are essential parts of various software maintenance activities. Given the large amounts of information often stored in these repositories, researchers have proposed to mine and analyze these knowledge bases in order to study and support various aspects of the evolution of a software system. In this thesis, we introduce a common ontological representation to support the mining and analysis of software repositories. In addition to this common representation, we introduce the SVN-Ontologizer and Bugzilla-Ontologizer tools, which automate both data extraction from remote repositories and ontology population. A case study is presented to illustrate the applicability of the approach in supporting software maintainers during the analysis and mining of these software repositories.
A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits
In order to understand the state and evolution of the entirety of open source
software we need to get a handle on the set of distinct software projects. Most
open source projects presently use Git, which is a distributed version
control system allowing easy creation of clones and resulting in numerous
repositories that are almost entirely based on some parent repository from
which they were cloned. Git commits are based on a Merkle tree, so the same commit
is highly unlikely to be produced independently in two repositories. Shared commits, therefore,
appear like an excellent way to group cloned repositories and obtain an
accurate map for such repositories. We use World of Code infrastructure
containing approximately 2B commits and 100M repositories to create and share
such a map. We discover that the largest group contains almost 14M repositories
most of which are unrelated to each other. As it turns out, developers can
push Git objects to an arbitrary repository or pull objects from unrelated
repositories, thus linking unrelated repositories. To address this, we apply the
Louvain community detection algorithm to this very large graph consisting of
links between commits and projects. The approach successfully reduces the size
of the megacluster: the largest group of highly interconnected projects now
contains under 100K repositories. We expect that the resulting map
of related projects, as well as the tools and methods to handle the very large graph,
will serve as a reference set for mining software projects and other
applications. Further work is needed to determine different types of
relationships among projects induced by shared commits and other relationships,
for example, by shared source code or similar filenames.
Comment: 5 pages
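Grouping repositories by shared commits amounts to finding connected components: two repositories land in the same group if a chain of shared commits links them. The sketch below assumes each repository is given as a set of commit SHAs (the repository names and `group_by_shared_commits` are hypothetical). It uses a union-find structure and shows how a single repository holding objects from two unrelated projects merges them, which is the megacluster effect described above. It does not implement the Louvain step the authors apply afterwards.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over arbitrary hashable keys (repository names)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow on very large inputs.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_by_shared_commits(repo_commits):
    """Group repositories that share at least one commit, directly or transitively.

    repo_commits maps a repository name to an iterable of commit SHAs.
    Returns a list of sets of repository names (the connected components).
    """
    uf = UnionFind()
    first_owner = {}  # commit SHA -> first repository seen containing it
    for repo, commits in repo_commits.items():
        uf.find(repo)  # register repositories that share nothing
        for sha in commits:
            if sha in first_owner:
                uf.union(first_owner[sha], repo)
            else:
                first_owner[sha] = repo
    groups = defaultdict(set)
    for repo in repo_commits:
        groups[uf.find(repo)].add(repo)
    return list(groups.values())


# Hypothetical repositories and abbreviated commit IDs.
repos = {
    "alice/app": {"a1", "a2"},
    "bob/app-fork": {"a1", "a2", "b1"},
    "carol/lib": {"c1"},
    "dave/lib-fork": {"c1", "c2"},
    "eve/misc": {"e1"},
}
print(sorted(len(g) for g in group_by_shared_commits(repos)))  # [1, 2, 2]

# One repository that received objects from both projects merges them.
repos["mallory/mixed"] = {"a1", "c2"}
print(sorted(len(g) for g in group_by_shared_commits(repos)))  # [1, 5]
```

Union-find keeps memory roughly linear in the number of repositories plus distinct commits, which matters at the scale of billions of commits. Splitting the resulting components with Louvain could, for example, be done with NetworkX's `louvain_communities` on a commit-project graph, though that is not necessarily the authors' tooling.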
Filling the gaps of development logs and bug issue data
It has been suggested that the data in bug repositories is not always in sync with, or as complete as, the logs detailing developers' actions on source code. In this paper, we trace two sources of information about software bugs: the change logs of developers' actions and the issues reported as bugs. The aim is to identify and quantify the discrepancies between the two sources in how they record and store developer activity related to bugs. Focusing on the databases produced by two mining software repository tools, CVSAnalY and Bicho, we use part of the SZZ algorithm to identify bugs and to compare how the "defects-fixing changes" are recorded in the two databases. We use a working example to show how to do so. The results indicate that a significant amount of information is not in sync when tracing bugs in the two databases. We therefore propose an automatic approach to re-align the two databases, so that the collected information is mirrored and in sync.
Dr. Felipe Orteg
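The abstract does not say which part of SZZ is used. The first step of SZZ links commits to bug reports by searching commit messages for bug identifiers; assuming that is the part in question, the sketch below applies a hypothetical regular expression to change-log messages and compares the referenced bug ids with those in the issue tracker. In-memory lists stand in for the CVSAnalY and Bicho databases, whose schemas are not described here, and the pattern and function names are illustrative only.

```python
import re

# Hypothetical pattern for bug references in commit messages, e.g.
# "bug 1234", "Bug #1234", "fixes #1234", "issue 1234". Real projects
# need patterns tuned to their own conventions.
BUG_REF = re.compile(r"\b(?:bug|issue|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)


def bug_ids_in_log(commits):
    """Map each bug id referenced in the change log to the commits citing it.

    commits is an iterable of (sha, message) pairs.
    """
    linked = {}
    for sha, message in commits:
        for match in BUG_REF.finditer(message):
            linked.setdefault(int(match.group(1)), []).append(sha)
    return linked


def compare_sources(commits, reported_bug_ids):
    """Split bug ids into those found in both sources and those found in only one."""
    in_log = set(bug_ids_in_log(commits))
    in_tracker = set(reported_bug_ids)
    return {
        "matched": sorted(in_log & in_tracker),
        "only_in_log": sorted(in_log - in_tracker),
        "only_in_tracker": sorted(in_tracker - in_log),
    }


# Toy data standing in for rows read from the two databases.
commits = [
    ("c1", "Fix bug 101: null check in parser"),
    ("c2", "Refactor build scripts"),
    ("c3", "fixes #102 and issue 103"),
]
tracker = [101, 102, 104]
print(compare_sources(commits, tracker))
# {'matched': [101, 102], 'only_in_log': [103], 'only_in_tracker': [104]}
```

Bug ids that appear in only one source are one kind of discrepancy between the two databases. The full SZZ algorithm would go on to trace the lines changed by each fixing commit back to the changes that introduced them, which this sketch omits.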