32,042 research outputs found

    Visual querying and analysis of large software repositories

    Get PDF
    We present a software framework for mining software repositories. Our extensible framework enables the integration of data extraction from repositories with data analysis and interactive visualization. We demonstrate the applicability of the framework by presenting several case studies performed on industry-size software repositories. In each study we use the framework to give answers to one or several software engineering questions addressing a specific project. Next, we validate the answers by comparing them with existing project documentation, by interviewing domain experts and by detailed analyses of the source code. The results show that our framework can be used both for supporting case studies on mining software repository techniques and for building end-user tools for software maintenanc

    Mining Architectural Information: A Systematic Mapping Study

    Full text link
    Context: Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information to support architecting activities, such as architecture understanding and recovery, has received a significant attention in recent years. However, there is an absence of a comprehensive understanding of the state of research on mining architectural information. Objective: This work aims to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. Method: A Systematic Mapping Study (SMS) has been conducted on the literature published between January 2006 and November 2021. Results: Of the 79 primary studies finally selected, 8 categories of architectural information have been mined, among which architectural description is the most mined architectural information; 12 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported activity; 81 approaches and 52 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. Conclusions: This SMS provides researchers with promising future directions and help practitioners be aware of what approaches and tools can be used to mine what architectural information from what sources to support various architecting activities.Comment: 68 pages, 5 images, 15 tables, Manuscript submitted to a Journal (2022

    A Benchmark Study on Sentiment Analysis for Software Engineering Research

    Full text link
    A recent research trend has emerged to identify developers' emotions, by applying sentiment analysis to the content of communication traces left in collaborative development environments. Trying to overcome the limitations posed by using off-the-shelf sentiment analysis tools, researchers recently started to develop their own tools for the software engineering domain. In this paper, we report a benchmark study to assess the performance and reliability of three sentiment analysis tools specifically customized for software engineering. Furthermore, we offer a reflection on the open challenges, as they emerge from a qualitative analysis of misclassified texts.Comment: Proceedings of 15th International Conference on Mining Software Repositories (MSR 2018

    Mining software repositories to support software evolution

    Get PDF
    Software evolution represents a major phase in the development life cycle of software systems. In recent years, software evolution has been recognized as one of the most important and challenging areas in the field of software engineering. Studies even show that 65-80% of the system lifetime will be spent on maintenance and evolution activities. Software repositories, such as versioning and bug tracking systems are essential parts of various software maintenance activities. Given the often large amounts of information stored in these repositories, researchers have proposed to mine and analyze these large knowledge bases in order to study and support various aspects of the evolution of a software system. In this thesis, we introduce a common ontological representation to support the mining and analysis of software repositories. In addition to this common representation, we introduce the SVN-Ontologizer and Bugzilla-Ontologizer tools that provide automation for both data extraction from remote repositories and ontology populations. A case study is presented to illustrate the applicability of the present approach in supporting software maintainers during the analysis and mining of these software repositorie

    A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits

    Full text link
    In order to understand the state and evolution of the entirety of open source software we need to get a handle on the set of distinct software projects. Most of open source projects presently utilize Git, which is a distributed version control system allowing easy creation of clones and resulting in numerous repositories that are almost entirely based on some parent repository from which they were cloned. Git commits are based on Merkle Tree and two commits are highly unlikely to be produced independently. Shared commits, therefore, appear like an excellent way to group cloned repositories and obtain an accurate map for such repositories. We use World of Code infrastructure containing approximately 2B commits and 100M repositories to create and share such a map. We discover that the largest group contains almost 14M repositories most of which are unrelated to each other. As it turns out, the developers can push git object to an arbitrary repository or pull objects from unrelated repositories, thus linking unrelated repositories. To address this, we apply Louvain community detection algorithm to this very large graph consisting of links between commits and projects. The approach successfully reduces the size of the megacluster with the largest group of highly interconnected projects containing under 100K repositories. We expect the tools that the resulting map of related projects as well as tools and methods to handle the very large graph will serve as a reference set for mining software projects and other applications. Further work is needed to determine different types of relationships among projects induced by shared commits and other relationships, for example, by shared source code or similar filenames.Comment: 5 page

    Filling the gaps of development logs and bug issue data

    Get PDF
    It has been suggested that the data from bug repositories is not always in sync or complete compared to the logs detailing the actions of developers on source code. In this paper, we trace two sources of information relative to software bugs: the change logs of the actions of developers and the issues reported as bugs. The aim is to identify and quantify the discrepancies between the two sources in recording and storing the developer logs relative to bugs. Focussing on the databases produced by two mining software repository tools, CVSAnalY and Bicho, we use part of the SZZ algorithm to identify bugs and to compare how the"defects-fixing changes" are recorded in the two databases. We use a working example to show how to do so. The results indicate that there is a significant amount of information, not in sync when tracing bugs in the two databases. We, therefore, propose an automatic approach to re-align the two databases, so that the collected information is mirrored and in sync.Dr. Felipe Orteg
    • …
    corecore