2 research outputs found

    Pitfalls and Guidelines for Using Time-Based Git Data

    Get PDF
    Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data however, time-based data is often dirty. To date, there are no studies that quantify how frequently such data is used by the software engineering research community, or investigate sources of and quantify how often such data is dirty. Depending on the research task and method used, including such dirty data could aect the research results. This paper presents an extended survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series. Out of the 754 technical track and data papers published in MSR 2004{2021, we saw at least 290 (38%) papers utilized time-based data. We also observed that most time-based data used in research papers comes in the form of Git commits, often from GitHub. Based on those results, we then used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty Git timestamp data. Finally we provide guidelines/best practices for researchers utilizing time-based data from Git repositories

    Open source software ecosystems : a systematic mapping

    Get PDF
    Context: Open source software (OSS) and software ecosystems (SECOs) are two consolidated research areas in software engineering. OSS influences the way organizations develop, acquire, use and commercialize software. SECOs have emerged as a paradigm to understand dynamics and heterogeneity in collaborative software development. For this reason, SECOs appear as a valid instrument to analyze OSS systems. However, there are few studies that blend both topics together. Objective: The purpose of this study is to evaluate the current state of the art in OSS ecosystems (OSSECOs) research, specifically: (a) what the most relevant definitions related to OSSECOs are; (b) what the particularities of this type of SECO are; and (c) how the knowledge about OSSECO is represented. Method: We conducted a systematic mapping following recommended practices. We applied automatic and manual searches on different sources and used a rigorous method to elicit the keywords from the research questions and selection criteria to retrieve the final papers. As a result, 82 papers were selected and evaluated. Threats to validity were identified and mitigated whenever possible. Results: The analysis allowed us to answer the research questions. Most notably, we did the following: (a) identified 64 terms related to the OSSECO and arranged them into a taxonomy; (b) built a genealogical tree to understand the genesis of the OSSECO term from related definitions; (c) analyzed the available definitions of SECO in the context of OSS; and (d) classified the existing modelling and analysis techniques of OSSECOs. Conclusion: As a summary of the systematic mapping, we conclude that existing research on several topics related to OSSECOs is still scarce (e.g., modelling and analysis techniques, quality models, standard definitions, etc.). This situation calls for further investigation efforts on how organizations and OSS communities actually understand OSSECOs.Peer ReviewedPostprint (author's final draft
    corecore