12,951 research outputs found

    Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub

    Full text link
    Hosting over 10 million of software projects, GitHub is one of the most important data sources to study behavior of developers and software projects. However, with the increase of the size of open source datasets, the potential threats to mining these datasets have also grown. As the dataset grows, it becomes gradually unrealistic for human to confirm quality of all samples. Some studies have investigated this problem and provided solutions to avoid threats in sample selection, but some of these solutions (e.g., finding development projects) require human intervention. When the amount of data to be processed increases, these semi-automatic solutions become less useful since the effort in need for human intervention is far beyond affordable. To solve this problem, we investigated the GHTorrent dataset and proposed a method to detect public development projects. The results show that our method can effectively improve the sample selection process in two ways: (1) We provide a simple model to automatically select samples (with 0.827 precision and 0.947 recall); (2) We also offer a complex model to help researchers carefully screen samples (with 63.2% less effort than manually confirming all samples, and can achieve 0.926 precision and 0.959 recall).Comment: Accepted by the SEKE2018 Conferenc

    SZZ Unleashed: An Open Implementation of the SZZ Algorithm -- Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project

    Full text link
    Numerous empirical software engineering studies rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted as new projects based on SZZ output need to initially reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, thus conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and conclude with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute

    Healthy or Not: A Way to Predict Ecosystem Health in GitHub

    Get PDF
    With the development of open source community, through the interaction of developers, the collaborative development of software, and the sharing of software tools, the formation of open source software ecosystem has matured. Natural ecosystems provide ecological services on which human beings depend. Maintaining a healthy natural ecosystem is a necessity for the sustainable development of mankind. Similarly, maintaining a healthy ecosystem of open source software is also a prerequisite for the sustainable development of open source communities, such as GitHub. This paper takes GitHub as an example to analyze the health condition of open source ecosystem and, also, it is a research area in Symmetry. Firstly, the paper presents the healthy definition of GitHub open source ecosystem health and, then, according to the main components of natural ecosystem health, the paper proposes the health indicators and health indicators evaluation method. Based on the above, the GitHub ecosystem health prediction method is proposed. By analyzing the projects and data collected in GitHub, it is found that, using the proposed evaluation indicators and method, we can analyze the healthy development trend of the GitHub ecosystem and contribute to the stability of ecosystem development

    Influence of developer factors on code quality: a data study

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Automatic source-code inspection tools help to assess, monitor and improve code quality. Since these tools only examine the software project’s codebase, they overlook other possible factors that may impact code quality and the assessment of the technical debt (TD). Our initial hypothesis is that human factors associated with the software developers, like coding expertise, communication skills, and experience in the project have some measurable impact on the code quality. In this exploratory study, we test this hypothesis on two large open source repositories, using TD as a code quality metric and the data that may be inferred from the version control systems. The preliminary results of our statistical analysis suggest that the level of participation of the developers and their experience in the project have a positive correlation with the amount of TD that they introduce. On the contrary, communication skills have barely any impact on TD.Peer ReviewedPostprint (author's final draft

    International conference on software engineering and knowledge engineering: Session chair

    Get PDF
    The Thirtieth International Conference on Software Engineering and Knowledge Engineering (SEKE 2018) will be held at the Hotel Pullman, San Francisco Bay, USA, from July 1 to July 3, 2018. SEKE2018 will also be dedicated in memory of Professor Lofti Zadeh, a great scholar, pioneer and leader in fuzzy sets theory and soft computing. The conference aims at bringing together experts in software engineering and knowledge engineering to discuss on relevant results in either software engineering or knowledge engineering or both. Special emphasis will be put on the transference of methods between both domains. The theme this year is soft computing in software engineering & knowledge engineering. Submission of papers and demos are both welcome
    • …
    corecore