
Attribution Required: Stack Overflow Code Snippets in GitHub Projects

Authors: Baltes, Sebastian; Diehl, Stephan; Kiefer, Richard
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 03/07/2017

Stack Overflow (SO) is the largest Q&A website for developers, providing a huge number of copyable code snippets. Using these snippets raises various maintenance and legal issues. The SO license requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO's license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution. In this paper, we present the research design and summarized results of an empirical study analyzing attributed and unattributed usages of SO code snippets in GitHub projects. On average, 3.22% of all analyzed repositories and 7.33% of the popular ones contained a reference to SO. Further, we found that developers tend to refer to the whole thread on SO rather than to a specific answer. For Java, at least two thirds of the copied snippets were not attributed. Comment: 3 pages, 1 figure, Proceedings of the 39th International Conference on Software Engineering Companion (ICSE-C 2017), IEEE, 2017, pp. 161-16

arXiv.org e-Print Archive

Crossref
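The abstract distinguishes references to a whole SO thread from references to a specific answer. A minimal sketch of how such links could be told apart is shown here, assuming only the common stackoverflow.com URL shapes; the pattern `SO_LINK` and the functions `classify_so_link` and `count_references` are hypothetical illustrations, not the study's actual detection method.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical pattern covering common Stack Overflow URL shapes (an assumption,
# not the study's detection rules):
#   /questions/<qid>[/<slug>]            -> whole thread
#   /questions/<qid>/<slug>/<aid>[#...]  -> specific answer
#   /q/<qid> and /a/<aid>                -> short "share" links
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/"
    r"(?:(?P<short>[aq])/\d+"
    r"|questions/\d+(?:/[^/\s#?]+)?(?:/(?P<aid>\d+))?)"
)


def _kind(match: re.Match) -> str:
    # An answer id in the path, or the /a/ short form, points at one answer.
    if match.group("short") == "a" or match.group("aid"):
        return "answer"
    return "thread"


def classify_so_link(url: str) -> Optional[str]:
    """Return 'answer', 'thread', or None if url is not a recognized SO link."""
    match = SO_LINK.match(url)
    return _kind(match) if match else None


def count_references(text: str) -> Counter:
    """Count thread-level vs answer-level SO links found anywhere in text."""
    return Counter(_kind(m) for m in SO_LINK.finditer(text))


snippet = """
// adapted from https://stackoverflow.com/questions/11227809/why-is-processing
// see also https://stackoverflow.com/a/11227902
"""
print(count_references(snippet))
```

A tally of this kind makes it possible to compare thread-level and answer-level references in source files; real attribution comments also use URL variants (for example, other Stack Exchange sites) that this pattern ignores.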

The RISCOSS platform for risk management in open source software adoption

Authors: Ameller, David; Annosi, Maria Carmela; Ben-Jacob, Ron; Blumenfeld, Yehuda; Franch Gutiérrez, Javier; Franco Bedoya, Óscar Hernán; Gross, Daniel; Kenett, Ron; López Cuesta, Lidia; Mancinelli, Fabio; Morandini, Mirko; Oriol Hilari, Marc; Siena, Alberto; Susi, Angelo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Managing risks related to OSS adoption is a must for organizations that need to smoothly integrate OSS-related practices into their development processes. Adequate tool support may pave the way to effective risk management and ensure the sustainability of such activity. In this paper, we present the RISCOSS platform for managing risks in OSS adoption. RISCOSS builds upon a highly configurable data model that can be customized to several types of scope. It implements two working modes: exploration, where the impact of decisions may be assessed before they are made; and continuous assessment, where risk variables (and their possible consequences on business goals) are continuously monitored and reported to decision-makers. The blackboard-oriented architecture of the platform defines several interfaces for the identified techniques, allowing new techniques to be plugged in. Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC
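The RISCOSS abstract describes a blackboard-oriented architecture in which techniques plug in through defined interfaces. The sketch here illustrates that general pattern only; the classes `Technique` and `Blackboard`, the risk variable names, and the 50-commit threshold are hypothetical and do not reflect the RISCOSS platform's actual interfaces or data model.

```python
from typing import Callable, Dict, List

Board = Dict[str, float]


class Technique:
    """Hypothetical plug-in interface: reads some facts, posts one new fact."""

    def __init__(self, name: str, needs: List[str], produces: str,
                 compute: Callable[[Board], float]) -> None:
        self.name = name
        self.needs = needs
        self.produces = produces
        self.compute = compute

    def can_contribute(self, board: Board) -> bool:
        # Fire only when all inputs are on the board and the output is not yet posted.
        return all(k in board for k in self.needs) and self.produces not in board


class Blackboard:
    def __init__(self) -> None:
        self.facts: Board = {}
        self.techniques: List[Technique] = []

    def plug_in(self, technique: Technique) -> None:
        # New techniques are added without changing the blackboard itself.
        self.techniques.append(technique)

    def run(self) -> Board:
        # Let any ready technique post its result until no technique can fire.
        # Each technique fires at most once, so the loop always terminates.
        progressed = True
        while progressed:
            progressed = False
            for t in self.techniques:
                if t.can_contribute(self.facts):
                    self.facts[t.produces] = t.compute(self.facts)
                    progressed = True
        return self.facts


board = Blackboard()
# Hypothetical techniques; registration order does not matter.
board.plug_in(Technique("goal-impact", ["inactivity_risk"], "goal_impact",
                        lambda b: 0.5 * b["inactivity_risk"]))
board.plug_in(Technique("inactivity", ["commits_last_year"], "inactivity_risk",
                        lambda b: 1.0 if b["commits_last_year"] < 50 else 0.0))
board.facts["commits_last_year"] = 12
result = board.run()
print(result)
```

The design choice illustrated is that techniques communicate only through the shared board, so a risk variable computed by one technique can feed a goal-level assessment in another without either knowing about the other.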

Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub

Authors: Cheng, Can; Li, Bing; Li, Zengyang; Liang, Peng
Publication venue: KSI Research Inc.
Publication date: 08/05/2018

Hosting over 10 million software projects, GitHub is one of the most important data sources for studying the behavior of developers and software projects. However, as open source datasets grow, so do the potential threats to mining them. As a dataset grows, it becomes increasingly unrealistic for humans to confirm the quality of all samples. Some studies have investigated this problem and provided solutions to avoid threats in sample selection, but some of these solutions (e.g., finding development projects) require human intervention. When the amount of data to be processed increases, these semi-automatic solutions become less useful, since the human effort they require becomes unaffordable. To solve this problem, we investigated the GHTorrent dataset and proposed a method to detect public development projects. The results show that our method can effectively improve the sample selection process in two ways: (1) a simple model automatically selects samples (0.827 precision and 0.947 recall); (2) a complex model helps researchers carefully screen samples, requiring 63.2% less effort than manually confirming all samples while achieving 0.926 precision and 0.959 recall. Comment: Accepted by the SEKE2018 Conference

arXiv.org e-Print Archive

Crossref

The Unexplored Treasure Trove of Phabricator Code Reviews

Authors: Kudrjavets, Gunnar; Nagappan, Nachiappan; Rastogi, Ayushi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/03/2022

Phabricator is a modern code collaboration tool used by popular projects such as FreeBSD and Mozilla. However, unlike other well-known code review environments, such as Gerrit or GitHub, Phabricator has no readily accessible public code review dataset. This paper describes our experience mining code reviews from five projects that use Phabricator (Blender, FreeBSD, KDE, LLVM, and Mozilla). We discuss the challenges associated with the data retrieval process and our solutions, resulting in a dataset with details regarding 317,476 Phabricator code reviews. Our dataset (https://doi.org/10.6084/m9.figshare.17139245) is available in both JSON and MySQL database dump formats. The dataset enables analyses of the history of code reviews at a more granular level than other platforms. In addition, given that the projects we mined are publicly accessible via the Conduit API [18], our dataset can be used as a foundation to fetch additional details and insights.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen


GitDelver Enterprise Dataset (GDED): An Industrial Closed-source Dataset for Socio-Technical Research

Authors: Devroey, Xavier; Riquet, Nicolas; Vanderose, Benoit
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/05/2022

Repository of the University of Namur

Towards a Critical Open-Source Software Database

Authors: Dam, Tobias; Klausner, Lukas Daniel; Neumaier, Sebastian
Publication venue: Association for Computing Machinery (ACM)
Publication date: 02/05/2023

Open-source software (OSS) plays a vital role in the modern software ecosystem. However, the maintenance and sustainability of OSS projects can be challenging. In this paper, we present the CrOSSD project, which aims to build a database of OSS projects and measure their current "health" status. In the project, we will use both quantitative and qualitative metrics to evaluate the health of OSS projects. The quantitative metrics will be gathered through automated crawling of meta information such as the number of contributors, commits and lines of code. Qualitative metrics will be gathered for selected "critical" projects through manual analysis and automated tools, covering aspects such as sustainability, funding, community engagement and adherence to security policies. The results of the analysis will be presented on a user-friendly web platform, which will allow users to view the health of individual OSS projects as well as the overall health of the OSS ecosystem. With this approach, the CrOSSD project provides a comprehensive and up-to-date view of OSS project health, making it easier for developers, maintainers and other stakeholders to make informed decisions about the use and maintenance of OSS projects. Comment: 4 pages, 1 figure

arXiv.org e-Print Archive