64 research outputs found
Attribution Required: Stack Overflow Code Snippets in GitHub Projects
Stack Overflow (SO) is the largest Q&A website for developers, providing a
huge amount of copyable code snippets. Using these snippets raises various
maintenance and legal issues. The SO license requires attribution, i.e.,
referencing the original question or answer, and requires derived work to adopt
a compatible license. While there is a heated debate on SO's license model for
code snippets and the required attribution, little is known about the extent to
which snippets are copied from SO without proper attribution. In this paper, we
present the research design and summarized results of an empirical study
analyzing attributed and unattributed usages of SO code snippets in GitHub
projects. On average, 3.22% of all analyzed repositories and 7.33% of the
popular ones contained a reference to SO. Further, we found that developers
tend to refer to the whole thread on SO rather than to a specific answer. For Java, at
least two thirds of the copied snippets were not attributed.
Comment: 3 pages, 1 figure, Proceedings of the 39th International Conference
on Software Engineering Companion (ICSE-C 2017), IEEE, 2017, pp. 161-16
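As a rough illustration of how a reference to a whole thread might be told apart from a reference to a specific answer, the Python sketch below classifies Stack Overflow URLs found in source comments. The regular expressions and the function name `classify_so_links` are assumptions made for this example; they are not the study's actual tooling, and real link formats vary more than these patterns cover.

```python
import re

# Hypothetical patterns for this sketch only.
SO_LINK = re.compile(r"https?://(?:www\.)?stackoverflow\.com/\S+", re.IGNORECASE)
# Answer links: short form /a/<id>, or a question URL carrying an answer id.
ANSWER = re.compile(r"/a/\d+|/questions/\d+/[^\s#]*#\d+|/questions/\d+/[^/\s#]+/\d+")
# Question (thread) links: short form /q/<id> or /questions/<id>.
QUESTION = re.compile(r"/q/\d+|/questions/\d+")


def classify_so_links(source_text):
    """Return (url, kind) pairs; kind is 'answer', 'question', or 'other'."""
    results = []
    for url in SO_LINK.findall(source_text):
        if ANSWER.search(url):
            kind = "answer"
        elif QUESTION.search(url):
            kind = "question"
        else:
            kind = "other"
        results.append((url, kind))
    return results
```

Checking the answer patterns before the question patterns matters here, because every long-form answer URL also contains a question id.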
The RISCOSS platform for risk management in open source software adoption
Managing risks related to OSS adoption is a must for organizations that need to smoothly integrate OSS-related practices in their development processes. Adequate tool support may pave the road to effective risk management and ensure the sustainability of such activity. In this paper, we present the RISCOSS platform for managing risks in OSS adoption. RISCOSS builds upon a highly configurable data model that allows customization to several types of scopes. It implements two different working modes: exploration, where the impact of decisions may be assessed before making them; and continuous assessment, where risk variables (and their possible consequences on business goals) are continuously monitored and reported to decision-makers. The blackboard-oriented architecture of the platform defines several interfaces for the identified techniques, allowing new techniques to be plugged in.
Peer Reviewed. Postprint (author’s final draft)
Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub
Hosting over 10 million software projects, GitHub is one of the most
important data sources for studying the behavior of developers and software projects.
However, as open source datasets have grown, so have the potential
threats to mining them. As a dataset grows, it
becomes increasingly unrealistic for humans to confirm the quality of all samples. Some
studies have investigated this problem and provided solutions to avoid threats
in sample selection, but some of these solutions (e.g., finding development
projects) require human intervention. When the amount of data to be processed
increases, these semi-automatic solutions become less useful, since the human effort
they require is no longer affordable. To solve this problem,
we investigated the GHTorrent dataset and proposed a method to detect public
development projects. The results show that our method can effectively improve
the sample selection process in two ways: (1) We provide a simple model to
automatically select samples (with 0.827 precision and 0.947 recall); (2) We
also offer a complex model to help researchers carefully screen samples (with
63.2% less effort than manually confirming all samples, while achieving 0.926
precision and 0.959 recall).
Comment: Accepted by the SEKE2018 Conference
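For readers unfamiliar with the metrics reported above, precision and recall can be computed from predicted and true labels as in the Python sketch below. The function name `precision_recall` and the toy labels in the usage note are invented for illustration and do not reproduce the paper's data or models.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    # Guard against division by zero when there are no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With `predicted = [True, True, False, True]` and `actual = [True, False, True, True]`, there are two true positives, one false positive, and one false negative, so both precision and recall equal 2/3.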
The Unexplored Treasure Trove of Phabricator Code Reviews
Phabricator is a modern code collaboration tool used by popular projects like FreeBSD and Mozilla. However, unlike the other well-known code review environments, such as Gerrit or GitHub, there is no readily accessible public code review dataset for Phabricator. This paper describes our experience mining code reviews from five different projects that use Phabricator (Blender, FreeBSD, KDE, LLVM, and Mozilla). We discuss the challenges associated with the data retrieval process and our solutions, resulting in a dataset with details regarding 317,476 Phabricator code reviews. Our dataset (https://doi.org/10.6084/m9.figshare.17139245) is available in both JSON and MySQL database dump formats. The dataset enables analyses of the history of code reviews at a more granular level than other platforms. In addition, given that the projects we mined are publicly accessible via the Conduit API [18], our dataset can be used as a foundation to fetch additional details and insights.
GitDelver Enterprise Dataset (GDED): An Industrial Closed-source Dataset for Socio-Technical Research
Towards a Critical Open-Source Software Database
Open-source software (OSS) plays a vital role in the modern software
ecosystem. However, the maintenance and sustainability of OSS projects can be
challenging. In this paper, we present the CrOSSD project, which aims to build
a database of OSS projects and measure their current project "health" status.
In the project, we will use both quantitative and qualitative metrics to
evaluate the health of OSS projects. The quantitative metrics will be gathered
through automated crawling of meta information such as the number of
contributors, commits and lines of code. Qualitative metrics will be gathered
for selected "critical" projects through manual analysis and automated tools,
including aspects such as sustainability, funding, community engagement and
adherence to security policies. The results of the analysis will be presented
on a user-friendly web platform, which will allow users to view the health of
individual OSS projects as well as the overall health of the OSS ecosystem.
With this approach, the CrOSSD project provides a comprehensive and up-to-date
view of the health of OSS projects, making it easier for developers,
maintainers and other stakeholders to understand the health of OSS projects and
make informed decisions about their use and maintenance.
Comment: 4 pages, 1 figure