
675 research outputs found

Identifying Redundancies in Fork-based Development

Author: Kästner Christian
Ren Luyao
Wasowski Andrzej
Zhou Shurui
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Crossref

The IT University of Copenhagen's Repository

Analysis and Detection of Information Types of Open Source Software Issue Discussions

Author: Arya Deeksha
Cheng Jinghui
Guo Jin L. C.
Wang Wenting
Publication date: 01/01/2019

Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders. Comment: 41st ACM/IEEE International Conference on Software Engineering (ICSE2019)

arXiv.org e-Print Archive

Crossref

PolyPublie

Code reviewer intelligent prediction in open source industrial software project

Author: Huang Xuechun
Liao Zhifang
Yu Song
Zhang Bolin
Zhang Yan
Publication date: 23/04/2023

ResearchOnline@GCU

On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests

Author: Abdalkareem Rabe
Costa Diego Elias
Khatoonabadi SayedHassan
Shihab Emad
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/05/2022

Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable number of pull requests (PRs) with valid contributions are abandoned by their contributors, wasting the effort and time put in by both the contributors and maintainers. To better understand the underlying dynamics of contributor-abandoned PRs, we conduct a mixed-methods study using both quantitative and qualitative methods. We curate a dataset consisting of 265,325 PRs including 4,450 abandoned ones from ten popular and mature GitHub projects and measure 16 features characterizing PRs, contributors, review processes, and projects. Using statistical and machine learning techniques, we find that complex PRs, novice contributors, and lengthy reviews have a higher probability of abandonment and the rate of PR abandonment fluctuates alongside the projects' maturity or workload. To identify why contributors abandon their PRs, we also manually examine a random sample of 354 abandoned PRs. We observe that the most frequent abandonment reasons are related to the obstacles faced by contributors, followed by the hurdles imposed by maintainers during the review process. Finally, we survey the top core maintainers of the studied projects to understand their perspectives on dealing with PR abandonment and on our findings. Comment: Manuscript accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM)

arXiv.org e-Print Archive

Exploring the characteristics of issue-related behaviors in GitHub using visualization techniques

Author: Chen Zhijie
Fan Xiaoping
He Dayu
Liao Zhifang
Liu Shengzong
Zhang Yan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Crossref

ResearchOnline@GCU

Duplicate Bug Report Detection: How Far Are We?

Author: Han DongGyun
Irsan Ivana Clairine
Jiang Lingxiao
Lo David
Thung Ferdian
Vinayakarao Venkatesh
Xu Bowen
Zhang Ting
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/12/2022

Royal Holloway - Pure

A Dataset for GitHub Repository Deduplication: Extended Description

Author: Allamanis Miltiadis
Chris
Habib Elsayed AE
Sebastián Marcos
Publication date: 21/04/2020

GitHub projects can be easily replicated through the site's fork process or through a Git clone-push sequence. This is a problem for empirical software engineering, because it can lead to skewed results or mistrained machine learning models. We provide a dataset of 10.6 million GitHub projects that are copies of others, and link each record with the project's ultimate parent. The ultimate parents were derived from a ranking along six metrics. The related projects were calculated as the connected components of an 18.2 million node and 12 million edge denoised graph created by directing edges to ultimate parents. The graph was created by filtering out more than 30 hand-picked and 2.3 million pattern-matched clumping projects. Projects that introduced unwanted clumping were identified by repeatedly visualizing shortest path distances between unrelated important projects. Our dataset identified 30 thousand duplicate projects in an existing popular reference dataset of 1.8 million projects. An evaluation of our dataset against another created independently with different methods found a significant overlap, but also differences attributed to the operational definition of what projects are considered as related. Comment: 33 pages, 33 figures, 17 listings

arXiv.org e-Print Archive

Crossref

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY