9 research outputs found
Beyond Traditional Software Development: Studying and Supporting the Role of Reusing Crowdsourced Knowledge in Software Development
As software development is becoming increasingly complex, developers often need to reuse others’ code or knowledge made available online to tackle problems encountered during software development and maintenance. This phenomenon of using others' code or knowledge, often found on online forums, is referred to as crowdsourcing. A good example of crowdsourcing is posting a coding question on the Stack Overflow website and having others contribute code that solves that question. Recently, the phenomenon of crowdsourcing has attracted much attention from researchers and practitioners and recent studies show that crowdsourcing improves productivity and reduces time-to-market. However, like any solution, crowdsourcing brings with it challenges such as quality, maintenance, and even legal issues.
The research presented in this thesis presents the result of a series of large-scale empirical studies involving some of the most popular crowdsourcing platforms such as Stack Overflow, Node Package Manager (npm), and Python Package Index (PyPI). The focus of these empirical studies is to investigate the role of reusing crowdsourcing knowledge and more particularly crowd code in the software development process.
We first present two empirical studies on the reuse of knowledge from crowdsourcing platforms namely Stack Overflow. We found that reusing knowledge from this crowdsourcing platform has the potential to assist software development practices, specifically through source code reuse.
However, relying on such crowdsourced knowledge might also negatively affect the quality of the software projects. Second, we empirically examine the type of development knowledge constructed on crowdsourcing platforms. We examine the use of trivial packages on npm and PyPI platforms. We found that trivial packages are common and developers tend to use them because they provide them with well tested and implemented code. However, developers are concerned about the maintenance overhead of these trivial packages due to the extra dependencies that trivial packages introduce. Finally, we used the gained knowledge to propose a pragmatic solution to improve the efficiency of relying on the crowd in software development. We proposed a rule-based technique that automatically detects commits that can skip the continuous integration process. We evaluate the performance of the proposed technique on a dataset of open-source Java projects. Our results show that continuous integration can be used to improve the efficiency of the reused code from crowdsourcing platforms.
Among the findings of this thesis are that the way software is developed has changed dramatically. Developers rely on crowdsourcing to address problems encountered during software development and maintenance. The results presented in this thesis provides new insights on how knowledge from these crowdsourced platforms is reused in software systems and how some of this knowledge can be better integrated into current software development processes and best practices
On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests
Pull-based development has enabled numerous volunteers to contribute to
open-source projects with fewer barriers. Nevertheless, a considerable amount
of pull requests (PRs) with valid contributions are abandoned by their
contributors, wasting the effort and time put in by both the contributors and
maintainers. To better understand the underlying dynamics of
contributor-abandoned PRs, we conduct a mixed-methods study using both
quantitative and qualitative methods. We curate a dataset consisting of 265,325
PRs including 4,450 abandoned ones from ten popular and mature GitHub projects
and measure 16 features characterizing PRs, contributors, review processes, and
projects. Using statistical and machine learning techniques, we find that
complex PRs, novice contributors, and lengthy reviews have a higher probability
of abandonment and the rate of PR abandonment fluctuates alongside the
projects' maturity or workload. To identify why contributors abandon their PRs,
we also manually examine a random sample of 354 abandoned PRs. We observe that
the most frequent abandonment reasons are related to the obstacles faced by
contributors, followed by the hurdles imposed by maintainers during the review
process. Finally, we survey the top core maintainers of the studied projects to
understand their perspectives on dealing with PR abandonment and on our
findings.Comment: Manuscript accepted for publication in ACM Transactions on Software
Engineering and Methodology (TOSEM
Where to Go Now? Finding Alternatives for Declining Packages in the npm Ecosystem
Software ecosystems (e.g., npm, PyPI) are the backbone of modern software
developments. Developers add new packages to ecosystems every day to solve new
problems or provide alternative solutions, causing obsolete packages to decline
in their importance to the community. Packages in decline are reused less
overtime and may become less frequently maintained. Thus, developers usually
migrate their dependencies to better alternatives. Replacing packages in
decline with better alternatives requires time and effort by developers to
identify packages that need to be replaced, find the alternatives, asset
migration benefits, and finally, perform the migration.
This paper proposes an approach that automatically identifies packages that
need to be replaced and finds their alternatives supported with real-world
examples of open source projects performing the suggested migrations. At its
core, our approach relies on the dependency migration patterns performed in the
ecosystem to suggest migrations to other developers. We evaluated our approach
on the npm ecosystem and found that 96% of the suggested alternatives are
accurate. Furthermore, by surveying expert JavaScript developers, 67% of them
indicate that they will use our suggested alternative packages in their future
projects
An empirical study on the removal of self-admitted technical debt
\u3cp\u3eTechnical debt refers to the phenomena of taking shortcuts to achieve short term gain at the cost of higher maintenance efforts in the future. Recently, approaches were developed to detect technical debt through code comments, referred to as Self-Admitted Technical Debt (SATD). Due to its importance, several studies have focused on the detection of SATD and examined its impact on software quality. However, preliminary findings showed that in some cases SATD may live in a project for a long time, i.e., more than 10 years. These findings clearly show that not all SATD may be regarded as 'bad' and some SATD needs to be removed, while other SATD may be fine to take on. Therefore, in this paper, we study the removal of SATD. In an empirical study on five open source projects, we examine how much SATD is removed and who removes SATD? We also investigate for how long SATD lives in a project and what activities lead to the removal of SATD? Our findings indicate that the majority of SATD is removed and that the majority is self-removed (i.e., removed by the same person that introduced it). Moreover, we find that SATD can last between approx. 18-172 days, on median. Finally, through a developer survey, we find that developers mostly use SATD to track future bugs and areas of the code that need improvements. Also, developers mostly remove SATD when they are fixing bugs or adding new features. Our findings contribute to the body of empirical evidence on SATD, in particular evidence pertaining to its removal.\u3c/p\u3
Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot
<p>Replication Package for "Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot"</p>