Search CORE

15,069 research outputs found

Structured Review of the Evidence for Effects of Code Duplication on Software Quality

Author: Hordijk Wiebe
Ponisio María Laura
Wieringa Roel
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2009
Field of study

This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough place to include them in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence)

University of Twente Research Information

SourcererCC: Scaling Code Clone Detection to Big Code

Author: Lopes Cristina V.
Roy Chanchal K.
Saini Vaibhav
Sajnani Hitesh
Svajlenko Jeffrey
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/12/2015
Field of study

Despite a decade of active research, there is a marked lack in clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.Comment: Accepted for publication at ICSE'16 (preprint, unrevised

arXiv.org e-Print Archive

Structured Review of Code Clone Literature

Author: Hordijk Wiebe
Ponisio María Laura
Wieringa Roel
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2008
Field of study

This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts which helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other

University of Twente Research Information

Stack Overflow in Github: Any Snippets There?

Author: Lopes Cristina
Martins Pedro
Saini Vaibhav
Yang Di
Publication venue
Publication date: 02/05/2017
Field of study

When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented on this paper analyzes 909k non-fork Python projects hosted on Github, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as quantitative analysis of block-level code cloning intra and inter Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.Comment: 14th International Conference on Mining Software Repositories, 11 page

arXiv.org e-Print Archive

Harmfulness of Code Duplication - A Structured Review of the Evidence

Author: Hordijk Wiebe
Ponisio María Laura
Wieringa Roel
Publication venue: British Computer Society
Publication date: 01/01/2009
Field of study

Duplication of code has long been thought to decrease changeability of systems, but recently doubts have been expressed whether this is true in general. This is a problem for researchers because it makes the value of research aimed against clones uncertain, and for practitioners as they cannot be sure whether their effort in reducing duplication is well-spent. In this paper we try to shed light on this is-sue by collecting empirical evidence in favor and against the nega-tive effects of duplication on changeability. We go beyond the flat yes/no-question of harmfulness and present an explanatory model to show the mechanisms through which duplication is suspected to affect quality. We aggregate the evidence for each of the causal links in the model. This sheds light on the current state of duplication re-search and helps practitioners choose between the available mitiga-tion strategies

University of Twente Research Information