Structured Review of the Evidence for Effects of Code Duplication on Software Quality
This report presents the detailed steps and results of a structured review of the code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only those details of the review that did not fit into the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence).
SourcererCC: Scaling Code Clone Detection to Big Code
Despite a decade of active research, there is a marked lack of clone
detectors that scale to very large repositories of source code, in particular
for detecting near-miss clones where significant editing activities may take
place in the cloned code. We present SourcererCC, a token-based clone detector
that targets three clone types, and exploits an index to achieve scalability to
large inter-project repositories using a standard workstation. SourcererCC uses
an optimized inverted-index to quickly query the potential clones of a given
code block. Filtering heuristics based on token ordering are used to
significantly reduce the size of the index, the number of code-block
comparisons needed to detect the clones, and the number of
token comparisons needed to judge a potential clone.
We evaluate the scalability, execution time, recall and precision of
SourcererCC, and compare it to four publicly available and state-of-the-art
tools. To measure recall, we use two recent benchmarks, (1) a large benchmark
of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of
thousands of fine-grained artificial clones. We find SourcererCC has both high
recall and precision, and is able to scale to a large inter-project repository
(250 MLOC) using a standard workstation.
Comment: Accepted for publication at ICSE'16 (preprint, unrevised)
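To make the index-and-filter idea concrete, here is a minimal sketch of token-based prefix filtering. It assumes, as a simplification not taken from the abstract, that a code block is a bag of tokens, that two blocks count as clones when they share at least ⌈θ·max(|B1|, |B2|)⌉ tokens for a threshold θ, and that every block's tokens are sorted by ascending global frequency. Under that ordering, two blocks that reach the threshold must share a token within a short prefix of each block, so only prefixes need to be indexed and queried. The function names and the default θ = 0.8 are hypothetical; this illustrates the general filtering technique and is not SourcererCC's actual implementation.

```python
import math
from collections import Counter, defaultdict


def sort_by_global_frequency(blocks):
    """Return each block's tokens ordered rarest-first across the whole corpus."""
    freq = Counter(tok for block in blocks for tok in block)
    # Ties are broken by the token itself so that every block uses the same order.
    return [sorted(block, key=lambda tok: (freq[tok], tok)) for block in blocks]


def prefix_length(size, theta):
    """Number of leading tokens to index/query for a block with `size` tokens."""
    return size - math.ceil(theta * size) + 1


def overlap(a, b):
    """Bag-of-tokens overlap: size of the multiset intersection."""
    return sum((Counter(a) & Counter(b)).values())


def detect_clones(raw_blocks, theta=0.8):
    """Return (earlier_id, later_id) pairs whose overlap reaches the threshold."""
    blocks = sort_by_global_frequency(raw_blocks)
    index = defaultdict(set)  # token -> ids of earlier blocks whose prefix contains it
    pairs = []
    for i, block in enumerate(blocks):
        prefix = block[:prefix_length(len(block), theta)]
        # Candidate generation: only blocks sharing a prefix token are considered.
        candidates = set()
        for tok in prefix:
            candidates |= index[tok]
        # Verification: exact overlap check on the few surviving candidates.
        for j in sorted(candidates):
            needed = math.ceil(theta * max(len(block), len(blocks[j])))
            if overlap(block, blocks[j]) >= needed:
                pairs.append((j, i))
        # Index this block's prefix so that later blocks can find it.
        for tok in prefix:
            index[tok].add(i)
    return pairs
```

Putting rare tokens first keeps both the prefixes and the index postings short and discriminative, which is what lets the verification step run on only a handful of candidates per block.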
Stack Overflow in Github: Any Snippets There?
When programmers look for how to achieve certain programming tasks, Stack
Overflow is a popular destination in search engine results. Over the years,
Stack Overflow has accumulated an impressive knowledge base of snippets of code
that are amply documented. We are interested in studying how programmers use
these snippets of code in their projects. Can we find Stack Overflow snippets
in real projects? When snippets are used, is the copy literal or does it
undergo adaptations? And are these adaptations specializations required by the
idiosyncrasies of the target artifact, or are they motivated by specific
requirements of the programmer? The large-scale study presented in this paper
analyzes 909k non-fork Python projects hosted on GitHub, which contain 290M
function definitions, and 1.9M Python snippets captured in Stack Overflow.
Results are presented as a quantitative analysis of block-level code cloning
within and between Stack Overflow and GitHub, and as an analysis of programming
behaviors through a qualitative analysis of our findings.
Comment: 14th International Conference on Mining Software Repositories, 11 pages
An Extended Stable Marriage Problem Algorithm for Clone Detection
Code cloning negatively affects industrial software and threatens
intellectual property. This paper presents a novel approach to detecting cloned
software by using a bijective matching technique. The proposed approach focuses
on increasing the range of similarity measures and thus enhancing the precision
of the detection. This is achieved by extending the well-known stable-marriage
problem (SMP) and demonstrating how matches between code fragments of different
files can be expressed. A prototype of the proposed approach is demonstrated on
a suitable scenario, which shows a noticeable improvement in several aspects of
clone detection, such as scalability and accuracy.
Comment: 20 pages, 10 figures, 6 tables
Pirate Plunder: Game-Based Computational Thinking Using Scratch Blocks
Policy makers worldwide argue that children should be taught how technology works, and that the "computational thinking" skills developed through programming are useful in a wider context. This is causing an increased focus on computer science in primary and secondary education.
Block-based programming tools, like Scratch, have become ubiquitous in primary education (ages 5 to 11) throughout the UK. However, Scratch users often struggle to detect and correct "code smells" (bad programming practices) such as duplicated blocks and large scripts, which can lead to programs that are difficult to understand. These "smells" are caused by a lack of abstraction and decomposition in programs, skills that play a key role in computational thinking. In Scratch, repeats (loops), custom blocks (procedures) and clones (instances) can be used to correct these smells. Yet custom blocks and clones are rarely taught to children under 11 years old.
We describe the design of a novel educational block-based programming game, Pirate Plunder, which aims to teach these skills to children aged 9 to 11. Players use Scratch blocks to navigate around a grid, collect items and interact with obstacles. Blocks are explained in "tutorials"; the player then completes a series of "challenges" before attempting the next tutorial. A set of Scratch blocks, including repeats, custom blocks and clones, is introduced in a linear difficulty progression. There are two versions of Pirate Plunder: one that uses a debugging-first approach, where the player is given a program that is incomplete or incorrect, and one where each level begins with an empty program.
The game design has been developed through iterative playtesting. The observations made during this process have influenced key design decisions such as Scratch integration, difficulty progression and the reward system. In future work, we will evaluate Pirate Plunder against a traditional Scratch curriculum and compare the debugging-first and non-debugging versions in a series of studies.
Reconstructing the Forest of Lineage Trees of Diverse Bacterial Communities Using Bio-inspired Image Analysis
Cell segmentation and tracking allow us to extract a plethora of cell
attributes from bacterial time-lapse cell movies, thus promoting computational
modeling and simulation of biological processes down to the single-cell level.
However, to successfully analyze complex cell movies that image multiple
interacting bacterial clones as they grow and merge to generate overcrowded
bacterial communities with thousands of cells in the field of view,
segmentation results must be near-perfect to ensure good tracking results.
We introduce here a fully automated closed-loop bio-inspired computational
strategy that exploits prior knowledge about the expected structure of a
colony's lineage tree to locate and correct segmentation errors in analyzed
movie frames. We show that this correction strategy is effective, resulting in
improved cell tracking and consequently trustworthy deep colony lineage trees.
Our image analysis approach has the unique capability to keep tracking cells
even after clonal subpopulations merge in the movie. This enables the
reconstruction of the complete Forest of Lineage Trees (FLT) representation of
evolving multi-clonal bacterial communities. Moreover, the percentage of valid
cell trajectories extracted from the image analysis almost doubles after
segmentation correction. This plethora of trustworthy data extracted from a
complex cell movie analysis enables single-cell analytics as a tool for
addressing compelling questions for human health, such as understanding the
role of single-cell stochasticity in antibiotic resistance without losing sight
of the inter-cellular interactions and microenvironment effects that may shape
it.
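One simple kind of structural prior is that a bacterium divides into at most two daughters, and that every cell absent from the first frame should descend from a tracked ancestor. The sketch below is a hypothetical illustration, not the authors' method: it assumes a lineage forest stored as a `parent_of` mapping from each cell id to its mother cell id (`None` for roots), and it flags nodes that break these two rules as candidate locations of segmentation errors. It deliberately ignores complications such as cells entering from the image border.

```python
from collections import defaultdict


def suspicious_cells(parent_of, first_frame_cells):
    """Flag lineage-forest nodes that break a simple binary-fission prior.

    parent_of: dict mapping cell id -> mother cell id, or None for roots.
    first_frame_cells: ids of the cells present in the first movie frame.
    """
    daughters = defaultdict(list)
    flagged = set()
    for cell, mother in parent_of.items():
        if mother is None:
            # A root that was not in the first frame appeared without an ancestor.
            if cell not in first_frame_cells:
                flagged.add(cell)
        else:
            daughters[mother].append(cell)
    for mother, kids in daughters.items():
        # More than two daughters suggests one cell was split into several segments.
        if len(kids) > 2:
            flagged.add(mother)
    return sorted(flagged)
```

For example, with `{"A": None, "B": "A", "C": "A", "D": "A", "E": None}` and `{"A"}` as the first-frame cells, the function returns `["A", "E"]`: `A` has three daughters and `E` appears without an ancestor.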