    Harmfulness of Code Duplication - A Structured Review of the Evidence

    Duplication of code has long been thought to decrease the changeability of systems, but recently doubts have been expressed about whether this is true in general. This is a problem for researchers, because it makes the value of research aimed against clones uncertain, and for practitioners, who cannot be sure whether their effort to reduce duplication is well spent. In this paper, we try to shed light on this issue by collecting empirical evidence for and against the negative effects of duplication on changeability. We go beyond the simple yes/no question of harmfulness and present an explanatory model that shows the mechanisms through which duplication is suspected to affect quality. We aggregate the evidence for each of the causal links in the model. This sheds light on the current state of duplication research and helps practitioners choose among the available mitigation strategies.

    Structured Review of Code Clone Literature

    This report presents the results of a structured review of the code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts that helps us reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other.

    Stack Overflow: A Code Laundering Platform?

    Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license, which developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to posting code to Stack Overflow or reusing code from Stack Overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow.
    Comment: In proceedings of the 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER

    Studying Late Propagations in Code Clone Evolution Using Software Repository Mining

    In the code clone evolution community, Late Propagation (LP) has been identified as one of the clone evolution patterns that can potentially lead to software defects. An LP occurs when the instances of a clone pair are changed consistently, but not at the same time. If the modification was a bug fix, the clone instance that receives the update later may exhibit unintended behavior until the fix reaches it. In this paper, we present an approach to extract LPs from software repositories. We then study LPs in four software systems, which allows us to investigate the propagation time, the clone dispersion, and the effects of LPs on the software.
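
    The LP definition above can be made concrete with a minimal Python sketch; it is not the extraction approach from the paper. It assumes a hypothetical input, `history`, holding the normalized text of both clone instances at each revision in chronological order, and it assumes the pair starts out consistent. A divergence is a revision where the instances stop being equal, and an LP is recorded when a later revision makes them equal again, so the gap between the two indices approximates the propagation time.

```python
from typing import List, Optional, Tuple

def find_late_propagations(history: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (diverged_at, resynced_at) revision indices for one clone pair.

    history[i] is the normalized text of the two clone instances at revision i,
    in chronological order (hypothetical input format).
    """
    late_propagations: List[Tuple[int, int]] = []
    diverged_at: Optional[int] = None
    for i, (first, second) in enumerate(history):
        consistent = first == second
        if not consistent and diverged_at is None:
            diverged_at = i  # one instance changed; the other has not been updated yet
        elif consistent and diverged_at is not None:
            late_propagations.append((diverged_at, i))  # the change reached the other instance later
            diverged_at = None
    return late_propagations

history = [
    ("x = f(a)", "x = f(a)"),  # revision 0: identical instances
    ("x = g(a)", "x = f(a)"),  # revision 1: first instance fixed
    ("x = g(a)", "x = f(a)"),  # revision 2: still diverged
    ("x = g(a)", "x = g(a)"),  # revision 3: fix propagated
]
print(find_late_propagations(history))  # [(1, 3)]
```

    Here the propagation time is 3 - 1 = 2 revisions. A divergence that never re-synchronizes is not reported, because the instances may simply be evolving independently.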

    Detection and analysis of near-miss clone genealogies

    It is believed that identical or similar code fragments in source code, also known as code clones, have an impact on software maintenance. A clone genealogy shows how a group of clone fragments evolves along with the associated software system, and thus may provide important insights into the maintenance implications of those clone fragments. Given the importance of studying the evolution of code clones, many studies have been conducted on this topic. However, after a decade of active research, there has been a marked lack of progress in understanding the evolution of near-miss software clones, especially where statements have been added, deleted, or modified in the copied fragments. Given that software systems contain a significant number of near-miss clones, we believe that one cannot have a complete picture of clone evolution without studying the evolution of near-miss clones. In this thesis, we have advanced the state of the art in clone evolution research in the context of both exact and near-miss software clones. First, we performed a large-scale empirical study to extend the existing knowledge about the evolution of exact clones and renamed clones (where identifiers have been modified in the copied fragments). Second, we developed a framework, gCad, that can automatically extract both exact and near-miss clone genealogies across multiple versions of a program and identify their change patterns reasonably fast while maintaining high precision and recall. Third, to gain a broader perspective on clone evolution, we extended gCad to calculate various evolutionary metrics and performed an in-depth empirical study on the evolution of both exact and near-miss clones in six open source software systems written in two different programming languages, with respect to five research questions. We discovered several interesting evolutionary phenomena of near-miss clones that either contradict previous findings or are new.
    Finally, we further improved gCad and investigated a wide range of attributes and metrics derived from both the clones themselves and their evolution histories, in order to identify the attributes that developers often rely on when removing clones in practice. We believe that our new insights into the evolution of near-miss clones, and into how developers approach and remove duplication, will play an important role in understanding the maintenance implications of clones and will help design better clone management systems.
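
    To illustrate what identifying change patterns along a genealogy involves, here is a minimal Python sketch; it is not gCad. It assumes that clone fragments have already been mapped across versions under stable, hypothetical ids, and it uses a simplified, hypothetical set of labels: `add` and `subtract` for fragments joining or leaving the group, `same` when no surviving fragment changed, `consistent_change` when every surviving fragment changed (without checking that the edits are equivalent), and `inconsistent_change` otherwise.

```python
from typing import Dict, List, Set

Fragments = Dict[str, str]  # hypothetical: fragment id -> normalized fragment text

def classify_change(old: Fragments, new: Fragments) -> Set[str]:
    """Label how one clone group changed between two consecutive versions."""
    labels: Set[str] = set()
    if set(new) - set(old):
        labels.add("add")        # a fragment joined the group
    if set(old) - set(new):
        labels.add("subtract")   # a fragment left the group (e.g. deleted or refactored away)
    common = set(old) & set(new)
    changed = {f for f in common if old[f] != new[f]}
    if common:
        if not changed:
            labels.add("same")
        elif changed == common:
            labels.add("consistent_change")    # every surviving fragment was edited
        else:
            labels.add("inconsistent_change")  # only some surviving fragments were edited
    return labels

def genealogy(versions: List[Fragments]) -> List[Set[str]]:
    """Change-pattern labels for each pair of consecutive versions of one clone group."""
    return [classify_change(a, b) for a, b in zip(versions, versions[1:])]

v1 = {"A": "sum += x", "B": "total += x"}
v2 = {"A": "sum += x * w", "B": "total += x", "C": "acc += x"}
v3 = {"A": "sum += y", "B": "total += y", "C": "acc += y"}
print([sorted(labels) for labels in genealogy([v1, v2, v3])])
# [['add', 'inconsistent_change'], ['consistent_change']]
```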

    How Clones are Maintained: An Empirical Study

    Despite the conventional wisdom concerning the risks of using source code cloning as a software development strategy, several studies in the literature have indicated otherwise. In most cases, clones are properly maintained, and when they are not, it is because the cloned code evolves independently. Building on previous work, this paper combines clone detection and co-change analysis to investigate how clones are maintained when an evolution activity or a bug fix affects a source code fragment belonging to a clone class. The two reported case studies confirm that, for both bug fixing and evolution purposes, most of the cloned code is consistently maintained during the same co-change or during temporally close co-changes.
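
    The distinction above between updates made in the same co-change and updates made in temporally close co-changes can be sketched in a few lines of Python. The change-record layout `(commit_id, timestamp_in_days, fragment_id)`, the function name `classify_maintenance`, and the default seven-day window are hypothetical choices for illustration, not the paper's method or parameters.

```python
from typing import List, Tuple

Change = Tuple[str, float, str]  # hypothetical: (commit_id, timestamp_in_days, fragment_id)

def classify_maintenance(fragments: List[str], changes: List[Change],
                         window_days: float = 7.0) -> str:
    """Classify how the fragments of one clone class were updated after the first change."""
    if not changes:
        return "unchanged"
    changes = sorted(changes, key=lambda c: c[1])  # chronological order
    first_commit, first_time, _ = changes[0]
    wanted = set(fragments)
    # Fragments touched by the very commit that made the first change.
    same_commit = {frag for commit, _, frag in changes if commit == first_commit}
    if same_commit >= wanted:
        return "same co-change"
    # Fragments touched within the (assumed) time window after the first change.
    close = {frag for _, t, frag in changes if t - first_time <= window_days}
    if close >= wanted:
        return "temporally close co-changes"
    return "not co-changed"

fragments = ["A", "B", "C"]
log = [("c1", 0.0, "A"), ("c1", 0.0, "B"), ("c2", 2.5, "C")]
print(classify_maintenance(fragments, log))                   # temporally close co-changes
print(classify_maintenance(fragments, log, window_days=1.0))  # not co-changed
```

    The window size decides how "temporally close" is interpreted, so in practice it would need to be calibrated per project rather than fixed.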

    Management Aspects of Software Clone Detection and Analysis

    Copying a code fragment and reusing it by pasting, with or without minor modifications, is a common practice in software development for improved productivity. As a result, software systems often contain similar segments of code, called software clones or code clones. For many reasons, clones may also appear in the source code unintentionally, without the developer's awareness. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base, and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized in tools and with large-scale, in-depth analyses of clones that inform the design of effective clone management techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in for the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones in the entire code base, our tool aids clone-aware development by allowing a focused search for clones of any code fragment of interest to the developer. A good understanding of the code cloning phenomenon is a prerequisite for devising efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in future versions of software projects.
    The outcomes of these studies expose useful insights into the characteristics of evolving clones and their management implications. Once code clones are identified, their management often necessitates careful refactoring, which is dealt with in the third phase of the thesis. Given a large number of clones, it is difficult to decide optimally what to refactor and what not to, especially when there are dependencies among clones and the objective is to minimize refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for estimating the effort needed to refactor clones in source code. We evaluated our clone detector, scheduler, and effort model through comparative empirical studies and user studies. Finally, based on our experience and an in-depth analysis of the present state of the art, we expose avenues for further research and development towards the versatile clone management system that we envision.
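
    The thesis relies on a suffix-tree-based k-difference hybrid algorithm to find near-miss clones. The minimal Python sketch below swaps in the simpler textbook dynamic-programming formulation of the k-difference problem, applied to token sequences: it reports every position in the text where some substring ending there is within `k` insertions, deletions, or substitutions of the pattern fragment. It runs in O(mn) time for a pattern of m tokens and a text of n tokens, and the suffix-tree speedups are not reproduced. The function name and the whitespace-tokenized example are hypothetical.

```python
from typing import List, Sequence, Tuple

def k_difference_matches(pattern: Sequence[str], text: Sequence[str],
                         k: int) -> List[Tuple[int, int]]:
    """Return (end_index, distance) for each text position where a substring
    ending there matches `pattern` with at most `k` edits."""
    m = len(pattern)
    # prev[i]: best edit distance between pattern[:i] and a substring of the text
    # ending just before the current position (column 0 = empty text prefix).
    prev = list(range(m + 1))
    matches: List[Tuple[int, int]] = []
    for j in range(1, len(text) + 1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[i] = min(prev[i - 1] + cost,  # match or substitution
                          prev[i] + 1,         # extra token in the text
                          curr[i - 1] + 1)     # pattern token missing from the text
        if curr[m] <= k:
            matches.append((j - 1, curr[m]))
        prev = curr
    return matches

pattern = "x = a + b ;".split()
text = "y = 0 ; x = a - b ; z = 1 ;".split()
print(k_difference_matches(pattern, text, 1))  # [(9, 1)]
```

    With `k = 0` the same function reports only exact occurrences of the fragment; raising `k` admits near-miss copies at the cost of more candidate matches.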