
    Clone Detection and Elimination for Haskell

    Duplicated code is a well-known problem in software maintenance and refactoring. Code clones tend to increase program size, and several studies have shown that duplicated code makes maintenance and code understanding more complex and time-consuming. This paper presents a new technique for the detection and removal of duplicated Haskell code. The system is implemented within the refactoring framework of the Haskell Refactorer (HaRe), and uses an Abstract Syntax Tree (AST) based approach. Detection of duplicate code is automatic, while elimination is semi-automatic, with the user managing the clone removal. After presenting the system, an example is given to show how it works in practice.

    Management Aspects of Software Clone Detection and Analysis

    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. For many reasons, unintentional clones may also appear in the source code without the developer's awareness. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes to clone management with techniques realized in tools and with large-scale in-depth analyses of clones that inform the design of effective clone management techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strengths of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding of the code cloning phenomenon is a prerequisite to devising efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecasts of the proportion of code clones in future versions of software projects. 
The outcomes of these studies expose useful insights into the characteristics of evolving clones and their management implications. Once code clones have been identified, managing them often necessitates careful refactoring, which is dealt with in the third phase of the thesis. Given a large number of clones, it is difficult to decide optimally what to refactor and what not, especially when there are dependencies among clones and the objective is to minimize refactoring effort and risk while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for estimating the effort needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards the versatile clone management system that we envision.

    The pragmatics of clone detection and elimination

    The occurrence of similar code, or ‘code clones’, can make program code difficult to read, modify and maintain. This paper describes industrial case studies of clone detection and elimination, which were performed in collaboration with engineers from Ericsson AB using the refactoring and clone detection tool Wrangler for Erlang. We use the studies to illustrate the complex set of decisions that have to be taken when performing clone elimination in practice; we also discuss how the studies have informed the design of the tool. However, the conclusions we draw are largely language-independent, and set out the pragmatics of clone detection and elimination in real-world projects as well as design principles for clone detection decision-support tools. Context. The context of this work is the fact that a software tool is designed to be used; the success of such a tool therefore depends on its suitability and usability in practice. The work proceeds by observing the use of a tool in particular case studies in detail, through a “participant observer” approach, and drawing qualitative conclusions from these studies, rather than collecting and analysing quantitative data from a larger set of applications. Our conclusions help not only programmers but also the designers of software tools. Inquiry. Data collected in this way make two kinds of contribution. First, they provide the basis for deriving a set of questions that typically need to be answered by engineers in the process of removing clones from an application, and a set of heuristics that can be used to help answer these questions. Secondly, they provide feedback on existing features of software tools, as well as suggesting new features to be added to the tools. Approach. The work was undertaken by the tool designers and engineers from Ericsson AB, working together on clone elimination for code from the company. Knowledge. The work led to a number of conclusions, at different levels of generality. 
At the top level, there is overwhelming evidence that the process of clone elimination cannot be entirely automated, and needs to include the input of engineers familiar with the domain in question. Furthermore, there is strong evidence that the automated tools are sensitive to a set of parameters, which will differ for different applications and programming styles, and that individual clones can be over- and under-identified: again, involving those with knowledge of the code and the domain is key to successful application. Grounding. The work is grounded in “participant observation” by the tool builders, who made detailed logs of the processes undertaken by the group. Importance. The work gives guidelines that assist an engineer in using clone detection and elimination in practice, as well as helping a tool developer to shape their tool building. Although the work was in the context of a particular tool and programming language, the authors would argue that the high-level knowledge gained applies equally well to other notions of clone, as well as other tools and programming languages.

    Benchmarking the vulnerability detection capabilities of software analysis tools

    Code cloning and copy-pasting code fragments are common practice in software engineering. If security vulnerabilities exist in a cloned code segment, those vulnerabilities may spread in the related software, potentially leading to security incidents. Code similarity is one effective approach to detecting vulnerabilities hidden in software projects. However, due to the complexity, size, and diversity of source code, current methods suffer from low accuracy and poor performance. Moreover, most existing clone detection techniques focus on a limited set of programming languages in the detection process. We propose to solve these problems using SearchSECO, a software analysis tool that detects vulnerabilities in multiple programming languages.

    Revealing Missing Bug-Fixes in Code Clones in Large-Scale Code Bases

    When a bug is fixed in duplicated code, it is often necessary to modify all duplicates (so-called clones) accordingly. In practice, however, fixes are often incomplete, which causes the bug to remain in one or more of the clones. This paper presents an approach that detects such incomplete bug-fixes in cloned code by analyzing a system's version history to reveal those commits that fix problems. The approach then performs incremental clone detection to reveal those clones that became inconsistent as a result of such a fix. We present results from a case study that analyzed incomplete bug-fixes in six industrial and open-source systems to demonstrate the feasibility and effectiveness of our approach. We identified likely incomplete bug-fixes in all analyzed systems.
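
    The two steps described in this abstract (select fix commits, then look for clones that the fix left inconsistent) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the keyword heuristic in `FIX_PATTERN`, the `Commit` record, the fragment identifiers, and the precomputed `clone_groups` are all hypothetical, and real incremental clone detection is replaced here by a simple set comparison.

```python
import re
from dataclasses import dataclass

# Assumed heuristic: a commit counts as a fix if its message mentions one of these words.
FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect)\b", re.IGNORECASE)


@dataclass(frozen=True)
class Commit:
    message: str
    changed: frozenset  # hypothetical identifiers of the fragments this commit modified


def incomplete_fixes(commits, clone_groups):
    """Report fix commits that changed some members of a clone group but not all."""
    findings = []
    for commit in commits:
        if not FIX_PATTERN.search(commit.message):
            continue  # not a fix commit under the assumed heuristic
        for group in clone_groups:
            touched = group & commit.changed
            missed = group - commit.changed
            if touched and missed:
                # The fix reached part of the clone group only.
                findings.append((commit.message, sorted(missed)))
    return findings
```

    Each finding pairs a fix commit's message with the clone fragments it left unchanged; these are only candidates for an incomplete fix, since a clone may legitimately not need the change.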