
    Phoenix-based clone detection using suffix trees

    A code clone is a sequence of statements that is duplicated in multiple locations of a program. Clones often arise in source code as a result of repeated cut/paste operations, or due to the emergence of crosscutting concerns. Programs containing code clones can manifest problems during the maintenance phase. When a fault is found or an update is needed on the original copy of a code section, all similar clones must also be found so that they can be fixed or updated accordingly. The ability to detect clones therefore becomes a necessity when performing maintenance tasks. However, if done manually, clone detection is a slow, tedious, and error-prone activity. A tool that can automatically detect clones offers a significant advantage during software evolution. With such an automated detection tool, clones can be found and updated in less time. Moreover, restructuring or refactoring of these clones can yield better performance and modularity in the program. This paper describes an investigation into an automatic clone detection technique developed as a plug-in for Microsoft’s new Phoenix framework. Our investigation finds function-level clones in a program using abstract syntax trees (ASTs) and suffix trees. An AST provides the structural representation of the code after the lexical analysis process. The AST nodes are used to generate a suffix tree, which allows analysis on the nodes to be performed rapidly. We use the same methods that have been successfully applied to find duplicate sections in biological sequences to search for matches on the generated suffix tree, which in turn reveal matches in the code.
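The idea in the abstract above, flattening AST nodes into a sequence and searching it for repeated runs, can be illustrated with a minimal sketch. This is not the Phoenix plug-in: it assumes Python's standard `ast` module instead of Phoenix, and it swaps the suffix tree for a naive sorted-suffix (suffix array) search, which finds the same longest shared run but more slowly. The names `node_sequence` and `longest_shared_run` are hypothetical.

```python
import ast


def node_sequence(source):
    """Flatten a snippet's AST into a pre-order list of node-type names."""
    def visit(node):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)
    return list(visit(ast.parse(source)))


def longest_shared_run(seq_a, seq_b):
    """Longest run of node types common to both sequences.

    Assumption: a naive suffix array (sorted suffixes) stands in for
    the suffix tree; adjacent suffixes from different inputs are compared.
    """
    sep = "<SEP>"  # hypothetical separator, never a real node-type name
    combined = seq_a + [sep] + seq_b
    boundary = len(seq_a)
    order = sorted(range(len(combined)), key=lambda i: combined[i:])
    best = []
    for x, y in zip(order, order[1:]):
        if (x < boundary) == (y < boundary):
            continue  # both suffixes start in the same input
        k = 0
        while (x + k < len(combined) and y + k < len(combined)
               and combined[x + k] == combined[y + k]
               and combined[x + k] != sep):
            k += 1
        if k > len(best):
            best = combined[x:x + k]
    return best


f1 = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
f2 = "def summed(items):\n    acc = 0\n    for it in items:\n        acc += it\n    return acc\n"
f3 = "def greet(name):\n    print('hi', name)\n"

a, b, c = node_sequence(f1), node_sequence(f2), node_sequence(f3)
print(len(longest_shared_run(a, b)), "of", len(a), "nodes shared between f1 and f2")
print(len(longest_shared_run(a, c)), "of", len(a), "nodes shared between f1 and f3")
```

Because only node types are compared, `f1` and `f2` differ solely in identifier names and yield identical sequences, while `f3` shares only a short prefix with `f1`.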

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only those details of the review that could not fit in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence).

    An Extended Stable Marriage Problem Algorithm for Clone Detection

    Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending the well-known stable marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy. Comment: 20 pages, 10 figures, 6 tables
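To make the matching idea concrete, here is a minimal sketch of the classic Gale-Shapley stable marriage algorithm applied to code fragments from two files. It is the textbook one-to-one algorithm, not the extension the paper proposes. The preference lists are an assumption: they are derived from `difflib.SequenceMatcher` similarity, and the names `similarity` and `stable_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed similarity measure: textual ratio between two fragments."""
    return SequenceMatcher(None, a, b).ratio()


def stable_match(left, right):
    """Classic Gale-Shapley: fragments in `left` propose to fragments in `right`.

    Returns a dict mapping each left index to its matched right index.
    """
    if len(left) > len(right):
        raise ValueError("this sketch needs len(left) <= len(right)")
    # Each left fragment ranks the right fragments by decreasing similarity.
    prefs = {
        i: sorted(range(len(right)), key=lambda j: -similarity(left[i], right[j]))
        for i in range(len(left))
    }
    next_choice = {i: 0 for i in prefs}
    engaged = {}  # right index -> left index
    free = list(prefs)
    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in engaged:
            engaged[j] = i
        elif similarity(left[i], right[j]) > similarity(left[engaged[j]], right[j]):
            free.append(engaged[j])  # current partner is displaced
            engaged[j] = i
        else:
            free.append(i)  # proposal rejected, try the next choice later
    return {i: j for j, i in engaged.items()}


file_a = ["x = a + b", "return total", "for i in range(n):"]
file_b = ["for j in range(m):", "y = c + d", "return result"]
matching = stable_match(file_a, file_b)
for i, j in sorted(matching.items()):
    print(repr(file_a[i]), "<->", repr(file_b[j]))
```

The one-to-one restriction is the main limitation of this sketch: a fragment cloned several times in the other file would be paired with only one of its copies.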

    Structured Review of Code Clone Literature

    This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts that helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other.

    Syntax tree fingerprinting: a foundation for source code similarity detection

    Plagiarism detection and clone refactoring in software depend on one common concern: finding similar source chunks across large repositories. However, since code duplication in software is often the result of copy-paste behaviors, only minor modifications are expected between shared codes. On the contrary, in a plagiarism detection context, edits are more extensive and exact matching strategies show their limits. Among the three main representations used by source code similarity detection tools, namely the linear token sequences, the Abstract Syntax Tree (AST) and the Program Dependency Graph (PDG), we believe that the AST could efficiently support the program analysis and transformations required for the advanced similarity detection process. In this paper we present a simple and scalable architecture based on syntax tree fingerprinting. Thanks to a study of several hashing strategies reducing false-positive collisions, we propose a framework that efficiently indexes AST representations in a database, that quickly detects exact (w.r.t. source code abstraction) clone clusters and that easily retrieves their corresponding ASTs. Our aim is to allow further processing of neighboring exact matches in order to identify the larger approximate matches, dealing with the common modification patterns seen in the intra-project copy-pastes and in the plagiarism cases.
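A rough illustration of syntax tree fingerprinting follows. It is a sketch under stated assumptions rather than the architecture from the paper: each subtree is hashed bottom-up from its node type and its children's hashes (so identifier names are abstracted away), an in-memory dict stands in for the database index, and SHA-1 is an arbitrary hash choice. The names `fingerprint` and `clone_clusters` and the `min_nodes` parameter are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def fingerprint(node, index):
    """Hash a subtree from its node type and child hashes; record it in `index`."""
    child_hashes = [fingerprint(child, index) for child in ast.iter_child_nodes(node)]
    payload = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    index[digest].append(node)
    return digest


def clone_clusters(source, min_nodes=5):
    """Group subtrees that share a fingerprint, ignoring very small subtrees."""
    index = defaultdict(list)  # assumption: a dict replaces the database
    fingerprint(ast.parse(source), index)
    return [
        nodes for nodes in index.values()
        if len(nodes) > 1 and sum(1 for _ in ast.walk(nodes[0])) >= min_nodes
    ]


source = (
    "def area(w, h):\n"
    "    return w * h\n"
    "\n"
    "def product(a, b):\n"
    "    return a * b\n"
)
clusters = clone_clusters(source)
for cluster in clusters:
    print(type(cluster[0]).__name__, "x", len(cluster))
```

Nested subtrees of a cloned function also form clusters of their own; a real tool would likely keep only the largest match in each region, which is the kind of post-processing of neighboring exact matches the abstract mentions.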

    Viewing functions as token sequences to highlight similarities in source code

    The detection of similarities in source code has applications not only in software re-engineering (to eliminate redundancies) but also in software plagiarism detection. The latter can be a challenging problem, since more or less extensive edits may have been performed on the original copy: insertion or removal of useless chunks of code, rewriting of expressions, transposition of code, inlining and outlining of functions, etc. In this paper, we propose a new similarity detection technique based not only on token sequence matching but also on the factorization of the function call graphs. The factorization process merges shared chunks (factors) of code to cope, in particular, with inlining and outlining. The resulting call graph offers a view of the similarities with their nesting relations. It is useful for inferring metrics that quantify similarity at the function level.

    Stable Marriage Problem Based Adaptation for Clone Detection and Service Selection

    Current software engineering topics such as clone detection and service selection need more capable detection and selection processes. Clone detection is the process of finding duplicated code throughout a system for purposes such as removing repeated portions as part of maintaining a legacy system. Service selection is the process of finding the appropriate web service that meets the consumer’s request. Both problems can be converted into a matching problem, and matching forms an essential part of software engineering activities. In this research, a well-known mathematical algorithm, the Stable Marriage Problem (SMP), and its variations are investigated to fulfil the purposes of matching processes in the software engineering area. We aim to provide a competitive matching algorithm that can help to detect cloned software accurately and ensure high scalability, precision and recall. We also aim to apply the matching algorithm to incoming requests and service profiles, treating the web service as a clever independent object, so that services can accept or decline requests (equal opportunity), unlike the current search-based state of service selection, in which a service cannot interact as an independent candidate. To meet these aims, the traditional SMP algorithm has been extended to achieve many-to-many cardinality. This adaptation is achieved by defining the selective strategy, which is the main engine of the new adaptations. Two adaptations, Dual-Proposed and Dual-Multi-Allocation, have been proposed for both the service selection and clone detection processes. The proposed SMP-based approach shows very competitive results compared to existing software clone approaches, especially in identifying type 3 clones (copies with further modifications such as updated, added and deleted statements). It performs the detection process with relatively high precision and recall compared to the CloneDR tool and shows good scalability on a middle-sized program. For service selection, the proposed approach has several advantages such as service protection and service quality. The services gain equal opportunity against the incoming requests. Therefore, intelligent service interaction is achieved, and both stability and satisfaction of the candidates are ensured. This dissertation makes several contributions. Firstly, a new extended SMP algorithm that introduces a selective strategy to accommodate many-to-many matching problems and to improve overall features. Secondly, a new SMP-based clone detection approach that detects cloned software accurately and ensures high precision and recall. Finally, a new SMP-based service selection approach that allows equal opportunity between services and requests, which improves service protection and service quality. Case studies are carried out as experiments with the proposed approach, and they show that the new adaptations can be applied effectively to clone detection and service selection processes with several features (e.g. accuracy). It can be concluded that the match-based approach is feasible and promising in the software engineering domain. Royal Embassy of Saudi Arabi

    AN EFFICIENT METHOD-LEVEL CODE CLONE DETECTION SCHEME THROUGH TEXTUAL ANALYSIS USING METRICS

    Code cloning, the act of copying code fragments and making minor, non-functional alterations, is a well-known problem for evolving software systems, and it leads to duplicated code fragments known as code clones. A clone detection approach finds the reused fragments of code in an application so that the different types of clones identified by clone detection techniques can be maintained. As clone detection has evolved, it has provided better results while reducing complexity. Different clone detection tools make the detection process easier and produce efficient results. Many existing systems focus on line-by-line or token-based detection to find the clones in a system, so they take a long time to process the entire source code. If a fragment of code is not an exact copy but is functionally similar to another, existing systems do not detect that type of clone. This paper proposes a combination of textual and metric analysis of source code for the detection of all types of clones in a given set of Java source code fragments. Various semantics have been formulated and their values are used during the detection process. These metrics, combined with textual analysis, reduce the complexity of finding clones and give accurate results.
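One way to picture the combination of metrics and textual analysis is a two-stage check: cheap per-method metrics filter out unlikely pairs, and only the surviving pairs are compared textually. The sketch below is an assumption-laden illustration, not the scheme from the paper: the abstract does not list its metrics, so the four counts used here, the `0.7` threshold, and the names `metrics`, `normalize`, and `is_clone_candidate` are all hypothetical. Methods are passed as plain Java source strings.

```python
import re
from difflib import SequenceMatcher


def metrics(method_source):
    """Hypothetical metric vector: non-blank lines, loops, branches, returns."""
    lines = [line for line in method_source.splitlines() if line.strip()]
    return (
        len(lines),
        len(re.findall(r"\b(?:for|while)\b", method_source)),
        len(re.findall(r"\bif\b", method_source)),
        len(re.findall(r"\breturn\b", method_source)),
    )


def normalize(method_source):
    """Collapse all whitespace so layout differences do not affect the comparison."""
    return " ".join(method_source.split())


def is_clone_candidate(a, b, threshold=0.7):
    """Metric prefilter first, textual comparison only if the metrics agree."""
    if metrics(a) != metrics(b):
        return False
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


sum_method = "int sum(int[] xs) {\n  int s = 0;\n  for (int x : xs) { s += x; }\n  return s;\n}"
total_method = "int total(int[] vs) {\n  int t = 0;\n  for (int v : vs) { t += v; }\n  return t;\n}"
max_method = "int max(int a, int b) {\n  if (a > b) { return a; }\n  return b;\n}"

print(is_clone_candidate(sum_method, total_method))
print(is_clone_candidate(sum_method, max_method))
```

Requiring exact metric equality keeps the prefilter cheap but strict; a tolerance on each metric would be needed to catch clones with added or deleted statements.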