1,056 research outputs found

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. The report contains only those details of the review that could not be included, for lack of space, in the companion conference paper (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence).

    Structured Review of Code Clone Literature

    This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts that helps us reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other.

    SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

    To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto and GEM, and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Evaluation on real human genome data demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome.

    Revealing Missing Bug-Fixes in Code Clones in Large-Scale Code Bases

    When a bug is fixed in duplicated code, it is often necessary to modify all duplicates (so-called clones) accordingly. In practice, however, fixes are often incomplete, which causes the bug to remain in one or more of the clones. This paper presents an approach that detects such incomplete bug-fixes in cloned code by analyzing a system's version history to reveal those commits that fix problems. The approach then performs incremental clone detection to reveal those clones that became inconsistent as a result of such a fix. We present results from a case study that analyzed incomplete bug-fixes in six industrial and open-source systems to demonstrate the feasibility and effectiveness of our approach. We identified likely incomplete bug-fixes in all analyzed systems.
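
    The two steps described in this abstract (find fix commits, then find clones that became inconsistent) can be illustrated with a minimal sketch. It is not the paper's implementation: the keyword heuristic in `is_fix_commit`, the fragment identifiers, and both function names are assumptions made for illustration, and real clone groups would come from a clone detector.

```python
import re

# Assumed heuristic: a commit counts as a fix if its message mentions one of
# these words. The abstract does not say how fix commits are recognized.
_FIX_WORDS = re.compile(r"\b(fix|fixes|fixed|bug|defect)\b", re.IGNORECASE)


def is_fix_commit(message):
    """Return True if the commit message looks like a bug fix."""
    return bool(_FIX_WORDS.search(message))


def incomplete_fix_candidates(clone_groups, changed):
    """Report clone groups that a fix touched only partially.

    clone_groups: iterable of sets of fragment ids (hypothetical ids such
                  as "file:line").
    changed:      set of fragment ids modified by the fix commit.
    Returns a list of (touched, missed) pairs, each a sorted list.
    """
    candidates = []
    for group in clone_groups:
        touched = group & changed
        missed = group - changed
        # A group is suspicious when some clones changed and others did not.
        if touched and missed:
            candidates.append((sorted(touched), sorted(missed)))
    return candidates
```

    A group whose members were all changed, or none of which were changed, is not reported; only partially updated groups are candidates for manual inspection.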

    A novel approach for Software Clone detection using Data Mining in Software

    Code clones are similar program structures that recur in variant forms in software systems. Many techniques have been proposed to detect similar code fragments in software. Software maintenance is generally helped by the identification and subsequent unification of such fragments. When patterns of simple clones recur, this indicates the presence of interesting higher-level similarities, called structural clones. Compared to simple clones, structural clones show a bigger picture of similarities. Structural clones group simple clones logically, which alleviates the problem of dealing with a huge number of clones. Detecting structural clones is essential for understanding the design of a system for better maintenance and for reengineering for reuse. In this paper, a technique for detecting some useful types of structural clones is proposed. The novelty of the approach lies in the formulation of the structural clone concept and the application of data mining techniques. An implementation of the proposed technique is described.
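
    One way to picture recurring patterns of simple clones is as sets of clone classes that co-occur in several files. The sketch below counts such co-occurrences. It assumes a frequent-itemset style of mining, which the abstract does not confirm; the function name, the input layout, and the `min_support` and `size` parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_clone_sets(file_clones, min_support=2, size=2):
    """Find groups of simple clone classes that recur together.

    file_clones: dict mapping a file name to the ids of the simple clone
                 classes that have a fragment in that file.
    min_support: minimum number of files in which a group must co-occur.
    size:        number of clone classes per group.
    Returns a dict mapping each frequent group (a sorted tuple) to its count.
    """
    counts = Counter()
    for classes in file_clones.values():
        # Count every combination of clone classes present in this file once.
        for combo in combinations(sorted(set(classes)), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}
```

    Groups that reach the support threshold are candidates for higher-level similarity between the files that contain them.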

    Koodikloonien hyödyntäminen asiakaskohtaisten erojen havaitsemiseksi tuotteistusprosessissa (Utilizing code clones to detect customer-specific differences in the productization process)

    The topic of this thesis was inspired by two case studies. The case studies are applications that are conceptually, but not technically, products. Their code bases contain customer-specific branches. The development strategy in the case studies has been to fork an existing branch and customize it to the needs of the new client. Code reuse and forking can be an efficient or even a necessary development strategy under time pressure. However, code duplication may make the code base harder to maintain, which in turn increases maintenance costs. Finding similar code fragments is researched in the field of code clone detection. Code clones are code fragments that are either the same or similar. The similarity can be categorized into four types. Type I clones are exact matches that differ only in layout, whitespace or comments. In addition to type I changes, type II clones can differ in identifier names and types or in literal values. Furthermore, type III clones can have statements added, deleted or modified within the code fragments under comparison. Type IV clones are functionally similar clones. There are different kinds of techniques and tools for both detecting and visualizing clones. Different techniques find different sets of clone types. Code clone visualizations present both an overview of the cloning situation and the details at the source code level. The branches of the same product in the case studies can be considered clones of each other. They are expected to resemble type III clones. They essentially originate from the same code base, but each one has statements added, deleted and modified within the corresponding files relative to the other branches. Identifying these changes helps to form an overall picture of how much the branches truly differ. The transformation from developing customer-specific software to developing product software is called productization.
    In order to productize, the differences between the branches must be determined. Each customization needs to be considered in the productization process to avoid reducing the value of the product. We defined a process for using code clone visualizations to explore differences between customer-specific branches. The conclusion of this thesis is that using code clones clearly expedites the productization process. The visualizations help locate the differences much faster than manual comparison. Code clone detection is applied to fade out the uninteresting differences between the branches. Hence, the method helps to navigate to the truly interesting customizations that require manual inspection. The method also provides a general view of the cloning situation, which eases the task of estimating the workload. The process is applicable in situations where the diverged code bases are expected to resemble each other structurally, yet contain so many changes that a manual comparison of the branches with file comparison tools would be too time-consuming.
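
    The first two clone types defined in this abstract can be made concrete with a minimal sketch. It is not the tooling used in the thesis: it assumes the fragments are Python source, relies on the standard `tokenize` module, and `normalize` and `clone_type` are hypothetical helper names. Type III and type IV clones need similarity thresholds or semantic analysis and are not covered.

```python
import io
import keyword
import tokenize

# Tokens that carry only comments or blank-line layout are dropped.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Block structure is kept, but its exact whitespace is not.
_STRUCTURE = {
    tokenize.NEWLINE: "NEWLINE",
    tokenize.INDENT: "INDENT",
    tokenize.DEDENT: "DEDENT",
}


def normalize(source, abstract_names=False):
    """Return the token strings of `source` without layout and comments.

    With abstract_names=True, identifiers and literals are replaced by
    placeholders. This does not check that renaming is consistent.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type in _STRUCTURE:
            tokens.append(_STRUCTURE[tok.type])
            continue
        text = tok.string
        if abstract_names:
            if tok.type == tokenize.NAME and not keyword.iskeyword(text):
                text = "ID"
            elif tok.type in (tokenize.NUMBER, tokenize.STRING):
                text = "LIT"
        tokens.append(text)
    return tokens


def clone_type(a, b):
    """Classify two fragments as "I", "II", or None (neither)."""
    if normalize(a) == normalize(b):
        return "I"
    if normalize(a, True) == normalize(b, True):
        return "II"
    return None
```

    Two fragments that differ only in spacing, blank lines or comments are reported as type I; fragments that additionally differ in identifier names or literal values are reported as type II.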

    Enhancing source-based clone detection using intermediate representation

    Detecting software clones in large-scale projects helps improve the maintainability of large code bases. The source code representation (e.g., Java or C files) of a software system has traditionally been used for clone detection. In this paper, we propose a technique that transforms the source code to an intermediate representation, and then reuses established source-based clone detection techniques to detect clones in the intermediate representation. The clones are mapped back to the source code and are used to augment the results reported by source-based clone detection. We demonstrate the performance of our new technique using systems from the Bellon clone evaluation benchmark. The results show that our technique can detect Type 3 clones. Our technique has higher recall with a minimal drop in precision on the Bellon corpus. When complete clone groups are examined, our technique has higher precision than the standalone string-based and token-based clone detectors.
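
    The idea of comparing fragments in an intermediate representation rather than in source form can be illustrated with a small sketch. As an assumption for illustration only, CPython bytecode stands in for the intermediate representation; the paper works on Java and C systems and its actual representation is not reproduced here. The helper names are hypothetical.

```python
import dis


def ir_sequence(func):
    """Return the (opcode name, numeric argument) pairs of a function.

    Numeric arguments refer to slots and operator codes, not to the names
    chosen in the source, so renamed variables do not change the result.
    """
    return [(ins.opname, ins.arg) for ins in dis.get_instructions(func)]


def ir_clone(f, g):
    """Return True if both functions compile to the same sequence."""
    return ir_sequence(f) == ir_sequence(g)


def add_def(a, b):
    return a + b


# Written differently in source form (lambda, other parameter names).
add_lambda = lambda x, y: x + y
# Same shape, different operator.
sub_lambda = lambda x, y: x - y
```

    Here `add_def` and `add_lambda` look different as source text but are expected to compare equal at the bytecode level, while `sub_lambda` does not. Bytecode differs between Python versions, so sequences should only be compared within one interpreter.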

    An Extended Stable Marriage Problem Algorithm for Clone Detection

    Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending the well-known stable marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is evaluated on a suitable scenario and shows a noticeable improvement in several aspects of clone detection, such as scalability and accuracy. Comment: 20 pages, 10 figures, 6 tables
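
    To show how fragment matching can be phrased as a stable marriage problem, the sketch below runs the classic Gale-Shapley procedure over fragments of two files. It does not reproduce the paper's extension: the character-level similarity from `difflib` and the function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def stable_match(left, right):
    """Match fragments of one file to fragments of another (Gale-Shapley).

    Fragments in `left` propose in order of decreasing similarity; a
    fragment in `right` keeps the most similar proposer seen so far.
    Returns a dict mapping left indices to right indices.
    """
    score = [[similarity(a, b) for b in right] for a in left]
    prefs = [
        sorted(range(len(right)), key=lambda j: -score[i][j])
        for i in range(len(left))
    ]
    next_choice = [0] * len(left)  # next preference each proposer will try
    partner = {}                   # right index -> currently accepted left index
    free = list(range(len(left)))
    while free:
        i = free.pop()
        if next_choice[i] >= len(right):
            continue  # i has proposed to every fragment and stays unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        if j not in partner:
            partner[j] = i
        elif score[i][j] > score[partner[j]][j]:
            free.append(partner[j])  # the previous partner is displaced
            partner[j] = i
        else:
            free.append(i)           # rejected, i tries its next choice
    return {i: j for j, i in partner.items()}
```

    The result is a one-to-one assignment: no fragment is matched twice, and no unmatched pair would both prefer each other over their assigned partners.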