12 research outputs found

    Perinnekoodin refaktorointi ja testattavuus

    Get PDF
    Ohjelmistoalan kehittyessä kasvavalla vauhdilla on entistä yleisempää löytää järjestelmästä perinnekoodia. Tekniikat uusiutuvat nopeasti, ja kiihtyvän kilpailun myötä aikataulut tiukentuvat. Paras tapa välttyä perinnekoodilta on tehdä paljon testejä. Järjestelmien koon kasvaessa ja monimutkaistuessa testaus on entistä olennaisempaa ohjelmistokehityksessä. Jos perinnekoodia on kuitenkin päässyt syntymään, pääsee siitä eroon refaktoroimalla. Tässä työssä pyrin selventämään, mitkä koodin ominaisuudet ovat tärkeitä ohjelmistokehityksessä, jotta koodista tulisi ylläpidettävää ja testattavaa. Tutkin myös, kuinka refaktorointia tulisi suorittaa. Lopuksi sovelsin ominaisuuksia ja refaktorointia telekommunikaatiojärjestelmässä olevaan perinnekoodiin. Perinnejärjestelmän ympäristön ja tekniikoiden takia refaktoroinnin mahdollisuudet kuitenkin ovat rajatut. Kyseisessä ohjelmistossa tärkeintä on pilkkoa koodia pienempiin kokonaisuuksiin ja hyödyntää yksikkötestejä

    Management Aspects of Software Clone Detection and Analysis

    Get PDF
    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. Due to many reasons, unintentional clones may also appear in the source code without awareness of the developer. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes in clone management with techniques realized into tools and large-scale in-depth analyses of clones to inform clone management in devising effective techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strength of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding on the code cloning phenomenon is a prerequisite to devise efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecast on the proportion of code clones in the future versions of software projects. The outcome of these studies expose useful insights into the characteristics of evolving clones and their management implications. Upon identification of the code clones, their management often necessitates careful refactoring, which is dealt with at the third phase of the thesis. Given a large number of clones, it is difficult to optimally decide what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for the estimation of efforts needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision

    Dealing with clones in software : a practical approach from detection towards management

    Get PDF
    Despite the fact that duplicated fragments of code also called code clones are considered one of the prominent code smells that may exist in software, cloning is widely practiced in industrial development. The larger the system, the more people involved in its development and the more parts developed by different teams result in an increased possibility of having cloned code in the system. While there are particular benefits of code cloning in software development, research shows that it might be a source of various troubles in evolving software. Therefore, investigating and understanding clones in a software system is important to manage the clones efficiently. However, when the system is fairly large, it is challenging to identify and manage those clones properly. Among the various types of clones that may exist in software, research shows detection of near-miss clones where there might be minor to significant differences (e.g., renaming of identifiers and additions/deletions/modifications of statements) among the cloned fragments is costly in terms of time and memory. Thus, there is a great demand of state-of-the-art technologies in dealing with clones in software. Over the years, several tools have been developed to detect and visualize exact and similar clones. However, usually the tools are standalone and do not integrate well with a software developer's workflow. In this thesis, first, a study is presented on the effectiveness of a fingerprint based data similarity measurement technique named 'simhash' in detecting clones in large scale code-base. Based on the positive outcome of the study, a time efficient detection approach is proposed to find exact and near-miss clones in software, especially in large scale software systems. The novel detection approach has been made available as a highly configurable and fully fledged standalone clone detection tool named 'SimCad', which can be configured for detection of clones in both source code and non-source code based data. Second, we show a robust use of the clone detection approach studied earlier by assembling its detection service as a portable library named 'SimLib'. This library can provide tightly coupled (integrated) clone detection functionality to other applications as opposed to loosely coupled service provided by a typical standalone tool. Because of being highly configurable and easily extensible, this library allows the user to customize its clone detection process for detecting clones in data having diverse characteristics. We performed a user study to get some feedback on installation and use of the 'SimLib' API (Application Programming Interface) and to uncover its potential use as a third-party clone detection library. Third, we investigated on what tools and techniques are currently in use to detect and manage clones and understand their evolution. The goal was to find how those tools and techniques can be made available to a developer's own software development platform for convenient identification, tracking and management of clones in the software. Based on that, we developed a clone-aware software development platform named 'SimEclipse' to promote the practical use of code clone research and to provide better support for clone management in software. Finally, we evaluated 'SimEclipse' by conducting a user study on its effectiveness, usability and information management. We believe that both researchers and developers would enjoy and utilize the benefit of using these tools in different aspect of code clone research and manage cloned code in software systems

    Detection and analysis of near-miss clone genealogies

    Get PDF
    It is believed that identical or similar code fragments in source code, also known as code clones, have an impact on software maintenance. A clone genealogy shows how a group of clone fragments evolve with the evolution of the associated software system, and thus may provide important insights on the maintenance implications of those clone fragments. Considering the importance of studying the evolution of code clones, many studies have been conducted on this topic. However, after a decade of active research, there has been a marked lack of progress in understanding the evolution of near-miss software clones, especially where statements have been added, deleted, or modified in the copied fragments. Given that there are a significant amount of near-miss clones in the software systems, we believe that without studying the evolution of near-miss clones, one cannot have a complete picture of the clone evolution. In this thesis, we have advanced the state-of-the-art in the evolution of clone research in the context of both exact and near-miss software clones. First, we performed a large-scale empirical study to extend the existing knowledge about the evolution of exact and renamed clones where identifiers have been modified in the copied fragments. Second, we have developed a framework, gCad that can automatically extract both exact and near-miss clone genealogies across multiple versions of a program and identify their change patterns reasonably fast while maintaining high precision and recall. Third, in order to gain a broader perspective of clone evolution, we extended gCad to calculate various evolutionary metrics, and performed an in-depth empirical study on the evolution of both exact and near-miss clones in six open source software systems of two different programming languages with respect to five research questions. We discovered several interesting evolutionary phenomena of near-miss clones which either contradict with previous findings or are new. Finally, we further improved gCad, and investigated a wide range of attributes and metrics derived from both the clones themselves and their evolution histories to identify certain attributes, which developers often use to remove clones in the real world. We believe that our new insights in the evolution of near-miss clones, and about how developers approach and remove duplication, will play an important role in understanding the maintenance implications of clones and will help design better clone management systems

    On the Stability of Software Clones: A Genealogy-Based Empirical Study

    Get PDF
    Clones are a matter of great concern to the software engineering community because of their dual but contradictory impact on software maintenance. While there is strong empirical evidence of the harmful impact of clones on maintenance, a number of studies have also identified positive sides of code cloning during maintenance. Recently, to help determine if clones are beneficial or not during software maintenance, software researchers have been conducting studies that measure source code stability (the likelihood that code will be modified) of cloned code compared to non-cloned code. If the presence of clones in program artifacts (files, classes, methods, variables) causes the artifacts to be more frequently changed (i.e., cloned code is more unstable than non-cloned code), clones are considered harmful. Unfortunately, existing stability studies have resulted in contradictory results and even now there is no concrete answer to the research question "Is cloned or non-cloned code more stable during software maintenance?" The possible reasons behind the contradictory results of the existing studies are that they were conducted on different sets of subject systems with different experimental setups involving different clone detection tools investigating different stability metrics. Also, there are four major types of clones (Type 1: exact; Type 2: syntactically similar; Type 3: with some added, deleted or modified lines; and, Type 4: semantically similar) and none of these studies compared the instability of different types of clones. Focusing on these issues we perform an empirical study implementing seven methodologies that calculate eight stability-related metrics on the same experimental setup to compare the instability of cloned and non-cloned code in the maintenance phase. We investigated the instability of three major types of clones (Type 1, Type 2, and Type 3) from different dimensions. We excluded Type 4 clones from our investigation, because the existing clone detection tools cannot detect Type 4 clones well. According to our in-depth investigation on hundreds of revisions of 16 subject systems covering four different programming languages (Java, C, C#, and Python) using two clone detection tools (NiCad and CCFinder) we found that clones generally exhibit higher instability in the maintenance phase compared to non-cloned code. Specifically, Type 1 and Type 3 clones are more unstable as well as more harmful compared to Type 2 clones. However, although clones are generally more unstable sometimes they exhibit higher stability than non-cloned code. We further investigated the effect of clones on another important aspect of stability: method co-changeability (the degree methods change together). Intuitively, higher method co-changeability is an indication of higher instability of software systems. We found that clones do not have any negative effect on method co-changeability; rather, cloning can be a possible way of minimizing method co-changeability when clones are likely to evolve independently. Thus, clones have both positive and negative effects on software stability. Our empirical studies demonstrate how we can effectively use the positive sides of clones by minimizing their negative impacts

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Get PDF
    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones we identify a particular clone change pattern, Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones. We rank SPCP clones considering their strength of evolutionary coupling. In our second study we further analyze evolutionary coupling of code clones with an aim to assist clone tracking. The purpose of clone tracking is to identify the co-change (i.e. changing together) candidates of code clones to ensure consistency of changes in the code-base. Our research in the second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study we perform a deeper analysis on the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets. While one subset contains the non-cross-boundary SPCP clones which can be considered important for refactoring, the other subset contains the cross-boundary SPCP clones which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have high tendencies of experiencing bug-fixes. Such clone-types can be given high priorities for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from our last study can help us prioritize clone-types for management on the basis of their tendencies of experiencing late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pin-pointing the important clones to be managed, and thus, considerably minimizing clone management effort
    corecore