32 research outputs found

    Management Aspects of Software Clone Detection and Analysis

    Get PDF
    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. Due to many reasons, unintentional clones may also appear in the source code without awareness of the developer. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes in clone management with techniques realized into tools and large-scale in-depth analyses of clones to inform clone management in devising effective techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strength of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding on the code cloning phenomenon is a prerequisite to devise efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecast on the proportion of code clones in the future versions of software projects. The outcome of these studies expose useful insights into the characteristics of evolving clones and their management implications. Upon identification of the code clones, their management often necessitates careful refactoring, which is dealt with at the third phase of the thesis. Given a large number of clones, it is difficult to optimally decide what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for the estimation of efforts needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision

    Automated Improvement of Software Design by Search-Based Refactoring

    Get PDF
    Le coût de maintenance du logiciel est estimé à plus de 70% du coût total du système, en raison de nombreux facteurs, y compris les besoins des nouveaux utilisateurs, l’adoption de nouvelles technologies et la qualité des systèmes logiciels. De ces facteurs, la qualité est celle que nous pouvons contrôler et continuellement améliorer pour empêcher la dégradation de la performance et la réduction de l’efficacité (par exemple, la dégradation de la conception du logiciel). De plus, pour rester compétitive, l’industrie du logiciel a raccourci ses cycles de lancement afin de fournir de nouveaux produits et fonctionnalités plus rapidement, ce qui entraîne une pression accrue sur les équipes de développeurs et une accélération de l’évolution de la conception du système. Une façon d’empêcher la dégradation du logiciel est l’identification et la correction des anti-patrons qui sont des indicateurs de mauvaise qualité de conception. Pour améliorer la qualité de la conception et supprimer les anti-patrons, les développeurs effectuent de petites transformations préservant le comportement (c.-à-d., refactoring). Le refactoring manuel est coûteux, car il nécessite (1) d’identifier les entités de code qui doivent être refactorisées ; (2) générer des opérations de refactoring pour les classes identifiées à l’étape précédente ; (3) trouver le bon ordre d’application des refactorings générés, pour maximiser le bénéfice pour la qualité du code et minimiser les conflits. Ainsi, les chercheurs et les praticiens ont formulé le refactoring comme un problème d’optimisation et utilisent des techniques basées sur la recherche pour proposer des approches (semi) automatisées pour le résoudre. Dans cette thèse, je propose plusieurs méthodes pour résoudre les principaux problèmes des outils existants, afin d’aider les développeurs dans leurs activités de maintenance et d’assurance qualité. Ma thèse est qu’il est possible d’améliorer le refactoring automatisé en considérant de nouvelles dimensions : (1) le contexte de tâche du développeur pour prioriser le refactoring des classes pertinentes ; (2) l’effort du test pour réduire le coût des tests après le refactoring ; (3) l’identification de conflit entre opérations de refactoring afin de réduire le coût de refactoring ; et (4) l’efficacité énergétique pour améliorer la consommation d’énergie des applications mobiles après refactoring.----------ABSTRACT: Software maintenance cost is estimated to be more than 70% of the total cost of system, because of many factors, including new user’s requirements, the adoption of new technologies and the quality of software systems. From these factors, quality is the one that we can control and continually improved to prevent degradation of performance and reduction of effectiveness (a.k.a. design decay). Moreover, to stay competitive, the software industry has shortened its release cycles to deliver new products and features faster, which results in more pressure on developer teams and the acceleration of system’s design evolution. One way to prevent design decay is the identification and correction of anti-patterns which are indicators of poor design quality. To improve design quality and remove anti-patterns, developers perform small behavior-preserving transformations (a.k.a. refactoring). Manual refactoring is expensive, as it requires to (1) identify the code entities that need to be refactored; (2) generate refactoring operations for classes identified in the previous step; (3) find the correct order of application of the refactorings generated, to maximize the quality effect and to minimize conflicts. Hence, researchers and practitioners have formulated refactoring as an optimization problem and use search-based techniques to propose (semi)automated approaches to solve it. In this dissertation, we propose several approaches to tackle some of the major issues in existing refactoring tools, to assist developers in their maintenance and quality assurance activities

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Get PDF
    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones we identify a particular clone change pattern, Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones. We rank SPCP clones considering their strength of evolutionary coupling. In our second study we further analyze evolutionary coupling of code clones with an aim to assist clone tracking. The purpose of clone tracking is to identify the co-change (i.e. changing together) candidates of code clones to ensure consistency of changes in the code-base. Our research in the second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study we perform a deeper analysis on the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets. While one subset contains the non-cross-boundary SPCP clones which can be considered important for refactoring, the other subset contains the cross-boundary SPCP clones which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have high tendencies of experiencing bug-fixes. Such clone-types can be given high priorities for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from our last study can help us prioritize clone-types for management on the basis of their tendencies of experiencing late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pin-pointing the important clones to be managed, and thus, considerably minimizing clone management effort

    How Does Refactoring Impact Security When Improving Quality? A Security Aware Refactoring

    Full text link
    Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/155871/1/RefactoringSecurityQMOOD__ICSE____Copy_.pd

    A Refactoring Documentation Bot

    Full text link
    Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/153325/1/TSE_DocumentationBot__Copy_deep_blue.pd

    Detecting Semantic Method Clones In Java Code Using Method Ioe-behavior

    Get PDF
    The determination of semantic equivalence is an undecidable problem; however, this dissertation shows that a reasonable approximation can be obtained using a combination of static and dynamic analysis. This study investigates the detection of functional duplicates, referred to as semantic method clones (SMCs), in Java code. My algorithm extends the input-output notion of observable behavior, used in related work [1, 2], to include the effects of the method. The latter property refers to the persistent changes to the heap, brought about by the execution of the method. To differentiate this from the typical input-output behavior used by other researchers, I have coined the term method IOE-Behavior; which means its input-output and effects behavior [3]. Two methods are defined as semantic method clones, if they have identical IOE-Behavior; that is, for the same inputs (actual parameters and initial heap state), they produce the same output (that is result- for non-void methods, an final heap state). The detection process consists of two static pre-filters used to identify candidate clone sets. This is followed by dynamic tests that actually run the candidate methods, to determine semantic equivalence. The first filter groups the methods by type. The second filter refines the output of the first, grouping methods by their effects. This algorithm is implemented in my tool JSCTracker, used to automate the SMC detection process. The algorithm and tool are validated using a case study comprising of 12 open source Java projects, from different application domains and ranging in size from 2 KLOC (thousand lines of code) to 300 KLOC. The objectives of the case study are posed as 4 research questions: 1. Can method IOE-Behavior be used in SMC detection? 2. What is the impact of the use of the pre-filters on the efficiency of the algorithm? 3. How does the performance of method IOE-Behavior compare to using only inputoutput for identifying SMCs? 4. How reliable are the results obtained when method IOE-Behavior is used in SMC detection? Responses to these questions are obtained by checking each software sample with JSCTracker and analyzing the results. The number of SMCs detected range from 0-45 with an average execution time of 8.5 seconds. The use of the two pre-filters reduces the number of methods that reach the dynamic test phase, by an average of 34%. The IOE-Behavior approach takes an average of 0.010 seconds per method while the input-output approach takes an average of 0.015 seconds. The former also identifies an average of 32% false positives, while the SMCs identified using input-output, have an average of 92% false positives. In terms of reliability, the IOE-Behavior method produces results with precision values of an average of 68% and recall value of 76% on average. These reliability values represent an improvement of over 37% (for precision) and 30% (for recall) of the values in related work [4, 5]. Thus, it is my conclusion that IOE-Behavior can be used to detect SMCs in Java code with reasonable reliability

    SCALABLE DETECTION OF SOFTWARE REFACTORING

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore