1,221 research outputs found

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Get PDF
    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones we identify a particular clone change pattern, Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones. We rank SPCP clones considering their strength of evolutionary coupling. In our second study we further analyze evolutionary coupling of code clones with an aim to assist clone tracking. The purpose of clone tracking is to identify the co-change (i.e. changing together) candidates of code clones to ensure consistency of changes in the code-base. Our research in the second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study we perform a deeper analysis on the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets. While one subset contains the non-cross-boundary SPCP clones which can be considered important for refactoring, the other subset contains the cross-boundary SPCP clones which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have high tendencies of experiencing bug-fixes. Such clone-types can be given high priorities for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from our last study can help us prioritize clone-types for management on the basis of their tendencies of experiencing late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pin-pointing the important clones to be managed, and thus, considerably minimizing clone management effort

    Active Clones: Source Code Clones at Runtime

    Get PDF
    Code cloning is a common programming practice, and there have been aconsiderable amount of research that investigated the implications of code clones onsoftware maintenance using static analysis. However, little has been done to investigatethe runtime implications of code cloning. In this paper we investigate sourcecode clones at runtime, referring to clones as ‘active clones’ if they are invokedwhen a software system is in use. For example, if a particular use u of a systemresults in a clone c being invoked, we say that clone c is active with respect to useu. From this definition and given a set of uses fu1;u2; :::g and clones fc1;c2; :::gwe are able to identify the extent clones are active at runtime and analyze activeclone resource use (e.g., CPU time) and define and calculate a set of active clonemetrics to provide insights into source code cloning implications at runtime. We developeda hybrid static and dynamic analysis technique for detecting and analysingactive clones, and conducted an empirical study on five software systems (HSQLDB,JHotDraw, RText, jEdit and UniCentaoPOS) to validate our approach. We found asmall portion of clones are active during a typical use of a software system, and thatactive clones have the potential for guiding a software developer’s code inspectionactivity during a software maintenance task

    Change Impact Analysis of Code Clones

    Get PDF
    Copying a code fragment and reusing it with or without modifications is known to be a frequent activity in software development. This results in exact or closely similar copies of code fragments, known as code clones, to exist in the software systems. Developers leverage the code reuse opportunity by code cloning for increased productivity. However, different studies on code clones report important concerns regarding the impacts of clones on software maintenance. One of the key concerns is to maintain consistent evolution of the clone fragments as inconsistent changes to clones may introduce bugs. Challenges to the consistent evolution of clones involve the identification of all related clone fragments for change propagation when a cloned fragment is changed. The task of identifying the ripple effects (i.e., all the related components to change) is known as Change Impact Analysis (CIA). In this thesis, we evaluate the impacts of clones on software systems from new perspectives and then we propose an evolutionary coupling based technique for change impact analysis of clones. First, we empirically evaluate the comparative stability of cloned and non-cloned code using fine-grained syntactic change types. Second, we assess the impacts of clones from the perspective of coupling at the domain level. Third, we carry out a comprehensive analysis of the comparative stability of cloned and non-cloned code within a uniform framework. We compare stability metrics with the results from the original experimental settings with respect to the clone detection tools and the subject systems. Fourth, we investigate the relationships between stability and bug-proneness of clones to assess whether and how stability contribute to the bug-proneness of different types of clones. Next, in the fifth study, we analyzed the impacts of co-change coupling on the bug-proneness of different types of clones. After a comprehensive evaluation of the impacts of clones on software systems, we propose an evolutionary coupling based CIA approach to support the consistent evolution of clones. In the sixth study, we propose a solution to minimize the effects of atypical commits (extra large commits) on the accuracy of the detection of evolutionary coupling. We propose a clustering-based technique to split atypical commits into pseudo-commits of related entities. This considerably reduces the number of incorrect couplings introduced by the atypical commits. Finally, in the seventh study, we propose an evolutionary coupling based change impact analysis approach for clones. In addition to handling the atypical commits, we use the history of fine-grained syntactic changes extracted from the software repositories to detect typed evolutionary coupling of clones. Conventional approaches consider only the frequency of co-change of the entities to detect evolutionary coupling. We consider both change frequencies and the fine-grained change types in the detection of evolutionary coupling. Findings from our studies give important insights regarding the impacts of clones and our proposed typed evolutionary coupling based CIA approach has the potential to support the consistent evolution of clones for better clone management

    Supporting Source Code Feature Analysis Using Execution Trace Mining

    Get PDF
    Software maintenance is a significant phase of a software life-cycle. Once a system is developed the main focus shifts to maintenance to keep the system up to date. A system may be changed for various reasons such as fulfilling customer requirements, fixing bugs or optimizing existing code. Code needs to be studied and understood before any modification is done to it. Understanding code is a time intensive and often complicated part of software maintenance that is supported by documentation and various tools such as profilers, debuggers and source code analysis techniques. However, most of the tools fail to assist in locating the portions of the code that implement the functionality the software developer is focusing. Mining execution traces can help developers identify parts of the source code specific to the functionality of interest and at the same time help them understand the behaviour of the code. We propose a use-driven hybrid framework of static and dynamic analyses to mine and manage execution traces to support software developers in understanding how the system's functionality is implemented through feature analysis. We express a system's use as a set of tests. In our approach, we develop a set of uses that represents how a system is used or how a user uses some specific functionality. Each use set describes a user's interaction with the system. To manage large and complex traces we organize them by system use and segment them by user interface events. The segmented traces are also clustered based on internal and external method types. The clusters are further categorized into groups based on application programming interfaces and active clones. To further support comprehension we propose a taxonomy of metrics which are used to quantify the trace. To validate the framework we built a tool called TrAM that implements trace mining and provides visualization features. It can quantify the trace method information, mine similar code fragments called active clones, cluster methods based on types, categorise them based on groups and quantify their behavioural aspects using a set of metrics. The tool also lets the users visualize the design and implementation of a system using images, filtering, grouping, event and system use, and present them with values calculated using trace, group, clone and method metrics. We also conducted a case study on five different subject systems using the tool to determine the dynamic properties of the source code clones at runtime and answer three research questions using our findings. We compared our tool with trace mining tools and profilers in terms of features, and scenarios. Finally, we evaluated TrAM by conducting a user study on its effectiveness, usability and information management

    Active Code Learning: Benchmarking Sample-Efficient Training of Code Models

    Full text link
    The costly human effort required to prepare the training data of machine learning (ML) models hinders their practical development and usage in software engineering (ML4Code), especially for those with limited budgets. Therefore, efficiently training models of code with less human effort has become an emergent problem. Active learning is such a technique to address this issue that allows developers to train a model with reduced data while producing models with desired performance, which has been well studied in computer vision and natural language processing domains. Unfortunately, there is no such work that explores the effectiveness of active learning for code models. In this paper, we bridge this gap by building the first benchmark to study this critical problem - active code learning. Specifically, we collect 11 acquisition functions~(which are used for data selection in active learning) from existing works and adapt them for code-related tasks. Then, we conduct an empirical study to check whether these acquisition functions maintain performance for code data. The results demonstrate that feature selection highly affects active learning and using output vectors to select data is the best choice. For the code summarization task, active code learning is ineffective which produces models with over a 29.64\% gap compared to the expected performance. Furthermore, we explore future directions of active code learning with an exploratory study. We propose to replace distance calculation methods with evaluation metrics and find a correlation between these evaluation-based distance methods and the performance of code models.Comment: 12 pages, ongoing wor

    Gunrock: A High-Performance Graph Processing Library on the GPU

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5

    Code Smells and Refactoring: A Tertiary Systematic Review of Challenges and Observations

    Full text link
    In this paper, we present a tertiary systematic literature review of previous surveys, secondary systematic literature reviews, and systematic mappings. We identify the main observations (what we know) and challenges (what we do not know) on code smells and refactoring. We show that code smells and refactoring have a strong relationship with quality attributes, i.e., with understandability, maintainability, testability, complexity, functionality, and reusability. We argue that code smells and refactoring could be considered as the two faces of a same coin. Besides, we identify how refactoring affects quality attributes, more than code smells. We also discuss the implications of this work for practitioners, researchers, and instructors. We identify 13 open issues that could guide future research work. Thus, we want to highlight the gap between code smells and refactoring in the current state of software-engineering research. We wish that this work could help the software-engineering research community in collaborating on future work on code smells and refactoring

    Management Aspects of Software Clone Detection and Analysis

    Get PDF
    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. Due to many reasons, unintentional clones may also appear in the source code without awareness of the developer. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes in clone management with techniques realized into tools and large-scale in-depth analyses of clones to inform clone management in devising effective techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strength of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding on the code cloning phenomenon is a prerequisite to devise efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecast on the proportion of code clones in the future versions of software projects. The outcome of these studies expose useful insights into the characteristics of evolving clones and their management implications. Upon identification of the code clones, their management often necessitates careful refactoring, which is dealt with at the third phase of the thesis. Given a large number of clones, it is difficult to optimally decide what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for the estimation of efforts needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision
    • …
    corecore