6 research outputs found

    Toward an Understanding of Software Code Cloning as a Development Practice

    Get PDF
    Code cloning is the practice of duplicating existing source code for use elsewhere within a software system. Within the research community, conventional wisdom has asserted that code cloning is generally a bad practice, and that code clones should be removed or refactored where possible. While there is significant anecdotal evidence that code cloning can lead to a variety of maintenance headaches --- such as code bloat, duplication of bugs, and inconsistent bug fixing --- there has been little empirical study on the frequency, severity, and costs of code cloning with respect to software maintenance. This dissertation seeks to improve our understanding of code cloning as a common development practice through the study of several widely adopted, medium-sized open source software systems. We have explored the motivations behind the use of code cloning as a development practice by addressing several fundamental questions: For what reasons do developers choose to clone code? Are there distinct identifiable patterns of cloning? What are the possible short- and long-term term risks of cloning? What management strategies are appropriate for the maintenance and evolution of clones? When is the ``cure'' (refactoring) likely to cause more harm than the ``disease'' (cloning)? There are three major research contributions of this dissertation. First, we propose a set of requirements for an effective clone analysis tool based on our experiences in clone analysis of large software systems. These requirements are demonstrated in an example implementation which we used to perform the case studies prior to and included in this thesis. Second, we present an annotated catalogue of common code cloning patterns that we observed in our studies. Third, we present an empirical study of the relative frequencies and likely harmfulness of instances of these cloning patterns as observed in two medium-sized open source software systems, the Apache web server and the Gnumeric spreadsheet application. In summary, it appears that code cloning is often used as a principled engineering technique for a variety of reasons, and that as many as 71% of the clones in our study could be considered to have a positive impact on the maintainability of the software system. These results suggest that the conventional wisdom that code clones are generally harmful to the quality of a software system has been proven wrong

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Get PDF
    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones we identify a particular clone change pattern, Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones. We rank SPCP clones considering their strength of evolutionary coupling. In our second study we further analyze evolutionary coupling of code clones with an aim to assist clone tracking. The purpose of clone tracking is to identify the co-change (i.e. changing together) candidates of code clones to ensure consistency of changes in the code-base. Our research in the second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study we perform a deeper analysis on the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets. While one subset contains the non-cross-boundary SPCP clones which can be considered important for refactoring, the other subset contains the cross-boundary SPCP clones which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have high tendencies of experiencing bug-fixes. Such clone-types can be given high priorities for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from our last study can help us prioritize clone-types for management on the basis of their tendencies of experiencing late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pin-pointing the important clones to be managed, and thus, considerably minimizing clone management effort

    Semantic Web Enabled Software Engineering

    Get PDF
    Ontologies allow the capture and sharing of domain knowledge by formalizing information and making it machine understandable. As part of an information system, ontologies can capture and carry the reasoning knowledge needed to fulfill different application goals. Although many ontologies have been developed over recent years, few include such reasoning information. As a result, many ontologies are not used in real-life applications, do not get reused or only act as a taxonomy of a domain. This work is an investigation into the practical use of ontologies as a driving factor in the development of applications and the incorporation of Knowledge Engineering as a meaningful activity into modern agile software development. This thesis contributes a novel methodology that supports an incremental requirement analysis and an iterative formalization of ontology design through the use of ontology reasoning patterns. It also provides an application model for ontology-driven applications that can deal with nonontological data sources. A set of case studies with various application specific goals helps to elucidate whether ontologies are in fact suitable for more than simple knowledge formalization and sharing, and can act as the underlying structure for developing largescale information systems. Tasks from the area of bug-tracker quality mining and clone detection are evaluated for this purpose

    Applying Model-Driven Engineering to Development Scenarios for Web Content Management System Extensions

    Get PDF
    Web content management systems (WCMSs) such as WordPress, Joomla or Drupal have established themselves as popular platforms for instantiating dynamic web applications. Using a WCMS instance allows developers to add additional functionality by implementing installable extension packages. However, extension developers are challenged by dealing with boilerplate code, dependencies between extensions and frequent architectural changes to the underlying WCMS platform. These challenges occur in frequent development scenarios that include initial development and maintenance of extensions as well as migration of existing extension code to new platforms. A promising approach to overcome these challenges is represented by model-driven engineering (MDE). Adopting MDE as development practice, allows developers to define software features within reusable models which abstract the technical knowledge of the targeted system. Using these models as input for platform-specific code generators enables a rapid transformation to standardized software of high quality. However, MDE has not found adoption during extension development in the WCMS domain, due to missing tool support. The results of empirical studies in different domains demonstrate the benefits of MDE. However, empirical evidence of these benefits in the WCMS domain is currently lacking. In this work, we present the concepts and design of an MDE infrastructure for the development and maintenance of WCMS extensions. This infrastructure provides a domain-specific modelling language (DSL) for WCMS extensions, as well as corresponding model editors. In addition, the MDE infrastructure facilitates a set of transformation tools to apply forward and reverse engineering steps. This includes a code generator that uses model instances of the introduced DSL, an extension extractor for code extraction of already deployed WCMS extensions, and a model extraction tool for the creation of model instances based on an existing extension package. To ensure adequacy of the provided MDE infrastructure, we follow a structured research methodology. First, we investigate the representativeness of common development scenarios by conducting interviews with industrial practitioners from the WCMS domain. Second, we propose a general solution concept for these scenarios including involved roles, process steps, and MDE infrastructure facilities. Third, we specify functional and non-functional requirements for an adequate MDE infrastructure, including the expectations of domain experts. To show the applicability of these concepts, we introduce JooMDD as infrastructure instantiation for the Joomla WCMS which provides the most sophisticated extension mechanism in the domain. To gather empirical evidence of the positive impact of MDE during WCMS extension development, we present a mixed-methods empirical investigation with extension developers from the Joomla community. First, we share the method, results and conclusions of a controlled experiment conducted with extension developers from academia and industry. The experiment compares conventional extension development with MDE using the JooMDD infrastructure, focusing on the development of dependent and independent extensions. The results show a clear gain in productivity and quality by using the JooMDD infrastructure. Second, we share the design and observations of a semi-controlled tutorial with four experienced developers who had to apply the JooMDD infrastructure during three scenarios of developing new (both independent and dependent) extensions and of migrating existing ones to a new major platform version. The aim of this study was to obtain direct qualitative feedback about acceptance, usefulness, and open challenges of our MDE approach. Finally, we share lessons learned and discuss the threats to validity of the conducted studies

    Management Aspects of Software Clone Detection and Analysis

    Get PDF
    Copying a code fragment and reusing it by pasting with or without minor modifications is a common practice in software development for improved productivity. As a result, software systems often have similar segments of code, called software clones or code clones. Due to many reasons, unintentional clones may also appear in the source code without awareness of the developer. Studies report that significant fractions (5% to 50%) of the code in typical software systems are cloned. Although code cloning may increase initial productivity, it may cause fault propagation, inflate the code base and increase maintenance overhead. Thus, it is believed that code clones should be identified and carefully managed. This Ph.D. thesis contributes in clone management with techniques realized into tools and large-scale in-depth analyses of clones to inform clone management in devising effective techniques and strategies. To support proactive clone management, we have developed a clone detector as a plug-in to the Eclipse IDE. For clone detection, we used a hybrid approach that combines the strength of both parser-based and text-based techniques. To capture clones that are similar but not exact duplicates, we adopted a novel approach that applies a suffix-tree-based k-difference hybrid algorithm, borrowed from the area of computational biology. Instead of targeting all clones from the entire code base, our tool aids clone-aware development by allowing focused search for clones of any code fragment of the developer's interest. A good understanding on the code cloning phenomenon is a prerequisite to devise efficient clone management strategies. The second phase of the thesis includes large-scale empirical studies on the characteristics (e.g., proportion, types of similarity, change patterns) of code clones in evolving software systems. Applying statistical techniques, we also made fairly accurate forecast on the proportion of code clones in the future versions of software projects. The outcome of these studies expose useful insights into the characteristics of evolving clones and their management implications. Upon identification of the code clones, their management often necessitates careful refactoring, which is dealt with at the third phase of the thesis. Given a large number of clones, it is difficult to optimally decide what to refactor and what not, especially when there are dependencies among clones and the objective remains the minimization of refactoring efforts and risks while maximizing benefits. In this regard, we developed a novel clone refactoring scheduler that applies a constraint programming approach. We also introduced a novel effort model for the estimation of efforts needed to refactor clones in source code. We evaluated our clone detector, scheduler and effort model through comparative empirical studies and user studies. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a versatile clone management system that we envision
    corecore