3,271 research outputs found

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Get PDF
    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative impacts) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad-smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all these clones for refactoring or tracking. In these circumstances, it is essential to identify code clones that can be considered particularly important for refactoring and tracking. However, no existing study has investigated this matter. We conduct our research emphasizing this matter, and perform five studies on identifying important clones by analyzing clone evolution history. In our first study we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones we identify a particular clone change pattern, Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones. We rank SPCP clones considering their strength of evolutionary coupling. In our second study we further analyze evolutionary coupling of code clones with an aim to assist clone tracking. The purpose of clone tracking is to identify the co-change (i.e. changing together) candidates of code clones to ensure consistency of changes in the code-base. Our research in the second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study we perform a deeper analysis on the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings we separate the SPCP clones into two disjoint subsets. While one subset contains the non-cross-boundary SPCP clones which can be considered important for refactoring, the other subset contains the cross-boundary SPCP clones which should be considered important for tracking. In our fourth study we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have high tendencies of experiencing bug-fixes. Such clone-types can be given high priorities for management (refactoring or tracking). In our last study we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from our last study can help us prioritize clone-types for management on the basis of their tendencies of experiencing late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pin-pointing the important clones to be managed, and thus, considerably minimizing clone management effort

    Towards Collaborative Scientific Workflow Management System

    Get PDF
    The big data explosion phenomenon has impacted several domains, starting from research areas to divergent of business models in recent years. As this intensive amount of data opens up the possibilities of several interesting knowledge discoveries, over the past few years divergent of research domains have undergone the shift of trend towards analyzing those massive amount data. Scientific Workflow Management System (SWfMS) has gained much popularity in recent years in accelerating those data-intensive analyses, visualization, and discoveries of important information. Data-intensive tasks are often significantly time-consuming and complex in nature and hence SWfMSs are designed to efficiently support the specification, modification, execution, failure handling, and monitoring of the tasks in a scientific workflow. As far as the complexity, dimension, and volume of data are concerned, their effective analysis or management often become challenging for an individual and requires collaboration of multiple scientists instead. Hence, the notion of 'Collaborative SWfMS' was coined - which gained significant interest among researchers in recent years as none of the existing SWfMSs directly support real-time collaboration among scientists. In terms of collaborative SWfMSs, consistency management in the face of conflicting concurrent operations of the collaborators is a major challenge for its highly interconnected document structure among the computational modules - where any minor change in a part of the workflow can highly impact the other part of the collaborative workflow for the datalink relation among them. In addition to the consistency management, studies show several other challenges that need to be addressed towards a successful design of collaborative SWfMSs, such as sub-workflow composition and execution by different sub-groups, relationship between scientific workflows and collaboration models, sub-workflow monitoring, seamless integration and access control of the workflow components among collaborators and so on. In this thesis, we propose a locking scheme to facilitate consistency management in collaborative SWfMSs. The proposed method works by locking workflow components at a granular attribute level in addition to supporting locks on a targeted part of the collaborative workflow. We conducted several experiments to analyze the performance of the proposed method in comparison to related existing methods. Our studies show that the proposed method can reduce the average waiting time of a collaborator by up to 36% while increasing the average workflow update rate by up to 15% in comparison to existing descendent modular level locking techniques for collaborative SWfMSs. We also propose a role-based access control technique for the management of collaborative SWfMSs. We leverage the Collaborative Interactive Application Methodology (CIAM) for the investigation of role-based access control in the context of collaborative SWfMSs. We present our proposed method with a use-case of Plant Phenotyping and Genotyping research domain. Recent study shows that the collaborative SWfMSs often different sets of opportunities and challenges. From our investigations on existing research works towards collaborative SWfMSs and findings of our prior two studies, we propose an architecture of collaborative SWfMSs. We propose - SciWorCS - a Collaborative Scientific Workflow Management System as a proof of concept of the proposed architecture; which is the first of its kind to the best of our knowledge. We present several real-world use-cases of scientific workflows using SciWorCS. Finally, we conduct several user studies using SciWorCS comprising different real-world scientific workflows (i.e., from myExperiment) to understand the user behavior and styles of work in the context of collaborative SWfMSs. In addition to evaluating SciWorCS, the user studies reveal several interesting facts which can significantly contribute in the research domain, as none of the existing methods considered such empirical studies, and rather relied only on computer generated simulated studies for evaluation

    Why did you clone these identifiers? Using Grounded Theory to understand Identifier Clones

    Get PDF
    Developers spend most of their time comprehending source code, with some studies estimating this activity takes between 58% to 70% of a developer’s time. To improve the readability of source code, and therefore the productivity of developers, it is important to understand what aspects of static code analysis and syntactic code structure hinder the understandability of code. Identifiers are a main source of code comprehension due to their large volume and their role as implicit documentation of a developer’s intent when writing code. Despite the critical role that identifiers play during program comprehension, there are no regulated naming standards for developers to follow when picking identifier names. Our research supports previous work aimed at understanding what makes a good identifier name, and practices to follow when picking names by exploring a phenomenon that occurs during identifier naming: identifier clones. Identifier clones are two or more identifiers that are declared using the same name. This is an important yet unexplored phenomenon in identifier naming where developers intentionally give the same name to two or more identifiers in separate parts of a system. We must study identifier clones to understand it’s impact on program comprehension and to better understand the nature of identifier naming. To accomplish this, we conducted an empirical study on identifier clones detected in open-source software engineered systems and propose a taxonomy of identifier clones containing categories that can explain why they are introduced into systems and whether they represent naming antipatterns

    Who Made This Copy? An Empirical Analysis of Code Clone Authorship

    Full text link
    Code clones are code snippets that are identical or similar to other snippets within the same or different files. They are often created through copy-and-paste practices during development and maintenance activities. Since code clones may require consistent updates and coherent management, they present a challenging issue in software maintenance. Therefore, many studies have been conducted to find various types of clones with accuracy, scalability, or performance. However, the exploration of the nature of code clones has been limited. Even the fundamental question of whether code snippets in the same clone set were written by the same author or different authors has not been thoroughly investigated. In this paper, we investigate the characteristics of code clones with a focus on authorship. We analyzed the authorship of code clones at the line-level granularity for Java files in 153 Apache projects stored on GitHub and addressed three research questions. Based on these research questions, we found that there are a substantial number of clone lines across all projects (an average of 18.5\% for all projects). Furthermore, authors who contribute to many non-clone lines also contribute to many clone lines. Additionally, we found that one-third of clone sets are primarily contributed to by multiple leading authors. These results confirm our intuitive understanding of clone characteristics, although no previous publications have provided empirical validation data from multiple projects. As the results could assist in designing better clone management techniques, we will explore the implications of developing an effective clone management tool.Comment: An extended version of the 17th International Workshop on Software Clones IWSC 2023 in Bogota, Colombi

    Animating the evolution of software

    Get PDF
    The use and development of open source software has increased significantly in the last decade. The high frequency of changes and releases across a distributed environment requires good project management tools in order to control the process adequately. However, even with these tools in place, the nature of the development and the fact that developers will often work on many other projects simultaneously, means that the developers are unlikely to have a clear picture of the current state of the project at any time. Furthermore, the poor documentation associated with many projects has a detrimental effect when encouraging new developers to contribute to the software. A typical version control repository contains a mine of information that is not always obvious and not easy to comprehend in its raw form. However, presenting this historical data in a suitable format by using software visualisation techniques allows the evolution of the software over a number of releases to be shown. This allows the changes that have been made to the software to be identified clearly, thus ensuring that the effect of those changes will also be emphasised. This then enables both managers and developers to gain a more detailed view of the current state of the project. The visualisation of evolving software introduces a number of new issues. This thesis investigates some of these issues in detail, and recommends a number of solutions in order to alleviate the problems that may otherwise arise. The solutions are then demonstrated in the definition of two new visualisations. These use historical data contained within version control repositories to show the evolution of the software at a number of levels of granularity. Additionally, animation is used as an integral part of both visualisations - not only to show the evolution by representing the progression of time, but also to highlight the changes that have occurred. Previously, the use of animation within software visualisation has been primarily restricted to small-scale, hand generated visualisations. However, this thesis shows the viability of using animation within software visualisation with automated visualisations on a large scale. In addition, evaluation of the visualisations has shown that they are suitable for showing the changes that have occurred in the software over a period of time, and subsequently how the software has evolved. These visualisations are therefore suitable for use by developers and managers involved with open source software. In addition, they also provide a basis for future research in evolutionary visualisations, software evolution and open source development

    Large Language Models for Software Engineering: A Systematic Literature Review

    Full text link
    Large Language Models (LLMs) have significantly impacted numerous domains, notably including Software Engineering (SE). Nevertheless, a well-rounded understanding of the application, effects, and possible limitations of LLMs within SE is still in its early stages. To bridge this gap, our systematic literature review takes a deep dive into the intersection of LLMs and SE, with a particular focus on understanding how LLMs can be exploited in SE to optimize processes and outcomes. Through a comprehensive review approach, we collect and analyze a total of 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize and provide a comparative analysis of different LLMs that have been employed in SE tasks, laying out their distinctive features and uses. For RQ2, we detail the methods involved in data collection, preprocessing, and application in this realm, shedding light on the critical role of robust, well-curated datasets for successful LLM implementation. RQ3 allows us to examine the specific SE tasks where LLMs have shown remarkable success, illuminating their practical contributions to the field. Finally, RQ4 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE, as well as the common techniques related to prompt optimization. Armed with insights drawn from addressing the aforementioned RQs, we sketch a picture of the current state-of-the-art, pinpointing trends, identifying gaps in existing research, and flagging promising areas for future study

    Software Engineering 2021 : Fachtagung vom 22.-26. Februar 2021 Braunschweig/virtuell

    Get PDF

    On the Stability of Software Clones: A Genealogy-Based Empirical Study

    Get PDF
    Clones are a matter of great concern to the software engineering community because of their dual but contradictory impact on software maintenance. While there is strong empirical evidence of the harmful impact of clones on maintenance, a number of studies have also identified positive sides of code cloning during maintenance. Recently, to help determine if clones are beneficial or not during software maintenance, software researchers have been conducting studies that measure source code stability (the likelihood that code will be modified) of cloned code compared to non-cloned code. If the presence of clones in program artifacts (files, classes, methods, variables) causes the artifacts to be more frequently changed (i.e., cloned code is more unstable than non-cloned code), clones are considered harmful. Unfortunately, existing stability studies have resulted in contradictory results and even now there is no concrete answer to the research question "Is cloned or non-cloned code more stable during software maintenance?" The possible reasons behind the contradictory results of the existing studies are that they were conducted on different sets of subject systems with different experimental setups involving different clone detection tools investigating different stability metrics. Also, there are four major types of clones (Type 1: exact; Type 2: syntactically similar; Type 3: with some added, deleted or modified lines; and, Type 4: semantically similar) and none of these studies compared the instability of different types of clones. Focusing on these issues we perform an empirical study implementing seven methodologies that calculate eight stability-related metrics on the same experimental setup to compare the instability of cloned and non-cloned code in the maintenance phase. We investigated the instability of three major types of clones (Type 1, Type 2, and Type 3) from different dimensions. We excluded Type 4 clones from our investigation, because the existing clone detection tools cannot detect Type 4 clones well. According to our in-depth investigation on hundreds of revisions of 16 subject systems covering four different programming languages (Java, C, C#, and Python) using two clone detection tools (NiCad and CCFinder) we found that clones generally exhibit higher instability in the maintenance phase compared to non-cloned code. Specifically, Type 1 and Type 3 clones are more unstable as well as more harmful compared to Type 2 clones. However, although clones are generally more unstable sometimes they exhibit higher stability than non-cloned code. We further investigated the effect of clones on another important aspect of stability: method co-changeability (the degree methods change together). Intuitively, higher method co-changeability is an indication of higher instability of software systems. We found that clones do not have any negative effect on method co-changeability; rather, cloning can be a possible way of minimizing method co-changeability when clones are likely to evolve independently. Thus, clones have both positive and negative effects on software stability. Our empirical studies demonstrate how we can effectively use the positive sides of clones by minimizing their negative impacts
    • …
    corecore