
    Change Impact Analysis of Code Clones

    Copying a code fragment and reusing it with or without modifications is a frequent activity in software development. This practice leaves exact or closely similar copies of code fragments, known as code clones, in software systems. Developers exploit this code reuse opportunity for increased productivity. However, several studies on code clones report important concerns regarding the impacts of clones on software maintenance. A key concern is maintaining the consistent evolution of clone fragments, since inconsistent changes to clones may introduce bugs. A central challenge to the consistent evolution of clones is identifying all related clone fragments for change propagation when a cloned fragment is changed. The task of identifying such ripple effects (i.e., all related components to change) is known as Change Impact Analysis (CIA). In this thesis, we evaluate the impacts of clones on software systems from new perspectives and then propose an evolutionary coupling based technique for change impact analysis of clones. First, we empirically evaluate the comparative stability of cloned and non-cloned code using fine-grained syntactic change types. Second, we assess the impacts of clones from the perspective of coupling at the domain level. Third, we carry out a comprehensive analysis of the comparative stability of cloned and non-cloned code within a uniform framework, comparing stability metrics with the results from the original experimental settings with respect to the clone detection tools and the subject systems. Fourth, we investigate the relationships between stability and bug-proneness of clones to assess whether and how stability contributes to the bug-proneness of different types of clones. Fifth, we analyze the impacts of co-change coupling on the bug-proneness of different types of clones.
After this comprehensive evaluation of the impacts of clones on software systems, we propose an evolutionary coupling based CIA approach to support the consistent evolution of clones. In the sixth study, we propose a solution to minimize the effects of atypical (extra-large) commits on the accuracy of evolutionary coupling detection. We propose a clustering-based technique to split atypical commits into pseudo-commits of related entities, which considerably reduces the number of incorrect couplings introduced by the atypical commits. Finally, in the seventh study, we propose an evolutionary coupling based change impact analysis approach for clones. In addition to handling atypical commits, we use the history of fine-grained syntactic changes extracted from software repositories to detect typed evolutionary coupling of clones. Conventional approaches consider only the frequency with which entities co-change when detecting evolutionary coupling; we consider both change frequencies and fine-grained change types. Findings from our studies give important insights regarding the impacts of clones, and our proposed typed evolutionary coupling based CIA approach has the potential to support the consistent evolution of clones for better clone management.
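    The pseudo-commit idea can be sketched as follows. This is a minimal illustration under a strong simplifying assumption: changed files in an over-sized commit are grouped by top-level directory as a crude proxy for relatedness. The function name, size threshold, and grouping heuristic are all hypothetical, not the thesis's actual clustering technique.

```python
from collections import defaultdict

def split_atypical_commit(changed_files, size_threshold=8):
    """Split an extra-large commit into pseudo-commits of related files.

    Commits at or below the size threshold are kept as-is; larger ones
    are split by grouping files under the same top-level directory
    (an illustrative stand-in for a real clustering of related entities).
    """
    if len(changed_files) <= size_threshold:
        return [list(changed_files)]
    groups = defaultdict(list)
    for path in changed_files:
        groups[path.split("/", 1)[0]].append(path)
    return [sorted(group) for group in groups.values()]
```

    Coupling detection would then treat each pseudo-commit, rather than the whole atypical commit, as one co-change event, avoiding the spurious pairings that a commit touching hundreds of unrelated files would otherwise create.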

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Code clones (identical or similar code fragments in a code-base) have contradictory impacts (both positive and negative) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all of them for refactoring or tracking. In these circumstances, it is essential to identify the code clones that are particularly important for refactoring and tracking; however, no existing study has investigated this matter. We conduct our research emphasizing this matter and perform five studies on identifying important clones by analyzing clone evolution history. In our first study, we detect evolutionary coupling of code clones by automatically investigating clone evolution history from thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones, we identify a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones, and rank them by the strength of their evolutionary coupling. In our second study, we further analyze evolutionary coupling of code clones with the aim of assisting clone tracking. The purpose of clone tracking is to identify the co-change (i.e., changing together) candidates of code clones to ensure consistency of changes in the code-base. Our second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling.
In our third study, we perform a deeper analysis of the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings, we separate the SPCP clones into two disjoint subsets: one contains the non-cross-boundary SPCP clones, which can be considered important for refactoring, and the other contains the cross-boundary SPCP clones, which should be considered important for tracking. In our fourth study, we analyze the bug-proneness of different types of SPCP clones to identify which clone type(s) have a high tendency of experiencing bug-fixes; such clone types can be given high priority for management (refactoring or tracking). In our last study, we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from this study can help prioritize clone types for management on the basis of their tendencies of experiencing late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies, we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks these clones considering their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our research findings have the potential to assist clone management by pinpointing the important clones to be managed, thus considerably minimizing clone management effort.
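    The frequency-based notion of evolutionary coupling that these studies build on can be sketched as follows; the function name and the minimum-support threshold are illustrative assumptions, not the thesis's exact formulation or ranking scheme.

```python
from collections import Counter
from itertools import combinations

def evolutionary_coupling(commits, min_support=2):
    """Count how often each pair of entities changes together.

    `commits` is a list of commits, each a list of changed entity names.
    Pairs co-changing in at least `min_support` commits are reported as
    evolutionarily coupled, with the co-change count as their strength.
    """
    support = Counter()
    for changed in commits:
        for pair in combinations(sorted(set(changed)), 2):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}
```

    Ranking co-change candidates then amounts to sorting the coupled pairs by their strength.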

    Visualization and analysis of software clones

    Code clones are identical or similar fragments of code in a software system. Simple copy-paste programming practices of developers (reusing existing code fragments instead of implementing from scratch) and the limitations of both programming languages and developers are the primary reasons behind code cloning. Despite the maintenance implications of clones, it is not possible to conclude that cloning is harmful, because it also has benefits (e.g., faster and independent development). As a result, researchers at least agree that clones need to be analyzed before aggressively refactoring them. Although a large number of state-of-the-art clone detectors are available today, handling raw clone data is challenging due to its textual nature and large volume. To address this issue, we propose a framework for large-scale clone analysis and develop a maintenance support environment based on the framework, called VisCad. To manage the large volume of clone data, VisCad employs the Visual Information Seeking Mantra: overview first, zoom and filter, then details on demand. With VisCad, users can analyze and identify distinctive code clones through a set of visualization techniques, metrics covering different clone relations, and data filtering operations. The loosely coupled architecture of VisCad allows users to work with any clone detection tool that reports the source coordinates of the found clones. This yields the opportunity to work with the clone detectors of choice, which is important because each clone detector has its own strengths and weaknesses. In addition, we extend the support for clone evolution analysis, which is important for understanding the cause and effect of changes at the clone level during the evolution of a software system. Such information can be used to make software maintenance decisions, such as when to refactor clones.
We propose and implement a set of visualizations that allow users to analyze the evolution of clones from a coarse-grained to a fine-grained level. Finally, we use VisCad to extract both spatial and temporal clone data to predict changes to clones in a future release/revision of the software, which can be used to rank clone classes as another means of handling a large volume of clone data. We believe that VisCad makes clone comprehension easier and can be used as a test-bed to further explore code cloning, which is necessary for building a successful clone management system.

    Supporting Serendipity through Interactive Recommender Systems in Higher Education

    Serendipity is defined as the surprising discovery of useful information or other valuable things. In recommender systems research, serendipity has become an essential experiential goal. However, relevant to Human-Computer Interaction, the question of how the user interfaces of recommender systems could facilitate serendipity has received little attention. This work investigates how recommender system-facilitated serendipity can be applied to research article recommendation in the context of higher education. In particular, it investigates the use of recommender system applications in developing countries, as most studies in developing countries have focused solely on implementation rather than user experiences. This dissertation describes the design and development of several user interfaces for recommender systems in an attempt to improve our understanding of serendipity facilitation with the help of user interfaces. By studying these systems in a developing country (Pakistan), this dissertation contrasts with the study of recommender systems in developed countries, examining the contextual and cultural challenges associated with the application of recommender systems. The dissertation consists of five empirical user studies and a literature review article, contributing novel user interface designs, open-source software, and empirical analyses of user experiences related to recommender systems in a Pakistani higher education institution. The fortunate discoveries of recommendations are studied in the context of exploring research articles with the help of a recommender system. This dissertation covers both constructive and experimental research.
The articles included in this dissertation present original research experimenting with different user interface designs in recommender systems that facilitate serendipity, discuss stakeholder requirements, assess user experiences with recommended articles, and present a study on the task load of recommender systems. The key finding of this research is that the serendipity of recommendations can be facilitated through the user interface. Recommender systems can become an instrumental technology in higher education research, and developing countries can benefit from recommender system applications in higher education institutions.

    A Requirements-Based Exploration of Open-Source Software Development Projects – Towards a Natural Language Processing Software Analysis Framework

    Open source projects do have requirements; they are, however, mostly informal text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming and, for large projects, error-prone. Automated analysis of natural language requirements, even if partial, would be of great benefit. Towards that end, I describe the design and validation of an automated natural language requirements classifier for open source software development projects. I compare two strategies for recognizing requirements in open forums of software features. The results suggest that classifying text at the forum post aggregation and sentence aggregation levels may be effective, and initial results suggest that it can reduce the effort required to analyze the requirements of open source software development projects. Software development organizations and communities currently employ a large number of software development techniques and methodologies. This complexity is compounded by a wide range of software project types and development environments. The resulting lack of consistency in the software development domain leads to one important challenge that researchers encounter while exploring this area: specificity. Specificity makes it harder to maintain a consistent unit of measure or analysis approach while exploring a wide variety of software development projects and environments. The problem is most prominently exhibited in an area of software development characterized by dynamic evolution, a unique development environment, and a relatively young research history compared to traditional software development: the open-source domain.
While performing research on open source and the associated communities of developers, one notices the same challenge of specificity in requirements engineering research as in closed-source software development. Whether research is aimed at performing longitudinal or cross-sectional analyses, or attempts to link requirements to other aspects of software development projects and their management, specificity calls for a flexible analysis tool capable of adapting to the needs and specifics of the explored context. This dissertation covers the design, implementation, and evaluation of a model, a method, and a software tool comprising a flexible software development analysis framework. These design artifacts use a rule-based natural language processing approach and are built to meet the specifics of a requirements-based analysis of software development projects in the open-source domain. This research follows the principles of design science research as defined by Hevner et al. and includes stages of problem awareness, suggestion, development, evaluation, and results and conclusion (Hevner et al. 2004; Vaishnavi and Kuechler 2007). The long-term goal of the research stream stemming from this dissertation is to propose a flexible, customizable, requirements-based natural language processing software analysis framework that can be adapted to meet the research needs of multiple domains or categories of analyses.
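    As a flavor of what a rule-based requirements recognizer can look like at the sentence level, the sketch below flags sentences containing modal cue phrases. The cue list and function name are illustrative assumptions only; the dissertation's actual rule base is far richer than this.

```python
import re

# Modal cue phrases that often mark requirement-like sentences in forum
# posts; this tiny list is an illustrative assumption, not the
# dissertation's rule set.
REQUIREMENT_CUES = re.compile(
    r"\b(shall|should|must|needs? to|has to|is required to)\b",
    re.IGNORECASE,
)

def is_requirement(sentence: str) -> bool:
    """Classify a sentence as requirement-like if it contains a cue phrase."""
    return bool(REQUIREMENT_CUES.search(sentence))
```

    Applying such rules at both the sentence level and the aggregated forum-post level corresponds to the two classification strategies compared in the text.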

    Mining and Analysis of Control Structure Variant Clones

    Code duplication (software clones) is a very common phenomenon in existing software systems and is also considered an indication of poor software maintainability. In recent years, the detection of clones has drawn considerable attention. The majority of existing clone detection techniques focus on the syntactic similarity of code fragments; more specifically, they support the detection of Type-1 clones (identical code fragments except for variations in whitespace, layout, and comments), Type-2 clones (structurally/syntactically identical fragments except for variations in identifiers, literals, types, layout, and comments), and Type-3 clones (copied fragments with statements changed, added, or removed, in addition to variations in identifiers, literals, types, layout, and comments). However, recent studies have shown that when developers implement the same functionality, their code solutions may differ substantially in syntactic structure. This is because developers follow different programming styles or language features when implementing, for instance, control structures such as loops and conditionals. From the perspective of clone management, different strategies are required to detect and refactor these control structure variant clones. Thus, there is a clear need for functionality-aware clone mining approaches capable of distinguishing functional clones from syntactical clones. In this thesis, we propose a method for mining control structure variant clones. More specifically, the proposed approach can mine clones which use different, but functionally equivalent, control structures to implement functionally similar iterations and conditionals. Our method is evaluated on six open-source systems by manually inspecting the mined clones and computing the precision and recall of our technique. Moreover, we create a publicly available benchmark of control structure variant clones.
Based on the clones we found, we also propose some improvements to tackle the limitations of JDeodorant in the refactoring of control structure variant clones.
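    To illustrate what a control structure variant clone pair looks like (a Python illustration for brevity; the function names are hypothetical), the two fragments below implement the same functionality with different iteration constructs, so a purely syntactic Type-1/2/3 detector would typically miss the pair, while a functionality-aware miner should report it.

```python
def sum_even_while(nums):
    """Sum the even numbers using an index-based while loop."""
    total, i = 0, 0
    while i < len(nums):
        if nums[i] % 2 == 0:
            total += nums[i]
        i += 1
    return total

def sum_even_for(nums):
    """The same functionality using a for loop and a conditional expression."""
    total = 0
    for n in nums:
        total += n if n % 2 == 0 else 0
    return total
```

    The two bodies share almost no syntactic structure (different loop kinds, an explicit index versus direct iteration, a statement-level versus expression-level conditional), which is exactly why such pairs call for functionality-aware mining.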

    Detection and analysis of near-miss clone genealogies

    It is believed that identical or similar code fragments in source code, also known as code clones, have an impact on software maintenance. A clone genealogy shows how a group of clone fragments evolves with the evolution of the associated software system, and thus may provide important insights on the maintenance implications of those clone fragments. Considering the importance of studying the evolution of code clones, many studies have been conducted on this topic. However, after a decade of active research, there has been a marked lack of progress in understanding the evolution of near-miss software clones, especially where statements have been added, deleted, or modified in the copied fragments. Given that there is a significant amount of near-miss clones in software systems, we believe that without studying their evolution, one cannot have a complete picture of clone evolution. In this thesis, we have advanced the state of the art in clone evolution research in the context of both exact and near-miss software clones. First, we performed a large-scale empirical study to extend the existing knowledge about the evolution of exact and renamed clones, where identifiers have been modified in the copied fragments. Second, we have developed a framework, gCad, that can automatically extract both exact and near-miss clone genealogies across multiple versions of a program and identify their change patterns reasonably fast while maintaining high precision and recall. Third, in order to gain a broader perspective on clone evolution, we extended gCad to calculate various evolutionary metrics, and performed an in-depth empirical study on the evolution of both exact and near-miss clones in six open source software systems written in two different programming languages, with respect to five research questions. We discovered several interesting evolutionary phenomena of near-miss clones which either contradict previous findings or are new.
Finally, we further improved gCad and investigated a wide range of attributes and metrics, derived from both the clones themselves and their evolution histories, to identify the attributes developers often use to remove clones in the real world. We believe that our new insights into the evolution of near-miss clones, and into how developers approach and remove duplication, will play an important role in understanding the maintenance implications of clones and will help design better clone management systems.
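    The core genealogy-extraction step (linking a clone fragment in one version to its evolved form in the next) can be sketched as follows. The similarity measure, threshold, and function name are illustrative assumptions, not gCad's actual mapping algorithm.

```python
from difflib import SequenceMatcher

def map_fragments(old_fragments, new_fragments, threshold=0.7):
    """Link each clone fragment in version v to its best match in v+1.

    A fragment maps to the most textually similar fragment in the next
    version; links below `threshold` are treated as disappeared clones.
    """
    mapping = {}
    for i, old in enumerate(old_fragments):
        best_j, best_sim = None, 0.0
        for j, new in enumerate(new_fragments):
            sim = SequenceMatcher(None, old, new).ratio()
            if sim > best_sim:
                best_j, best_sim = j, sim
        mapping[i] = best_j if best_sim >= threshold else None
    return mapping
```

    Chaining such mappings across consecutive versions yields a genealogy, on which change patterns (e.g., consistent change, inconsistent change, late propagation) can then be classified.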

    Eye gaze and interaction contexts for change tasks – Observations and potential

    The more we know about software developers’ detailed navigation behavior for change tasks, the better we are able to provide effective tool support. Currently, however, most empirical studies of developers performing change tasks are limited to very small code snippets, or limited by the granularity and detail of the data collected on developers’ navigation behavior. In our research, we extend this work by combining user interaction monitoring to gather interaction context – the code elements a developer selects and edits – with eye-tracking to gather a more detailed and fine-granular gaze context – the code elements a developer looked at. In a study with 12 professional and 10 student developers, we gathered interaction and gaze contexts from participants working on three change tasks of an open source system. Based on an analysis of the data, we found, amongst other results, that gaze context captures different aspects than interaction context and that developers read only small portions of code elements. We further explore the potential of this more detailed and fine-granular data by examining the use of the captured change task context to predict perceived task difficulty and to provide better and more fine-grained navigation recommendations. We discuss our findings and their implications for better tool support.

    Cooperative Based Software Clustering on Dependency Graphs

    The organization of software systems into subsystems is usually based on the constructs of packages or modules and has a major impact on the maintainability of the software. During software evolution, however, the organization of the system is subject to continual modification, which can cause it to drift away from the original design, often reducing its quality. A number of techniques for evaluating a system's maintainability and for controlling the effort required for maintenance activities involve software clustering. Software clustering refers to partitioning the components of a software system into clusters based on their exterior and interior connectivity, helping maintainers enhance the quality of software modularization and improve its maintainability. Research in this area has produced numerous algorithms with a variety of methodologies and parameters. This thesis presents a novel ensemble approach that synthesizes a new solution from the outcomes of multiple constituent clustering algorithms. The main principle behind this approach derives from machine learning, as applied to document clustering, but it has been modified, both conceptually and empirically, for use in software clustering. The conceptual modifications include working with a variable number of clusters produced by the input algorithms and employing graph structures rather than feature vectors. The empirical modifications include experiments directed at the selection of the optimal cluster merging criteria. Case studies based on open source software systems show that establishing cooperation between leading state-of-the-art algorithms produces better clustering results than those achieved using any one of the algorithms alone.
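    One common ensemble formulation, given here only as a hedged sketch (the thesis selects its actual merging criteria empirically; the names and threshold below are assumptions), is to build a co-association matrix and merge components that share a cluster in enough of the input clusterings:

```python
from itertools import combinations

def consensus_clusters(clusterings, threshold=0.5):
    """Synthesize one clustering from several clusterings of the same items.

    Two components are placed together when they share a cluster in at
    least `threshold` of the input clusterings (co-association voting),
    with union-find used to form the final transitive groups.
    """
    items = sorted({x for c in clusterings for group in c for x in group})
    votes = {pair: 0 for pair in combinations(items, 2)}
    for clustering in clusterings:
        for group in clustering:
            for pair in combinations(sorted(group), 2):
                if pair in votes:
                    votes[pair] += 1

    parent = {x: x for x in items}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for (a, b), n in votes.items():
        if n / len(clusterings) >= threshold:
            parent[find(a)] = find(b)  # union the two groups

    merged = {}
    for x in items:
        merged.setdefault(find(x), []).append(x)
    return sorted(sorted(group) for group in merged.values())
```

    The threshold plays the role of the cluster-merging criterion that, per the abstract, has to be tuned experimentally.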

    Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks.

    Information Retrieval (IR) approaches leverage the textual or unstructured data generated during the software development process to support various software engineering (SE) tasks (e.g., concept location, traceability link recovery, change impact analysis). Two of the most important steps in applying IR techniques to SE tasks are preprocessing the corpus and configuring the IR technique, and these steps can significantly influence the outcome and the amount of effort developers must spend on these maintenance tasks. We present the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process to support SE tasks. The approach, named IR-GA, determines the (near) optimal solution to be used for each step of the IR process without requiring any training. We applied IR-GA to three different SE tasks; the results indicate that IR-GA outperforms approaches previously used in the literature and does not significantly differ from an ideal upper bound that could be achieved by a supervised approach and a combinatorial approach.
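    To give a sense of the mechanism (a hedged toy sketch; the search space, operators, and fitness below are placeholders, not IR-GA's actual design), a GA can search over IR-process configurations like this:

```python
import random

# A toy configuration space for an IR process; the choices are generic
# illustrations, not IR-GA's actual search space.
SPACE = {
    "stemming":  [True, False],
    "stopwords": [True, False],
    "weighting": ["tf", "tfidf", "boolean"],
}

def evolve(fitness, generations=30, pop_size=10, seed=0):
    """Tiny genetic search: elitist selection plus one-gene mutation."""
    rng = random.Random(seed)
    random_config = lambda: {k: rng.choice(v) for k, v in SPACE.items()}
    population = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]   # keep the fittest half
        children = []
        for parent in survivors:
            child = dict(parent)
            gene = rng.choice(list(SPACE))        # mutate one gene
            child[gene] = rng.choice(SPACE[gene])
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)
```

    The key point of IR-GA is that fitness is computed from the quality of the IR results on the task at hand rather than from labeled training data, which is why no supervised training is required.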