1,230 research outputs found

    On the relationships between domain-based coupling and code clones: an exploratory study

    Knowledge of similar code fragments, also known as code clones, is important to many software maintenance activities, including bug fixing, refactoring, impact analysis and program comprehension. While a great deal of research has been conducted on techniques and tools for identifying code clones, little research has analyzed the relationships between code clones and other aspects of software. In this paper, we attempt to uncover the relationships between code clones and coupling among domain-level components. We report on a case study of a large-scale open source enterprise system, where we demonstrate that the probability of finding code clones among components with domain-based coupling is more than 90%. While such a probabilistic view does not replace a clone detection tool per se, it has the potential to complement existing tools by providing the probability of having code clones between software components. For example, it can both reduce the clone search space and provide a flexible and language-independent way of focusing only on a specific part of the system. It can also provide a higher level of abstraction from which to look at the cloning relationships among software components.

    A survey on software coupling relations and tools

    Context: Coupling relations reflect the dependencies between software entities and can be used to assess the quality of a program. For this reason, a vast number of them have been developed, together with tools to compute their related metrics. However, this makes the coupling measures suitable for a given application challenging to find. Goals: The first objective of this work is to provide a classification of the different kinds of coupling relations, together with the metrics to measure them. The second is to present an overview of the tools proposed so far by the software engineering academic community to extract these metrics. Method: This work constitutes a systematic literature review in software engineering. To retrieve the referenced publications, publicly available scientific research databases were used. These sources were queried using keywords inherent to software coupling. We included publications from the period 2002 to 2017 and highly cited earlier publications. A snowballing technique was used to retrieve further related material. Results: Four groups of coupling relations were found: structural, dynamic, semantic and logical. A fifth set of coupling relations includes approaches too recent to be considered an independent group and measures developed for specific environments. The investigation also retrieved tools that extract the metrics belonging to each coupling group. Conclusion: This study shows the directions followed by the research on software coupling, e.g., developing metrics for specific environments. Concerning the metric tools, three trends have emerged in recent years: use of visualization techniques, extensibility and scalability. Finally, some applications of coupling metrics were presented (e.g., code smell detection), indicating possible future research directions. Public preprint [https://doi.org/10.5281/zenodo.2002001]

    Change Impact Analysis of Code Clones

    Copying a code fragment and reusing it with or without modifications is known to be a frequent activity in software development. This results in exact or closely similar copies of code fragments, known as code clones, existing in software systems. Developers leverage the code reuse opportunity offered by cloning for increased productivity. However, different studies on code clones report important concerns regarding the impacts of clones on software maintenance. One of the key concerns is to maintain consistent evolution of the clone fragments, as inconsistent changes to clones may introduce bugs. Challenges to the consistent evolution of clones involve the identification of all related clone fragments for change propagation when a cloned fragment is changed. The task of identifying the ripple effects (i.e., all the related components to change) is known as Change Impact Analysis (CIA). In this thesis, we evaluate the impacts of clones on software systems from new perspectives and then propose an evolutionary coupling based technique for change impact analysis of clones. First, we empirically evaluate the comparative stability of cloned and non-cloned code using fine-grained syntactic change types. Second, we assess the impacts of clones from the perspective of coupling at the domain level. Third, we carry out a comprehensive analysis of the comparative stability of cloned and non-cloned code within a uniform framework. We compare stability metrics with the results from the original experimental settings with respect to the clone detection tools and the subject systems. Fourth, we investigate the relationships between stability and bug-proneness of clones to assess whether and how stability contributes to the bug-proneness of different types of clones. Fifth, we analyze the impacts of co-change coupling on the bug-proneness of different types of clones.
After a comprehensive evaluation of the impacts of clones on software systems, we propose an evolutionary coupling based CIA approach to support the consistent evolution of clones. In the sixth study, we propose a solution to minimize the effects of atypical commits (extra-large commits) on the accuracy of the detection of evolutionary coupling. We propose a clustering-based technique to split atypical commits into pseudo-commits of related entities. This considerably reduces the number of incorrect couplings introduced by the atypical commits. Finally, in the seventh study, we propose an evolutionary coupling based change impact analysis approach for clones. In addition to handling the atypical commits, we use the history of fine-grained syntactic changes extracted from the software repositories to detect typed evolutionary coupling of clones. Conventional approaches consider only the frequency of co-change of the entities to detect evolutionary coupling. We consider both change frequencies and the fine-grained change types in the detection of evolutionary coupling. Findings from our studies give important insights regarding the impacts of clones, and our proposed typed evolutionary coupling based CIA approach has the potential to support the consistent evolution of clones for better clone management.
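
As a rough illustration of the conventional frequency-based detection mentioned in this abstract, the sketch below counts how often two entities change in the same commit and derives support and confidence for each ordered pair. The commit data, the function name `co_change_coupling` and the `max_commit_size` cutoff are assumptions made for illustration only. In particular, simply skipping oversized commits is a simplification: the thesis instead splits atypical commits into pseudo-commits by clustering and also considers fine-grained change types.

```python
from collections import Counter
from itertools import combinations


def co_change_coupling(commits, max_commit_size=50):
    """Frequency-based co-change coupling (illustrative sketch).

    commits: iterable of lists of entity names changed together.
    Returns {(a, b): (support, confidence)} where support is the number
    of commits changing both a and b, and confidence = support / changes(a).
    """
    changes = Counter()      # how many counted commits touched each entity
    pair_counts = Counter()  # how many counted commits touched each pair

    for entities in commits:
        entities = sorted(set(entities))
        # Assumption: oversized ("atypical") commits are simply skipped here.
        if len(entities) > max_commit_size:
            continue
        changes.update(entities)
        pair_counts.update(combinations(entities, 2))

    coupling = {}
    for (a, b), support in pair_counts.items():
        coupling[(a, b)] = (support, support / changes[a])
        coupling[(b, a)] = (support, support / changes[b])
    return coupling


# Hypothetical change history.
commits = [
    ["A.java", "B.java"],
    ["A.java", "B.java", "C.java"],
    ["A.java"],
    ["C.java", "D.java"],
]
coupling = co_change_coupling(commits)
support, confidence = coupling[("B.java", "A.java")]
print(f"B -> A: support={support}, confidence={confidence:.2f}")
```

Confidence is directional: in this toy history every change to `B.java` is accompanied by a change to `A.java`, but not the other way round.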

    Structured Review of Code Clone Literature

    This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts which helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other.

    A review of software change impact analysis

    Change impact analysis is required for constantly evolving systems to support the comprehension, implementation, and evaluation of changes. A lot of research effort has been spent on this subject over the last twenty years, and many approaches have been published. However, there has not been an extensive attempt to summarize and review published approaches as a base for further research in the area. Therefore, we present the results of a comprehensive investigation of software change impact analysis, which is based on a literature review and a taxonomy for impact analysis. The contribution of this review is threefold. First, approaches proposed for impact analysis are explained regarding their motivation and methodology. They are further classified according to the criteria of the taxonomy to enable the comparison and evaluation of approaches proposed in the literature. Second, we evaluate our taxonomy regarding the coverage of its classification criteria in the studied literature. Last, we discuss the unsolved problems, research areas, and challenges of impact analysis that our review uncovered, to illustrate possible directions for further research.

    Reverse Engineering Heterogeneous Applications

    Nowadays a large majority of software systems are built using various technologies that in turn rely on different languages (e.g., Java, XML, SQL). We call such systems heterogeneous applications (HAs). By contrast, we call software systems that are written in one language homogeneous applications. In HAs the information regarding the structure and the behaviour of the system is spread across various components and languages, and the interactions between different application elements may be hidden. In this context, applying existing reverse engineering and quality assurance techniques developed for homogeneous applications is not enough. These techniques have been created to measure quality or provide information about one aspect of the system, and they cannot grasp the complexity of HAs. In this dissertation we present our approach to support the analysis and evolution of HAs based on: (1) a unified first-class description of HAs and (2) a meta-model that reifies the concept of horizontal and vertical dependencies between application elements at different levels of abstraction. We implemented our approach in two tools, MooseEE and Carrack. The first is an extension of the Moose platform for software and data analysis and contains our unified meta-model for HAs. The latter is an engine to infer derived dependencies that can support the analysis of associations among the heterogeneous elements composing HAs. We validate our approach and tools by case studies on industrial and open-source JEAs which demonstrate how we can handle the complexity of such applications and how we can solve problems deriving from their heterogeneous nature.

    Understanding Programmers' Working Context by Mining Interaction Histories

    Understanding how software developers do their work is an important first step to improving their productivity. Previous research has generally focused either on laboratory experiments or coarse-grained industrial case studies; however, studies that seek a fine-grained understanding of industrial programmers working within a realistic context remain limited. In this work, we propose to use interaction histories, that is, finely detailed records of developers’ interactions with their IDE, as our main source of information for understanding programmers’ work habits. We develop techniques to capture, mine, and analyze interaction histories, and we present two industrial case studies to show how this approach can help to better understand industrial programmers’ work at a detailed level: we explore how the basic characteristics of software maintenance task structures can be better understood, how latent dependence between program artifacts can be detected at interaction time, and how patterns of interaction coupling can be identified. We also examine the link between programmer interactions and some of the contextual factors of software development, such as the nature of the task being performed, the design of the software system, and the expertise of the developers. In particular, we explore how task boundaries can be automatically detected from interaction histories, how system design and developer expertise may affect interaction coupling, and whether newcomer and expert developers differ in their interaction history patterns. These findings can help us to better reason about the multidimensional nature of software development, to detect potential problems concerning task, design, expertise, and other contextual factors, and to build smarter tools that exploit the inherent patterns within programmer interactions and provide improved support for task-aware and expertise-aware software development.

    Predicting change propagation using domain-based coupling

    Most enterprise systems operate in domains where business rules and requirements frequently change. Managing the cost and impact of these changes has been a known challenge, and the software maintenance community has been tackling it for more than two decades. The traditional approach to impact analysis is to trace dependencies in the design documents and the source code. More recently, the software maintenance history has been exploited for impact analysis. The problem is that these approaches are difficult to implement for hybrid systems that consist of heterogeneous components. Today it is common to find systems of systems where each system was developed in a different language. In such environments, it is a challenge to estimate the change propagation between components that are developed in different languages. There is often no direct code dependency between these components, and they are maintained in different development environments by different developers. In addition, it is the domain experts and consultants who raise most of the enhancement requests; however, using the existing change impact analysis methods, they cannot evaluate the impact and cost of the proposed changes without the support of the developers. This thesis seeks to address these problems by proposing a new approach to change impact analysis based on software domain-level information. This approach is based on the assumption that domain-level relationships are reflected in the software source code, and that one can predict software dependencies and change propagation by exploiting software domain-level information. The proposed approach is independent of the software implementation, inexpensive to implement, and usable by domain experts with no requirement to access and analyse the source code. This thesis introduces domain-based coupling as a novel measure of the semantic similarity between software user interface components.
The hypothesis is that the domain-based coupling between software components is correlated with the likelihood of the existence of dependencies and change propagation between these components. This hypothesis has been evaluated with two case studies:
    • A study of one of the largest open source enterprise systems demonstrates that architectural dependencies can be identified with an accuracy of more than 70% solely based on the domain-based coupling.
    • A study of 12 years’ maintenance history of the five subsystems of a significantly sized proprietary enterprise system demonstrates that the co-change coupling derived from over 75,000 change records can be predicted solely using domain-based coupling, with average recall and precision of more than 60%, which is of comparable quality to other state-of-the-art change impact analysis methods.
The results of these studies support our hypothesis that software dependencies and change propagation can be predicted solely from software domain-level information. Although the accuracy of such predictions is not sufficiently high to completely replace the traditional dependency analysis methods, the presented results suggest that domain-based coupling might be used as a complementary method, or where analysis of dependencies in the code and documents is not a viable option.
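
The recall and precision figures quoted for the second case study can be read as set overlap between the component pairs predicted to co-change and the pairs observed to co-change in the maintenance history. The following is a minimal sketch of that calculation. The component names, the data and the function name `precision_recall` are hypothetical, and the abstract does not describe the exact evaluation procedure used in the thesis.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted co-change pairs (illustrative sketch).

    predicted, actual: iterables of 2-tuples of component names.
    Pairs are treated as unordered, so (a, b) matches (b, a).
    """
    predicted = {frozenset(pair) for pair in predicted}
    actual = {frozenset(pair) for pair in actual}
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical pairs predicted from domain-based coupling ...
predicted = [
    ("Orders", "Invoices"),
    ("Orders", "Customers"),
    ("Invoices", "Payments"),
]
# ... and hypothetical pairs observed to co-change in change records.
actual = [
    ("Invoices", "Orders"),
    ("Invoices", "Payments"),
    ("Customers", "Payments"),
    ("Orders", "Shipping"),
]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy example two of the three predicted pairs are correct, and two of the four observed pairs are found.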

    Bayesian statistical approach for protein residue-residue contact prediction

    Despite continuous efforts in automating experimental structure determination and systematic target selection in structural genomics projects, the gap between the number of known amino acid sequences and solved 3D structures for proteins is constantly widening. While DNA sequencing technologies are advancing at an extraordinary pace, thereby constantly increasing throughput while at the same time reducing costs, protein structure determination is still labour intensive, time-consuming and expensive. This trend illustrates the essential importance of complementary computational approaches in order to bridge the so-called sequence-structure gap. About half of the protein families lack structural annotation and therefore are not amenable to techniques that infer protein structure from homologs. These protein families can be addressed by de novo structure prediction approaches that in practice are often limited by the immense computational costs required to search the conformational space for the lowest-energy conformation. Improved predictions of contacts between amino acid residues have been demonstrated to sufficiently constrain the overall protein fold and thereby extend the applicability of de novo methods to larger proteins. Residue-residue contact prediction is based on the idea that selection pressure on protein structure and function can lead to compensatory mutations between spatially close residues. This leaves an echo of correlation signatures that can be traced down from the evolutionary record. Despite the success of contact prediction methods, there are several challenges. The most evident limitation lies in the requirement of deep alignments, which excludes the majority of protein families without associated structural information that are the focus for contact guided de novo structure prediction. The heuristics applied by current contact prediction methods pose another challenge, since they omit available coevolutionary information. 
This work presents two different approaches for addressing the limitations of contact prediction methods. Instead of inferring evolutionary couplings by maximizing the pseudo-likelihood, I maximize the full likelihood of the statistical model for protein sequence families. This approach achieved comparable precision, with minor improvements over the pseudo-likelihood methods for protein families with few homologous sequences. A Bayesian statistical approach has been developed that provides posterior probability estimates for residue-residue contacts and eliminates the use of heuristics. The full information of coevolutionary signatures is exploited by explicitly modelling the distribution of statistical couplings that reflects the nature of residue-residue interactions. Surprisingly, the posterior probabilities do not directly translate into more precise predictions than those obtained by pseudo-likelihood methods combined with prior knowledge. However, the Bayesian framework offers a statistically clean and theoretically solid treatment of the contact prediction problem. This flexible and transparent framework provides a convenient starting point for further developments, such as integrating more complex prior knowledge. The model can also easily be extended towards the derivation of probability estimates for residue-residue distances to enhance the precision of predicted structures.