
    A survey on software coupling relations and tools

    Context: Coupling relations reflect the dependencies between software entities and can be used to assess the quality of a program. For this reason, a large number of them have been developed, together with tools to compute the related metrics. However, this abundance makes it challenging to find the coupling measures suitable for a given application.
    Goals: The first objective of this work is to provide a classification of the different kinds of coupling relations, together with the metrics to measure them. The second is to present an overview of the tools proposed so far by the academic software engineering community to extract these metrics.
    Method: This work is a systematic literature review in software engineering. The referenced publications were retrieved from publicly available scientific research databases, queried with keywords related to software coupling. We included publications from 2002 to 2017, as well as highly cited earlier publications. A snowballing technique was used to retrieve further related material.
    Results: Four groups of coupling relations were found: structural, dynamic, semantic, and logical. A fifth set includes approaches too recent to be considered an independent group, as well as measures developed for specific environments. The investigation also retrieved tools that extract the metrics belonging to each coupling group.
    Conclusion: This study shows the directions followed by research on software coupling, e.g., developing metrics for specific environments. Concerning the metric tools, three trends have emerged in recent years: use of visualization techniques, extensibility, and scalability. Finally, some applications of coupling metrics are presented (e.g., code smell detection), indicating possible future research directions.
    Public preprint [https://doi.org/10.5281/zenodo.2002001]
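To make one of the structural metrics concrete, the following Python sketch computes a CBO-style (coupling between objects) count: for each class, the number of distinct other classes it uses or is used by. The dependency map passed in is a hypothetical, hand-written input; real tools extract it from source code or bytecode, and the surveyed tools may define the metric differently.

```python
def cbo(deps: dict[str, set[str]]) -> dict[str, int]:
    """CBO-style structural coupling: for each class, count the distinct
    other classes it is coupled to in either direction (uses / is used by)."""
    coupled: dict[str, set[str]] = {}
    for src, targets in deps.items():
        coupled.setdefault(src, set())
        for dst in targets:
            if dst == src:
                continue  # self-references do not count as coupling
            coupled[src].add(dst)                    # src uses dst
            coupled.setdefault(dst, set()).add(src)  # dst is used by src
    return {cls: len(others) for cls, others in coupled.items()}


# Hypothetical dependency map: class -> classes it references.
example = {
    "Order": {"Customer", "Product"},
    "Customer": {"Address"},
    "Product": set(),
    "Address": set(),
}
print(cbo(example))
```

Counting both directions is one common reading of CBO; a fan-out-only variant would simply skip the second `add`.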

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Code clones (identical or similar code fragments in a code base) have dual, contradictory impacts (i.e., both positive and negative) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones to be the number one bad smell in a code base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code base may contain a huge number of code clones, and it is impractical to consider all of them for refactoring or tracking. In these circumstances, it is essential to identify the code clones that are particularly important for refactoring and tracking, yet no existing study has investigated this matter. Our research addresses it through five studies that identify important clones by analyzing clone evolution history. In our first study, we detect evolutionary coupling of code clones by automatically investigating clone evolution history across thousands of commits of software systems downloaded from online SVN repositories. By analyzing the evolutionary coupling of code clones, we identify a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones SPCP clones, and we rank them by the strength of their evolutionary coupling. In our second study, we further analyze the evolutionary coupling of code clones with the aim of assisting clone tracking. The purpose of clone tracking is to identify the co-change (i.e., changing together) candidates of code clones to ensure consistency of changes in the code base. Our second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling.
In our third study, we perform a deeper analysis of the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of these couplings, we separate the SPCP clones into two disjoint subsets: the non-cross-boundary SPCP clones, which can be considered important for refactoring, and the cross-boundary SPCP clones, which should be considered important for tracking. In our fourth study, we analyze the bug-proneness of different types of SPCP clones in order to identify which type(s) of code clones have a high tendency to experience bug-fixes. Such clone types can be given high priority for management (refactoring or tracking). In our last study, we analyze and compare the late propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from this study can help prioritize clone types for management on the basis of their tendency to experience late propagation. We also find that late propagation can be considerably reduced by managing the SPCP clones. Based on these studies, we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks them by their evolutionary coupling, bug-proneness, and late propagation tendencies. We believe that our findings can assist clone management by pinpointing the important clones to be managed, and thus considerably reduce clone management effort.
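Evolutionary coupling is commonly quantified with association-rule measures over the commit history: the support of a pair is the number of commits that change both fragments, and the confidence of A -> B is that support divided by the number of commits that change A. The sketch below assumes that measure (the abstract does not give AMIC's exact ranking formula) and a hypothetical history in which each commit is reduced to the set of clone fragment ids it touched.

```python
from collections import Counter
from itertools import combinations


def evolutionary_coupling(commits):
    """Return {(a, b): (support, confidence)} for every ordered pair a -> b
    of fragments that changed together in at least one commit."""
    single = Counter()  # commits touching each fragment
    pair = Counter()    # commits touching both fragments of an unordered pair
    for changed in commits:
        items = sorted(set(changed))
        single.update(items)
        for a, b in combinations(items, 2):
            pair[(a, b)] += 1
    coupling = {}
    for (a, b), support in pair.items():
        coupling[(a, b)] = (support, support / single[a])
        coupling[(b, a)] = (support, support / single[b])
    return coupling


def rank(coupling, min_support=2):
    """Order pairs by confidence, then support; drop pairs below min_support."""
    candidates = [p for p, (s, _) in coupling.items() if s >= min_support]
    return sorted(candidates, key=lambda p: (-coupling[p][1], -coupling[p][0], p))


# Hypothetical history: each set holds the clone fragments changed by one commit.
commits = [{"c1", "c2"}, {"c1", "c2"}, {"c1"}, {"c2", "c3"}]
ec = evolutionary_coupling(commits)
print(rank(ec))
```

The `min_support` threshold filters out pairs whose high confidence rests on a single co-change, such as `c3 -> c2` here.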

    Generating Class-Level Integration Tests Using Call Site Information

    Search-based approaches have been used in the literature to automate the creation of unit test cases. However, related work has shown that generated unit tests with high code coverage can be ineffective, i.e., they may not detect all faults or kill all injected mutants. In this paper, we propose CLING, an integration-level test case generation approach that exploits how a pair of classes, the caller and the callee, interact with each other through method calls. In particular, CLING generates integration-level test cases that maximize the Coupled Branches Criterion (CBC). Coupled branches are pairs consisting of a branch of the caller and a branch of the callee such that an integration test that exercises the former also exercises the latter. CBC is a novel integration-level coverage criterion that measures the degree to which a test suite exercises the interactions between a caller and its callee classes. We implemented CLING and evaluated the approach on 140 pairs of classes from five different open-source Java projects. Our results show that (1) CLING generates test suites with high CBC coverage, thanks to formulating test suite generation as a many-objective problem in which each pair of coupled branches is an independent objective; (2) such generated suites trigger different class interactions and can kill on average 7.7% (with a maximum of 50%) of mutants that are not detected by tests generated at the unit level; and (3) CLING can detect integration faults arising from wrong assumptions about the usage of the callee class (32 for our subject systems) that remain undetected when using automatically generated unit-level test suites.
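A simplified reading of the CBC definition can be sketched as follows, assuming the coupled branch pairs are already known (CLING derives them from call site information, which is not reproduced here) and that each test is summarized by the set of branch ids it executes. The branch ids are hypothetical. The uncovered pairs returned are what a many-objective search would still treat as open objectives.

```python
def cbc_coverage(coupled_pairs, test_traces):
    """Return (coverage, uncovered) for a set of coupled branch pairs.

    coupled_pairs: set of (caller_branch, callee_branch) tuples.
    test_traces:   one set of executed branch ids per test case.
    A pair counts as covered only if a single test executes both branches.
    """
    if not coupled_pairs:
        raise ValueError("no coupled branch pairs to cover")
    uncovered = {
        (caller_b, callee_b)
        for caller_b, callee_b in coupled_pairs
        if not any(caller_b in trace and callee_b in trace for trace in test_traces)
    }
    coverage = (len(coupled_pairs) - len(uncovered)) / len(coupled_pairs)
    return coverage, uncovered


# Hypothetical caller (A) and callee (B) branch ids.
pairs = {("A.b1", "B.b1"), ("A.b1", "B.b2"), ("A.b2", "B.b3")}
traces = [{"A.b1", "B.b1"}, {"A.b2", "B.b2", "B.b3"}]
cov, todo = cbc_coverage(pairs, traces)
print(cov, todo)
```

In this example every individual branch is executed by some test, yet one coupled pair is never exercised together, which is the gap between unit-level branch coverage and an interaction-oriented criterion.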

    Pitfalls and Guidelines for Using Time-Based Git Data

    Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data, however, time-based data is often dirty. To date, no study has quantified how frequently such data is used by the software engineering research community, or investigated the sources of dirty time-based data and quantified how often it occurs. Depending on the research task and method used, including such dirty data could affect the research results. This paper presents an extended survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series. Of the 754 technical track and data papers published at MSR from 2004 to 2021, at least 290 (38%) utilized time-based data. We also observed that most time-based data used in research papers comes in the form of Git commits, often from GitHub. Based on those results, we then used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty Git timestamp data. Finally, we provide guidelines and best practices for researchers utilizing time-based data from Git repositories.
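The abstract does not list the concrete sources of dirty timestamps the paper found, so the checks below are illustrative heuristics only: timestamps at or before the Unix epoch, timestamps later than the moment the data was collected, a committer date earlier than the author date, and a commit dated before its parent. The `Commit` record and its field names are hypothetical; in practice the values would come from `git log` or from an infrastructure such as Boa or Software Heritage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Commit:
    # Hypothetical record; both times must be timezone-aware datetimes.
    sha: str
    author_time: datetime
    committer_time: datetime
    parents: list[str] = field(default_factory=list)


def suspicious_timestamps(commits, collected_at):
    """Map commit sha -> reasons its timestamps look dirty (heuristic)."""
    by_sha = {c.sha: c for c in commits}
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    flags = {}
    for c in commits:
        reasons = []
        if c.author_time <= epoch or c.committer_time <= epoch:
            reasons.append("at or before Unix epoch")
        if c.author_time > collected_at or c.committer_time > collected_at:
            reasons.append("after data collection time")
        if c.committer_time < c.author_time:
            reasons.append("committed before authored")
        for p in c.parents:
            parent = by_sha.get(p)
            if parent is not None and c.committer_time < parent.committer_time:
                reasons.append(f"committed before parent {p}")
        if reasons:
            flags[c.sha] = reasons
    return flags


utc = timezone.utc
history = [
    Commit("a1", datetime(2020, 1, 1, tzinfo=utc), datetime(2020, 1, 1, tzinfo=utc)),
    Commit("b2", datetime(1970, 1, 1, tzinfo=utc), datetime(2020, 1, 2, tzinfo=utc), ["a1"]),
    Commit("c3", datetime(2020, 1, 3, tzinfo=utc), datetime(2019, 12, 31, tzinfo=utc), ["b2"]),
    Commit("d4", datetime(2099, 1, 1, tzinfo=utc), datetime(2099, 1, 1, tzinfo=utc), ["c3"]),
]
flags = suspicious_timestamps(history, collected_at=datetime(2022, 1, 1, tzinfo=utc))
print(flags)
```

A flagged commit is only a candidate for exclusion; repositories converted from older version control systems, for example, can legitimately carry dates that such rules consider unusual.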

    Understanding and Optimizing Python-Based Applications - A Case Study on PYPY

    Python is nowadays one of the most popular programming languages. It has been used extensively for rapid prototyping and for developing real-world applications. Unfortunately, very few empirical studies have been conducted on Python-based applications. There are various Python implementations (e.g., CPython and PyPy). Among them, PyPy is generally the fastest, owing to its efficient tracing-based Just-in-Time (JIT) compiler. Understanding how PyPy has evolved and the rationale behind its high performance would be very useful for Python application developers and researchers. In the first part of the thesis, we conducted a replication study mining the historical code changes of PyPy and compared our findings against Python-based applications from five other application domains. In the second part, we conducted a detailed empirical study on the performance impact of PyPy's JIT configuration settings. The findings and techniques in this thesis will be useful for Python application developers and researchers.
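A study of JIT configuration settings typically runs the same benchmark under many parameter combinations. The sketch below only builds the command lines for such a sweep. It assumes a `pypy3` executable on the PATH and PyPy's `--jit name=value,...` option; the parameter names `threshold` and `trace_limit` and the values swept are assumptions chosen for illustration (`pypy3 --jit help` lists the parameters a given build accepts), and `bench.py` is a hypothetical benchmark script.

```python
import itertools
import subprocess
import time

# Hypothetical sweep: parameter names and values are illustrative assumptions.
SWEEP = {
    "threshold": [500, 1000, 5000],
    "trace_limit": [6000, 12000],
}


def jit_commands(script, sweep=SWEEP, interpreter="pypy3"):
    """Yield one command line per combination of JIT parameter values."""
    names = sorted(sweep)
    for values in itertools.product(*(sweep[name] for name in names)):
        spec = ",".join(f"{name}={value}" for name, value in zip(names, values))
        yield [interpreter, "--jit", spec, script]


def time_run(cmd):
    """Wall-clock seconds for one run, including start-up and JIT warm-up."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    for cmd in jit_commands("bench.py"):
        print(" ".join(cmd))
```

Passing each command to `time_run`, repeated several times per configuration, would be needed before comparing settings, since single wall-clock measurements are noisy.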

    Understanding Programmers' Working Context by Mining Interaction Histories

    Understanding how software developers do their work is an important first step to improving their productivity. Previous research has generally focused either on laboratory experiments or on coarse-grained industrial case studies; however, studies that seek a fine-grained understanding of industrial programmers working within a realistic context remain limited. In this work, we propose to use interaction histories, that is, finely detailed records of developers' interactions with their IDE, as our main source of information for understanding programmers' work habits. We develop techniques to capture, mine, and analyze interaction histories, and we present two industrial case studies to show how this approach can help to better understand industrial programmers' work at a detailed level: we explore how the basic characteristics of software maintenance task structures can be better understood, how latent dependencies between program artifacts can be detected at interaction time, and how patterns of interaction coupling can be identified. We also examine the link between programmer interactions and some of the contextual factors of software development, such as the nature of the task being performed, the design of the software system, and the expertise of the developers. In particular, we explore how task boundaries can be automatically detected from interaction histories, how system design and developer expertise may affect interaction coupling, and whether newcomer and expert developers differ in their interaction history patterns. These findings can help us to better reason about the multidimensional nature of software development, to detect potential problems concerning task, design, expertise, and other contextual factors, and to build smarter tools that exploit the inherent patterns within programmer interactions and provide improved support for task-aware and expertise-aware software development.
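The abstract does not say how task boundaries or interaction coupling are computed, so the following sketch swaps in a deliberately simple heuristic: a new session starts whenever the developer is idle for longer than a hypothetical `max_gap`, and two artifacts are considered interaction-coupled in proportion to the number of sessions in which both were touched. Events are assumed to be `(seconds, artifact)` tuples sorted by time, and the artifact names are made up.

```python
from collections import Counter
from itertools import combinations


def split_sessions(events, max_gap):
    """Split time-sorted (seconds, artifact) events into sessions,
    starting a new one whenever the idle gap exceeds max_gap."""
    sessions, current, last_t = [], [], None
    for t, artifact in events:
        if last_t is not None and t - last_t > max_gap:
            sessions.append(current)
            current = []
        current.append(artifact)
        last_t = t
    if current:
        sessions.append(current)
    return sessions


def interaction_coupling(sessions):
    """Count the sessions in which each unordered pair of artifacts co-occurs."""
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


events = [(0, "A.java"), (30, "B.java"), (60, "A.java"),
          (2000, "C.java"), (2010, "A.java"),
          (4000, "B.java"), (4020, "A.java")]
sessions = split_sessions(events, max_gap=600)
print(interaction_coupling(sessions))
```

An idle-gap rule is only a baseline; it cannot tell apart two tasks worked on back to back without a pause.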

    Change-centric improvement of team collaboration

    In software development, teamwork is essential to the successful delivery of a final product. The software industry has historically built software using development teams that share the workplace. Process models, tools, and methodologies have been enhanced to support software development in a collocated setting. However, since the dawn of the 21st century, this scenario has begun to change: an increasing number of software companies are adopting global software development to cut costs and speed up the development process. Global software development introduces several challenges for the creation of quality software, from the adaptation of current methods, tools, and techniques to new challenges imposed by the distributed setting, including physical and cultural distance between teams, communication problems, and coordination breakdowns. A particular challenge for distributed teams is maintaining the level of collaboration naturally present in collocated teams. Collaboration in this situation naturally drops due to low awareness of the team's activity. Awareness is intrinsic to a collocated team, being obtained through human interaction such as informal conversations or meetings. For a distributed team, however, geographical distance and the resulting lack of human interaction negatively impact this awareness. This dissertation focuses on improving collaboration, especially within geographically dispersed teams. Our thesis is that by modeling the evolution of a software system in terms of fine-grained changes, we can produce a detailed history that may be leveraged to help developers collaborate. To validate this claim, we first create a model to accurately represent the evolution of a system as sequences of fine-grained changes. We then build a tool infrastructure able to capture and store fine-grained changes for both immediate and later use.
Upon this foundation, we devise and evaluate a number of applications of our work with two distinct goals:
1. To assist developers with real-time information about the activity of the team. These applications aim to improve developers' awareness of team member activity that can impact their work. We propose visualizations to notify developers of ongoing change activity, as well as a new technique for detecting potential emerging conflicts and informing developers about them.
2. To help developers satisfy their information needs related to the evolution of the software system. These applications exploit the detailed change history generated by our approach to help developers find answers to questions arising during their work. To this end, we present two new measurements of code expertise and a novel approach to replaying past changes according to user-defined criteria.
We evaluate the approach and applications using the empirical methods appropriate for each case. A total of two case studies – one controlled experiment, and one qualitative user study – are reported. The results provide evidence that applications leveraging a fine-grained change history of a software system can effectively help developers collaborate in a distributed setting.
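A toy version of a fine-grained change history, with two of the applications described above (flagging potential emerging conflicts and replaying past changes by a user-defined criterion), might look like the sketch below. The `Change` fields, the naive rule that an entity touched by more than one developer is a potential conflict, and the example data are all assumptions made for illustration; they are not the dissertation's model or its conflict-detection technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    # Hypothetical fine-grained change record.
    author: str
    entity: str     # e.g. a qualified method name
    kind: str       # e.g. "insert", "update", "delete"
    timestamp: int


def emerging_conflicts(changes):
    """Naive rule: entities changed by more than one developer."""
    authors = {}
    for ch in changes:
        authors.setdefault(ch.entity, set()).add(ch.author)
    return {entity: who for entity, who in authors.items() if len(who) > 1}


def replay(changes, predicate):
    """Yield, in chronological order, the changes matching a user criterion."""
    for ch in sorted(changes, key=lambda c: c.timestamp):
        if predicate(ch):
            yield ch


changes = [
    Change("ana", "Cart.total", "update", 10),
    Change("bo", "Cart.total", "update", 12),
    Change("bo", "Cart.add", "insert", 11),
    Change("ana", "Order.pay", "delete", 5),
]
print(emerging_conflicts(changes))
print([c.entity for c in replay(changes, lambda c: c.author == "bo")])
```

Because each change is recorded at entity granularity rather than as a file diff, both queries reduce to simple grouping and filtering over the history.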