12 research outputs found

    Understanding developers' needs on deprecation as a language feature

    Full text link
    Deprecation is a language feature that allows API producers to mark a feature as obsolete. We aim to gain a deep understanding of the needs of API producers and consumers alike regarding deprecation. To that end, we investigate why API producers deprecate features, whether they remove deprecated features, how they expect consumers to react, and what prompts an API consumer to react to deprecation. To achieve this goal we conduct semi-structured interviews with 17 third-party Java API producers and survey 170 Java developers. We observe that the current deprecation mechanism in Java and the proposal to enhance it does not address all the needs of a developer. This leads us to propose and evaluate three further enhancements to the deprecation mechanism

    Revisiting test smells in automatically generated tests : limitations, pitfalls, and opportunities

    Get PDF
    ​© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Test smells attempt to capture design issues in test code that reduce their maintainability. Previous work found such smells to be highly common in automatically generated test-cases, but based this result on specific static detection rules; although these are based on the original definition of "test smells", a recent empirical study showed that developers perceive these as overly strict and non-representative of the maintainability and quality of test suites. This leads us to investigate how effective such test smell detection tools are on automatically generated test suites. In this paper, we build a dataset of 2,340 test cases automatically generated by EVOSUITE for 100 Java classes. We performed a multi-stage, cross-validated manual analysis to identify six types of test smells and label their instances. We benchmark the performance of two test smell detection tools: one widely used in prior work, and one recently introduced with the express goal to match developer perceptions of test smells. Our results show that these test smell detection strategies poorly characterized the issues in automatically generated test suites; the older tool’s detection strategies, especially, misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definition of certain test smells, and highlight as of yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice; and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts

    Visualizing code and coverage changes for code review

    Full text link
    One of the tasks of reviewers is to verify that code modifications are well tested. However, current tools offer little support in understanding precisely how changes to the code relate to changes to the tests. In particular, it is hard to see whether (modified) test code covers the changed code. To mitigate this problem, we developed Operias, a tool that provides a combined visualization of fine-grained source code differences and coverage impact. Operias works both as a stand-alone tool on specific project versions and as a service hooked to GitHub. In the latter case, it provides automated reports for each new pull request, which reviewers can use to assess the code contribution. Operias works for any Java project that works with maven and its standard Cobertura coverage plugin. We present how Operias could be used to identify test-related problems in real-world pull requests. Operias is open source and available on GitHub with a demo video: https://github.com/SERG-Delft/operia

    fine-GRAPE: fine-grained APi usage extractor – an approach and dataset to investigate API usage

    Get PDF
    An Application Programming Interface (API) provides a set of functionalities to a developer with the aim of enabling reuse. APIs have been investigated from different angles such as popularity usage and evolution to get a better understanding of their various characteristics. For such studies, software repositories are mined for API usage examples. However, many of the mining algorithms used for such purposes do not take type information into account. Thus making the results unreliable. In this paper, we aim to rectify this by introducing fine-GRAPE, an approach that produces fine-grained API usage information by taking advantage of type information while mining API method invocations and annotation. By means of fine-GRAPE, we investigate API usages from Java projects hosted on GitHub. We select five of the most popular APIs across GitHub Java projects and collect historical API usage information by mining both the release history of these APIs and the code history of every project that uses them. We perform two case studies on the resulting dataset. The first measures the lag time of each client. The second investigates the percentage of used API features. In the first case we find that for APIs that release more frequently clients are far less likely to upgrade to a more recent version of the API as opposed to clients of APIs that release infrequently. The second case study shows us that for most APIs there is a small number of features that is actually used and most of these features relate to those that have been introduced early in the APIs lifecycle

    To react, or not to react: Patterns of reaction to API deprecation

    Get PDF
    Application Programming Interfaces (API) provide reusable functionality to aid developers in the development process. The features provided by these APIs might change over time as the API evolves. To allow API consumers to peacefully transition from older obsolete features to new features, API producers make use of the deprecation mechanism that allows them to indicate to the consumer that a feature should no longer be used. The Java language designers noticed that no one was taking these deprecation warnings seriously and continued using outdated features. Due to this, they decided to change the implementation of this feature in Java 9. We question as to what extent this issue exists and whether the Java language designers have a case. We start by identifying the various ways in which an API consumer can react to deprecation. Following this we benchmark the frequency of the reaction patterns by creating a dataset consisting of data mined from 50 API consumers totalling 297,254 GitHub based projects and 1,322,612,567 type-checked method invocations. We see that predominantly consumers do not react to deprecation and we try to explain this behavior by surveying API consumers and by analyzing if the API’s deprecation policy has an impact on the consumers’ decision to react

    On the reaction to deprecation of clients of 4+1 popular Java APIs and the JDK

    Full text link
    Application Programming Interfaces (APIs) are a tremendous resource—that is, when they are stable. Several studies have shown that this is unfortunately not the case. Of those, a large-scale study of API changes in the Pharo Smalltalk ecosystem documented several findings about API deprecations and their impact on API clients. We extend this study, by analyzing clients of both popular third-party Java APIs and the JDK API. This results in a dataset consisting of more than 25,000 clients of five popular Java APIs on GitHub, and 60 clients of the JDK API from Maven Central. This work addresses several shortcomings of the previous study, namely: a study of several distinct API clients in a popular, statically-typed language, with more accurate version information. We compare and contrast our findings with the previous study and highlight new ones, particularly on the API client update practices and the startling similarities between reaction behavior in Smalltalk and Java. We make a comparison between reaction behavior for third-party APIs and JDK APIs, given that language APIs are a peculiar case in terms of wide-spread usage, documentation, and support from IDEs. Furthermore, we investigate the connection between reaction patterns of a client and the deprecation policy adopted by the API used

    The indolent lambdification of Java: Understanding the support for lambda expressions in the Java ecosystem

    Full text link
    As Java 8 introduced functional interfaces and lambda expressions to the Java programming language, the JDK API was changed to introduce support for lambda expressions, thus allowing consumers to define lambda functions when using Java’s collections. While the JDK API allows for a functional paradigm, for API consumers to be able to completely embrace Java’s new functional features, third-party APIs must also support lambda expressions. To understand the current state of the Java ecosystem, we investigate (i) the extent to which third-party Java APIs have changed their interfaces, (ii) why or why not they introduce functional interface support and (iii) in the case the API has changed its interface how it does so. We also investigate the consumers’ perspective, particularly their ease in using lambda expressions in Java with APIs. We perform our investigation by manually analyzing the top 50 popular Java APIs, conducting in-person and email interviews with 23 API producers, and surveying 110 developers. We find that only a minority of the top 50 APIs support functional interfaces, the rest does not support them, predominantly in the interest of backward compatibility. Java 7 support is still greatly desirable due to enterprise projects not migrating to newer versions of Java. This suggests that the Java ecosystem is stagnant and that the introduction of new language features will not be enough to save it from the advent of new languages such as Kotlin (JVM based) and Rust (non-JVM based)

    Mining Motivated Trends of Usage of Haskell Libraries

    Full text link
    We propose an initial approach to mine the usage trends of libraries in Haskell, a popular functional programming language. We integrate it with a novel, initial method to automatically determine the reasons of clients for switching to different versions. Based on these, we conduct a preliminary investigation of trends of usage in Haskell libraries. Results suggest that trends are similar to those in the Java ecosystem and in line with Rogers' theory on the diffusion of innovation. Our results also provide indication on Haskell libraries being all by and large stable

    Test Smells 20 Years Later: Detectability, Validity, and Reliability

    No full text
    Test smells aim to capture design issues in test code that reduces its maintainability. These have been extensively studied and generally found quite prevalent in both human-written and automatically generated test-cases. However, most evidence of prevalence is based on specific static detection rules. Although those are based on the original, conceptual definitions of the various test smells, recent empirical studies indicate that developers perceive warnings raised by detection tools as overly strict and non-representative of the maintainability and quality of test suites. This leads us to re-assess test smell detection tools' detection accuracy and investigate the prevalence and detectability of test smells more broadly.Specifically, we construct a hand-annotated dataset spanning hundreds of test suites both written by developers and generated by two test generation tools (EvoSuite and JTExpert) and performed a multi-stage, cross-validated manual analysis to identify the presence of six types of test smells in these. We then use this manual labeling to benchmark the performance and external validity of two test smell detection tools -- one widely used in prior work and one recently introduced with the express goal to match developer perceptions of test smells.Our results primarily show that the current vocabulary of test smells is highly mismatched to real concerns: multiple smells were ubiquitous on developer-written tests but virtually never correlated with semantic or maintainability flaws; machine-generated tests actually often scored better, but in reality, suffered from a host of problems not well-captured by current test smells. Current test smell detection strategies poorly characterized the issues in these automatically generated test suites; in particular, the older tool's detection strategies misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definition of certain test smells, and highlight as of yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice, (ii) more accurate detection strategies to be evaluated primarily in industrial contexts.Software Engineerin

    Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binarie

    No full text
    Reverse engineering binaries is required to understand and analyse programs for which the source code is unavailable. Decompilers can transform the largely unreadable binaries into a more readable source code-like representation. However, reverse engineering is time-consuming, much of which is taken up by labelling the functions with semantic information.While the automated summarisation of decompiled code can help Reverse Engineers understand and analyse binaries, current work mainly focuses on summarising source code, and no suitable dataset exists for this task.In this work, we extend large pre-trained language models of source code to summarise decompiled binary functions. Furthermore, we investigate the impact of input and data properties on the performance of such models. Our approach consists of two main components; the data and the model.We first build CAPYBARA, a dataset of 214K decompiled function-documentation pairs across various compiler optimisations. We extend CAPYBARA further by generating synthetic datasets and deduplicating the data.Next, we fine-tune the CodeT5 base model with CAPYBARA to create BinT5. BinT5 achieves the state-of-the-art BLEU-4 score of 60.83, 58.82, and 44.21 for summarising source, decompiled, and synthetically stripped decompiled code, respectively. This indicates that these models can be extended to decompiled binaries successfully.Finally, we found that the performance of BinT5 is not heavily dependent on the dataset size and compiler optimisation level. We recommend future research to further investigate transferring knowledge when working with less expressive input formats such as stripped binaries
    corecore