128 research outputs found

    A Systematic Approach to Benchmark and Improve Automated Static Detection of Java-API Misuses

    Get PDF
    Today's software industry relies heavily on the reuse of existing software libraries. Such libraries provide the building blocks for modern software products. Reusing them allow developers to focus on innovation, while standing on the shoulders of giants. To use libraries effectively, developers need to know the Application Programming Interfaces (APIs) through which they communicate with the libraries. This includes both the APIs' semantics and the (implicit) usage constraints that come with them. In the face of the rapidly growing and evolving supply of software libraries, this has become a challenging task. As a result, incorrect usages of APIs, or API misuses, are a prevalent cause of software bugs, crashes, and vulnerabilities. In reaction to this problem, researchers have proposed a multitude of developer-assistance tools. One particular class of such tools automates the detection of API misuses in software code. We call these tools API-misuse detectors. Existing misuse detectors address different aspects of API misuse. However, no attempt has been made to systematically define the problem space of API misuse and to assess the prevalence of API misuses compared to other types of bugs. This makes it impossible to judge the relevance of research on API-misuse detection. Moreover, previous empirical evaluations of misuse detectors commonly measure the detectors' precision. However, since the studies use different datasets, it is unclear to which extend the results are comparable. It is also unclear where the detectors make trade-offs between their precision and their recall. In this thesis, we first present a systematic analysis of the problem of API misuse. We find that API misuse causes 9.1% of all software bugs in real-world projects, including many critical issues, such as program crashes, data loss, and security vulnerabilities. Then, we survey the literature to consolidate over a decade of research on API-misuse detection and build MUBench, a public automated benchmark for API-misuse detectors. This enables us to conduct the first-ever qualitative and quantitative comparison of existing misuse detectors. We find that these detectors have the potential to discover many API misuses, but suffer from extremely low precision and recall in practice. Finally, we systematically design MUDetect, a new API-misuse detector that addresses many of the problems of existing detectors. Using MUBench, we demonstrate that MUDetect clearly outperforms existing detectors with respect to both precision and recall. Our results provide strong evidence that, following our systematic approach, we can develop API-misuse detectors that are fit for practical application

    Evaluating Automatic Program Repair Capabilities to Repair API Misuses

    Get PDF
    API misuses are well-known causes of software crashes and security vulnerabilities. However, their detection and repair is challenging given that the correct usages of (third-party) APIs might be obscure to the developers of client programs. This paper presents the first empirical study to assess the ability of existing automated bug repair tools to repair API misuses, which is a class of bugs previously unexplored. Our study examines and compares 14 Java test-suite-based repair tools (11 proposed before 2018, and three afterwards) on a manually curated benchmark (APIREPBENCH) consisting of 101 API misuses. We develop an extensible execution framework (APIARTY) to automatically execute multiple repair tools. Our results show that the repair tools are able to generate patches for 28% of the API misuses considered. While the 11 less recent tools are generally fast (the median execution time of the repair attempts is 3.87 minutes and the mean execution time is 30.79 minutes), the three most recent are less efficient (i.e., 98% slower) than their predecessors. The tools generate patches for API misuses that mostly belong to the categories of missing null check, missing value, missing exception, and missing call. Most of the patches generated by all tools are plausible (65%), but only few of these patches are semantically correct to human patches (25%). Our findings suggest that the design of future repair tools should support the localisation of complex bugs, including different categories of API misuses, handling of timeout issues, and ability to configure large software projects. Both APIREPBENCH and APIARTY have been made publicly available for other researchers to evaluate the capabilities of repair tools on detecting and fixing API misuses

    Evaluating Pre-trained Language Models for Repairing API Misuses

    Full text link
    API misuses often lead to software bugs, crashes, and vulnerabilities. While several API misuse detectors have been proposed, there are no automatic repair tools specifically designed for this purpose. In a recent study, test-suite-based automatic program repair (APR) tools were found to be ineffective in repairing API misuses. Still, since the study focused on non-learning-aided APR tools, it remains unknown whether learning-aided APR tools are capable of fixing API misuses. In recent years, pre-trained language models (PLMs) have succeeded greatly in many natural language processing tasks. There is a rising interest in applying PLMs to APR. However, there has not been any study that investigates the effectiveness of PLMs in repairing API misuse. To fill this gap, we conduct a comprehensive empirical study on 11 learning-aided APR tools, which include 9 of the state-of-the-art general-purpose PLMs and two APR tools. We evaluate these models with an API-misuse repair dataset, consisting of two variants. Our results show that PLMs perform better than the studied APR tools in repairing API misuses. Among the 9 pre-trained models tested, CodeT5 is the best performer in the exact match. We also offer insights and potential exploration directions for future research.Comment: Under review by TOSE

    Automated Change Rule Inference for Distance-Based API Misuse Detection

    Full text link
    Developers build on Application Programming Interfaces (APIs) to reuse existing functionalities of code libraries. Despite the benefits of reusing established libraries (e.g., time savings, high quality), developers may diverge from the API's intended usage; potentially causing bugs or, more specifically, API misuses. Recent research focuses on developing techniques to automatically detect API misuses, but many suffer from a high false-positive rate. In this article, we improve on this situation by proposing ChaRLI (Change RuLe Inference), a technique for automatically inferring change rules from developers' fixes of API misuses based on API Usage Graphs (AUGs). By subsequently applying graph-distance algorithms, we use change rules to discriminate API misuses from correct usages. This allows developers to reuse others' fixes of an API misuse at other code locations in the same or another project. We evaluated the ability of change rules to detect API misuses based on three datasets and found that the best mean relative precision (i.e., for testable usages) ranges from 77.1 % to 96.1 % while the mean recall ranges from 0.007 % to 17.7 % for individual change rules. These results underpin that ChaRLI and our misuse detection are helpful complements to existing API misuse detectors

    A Reevaluation Of Why Crypto-Detectors Fail: A Systematic Revaluation Of Cryptographic Misuse Detection Techniques

    Get PDF
    The correct use of cryptography is central to ensuring data security in modern software systems. Hence, several academic and commercial static analysis tools have been developed for detecting and mitigating crypto-API misuse. While developers are optimistically adopting these crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of their effectiveness at finding crypto-API misuse in practice. The original paper presents the MASC framework, which enables a systematic and data-driven evaluation of crypto-detectors using mutation testing. MASC was grounded in a comprehensive view of the problem space by developing a data-driven taxonomy of existing crypto-API misuse, containing 105 misuse cases organized among nine semantic clusters. 12 generalizable usage based mutation operators were developed and three mutation scopes that can expressively instantiate thousands of compilable variants of the misuse cases for thoroughly evaluating crypto-detectors. Using MASC, nine major crypto-detectors were evaluated and 19 unique, undocumented flaws that severely impact the ability of crypto-detectors to discover misuses in practice were found. For my thesis, I built upon this previous research and greatly expanded the MASC framework. MASC was expanded in all areas by adding new functionality, new operators, new misuses, and expanding the taxonomy. In addition, I reevaluated the most up-to-date versions of the original 9 crypto-detectors and evaluated 5 additional crypto-detectors. On top of this I also doubled the amount of applications I used to evaluate the tools. To analyze crypto-detectors I looked at both the 9 crypto-detectors evaluated in the original work and 5 new crypto-detectors. For the original crypto-detectors, I evaluated them with the updated MASC against their most up-to-date versions to determine if old flaws that have previously been fixed have a tendency to reappear. Both old and new crypto-detectors were evaluated with mutated Android and Java applications that were used in the original MASC paper, 15 newly mutated Android and Java applications, and minimal examples of cryptographic misuse

    Active Learning of Discriminative Subgraph Patterns for API Misuse Detection

    Full text link
    A common cause of bugs and vulnerabilities are the violations of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while there have been techniques proposed to detect such misuses, studies have shown that they fail to reliably detect misuses while reporting many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage. Many approaches confuse a usage pattern's frequency for correctness. Due to the variety of alternative usage patterns that may be uncommon but correct, anomaly detection-based techniques have limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), reformulating API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. While still incorporating frequency information, through limited human supervision, we reduce the reliance on the assumption relating frequency and correctness. The principles of active learning are incorporated to shift human attention away from the most frequent patterns. Instead, ALP samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API Misuse benchmark, and a new dataset that we constructed from real-world software projects

    Static and Dynamic Analysis in Cryptographic-API Misuse Detection of Mobile Application

    Get PDF
    With Android devices becoming more advanced and gaining more popularity, the number of cryptographic-API misuses in mobile applications is escalating. Numerous snippets of code in Android are from Stack Overflow and over 90% of them contain several crypto-issues. Various crypto-misuse detectors come out aiming to report vulnerabilities of apps and better secure users’ privacy. These detectors can be broadly classified into two categories based on the analysis strategies employed to catch misuses – static analysis (i.e., by scanning the code base) and dynamic analysis (i.e., by executing the code). However, there are not enough research on comparing their underlying differences, making it difficult to explain the pervasiveness of static crypto-detectors in both academia and industry. The lack of studies potentially limits the improvement of crypto-detection efficiency. In this study, a holistic evaluation and comparison on static and dynamic analysis’ underlying mechanisms, robustness, and efficiency are carried out. A systematic empirical experiment is implemented on testing 1003 popular Android applications across 21 categories from Google Play. We find that 93.3% of the apps make at least one mistake using cryptographic APIs and closely analyze top four cryptographic rules reported to be violated most frequently by static crypto detector. Instead of merely comparing statistics such as false positives (i.e., false alarms), we focus on examining the crypto rules whose number of violations reported by static and dynamic crypto detectors diverge greatly. In addition, we firstly posit a new taxonomy schema that classifies cryptographic rules based on how they are inspected rather than their attack type or severity level. This schema will be useful to both researchers and practitioners to decide how to efficiently combine static and dynamic techniques to improve the reliability and accuracy of crypto-detection
    • …
    corecore