1,282 research outputs found

    Towards understanding the challenges faced by machine learning software developers and enabling automated solutions

    Get PDF
    Modern software systems are increasingly including machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To fill that gap this thesis reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries, namely Tensorflow, Keras, scikitlearn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. Our findings reveal the urgent need for software engineering (SE) research in this area. The second part of the thesis particularly focuses on understanding the Deep Neural Network (DNN) bug characteristics. We study 2,716 high-quality posts from Stack Overflow and 500 bug fix commits from Github about five popular deep learning libraries Caffe, Keras, Tensorflow, Theano, and Torch to understand the types of bugs, their root causes and impacts, bug-prone stage of deep learning pipeline as well as whether there are some common antipatterns found in this buggy software. While exploring the bug characteristics, our findings imply that repairing software that uses DNNs is one such unmistakable SE need where automated tools could be beneficial; however, we do not fully understand challenges to repairing and patterns that are utilized when manually repairing DNNs. So, the third part of this thesis presents a comprehensive study of bug fix patterns to address these questions. We have studied 415 repairs from Stack Overflow and 555 repairs from Github for five popular deep learning libraries Caffe, Keras, Tensorflow, Theano, and Torch to understand challenges in repairs and bug repair patterns. Our key findings reveal that DNN bug fix patterns are distinctive compared to traditional bug fix patterns and the most common bug fix patterns are fixing data dimension and neural network connectivity. Finally, we propose an automatic technique to detect ML Application Programming Interface (API) misuses. We started with an empirical study to understand ML API misuses. Our study shows that ML API misuse is prevalent and distinct compared to non-ML API misuses. Inspired by these findings, we contributed Amimla (Api Misuse In Machine Learning Apis) an approach and a tool for ML API misuse detection. Amimla relies on several technical innovations. First, we proposed an abstract representation of ML pipelines to use in misuse detection. Second, we proposed an abstract representation of neural networks for deep learning related APIs. Third, we have developed a representation strategy for constraints on ML APIs. Finally, we have developed a misuse detection strategy for both single and multi-APIs. Our experimental evaluation shows that Amimla achieves a high average accuracy of ∼80% on two benchmarks of misuses from Stack Overflow and Github

    Evaluating Automatic Program Repair Capabilities to Repair API Misuses

    Get PDF
    API misuses are well-known causes of software crashes and security vulnerabilities. However, their detection and repair is challenging given that the correct usages of (third-party) APIs might be obscure to the developers of client programs. This paper presents the first empirical study to assess the ability of existing automated bug repair tools to repair API misuses, which is a class of bugs previously unexplored. Our study examines and compares 14 Java test-suite-based repair tools (11 proposed before 2018, and three afterwards) on a manually curated benchmark (APIREPBENCH) consisting of 101 API misuses. We develop an extensible execution framework (APIARTY) to automatically execute multiple repair tools. Our results show that the repair tools are able to generate patches for 28% of the API misuses considered. While the 11 less recent tools are generally fast (the median execution time of the repair attempts is 3.87 minutes and the mean execution time is 30.79 minutes), the three most recent are less efficient (i.e., 98% slower) than their predecessors. The tools generate patches for API misuses that mostly belong to the categories of missing null check, missing value, missing exception, and missing call. Most of the patches generated by all tools are plausible (65%), but only few of these patches are semantically correct to human patches (25%). Our findings suggest that the design of future repair tools should support the localisation of complex bugs, including different categories of API misuses, handling of timeout issues, and ability to configure large software projects. Both APIREPBENCH and APIARTY have been made publicly available for other researchers to evaluate the capabilities of repair tools on detecting and fixing API misuses

    Evaluating Pre-trained Language Models for Repairing API Misuses

    Full text link
    API misuses often lead to software bugs, crashes, and vulnerabilities. While several API misuse detectors have been proposed, there are no automatic repair tools specifically designed for this purpose. In a recent study, test-suite-based automatic program repair (APR) tools were found to be ineffective in repairing API misuses. Still, since the study focused on non-learning-aided APR tools, it remains unknown whether learning-aided APR tools are capable of fixing API misuses. In recent years, pre-trained language models (PLMs) have succeeded greatly in many natural language processing tasks. There is a rising interest in applying PLMs to APR. However, there has not been any study that investigates the effectiveness of PLMs in repairing API misuse. To fill this gap, we conduct a comprehensive empirical study on 11 learning-aided APR tools, which include 9 of the state-of-the-art general-purpose PLMs and two APR tools. We evaluate these models with an API-misuse repair dataset, consisting of two variants. Our results show that PLMs perform better than the studied APR tools in repairing API misuses. Among the 9 pre-trained models tested, CodeT5 is the best performer in the exact match. We also offer insights and potential exploration directions for future research.Comment: Under review by TOSE

    A Comprehensive Empirical Study of Bugs in Open-Source Federated Learning Frameworks

    Full text link
    Federated learning (FL) is a distributed machine learning (ML) paradigm, allowing multiple clients to collaboratively train shared machine learning (ML) models without exposing clients' data privacy. It has gained substantial popularity in recent years, especially since the enforcement of data protection laws and regulations in many countries. To foster the application of FL, a variety of FL frameworks have been proposed, allowing non-experts to easily train ML models. As a result, understanding bugs in FL frameworks is critical for facilitating the development of better FL frameworks and potentially encouraging the development of bug detection, localization and repair tools. Thus, we conduct the first empirical study to comprehensively collect, taxonomize, and characterize bugs in FL frameworks. Specifically, we manually collect and classify 1,119 bugs from all the 676 closed issues and 514 merged pull requests in 17 popular and representative open-source FL frameworks on GitHub. We propose a classification of those bugs into 12 bug symptoms, 12 root causes, and 18 fix patterns. We also study their correlations and distributions on 23 functionalities. We identify nine major findings from our study, discuss their implications and future research directions based on our findings

    Active Learning of Discriminative Subgraph Patterns for API Misuse Detection

    Full text link
    A common cause of bugs and vulnerabilities are the violations of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while there have been techniques proposed to detect such misuses, studies have shown that they fail to reliably detect misuses while reporting many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage. Many approaches confuse a usage pattern's frequency for correctness. Due to the variety of alternative usage patterns that may be uncommon but correct, anomaly detection-based techniques have limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), reformulating API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. While still incorporating frequency information, through limited human supervision, we reduce the reliance on the assumption relating frequency and correctness. The principles of active learning are incorporated to shift human attention away from the most frequent patterns. Instead, ALP samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API Misuse benchmark, and a new dataset that we constructed from real-world software projects

    Automatic Fault Detection for Deep Learning Programs Using Graph Transformations

    Full text link
    Nowadays, we are witnessing an increasing demand in both corporates and academia for exploiting Deep Learning (DL) to solve complex real-world problems. A DL program encodes the network structure of a desirable DL model and the process by which the model learns from the training dataset. Like any software, a DL program can be faulty, which implies substantial challenges of software quality assurance, especially in safety-critical domains. It is therefore crucial to equip DL development teams with efficient fault detection techniques and tools. In this paper, we propose NeuraLint, a model-based fault detection approach for DL programs, using meta-modelling and graph transformations. First, we design a meta-model for DL programs that includes their base skeleton and fundamental properties. Then, we construct a graph-based verification process that covers 23 rules defined on top of the meta-model and implemented as graph transformations to detect faults and design inefficiencies in the generated models (i.e., instances of the meta-model). First, the proposed approach is evaluated by finding faults and design inefficiencies in 28 synthesized examples built from common problems reported in the literature. Then NeuraLint successfully finds 64 faults and design inefficiencies in 34 real-world DL programs extracted from Stack Overflow posts and GitHub repositories. The results show that NeuraLint effectively detects faults and design issues in both synthesized and real-world examples with a recall of 70.5 % and a precision of 100 %. Although the proposed meta-model is designed for feedforward neural networks, it can be extended to support other neural network architectures such as recurrent neural networks. Researchers can also expand our set of verification rules to cover more types of issues in DL programs
    • …
    corecore