184 research outputs found

    Demystifying Dependency Bugs in Deep Learning Stack

    Deep learning (DL) applications, built upon a heterogeneous and complex DL stack (e.g., Nvidia GPU, Linux, CUDA driver, Python runtime, and TensorFlow), are subject to software and hardware dependencies across the DL stack. One challenge in dependency management across the entire engineering lifecycle is posed by the asynchronous and radical evolution of dependencies and the complex version constraints among them. Developers may introduce dependency bugs (DBs) when selecting, using, and maintaining dependencies. However, the characteristics of DBs in the DL stack are still under-investigated, hindering practical solutions to dependency management in the DL stack. To bridge this gap, this paper presents the first comprehensive study to characterize symptoms, root causes, and fix patterns of DBs across the whole DL stack, with 446 DBs collected from StackOverflow posts and GitHub issues. For each DB, we first investigate the symptom as well as the lifecycle stage and dependency where the symptom is exposed. Then, we analyze the root cause as well as the lifecycle stage and dependency where the root cause is introduced. Finally, we explore the fix pattern and the knowledge sources that are used to fix it. Our findings from this study shed light on practical implications for dependency management.

    A study of recent contributions on performance and simulation techniques for accelerator devices

    High-performance computing platforms are moving from homogeneous individual units to heterogeneous systems in which each unit combines homogeneous cores with accelerator devices. Accelerators such as GPUs, FPGAs, and DSPs are usually designed for specific, compute-intensive types of tasks. The presence of these devices has created fresh and attractive development platforms for developers and designers, as well as novel performance analysis frameworks and optimization tools. Some accelerator devices, such as GPUs and Intel’s Xeon Phi, are at the cutting edge of performance. We outline some of the existing heterogeneous systems and their development frameworks. The core of this study is a review of performance modeling of these devices.

    Demystifying Developers' Issues in Distributed Training of Deep Learning Software

    Deep learning (DL) has become pervasive in a wide spectrum of today's software systems and applications. The rich features of these DL-based software applications (i.e., DL software) usually rely on powerful DL models. To train powerful DL models efficiently on large datasets, it has become common practice for developers to parallelize and distribute the computation and memory over multiple devices during training, which is known as distributed training. However, existing efforts in the software engineering (SE) research community mainly focus on issues in the general process of training DL models. In contrast, to the best of our knowledge, issues that developers encounter in distributed training have never been well studied. Given the surging importance of distributed training in the current practice of developing DL software, this paper fills this knowledge gap and presents the first comprehensive study on developers' issues in distributed training. To this end, we extract and analyze 1,054 real-world developers' issues in distributed training from Stack Overflow and GitHub, two commonly used data sources for studying software issues. We construct a fine-grained taxonomy consisting of 30 categories regarding the fault symptoms and summarize common fix patterns for different symptoms. Based on the results, we suggest actionable implications and research avenues that can potentially facilitate the future development of distributed training.

    Improving the Reliability of Deep Learning Software Systems

    Over the last decade, deep learning (DL) has emerged as a new effective machine learning approach that is capable of solving difficult challenges. Due to their increasing effectiveness, DL approaches have been applied widely in commercial products such as social media platforms and self-driving cars. Such widespread application in critical areas means that mistakes caused by bugs in such DL systems would lead to serious consequences. Our research focuses on improving the reliability of such DL systems. At a high level, the DL systems development process starts with labeled data. This data is then used to train the DL model with some training methods. Once the model is trained, it can be used to create predictions for some unlabeled data in the inference stage. In this thesis, we present testing and analysis techniques that help improve the DL system reliability for all stages. In the first work, CRADLE, we improve the reliability of the DL system inference by applying differential testing to find bugs in DL libraries. One key challenge of testing DL libraries is the difficulty of knowing the expected output of DL libraries given an input instance. We leverage equivalent DL libraries to overcome this challenge. CRADLE focuses on finding and localizing bugs in DL software libraries by performing cross-implementation inconsistency checking to detect bugs, and leveraging anomaly propagation tracking and analysis to localize faulty functions that cause the bugs. CRADLE detects 12 bugs in three libraries (TensorFlow, CNTK, and Theano), and highlights functions relevant to the causes of inconsistencies for all 104 unique inconsistencies. Our second work is the first to study the variance of DL systems training and the awareness of this variance among researchers and practitioners. Our experiments show large overall accuracy differences among identical training runs. Even after excluding weak models, the accuracy difference is 10.8%. In addition, implementation-level factors alone cause the accuracy difference across identical training runs to be up to 2.9%. Our researcher and practitioner survey shows that 83.8% of the 901 participants are unaware of or unsure about any implementation-level variance. This work raises awareness of DL training variance and directs SE researchers to challenging tasks such as creating deterministic DL implementations to facilitate debugging and improving the reproducibility of DL software and results. DL systems perform well on static test sets coming from the same distribution as training sets but may not be robust in real-world deployments because of the fundamental assumption that the training data represents the real-world data well. In cases where the training data misses samples from the real-world distribution, it is said to contain blindspots. In practice, it is more likely that a training dataset contains weakspots (i.e., a weaker form of blindspots, where the training data contains some samples that represent the real world but not enough of them). In the third work, we propose a new procedure to detect weakspots in training data and to improve the DL system with minimum labeling effort. This procedure leverages the variance of the DL training process to detect highly varying data samples that could indicate the weakspots. Metrics that measure such variance can also be used to rank new samples to prioritize the labeling of additional training data that can improve the DL system accuracy when applied to the real world. Our evaluation shows that, in scenarios where the weakspots are severe, our procedure improves the model accuracy on weakspot samples by 25.2% while requiring 2% additional training data. This is an improvement of 4.5 percentage points compared to the traditional single model metric with the same amount of additional training data.

    Software testing or the bugs’ nightmare

    Software development is not error-free. For decades, bugs (including physical ones) have been a significant development problem requiring major maintenance efforts. In some cases, fixing bugs has even introduced more of them. One of the main reasons for bugs’ prominence is their ability to hide. Finding them is difficult and costly in terms of time and resources. However, software testing has made significant progress in identifying them by using different strategies that combine knowledge from every part of the program. This paper humbly reviews several software testing approaches that discover bugs automatically and presents some state-of-the-art methods and tools currently used in this area. It covers three testing strategies: search-based methods, symbolic execution, and fuzzers. It also provides some insight into the application of diversity in these areas, and into common and future challenges in automatic test generation that still need to be addressed.