
    TFCheck : A TensorFlow Library for Detecting Training Issues in Neural Network Programs

    The increasing inclusion of Machine Learning (ML) models in safety-critical systems, such as autonomous cars, has led to the development of multiple model-based ML testing techniques. One common denominator of these techniques is their assumption that training programs are adequate and bug-free: they focus only on assessing the performance of the constructed model using manually labeled or automatically generated data. However, this assumption does not always hold, as training programs can contain inconsistencies and bugs. In this paper, we examine training issues in ML programs and propose a catalog of verification routines that can automatically detect the identified issues. We implemented these routines in a TensorFlow-based library named TFCheck, which practitioners can use to detect the aforementioned issues automatically. To assess the effectiveness of TFCheck, we conducted a case study with real-world, mutant, and synthetic training programs. Results show that TFCheck can successfully detect training issues in ML code implementations.
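    The kind of training-issue check the abstract describes can be illustrated with a small sketch. The function below flags vanishing or exploding gradients from recorded per-step gradient norms; the function name and thresholds are illustrative assumptions, not TFCheck's actual API or defaults.

    ```python
    import numpy as np

    def check_gradient_health(grad_norms, vanish_tol=1e-8, explode_tol=1e3):
        """Flag common gradient pathologies from per-step gradient norms.

        grad_norms: sequence of L2 norms of a layer's gradients, one per step.
        Thresholds are illustrative, not TFCheck's actual values.
        """
        grads = np.asarray(grad_norms, dtype=float)
        issues = []
        if np.any(np.isnan(grads)):
            issues.append("NaN gradients")         # numerical instability
        if np.all(grads < vanish_tol):
            issues.append("vanishing gradients")   # layer is barely learning
        if np.any(grads > explode_tol):
            issues.append("exploding gradients")   # risk of divergence
        return issues

    # A layer whose gradient norms shrink toward zero over training:
    print(check_gradient_health([1e-9, 5e-10, 1e-10]))  # ['vanishing gradients']
    ```

    A real verification library would hook such checks into the training loop (e.g., via callbacks) rather than post-processing recorded norms, but the decision logic is of this shape.
    
    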

    A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects

    Background: Meeting the growing industry demand for Data Science requires cross-disciplinary teams that can translate machine learning research into production-ready code. Software engineering teams value adherence to coding standards as an indication of code readability, maintainability, and developer expertise. However, there are no large-scale empirical studies of coding standards focused specifically on Data Science projects. Aims: This study investigates the extent to which Data Science projects follow coding standards. In particular, which standards are followed, which are ignored, and how does this differ from traditional software projects? Method: We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity. Results: Data Science projects suffer from a significantly higher rate of functions that use an excessive number of parameters and local variables, and they follow different variable naming conventions from non-Data Science projects. Conclusions: These differences indicate that Data Science codebases are distinct from traditional software codebases and do not follow traditional software engineering conventions. We conjecture that this may be because traditional software engineering conventions are inappropriate in the context of Data Science projects. Comment: 11 pages, 7 figures. To appear in ESEM 2020. Updated based on peer review.
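    The "excessive number of parameters" violation the study measures is the kind of check linters such as pylint perform (its `too-many-arguments` default threshold is 5). A minimal sketch of such a detector using Python's standard `ast` module; the function name and threshold are illustrative:

    ```python
    import ast

    MAX_PARAMS = 5  # illustrative threshold; pylint's default for too-many-arguments

    def functions_with_excess_params(source, max_params=MAX_PARAMS):
        """Return names of functions whose parameter count exceeds max_params."""
        tree = ast.parse(source)
        offenders = []
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                a = node.args
                count = (len(a.args) + len(a.posonlyargs) + len(a.kwonlyargs)
                         + (a.vararg is not None) + (a.kwarg is not None))
                if count > max_params:
                    offenders.append(node.name)
        return offenders

    code = """
    def fit(model, x, y, lr, epochs, batch_size, verbose):
        pass

    def predict(model, x):
        pass
    """
    print(functions_with_excess_params(code))  # ['fit']
    ```

    Training-loop functions like the hypothetical `fit` above routinely accumulate hyperparameters as arguments, which is one plausible mechanism behind the higher violation rate the study reports in Data Science code.
    
    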

    Open-source software adoption among university students in emerging countries

    Open-source software (OSS) has become a valuable resource for corporate, educational, and social processes, reducing digital divides in emerging countries. This paper proposes an open-source software acceptance model (OSS-AM) to examine the determinants of OSS adoption among students in emerging economies. A quantitative methodology with a descriptive correlational approach was employed, collecting data from a representative sample of 504 students. Confirmatory factor analysis showed strong associations between attitude towards use and variables such as compatibility, quality, flexibility, and security. This study reveals that skill development through practical education, perceived usefulness, training, and compatibility are the most influential factors in students’ adoption of OSS.

    Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study

    Efficiency is essential to support responsiveness with respect to ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred-execution-style DL code that supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce DL code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged, at the expense of run-time performance. While hybrid approaches aim for the best of both worlds, the challenges of applying them in the real world are largely unknown. We conduct a data-driven analysis of the challenges, and the resultant bugs, involved in writing reliable yet performant imperative DL code, studying 250 open-source projects consisting of 19.7 MLOC, along with 470 manually examined code patches and 446 bug reports. The results indicate that hybridization (i) is prone to API misuse, (ii) can result in performance degradation, the opposite of its intention, and (iii) has limited application due to execution-mode incompatibility. We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code, potentially benefiting DL practitioners, API designers, tool developers, and educators.
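    The hybridization the abstract studies (e.g., TensorFlow's `tf.function`) works by tracing an imperative Python function once into a graph and replaying the graph thereafter, which is why Python side effects inside the function fire only during tracing, a classic source of the API-misuse bugs mentioned above. A toy plain-Python tracer, with no framework dependency, illustrating the mechanism; all names here are illustrative, not any framework's API:

    ```python
    class Node:
        """A recorded operation in a toy computation graph."""
        def __init__(self, op, inputs):
            self.op, self.inputs = op, inputs
        def __add__(self, other):
            return Node("add", (self, other))
        def __mul__(self, other):
            return Node("mul", (self, other))
        __radd__ = __add__
        __rmul__ = __mul__

    def evaluate(node, env):
        """Replay a recorded graph on concrete values."""
        if isinstance(node, (int, float)):
            return node
        if node.op == "input":
            return env[node.inputs[0]]
        vals = [evaluate(i, env) for i in node.inputs]
        return vals[0] + vals[1] if node.op == "add" else vals[0] * vals[1]

    def hybridize(fn):
        """Toy stand-in for a tracing decorator: run the Python body once on
        symbolic inputs to record a graph, then replay the graph on real values."""
        state = {}
        def wrapper(*args):
            if "graph" not in state:
                placeholders = [Node("input", (i,)) for i in range(len(args))]
                state["graph"] = fn(*placeholders)  # tracing run
            return evaluate(state["graph"], dict(enumerate(args)))
        return wrapper

    trace_log = []

    @hybridize
    def scaled_sum(x, y):
        trace_log.append("traced")  # Python side effect: fires only while tracing
        return 2 * (x + y)

    print(scaled_sum(1, 2))  # 6
    print(scaled_sum(3, 4))  # 14: graph replayed, Python body not re-run
    print(trace_log)         # ['traced']
    ```

    The second call computes the right arithmetic result but never re-executes the `append`, which is exactly the eager-vs-traced behavioral gap that makes hybridized code error-prone.
    
    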

    Machine learning methods for manufacturing

    Machine learning methods have become increasingly popular with the release of numerous open-source tools and libraries. Nevertheless, the adoption of these techniques in manufacturing has been limited in practice. Manufacturing still depends mostly on traditional statistical methods and tools, even though machine learning methods could be applied to data already being collected from measurements taken during manufacturing processes. The purpose of this thesis is to introduce four machine learning methods that could prove useful in a manufacturing setting, along with several methods for data preprocessing and preliminary data analysis. The machine learning methods introduced are support vector machines, random forests, neural networks, and NARX (non-linear autoregressive exogenous) neural networks. The algorithms and the history behind each method are explained, along with suggestions for popular implementations, and the performance of each method is evaluated using a domain-appropriate dataset. Knowledge of the machine learning methods introduced in this thesis is an important addition to the toolkit of anyone doing predictive analytics.
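    Of the four methods listed, NARX is the least standard: it predicts the next output of a process from lagged outputs and lagged exogenous inputs (e.g., a sensor reading driven by a control signal). A hedged numpy sketch of building the regression matrix such a model is fit on; the function name and lag choices are illustrative:

    ```python
    import numpy as np

    def narx_design_matrix(y, u, ny, nu):
        """Build a NARX regression matrix.

        Each row holds the last ny outputs and last nu exogenous inputs;
        the target is the next output y[t].
        y: output series, u: exogenous input series (same length).
        """
        start = max(ny, nu)
        rows, targets = [], []
        for t in range(start, len(y)):
            lagged_y = y[t - ny:t]   # y[t-ny] ... y[t-1]
            lagged_u = u[t - nu:t]   # u[t-nu] ... u[t-1]
            rows.append(np.concatenate([lagged_y, lagged_u]))
            targets.append(y[t])
        return np.array(rows), np.array(targets)

    # Toy process measurements: output y driven by a constant input u.
    y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    u = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
    X, t = narx_design_matrix(y, u, ny=2, nu=1)
    print(X.shape)  # (3, 3): three samples, 2 output lags + 1 input lag
    print(t)        # [2. 3. 4.]
    ```

    With the matrix in this form, any regressor, including a feed-forward neural network, can be trained as the non-linear map from lagged features to the next output, which is the sense in which a NARX neural network extends an ordinary one.
    
    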