1,072 research outputs found

    Fuzzing Deep Learning Compilers with HirGen

    Deep Learning (DL) compilers are widely adopted to optimize advanced DL models for efficient deployment on diverse hardware. Their quality has a profound effect on the quality of the compiled DL models. A recent bug study shows that the optimization of high-level intermediate representation (IR) is the most error-prone compilation stage: bugs in this stage account for 44.92% of all collected bugs. However, existing testing techniques do not consider features related to high-level optimization (e.g., high-level IR) and are therefore weak at exposing bugs at this stage. To bridge this gap, we propose HirGen, an automated testing technique that aims to effectively expose coding mistakes in the optimization of high-level IR. The design of HirGen includes 1) three coverage criteria to generate diverse and valid computational graphs; 2) full use of the high-level IR's language features to generate diverse IRs; and 3) three test oracles inspired by both differential testing and metamorphic testing. HirGen has successfully detected 21 bugs in TVM, with 17 bugs confirmed and 12 fixed. Further, we construct four baselines using state-of-the-art DL compiler fuzzers that can cover the high-level optimization stage. Our experimental results show that HirGen detects 10 crashes and inconsistencies within 48 hours that the baselines cannot. We further validate the usefulness of our proposed coverage criteria and test oracles in our evaluation.
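    One of the oracle ideas the abstract mentions, differential testing across compilation configurations, can be illustrated with a short sketch: compile the same Relay module at several TVM optimization levels and flag any divergence in the outputs. This is a minimal illustration under assumptions, not HirGen's actual implementation; the module mod, its params, the input tensor, and the numeric tolerance below are placeholders.

        # Minimal differential-testing oracle in the spirit of HirGen (a sketch, not
        # the tool's code): build one Relay module at several optimization levels and
        # report output divergence as a potential high-level optimization bug.
        import numpy as np
        import tvm
        from tvm import relay
        from tvm.contrib import graph_executor

        def run_at_opt_level(mod, params, input_name, input_data, opt_level):
            """Compile `mod` at the given opt_level and return its first output."""
            with tvm.transform.PassContext(opt_level=opt_level):
                lib = relay.build(mod, target="llvm", params=params)
            dev = tvm.cpu(0)
            runtime = graph_executor.GraphModule(lib["default"](dev))
            runtime.set_input(input_name, input_data)
            runtime.run()
            return runtime.get_output(0).numpy()

        def differential_oracle(mod, params, input_name, input_data, levels=(0, 1, 2, 3)):
            """Return pairs of optimization levels whose outputs disagree beyond a tolerance."""
            outputs = {lvl: run_at_opt_level(mod, params, input_name, input_data, lvl)
                       for lvl in levels}
            reference = levels[0]
            return [(reference, lvl) for lvl in levels[1:]
                    if not np.allclose(outputs[reference], outputs[lvl],
                                       rtol=1e-4, atol=1e-4)]  # assumed tolerance

    Any pair reported by such an oracle, or a crash during relay.build itself, would be a candidate bug report; HirGen's actual oracles and graph generation are described in the paper.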

    Automated Fixing of Programs with Contracts

    This paper describes AutoFix, an automatic debugging technique that can fix faults in general-purpose software. To provide high-quality fix suggestions and to enable automation of the whole debugging process, AutoFix relies on the presence of simple specification elements in the form of contracts (such as pre- and postconditions). Using contracts enhances the precision of dynamic analysis techniques for fault detection and localization, and for validating fixes. The only required user input to the AutoFix supporting tool is then a faulty program annotated with contracts; the tool produces a collection of validated fixes for the fault, ranked according to an estimate of their suitability. In an extensive experimental evaluation, we applied AutoFix to over 200 faults in four code bases of different maturity and quality (of implementation and of contracts). AutoFix successfully fixed 42% of the faults, producing, in the majority of cases, corrections of quality comparable to those competent programmers would write; the computational resources used were modest, with an average time per fix below 20 minutes on commodity hardware. These figures compare favorably to the state of the art in automated program fixing, and demonstrate that the AutoFix approach is successfully applicable to reduce the debugging burden in real-world scenarios.
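    The role contracts play as a fix-validation oracle can be illustrated with a toy Python sketch (AutoFix itself operates on contract-equipped Eiffel code; the routine, contract, and candidate fix below are invented for illustration): a candidate fix is accepted only if the routine's precondition and postcondition hold on every test input.

        # Toy sketch of contract-guided fix validation, loosely in the spirit of AutoFix.
        # The routine, its contract, and the candidate fix are invented examples.

        def clamp_faulty(x, lo, hi):
            # Faulty implementation: ignores the upper bound.
            return lo if x < lo else x

        def clamp_candidate(x, lo, hi):
            # Candidate fix produced by some repair strategy.
            return lo if x < lo else (hi if x > hi else x)

        def contract_holds(fn, x, lo, hi):
            """Precondition: lo <= hi. Postcondition: the result lies within [lo, hi]."""
            if not lo <= hi:           # precondition violated: not the routine's fault
                return True
            result = fn(x, lo, hi)
            return lo <= result <= hi  # the postcondition acts as the oracle

        def validate_fix(candidate, test_inputs):
            """Accept a candidate only if the contract holds on every test input."""
            return all(contract_holds(candidate, *t) for t in test_inputs)

        tests = [(5, 0, 10), (-3, 0, 10), (42, 0, 10)]
        print(validate_fix(clamp_faulty, tests))     # False: 42 escapes the range
        print(validate_fix(clamp_candidate, tests))  # True: candidate respects the contract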

    Automated blackbox GUI specifications enhancement and test data generation

    Applications with a Graphical User Interface (GUI) front-end are ubiquitous nowadays. While automated model-based approaches have been shown to be effective in testing such applications, most existing techniques produce many infeasible event sequences used as GUI test cases. This happens primarily because the behavioral specifications of the GUI under test are ignored. In this dissertation we present an automated framework that reveals an important set of state-based constraints among GUI events based on infeasible (i.e., unexecutable or partially executable) test cases of a GUI test suite. GUIDiVa, an iterative algorithm at the core of our framework, enumerates all possible constraint violations as potential reasons for test case failure, on the failed event of an infeasible test case. It then selects and adds the most promising constraints of each iteration to a final set based on the Validity Weight of constraints. The results of empirical studies on both seeded and nine non-trivial open-source study subjects show that our framework is capable of capturing important aspects of GUI behavior in the form of state-based event constraints, while considerably reducing the number of infeasible test cases. The second part of this dissertation deals with the problem of automatic generation of relevant test data for parameterized GUI events (i.e., events associated with widgets that accept user inputs, such as textboxes and textareas). Current techniques either manipulate the source code of the application under test (AUT) to generate the test data, or blindly use a set of random string values. We propose a novel way to generate the test data by exploiting the information provided in the GUI structure to extract a set of key identifiers for each parameterized GUI widget. These identifiers are used to compose appropriate online search phrases and collect relevant test data from the Internet. The results of an empirical study on five GUI-based applications show that the proposed approach is applicable and results in the execution of some hard-to-cover branches in the subject programs. The proposed technique works from a black-box perspective and is entirely independent of GUI modeling and event sequence generation, so it does not require source code access and offers the possibility of being integrated with existing GUI testing frameworks.
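    The core loop described for GUIDiVa, enumerate constraint violations on the failed event of each infeasible test case and keep the highest-weighted constraints per iteration, can be sketched roughly as below. The constraint forms ("requires"/"excludes") and the validity_weight scoring are placeholders invented for illustration, not GUIDiVa's actual definitions.

        # Rough sketch of the iterative enumerate-and-select loop attributed to GUIDiVa.
        # The constraint representation and the validity_weight scoring are placeholders;
        # the dissertation defines the actual constraints and weighting.

        def enumerate_violations(executed_prefix, failed_event):
            """Candidate state-based constraints whose violation could explain why
            `failed_event` was infeasible after `executed_prefix` (placeholder forms)."""
            return ([("requires", prior, failed_event) for prior in executed_prefix] +
                    [("excludes", prior, failed_event) for prior in executed_prefix])

        def validity_weight(constraint, infeasible_cases):
            """Placeholder score: how many infeasible test cases the constraint explains."""
            return sum(1 for prefix, failed in infeasible_cases
                       if constraint in enumerate_violations(prefix, failed))

        def infer_constraints(infeasible_cases, per_iteration=1, iterations=3):
            """Iteratively add the most promising constraint(s) to the final set."""
            final = set()
            for _ in range(iterations):
                candidates = set()
                for prefix, failed in infeasible_cases:
                    candidates.update(enumerate_violations(prefix, failed))
                candidates -= final
                if not candidates:
                    break
                ranked = sorted(candidates,
                                key=lambda c: validity_weight(c, infeasible_cases),
                                reverse=True)
                final.update(ranked[:per_iteration])
            return final

        # Each infeasible test case: (executed prefix of GUI events, event that failed).
        cases = [(("open", "edit"), "save"), (("open",), "save")]
        print(infer_constraints(cases))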

    Revised reference model

    This document contains an update of the HIDENETS Reference Model, whose preliminary version was introduced in D1.1. The Reference Model contains the overall approach to the development and assessment of end-to-end resilience solutions. As such, it presents a framework which, due to its abstraction level, is not restricted to the HIDENETS car-to-car and car-to-infrastructure applications and use cases. Starting from a condensed summary of the dependability terminology used, it presents the network architecture containing the ad hoc and infrastructure domains, the definition of the main networking elements, and the software architecture of the mobile nodes. The concept of architectural hybridization and its inclusion in HIDENETS-like dependability solutions is described subsequently. A set of communication and middleware level services following the architectural hybridization concept and motivated by the dependability and resilience challenges raised by HIDENETS-like scenarios is then described. Besides architecture solutions, the reference model addresses the assessment of dependability solutions in HIDENETS-like scenarios using quantitative evaluations, realized by a combination of top-down and bottom-up modelling, as well as verification via test scenarios. In order to allow for fault prevention in the software development phase of HIDENETS-like applications, generic UML-based modelling approaches with a focus on dependability-related aspects are described. The HIDENETS reference model provides the framework in which the detailed solutions of the HIDENETS project are being developed, while at the same time facilitating the same task for non-vehicular scenarios and applications.

    Joint use of static and dynamic software verification techniques: a cross-domain view in safety critical system industries

    How different are the approaches to combining formal methods (FM) and testing in the safety standards of the automotive, aeronautic, nuclear, process, railway and space industries? This is the question addressed in this paper by a cross-domain group of experts involved in the revision committees of ISO 26262, DO-178C, IEC 60880, IEC 61508, EN 50128 and ECSS-Q-ST-80C. First we review some commonalities and differences regarding the application of formal methods in the aforementioned standards. Are they mandatory or only recommended? What kinds of properties are they advised to be applied to? What do the different standards specify regarding coverage (both functional and structural) when testing and formal methods are used jointly? We also account for the return on experience of the group members in the six industrial domains regarding the state-of-the-art practice of the joint use of formal methods and testing. Where did formal methods actually prove to outperform testing? Then we discuss verification coverage, and more specifically the role of structural coverage. Does structural coverage play the same role in all the standards? Is it specific to testing and irrelevant for formal methods? What verification termination criteria are applicable in the case of an FM-test mix? We conclude with some prospective views on how software safety standards may evolve to maximize the benefits of the joint use of dynamic (testing) and static (FM) verification methods.

    Deep language models for software testing and optimisation

    Developing software is difficult. A challenging part of production development is ensuring that programs are correct and fast, two properties addressed by software testing and optimisation. While both tasks still rely on manual effort and expertise, the recent surge in software applications has made them tedious and time-consuming. In this fast-paced environment, manual testing and optimisation hinder productivity significantly and lead to error-prone or sub-optimal programs that waste energy and frustrate users. In this thesis, we propose three novel approaches to automate software testing and optimisation with modern language models based on deep learning. In contrast to our methods, the few existing techniques in these two domains have limited scalability and struggle when faced with real-world applications. Our first contribution lies in the field of software testing and aims to automate the test oracle problem, i.e., the procedure of determining the correctness of test executions. The test oracle is still largely manual, relying on human experts. Automating the oracle is a non-trivial task that requires software specifications or derived information that are often too difficult to extract. We present the first application of deep language models over program execution traces to predict runtime correctness. Our technique classifies test executions of large-scale codebases used in production as “pass” or “fail”. Our proposed approach reduces by 86% the number of test inputs an expert has to label, by training on only 14% and classifying the rest automatically. Our next two contributions improve the effectiveness of compiler optimisation. Compilers optimise programs by applying heuristic-based transformations constructed by compiler engineers. Selecting the right transformations requires extensive knowledge of the compiler, the subject program and the target architecture. Predictive models have been successfully used to automate heuristic construction, but their performance is hindered by a shortage of training benchmarks in quantity and feature diversity. Our next contributions address this scarcity of compiler benchmarks by generating human-like synthetic programs to improve the performance of predictive models. Our second contribution is BENCHPRESS, the first steerable deep learning synthesizer for executable compiler benchmarks. BENCHPRESS produces human-like programs that compile at a rate of 87%. It targets parts of the feature space previously unreachable by other synthesizers, addressing the scarcity of high-quality training data for compilers. BENCHPRESS improves the performance of a device-mapping predictive model by 50% when its synthetic benchmarks are introduced into the training data. BENCHPRESS is restricted by a feature-agnostic synthesizer that requires thousands of random inferences to select a few samples that target the desired features. Our third contribution addresses this inefficiency. We develop BENCHDIRECT, a directed language model for compiler benchmark generation. BENCHDIRECT synthesizes programs by jointly observing the source code context and the compiler features that are targeted. This enables efficient steerable generation on large-scale tasks. Compared to BENCHPRESS, BENCHDIRECT successfully matches 1.8× more Rodinia target benchmarks, while being up to 36% more accurate and up to 72% faster in targeting three different feature spaces for compilers.
    All three contributions demonstrate the exciting potential of deep learning and language models to simplify the testing of programs and the construction of better optimisation heuristics for compilers. The outcomes of this thesis provide developers with tools to keep up with the rapidly evolving landscape of software engineering.
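    The first contribution, a learned test oracle over execution traces, can be approximated with a small sketch: tokenise trace text and train a tiny classifier to predict pass or fail. The architecture, vocabulary, and toy traces below are invented stand-ins; the thesis's actual models are deep language models trained on production traces.

        # Minimal sketch of a learned test oracle over execution traces: a tiny PyTorch
        # classifier that maps a tokenised trace to pass/fail. The architecture and
        # toy data are illustrative stand-ins, not the thesis's model.
        import torch
        import torch.nn as nn

        # Toy labelled traces: 1 = pass, 0 = fail.
        traces = [
            ("call open ok call read ok call close ok", 1),
            ("call open ok call read error retry read error", 0),
            ("call connect ok call send ok call recv ok", 1),
            ("call connect timeout call send error", 0),
        ]

        # Word-level vocabulary built from the traces (index 0 reserved for padding).
        vocab = {"<pad>": 0}
        for text, _ in traces:
            for tok in text.split():
                vocab.setdefault(tok, len(vocab))

        def encode(text, max_len=16):
            ids = [vocab.get(tok, 0) for tok in text.split()][:max_len]
            return ids + [0] * (max_len - len(ids))

        class TraceClassifier(nn.Module):
            """Embedding + mean pooling + linear head (a deliberately small stand-in)."""
            def __init__(self, vocab_size, dim=32):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
                self.head = nn.Linear(dim, 2)
            def forward(self, ids):
                return self.head(self.embed(ids).mean(dim=1))

        x = torch.tensor([encode(t) for t, _ in traces])
        y = torch.tensor([label for _, label in traces])

        model = TraceClassifier(len(vocab))
        optim = torch.optim.Adam(model.parameters(), lr=1e-2)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(200):  # tiny training loop on the toy data
            optim.zero_grad()
            loss_fn(model(x), y).backward()
            optim.step()

        new_trace = "call open ok call read error"
        pred = model(torch.tensor([encode(new_trace)])).argmax(dim=1).item()
        print("predicted:", "pass" if pred == 1 else "fail")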

    AUTOMATED DEBUGGING AND FAULT LOCALIZATION OF MATLAB/SIMULINK MODELS

    Matlab/Simulink is an advanced environment for modeling and simulating multidomain dynamic systems. It has been widely used to model advanced Cyber-Physical Systems, e.g., in the automotive or avionics industry. To ensure the reliability of Simulink models (i.e., to ensure that they are free of faults), these models are subject to extensive testing to verify the logic and behavior of the software modules developed in them. Due to the complex structure of Simulink models, finding the root causes of failures (i.e., faults) is an expensive and time-consuming task. Therefore, there is a high demand for automatic fault localization techniques that can help engineers locate faults in Simulink models with less human intervention. This demand has led to the proposal and development of various approaches and techniques that are able to automatically locate faults in Simulink models. Fault localization has been an active research area that focuses on developing automated techniques to support software debugging. Although many techniques have been proposed to localize faults in programs, there has not been much research on fault localization for Simulink models. In this dissertation, we investigate and develop a lightweight fault localization approach to automatically and accurately locate faults in Simulink models. To enhance the usability of our approach, we also develop a stand-alone desktop application that provides engineers with a usable interface to facilitate the localization of faults in their models.
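    The abstract does not name the ranking formula behind its lightweight approach; as an illustration of the general family such techniques usually belong to (spectrum-based fault localization), the sketch below scores Simulink-style blocks with the Ochiai metric computed from per-test block coverage. The block names and coverage data are invented.

        # Illustrative spectrum-based fault localization over (invented) block coverage
        # data, using the Ochiai suspiciousness metric. It shows the general family of
        # lightweight techniques; it is not the dissertation's specific approach.
        from math import sqrt

        # Each test case: (set of executed block names, whether the test failed).
        coverage = [
            ({"Gain1", "Sum", "Saturation"}, False),
            ({"Gain1", "Sum", "Switch"},     True),
            ({"Gain1", "Switch"},            True),
            ({"Gain1", "Saturation"},        False),
        ]

        def ochiai_scores(coverage):
            """score(block) = failed_cov / sqrt(total_failed * (failed_cov + passed_cov))."""
            total_failed = sum(1 for _, failed in coverage if failed)
            blocks = set().union(*(executed for executed, _ in coverage))
            scores = {}
            for block in blocks:
                failed_cov = sum(1 for executed, failed in coverage
                                 if failed and block in executed)
                passed_cov = sum(1 for executed, failed in coverage
                                 if not failed and block in executed)
                denom = sqrt(total_failed * (failed_cov + passed_cov))
                scores[block] = failed_cov / denom if denom else 0.0
            return scores

        # Rank blocks from most to least suspicious; "Switch" tops the list here.
        for block, score in sorted(ochiai_scores(coverage).items(), key=lambda kv: -kv[1]):
            print(f"{block}: {score:.2f}")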