1,072 research outputs found
Fuzzing Deep Learning Compilers with HirGen
Deep Learning (DL) compilers are widely adopted to optimize advanced DL
models for efficient deployment on diverse hardware. Their quality has a profound
effect on the quality of compiled DL models. A recent bug study shows that the
optimization of high-level intermediate representation (IR) is the most
error-prone compilation stage. Bugs in this stage account for 44.92% of
all collected bugs. However, existing testing techniques do not consider
high-level optimization related features (e.g. high-level IR), and are
therefore weak in exposing bugs at this stage. To bridge this gap, we propose
HirGen, an automated testing technique that aims to effectively expose coding
mistakes in the optimization of high-level IR. The design of HirGen includes 1)
three coverage criteria to generate diverse and valid computational graphs; 2)
full use of high-level IR language features to generate diverse IRs; 3) three
test oracles inspired by both differential testing and metamorphic testing.
HirGen has successfully detected 21 bugs in TVM, with 17 bugs
confirmed and 12 fixed. Further, we construct four baselines using the
state-of-the-art DL compiler fuzzers that can cover the high-level optimization
stage. Our experiment results show that HirGen can detect 10 crashes and
inconsistencies that cannot be detected by the baselines in 48 hours. We
further validate the usefulness of our proposed coverage criteria and test
oracles in our evaluation.
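The differential-testing idea behind such test oracles can be illustrated with a minimal Python sketch. It is not HirGen's implementation and does not use TVM: `reference_impl`, `optimized_impl`, `buggy_impl`, and `differential_check` are hypothetical names invented for this example, standing in for an unoptimized build, an optimized build, and a miscompiled build of the same computation.

```python
import math
import random


def reference_impl(xs):
    # Hypothetical "unoptimized" evaluation: plain sum of squares.
    return sum(x * x for x in xs)


def optimized_impl(xs):
    # Hypothetical "optimized" evaluation of the same function,
    # using a different summation routine.
    return math.fsum(x * x for x in xs)


def buggy_impl(xs):
    # Hypothetical miscompilation: the last element is silently dropped.
    return sum(x * x for x in xs[:-1])


def differential_check(impl_a, impl_b, inputs, rel_tol=1e-9):
    """Return (input, reason) pairs on which the two implementations disagree."""
    mismatches = []
    for xs in inputs:
        try:
            a, b = impl_a(xs), impl_b(xs)
        except Exception as exc:  # a crash is also a reportable finding
            mismatches.append((xs, f"crash: {exc!r}"))
            continue
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append((xs, f"{a} != {b}"))
    return mismatches


def random_inputs(n, seed=0):
    # Seeded generator so that any reported mismatch is reproducible.
    rng = random.Random(seed)
    return [[rng.uniform(-10, 10) for _ in range(rng.randint(1, 8))]
            for _ in range(n)]
```

Calling `differential_check(reference_impl, buggy_impl, [[1.0, 2.0]])` reports one inconsistency, whereas comparing `reference_impl` with `optimized_impl` reports none. The tolerance is a design choice: legitimate optimizations may reorder floating-point operations, so exact equality would raise false alarms.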
Automated Fixing of Programs with Contracts
This paper describes AutoFix, an automatic debugging technique that can fix
faults in general-purpose software. To provide high-quality fix suggestions and
to enable automation of the whole debugging process, AutoFix relies on the
presence of simple specification elements in the form of contracts (such as
pre- and postconditions). Using contracts enhances the precision of dynamic
analysis techniques for fault detection and localization, and for validating
fixes. The only required user input to the AutoFix supporting tool is then a
faulty program annotated with contracts; the tool produces a collection of
validated fixes for the fault ranked according to an estimate of their
suitability.
In an extensive experimental evaluation, we applied AutoFix to over 200
faults in four code bases of different maturity and quality (of implementation
and of contracts). AutoFix successfully fixed 42% of the faults, producing, in
the majority of cases, corrections of quality comparable to those competent
programmers would write; the computational resources used were modest, with an
average time per fix below 20 minutes on commodity hardware. These figures
compare favorably to the state of the art in automated program fixing, and
demonstrate that the AutoFix approach is successfully applicable to reduce the
debugging burden in real-world scenarios.
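How contracts can act as an automatic oracle for detecting a fault and validating a candidate fix can be sketched in a few lines of Python. This is only an illustration under assumptions: it is not the AutoFix tool, and the `contract` decorator, `ContractViolation`, `max_spec`, `faulty_max`, `fixed_max`, and `passes` are hypothetical names introduced for this example.

```python
class ContractViolation(AssertionError):
    """Raised when a precondition or postcondition does not hold."""


def contract(pre=None, post=None):
    # Hypothetical decorator: checks `pre` on the arguments before the call
    # and `post` on the result (plus arguments) after it.
    def wrap(fn):
        def checked(*args):
            if pre is not None and not pre(*args):
                raise ContractViolation(f"precondition of {fn.__name__} failed")
            result = fn(*args)
            if post is not None and not post(result, *args):
                raise ContractViolation(f"postcondition of {fn.__name__} failed")
            return result
        checked.__name__ = fn.__name__
        return checked
    return wrap


def max_spec(result, items):
    # Postcondition: the result is an element and no element exceeds it.
    return result in items and all(result >= x for x in items)


@contract(pre=lambda items: len(items) > 0, post=max_spec)
def faulty_max(items):
    best = 0  # fault: wrong initial value when all items are negative
    for x in items:
        if x > best:
            best = x
    return best


@contract(pre=lambda items: len(items) > 0, post=max_spec)
def fixed_max(items):
    best = items[0]  # candidate fix: start from the first element
    for x in items[1:]:
        if x > best:
            best = x
    return best


def passes(fn, inputs):
    """A candidate is validated if no contract is violated on any input."""
    for items in inputs:
        try:
            fn(items)
        except ContractViolation:
            return False
    return True
```

With inputs such as `[[1, 2, 3], [-5, -2], [0]]`, `passes(faulty_max, ...)` is false because the postcondition fails on the all-negative list, while `passes(fixed_max, ...)` is true. No hand-written expected outputs are needed; the contracts decide pass or fail.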
Automated blackbox GUI specifications enhancement and test data generation
Applications with a Graphical User Interface (GUI) front-end are ubiquitous nowadays.
While automated model-based approaches have been shown to be effective in testing such applications, most existing techniques produce many infeasible event sequences used as GUI test cases. This happens primarily because the behavioral specifications of the GUI under test are ignored. In this dissertation, we present an automated framework that reveals an important set of state-based constraints among GUI events based on infeasible (i.e., unexecutable or partially executable) test cases of a GUI test suite. GUIDiVa, an iterative algorithm at the core of our framework, enumerates all possible constraint violations as potential reasons for test case failure, on the failed event of an infeasible test case. It then selects and adds the most promising constraints of each iteration to a final set based on the Validity Weight of constraints. The results of empirical studies on both seeded and nine non-trivial open-source study subjects show that our framework is capable of capturing important aspects of GUI behavior in the form of state-based event constraints, while considerably reducing the number of infeasible test cases. The second part of this dissertation deals with the problem of automatic generation of relevant test data for parameterized GUI events (i.e., events associated with widgets that accept user inputs such as textboxes and textareas). Current techniques either manipulate the source code of the application under test (AUT) to generate the test data, or blindly use a set of random string values. We propose a novel way to generate the test data by exploiting the information provided in the GUI structure to extract a set of key identifiers for each parameterized GUI widget. These identifiers are used to compose appropriate online search phrases and collect relevant test data from the Internet.
The results of an empirical study on five GUI-based applications show that the proposed approach is applicable and results in execution of some hard-to-cover branches in the subject programs. The proposed technique works from a black-box perspective and is entirely independent of GUI modeling and event sequence generation; thus it does not require source code access and offers the possibility of being integrated with existing GUI testing frameworks.
Revised reference model
This document contains an update of the HIDENETS Reference Model, whose preliminary version was introduced in D1.1. The Reference Model contains the overall approach to development and assessment of end-to-end resilience solutions. As such, it presents a framework which, due to its abstraction level, is not restricted to the HIDENETS car-to-car and car-to-infrastructure applications and use-cases. Starting from a condensed summary of the dependability terminology used, the network architecture containing the ad hoc and infrastructure domain and the definition of the main networking elements together with the software architecture of the mobile nodes is presented. The concept of architectural hybridization and its inclusion in HIDENETS-like dependability solutions is described subsequently. A set of communication and middleware level services following the architecture hybridization concept and motivated by the dependability and resilience challenges raised by HIDENETS-like scenarios is then described. Besides architecture solutions, the reference model addresses the assessment of dependability solutions in HIDENETS-like scenarios using quantitative evaluations, realized by a combination of top-down and bottom-up modelling, as well as verification via test scenarios. In order to allow for fault prevention in the software development phase of HIDENETS-like applications, generic UML-based modelling approaches with focus on dependability related aspects are described. The HIDENETS reference model provides the framework in which the detailed solutions in the HIDENETS project are being developed, while at the same time facilitating the same task for non-vehicular scenarios and applications.
Joint use of static and dynamic software verification techniques: a cross-domain view in safety critical system industries
How different are the approaches to combining formal methods (FM) and testing in the safety standards of the automotive, aeronautic, nuclear, process, railway and space industries? This is the question addressed in this paper by a cross-domain group of experts involved in the revision committees of ISO 26262, DO-178C, IEC 60880, IEC 61508, EN 50128 and ECSS-Q-ST-80C. First we review some commonalities and differences regarding application of formal methods in the aforementioned standards. Are they mandatory or recommended only? What kind of properties are they advised to be applied to? What is specified in the different standards regarding coverage (both functional and structural) if testing and formal methods are used jointly? We also account for the return on experience of the group members in the six industrial domains regarding state of the art practice of joint use of formal methods and testing. Where did formal methods actually prove to outperform testing? Then we discuss verification coverage, and more specifically the role of structural coverage. Does structural coverage play the same role in all the standards? Is it specific to testing and irrelevant for formal methods? What verification termination criteria are applicable in the case of an FM-test mix? We conclude with some prospective views on how software safety standards may evolve to maximize the benefits of joint use of dynamic (testing) and static (FM) verification methods.
Deep language models for software testing and optimisation
Developing software is difficult. A challenging part of production development is ensuring programs are correct and fast, two properties satisfied with software testing and
optimisation. While both tasks still rely on manual effort and expertise, the recent
surge in software applications has led them to become tedious and time-consuming.
In this fast-paced environment, manual testing and optimisation significantly hinder productivity and lead to error-prone or sub-optimal programs that waste energy
and lead users to frustration. In this thesis, we propose three novel approaches to automate software testing and optimisation with modern language models based on deep
learning. In contrast to our methods, the few existing techniques in these two domains
have limited scalability and struggle when they face real-world applications.
Our first contribution lies in the field of software testing and aims to automate
the test oracle problem, which is the procedure of determining the correctness of test
executions. The test oracle is still largely manual, relying on human experts. Automating the oracle is a non-trivial task that requires software specifications or derived
information that are often too difficult to extract. We present the first application of
deep language models over program execution traces to predict runtime correctness.
Our technique classifies test executions of large-scale codebases used in production as
“pass” or “fail”. Our proposed approach reduces by 86% the number of test inputs an
expert has to label, training on only 14% and classifying the rest automatically.
Our next two contributions improve the effectiveness of compiler optimisation.
Compilers optimise programs by applying heuristic-based transformations constructed
by compiler engineers. Selecting the right transformations requires extensive knowledge of the compiler, the subject program and the target architecture. Predictive models
have been successfully used to automate heuristics construction but their performance
is hindered by a shortage of training benchmarks in quantity and feature diversity. Our
next contributions address the scarcity of compiler benchmarks by generating human-like synthetic programs to improve the performance of predictive models.
Our second contribution is BENCHPRESS, the first steerable deep learning synthesizer for executable compiler benchmarks. BENCHPRESS produces human-like programs that compile at a rate of 87%. It targets parts of the feature space previously
unreachable by other synthesizers, addressing the scarcity of high-quality training data
for compilers. BENCHPRESS improves the performance of a device mapping predictive model by 50% when it introduces synthetic benchmarks into its training data. BENCHPRESS is restricted by a feature-agnostic synthesizer that requires thousands of random inferences to select a few that target the desired features. Our third
contribution addresses this inefficiency. We develop BENCHDIRECT, a directed language model for compiler benchmark generation. BENCHDIRECT synthesizes programs by jointly observing the source code context and the compiler features that
are targeted. This enables efficient steerable generation on large scale tasks. Compared to BENCHPRESS, BENCHDIRECT successfully matches 1.8× more Rodinia target benchmarks, while it is up to 36% more accurate and up to 72% faster in targeting
three different feature spaces for compilers.
All three contributions demonstrate the exciting potential of deep learning and language models to simplify the testing of programs and the construction of better optimisation heuristics for compilers. The outcomes of this thesis provide developers with
tools to keep up with the rapidly evolving landscape of software engineering.
AUTOMATED DEBUGGING AND FAULT LOCALIZATION OF MATLAB/SIMULINK MODELS
Matlab/Simulink is an advanced environment for modeling and simulating multidomain dynamic systems. It has been widely used to model advanced Cyber-Physical Systems, e.g. in the automotive or avionics industry. To ensure the reliability of Simulink models (i.e., ensuring that they are free of faults), these models are subject to extensive testing to verify the logic and behavior of software modules developed in the models. Due to the complex structure of Simulink models, finding root causes of failures (i.e., faults) is an expensive and time-consuming task. Therefore, there is a high demand for automatic fault localization techniques that can help engineers to locate faults in Simulink models with less human intervention. This demand leads to the proposal and development of various approaches and techniques that are able to automatically locate faults in Simulink models. Fault localization has been an active research area that focuses on developing automated techniques to support software debugging. Although there have been many techniques proposed to localize faults in programs, there has not been much research on fault localization for Simulink models. In this dissertation, we investigate and develop a lightweight fault localization approach to automatically and accurately locate faults in Simulink models. To enhance the usability of our approach, we also develop a stand-alone desktop application that provides engineers with a usable interface to facilitate localization of faults in their models.