17,144 research outputs found

    Towards Structured Evaluation of Deep Neural Network Supervisors

    Deep Neural Networks (DNNs) have improved the quality of several non-safety-related products in the past years. However, before DNNs are deployed in safety-critical applications, their robustness needs to be systematically analyzed. A common challenge for DNNs occurs when input is dissimilar to the training set, which might lead to high-confidence predictions despite the network lacking proper knowledge of the input. Several previous studies have proposed to complement DNNs with a supervisor that detects when inputs are outside the scope of the network. Most of these supervisors, however, are developed and tested for a selected scenario using a specific performance metric. In this work, we emphasize the need to assess and compare the performance of supervisors in a structured way. We present a framework constituted by four datasets organized into six test cases combined with seven evaluation metrics. The test cases provide varying complexity and include data from publicly available sources as well as a novel dataset consisting of images from simulated driving scenarios, which we plan to make publicly available. Our framework can be used to support DNN supervisor evaluation, which in turn could be used to motivate the development, validation, and deployment of DNNs in safety-critical applications.
    Comment: Preprint of a paper accepted for presentation at the First IEEE International Conference on Artificial Intelligence Testing, April 4-9, 2019, San Francisco East Bay, California, US.
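
    As a concrete illustration of the evaluation setting (not the paper's actual framework), the sketch below wires a simple maximum-softmax supervisor to two commonly used detection metrics, AUROC and FPR at 95% TPR; the supervisor choice, the metrics, and the placeholder logits are assumptions made for this example.

```python
# Minimal sketch of supervisor evaluation: a maximum-softmax-probability
# supervisor scored with AUROC and FPR at 95% TPR. The random logits below
# are placeholders standing in for a real DNN's outputs on two test sets.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def max_softmax_score(logits):
    """Supervisor score: higher means 'more in-distribution'."""
    z = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

def evaluate_supervisor(scores_in, scores_out):
    """Return AUROC and (approximate) FPR at 95% TPR."""
    y = np.concatenate([np.ones(len(scores_in)), np.zeros(len(scores_out))])
    s = np.concatenate([scores_in, scores_out])
    auroc = roc_auc_score(y, s)
    fpr, tpr, _ = roc_curve(y, s)
    fpr_at_95_tpr = fpr[np.searchsorted(tpr, 0.95)]       # first point with TPR >= 0.95
    return auroc, fpr_at_95_tpr

# Hypothetical logits: in-distribution inputs get a pronounced class spike, outliers do not.
rng = np.random.default_rng(0)
logits_in = rng.normal(0, 1, (1000, 10)) + 4 * np.eye(10)[rng.integers(0, 10, 1000)]
logits_out = rng.normal(0, 1, (1000, 10))
print(evaluate_supervisor(max_softmax_score(logits_in), max_softmax_score(logits_out)))
```

    A fuller framework along the paper's lines would run several supervisors and test cases through the same interface and report all metrics side by side.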

    On Improving Validity of Deep Neural Networks in Safety Critical Applications

    Context: Deep learning has proven to be a valuable component in object detection and classification, as the technique has shown increased performance compared to traditional software algorithms. Deep learning refers to the process in which an optimisation procedure learns an algorithm from a set of labeled data, where the researcher defines an architecture rather than the algorithm itself. As the resulting model contains abstract features retrieved through the optimisation process, new unsolved challenges emerge that need to be resolved before deploying these models in safety-critical applications. Aim: The aim of this Licentiate thesis has been to study what extensions are necessary to verify deep neural networks. Furthermore, the thesis studies one challenge in detail: how out-of-distribution samples can be detected and excluded. Method: A comparative framework has been constructed to evaluate the performance of out-of-distribution detection methods on common ground. To achieve this, the top-performing candidates from recent publications were used as a baseline for reference snowballing, from which a set of candidates was selected and studied. From this study, common features were identified and included in the comparative framework. Furthermore, semi-structured interviews were conducted to understand the challenges of deploying deep neural networks in industrial safety-critical applications. Results: The thesis found that the main issues with deployment are traceability and quality quantification, in that deep learning lacks proper descriptions of how to design test cases, training datasets, and the robustness of the model itself. While deep learning performance is commendable, error tracing is challenging, as the abstract features in the model do not have any direct connection to the training samples. In addition, the training phase lacks proper measures to quantify diversity within the dataset, especially for the vastly different scenarios that exist in the real world. One safety method studied in this thesis is to utilize an out-of-distribution detector as a safety measure. The benefit of this measure is that it can both identify and mitigate potential hazards. From our literature review it became apparent that each detector had been compared against different techniques; hence, a framework was constructed that allows for extensive and fair comparison. In addition, when utilizing the framework, robustness issues of the detectors were found, where performance could change drastically depending on small variations in the deep neural network. Future work: We recommend testing the outlier detectors on real-world scenarios and showing how the detector can be part of a safety argumentation strategy.
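
    To illustrate the "common ground" idea, here is a hedged sketch of a comparison harness in which every detector exposes the same interface (logits in, in-distribution score out) and all detectors are ranked with the same metric; the two detectors shown (maximum softmax and an energy-style score) and the AUROC metric are assumptions for illustration, not the thesis's actual candidates.

```python
# Sketch of a comparison harness that puts out-of-distribution detectors on
# common ground: a shared detector interface, shared data splits, shared metric.
import numpy as np
from sklearn.metrics import roc_auc_score

def max_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1)

def negative_energy(logits, temperature=1.0):
    # Energy-style score: T * logsumexp(logits / T); higher = more in-distribution.
    z = logits / temperature
    zmax = z.max(axis=1)
    return temperature * (np.log(np.exp(z - zmax[:, None]).sum(axis=1)) + zmax)

DETECTORS = {"max_softmax": max_softmax, "negative_energy": negative_energy}

def compare(logits_in, logits_out):
    """AUROC of each registered detector on the same ID/OOD split."""
    y = np.concatenate([np.ones(len(logits_in)), np.zeros(len(logits_out))])
    return {name: roc_auc_score(y, np.concatenate([d(logits_in), d(logits_out)]))
            for name, d in DETECTORS.items()}

# Placeholder logits; in practice these come from the DNN under evaluation.
rng = np.random.default_rng(1)
print(compare(rng.normal(0, 3, (500, 10)), rng.normal(0, 1, (500, 10))))
```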

    Machine learning for early detection of traffic congestion using public transport traffic data

    The purpose of this project is to provide better knowledge of how bus travel times are affected by congestion and other problems in the urban traffic environment. The main source of data for this study is second-level measurements coming from all buses in the Linköping region, showing the location of each vehicle. The main goal of this thesis is to propose, implement, test, and optimize a machine learning algorithm based on data collected from regional buses in Sweden so that it is able to predict the future state of urban traffic.
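
    For illustration only, one possible shape of such a pipeline is sketched below: second-level position reports are aggregated into per-link travel times, and a standard regressor is fitted on simple temporal features. The column names, features, and model choice are assumptions, not the thesis's actual design.

```python
# Hypothetical schema: one row per second-level position report with columns
# ['vehicle_id', 'link_id', 'timestamp'], where link_id is an integer road-link code.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def build_features(pings: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw pings into per-vehicle, per-link travel times with time features."""
    pings = pings.sort_values("timestamp")
    agg = (pings.groupby(["vehicle_id", "link_id"])["timestamp"]
                .agg(["min", "max"]).reset_index())
    agg["travel_time_s"] = (agg["max"] - agg["min"]).dt.total_seconds()
    agg["hour"] = agg["min"].dt.hour
    agg["weekday"] = agg["min"].dt.weekday
    return agg[["link_id", "hour", "weekday", "travel_time_s"]]

def train_travel_time_model(features: pd.DataFrame):
    """Fit a regressor predicting link travel time from time-of-day features."""
    X = features[["link_id", "hour", "weekday"]]
    y = features["travel_time_s"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_train, y_train)
    return model, model.score(X_test, y_test)   # R^2 on the held-out split
```

    Predicted travel times that are high relative to a free-flow baseline could then serve as an early congestion signal.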

    DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities

    Vulnerability detection is a critical problem in software security and attracts growing attention from both academia and industry. Traditionally, software security is safeguarded by designated rule-based detectors that heavily rely on empirical expertise, requiring tremendous effort from software experts to generate rule repositories for large code corpora. Recent advances in deep learning, especially Graph Neural Networks (GNNs), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities. However, prior learning-based works either break programs down into a sequence of word tokens for extracting contextual features of code, or apply GNNs largely on homogeneous graph representations (e.g., the AST) without discerning the complex types of underlying program entities (e.g., methods, variables). In this work, we are one of the first to explore heterogeneous graph representation in the form of the Code Property Graph and adapt a well-known heterogeneous graph network with a dual-supervisor structure for the corresponding graph learning task. Using the prototype built, we have conducted extensive experiments on both synthetic datasets and real-world projects. Compared with the state-of-the-art baselines, the results demonstrate promising effectiveness of this research direction in terms of vulnerability detection performance (average F1 improvements over 10% in real-world projects) and transferability from C/C++ to other programming languages (average F1 improvements over 11%).
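
    As a rough sketch of heterogeneous graph learning over a Code-Property-Graph-like structure (not the DSHGT architecture itself), the snippet below builds a small typed graph with PyTorch Geometric's HeteroData and applies a single HGT convolution; the node and edge type names ('method', 'variable', 'uses', 'ast_child') and feature sizes are illustrative assumptions.

```python
# Heterogeneous graph with typed nodes and edges, processed by one HGTConv layer.
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import HGTConv

data = HeteroData()
data['method'].x = torch.randn(4, 16)     # 4 method nodes, 16-dim features
data['variable'].x = torch.randn(6, 16)   # 6 variable nodes, 16-dim features
# Typed edges: methods use variables; methods nest under methods via AST edges.
data['method', 'uses', 'variable'].edge_index = torch.tensor([[0, 1, 2], [0, 3, 5]])
data['method', 'ast_child', 'method'].edge_index = torch.tensor([[0, 0], [1, 2]])

conv = HGTConv(in_channels=16, out_channels=32,
               metadata=data.metadata(), heads=2)
out = conv(data.x_dict, data.edge_index_dict)   # per-type node embeddings
print({node_type: emb.shape for node_type, emb in out.items()})
```

    In a vulnerability detector, such per-node embeddings would typically be pooled per function or file and passed to a classification head; the dual-supervisor training signal described in the paper is not reproduced here.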

    Generating and Detecting True Ambiguity: A Forgotten Danger in DNN Supervision Testing

    Deep Neural Networks (DNNs) are becoming a crucial component of modern software systems, but they are prone to fail under conditions that are different from the ones observed during training (out-of-distribution inputs) or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes with nonzero probability in their labels. Recent work proposed DNN supervisors to detect high-uncertainty inputs before their possible misclassification leads to any harm. To test and compare the capabilities of DNN supervisors, researchers proposed test generation techniques to focus the testing effort on high-uncertainty inputs that should be recognized as anomalous by supervisors. However, existing test generators aim to produce out-of-distribution inputs. No existing model- and supervisor-independent technique targets the generation of truly ambiguous test inputs, i.e., inputs that admit multiple classes according to expert human judgment. In this paper, we propose a novel way to generate ambiguous inputs to test DNN supervisors and use it to empirically compare several existing supervisor techniques. In particular, we propose AmbiGuess to generate ambiguous samples for image classification problems. AmbiGuess is based on gradient-guided sampling in the latent space of a regularized adversarial autoencoder. Moreover, we conducted what is -- to the best of our knowledge -- the most extensive comparative study of DNN supervisors, considering their capabilities to detect four distinct types of high-uncertainty inputs, including truly ambiguous ones. We find that the tested supervisors' capabilities are complementary: those best suited to detect true ambiguity perform worse on invalid, out-of-distribution, and adversarial inputs, and vice versa.
    Comment: Accepted for publication at Springer's "Empirical Software Engineering" (EMSE).
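
    To illustrate the general idea of gradient-guided search in a latent space for ambiguous inputs (this is not the AmbiGuess implementation), the sketch below optimizes a latent code so that a frozen classifier assigns roughly equal probability to two chosen classes on the decoded image; the decoder, classifier, and hyperparameters are placeholders.

```python
# Gradient-guided latent search toward the decision boundary between two classes.
import torch
import torch.nn.functional as F

def ambiguous_latent(decoder, classifier, class_a, class_b,
                     latent_dim=32, steps=200, lr=0.05):
    """Optimize a latent code so both target classes get ~0.5 probability."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        probs = F.softmax(classifier(decoder(z)), dim=1)
        loss = (probs[0, class_a] - 0.5) ** 2 + (probs[0, class_b] - 0.5) ** 2
        loss.backward()
        opt.step()
    return z.detach(), decoder(z).detach()

# Toy stand-ins for a pre-trained decoder and classifier (28x28 grayscale, 10 classes).
decoder = torch.nn.Sequential(torch.nn.Linear(32, 28 * 28),
                              torch.nn.Unflatten(1, (1, 28, 28)))
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
z, image = ambiguous_latent(decoder, classifier, class_a=3, class_b=8)
```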