Towards Structured Evaluation of Deep Neural Network Supervisors
Deep Neural Networks (DNN) have improved the quality of several non-safety
related products in the past years. However, before DNNs should be deployed to
safety-critical applications, their robustness needs to be systematically
analyzed. A common challenge for DNNs occurs when input is dissimilar to the
training set, which might lead to high-confidence predictions without proper
knowledge of the input. Several previous studies have proposed to complement
DNNs with a supervisor that detects when inputs are outside the scope of the
network. Most of these supervisors, however, are developed and tested for a
selected scenario using a specific performance metric. In this work, we
emphasize the need to assess and compare the performance of supervisors in a
structured way. We present a framework constituted by four datasets organized
in six test cases combined with seven evaluation metrics. The test cases
provide varying complexity and include data from publicly available sources as
well as a novel dataset consisting of images from simulated driving scenarios.
The latter we plan to make publicly available. Our framework can be used to
support DNN supervisor evaluation, which in turn could be used to motivate
development, validation, and deployment of DNNs in safety-critical
applications.
Comment: Preprint of paper accepted for presentation at The First IEEE
International Conference on Artificial Intelligence Testing, April 4-9, 2019,
San Francisco East Bay, California, US.
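To make the evaluation idea concrete, the sketch below scores a simple max-softmax supervisor with AUROC on in- versus out-of-distribution inputs. The supervisor, the metric choice, and the toy data are illustrative assumptions, not the paper's framework or datasets.

```python
# Minimal sketch: scoring a softmax-threshold supervisor on in- vs. out-of-distribution
# data with AUROC. Supervisor, metric, and data are illustrative assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

def supervisor_score(softmax_probs: np.ndarray) -> np.ndarray:
    """Anomaly score: 1 - max softmax probability (higher = more suspicious)."""
    return 1.0 - softmax_probs.max(axis=1)

# Hypothetical model outputs: each row is a softmax vector for one input.
in_dist_probs = np.array([[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]])
out_dist_probs = np.array([[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]])

scores = np.concatenate([supervisor_score(in_dist_probs),
                         supervisor_score(out_dist_probs)])
labels = np.concatenate([np.zeros(len(in_dist_probs)),   # 0 = in-distribution
                         np.ones(len(out_dist_probs))])  # 1 = outlier

print("AUROC:", roc_auc_score(labels, scores))
```

Under this framing, different supervisors only need to expose a per-input anomaly score to be compared on the same test cases with the same metric.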
On Improving Validity of Deep Neural Networks in Safety Critical Applications
Context: Deep learning has proven to be a valuable component in object detection and classification, as the technique has shown increased performance compared to traditional software algorithms. Deep learning refers to the process in which an optimisation procedure learns an algorithm from a set of labeled data, where the researcher defines an architecture rather than the algorithm itself. As the resulting model contains abstract features retrieved through the optimisation process, new unsolved challenges emerge that need to be resolved before deploying these models in safety critical applications.
Aim: The aim of this Licentiate thesis has been to study what extensions are necessary to verify deep neural networks. Furthermore, the thesis studies one challenge in detail: how out-of-distribution samples can be detected and excluded.
Method: A comparative framework has been constructed to evaluate the performance of out-of-distribution detection methods on common ground. To achieve this, the top-performing candidates from recent publications were used as a baseline for snowballing, from which a set of candidates was studied. From this study, common features were identified and included in the comparative framework. Furthermore, the thesis conducted semi-structured interviews to understand the challenges of deploying deep neural networks in industrial safety critical applications.
Results: The thesis found that the main issues with deployment are traceability and quality quantification: deep learning lacks proper descriptions of how to design test cases and training datasets, and how to assess the robustness of the model itself. While deep learning performance is commendable, error tracing is challenging, as the abstract features in the model do not have any direct connection to the training samples. In addition, the training phase lacks proper measures to quantify diversity within the dataset, especially for the vastly different scenarios that exist in the real world. One safety method studied in this thesis is to utilize an out-of-distribution detector as a safety measure, with the benefit that it can both identify and mitigate potential hazards. From our literature review it became apparent that each detector was compared against different techniques, hence a framework was constructed that allows for extensive and fair comparison. In addition, when utilizing the framework, robustness issues of the detectors were found, where performance could change drastically depending on small variations in the deep neural network.
Future work: Future work includes testing the outlier detectors on real-world scenarios and showing how the detector can be part of a safety strategy argumentation.
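The safety measure described here, an out-of-distribution detector used at runtime, can be pictured as a reject option placed in front of the classifier. The sketch below is a minimal illustration assuming a PyTorch model and a hand-picked softmax-confidence threshold; it is not the thesis' actual detector or threshold-selection procedure.

```python
# Minimal sketch: wrapping a classifier with an out-of-distribution "supervisor"
# that rejects low-confidence inputs instead of passing them downstream.
# The model, threshold, and rejection rule are illustrative assumptions.
import torch
import torch.nn.functional as F

class SupervisedClassifier(torch.nn.Module):
    def __init__(self, model: torch.nn.Module, threshold: float = 0.7):
        super().__init__()
        self.model = model
        self.threshold = threshold  # minimum max-softmax confidence to accept

    @torch.no_grad()
    def forward(self, x: torch.Tensor):
        probs = F.softmax(self.model(x), dim=1)
        conf, pred = probs.max(dim=1)
        accepted = conf >= self.threshold  # rejected inputs trigger a fallback path
        return pred, accepted

# Hypothetical usage with a toy linear "network" on 8-dimensional inputs.
toy_model = torch.nn.Linear(8, 3)
monitor = SupervisedClassifier(toy_model, threshold=0.7)
preds, accepted = monitor(torch.randn(4, 8))
print(preds, accepted)
```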
Machine learning for early detection of traffic congestion using public transport traffic data
The purpose of this project is to provide better knowledge of how bus travel times are affected by congestion and other problems in the urban traffic environment. The main source of data for this study is second-level measurements coming from all buses in the Linköping region, showing the location of each vehicle. The main goal of this thesis is to propose, implement, test and optimize a machine learning algorithm based on data collected from regional buses in Sweden so that it is able to perform predictions on the future state of the urban traffic.
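As a rough picture of what such a prediction task can look like, the sketch below fits a regressor to simple temporal features of a bus segment. The feature set, the synthetic data, and the choice of model are all illustrative assumptions and are not taken from the thesis.

```python
# Minimal sketch: predicting a bus segment's travel time from simple temporal
# features, as a stand-in for the congestion-prediction task described above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
hour = rng.integers(5, 23, n)         # hour of day
weekday = rng.integers(0, 7, n)       # 0 = Monday
prev_time = rng.normal(120, 15, n)    # previous trip's travel time on the segment (s)

# Synthetic target: rush hours and the previous observation drive the travel time.
rush = ((hour >= 7) & (hour <= 9)) | ((hour >= 16) & (hour <= 18))
travel_time = 100 + 40 * rush + 0.3 * prev_time + rng.normal(0, 10, n)

X = np.column_stack([hour, weekday, prev_time])
X_train, X_test, y_train, y_test = train_test_split(X, travel_time, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on held-out trips:", round(model.score(X_test, y_test), 3))
```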
DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities
Vulnerability detection is a critical problem in software security and
attracts growing attention both from academia and industry. Traditionally,
software security is safeguarded by designated rule-based detectors that
heavily rely on empirical expertise, requiring tremendous effort from software
experts to generate rule repositories for large code corpus. Recent advances in
deep learning, especially Graph Neural Networks (GNN), have uncovered the
feasibility of automatic detection of a wide range of software vulnerabilities.
However, prior learning-based works only break programs down into a sequence of
word tokens to extract contextual features of code, or apply GNNs largely
to homogeneous graph representations (e.g., AST) without discerning complex
types of underlying program entities (e.g., methods, variables). In this work,
we are one of the first to explore heterogeneous graph representation in the
form of Code Property Graph and adapt a well-known heterogeneous graph network
with a dual-supervisor structure for the corresponding graph learning task.
Using the prototype built, we have conducted extensive experiments on both
synthetic datasets and real-world projects. Compared with the state-of-the-art
baselines, the results demonstrate promising effectiveness in this research
direction in terms of vulnerability detection performance (average F1
improvements over 10% in real-world projects) and transferability from C/C++
to other programming languages (average F1 improvements over 11%).
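To illustrate what a heterogeneous graph model over code entities can look like, the sketch below runs a single heterogeneous attention layer over a toy Code-Property-Graph-like graph, assuming PyTorch Geometric's HGTConv. The node and edge types, the toy tensors, and the two output heads standing in for the "dual-supervisor" idea are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: heterogeneous GNN over a toy code graph with two output heads.
# Assumes PyTorch Geometric (torch_geometric). All names and shapes are toy choices.
import torch
from torch_geometric.nn import HGTConv, Linear

node_types = ['method', 'variable']
edge_types = [('method', 'calls', 'method'), ('method', 'uses', 'variable')]
metadata = (node_types, edge_types)

class ToyDualHeadHGT(torch.nn.Module):
    def __init__(self, hidden=32, heads=2):
        super().__init__()
        self.proj = torch.nn.ModuleDict({t: Linear(-1, hidden) for t in node_types})
        self.conv = HGTConv(hidden, hidden, metadata, heads)
        self.vuln_head = Linear(hidden, 2)  # primary task: vulnerable vs. benign
        self.aux_head = Linear(hidden, 2)   # auxiliary supervision signal (assumption)

    def forward(self, x_dict, edge_index_dict):
        x_dict = {t: self.proj[t](x).relu() for t, x in x_dict.items()}
        x_dict = self.conv(x_dict, edge_index_dict)
        graph_repr = x_dict['method'].mean(dim=0)  # crude graph-level readout
        return self.vuln_head(graph_repr), self.aux_head(graph_repr)

# Hypothetical toy graph: 3 methods, 2 variables, random node features.
x_dict = {'method': torch.randn(3, 16), 'variable': torch.randn(2, 8)}
edge_index_dict = {
    ('method', 'calls', 'method'): torch.tensor([[0, 1], [1, 2]]),
    ('method', 'uses', 'variable'): torch.tensor([[0, 2], [0, 1]]),
}
vuln_logits, aux_logits = ToyDualHeadHGT()(x_dict, edge_index_dict)
print(vuln_logits.shape, aux_logits.shape)
```

In practice the two heads would be trained jointly, with the vulnerability label as the primary loss and the second supervision signal as an auxiliary loss added to it.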
Generating and Detecting True Ambiguity: A Forgotten Danger in DNN Supervision Testing
Deep Neural Networks (DNNs) are becoming a crucial component of modern
software systems, but they are prone to fail under conditions that are
different from the ones observed during training (out-of-distribution inputs)
or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes
with nonzero probability in their labels. Recent work proposed DNN supervisors
to detect high-uncertainty inputs before their possible misclassification leads
to any harm. To test and compare the capabilities of DNN supervisors,
researchers proposed test generation techniques, to focus the testing effort on
high-uncertainty inputs that should be recognized as anomalous by supervisors.
However, existing test generators aim to produce out-of-distribution inputs. No
existing model- and supervisor-independent technique targets the generation of
truly ambiguous test inputs, i.e., inputs that admit multiple classes according
to expert human judgment.
In this paper, we propose a novel way to generate ambiguous inputs to test
DNN supervisors and use it to empirically compare several existing supervisor
techniques. In particular, we propose AmbiGuess to generate ambiguous samples
for image classification problems. AmbiGuess is based on gradient-guided
sampling in the latent space of a regularized adversarial autoencoder.
Moreover, we conducted what is -- to the best of our knowledge -- the most
extensive comparative study of DNN supervisors, considering their capabilities
to detect 4 distinct types of high-uncertainty inputs, including truly
ambiguous ones. We find that the tested supervisors' capabilities are
complementary: Those best suited to detect true ambiguity perform worse on
invalid, out-of-distribution and adversarial inputs and vice versa.
Comment: Accepted for publication at Springer's "Empirical Software
Engineering" (EMSE).
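The gradient-guided latent-space sampling can be pictured as optimising a latent code so that the decoded image is maximally ambiguous between two classes for a frozen classifier. The sketch below uses toy stand-ins for the decoder and the classifier and a simple two-class ambiguity loss; all of these are assumptions for illustration rather than AmbiGuess itself.

```python
# Minimal sketch: gradient-guided search in a latent space for inputs that a frozen
# classifier finds ambiguous between two classes. The toy decoder, classifier,
# ambiguity loss, and dimensions are illustrative assumptions, not AmbiGuess.
import torch
import torch.nn.functional as F

latent_dim, img_dim, num_classes = 8, 28 * 28, 10
decoder = torch.nn.Linear(latent_dim, img_dim)      # stand-in for an AAE decoder
classifier = torch.nn.Linear(img_dim, num_classes)  # stand-in for the DNN under test
for p in list(decoder.parameters()) + list(classifier.parameters()):
    p.requires_grad_(False)  # only the latent code is optimised

z = torch.randn(1, latent_dim, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)
class_a, class_b = 3, 5  # pair of classes the sample should be ambiguous between

for step in range(200):
    optimizer.zero_grad()
    probs = F.softmax(classifier(torch.sigmoid(decoder(z))), dim=1)
    # Ambiguity loss: push both target classes toward probability 0.5 each.
    loss = (probs[0, class_a] - 0.5) ** 2 + (probs[0, class_b] - 0.5) ** 2
    loss.backward()
    optimizer.step()

print("p(a), p(b):", probs[0, class_a].item(), probs[0, class_b].item())
```

A supervisor that only screens for out-of-distribution inputs can miss samples produced this way, since they sit near the decision boundary while remaining close to the training distribution.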