Identifying and Explaining Safety-critical Scenarios for Autonomous Vehicles via Key Features
Ensuring the safety of autonomous vehicles (AVs) is of utmost importance, and
testing them in simulated environments is a safer option than conducting
in-field operational tests. However, generating an exhaustive test suite to
identify critical test scenarios is computationally expensive as the
representation of each test is complex and contains various dynamic and static
features, such as the AV under test, road participants (vehicles, pedestrians,
and static obstacles), environmental factors (weather and light), and the
road's structural features (lanes, turns, road speed, etc.). In this paper, we
present a systematic technique that uses Instance Space Analysis (ISA) to
identify the significant features of test scenarios that affect their ability
to reveal the unsafe behaviour of AVs. ISA identifies the features that best
differentiate safety-critical scenarios from normal driving and visualises the
impact of these features on test scenario outcomes (safe/unsafe) in 2D. This
visualization helps to identify untested regions of the instance space and
provides an indicator of the quality of the test suite in terms of the
percentage of feature space covered by testing. To test the predictive ability
of the identified features, we train five Machine Learning classifiers to
classify test scenarios as safe or unsafe. The high precision, recall, and F1
scores indicate that our proposed approach is effective in predicting the
outcome of a test scenario without executing it and can be used for test
generation, selection, and prioritization.
Comment: 28 pages, 6 figures
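The prediction step described above can be sketched as follows. This is a minimal illustration of the general idea, not the paper's pipeline: the feature names, the synthetic data, and the labelling rule are all made up for demonstration, and a random-forest classifier stands in for whichever of the five classifiers the paper trains.

```python
# Hypothetical sketch: predict a test scenario's outcome (safe/unsafe)
# from its features without executing it. All features, data, and the
# labelling rule below are illustrative, not the paper's actual dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 500
# Illustrative scenario features: ego speed (m/s), pedestrian count,
# road curvature, light level (0 = night, 1 = day)
X = np.column_stack([
    rng.uniform(0, 30, n),
    rng.integers(0, 5, n),
    rng.uniform(0, 1, n),
    rng.integers(0, 2, n),
])
# Toy ground truth: fast, crowded, dark scenarios are labelled unsafe
y = ((X[:, 0] > 20) & (X[:, 1] > 1) & (X[:, 3] == 0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
score = f1_score(y_te, clf.predict(X_te), zero_division=0)
print(f"F1 on held-out scenarios: {score:.2f}")
```

A classifier trained this way can rank candidate scenarios by predicted risk before any simulation time is spent, which is what makes the approach usable for test selection and prioritization.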
Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of System-level Testing of Autonomous Vehicles
AI-powered systems have gained widespread popularity in various domains,
including Autonomous Vehicles (AVs). However, ensuring their reliability and
safety is challenging due to their complex nature. Conventional test adequacy
metrics, designed to evaluate the effectiveness of traditional software
testing, are often insufficient or impractical for these systems. White-box
metrics, which are specifically designed for these systems, leverage neuron
coverage information. These coverage metrics necessitate access to the
underlying AI model and training data, which may not always be available.
Furthermore, the existing adequacy metrics exhibit weak correlations with the
ability to detect faults in the generated test suite, creating a gap that we
aim to bridge in this study.
In this paper, we introduce a set of black-box test adequacy metrics called
"Test suite Instance Space Adequacy" (TISA) metrics, which can be used to gauge
the effectiveness of a test suite. The TISA metrics offer a way to assess both
the diversity and coverage of the test suite and the range of bugs detected
during testing. Additionally, we introduce a framework that permits testers to
visualise the diversity and coverage of the test suite in a two-dimensional
space, facilitating the identification of areas that require improvement.
We evaluate the efficacy of the TISA metrics by examining their correlation
with the number of bugs detected in system-level simulation testing of AVs. A
strong correlation, coupled with the short computation time, indicates their
effectiveness and efficiency in estimating the adequacy of testing AVs.
Comment: 12 pages, 7 figures
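One way to picture a coverage metric over a two-dimensional instance space is sketched below. This is an assumed simplification, not the paper's TISA definition: it projects hypothetical per-test feature vectors to 2D with PCA and reports the fraction of grid cells that contain at least one test.

```python
# Illustrative sketch (not the actual TISA metrics): project test-suite
# feature vectors into 2D and measure coverage as the fraction of grid
# cells in the projected space occupied by at least one test.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
features = rng.normal(size=(200, 6))   # hypothetical per-test feature vectors

pts = PCA(n_components=2, random_state=1).fit_transform(features)

# Normalise the projection to [0, 1) and bucket it into a 10x10 grid
mn, mx = pts.min(axis=0), pts.max(axis=0)
cells = np.floor((pts - mn) / (mx - mn + 1e-9) * 10).astype(int)
coverage = len({tuple(c) for c in cells}) / 100.0
print(f"instance-space coverage: {coverage:.0%}")
```

Empty grid cells in such a plot point directly at the "areas that require improvement" the framework is meant to expose.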
Closing the Loop for Software Remodularisation -- REARRANGE: An Effort Estimation Approach for Software Clustering-based Remodularisation
Software remodularization through clustering is a common practice to improve
internal software quality. However, simply producing clustering results is
not enough: the benefits of remodularization are only realized if developers
follow through with the recommended refactoring operations, which are often
complex and time-consuming to apply.
Comment: Accepted for publication at the ICSE23 Poster Track
Test-based Patch Clustering for Automatically-Generated Patches Assessment
Previous studies have shown that Automated Program Repair (APR) techniques
suffer from the overfitting problem. Overfitting happens when a patch is run
and the test suite does not reveal any error, but the patch actually does not
fix the underlying bug or it introduces a new defect that is not covered by the
test suite. Therefore, the patches generated by APR tools need to be validated
by human programmers, which can be very costly and hinders the adoption of APR
tools in practice.
Our work aims to increase developer trust in automated patch
generation by minimizing the number of plausible patches that they have to
review, thereby reducing the time required to find a correct patch. We
introduce a novel lightweight test-based patch clustering approach called
xTestCluster, which clusters patches based on their dynamic behavior.
xTestCluster is applied after the patch generation phase in order to analyze
the generated patches from one or more repair tools. The novelty of
xTestCluster lies in using information from the execution of newly generated
cases to cluster patches generated by multiple APR approaches. A cluster is
formed with patches that fail on the same generated test cases. The output from
xTestCluster gives developers a) a way of reducing the number of patches to
analyze, as they can focus on a sample of patches from each cluster, and
b) additional information attached to each patch. After analyzing 1910
plausible patches from 25 Java APR tools, our results show that xTestCluster is
able to reduce the number of patches to review and analyze by a median of
50%. xTestCluster can save a significant amount of time for developers who
have to review the multitude of patches generated by APR tools, and it
provides them with new test cases that show the differences in behavior
between the generated patches.
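The clustering rule described above is simple enough to sketch directly: patches that fail the same set of newly generated test cases land in the same cluster, so a reviewer only needs to inspect one representative per cluster. The patch and test names below are made up for illustration.

```python
# Minimal sketch of xTestCluster's grouping idea: cluster patches by the
# exact set of generated tests they fail. Patch/test names are hypothetical.
from collections import defaultdict

# Hypothetical execution results: patch id -> set of generated tests it fails
failing = {
    "patch_A": frozenset({"t1", "t3"}),
    "patch_B": frozenset({"t1", "t3"}),   # same dynamic behavior as patch_A
    "patch_C": frozenset({"t2"}),
    "patch_D": frozenset(),               # passes every generated test
}

clusters = defaultdict(list)
for patch, failed in failing.items():
    clusters[failed].append(patch)

for failed, patches in clusters.items():
    print(sorted(patches), "fail:", sorted(failed) or "nothing")
```

Here a reviewer would examine three representatives instead of four patches, and the failing-test sets themselves document how the clusters' behaviors differ.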
- …