169 research outputs found
BehAVExplor: Behavior Diversity Guided Testing for Autonomous Driving Systems
Testing Autonomous Driving Systems (ADSs) is a critical task for ensuring the
reliability and safety of autonomous vehicles. Existing methods mainly focus on
searching for safety violations while the diversity of the generated test cases
is ignored, which may generate many redundant test cases and failures. Such
redundant failures can reduce testing performance and increase failure analysis
costs. In this paper, we present a novel behavior-guided fuzzing technique
(BehAVExplor) to explore the different behaviors of the ego vehicle (i.e., the
vehicle controlled by the ADS under test) and detect diverse violations.
Specifically, we design an efficient unsupervised model, called BehaviorMiner,
to characterize the behavior of the ego vehicle. BehaviorMiner extracts the
temporal features from the given scenarios and performs a clustering-based
abstraction to group behaviors with similar features into abstract states. A
new test case will be added to the seed corpus if it triggers new behaviors
(e.g., cover new abstract states). Due to the potential conflict between the
behavior diversity and the general violation feedback, we further propose an
energy mechanism to guide the seed selection and the mutation. The energy of a
seed quantifies how good it is. We evaluated BehAVExplor on Apollo, an
industrial-level ADS, and the LGSVL simulation environment. Empirical evaluation
results show that BehAVExplor can effectively find more diverse violations than
the state-of-the-art
Retromorphic Testing: A New Approach to the Test Oracle Problem
A test oracle serves as a criterion or mechanism to assess the correspondence
between software output and the anticipated behavior for a given input set. In
automated testing, black-box techniques, known for their non-intrusive nature
in test oracle construction, are widely used, including notable methodologies
like differential testing and metamorphic testing. Inspired by the mathematical
concept of inverse function, we present Retromorphic Testing, a novel black-box
testing methodology. It leverages an auxiliary program in conjunction with the
program under test, which establishes a dual-program structure consisting of a
forward program and a backward program. The input data is first processed by
the forward program and then its program output is reversed to its original
input format using the backward program. In particular, the auxiliary program
can operate as either the forward or backward program, leading to different
testing modes. The process concludes by examining the relationship between the
initial input and the transformed output within the input domain. For example,
to test the implementation of the sine function sin(x), we can employ its
inverse function, arcsin(x), and validate the equation x = sin(arcsin(x)). In addition to the
high-level concept of Retromorphic Testing, this paper presents its three
testing modes with illustrative use cases across diverse programs, including
algorithms, traditional software, and AI applications
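The sine example from this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: it assumes the auxiliary program arcsin acts as the forward program, the sine implementation under test acts as the backward program, and the relation checked is sin(arcsin(x)) = x on [-1, 1]. The names `find_violations`, `samples`, and `tol` are hypothetical.

```python
import math

def find_violations(sine_under_test, samples=200, tol=1e-9):
    """Return inputs x in [-1, 1] where sine_under_test(asin(x)) does not recover x.

    Assumption: math.asin is trusted as the auxiliary (forward) program and
    sine_under_test is the backward program being checked.
    """
    violations = []
    for i in range(samples + 1):
        x = -1.0 + 2.0 * i / samples           # evenly spaced inputs in [-1, 1]
        recovered = sine_under_test(math.asin(x))
        if not math.isclose(recovered, x, rel_tol=tol, abs_tol=tol):
            violations.append(x)
    return violations

def truncated_sine(angle):
    """A deliberately inaccurate sine (two Taylor terms), used only as a faulty subject."""
    return angle - angle ** 3 / 6.0
```

Calling `find_violations(math.sin)` is expected to report nothing, while `find_violations(truncated_sine)` should flag inputs away from zero, with no hand-written expected outputs needed.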
Definition and Detection of Defects in NFT Smart Contracts
Recently, the birth of non-fungible tokens (NFTs) has attracted great
attention. NFTs are capable of representing users' ownership on the blockchain
and have experienced tremendous market sales due to their popularity.
Unfortunately, the high value of NFTs also makes them a target for attackers.
The defects in NFT smart contracts could be exploited by attackers to harm the
security and reliability of the NFT ecosystem. Despite the significance of this
issue, there is a lack of systematic work that focuses on analyzing NFT smart
contracts, which may raise worries about the security of users' NFTs. To
address this gap, in this paper, we introduce 5 defects in NFT smart contracts.
Each defect is defined and illustrated with a code example highlighting its
features and consequences, paired with possible solutions to fix it.
Furthermore, we propose a tool named NFTGuard to detect our defined defects
based on a symbolic execution framework. Specifically, NFTGuard extracts the
information of the state variables from the contract abstract syntax tree
(AST), which is critical for identifying variable-loading and storing
operations during symbolic execution. Furthermore, NFTGuard recovers
source-code-level features from the bytecode to effectively locate defects and
report them based on predefined detection patterns. We run NFTGuard on 16,527
real-world smart contracts and perform an evaluation based on the manually
labeled results. We find that 1,331 contracts contain at least one of the 5
defects, and the overall precision achieved by our tool is 92.6%. Comment: Accepted by ISSTA 202
CodeGrid: A Grid Representation of Code
Code representation is a key step in the application of AI in software engineering. Generic NLP representations are effective but do not exploit all the rich structure inherent to code. Recent work has focused on extracting abstract syntax trees (AST) and integrating their structural information into code representations. These AST-enhanced representations advanced the state of the art and accelerated new applications of AI to software engineering. ASTs, however, neglect important aspects of code structure, notably control and data flow, leaving some potentially relevant code signal unexploited. For example, purely image-based representations perform nearly as well as AST-based representations, despite the fact that they must learn to even recognize tokens, let alone their semantics. This result, from prior work, is strong evidence that these new code representations can still be improved; it also raises the question of just what signal image-based approaches are exploiting. We answer this question. We show that code is spatial and exploit this fact to propose CodeGrid, a new representation that embeds tokens into a grid that preserves code layout. Unlike some of the existing state of the art, CodeGrid is agnostic to the downstream task: whether that task is generation or classification, CodeGrid can complement the learning algorithm with spatial signal. For example, we show that CNNs, which are inherently spatially-aware models, can exploit CodeGrid outputs to effectively tackle fundamental software engineering tasks, such as code classification, code clone detection and vulnerability detection. PixelCNN leverages CodeGrid's grid representations to achieve code completion. Through extensive experiments, we validate our spatial code hypothesis, quantifying model performance as we vary the degree to which the representation preserves the grid.
To demonstrate its generality, we show that CodeGrid augments models, improving their performance on a range of tasks. On clone detection, CodeGrid improves ASTNN's performance by 3.3% F1 score
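To make the idea of a layout-preserving grid concrete, the sketch below places token ids at the row and column positions the tokens occupy in the source text. It is an illustration under stated assumptions, not the authors' implementation: tokens are simply whitespace-separated runs, ids are assigned in order of first appearance, empty cells hold a padding id, and a tab counts as one column. The function name `code_to_grid` is hypothetical.

```python
import re

def code_to_grid(source, pad=0):
    """Map source text to a 2D grid of token ids that mirrors the code layout.

    Each cell (row, col) holds the id of the token covering that character,
    or `pad` where the source has whitespace or the line is shorter.
    """
    lines = source.splitlines()
    width = max((len(line) for line in lines), default=0)
    vocab = {}                                   # token text -> integer id (1-based)
    grid = []
    for line in lines:
        row = [pad] * width
        for match in re.finditer(r"\S+", line):  # assumption: whitespace tokenization
            token_id = vocab.setdefault(match.group(), len(vocab) + 1)
            for col in range(match.start(), match.end()):
                row[col] = token_id
        grid.append(row)
    return grid, vocab
```

A grid of this shape could be fed to a spatially-aware model such as a CNN after replacing ids with embeddings; that step is omitted here.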
Analyzing Conflict Freedom For Multi-threaded Programs With Time Annotations
Avoiding access conflicts is a major challenge in the design of
multi-threaded programs. In the context of real-time systems, the absence of
conflicts can be guaranteed by ensuring that no two potentially conflicting
accesses are ever scheduled concurrently. In this paper, we analyze programs
that carry time annotations specifying the time for executing each statement.
We propose a technique for verifying that a multi-threaded program with time
annotations is free of access conflicts. In particular, we generate constraints
that reflect the possible schedules for executing the program and the required
properties. We then invoke an SMT solver in order to verify that no execution
gives rise to concurrent conflicting accesses. Otherwise, we obtain a trace
that exhibits the access conflict. Comment: http://journal.ub.tu-berlin.de/eceasst/article/view/97
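The check described in this abstract can be illustrated on a much simpler setting. The sketch below does not use an SMT solver and is not the authors' encoding: it assumes every thread is a straight-line sequence of statements, all threads start at time 0, and each time annotation is an exact duration, so the schedule is fully determined and plain interval arithmetic suffices. The statement format `(duration, variable, is_write)` and the function names are hypothetical.

```python
from itertools import combinations

def timed_accesses(statements, start=0):
    """Turn [(duration, variable, is_write), ...] into [(begin, end, variable, is_write), ...]."""
    accesses = []
    clock = start
    for duration, variable, is_write in statements:
        accesses.append((clock, clock + duration, variable, is_write))
        clock += duration
    return accesses

def find_conflicts(threads):
    """Return accesses from different threads that overlap in time on the same variable.

    Two accesses conflict when they touch the same variable, at least one of
    them writes, and their half-open intervals [begin, end) intersect.
    """
    timed = {name: timed_accesses(stmts) for name, stmts in threads.items()}
    conflicts = []
    for (name_a, acc_a), (name_b, acc_b) in combinations(timed.items(), 2):
        for begin_a, end_a, var_a, write_a in acc_a:
            for begin_b, end_b, var_b, write_b in acc_b:
                overlap = begin_a < end_b and begin_b < end_a
                if var_a == var_b and (write_a or write_b) and overlap:
                    conflicts.append((var_a, name_a, (begin_a, end_a), name_b, (begin_b, end_b)))
    return conflicts
```

An empty result plays the role of the "conflict-free" verdict, and each returned tuple plays the role of a witness trace. Handling duration ranges, branching, or flexible start times is where a constraint solver becomes necessary.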
Session-Based Recommender Systems for Action Selection in GUI Test Generation
Test generation at the graphical user interface (GUI) level has proven to be
an effective method to reveal faults. When doing so, a test generator has to
repeatedly decide what action to execute given the current state of the system
under test (SUT). This problem of action selection usually involves random
choice, which is often referred to as monkey testing. Some approaches leverage
other techniques to improve the overall effectiveness, but only a few try to
create human-like actions, or even entire action sequences. We have built a
novel session-based recommender system that can guide test generation. This
allows us to mimic past user behavior, reaching states that require complex
interactions. We present preliminary results from an empirical study, where we
use GitHub as the SUT. These results show that recommender systems appear to be
well-suited for action selection, and that the approach can significantly
contribute to the improvement of GUI-based test generation. Comment: 5 pages, 3 figures, to be published in ICSTW 202
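The abstract does not describe the recommender's internals, so the following is only a stand-in showing how recorded sessions might guide action selection: a first-order transition count over past user sessions, used to pick the most frequently observed next action among those currently available. The function names and the fallback behavior are assumptions.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count how often each action follows another across recorded user sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for previous, following in zip(session, session[1:]):
            counts[previous][following] += 1
    return counts

def recommend(counts, last_action, available):
    """Pick the available action seen most often after last_action, or None if unseen.

    Returning None lets the test generator fall back to random (monkey) choice.
    """
    ranked = counts.get(last_action, Counter())
    candidates = [action for action in available if ranked[action] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda action: ranked[action])
```

A test generator could call `recommend` at each step with the actions enabled in the current GUI state and choose randomly whenever it returns `None`.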
Parasol: Efficient Parallel Synthesis of Large Model Spaces
Formal analysis is an invaluable tool for software engineers, yet state-of-the-art formal analysis techniques suffer from well-known limitations in terms of scalability. In particular, some software design domains (such as tradeoff analysis and security analysis) require systematic exploration of potentially huge model spaces, which further exacerbates the problem. Despite this present and urgent challenge, few techniques exist to support the systematic exploration of large model spaces. This paper introduces Parasol, an approach and accompanying tool suite, to improve the scalability of large-scale formal model space exploration. Parasol presents a novel parallel model space synthesis approach, backed with unsupervised learning to automatically derive domain knowledge, guiding a balanced partitioning of the model space. This allows Parasol to synthesize the models in each partition in parallel, significantly reducing synthesis time and making large-scale systematic model space exploration for real-world systems more tractable. Our empirical results corroborate that Parasol substantially reduces (by 460% on average) the time required for model space synthesis, compared to state-of-the-art model space synthesis techniques relying on both incremental and parallel constraint solving technologies as well as competing, non-learning-based partitioning methods