    BehAVExplor: Behavior Diversity Guided Testing for Autonomous Driving Systems

    Testing Autonomous Driving Systems (ADSs) is a critical task for ensuring the reliability and safety of autonomous vehicles. Existing methods mainly focus on searching for safety violations while ignoring the diversity of the generated test cases, which may produce many redundant test cases and failures. Such redundant failures can reduce testing performance and increase failure analysis costs. In this paper, we present a novel behavior-guided fuzzing technique (BehAVExplor) to explore the different behaviors of the ego vehicle (i.e., the vehicle controlled by the ADS under test) and detect diverse violations. Specifically, we design an efficient unsupervised model, called BehaviorMiner, to characterize the behavior of the ego vehicle. BehaviorMiner extracts the temporal features from the given scenarios and performs a clustering-based abstraction to group behaviors with similar features into abstract states. A new test case is added to the seed corpus if it triggers new behaviors (e.g., covers new abstract states). Due to the potential conflict between the behavior diversity and the general violation feedback, we further propose an energy mechanism to guide the seed selection and the mutation. The energy of a seed quantifies how good it is. We evaluated BehAVExplor on Apollo, an industrial-level ADS, and the LGSVL simulation environment. Empirical evaluation results show that BehAVExplor can effectively find more diverse violations than the state-of-the-art

    Retromorphic Testing: A New Approach to the Test Oracle Problem

    A test oracle serves as a criterion or mechanism to assess the correspondence between software output and the anticipated behavior for a given input set. In automated testing, black-box techniques, known for their non-intrusive nature in test oracle construction, are widely used, including notable methodologies like differential testing and metamorphic testing. Inspired by the mathematical concept of inverse function, we present Retromorphic Testing, a novel black-box testing methodology. It leverages an auxiliary program in conjunction with the program under test, which establishes a dual-program structure consisting of a forward program and a backward program. The input data is first processed by the forward program and then its program output is reversed to its original input format using the backward program. In particular, the auxiliary program can operate as either the forward or backward program, leading to different testing modes. The process concludes by examining the relationship between the initial input and the transformed output within the input domain. For example, to test the implementation of the sine function $\sin(x)$, we can employ its inverse function, $\arcsin(x)$, and validate the equation $x = \sin(\arcsin(x) + 2k\pi), \forall k \in \mathbb{Z}$. In addition to the high-level concept of Retromorphic Testing, this paper presents its three testing modes with illustrative use cases across diverse programs, including algorithms, traditional software, and AI applications
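
The sine example in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool: the helper name `retromorphic_check`, the range of `k` values, and the tolerance are hypothetical choices. Here `math.asin` plays the auxiliary forward program and `math.sin` is the program under test, acting as the backward program.

```python
import math


def retromorphic_check(forward, backward, x, k_values=range(-2, 3), tol=1e-9):
    """Check that backward(forward(x) + 2*k*pi) recovers x for each sampled k.

    `forward` and `backward` are plain callables; the name, the sampled
    k range, and the tolerance are illustrative assumptions.
    """
    y = forward(x)
    return all(
        math.isclose(backward(y + 2 * k * math.pi), x, abs_tol=tol)
        for k in k_values
    )


# Sample the input domain of arcsin, [-1, 1], and collect violating inputs.
inputs = [i / 10 for i in range(-10, 11)]
failures = [x for x in inputs if not retromorphic_check(math.asin, math.sin, x)]
print("violations:", failures)
```

An empty `failures` list means no sampled input violated the relation; it does not prove the implementation correct, since only finitely many `x` and `k` values are checked.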

    Definition and Detection of Defects in NFT Smart Contracts

    Recently, the birth of non-fungible tokens (NFTs) has attracted great attention. NFTs are capable of representing users' ownership on the blockchain and have experienced tremendous market sales due to their popularity. Unfortunately, the high value of NFTs also makes them a target for attackers. The defects in NFT smart contracts could be exploited by attackers to harm the security and reliability of the NFT ecosystem. Despite the significance of this issue, there is a lack of systematic work that focuses on analyzing NFT smart contracts, which may raise worries about the security of users' NFTs. To address this gap, in this paper, we introduce 5 defects in NFT smart contracts. Each defect is defined and illustrated with a code example highlighting its features and consequences, paired with possible solutions to fix it. Furthermore, we propose a tool named NFTGuard to detect our defined defects based on a symbolic execution framework. Specifically, NFTGuard extracts the information of the state variables from the contract abstract syntax tree (AST), which is critical for identifying variable-loading and storing operations during symbolic execution. Furthermore, NFTGuard recovers source-code-level features from the bytecode to effectively locate defects and report them based on predefined detection patterns. We run NFTGuard on 16,527 real-world smart contracts and perform an evaluation based on the manually labeled results. We find that 1,331 contracts contain at least one of the 5 defects, and the overall precision achieved by our tool is 92.6%. Comment: Accepted by ISSTA 202

    CodeGrid: A Grid Representation of Code

    Code representation is a key step in the application of AI in software engineering. Generic NLP representations are effective but do not exploit all the rich structure inherent to code. Recent work has focused on extracting abstract syntax trees (AST) and integrating their structural information into code representations. These AST-enhanced representations advanced the state of the art and accelerated new applications of AI to software engineering. ASTs, however, neglect important aspects of code structure, notably control and data flow, leaving some potentially relevant code signal unexploited. For example, purely image-based representations perform nearly as well as AST-based representations, despite the fact that they must learn to even recognize tokens, let alone their semantics. This result, from prior work, is strong evidence that these new code representations can still be improved; it also raises the question of just what signal image-based approaches are exploiting. We answer this question. We show that code is spatial and exploit this fact to propose CodeGrid, a new representation that embeds tokens into a grid that preserves code layout. Unlike some of the existing state of the art, CodeGrid is agnostic to the downstream task: whether that task is generation or classification, CodeGrid can complement the learning algorithm with spatial signal. For example, we show that CNNs, which are inherently spatially-aware models, can exploit CodeGrid outputs to effectively tackle fundamental software engineering tasks, such as code classification, code clone detection and vulnerability detection. PixelCNN leverages CodeGrid's grid representations to achieve code completion. Through extensive experiments, we validate our spatial code hypothesis, quantifying model performance as we vary the degree to which the representation preserves the grid. To demonstrate its generality, we show that CodeGrid augments models, improving their performance on a range of tasks. On clone detection, CodeGrid improves ASTNN's performance by 3.3% F1 score
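
The idea of placing tokens in a grid that preserves code layout can be illustrated with a toy sketch. This is not the representation from the paper: the function `code_to_grid`, the whitespace-based tokenization, and the use of integer token ids in place of learned embeddings are all assumptions made here for illustration.

```python
import re


def code_to_grid(source, pad=0):
    """Map source text to a rectangular grid of integer token ids.

    Each cell (row, column) holds the id of the token covering that
    character position, or `pad` for whitespace and right-padding.
    Splitting on whitespace is a simplifying assumption; a real system
    would use a language-aware tokenizer and vector embeddings.
    """
    lines = source.splitlines()
    width = max((len(line) for line in lines), default=0)
    vocab = {}
    grid = []
    for line in lines:
        row = [pad] * width
        for match in re.finditer(r"\S+", line):
            # Assign ids in order of first appearance, starting at 1.
            token_id = vocab.setdefault(match.group(), len(vocab) + 1)
            for col in range(match.start(), match.end()):
                row[col] = token_id
        grid.append(row)
    return grid, vocab


snippet = "if x:\n    y = x\n"
grid, vocab = code_to_grid(snippet)
for row in grid:
    print(row)
```

Because indentation and column positions survive in the grid, a spatially aware model such as a CNN could in principle pick up layout cues that a flat token sequence discards.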

    Analyzing Conflict Freedom For Multi-threaded Programs With Time Annotations

    Avoiding access conflicts is a major challenge in the design of multi-threaded programs. In the context of real-time systems, the absence of conflicts can be guaranteed by ensuring that no two potentially conflicting accesses are ever scheduled concurrently. In this paper, we analyze programs that carry time annotations specifying the time for executing each statement. We propose a technique for verifying that a multi-threaded program with time annotations is free of access conflicts. In particular, we generate constraints that reflect the possible schedules for executing the program and the required properties. We then invoke an SMT solver in order to verify that no execution gives rise to concurrent conflicting accesses. Otherwise, we obtain a trace that exhibits the access conflict. Comment: http://journal.ub.tu-berlin.de/eceasst/article/view/97
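
The notion of a conflict between time-annotated accesses can be made concrete with a small sketch. Note the substitution: the abstract describes encoding all possible schedules as constraints for an SMT solver, whereas the code below only checks one fixed schedule by direct interval comparison. The data layout (a start time per thread and `(duration, variable, is_write)` statements) and all names are hypothetical.

```python
from itertools import combinations


def access_intervals(start, statements):
    """Turn (duration, variable_or_None, is_write) statements into
    half-open access intervals (begin, end, variable, is_write)."""
    t = start
    intervals = []
    for duration, var, is_write in statements:
        if var is not None:
            intervals.append((t, t + duration, var, is_write))
        t += duration
    return intervals


def find_conflict(threads):
    """threads maps a name to (start_time, statements).

    Returns (thread_a, thread_b, variable, time) for the first pair of
    overlapping accesses to the same variable where at least one is a
    write, or None if this fixed schedule is conflict-free.
    """
    per_thread = {
        name: access_intervals(start, stmts)
        for name, (start, stmts) in threads.items()
    }
    for (a, acc_a), (b, acc_b) in combinations(per_thread.items(), 2):
        for s1, e1, v1, w1 in acc_a:
            for s2, e2, v2, w2 in acc_b:
                if v1 == v2 and (w1 or w2) and s1 < e2 and s2 < e1:
                    return (a, b, v1, max(s1, s2))
    return None


writer = [(2, "x", True), (3, None, False)]
reader = [(1, None, False), (2, "x", False)]
safe = find_conflict({"t1": (0, writer), "t2": (1, reader)})
racy = find_conflict({"t1": (0, writer), "t2": (0, reader)})
print(safe, racy)
```

The returned tuple plays the role of a minimal witness for the conflict; a solver-based analysis as described in the abstract would instead cover every admissible schedule rather than a single one.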

    Session-Based Recommender Systems for Action Selection in GUI Test Generation

    Test generation at the graphical user interface (GUI) level has proven to be an effective method to reveal faults. When doing so, a test generator has to repeatedly decide what action to execute given the current state of the system under test (SUT). This problem of action selection usually involves random choice, which is often referred to as monkey testing. Some approaches leverage other techniques to improve the overall effectiveness, but only a few try to create human-like actions, or even entire action sequences. We have built a novel session-based recommender system that can guide test generation. This allows us to mimic past user behavior, reaching states that require complex interactions. We present preliminary results from an empirical study, where we use GitHub as the SUT. These results show that recommender systems appear to be well-suited for action selection, and that the approach can significantly contribute to the improvement of GUI-based test generation. Comment: 5 pages, 3 figures, to be published in ICSTW 202

    Parasol: Efficient Parallel Synthesis of Large Model Spaces

    Formal analysis is an invaluable tool for software engineers, yet state-of-the-art formal analysis techniques suffer from well-known limitations in terms of scalability. In particular, some software design domains—such as tradeoff analysis and security analysis—require systematic exploration of potentially huge model spaces, which further exacerbates the problem. Despite this present and urgent challenge, few techniques exist to support the systematic exploration of large model spaces. This paper introduces Parasol, an approach and accompanying tool suite, to improve the scalability of large-scale formal model space exploration. Parasol presents a novel parallel model space synthesis approach, backed with unsupervised learning to automatically derive domain knowledge, guiding a balanced partitioning of the model space. This allows Parasol to synthesize the models in each partition in parallel, significantly reducing synthesis time and making large-scale systematic model space exploration for real-world systems more tractable. Our empirical results corroborate that Parasol substantially reduces (by 460% on average) the time required for model space synthesis, compared to state-of-the-art model space synthesis techniques relying on both incremental and parallel constraint solving technologies as well as competing, non-learning-based partitioning methods
