Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis
Even with impressive advances in automated formal methods, certain problems
in system verification and synthesis remain challenging. Examples include the
verification of quantitative properties of software involving constraints on
timing and energy consumption, and the automatic synthesis of systems from
specifications. The major challenges include environment modeling,
incompleteness in specifications, and the complexity of underlying decision
problems.
This position paper proposes sciduction, an approach to tackle these
challenges by integrating inductive inference, deductive reasoning, and
structure hypotheses. Deductive reasoning, which leads from general rules or
concepts to conclusions about specific problem instances, includes techniques
such as logical inference and constraint solving. Inductive inference, which
generalizes from specific instances to yield a concept, includes algorithmic
learning from examples. Structure hypotheses are used to define the class of
artifacts, such as invariants or program fragments, generated during
verification or synthesis. Sciduction constrains inductive and deductive
reasoning using structure hypotheses, and actively combines inductive and
deductive reasoning: for instance, deductive techniques generate examples for
learning, and inductive reasoning is used to guide the deductive engines.
We illustrate this approach with three applications: (i) timing analysis of
software; (ii) synthesis of loop-free programs; and (iii) controller synthesis
for hybrid systems. Some future applications are also discussed.
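The loop the abstract describes, deductive reasoning supplying examples to a learner while the learner's hypotheses steer the deductive engine, is the pattern behind counterexample-guided inductive synthesis. A minimal sketch in Python, where the hidden specification, the hypothesis space of linear functions, and the bounded verification domain are all illustrative assumptions rather than anything from the paper:

```python
import itertools

def spec(x):
    """Hidden specification the synthesizer must match (illustrative)."""
    return 3 * x + 1

DOMAIN = range(-50, 51)   # structure hypothesis: verify on a bounded integer domain

def induce(examples):
    """Inductive step: learn a candidate f(x) = a*x + b consistent with all examples."""
    for a, b in itertools.product(range(-5, 6), repeat=2):
        if all(a * x + b == y for x, y in examples):
            return a, b
    raise RuntimeError("hypothesis space exhausted")

def deduce(a, b):
    """Deductive step: check the candidate against the spec on the whole
    domain; return a counterexample input, or None if verified."""
    for x in DOMAIN:
        if a * x + b != spec(x):
            return x
    return None

def cegis():
    """Counterexample-guided loop: the verifier's counterexamples feed the learner."""
    examples = [(0, spec(0))]
    while True:
        a, b = induce(examples)
        cex = deduce(a, b)
        if cex is None:
            return a, b
        examples.append((cex, spec(cex)))

print(cegis())  # (3, 1)
```

Here the structure hypothesis is the restriction to candidates of the form f(x) = a*x + b over a bounded domain, which keeps both the inductive search and the deductive check finite.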
The Oracle Problem in Software Testing: A Survey
Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behaviour is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain-specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice.
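Among the automation techniques the survey covers, metamorphic testing illustrates how the oracle problem can be partially sidestepped: instead of knowing the exact expected output for each input, one checks relations that must hold between outputs on related inputs. A minimal sketch, where the system under test and both relations are illustrative assumptions:

```python
import random

def sut_sort(xs):
    """System under test: a sort routine whose exact expected output we
    pretend not to know (no complete oracle available)."""
    return sorted(xs)

def check_metamorphic_relations(xs):
    """Check relations between outputs on related inputs instead of
    asserting one exact expected output."""
    out = sut_sort(xs)
    # MR1: permuting the input must not change the output.
    shuffled = xs[:]
    random.shuffle(shuffled)
    assert sut_sort(shuffled) == out
    # MR2: appending an element larger than all others puts it last.
    big = (max(xs) + 1) if xs else 0
    assert sut_sort(xs + [big]) == out + [big]

for _ in range(100):
    n = random.randint(0, 8)
    check_metamorphic_relations([random.randint(-10, 10) for _ in range(n)])
print("all metamorphic relations held")
```

Neither relation requires a human to decide what the correct output is, which is exactly the bottleneck the survey identifies.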
Can Large Language Models Write Good Property-Based Tests?
Property-based testing (PBT), while an established technique in the software
testing research community, is still relatively underused in real-world
software. Pain points in writing property-based tests include implementing
diverse random input generators and thinking of meaningful properties to test.
Developers, however, are more amenable to writing documentation; plenty of
library API documentation is available and can be used as natural language
specifications for property-based tests. As large language models (LLMs) have
recently shown promise in a variety of coding tasks, we explore the potential
of using LLMs to synthesize property-based tests. We call our approach PBT-GPT,
and propose three different strategies of prompting the LLM for PBT. We
characterize various failure modes of PBT-GPT and detail an evaluation
methodology for automatically synthesized property-based tests. PBT-GPT
achieves promising results in our preliminary studies on sample Python library
APIs.
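The two pain points the abstract names, writing diverse random input generators and thinking of meaningful properties, are visible even in a hand-rolled property-based test. A minimal sketch using only the standard library (the run-length encoder under test and the round-trip property are illustrative assumptions; frameworks such as Hypothesis add shrinking and smarter generation):

```python
import random

def gen_int_list(rng, max_len=10):
    """Random input generator: lists of small integers (illustrative)."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]

def rle_encode(xs):
    """Toy API under test (an assumption): run-length encoding."""
    out = []
    for x in xs:
        if out and out[-1][0] == x:
            out[-1][1] += 1
        else:
            out.append([x, 1])
    return out

def rle_decode(pairs):
    return [x for x, n in pairs for _ in range(n)]

def prop_roundtrip(xs):
    """Property: decoding an encoded list yields the original list."""
    return rle_decode(rle_encode(xs)) == xs

def run_pbt(prop, generator, trials=200, seed=0):
    """Run the property on many generated inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = generator(rng)
        if not prop(xs):
            return xs          # failing counterexample
    return None                # property held on all trials

print(run_pbt(prop_roundtrip, gen_int_list))
```

`run_pbt` returns `None` when the property held on every trial, or the first failing input otherwise; both the generator and the property are exactly the artifacts PBT-GPT asks an LLM to synthesize from documentation.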
Search-based Software Testing Driven by Automatically Generated and Manually Defined Fitness Functions
Search-based software testing (SBST) typically relies on fitness functions to
guide the search exploration toward software failures. There are two main
techniques to define fitness functions: (a) automated fitness function
computation from the specification of the system requirements and (b) manual
fitness function design. Both techniques have advantages. The former uses
information from the system requirements to guide the search toward portions of
the input domain that are more likely to contain failures. The latter uses the
engineers' domain knowledge. We propose ATheNA, a novel SBST framework that
combines fitness functions that are automatically generated from requirements
specifications and manually defined by engineers. We design and implement
ATheNA-S, an instance of ATheNA that targets Simulink models. We evaluate
ATheNA-S by considering a large set of models and requirements from different
domains. We compare our solution with an SBST baseline tool that supports
automatically generated fitness functions, and another one that supports
manually defined fitness functions. Our results show that ATheNA-S generates
more failure-revealing test cases than the baseline tools and that the
difference between the performance of ATheNA-S and the baseline tools is not
statistically significant. We also assess whether ATheNA-S could generate
failure-revealing test cases when applied to a large case study from the
automotive domain. Our results show that ATheNA-S successfully revealed a
requirement violation in our case study.
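The combination at the heart of ATheNA, an automatically generated fitness function and a manually defined one steering the same search, can be sketched as a weighted-sum search driver. Everything below (the toy system under test, both fitness functions, and the random-search loop) is an illustrative assumption, not ATheNA's actual implementation:

```python
import math
import random

def sut(x):
    """Toy system under test (an assumption): its output spikes above the
    requirement threshold of 100 near x = 7."""
    return 90 + 15 * math.exp(-(x - 7) ** 2)

def auto_fitness(x):
    """Fitness derived automatically from the requirement 'output <= 100':
    the satisfaction margin, negative when the requirement is violated."""
    return 100 - sut(x)

def manual_fitness(x):
    """Engineer-defined fitness encoding domain knowledge: failures are
    suspected near x = 7."""
    return abs(x - 7)

def search(trials=1000, w_auto=1.0, w_manual=0.5, seed=1):
    """Random search minimizing a weighted sum of both fitness functions."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(trials):
        x = rng.uniform(0.0, 20.0)
        f = w_auto * auto_fitness(x) + w_manual * manual_fitness(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, auto_fitness(best_x) < 0   # (best input, violated?)

x, violated = search()
print(f"best input x = {x:.3f}, requirement violated: {violated}")
```

The automated term pulls the search toward low requirement margins, while the manual term encodes the engineer's hunch about where failures hide; either alone can stall, which is the motivation the abstract gives for combining them.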
Generating Log File Analyzers
Software testing is a crucial part of the software development process, because it helps developers ensure that the software works correctly and according to stakeholders’ requirements and specifications. Faulty or problematic software can cause huge financial losses. Automation of testing tasks can have a positive impact on software development, by reducing costs and minimizing human error. Software testing can be divided into three tasks: choosing test cases, running test cases on the software under test (SUT), and evaluating the test results. To evaluate test results, testers need to examine the output of the SUT to determine if it performed as expected. Programs often store some of their outputs in files known as log files. The task of evaluating test results can be automated by using a log file analyzer. The main goal of this thesis is to design an approach to generate log file analyzers based on a set of state machine specifications. Our analyzers are generated in C++ and are capable of reading log files from disk or shared memory areas. Regular expressions have been incorporated, so that analyzers can be adapted to different logging policies. We analyze the purpose and benefits of this framework and discuss differences with a previous implementation based on Prolog. In particular, we discuss the results of a series of experiments that we performed in order to compare the performance between Prolog-based analyzers and C++ analyzers. Our results show that C++ analyzers are between 8 and 15 times faster than Prolog-based analyzers.
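The core idea, a log file analyzer driven by a state machine whose transitions are guarded by regular expressions, can be sketched as follows. The state machine, log format, and verdicts below are illustrative assumptions; the thesis generates C++ analyzers from such specifications:

```python
import re

# State machine specification: state -> list of (regex, next_state).
# A log passes iff replaying its lines ends in an accepting state.
SPEC = {
    "INIT":    [(re.compile(r"STARTUP ok"), "RUNNING")],
    "RUNNING": [(re.compile(r"REQUEST id=\d+"), "RUNNING"),
                (re.compile(r"SHUTDOWN clean"), "DONE")],
    "DONE":    [],
}
ACCEPTING = {"DONE"}

def analyze(lines):
    """Replay log lines through the state machine; return (verdict, final state)."""
    state = "INIT"
    for line in lines:
        for pattern, nxt in SPEC[state]:
            if pattern.search(line):
                state = nxt
                break
        else:
            return "FAIL", state        # no transition matched this line
    return ("PASS" if state in ACCEPTING else "FAIL"), state

log = ["STARTUP ok", "REQUEST id=1", "REQUEST id=2", "SHUTDOWN clean"]
print(analyze(log))   # ('PASS', 'DONE')
```

The regular expressions are what let the same state machine be adapted to different logging policies; a generated analyzer compiles such a specification into native matching code rather than interpreting it.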
Generating Automated and Online Test Oracles for Simulink Models with Continuous and Uncertain Behaviors
Test automation requires automated oracles to assess test outputs. For cyber-physical systems (CPS), oracles, in addition to being automated, should meet some key objectives: (i) they should check test outputs in an online manner, to stop expensive test executions as soon as a failure is detected; (ii) they should handle time- and magnitude-continuous CPS behaviors; (iii) they should provide a quantitative degree of satisfaction or failure measure instead of binary pass/fail outputs; and (iv) they should be able to handle uncertainties due to CPS interactions with the environment. We propose an automated approach to translate CPS requirements specified in a logic-based language into test oracles specified in Simulink, a widely used development and simulation language for CPS. Our approach achieves the objectives noted above through the identification of a fragment of Signal First-Order Logic (SFOL) to specify requirements, the definition of a quantitative semantics for this fragment, and a sound translation of the fragment into Simulink. The results from applying our approach to 11 industrial case studies show that: (i) our requirements language can express all 98 requirements of our case studies; (ii) the time and effort required by our approach are acceptable, showing potential for the adoption of our work in practice; and (iii) for large models, our approach can dramatically reduce the test execution time compared to when test outputs are checked in an offline manner.
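Objective (iii), a quantitative degree of satisfaction rather than a binary verdict, and objective (i), online checking, can be sketched for a single requirement of the form "always s(t) <= threshold". The requirement and the sampled signals are illustrative assumptions, not the paper's SFOL fragment or its Simulink translation:

```python
def robustness_always_leq(signal, threshold):
    """Quantitative oracle for 'always s(t) <= threshold' on a sampled
    signal: the worst-case satisfaction margin. Positive means satisfied
    with that margin; negative means violated by that much."""
    return min(threshold - s for s in signal)

def online_oracle(samples, threshold):
    """Online checking: maintain the running margin and stop the (expensive)
    execution as soon as the requirement is definitively violated."""
    worst = float("inf")
    for t, s in enumerate(samples):
        worst = min(worst, threshold - s)
        if worst < 0:
            return worst, t       # failure detected; abort early at step t
    return worst, None

print(robustness_always_leq([1.0, 2.5, 1.8], 3.0))   # 0.5
print(online_oracle([1.0, 3.5, 1.0, 1.0], 3.0))      # (-0.5, 1)
```

The margin, not just its sign, is what lets a search-based tester rank near-misses; the online variant is what allows long simulations to be cut short at the first violation.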
Falsification of Cyber-Physical Systems with Robustness-Guided Black-Box Checking
For exhaustive formal verification, industrial-scale cyber-physical systems
(CPSs) are often too large and complex, and lightweight alternatives (e.g.,
monitoring and testing) have attracted the attention of both industrial
practitioners and academic researchers. Falsification is a popular testing
method for CPSs that uses stochastic optimization. In state-of-the-art
falsification methods, the results of previous falsification trials are
discarded, and each new trial starts without any prior knowledge. To
concisely memorize such prior information about the CPS model and exploit it, we
employ Black-box checking (BBC), which is a combination of automata learning
and model checking. Moreover, we enhance BBC using the robust semantics of STL
formulas, which is the essential gadget in falsification. Our experimental
results suggest that our robustness-guided BBC outperforms a state-of-the-art
falsification tool.
Comment: Accepted to HSCC 202
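The robust semantics of STL is what turns falsification into optimization: each simulation yields a robustness value instead of a pass/fail verdict, and the search drives that value negative. A minimal random-search sketch (the model, the STL requirement, and the search strategy are illustrative assumptions; the paper additionally layers automata learning on top via black-box checking):

```python
import random

def simulate(u):
    """Toy CPS model (an assumption): constant input u drives an output trace."""
    return [u * t * 0.1 for t in range(20)]

def robustness(trace, threshold=5.0):
    """Robust semantics of the STL formula 'always output < threshold':
    the minimum margin over the trace; negative means falsified."""
    return min(threshold - y for y in trace)

def falsify(trials=100, seed=0):
    """Stochastic-optimization falsification: sample inputs, track the one
    with the lowest robustness, stop as soon as robustness goes negative."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    for _ in range(trials):
        u = rng.uniform(0.0, 5.0)
        rob = robustness(simulate(u))
        best = min(best, (rob, u))
        if rob < 0:
            return u, rob          # falsifying input found
    return best[1], best[0]        # best effort: no violation found

u, rob = falsify()
print(f"input {u:.2f} gives robustness {rob:.2f}")
```

In this sketch each trial is independent, which is exactly the memorylessness the paper criticizes; the learned automaton in robustness-guided BBC is what carries information between trials.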