Synthesizing Adaptive Test Strategies from Temporal Logic Specifications by Bloem, Roderick et al.
Roderick Bloem1, Goerschwin Fey2,3, Fabian Greif3, Robert Könighofer1,
Ingo Pill1, Heinz Riener3,4, and Franz Röck1
1Graz University of Technology, Graz, Austria
2Hamburg University of Technology, Hamburg, Germany
3German Aerospace Center, Bremen, Germany
4EPFL, Lausanne, Switzerland
Abstract. Constructing good test cases is difficult and time-consuming,
especially if the system under test is still under development and its exact
behavior is not yet fixed. We propose a new approach to compute test
strategies for reactive systems from a given temporal logic specification
using formal methods. The computed strategies are guaranteed to reveal
certain simple faults in every realization of the specification and for
every behavior of the uncontrollable part of the system’s environment.
The proposed approach supports different assumptions on occurrences
of faults (ranging from a single transient fault to a persistent fault) and
by default aims at unveiling the weakest one. Based on well-established
hypotheses from fault-based testing, we argue that such tests are also
sensitive for more complex bugs. Since the specification may not define
the system behavior completely, we use reactive synthesis algorithms with
partial information. The computed strategies are adaptive test strategies
that react to behavior at runtime. We work out the underlying theory
of adaptive test strategy synthesis and present experiments for a safety-
critical component of a real-world satellite system. We demonstrate that
our approach can be applied to industrial specifications and that the
synthesized test strategies are capable of detecting bugs that are hard to
detect with random testing.
arXiv:1809.01607v1 [cs.SE] 5 Sep 2018
1 Introduction
Model checking [11,47] is an algorithmic approach to prove that a model of a
system adheres to its specification. However, model checking cannot always be
applied effectively to obtain confidence in the correctness of a system. Possible
reasons include scalability issues, third-party IP components for which no code or
detailed model is available, or a high effort for building system models that are
sufficiently precise. Moreover, model checking cannot verify the final and “live”
product but only an (abstracted) model.
Testing is a natural alternative to complement formal methods like model
checking, and automatic test case generation helps keep the effort acceptable.
Black-box testing techniques, where tests are derived from a specification rather
than the implementation, are particularly attractive: first, tests can be computed
before the implementation phase starts, and thus guide the development. Second,
the same tests can be reused across different realizations of a given specification.
Third, a specification is usually much simpler than its implementation, which gives
a scalability advantage. At the same time, the specification focuses on critical
functional aspects that require thorough testing. Fault-based techniques [28] are
particularly appealing, where the computed tests are guaranteed to reveal all
faults in a certain fault class — after all, the foremost goal in testing is to detect
bugs.
Methods to derive tests from declarative requirements (see, e.g., [24]) are
sparse. One issue in this setting is controllability: the requirements leave plenty
of implementation freedom, so they cannot be used to fully predict the system
behavior for all given inputs. Consequently, test cases have to be adaptive, i.e., able
to react to observed behavior at runtime, rather than being fixed input sequences.
This is particularly true for reactive systems that continuously interact with their
environment. Existing methods often work around this complication by requiring
a deterministic system model as additional input [23]. Even a probabilistic model
fixes the behavior in a way not necessarily required by the specification.
In previous work, we presented a fault-based approach to compute adaptive
test strategies for reactive systems [9]. This approach generates tests that enforce
certain coverage goals for every implementation of a provided specification. The
generated tests can be used across realizations of the specification that differ not
only in implementation details but also in their observable behavior. This is, e.g.,
useful for standards and protocols that are implemented by multiple vendors or
for systems under development, where the exact behavior is not yet fixed.
Fig. 1 outlines the assumed testing setup and shows how the approach for
synthesizing adaptive test strategies (illustrated in black) can be integrated in
an existing testing flow. The user provides a specification ϕ, which describes the
requirements of the system under test (SUT) and additionally a fault model δ,
which defines the coverage goal in terms of a class of faults for which the tests
shall cause a specification violation. Both the specification and the coverage goal
are expressed in Linear Temporal Logic (LTL) [45]. By default, our approach
supports the detection of transient and permanent faults and distinguishes four
fault occurrence frequencies: faults that occur at least (1) once, (2) repeatedly,
(3) from some point on, or (4) permanently. The approach then automatically
synthesizes a test strategy to reveal a fault for the lowest frequency possible. Such
a test strategy guarantees to cause a specification violation if the fault occurs
with the defined fault occurrence (and all higher fault occurrence frequencies)
and the test is executed long enough. Besides the four default fault occurrence
frequencies, a user can also provide a custom frequency using LTL.
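As an illustration of the four default frequencies, the following block writes them out for one assumed fault kind, a stuck-at-0 fault at a single output o_i. The names δ1 to δ4 are ours and serve only this sketch; other fault kinds (for example bit-flips) follow the same pattern with a different inner formula.

```latex
\begin{align*}
\text{(1) at least once:}       &\quad \delta_1 = \mathsf{F}\,\neg o_i \\
\text{(2) repeatedly:}          &\quad \delta_2 = \mathsf{G}\,\mathsf{F}\,\neg o_i \\
\text{(3) from some point on:}  &\quad \delta_3 = \mathsf{F}\,\mathsf{G}\,\neg o_i \\
\text{(4) permanently:}         &\quad \delta_4 = \mathsf{G}\,\neg o_i \\[4pt]
\text{ordering:}                &\quad \delta_4 \rightarrow \delta_3 \rightarrow \delta_2 \rightarrow \delta_1
\end{align*}
```

The implication chain in the last line is why a strategy that reveals the fault at a lower frequency also reveals it at all higher ones.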
Under the hood, reactive synthesis [46] with partial information [32] is used,
which provides strong guarantees about all uncertainties: if synthesis is successful
and if the computed tests are executed long enough, they reveal all faults from
the fault model for every realization of the specification and every behavior of
the uncontrollable part of the system’s environment. Uncontrollable environment
aspects can be seen as part of the system for the purpose of testing. Finally,
existing techniques from runtime verification [6] can be used to build an oracle
that checks the system behavior against the specification while tests are executed.1
Fig. 1: Testing setup: this paper focuses on test strategy synthesis. [Diagram not reproduced. Test case generation: the specification ϕ and the fault model δ feed the synthesis of an adaptive test strategy, and the specification feeds oracle synthesis. Test case execution: the adaptive test strategy provides inputs to the system under test (SUT), and the oracle observes inputs and outputs and issues a verdict (pass, fail, or inconclusive).]
This paper is an extension of [9]. In summary, this paper presents the following
contributions:
– An approach to compute adaptive test strategies for reactive systems from
temporal specifications that provide implementation freedom. The tests are
guaranteed to reveal certain bugs for every realization of the specification.
– The underlying theory is considered in detail, i.e., we show that the approach
is sound and complete for many interesting cases and provide additional
solutions for other cases that may arise in practice.
– A proof of concept tool, called PARTYStrategy2, that is capable of generating
multiple different test strategies, implemented on top of the synthesis tool
PARTY [30].
– A post-processing procedure to generalize a test strategy by eliminating input
constraints not necessary to guarantee a coverage goal.
– A case study with a safety-critical software component of a real-world satellite
system developed in the German Aerospace Center (DLR). We specify the
system in LTL, synthesize test strategies, and evaluate the generated adaptive
test strategies using code coverage and mutation coverage metrics. Our
synthesized test strategies increase both the mutation coverage as well as
the code coverage of random test cases by activating behaviors that require
complex input sequences that are unlikely to be produced by random testing.
The remainder of this paper is organized as follows: Section 2 illustrates our
approach and presents a motivating example. Section 3 discusses related work.
Section 4 gives preliminaries and notation. Our test case generation approach
is then worked out in detail in Section 5. Section 6 presents the case study and
discusses results. Section 7 concludes.
1 While the semantics of LTL are defined over infinite execution traces, we can only
run the tests for a finite amount of time. This can result in inconclusive verdicts [6].
We exclude this issue from the scope of this paper, relying on the user to judge when
tests have been executed long enough, and on existing research on interpreting LTL
over finite traces [38,26,14,13].
2 PARTYStrategy, https://www.iaik.tugraz.at/content/research/scos/tools/
Fig. 2: Traffic light example. [Diagram not reproduced: a highway and farm road crossing, with the SUT reading input c and driving outputs h, f, and p.]
2 Motivating Example
Let us develop a traffic light controller for the scenario depicted in Fig. 2. For
this highway and farmroad crossing, the controller’s Boolean input signal c
describes whether a car is idling at the farmroad. Boolean outputs h and f
control the highway and farmroad traffic lights respectively, where a value of true
means a green light. Output p controls a camera that takes a picture if a car on
the farmroad makes a fast start, i.e., races off immediately when the farmroad
light turns green. The controller then should implement the following critical
properties:
1. The traffic lights must never be green simultaneously.
2. If a car is waiting at the farmroad, f eventually turns true.
3. If no car is waiting at the farmroad, h eventually becomes true.
4. A picture is taken if a car on the farmroad makes a fast start.
We model the four properties in Linear Temporal Logic (LTL) [45] as
ϕ1 = G(¬f ∨ ¬h) (1)
ϕ2 = G(c → F f) (2)
ϕ3 = G(¬c → F h) (3)
ϕ4 = G((¬f ∧ X(c ∧ f ∧ X¬c)) ↔ XXp) (4)
where the operator G denotes always, F denotes eventually, and X denotes in the
next step.
The resulting specification is then:
ϕ = ϕ1 ∧ ϕ2 ∧ ϕ3 ∧ ϕ4
To compute a test strategy (only from the specification) that enforces a
specification violation by the system under the existence of a certain fault (or
class of faults), we have some requirements for our approach.
Fig. 3: Two adaptive test strategies for the traffic light controller: on the left, T1
that enforces p = true once. On the right, T2 that enforces p = true infinitely
often. [State-machine diagrams not reproduced; states are labeled with the value of c and edges with conditions on f.]
Enforcing test objectives To mitigate scalability issues, we compute test cases
directly from the specification ϕ. Note that ϕ focuses on the desired properties
only, and allows for plenty of implementation freedom. Our goal is to compute
tests that enforce certain coverage objectives independent of this implementation
freedom. Some uncertainties about the SUT behavior may actually be rooted
in uncontrollable environment aspects (such as weather conditions) rather than
implementation freedom inside the system. But for our testing approach, this
makes no difference. We can force the farmroad’s traffic light to turn green
(f=true) by relying on a correct implementation of Property 2 and setting c=true.
Depending on how the system is implemented, f=true might also be achieved by
setting c=false all the time, but this is not guaranteed.
Adaptive test strategies Certain test goals may not be enforceable with a static
input sequence. For our example, for p to be true, a car must do a fast start.
Yet, the specification does not prescribe the exact point in time when the traffic
light turns to green. We thus synthesize adaptive test strategies that guide the
controller’s inputs based on the previous inputs and outputs and, therefore, can
take advantage of situational possibilities by exploiting previous system behavior.
Fig. 3 shows a test strategy T1 (on the left) to reach p=true, illustrated as a
state machine. States are labeled by the value of controller input c (which is an
output of the test strategy T1). Edges represent transitions and are labeled with
conditions on observed output values (since the SUT’s outputs are inputs for the
test strategy). First, c is set to false to provoke h=true via Property 3, implying
f=false via Property 1. As soon as this happens, the strategy traverses to the
middle state, setting c=true in order to have f=true eventually (Property 2). As
soon as f switches from false to true, T1 sets c=false in the rightmost state to
trigger a picture (Property 4). A system with a permanent stuck-at-0 fault at
signal p is unable to satisfy the specification and the resulting violation can be
detected by a runtime verification technique.
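To make the walk-through of T1 concrete, the following minimal sketch runs it against one sample controller. The state numbering of T1 is our reading of Fig. 3. `DelayedController`, `run_t1`, and `violates_property_4` are hypothetical names introduced for this illustration; the controller (f follows c with one step of delay, h is the negation of f) is just one of many realizations of ϕ, and the check of Property 4 works on a finite prefix only.

```python
# Hypothetical encoding of T1 (our reading of Fig. 3): state -> value of c.
T1_INPUT = {0: False, 1: True, 2: False}


def t1_next(state, f):
    """Next strategy state after observing the farmroad light f."""
    if state == 0:              # wait until the farmroad light is red
        return 1 if not f else 0
    if state == 1:              # request green, wait until it turns green
        return 2 if f else 1
    return 2                    # final state: keep c = False


class DelayedController:
    """Assumed sample SUT: f follows c with one step of delay, h = not f."""

    def __init__(self, p_stuck_at_0=False):
        self.p_stuck_at_0 = p_stuck_at_0
        self.history = []       # past (c, f) pairs

    def step(self, c):
        f = self.history[-1][0] if self.history else False
        h = not f               # Property 1: never both green
        p = False
        if len(self.history) >= 2:
            (_, f_two_ago), (c_prev, f_prev) = self.history[-2], self.history[-1]
            # Property 4: picture iff red, then car and green, then car gone.
            p = (not f_two_ago) and c_prev and f_prev and (not c)
        if self.p_stuck_at_0:
            p = False           # injected permanent stuck-at-0 fault
        self.history.append((c, f))
        return {"c": c, "h": h, "f": f, "p": p}


def run_t1(sut, steps):
    """Execute T1 on the SUT for a finite number of steps."""
    state, trace = 0, []
    for _ in range(steps):
        letter = sut.step(T1_INPUT[state])
        trace.append(letter)
        state = t1_next(state, letter["f"])
    return trace


def violates_property_4(trace):
    """Finite-prefix check of Property 4 (a stand-in for a real oracle)."""
    for i in range(len(trace) - 2):
        a, b, c = trace[i], trace[i + 1], trace[i + 2]
        expected = (not a["f"]) and b["c"] and b["f"] and (not c["c"])
        if expected != c["p"]:
            return True
    return False
```

With this controller, T1 waits one extra step in its middle state before the light turns green, which a fixed input sequence tuned to a different controller could miss.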
Coverage objectives We follow a fault-centered approach to define the test
objectives to enforce. The user defines a class of (potentially transient) faults.
Our approach then computes adaptive test strategies (in form of state machines)
that detect these faults. For a permanent stuck-at-0 fault at signal p, our approach
could produce the test strategy T1 from the previous paragraph: for any correct
implementation of ϕ, the strategy enforces p becoming true at least once. Thus,
a faulty version where p is always false necessarily violates the specification,
which can be detected [6] during test strategy execution. The test strategy T2, as
shown on the right of Fig. 3, is even more powerful since it also reveals stuck-at-0
faults for p that occur not always but only from some point in time onwards.
The difference to T1 is mainly in the bold transition, which makes T2 enforce
p=true infinitely often rather than only once. Our approach distinguishes four
fault occurrence frequencies (a fault occurs at least once, infinitely often, from
some point on, or always) and synthesizes test strategies for the lowest one for
which this is possible.
Fig. 4: Two more adaptive test strategies for the traffic light controller: on the
left, T3 that enforces p=true infinitely often starting from the second time step.
On the right, T4 that generalizes T3 by allowing an arbitrary choice for the input
in the first time step. [State-machine diagrams not reproduced; states are labeled with the value of c and edges with conditions on f.]
Multiple strategies The previously discussed strategies, T1 and T2, reveal a stuck-
at-0 fault that manifests permanently at signal p or a stuck-at-0 fault that
manifests from some point in time on permanently at signal p, respectively. Let
us now assume that a stuck-at-0 fault occurs from some point in time only if a
certain input-output interaction happened first, e.g., if c is false at the second
time step. Strategy T3 as shown on the left of Fig. 4 sets c=false in the second
time step. The output produced by the SUT as a response is not relevant. The
strategy then follows T2 to enforce p=true infinitely often, as before. The two test
strategies, T2 and T3, enforce the same test objective; however when executed they
produce different traces. We argue that considering multiple test strategies for a
test objective is necessary to uncover faults in different system implementations
and extend our approach to compute a bounded number of test strategies for
a given test objective to improve the overall fault coverage while keeping the
computational overhead controllable by the user.
Strategy generalization The assignment c = false in the initial state of T3 is
neither necessary to activate the fault in the envisioned scenario nor to enforce
p=true infinitely often. From a testing perspective, the tester is free to make an
arbitrary choice for the input to the SUT in the initial state. As a generalization
mechanism of the test strategies, we identify and remove state machine labels not
necessary to enforce the test objective. Strategy T4, illustrated on the right of
Fig. 4, is similar to T3, but differs by only having assignments for input variables
in states where the concrete values are necessary to enforce the desired behavior.
3 Background and Related Work
Fault-based testing Fault-based test case generation methods that use the concept
of mutation testing [28] seed simple faults into a system implementation (or
model) and compute tests that uncover these faults. Two hypotheses support
the value of such tests. The Competent Programmer Hypothesis [15,1] states
that implementations are mostly close to correct. The Coupling Effect [15,40]
states that tests that detect simple faults are also sensitive to more complex
faults. Our approach also relies on these hypotheses. However, in contrast to
most existing work that considers permanent faults and deterministic system
descriptions that define behavior unambiguously, our approach can deal with
transient faults and focuses on uncovering faults in every implementation of a
given LTL [45] specification (and all behaviors of the uncontrollable part of the
system’s environment).
Adaptive tests If the behavior of the system or the uncontrollable part of the
environment is not fully specified, tests may have to react to observed behavior
at runtime to achieve their goals. Such adaptive tests have been studied by
Hierons [27] from a theoretical perspective, relying on fairness assumptions
(every non-deterministic behavior is exhibited when trying often enough) or
probabilities. Petrenko et al. compute adaptive tests for trace inclusion [42,44,43]
or equivalence [41,34,43] from a specification given as non-deterministic finite
state machine, also relying on fairness assumptions. Our work makes no such
assumptions but considers the SUT to be fully antagonistic. Aichernig et al. [2]
present a method to compute adaptive tests from (non-deterministic) UML state
machines. Starting from an initial state, a trace to a goal state, the state that
shall be covered by the resulting test case, is searched for every possible system
behavior, issuing inconclusive verdicts only if the goal state is not reachable any
more. Our approach uses reactive synthesis to enforce reaching the testing goal
for all implementations if this is possible.
Testing as a game Yannakakis [50] points out that testing reactive systems can
be seen as a game between two players: the tester providing inputs and trying
to reveal faults, and the SUT providing outputs and trying to hide faults. The
tester can only observe outputs and has thus partial information about the SUT.
The goal is to find a strategy for the tester that wins against every SUT. The
underlying complexities are studied by Alur et al. [3] in detail. Our work builds
upon reactive synthesis [46] (with partial information [32]), which can also be
seen as a game. However, we go far beyond the basic idea. We combine the game
concept with user-defined fault models, work out the underlying theory, optimize
the fault sensitivity in the temporal domain, and present a realization and
experiments for LTL [45]. Nachmanson et al. [39] synthesize game strategies as
tests for non-deterministic software models, but their approach is not fault-based
and focuses on simple reachability goals. A variant of their approach considers
the SUT to behave probabilistically with known probabilities [39]. The same
model is also used in [8]. Test strategies for reachability goals are also considered
by David et al. [12] for timed automata.
Vacuity detection Several approaches [7,33,5] aim at finding cases where a
temporal specification is trivially satisfied (e.g., because the left side of an
implication is false). Good tests avoid such vacuities to challenge the SUT. The
method by Beer et al. [7] can produce witnesses that satisfy the specification non-
vacuously, which can serve as tests. Our approach avoids vacuities by requiring
that certain faulty SUTs violate the specification.
Testing with a model checker Model checkers can be utilized to compute tests
from temporal specifications [24]. The method by Fraser and Ammann [21]
ensures that properties are not vacuously satisfied and that faults propagate
to observable property violations (using finite-trace semantics for LTL). Tan et
al. [48] also define and apply a coverage metric based on vacuity for LTL. Ammann
et al. [4] create tests from CTL [11] specifications using model mutations. All
these methods assume that a deterministic system model is available in addition
to the specification. Fraser and Wotawa [22] also consider non-deterministic
models, but issue inconclusive verdicts if the system deviates from the behavior
foreseen in the test case. In contrast, we search for test strategies that achieve
their goal for every realization of the specification. Boroday et al. [10] aim for a
similar guarantee (calling it strong test cases) using a model checker, but do not
consider adaptive test cases, and use a finite state machine as a specification.
Synthesis of test strategies Bounded synthesis [20] aims for finding a system
implementation of minimal size in the number of states. Symbolic procedures
based on binary decision diagrams [17] and satisfiability solving [30] exist. In our
setting, we do not synthesize an implementation of the system, but an adaptive
test strategy, i.e., a controller that mimics the system’s environment to enforce
a certain test goal. In contrast to a complete implementation of the controller,
we strive for finding a partial implementation that assigns values only to those
signals that necessarily contribute to reach the test goal. Other signals can be
kept non-deterministic and either chosen during execution of the test strategy
or randomized. We use a post-processing procedure that eliminates assignments
from the test strategy and invokes a model checker to verify that the test goal
is still enforced. This post-processing step is conceptually similar to procedures
that aim for counterexample simplification [29] and don’t care identification in
test patterns [37]. Jin et al. [29] separate a counterexample trace into forced
segments that unavoidably progress towards the specification violation and free
segments that, if avoided, may have prevented the specification violation. Our
post-processing step is similar, but instead of counterexamples, adaptive test
strategies are post-processed. Miyase and Kajihara [37] present an approach to
identify don’t cares in test patterns of combinational circuits. In contrast to
combinational circuits, we deal with reactive systems. Instead of post-processing
a complete test strategy, a partial test strategy can be directly synthesized by
modifying a synthesis procedure to compute minimum satisfying assignments [16].
Although feasible, modifying a synthesis procedure requires a lot of work. Our
post-processing procedure uses the synthesis procedure in a plug-and-play fashion
and does not require manual changes in the synthesis procedure.
4 Preliminaries and Notation
Traces We want to test reactive systems that have a finite set I = {i1, . . . , im} of
Boolean inputs and a finite set O = {o1, . . . , on} of Boolean outputs. The input
alphabet is ΣI = 2^I, the output alphabet is ΣO = 2^O, and Σ = 2^(I∪O). An infinite
word σ over Σ is an (execution) trace and the set Σ^ω is the set of all infinite
words over Σ.
Linear Temporal Logic We use Linear Temporal Logic (LTL) [45] as a specification
language for reactive systems. The syntax is defined as follows: every input or
output p ∈ I ∪O is an LTL formula; and if ϕ1 and ϕ2 are LTL formulas, then
so are ¬ϕ1, ϕ1 ∨ ϕ2, Xϕ1 and ϕ1 U ϕ2. We write σ |= ϕ to denote that a trace
σ = σ0σ1 . . . ∈ Σ^ω satisfies LTL formula ϕ. This is defined inductively as follows:
– σ0σ1σ2 . . . |= p iff p ∈ σ0,
– σ |= ¬ϕ iff σ ⊭ ϕ,
– σ |= ϕ1 ∨ ϕ2 iff σ |= ϕ1 or σ |= ϕ2,
– σ0σ1σ2 . . . |= Xϕ iff σ1σ2 . . . |= ϕ, and
– σ0σ1 . . . |= ϕ1 U ϕ2 iff ∃j ≥ 0 . (σjσj+1 . . . |= ϕ2 ∧ ∀k . 0 ≤ k < j → σkσk+1 . . . |= ϕ1).
That is, Xϕ requires ϕ to hold in the next step, and ϕ1 U ϕ2 means that ϕ1
must hold until ϕ2 holds (and ϕ2 must hold eventually). We also use the usual
abbreviations ϕ1 ∧ ϕ2 = ¬(¬ϕ1 ∨ ¬ϕ2), ϕ1 → ϕ2 = ¬ϕ1 ∨ ϕ2, Fϕ = true U ϕ
(meaning that ϕ must hold eventually), and Gϕ = ¬F¬ϕ (ϕ must hold always).
By ϕ[x← y] we denote the LTL formula ϕ where all occurrences of x have been
textually replaced by y.
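The inductive semantics above can be executed directly on ultimately periodic traces, i.e., a finite prefix followed by a loop that repeats forever. The sketch below is a minimal illustration under that assumption. The tuple encoding of formulas and all function names are ours, and the until operator is computed as a least fixpoint over the finitely many positions of the lasso.

```python
def sat_positions(formula, prefix, loop):
    """Positions of the lasso prefix.loop^omega at which formula holds."""
    word = list(prefix) + list(loop)
    n, loop_start = len(word), len(prefix)
    if not loop:
        raise ValueError("loop must be non-empty")

    def succ(i):                      # last position jumps back into the loop
        return i + 1 if i + 1 < n else loop_start

    def sat(phi):
        op = phi[0]
        if op == "true":
            return set(range(n))
        if op == "ap":                # atomic proposition: p in sigma_i
            return {i for i in range(n) if phi[1] in word[i]}
        if op == "not":
            return set(range(n)) - sat(phi[1])
        if op == "or":
            return sat(phi[1]) | sat(phi[2])
        if op == "X":
            inner = sat(phi[1])
            return {i for i in range(n) if succ(i) in inner}
        if op == "U":                 # least fixpoint of the until expansion
            left, result = sat(phi[1]), sat(phi[2])
            while True:
                new = {i for i in left if succ(i) in result} - result
                if not new:
                    return result
                result |= new
        raise ValueError(f"unknown operator {op!r}")

    return sat(formula)


def holds(formula, prefix, loop):
    """True iff the lasso trace satisfies formula at position 0."""
    return 0 in sat_positions(formula, prefix, loop)


# Constructors, including the usual abbreviations.
def ap(name): return ("ap", name)
def Not(a): return ("not", a)
def Or(a, b): return ("or", a, b)
def X(a): return ("X", a)
def U(a, b): return ("U", a, b)
def implies(a, b): return Or(Not(a), b)
def F(a): return U(("true",), a)      # F a = true U a
def G(a): return Not(F(Not(a)))       # G a = not F not a
```

For example, with letters given as sets of the signals that are true, `holds(G(implies(ap("c"), F(ap("f")))), [], [frozenset({"h"}), frozenset({"c", "f"})])` evaluates Property 2 of the traffic light example on a trace that alternates between a green highway light and a green farmroad light.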
Mealy machines We use Mealy machines to model the reactive system under test.
A Mealy machine is a tuple S = (Q, q0, ΣI , ΣO, δ, λ), where Q is a finite set of
states, q0 ∈ Q is the initial state, δ : Q×ΣI → Q is a total transition function, and
λ : Q×ΣI → ΣO is a total output function. Given the input trace σI = x0x1 . . . ∈
(ΣI)^ω, S produces the output trace σO = S(σI) = λ(q0, x0)λ(q1, x1) . . . ∈ (ΣO)^ω, where
qi+1 = δ(qi, xi) for all i ≥ 0. That is, in every time step i, the Mealy machine
reads the input letter xi ∈ ΣI , responds with an output letter λ(qi, xi) ∈ ΣO,
and updates its state to qi+1 = δ(qi, xi). A Mealy machine can directly model
synchronous hardware designs, but also other systems with inputs and outputs
evolving in discrete time steps. We write Mealy(I,O) for the set of all Mealy
machines with inputs I and outputs O.
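A minimal sketch of this definition, restricted to finite input prefixes, could look as follows. The class name `Mealy`, the representation of δ and λ as Python functions (rather than explicit tables over a finite Q), and the rising-edge example are assumptions made for illustration. Letters are frozensets of the signals that are true.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable, List

Letter = FrozenSet[str]


@dataclass(frozen=True)
class Mealy:
    q0: Hashable                                   # initial state
    delta: Callable[[Hashable, Letter], Hashable]  # transition function
    lam: Callable[[Hashable, Letter], Letter]      # output function

    def run(self, inputs: Iterable[Letter]) -> List[Letter]:
        """Output prefix lam(q0, x0) lam(q1, x1) ... for a finite input prefix."""
        q, outputs = self.q0, []
        for x in inputs:
            outputs.append(self.lam(q, x))         # respond to the current input
            q = self.delta(q, x)                   # then update the state
        return outputs


# Hypothetical example: the state stores the previous value of input a,
# and output rise is set exactly when a switches from false to true.
rising_edge = Mealy(
    q0=False,
    delta=lambda q, x: "a" in x,
    lam=lambda q, x: frozenset({"rise"}) if ("a" in x and not q) else frozenset(),
)
```

Note that the output in step i already depends on the input of step i, which is what distinguishes a Mealy machine from the Moore machines used for test strategies.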
Moore machines We use Moore machines to describe test strategies. A Moore
machine is a special Mealy machine with ∀q ∈ Q .∀x, x′ ∈ ΣI . λ(q, x) = λ(q, x′).
That is, λ(q, x) is insensitive to x, i.e., becomes a function λ : Q → ΣO. This
means that the input xi at step i can affect the next state qi+1 and thus the next
output λ(qi+1) but not the current output λ(qi). We write Moore(I,O) for the
set of all Moore machines with inputs I and outputs O.
Composition Given Mealy machines S1 = (Q1, q0,1, 2^I, 2^O1, δ1, λ1) ∈ Mealy(I, O1)
and S2 = (Q2, q0,2, 2^(I∪O1), 2^O2, δ2, λ2) ∈ Mealy(I ∪ O1, O2), we write S = S1 ◦ S2
for their sequential composition S = (Q1 × Q2, (q0,1, q0,2), 2^I, 2^(O1∪O2), δ, λ), where
S ∈ Mealy(I, O1 ∪ O2) with
δ((q1, q2), x) = (δ1(q1, x), δ2(q2, x ∪ λ1(q1, x)))
and
λ((q1, q2), x) = λ1(q1, x) ∪ λ2(q2, x ∪ λ1(q1, x)).
Note that x ∈ 2^I.
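The two defining equations of the sequential composition translate almost literally into code. The sketch below assumes machines given as a triple of initial state, transition function, and output function; the names `Mealy`, `compose`, and `run` as well as the two example machines are ours.

```python
from typing import Callable, FrozenSet, Hashable, List, NamedTuple

Letter = FrozenSet[str]


class Mealy(NamedTuple):
    q0: Hashable
    delta: Callable[[Hashable, Letter], Hashable]
    lam: Callable[[Hashable, Letter], Letter]


def compose(s1: Mealy, s2: Mealy) -> Mealy:
    """Sequential composition: s2 reads the inputs together with s1's outputs."""

    def delta(q, x):
        q1, q2 = q
        y1 = s1.lam(q1, x)
        return (s1.delta(q1, x), s2.delta(q2, x | y1))

    def lam(q, x):
        q1, q2 = q
        y1 = s1.lam(q1, x)
        return y1 | s2.lam(q2, x | y1)

    return Mealy((s1.q0, s2.q0), delta, lam)


def run(machine: Mealy, inputs: List[Letter]) -> List[Letter]:
    q, outputs = machine.q0, []
    for x in inputs:
        outputs.append(machine.lam(q, x))
        q = machine.delta(q, x)
    return outputs


# Hypothetical S1 in Mealy({a}, {b}): stateless copy of a to b.
copy_a_to_b = Mealy(
    q0=None,
    delta=lambda q, x: None,
    lam=lambda q, x: frozenset({"b"}) if "a" in x else frozenset(),
)

# Hypothetical S2 in Mealy({a, b}, {d}): d holds iff b holds now and held before.
b_twice = Mealy(
    q0=False,
    delta=lambda q, x: "b" in x,
    lam=lambda q, x: frozenset({"d"}) if (q and "b" in x) else frozenset(),
)

composed = compose(copy_a_to_b, b_twice)
```

The composed machine reads only the original inputs, while its output letter contains the outputs of both components.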
Systems and test strategies A reactive system S is a Mealy machine. An (adaptive)
test strategy is a Moore machine T = (T, t0, ΣO, ΣI , ∆,Λ) with input and output
alphabet swapped. That is, T produces values for input signals and reacts to
values of output signals. A test strategy T can be run on a system S as follows.
In every time step i (starting with i = 0), T first computes the next input
xi = Λ(ti). Then, the system computes the output yi = λ(qi, xi). Finally, both
machines compute their next state ti+1 = ∆(ti, yi) and qi+1 = δ(qi, xi). We
write σ(T, S) = (x0 ∪ y0)(x1 ∪ y1) . . . ∈ Σ^ω for the resulting execution trace. If
T = (T, t0, 2^O′, ΣI, ∆, Λ) ∈ Moore(O′, I) can observe only a subset O′ ⊆ O of
the outputs, we define σ(T ,S) with ti+1 = ∆(ti, yi ∩ O′). A test suite is a set
TS ⊆ Moore(O, I) of adaptive test strategies.
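The execution of a test strategy on a system, as described above, can be sketched for a finite number of steps as follows. The names `Moore`, `Mealy`, and `execute`, the optional `observable` parameter (standing in for O′), and the request/acknowledge example are assumptions for illustration only.

```python
from typing import Callable, FrozenSet, Hashable, List, NamedTuple, Optional

Letter = FrozenSet[str]


class Mealy(NamedTuple):          # system under test
    q0: Hashable
    delta: Callable[[Hashable, Letter], Hashable]
    lam: Callable[[Hashable, Letter], Letter]


class Moore(NamedTuple):          # test strategy with swapped alphabets
    t0: Hashable
    delta: Callable[[Hashable, Letter], Hashable]   # reacts to system outputs
    out: Callable[[Hashable], Letter]               # produces system inputs


def execute(strategy: Moore, system: Mealy, steps: int,
            observable: Optional[Letter] = None) -> List[Letter]:
    """Finite prefix (x0 | y0)(x1 | y1)... of the execution trace."""
    t, q, trace = strategy.t0, system.q0, []
    for _ in range(steps):
        x = strategy.out(t)                   # strategy picks the input first
        y = system.lam(q, x)                  # system answers in the same step
        trace.append(x | y)
        seen = y if observable is None else y & observable
        t = strategy.delta(t, seen)           # both machines then move on
        q = system.delta(q, x)
    return trace


# Hypothetical system: ack is raised one step after req.
delayed_ack = Mealy(
    q0=False,
    delta=lambda q, x: "req" in x,
    lam=lambda q, x: frozenset({"ack"}) if q else frozenset(),
)

# Hypothetical strategy: keep requesting until an ack has been observed.
ask_until_ack = Moore(
    t0="ask",
    delta=lambda t, y: "done" if (t == "done" or "ack" in y) else "ask",
    out=lambda t: frozenset({"req"}) if t == "ask" else frozenset(),
)
```

Passing `observable=frozenset()` models a strategy that cannot see `ack` at all; in that case the example strategy never leaves its initial state.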
Realizability A Mealy machine S ∈ Mealy(I,O) realizes an LTL formula ϕ, written
S ||= ϕ, if ∀M ∈ Moore(O, I) . σ(M,S) |= ϕ. An LTL formula ϕ is Mealy-realizable
if there exists a Mealy machine that realizes it. A Moore machine M ∈ Moore(I,O)
realizes ϕ, written M ||= ϕ, if ∀S ∈ Mealy(O, I) . σ(M,S) |= ϕ. A model checking
procedure checks if a given Mealy (Moore) machine S (M) realizes an LTL
specification ϕ and returns true iff S ||= ϕ (M ||= ϕ) holds. We denote the call of
a model checking procedure by modelcheck(S, ϕ) (modelcheck(M, ϕ)).
Reactive synthesis We use reactive synthesis to compute test strategies. A reactive
(Moore, LTL) synthesis procedure takes as input a set I of Boolean inputs, a set O
of Boolean outputs, and an LTL specification ϕ over these signals. It produces a
Moore machine M ∈ Moore(I,O) that realizes ϕ, or the message unrealizable if no
such Moore machine exists. We denote this computation by M = synt(I,O, ϕ). A
synthesis procedure with partial information is defined similarly, but takes a subset
I′ ⊆ I of the inputs as an additional argument. As output, the synthesis procedure
produces a Moore machine M′ = syntp(I,O, ϕ, I′) with M′ ∈ Moore(I′, O) that
realizes ϕ while only observing the inputs I′, or the message unrealizable if no such
Moore machine exists. We assume that both synthesis procedures, synt and syntp,
can be called incrementally with an additional parameter Θ, where Θ denotes a set
of Moore machines. The incremental synthesis procedures M = synt(I,O, ϕ,Θ)
and M′ = syntp(I,O, ϕ, I′, Θ) compute Moore machines M and M′, respectively,
as before but with the additional constraint that M, M′ ∉ Θ.
Fault versus failure A Mealy machine S ∈ Mealy(I,O) is faulty with respect to
LTL formula ϕ (specification) iff S ||= ϕ does not hold, i.e., ∃M ∈ Moore(O, I) . σ(M,S) ⊭ ϕ.
We call a trace σ(M,S) that uncovers a faulty behavior of S a failure and a
deviation between S and any correct realization S′, i.e., S′ ||= ϕ, a fault. For a
fixed faulty S, there are multiple correct S′ that realize ϕ and thus a fault in S
can be characterized in multiple different ways. As a simplification, we assume
that in practice every faulty S is close to a correct S′ and only deviates in a
simple fault. In the next section, we will show how this idea can be leveraged to
determine test suites independent of the implementation and the concrete fault
manifestation.
Fig. 5: Coverage goal illustration for fault. [Diagram not reproduced: the SUT S consists of a correct system S′ with S′ |= ϕ whose output o′i passes through a fault F with F |= δ to produce oi; the input I and the remaining outputs o1, . . . , on are unaffected.]
5 Synthesis of Adaptive Test Strategies
This section presents our approach for synthesizing adaptive test strategies for
reactive systems specified in LTL. First, we elaborate on the coverage objective
we aim to achieve. Then we present our strategy synthesis algorithm. Finally, we
discuss extensions and variants of the algorithm.
5.1 Coverage Objective for Test Strategy Computation
Many coverage metrics [36] exist to assess the quality of a test suite. Since the goal
in testing is to detect bugs, we follow a fault-centered approach: a test suite has
high quality if it reveals certain kinds of faults in a system. As illustrated in Fig. 5,
we assume that our SUT is “almost correct”, i.e., it is composed of a correct
implementation S ′ of the specification ϕ, but with a fault F that affects one of
the outputs. In order to make our approach flexible, we allow the user to define
the considered faults as an LTL formula δ. Through δ, the user can define both
permanent and transient faults of various types. For instance, δ = F(oi ↔ ¬o′i)
describes a bit-flip that occurs at least once, GF¬oi models a stuck-at-0 fault that
occurs infinitely often, and G(X(oi)↔ o′i) models a permanent shift by one time
step. We strive for a test suite that reveals every fault that satisfies δ for every
realization of ϕ. This renders the test suite independent of the implementation
and the concrete fault manifestation. The following definition formalizes this
intuition into a coverage objective.
Definition 1. A test suite TS ⊆ Moore(O, I) for a system with inputs I, outputs
O, and specification ϕ is universally complete3 with respect to a given fault model
δ iff
∀oi ∈ O . ∀S′ ∈ Mealy(I, O ∪ {o′i} \ {oi}) .
∀F ∈ Mealy(I ∪ O ∪ {o′i} \ {oi}, {oi}) . ∃T ∈ TS .
((S′ ||= ϕ[oi ← o′i] ∧ F ||= δ) → (σ(T, S′ ◦ F) ⊭ ϕ)). (5)
That is, for every output oi, system S ′ ||=ϕ[oi ← o′i], and fault F ||= δ, TS
must contain a test strategy T that reveals the fault by causing a specification
violation (Fig. 5). Note that the test strategies T ∈ TS ⊆ Moore(O, I) cannot
observe the signal o′i. The reason is that this signal o′i does not exist in the real
system implementation(s) on which we run our tests — it was only introduced
to define our coverage objective.
There can be an unbounded number of system realizations S ′ ||=ϕ[oi ← o′i]
and faults F ||= δ. Computing a separate test strategy for each combination is
thus not a viable option. We rather strive for computing only one test strategy
per output variable.
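To give a feeling for the faults F that Definition 1 quantifies over, the sketch below models three simple faults as transformations from a finite prefix of the correct signal o′i to the faulty signal oi. The function names are hypothetical, the faults are shown on finite prefixes only, and each function is just one possible fault that satisfies a fault model of the corresponding kind (stuck-at-0, bit-flip at least once, shift by one time step).

```python
from typing import List


def stuck_at_0(correct: List[bool]) -> List[bool]:
    """Permanent stuck-at-0: the faulty output is always false."""
    return [False for _ in correct]


def bit_flip_once(correct: List[bool], k: int) -> List[bool]:
    """Transient bit-flip: the output differs from the correct one at step k only."""
    return [(not v) if i == k else v for i, v in enumerate(correct)]


def shift_by_one(correct: List[bool], initial: bool = False) -> List[bool]:
    """Permanent shift: the output at step i+1 equals the correct value at step i.

    The value in step 0 is not constrained by the shift; `initial` is an
    arbitrary choice made for this sketch.
    """
    return ([initial] + list(correct))[: len(correct)]
```

A test strategy only ever sees the faulty signal, never the correct one, which is why the strategies in a test suite must force the correct system into a situation where the deviation becomes a specification violation.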
Theorem 1. A universally complete test suite TS ⊆ Moore(O, I) with respect to
fault model δ exists for a system with inputs I, outputs O, and specification ϕ if
∀oi ∈ O . ∃T ∈ Moore(O, I) .∀S ∈ Mealy(I,O ∪ {o′i}) .
σ(T ,S) |= ((ϕ[oi ← o′i] ∧ δ)→ ¬ϕ). (6)
Proof. Equation 6 implies
∀oi ∈ O . ∀S ∈ Mealy(I, O ∪ {o′i}) . ∃T ∈ Moore(O, I) . (S ||= ϕ[oi ← o′i] ∧ δ) → (σ(T, S) ⊭ ϕ) (7)
because (a) going from ∃T ∀S to ∀S∃T can only make the formula weaker, and
(b) S ||=ϕ[oi ← o′i]∧ δ implies σ(T ,S) |= ϕ[oi ← o′i]∧ δ for all T , which can only
make the left side of the implication stronger. In turn, Equation 7 is equivalent
to
∀oi ∈ O . ∀S′ ∈ Mealy(I, O ∪ {o′i} \ {oi}) .
∀F ∈ Mealy(I ∪ O ∪ {o′i} \ {oi}, {oi}) . ∃T ∈ Moore(O, I) .
(S′ ||= ϕ[oi ← o′i] ∧ F ||= δ) → (σ(T, S′ ◦ F) ⊭ ϕ) (8)
because for a given S′ ||= ϕ[oi ← o′i] and F ||= δ from Equation 8 we can define
an equivalent system S = (S′ ◦ F) ∈ Mealy(I, O ∪ {o′i}) for Equation 7 such
that S ||= ϕ[oi ← o′i] ∧ δ is satisfied. Also, for a given S ||= ϕ[oi ← o′i] ∧ δ from
Equation 7 we can define a corresponding S′ ||= ϕ[oi ← o′i] and F ||= δ by stripping
off different outputs.
3 The word “complete” indicates that every considered fault is revealed at every output.
The word “universal” indicates that this is achieved for every (otherwise correct)
system.
Fig. 6: Test strategy T5. [Figure: two states that output ¬i and i; the first state loops on ¬o and moves to the second on o; the second state loops on true.]
Theorem 1 states that Equation 6 is a sufficient condition for a universally
complete test suite to exist. If it were also a necessary condition, then computing
one test strategy per output signal would be enough. Unfortunately, this is not
the case in general.
Example 1. Consider a system with input I = {i}, output O = {o}, and
specification ϕ = (G(i → G i) ∧ F i) → (G(o → G o) ∧ F o ∧ G(i ∨ ¬o)). The left
side of the implication assumes that the input i is set to true at some point,
after which i remains true. The right side requires the same for the output o.
In addition, o must not be raised while i is still false. This specification is real-
izable (e.g., by always setting o = i). The test suite TS = {T5} with T5 shown
in Fig. 6 is universally complete with respect to fault model δ = F(o ↔ ¬o′),
which requires the output to flip at least once: as long as i is false, any correct
system implementation S ′ ∈ Mealy({i}, {o′}) ||=ϕ[oi ← o′i] must keep the output
o′ = false. Eventually, F ||=δ must flip the output o to true. When this happens,
i is set to true by T5 so that the resulting trace σ(T5, S′ ◦ F) violates ϕ. Still,
Equation 6 is false⁴. Strategy T5 does not satisfy Equation 6 because for the
system S ∈ Mealy({i}, {o, o′}) that sets o′ = true and o = false in all time steps,
we have σ(T5, S) |= (ϕ[oi ← o′i] ∧ δ ∧ ϕ). The reason is that i stays false, so
ϕ[oi ← o′i] and ϕ are vacuously satisfied by σ(T5,S). The formula δ is satisfied
because o ↔ ¬o′ holds in all time steps. Thus, S is a counterexample to T5
satisfying Equation 6. Similar counterstrategies exist for all other test strategies.
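Example 1 can be replayed with a small simulation. The sketch below is an illustration under stated assumptions: T5 is encoded as a two-state Moore machine following the description of Fig. 6, the fault flips o exactly once (in step 2), formulas are nested tuples, and traces are ultimately periodic with the last position looping on itself, which is valid here because both systems are stationary by then. All helper names are hypothetical.

```python
import operator

BINARY = {"and": operator.and_, "or": operator.or_, "iff": operator.eq,
          "imp": lambda a, b: (not a) or b}


def holds(f, trace, loop):
    """Truth values of f at each position of a lasso-shaped trace."""
    n = len(trace)
    succ = [k + 1 if k + 1 < n else loop for k in range(n)]
    if f[0] == "ap":
        return [f[1] in step for step in trace]
    sub = [holds(g, trace, loop) for g in f[1:]]
    if f[0] == "not":
        return [not a for a in sub[0]]
    if f[0] in BINARY:
        return [BINARY[f[0]](a, b) for a, b in zip(sub[0], sub[1])]
    r = sub[0]  # remaining operators: F and G, computed as fixpoints
    for _ in range(n):
        r = [(r[k] or r[succ[k]]) if f[0] == "F" else (r[k] and r[succ[k]])
             for k in range(n)]
    return r


def rename(f, old, new):
    """Replace atomic proposition old by new, i.e. compute f[old <- new]."""
    if f[0] == "ap":
        return ("ap", new) if f[1] == old else f
    return (f[0],) + tuple(rename(g, old, new) for g in f[1:])


def run_t5(system, steps):
    """Run T5: keep i false until o is observed, then keep i true."""
    trace, raised = [], False
    for k in range(steps):
        i = raised                      # Moore: i depends on the state only
        o, o_prime = system(k, i)
        trace.append({n for n, v in (("i", i), ("o", o), ("o_prime", o_prime)) if v})
        raised = raised or o
    return trace


def flip_once(k, i):
    o_prime = i                         # correct realization: o' = i
    return o_prime != (k == 2), o_prime  # fault flips o in step 2 only


def hidden_mismatch(k, i):
    return False, True                  # o = false, o' = true in all steps


I, O, OP = ("ap", "i"), ("ap", "o"), ("ap", "o_prime")
ASSUME = ("and", ("G", ("imp", I, ("G", I))), ("F", I))
GUARANTEE = ("and", ("and", ("G", ("imp", O, ("G", O))), ("F", O)),
             ("G", ("or", I, ("not", O))))
PHI = ("imp", ASSUME, GUARANTEE)
PHI_HIDDEN = rename(PHI, "o", "o_prime")
DELTA = ("F", ("iff", O, ("not", OP)))


def verdict(trace, loop):
    """Initial truth values of (phi[o <- o'], delta, phi)."""
    return tuple(holds(f, trace, loop)[0] for f in (PHI_HIDDEN, DELTA, PHI))


revealed = verdict(run_t5(flip_once, 4), 3)
missed = verdict(run_t5(hidden_mismatch, 1), 0)
print(revealed, missed)
```

In the first run the premise ϕ[o ← o′] ∧ δ holds and ϕ is violated, so the fault is revealed. In the second run all three formulas hold, which is the counterexample to T5 satisfying Equation 6.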
The fact that Equation 6 is not a necessary condition for a universally complete
test suite to exist is somewhat surprising, especially in the light of the following
two lemmas. Based on these lemmas, the subsequent propositions will show that
Equation 6 is both sufficient and necessary (i.e., one test per output is enough)
for many interesting cases.
Lemma 1. For every LTL specification ψ over some inputs I and outputs O, we
have that ∃T ∈ Moore(O, I) .∀S ∈ Mealy(I,O) . σ(T ,S) |= ψ holds if and only if
∀S ∈ Mealy(I,O) .∃T ∈ Moore(O, I) . σ(T ,S) |= ψ holds.
4 This is (at least partially) confirmed by our test strategy synthesis tool: it reports
that no test strategy with less than 12 states can satisfy Equation 6.
Proof. Synthesis from LTL specifications under complete information is (finite
memory) determined [35], which means that either ∃T ∈ Moore(O, I) .∀S ∈
Mealy(I,O) . σ(T ,S) |= ψ or ∃S ∈ Mealy(I,O) .∀T ∈ Moore(O, I) . σ(T ,S) |=
¬ψ holds, but not both. Less formally, we can say that either there exists a test
strategy T that satisfies ψ for all systems S, or there exists a system S that can
violate ψ for all test strategies T. From that, it follows that
∃T ∈ Moore(O, I) .∀S ∈ Mealy(I,O) . σ(T ,S) |= ψ
iff ¬∃S ∈ Mealy(I,O) .
∀T ∈ Moore(O, I) . σ(T ,S) |= ¬ψ
iff ∀S ∈ Mealy(I,O) .∃T ∈ Moore(O, I) . σ(T ,S) |= ψ.
Lemma 2. For all LTL specifications A,G over inputs I and outputs O, we
have that
∀S ∈ Mealy(I,O) .∃T ∈ Moore(O, I) .
(S ||=A)→ (σ(T ,S) |= G) (9)
iff ∀S ∈ Mealy(I,O) .∃T ∈ Moore(O, I) .
σ(T ,S) |= (A→ G). (10)
Proof. Direction ⇒: We show that Equation 10 being false contradicts
Equation 9 being true.
¬∀S ∈ Mealy(I,O) .∃T ∈ Moore(O, I) .
σ(T ,S) |= (A→ G)
iff ∃S ∈ Mealy(I,O) .∀T ∈ Moore(O, I) .
σ(T ,S) |= (A ∧ ¬G)
iff ∃S ∈ Mealy(I,O) .S ||=(A ∧ ¬G), which implies
∃S ∈ Mealy(I,O) .∀T ∈ Moore(O, I) .
(S ||=A) ∧ (σ(T ,S) |= ¬G).
Direction ⇐: Using the LTL semantics, we can rewrite σ(T, S) |= (A → G)
in Equation 10 as (σ(T, S) |= A) → (σ(T, S) |= G). Since S ||= A implies
σ(T′, S) |= A for every T′ ∈ Moore(O, I), the assumption in Equation 9 is not
weaker, so Equation 9 is not stronger.
These two lemmas state that quantifiers can be swapped and that assuming
σ(T ,S) |= A is equivalent to assuming (S ||= A) for the case where T has
full information about the outputs of S. Yet, in our setting, test strategies T ∈
Moore(O, I) have incomplete information about the system S ∈ Mealy(I,O∪{o′i})
because they cannot observe o′i. Still, T must enforce (ϕ[oi ← o′i] ∧ δ) → ¬ϕ,
which refers to this hidden signal. Thus, Lemma 1 and 2 cannot be applied to
Equation 6 in general. However, in cases where there is (effectively) no hidden
information, the lemmas can be used to prove that Equation 6 is both a necessary
and a sufficient condition for a universally complete test suite to exist. The
following propositions show that this holds for many cases of practical interest.
The intuitive reason is that ϕ[oi ← o′i] can be rewritten to ϕ[oi ← ψ] in
Equation 6, which eliminates the hidden signal such that Lemma 1 and 2 can be
applied.
Proposition 1. Given a fault model of the form δ = G(o′i ↔ ψ), where ψ is an
LTL formula over I and O, a universally complete test suite TS ⊆ Moore(O, I)
with respect to δ, I, O, and ϕ exists if and only if Equation 6 holds.
Proof. ϕ[oi ← o′i] ∧ G(o′i ↔ ψ) is equivalent to ϕ[oi ← ψ] ∧ G(o′i ↔ ψ). Thus,
Equation 6 becomes
∀oi ∈ O . ∃T ∈ Moore(O, I) .∀S ∈ Mealy(I,O ∪ {o′i}) .
σ(T ,S) |= ((ϕ[oi ← ψ] ∧ G(o′i ↔ ψ))→ ¬ϕ),
which is equivalent to
∀oi ∈ O . ∃T ∈ Moore(O, I) .∀S ∈ Mealy(I,O) .
σ(T ,S) |= (ϕ[oi ← ψ]→ ¬ϕ)
Because of the G operator, a unique value for o′i exists in all time steps and thus,
o′i is just an abbreviation for ψ. Whether this abbreviation o′i is available as
output of S or not is irrelevant, because T cannot observe o′i anyway. Since o′i
no longer occurs, Lemma 1 and Lemma 2 can be applied to prove equivalence
between Equation 6 and
∀oi ∈ O . ∀S ∈ Mealy(I, O) . ∃T ∈ Moore(O, I) .
(S ||= ϕ[oi ← ψ]) → σ(T, S) ⊭ ϕ.
As T cannot observe o′i, it is irrelevant whether the truth value of ψ is available
as additional output o′i of S or not. Hence, the above formula is equivalent to
∀oi ∈ O . ∀S ∈ Mealy(I, O ∪ {o′i}) . ∃T ∈ Moore(O, I) .
(S ||= (ϕ[oi ← ψ] ∧ G(o′i ↔ ψ))) → σ(T, S) ⊭ ϕ
and
∀oi ∈ O . ∀S ∈ Mealy(I, O ∪ {o′i}) . ∃T ∈ Moore(O, I) .
(S ||= (ϕ[oi ← o′i] ∧ δ)) → σ(T, S) ⊭ ϕ,
i.e., to Equation 7. The remaining steps can be taken from the proof of Theorem 1.
Proposition 1 entails that computing one test strategy per output oi ∈ O is
enough for fault models such as permanent bit flips (defined by δ = G(o′i ↔ ¬oi)).
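The rewriting step used in the proof of Proposition 1, replacing ϕ[oi ← o′i] by ϕ[oi ← ψ], is a plain syntactic substitution. A minimal sketch, assuming formulas are encoded as nested tuples and using a hypothetical request/grant specification G(r → F g) with the permanent bit-flip ψ = ¬g:

```python
def substitute(f, signal, replacement):
    """Return f with every atomic proposition `signal` replaced."""
    if f[0] == "ap":
        return replacement if f[1] == signal else f
    return (f[0],) + tuple(substitute(g, signal, replacement) for g in f[1:])


def permanent_fault_objective(phi, o_i, psi):
    """Objective phi[o_i <- psi] -> !phi, as in the proof of Proposition 1."""
    return ("imp", substitute(phi, o_i, psi), ("not", phi))


# Hypothetical specification: every request r is eventually granted by g.
phi = ("G", ("imp", ("ap", "r"), ("F", ("ap", "g"))))
# Permanent bit-flip delta = G(g' <-> !g), hence psi = !g.
objective = permanent_fault_objective(phi, "g", ("not", ("ap", "g")))
print(objective)
```

The resulting objective no longer mentions the hidden signal, which is what allows Lemmas 1 and 2 to be applied.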
Proposition 2. If the fault model δ does not reference o′i, a universally complete
test suite TS ⊆ Moore(O, I) with respect to δ, I, O, and ϕ exists iff Equation 6
holds.
Proof. We show that Equation 6 holds if and only if Equation 7 holds. The
remaining steps have already been proven for Theorem 1.
Lemma 3. Equation 6 holds if and only if
∀oi ∈ O . ∃T ∈ Moore(O, I) .∀S ∈ Mealy(I,O) .
σ(T ,S) |= (δ → ¬ϕ). (11)
Proof. Direction ⇐ is obvious because Equation 6 contains stronger assumptions
(and ∀S ∈ Mealy(I,O) can be changed to ∀S ∈ Mealy(I,O∪{o′i}) in Equation 11
because δ → ¬ϕ does not contain o′i).
Direction ⇒: We show that Equation 11 being false contradicts Equation 6
being true.
¬∀oi ∈ O . ∃T ∈ Moore(O, I) .
∀S ∈ Mealy(I,O) . σ(T ,S) |= (δ → ¬ϕ) (12)
iff ∃oi ∈ O . ∀T ∈ Moore(O, I) .
∃S ∈ Mealy(I,O) . σ(T ,S) |= (δ ∧ ϕ) (13)
iff ∃oi ∈ O . ∃S ∈ Mealy(I,O) .
∀T ∈ Moore(O, I) . σ(T ,S) |= (δ ∧ ϕ) (14)
iff ∃oi ∈ O . ∃S ∈ Mealy(I,O) .S ||=(δ ∧ ϕ) (15)
iff ∃oi ∈ O . ∃S′ ∈ Mealy(I, O ∪ {o′i}) .
S′ ||= (ϕ[oi ← o′i] ∧ δ ∧ ϕ), (16)
iff ∃oi ∈ O . ∃S′ ∈ Mealy(I, O ∪ {o′i}) .
∀T ∈ Moore(O ∪ {o′i}, I) .
σ(T, S′) |= (ϕ[oi ← o′i] ∧ δ ∧ ϕ), (17)
iff ∃oi ∈ O . ∀T ∈ Moore(O ∪ {o′i}, I) .
∃S′ ∈ Mealy(I, O ∪ {o′i}) .
σ(T, S′) |= (ϕ[oi ← o′i] ∧ δ ∧ ϕ), (18)
⇒ ∃oi ∈ O . ∀T ∈ Moore(O, I) .
∃S′ ∈ Mealy(I, O ∪ {o′i}) .
σ(T, S′) |= (ϕ[oi ← o′i] ∧ δ ∧ ϕ), (19)
which contradicts Equation 6. (Equation 13)⇔(Equation 14) holds because of
Lemma 1, and (Equation 15)⇔(Equation 16) holds because δ ∧ ϕ does not contain
o′i, so S′ can be S with o′i ↔ oi. (Equation 17)⇔(Equation 18) holds because of
Lemma 1. Finally, (Equation 18) implies (Equation 19) because T has less
information in (Equation 19).
Lemma 4. Equation 11 holds if and only if Equation 7 holds.
Proof. Direction ⇒: is obvious because Equation 11 is equivalent to Equation 6
(Lemma 3) and Equation 6 implies Equation 7 (see proof for Theorem 1).
Direction ⇐: we show that Equation 11 being false contradicts Equation 7
being true. Equation 11 being false implies Equation 16 (see above). As S′ ||=
(ϕ[oi ← o′i] ∧ δ ∧ ϕ) implies (S′ ||= ϕ[oi ← o′i] ∧ δ) ∧ (σ(T, S′) |= ϕ) for all
T ∈ Moore(O ∪ {o′i}, I) and thus also for all T ∈ Moore(O, I), Equation 7 cannot
hold.
Thus, the assumption S′ ||= ϕ[oi ← o′i] can be dropped from Equation 5 if
the fault model does not reference o′i. Correspondingly, σ(T, S) |= ((ϕ[oi ← o′i] ∧ δ) → ¬ϕ)
simplifies to σ(T, S) |= (δ → ¬ϕ) in Equation 6. Since o′i is now
gone, Lemma 1 and 2 apply. In general, the assumption S′ ||= ϕ[oi ← o′i] is needed
to prevent a faulty system S′ with ¬(S′ ||= ϕ[oi ← o′i]) from compensating the fault F ||= δ
such that S′ ◦ F ||= ϕ. E.g., for I = ∅, O = {o}, ϕ = G o and δ = G(o ↔ ¬o′),
Equation 5 would be false without S′ ||= ϕ[oi ← o′i] because there exists an S′ that
always sets o′ = false, in which case S′ ◦ F has o correctly set to true. However,
if δ does not reference o′, such a fault compensation is not possible.
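The compensation effect in this small example (I = ∅, O = {o}, ϕ = G o, δ = G(o ↔ ¬o′)) can be traced step by step. The sketch below is illustrative only: both machines are stateless, so every step is identical and checking a finite prefix decides the G properties. The function names are assumptions.

```python
def compensating_system():
    """S': always sets o' = false, which violates phi[o <- o'] = G o'."""
    return False


def permanent_bit_flip(o_prime):
    """F: realizes delta = G(o <-> !o') by inverting the hidden signal."""
    return not o_prime


def composed_trace(steps):
    """Trace of S' composed with F; there are no inputs in this example."""
    trace = []
    for _ in range(steps):
        o_prime = compensating_system()
        trace.append({"o_prime": o_prime, "o": permanent_bit_flip(o_prime)})
    return trace


trace = composed_trace(5)
system_correct = all(step["o_prime"] for step in trace)              # G o'
fault_present = all(step["o"] != step["o_prime"] for step in trace)  # G(o <-> !o')
spec_satisfied = all(step["o"] for step in trace)                    # G o
print(system_correct, fault_present, spec_satisfied)
```

The faulty S′ and the fault F cancel out, so no test strategy can cause a violation of ϕ; this is the case that the assumption S′ ||= ϕ[oi ← o′i] excludes.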
Proposition 2 applies to permanent or transient stuck-at-0 or stuck-at-1 faults
(e.g., δ = F¬oi or δ = GF oi), but also to faults where oi keeps its previous value
(e.g., δ = F(oi ↔ X(oi))) or takes the value of a different input or output (e.g.,
δ = GF(oi ↔ i3)). Together with Proposition 1, it shows that computing one test
strategy per output is enough for many interesting fault models. Finally, even
if neither Proposition 1 nor Proposition 2 applies, computing one test strategy
per output may still suffice for the concrete ϕ and δ at hand. In the next section,
we thus rely on Equation 6 to compute one test strategy per output in order to
obtain universally complete test suites.
5.2 Test Strategy Computation
Basic idea Our test case generation approach builds upon Theorem 1: for every
output oi ∈ O, we want to find a test strategy Ti ∈ Moore(O, I) such that
∀S ∈ Mealy(I, O ∪ {o′i}) . σ(Ti, S) |= ((ϕ[oi ← o′i] ∧ δ) → ¬ϕ) holds. Recall
from Section 4 that a synthesis procedure M = syntp(I,O, ψ, I ′, Θ) with partial
information computes a Moore machine M ∈ Moore(I ′, O) \ Θ with I ′ ⊆ I
such that a certain LTL objective ψ is enforced in all environments, i.e., ∀S ∈
Mealy(O, I) . σ(M,S) |= ψ. If no such M exists, syntp returns unrealizable. Also
recall that a test strategy is a Moore machine with input and output signals
swapped. We can thus call Ti := syntp(O ∪ {o′i}, I, (ϕ[oi ← o′i] ∧ δ) → ¬ϕ, O, Θ)
for every output oi ∈ O in order to obtain a universally complete test suite with
respect to fault model δ for a system with inputs I, outputs O, and specification
ϕ. If syntp succeeds (does not return unrealizable) for all oi ∈ O, the resulting test
suite TS = {Ti | oi ∈ O} is guaranteed to be universally complete. However, since
Theorem 1 only gives a sufficient but not a necessary condition, this procedure
may fail to find a universally complete test suite, even if one exists, in general. In
cases where Proposition 1 or Proposition 2 applies, it is both sound and complete,
though.
Fig. 7: Relationship between fault kind and fault frequency. [Figure: fault kinds are arranged against fault frequencies, as follows.]
Fault frequency                           Stuck-at-0/1 faults   Bit-flip            Delay fault
G, globally (permanent fault)             G(oi = 0/1)           G(oi ↔ ¬o′i)        G(X(oi) ↔ o′i)
Arbitrary intermediate stage              …                     G(F(oi ↔ ¬o′i))     …
F, eventually (fault occurs only once)    F(oi = 0/1)           F(oi ↔ ¬o′i)        F(X(oi) ↔ o′i)
Fault models In order to simplify the user input, we split the fault model δ in
our coverage objective from Definition 1 into two parts: the fault kind κ and the
fault frequency frq (Fig. 7 illustrates the relationship). The fault kind κ is an
LTL formula that is given by the user and defines which faults we consider. For
instance, κ = ¬oi describes a stuck-at-0 fault, κ = oi ↔ ¬o′i defines a bit-flip,
and κ = o′i ↔ X(oi) describes a delay by one time step. The fault frequency frq
describes how often a fault of the specified kind occurs, and is chosen by our
algorithm, unless it is specified by the user. We distinguish 4 fault frequencies,
which we describe using temporal LTL operators.
– Fault frequency G means that the fault is permanent.
– Frequency FG means that the fault occurs permanently from some time step i
on. Yet, we do not make any assumptions about the precise value of i.
– Frequency GF states that the fault strikes infinitely often, but not when
exactly.
– Frequency F means that the fault occurs at least once.
The fault model δ is then defined as δ = frq(κ). Note that there is a natural
order among our 4 fault frequencies: a fault of kind κ that occurs permanently
(frequency G) is just a special case of the same fault κ occurring from some point
onwards (frequency FG), which is in turn a special case of κ occurring infinitely
often (frequency GF), which is a special case of κ occurring at least once. Thus,
a test strategy that reveals a fault that occurs at least once (without knowing
when) will also reveal a fault that occurs infinitely often, etc. We say that F is the
lowest and G is the highest fault frequency. In our approach, we thus compute
test strategies to detect faults at the lowest frequency for which a test strategy
can be found.
Algorithm 1 SyntLtlTest: Synthesizes a universally complete test suite from
an LTL specification for all outputs in O
1: procedure SyntLtlTest(I, O, ϕ, κ), returns: A set TS of test strategies
2:   TS := ∅
3:   for each oi ∈ O do
4:     TS := TS ∪ SyntLtlIterate(I, O, ϕ, oi, κ, ∅)
5:   return TS
Algorithm 2 SyntLtlIterate: Synthesize an adaptive test strategy from an
LTL specification with the lowest fault occurrence frequency
1: procedure SyntLtlIterate(I, O, ϕ, oi, κ, Θ), returns: A singleton {T} with a
   test strategy T on success or ∅
2:   for each frq from (F, GF, FG, G) in this order do
3:     T := syntp(O ∪ {o′i}, I, (ϕ[oi ← o′i] ∧ frq(κ)) → ¬ϕ, O, Θ)
4:     if T ≠ unrealizable then
5:       return {T}
6:   return ∅
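The natural order among the four fault frequencies described above can be checked empirically on small ultimately periodic traces. The sketch below is a bounded sanity check and not a proof: a trace is a tuple of Booleans stating whether the fault kind κ holds in each step, and `loop` is the index at which the trace repeats forever. The names `FREQUENCIES` and `implication_holds` are assumptions of this illustration.

```python
from itertools import product

# Each frequency is evaluated at the first position of a lasso-shaped trace.
FREQUENCIES = {
    "G": lambda t, loop: all(t),          # kappa holds in every step
    "FG": lambda t, loop: all(t[loop:]),  # kappa holds throughout the loop
    "GF": lambda t, loop: any(t[loop:]),  # kappa holds somewhere in the loop
    "F": lambda t, loop: any(t),          # kappa holds at least once
}
ORDER = ["G", "FG", "GF", "F"]            # highest to lowest frequency


def implication_holds(stronger, weaker, max_len=4):
    """Check stronger(kappa) -> weaker(kappa) on all lassos up to max_len."""
    for n in range(1, max_len + 1):
        for t in product([False, True], repeat=n):
            for loop in range(n):
                if FREQUENCIES[stronger](t, loop) and not FREQUENCIES[weaker](t, loop):
                    return False
    return True


chain_ok = all(implication_holds(a, b) for a, b in zip(ORDER, ORDER[1:]))
converse_ok = any(implication_holds(b, a) for a, b in zip(ORDER, ORDER[1:]))
print(chain_ok, converse_ok)
```

On these traces every frequency implies the next lower one, while none of the converse implications holds, which matches the ordering used to pick the lowest frequency first.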
Algorithm The procedure SyntLtlTest in Algorithm 1 formalizes our approach
using the procedure SyntLtlIterate in Algorithm 2 as a helper. The input
consists of (1) the inputs I of the SUT, (2) the outputs O of the SUT, (3) an LTL
specification ϕ of the SUT, and (4) a fault kind κ. The result of SyntLtlTest is
a test suite TS. The algorithm iterates over all outputs oi ∈ O (Line 3) and invokes
the procedure SyntLtlIterate (Line 4). The procedure SyntLtlIterate
then iterates over the 4 fault frequencies (Line 2), starting with the lowest one,
and attempts to compute a strategy to reveal a fault (Line 3). If such a strategy
exists, it is returned to Algorithm 1 and added to TS. Otherwise, the procedure
proceeds with the next higher fault frequency.
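The control flow of Algorithms 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the synthesis procedure syntp is passed in as a callback with a hypothetical signature mirroring syntp(O ∪ {o′i}, I, objective, O, Θ), formulas are nested tuples, the fault kind is given as a function of the output name, and the stub synthesizer only imitates realizability results.

```python
UNREALIZABLE = None
FREQUENCIES = ("F", "GF", "FG", "G")  # from lowest to highest


def substitute(f, signal, replacement):
    """Return f with every atomic proposition `signal` replaced."""
    if f[0] == "ap":
        return replacement if f[1] == signal else f
    return (f[0],) + tuple(substitute(g, signal, replacement) for g in f[1:])


def apply_frequency(frq, kappa):
    """Build frq(kappa), e.g. "GF" yields G(F(kappa))."""
    f = kappa
    for op in reversed(frq):
        f = (op, f)
    return f


def synt_ltl_iterate(inputs, outputs, phi, o_i, kappa, theta, syntp):
    """Algorithm 2: try the fault frequencies from lowest to highest."""
    hidden = o_i + "_prime"
    phi_hidden = substitute(phi, o_i, ("ap", hidden))
    for frq in FREQUENCIES:
        objective = ("imp", ("and", phi_hidden, apply_frequency(frq, kappa)),
                     ("not", phi))
        strategy = syntp(list(outputs) + [hidden], inputs, objective, outputs, theta)
        if strategy is not UNREALIZABLE:
            return [strategy]
    return []


def synt_ltl_test(inputs, outputs, phi, kappa_for, syntp):
    """Algorithm 1: collect one test strategy per output, if one is found."""
    suite = []
    for o_i in outputs:
        suite += synt_ltl_iterate(inputs, outputs, phi, o_i, kappa_for(o_i),
                                  frozenset(), syntp)
    return suite


def make_stub(realizable_deltas):
    """Hypothetical stand-in for a real synthesis tool."""
    def stub(env_signals, strategy_signals, objective, observable, theta):
        delta = objective[1][2]  # the frq(kappa) conjunct of the premise
        return ("strategy for", delta) if delta in realizable_deltas else UNREALIZABLE
    return stub


phi = ("G", ("ap", "g"))
stuck_at_0 = lambda o_i: ("not", ("ap", o_i))
kappa = stuck_at_0("g")
stub = make_stub({apply_frequency("FG", kappa), apply_frequency("G", kappa)})
suite = synt_ltl_test(["r"], ["g"], phi, stuck_at_0, stub)
print(suite)
```

With this stub, synthesis fails for F and GF and succeeds for FG, so the returned strategy targets the lowest frequency for which a strategy exists.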
Sanity checks Note that our coverage goal in Equation 5 is vacuously satisfied
by any test suite if ϕ or δ is unrealizable. The reason is that the test suite must
reveal every fault F realizing δ for every system S ′ realizing ϕ. If there is no
such fault or system, this is trivial. As a sanity check, we thus test the (Mealy)
realizability of ϕ and Gκ before starting Algorithm 1 (because if Gκ is realizable,
then so are FGκ, GFκ and Fκ).
Handling unrealizability If, for some output, Line 3 of Algorithm 2 returns
unrealizable for the highest fault frequency frq = G, we print a warning and
suggest that the user examines these cases manually. There are two possible
reasons for unrealizability. First, due to limited observability, we do not find a
test strategy although one exists (see Example 1). Second, no test strategy exists
because there is some S ′ ||= ϕ[oi ← o′i] and F ||= δ such that the composition
S = S ′ ◦ F (see Fig. 5) is correct, i.e., S ′ ◦ F ||= ϕ. In other words, for some
realization, adding the fault may result in an equivalent mutant in the sense
that the specification is still satisfied. For example, in case of a stuck-at-0 fault
model, there may exist a realization of the specification that has the considered
output oi ∈ O fixed to false. Such a high degree of underspecification is at
least suspicious and may indicate unintended vacuities [7] in the specification
ϕ, which should be investigated manually. If Proposition 1 or 2 applies, or if
synt(O ∪ {o′i}, I, (ϕ[oi ← o′i] ∧ G(κ)) → ¬ϕ, Θ) returns unrealizable, we can be sure
that the second reason applies. Then, we can even compute additional diagnostic
information in the form of two Mealy machines S ′ ||=ϕ[oi ← o′i] and F ||=δ (by
synthesizing some Mealy machine S ||=(ϕ[oi ← o′i] ∧ G(κ) ∧ ϕ) and splitting it
into S ′ and F by stripping off different outputs). The user can then try to find
inputs for S ′ ◦ F such that the resulting trace violates the specification. Failing
to do so, the user will understand why no test strategy exists (see also [31]).
For cases where the specification is as intended but no test strategy exists, we
can follow the approach by Faella [18,19] to synthesize best-effort strategies that
are not guaranteed to cause a specification violation but at least do not give up
trying. But we leave this extension for future work.
Complexity Both syntp(O, I, ψ, O′, Θ) and synt(O, I, ψ, Θ) are 2EXPTIME-complete
in |ψ| [32], so the execution time of Algorithm 2, and consequently also
of Algorithm 1, is at most doubly exponential in |ϕ| + |κ|.
Theorem 2. For a system with inputs I, outputs O, and LTL specification ϕ
over I∪O, if the fault kind κ is of the form κ = ψ or κ = (o′i ↔ ψ), where ψ is an
LTL formula over I and O, SyntLtlTest(I,O, ϕ, κ) will return a universally
complete test suite with respect to the fault model δ = G(κ) if such a test suite
exists.
Proof. Since G(κ) implies frq(κ) for all frq ∈ {F,GF,FG,G}, Theorem 1 and the
guarantees of syntp entail that the resulting test suite TS is universally complete
with respect to δ = G(κ) if |TS| = |O|, i.e., if SyntLtlTest found a strategy for
every output. It remains to be shown that |TS| = |O| for κ = ψ or κ = (o′i ↔ ψ)
if a universally complete test suite for δ = G(κ) exists: either Proposition 1 or
Proposition 2 states that Equation 6 holds with δ = G(κ). Thus, syntp cannot
return unrealizable in SyntLtlIterate with frq = G, so |TS| must be equal to
|O| in this case.
Theorem 2 states that SyntLtlTest is not only sound but also complete
for many interesting fault models such as stuck-at faults or permanent bit-flips.
For κ = ψ, Theorem 2 can even be strengthened to hold for all δ = frq(κ) with
frq ∈ {F,GF,FG,G}.
5.3 Extensions and Variants
A test suite computed by SyntLtlTest for specification ϕ and fault model δ is
universally complete and detects all faults with respect to ϕ and δ independent
of the implementation and the concrete fault manifestation if the fault manifests
at one of the observable outputs as illustrated in Fig. 5.
Fig. 8: Test strategy T6 and a faulty system implementation of the specification
ϕ = G((i ↔ X(¬i)) → X(o)). [Figure: state labels i and ¬i with ∗ edges for T6 (left); edge labels i/o, ¬i/¬o, ∗/o, and ∗/¬o for the implementation (right).]
In this section, we discuss some alternatives and extensions of our approach
to improve fault coverage and performance.
User-specified fault frequencies Besides the four fault frequencies (G, FG, GF, and
F), other fault frequencies (with different precedences) may be of interest, e.g., if
a specific time step is of special interest. Algorithm 2 supports full LTL and thus
the procedure can be extended by replacing Line 2 by “for each frq from Frq in
this order”, where Frq is an additional parameter provided by the user.
Faults at inputs In the fault model in the previous section, we only consider
faults at the outputs. However, considering SUTs that behave as if they had
read a faulty input is possible as well (by changing Line 3 in Algorithm 1
to “for each o ∈ I ∪ O do”).
Multiple faults Faults that occur simultaneously at multiple (inputs or) outputs
{o1, . . . , ok} ⊆ O can be considered by computing a test strategy
T := syntp(O ∪ {o′1, …, o′k}, I, (ϕ[o1 ← o′1, …, ok ← o′k] ∧ ⋀_{i=1}^{k} δi) → ¬ϕ, O, Θ),
where the fault model δi can be different for different outputs oi ∈ {o1, . . . , ok}.
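The objective for multiple simultaneous faults can be assembled mechanically. The following sketch assumes formulas are nested tuples and that hidden copies of outputs are named by appending `_prime`; both conventions, and the helper names, are illustrative assumptions. Because the hidden names differ from all output names, renaming the outputs one after another has the same effect as the simultaneous substitution in the formula above.

```python
from functools import reduce


def substitute(f, signal, replacement):
    """Return f with every atomic proposition `signal` replaced."""
    if f[0] == "ap":
        return replacement if f[1] == signal else f
    return (f[0],) + tuple(substitute(g, signal, replacement) for g in f[1:])


def multi_fault_objective(phi, deltas):
    """Build (phi[o1 <- o1', ..., ok <- ok'] & delta_1 & ... & delta_k) -> !phi.

    deltas maps each faulty output name to its own fault model delta_i.
    """
    shifted = phi
    for o in deltas:
        shifted = substitute(shifted, o, ("ap", o + "_prime"))
    premise = reduce(lambda a, b: ("and", a, b), deltas.values(), shifted)
    return ("imp", premise, ("not", phi))


g1, g2 = ("ap", "g1"), ("ap", "g2")
phi = ("G", ("or", ("not", g1), ("not", g2)))   # never two grants at once
deltas = {
    "g1": ("G", ("not", g1)),                                   # g1 stuck at 0
    "g2": ("F", ("iff", g2, ("not", ("ap", "g2_prime")))),      # g2 flips once
}
objective = multi_fault_objective(phi, deltas)
print(objective)
```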
Faults within a SUT If a fault manifests in a conditional fault in a system
implementation, a universally complete TS may not be able to uncover the fault
(see Example 2).
Example 2. Consider a system with input I = {i}, output O = {o}, and
specification ϕ = G((i ↔ X(¬i)) → X(o)). The specification enforces o to be set to
true whenever input i alternates between true and false in consecutive time steps.
Consider a stuck-at-0 fault δ = GF¬o at the output o. The test suite TS = {T6}
with the test strategy T6 illustrated in Fig. 8 (on the left) is universally complete
with respect to δ. The test strategy T6 flips input i in every time step and thus
forces the system to set o = true in the second time step. Now consider the
concrete and faulty system implementation in Fig. 8 (on the right) of ϕ. The test
strategy T6, when executed, first follows the bold edge and then remains forever
in the same state. As a consequence, the fault in the system implementation, i.e.,
o stuck-at-0, is not uncovered. To uncover the fault, i has to be set to false in
the initial state.
Fig. 9: Test strategy T7 on the left, T8 in the middle and T9 on the right. [Figure: single-state strategies labeled r1 ∧ ¬r2 (T7), r1 ∧ r2 (T8), and r1 (T9), each with a ∗ self-loop.]
Faults within a system implementation can be considered by computing more
than one test strategy for a given test objective. We extend Algorithm 1 to
generate a bounded number b of test strategies by setting Θ = TS in Line 4 and
enclosing the line by a while-loop that uses an additional integer variable c to
count the number of test strategies generated per output oi. The while-loop
terminates if no new test strategy could be generated or if c becomes equal to b.
Note that this approach is correct in the sense that all computed test strategies
are universally complete with respect to the fault model frq(κ); however, in many
cases it is more efficient to determine the lowest fault frequency first in Line 4
of Alg. 2 and then generate multiple test strategies with the same (or higher)
frequency by enclosing Line 3 with the while-loop.
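The bounded extension of Algorithm 1 described above can be sketched as a loop around the per-output synthesis call. The callback `synthesize(o_i, theta)` is a hypothetical stand-in for SyntLtlIterate with Θ set to the strategies found so far; it is assumed to return a strategy outside Θ or `None` if no further strategy exists.

```python
def synt_bounded(outputs, synthesize, bound):
    """Generate up to `bound` distinct test strategies per output."""
    suite = []
    for o_i in outputs:
        count = 0                       # the counter c from the text
        while count < bound:
            strategy = synthesize(o_i, frozenset(suite))  # Theta = TS
            if strategy is None:        # no new test strategy could be generated
                break
            suite.append(strategy)
            count += 1
    return suite


# Hypothetical pool of strategies that a real synthesizer might produce.
POOL = {"g1": ["T7", "T8", "T9"], "g2": ["Ta"]}


def stub_synthesize(o_i, theta):
    """Return the first pooled strategy for o_i that is not excluded by theta."""
    for candidate in POOL[o_i]:
        if candidate not in theta:
            return candidate
    return None


suite = synt_bounded(["g1", "g2"], stub_synthesize, bound=2)
print(suite)
```

For g1 the loop stops because the bound is reached, and for g2 it stops because no further strategy is available.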
Test strategy generalization A synthesis procedure usually assigns concrete values
to all variables in every state of the generated test strategy. In many cases, however,
not all assignments are necessary to enforce a test objective (see Example 3).
Example 3. Consider a system with inputs I = {r1, r2} and outputs O = {g1, g2},
which implements the specification of a two-input arbiter ϕ = G(r1 → F g1) ∧
G(r2 → F g2) ∧ G(¬g1 ∨ ¬g2), i.e., every request ri shall eventually be granted
by setting gi to true and there shall never be two grants at the same time. A
valid test strategy T7 that tests for a stuck-at-0 fault of signal g1 from some
point in time onwards may simply set r1 = true and r2 = false all the time (see
Fig. 9). This forces the system in every time step to eventually grant this one
request by setting g1 = true. Another valid test strategy T8 sets r1 = true and
r2 = true all the time (see Fig. 9). Now the system has to grant both requests
eventually. Both T7 and T8 test for the defined stuck-at-0 fault of signal g1 from
some point in time onwards but will likely execute different paths in the SUT.
Thus, considering the more general strategy T9 (see Fig. 9) that sets r1 = true all
the time but puts no restrictions on the value of r2, allows the tester to evaluate
different paths in the SUT while still testing for the defined fault class.
Algorithm 3 Generalize: Generalize a test strategy.
1: procedure Generalize(I, O, ϕ, oi, frq, κ, T), returns: A generalization of T
2:   for each qi ∈ T do
3:     for each xi ∈ ΣI do
4:       T′ := remove assignment to xi from state qi in T
5:       if modelcheck(T′, (ϕ[oi ← o′i] ∧ frq(κ)) → ¬ϕ) then
6:         T := T′
7:   return T
The procedure in Algorithm 3 generalizes a given test strategy T by systematically
removing variable assignments from states and employing a model checking
procedure to ensure that the generalized test strategy still enforces the same test
objective. The procedure loops in Line 2 over all states of T and in Line 3 over
all inputs. In Line 4 the assignment to the input xi in a state is removed such
that the corresponding variable becomes non-deterministic. If the resulting test
strategy still enforces the test objective, then T is replaced by its generalization.
Otherwise, the change is reverted. Algorithm 3 is integrated into Algorithm 2
and applied in Line 5 to generalize each generated test strategy.
Note that generalizing a test strategy is a special way of computing multiple
concrete test strategies, which was discussed in the previous section. However,
generalization may fail when computing multiple strategies succeeds (by following
different paths).
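The generalization loop of Algorithm 3 can be sketched as follows. In this illustration a test strategy is reduced to a dictionary from state names to input assignments (transitions are omitted), and the model checker is a callback; both are simplifying assumptions. The stub checker encodes the observation from Example 3 that the objective for a stuck-at-0 fault of g1 is enforced as long as r1 is set to true in every state.

```python
import copy


def generalize(strategy, input_names, model_check):
    """Drop input assignments as long as the test objective stays enforced."""
    current = strategy
    for state in list(strategy):
        for x in input_names:
            if x not in current[state]:
                continue
            candidate = copy.deepcopy(current)
            del candidate[state][x]     # x becomes non-deterministic here
            if model_check(candidate):
                current = candidate     # keep the generalization
            # otherwise the change is discarded and current stays as it was
    return current


def stub_model_check(strategy):
    """Hypothetical check: the objective holds iff r1 is true in all states."""
    return all(assignment.get("r1") is True for assignment in strategy.values())


t7 = {"q0": {"r1": True, "r2": False}}
t9 = generalize(t7, ["r1", "r2"], stub_model_check)
print(t9)
```

Starting from T7, the assignment to r1 has to stay while the assignment to r2 can be dropped, which yields the more general strategy T9.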
Optimization for full observability If we restrict our perspective to the case
with no partial information, i.e., all signals are fully observable, we can employ
the optimization discussed in Proposition 2 to improve the performance of test
strategy generation. In Line 3 of Algorithm 2 we drop a part of the assumption
and simplify the synthesis step to Ti := synt(O, I, frq(κ) → ¬ϕ, Θ) for cases
in which κ does not refer to a hidden signal o′i. Also, for a fault model δ that
describes a fault of kind κ = (o′i ↔ ψ), where ψ is an LTL formula over I and O,
we can drop the part of the assumption according to Proposition 1 if frq = G.
This simplifies Line 3 of Algorithm 2 to Ti := synt(O, I, ϕ[oi ← ψ] → ¬ϕ, Θ).
These simplifications, moreover, no longer require a synthesis procedure with
partial information and thus, a larger set of synthesis tools is supported.
Mutating the specification We can also synthesize adaptive test strategies that
would uncover bugs where the SUT implements a mutated (i.e., slightly modified)
specification ϕ′ instead of ϕ by calling T := synt(O, I, ϕ′ → ¬ϕ,Θ). The impli-
cation requires the original specification ϕ to be violated under the assumption
that the mutated specification ϕ′ has been implemented in the SUT. This variant
does not require partial information synthesis.
Other specification formalisms We worked out our approach for LTL, but it
works for other languages if (1) the language is closed under Boolean connectives
(∧,¬), (2) the desired fault models are expressible, and (3) a synthesis procedure
(with partial information) is available. These prerequisites do not only apply to
many temporal logics but also to various kinds of automata over infinite words.
6 Case Study
To evaluate our approach, we apply it in a case study on a real component
of a satellite that is currently under development. We first present the system
under test and specify a version of the respective component in LTL. Using this
specification, we compute a set of test strategies and evaluate the test suite on a
real implementation. Additional case studies can be found in [9].
6.1 Eu:CROPIS FDIR Specification
An important task of each space and satellite system is to maintain its health
state and react to failures. In modern space systems this task is encapsulated in
the Fault Detection, Isolation, and Recovery (FDIR) component, which collects
the information from all relevant sensors and on-board computers, analyzes
and assesses the data in terms of correctness and health, and initiates recovery
actions if necessary. The FDIR component is organized hierarchically in multiple
levels [49] with the overall objective of maximizing the system life-time and
correct operation.
In this section, we focus on system-level FDIR and present the high-level
abstraction of a part of the FDIR mechanisms used in the Eu:CROPIS satellite
mission as a case-study for adaptive test strategy generation. On the system-level,
the FDIR mechanism deals with coarse-granular anomalies of the system behavior
like erroneous sensor data or impossible combinations of signals. Likewise the
recovery actions are limited to restarting certain sub-systems, switching between
redundant sub-systems if available, or switching into the satellite’s safe mode.
The FDIR component is highly safety- and mission-critical; if recovery on this
level fails, in many cases the mission has to be considered lost.
Eu:CROPIS FDIR In Fig. 10 we illustrate where the FDIR component for the
magnetic torquers of the Eu:CROPIS on-board computing system is placed in
practice and in Fig. 11, we give a high-level overview of the FDIR component
and its environment. The FDIR component regularly obtains housekeeping
information from two redundantly-designed control units, S1 and S2, which
control the magnetic torquers of the satellite, and interacts with them via
the electronic power system, EP. The control units S1 and S2 have the same
functionality, but only one of them is active at any time. The other control unit
serves as a backup that can be activated if necessary. The FDIR component
signals the activation (or deactivation) of a control unit to the EP which regulates
the power supply.
We distinguish two types of errors, called non-critical error and severe error,
signaled to the FDIR component via housekeeping information. In case of a
non-critical error, two recovery actions are allowed. Either the erroneous control unit
is disabled for a short time and enabled afterwards again or the erroneous control
unit is disabled and the redundant control unit is activated to take over its task.
Fig. 10: FDIR in practice (left) and the intended test setup (right). [Figure labels, left: System, FDIR, input, output, satellite software, commands, data & housekeeping, ground segment. Right: the same setup with Specification, Fault model, Synthesis, and Adaptive test strategy.]
Fig. 11: High-level overview of the satellite software of Eu:CROPIS. [Figure labels: EP, FDIR, S1, S2, System, and the signals on1, off1, on2, off2, mode1, mode2, errnc, errs, reset, safemode.]
In case of the severe error, however, only the latter recovery action is allowed,
i.e., the erroneous control unit has to be disabled and the redundant control
unit has to be activated. If this happens more than once and the redundant
control unit also shows erroneous behavior, the FDIR component initiates a
switch of the satellite mode into safe mode. The safe mode is a fall-back satellite
mode designed to give the operators on ground the maximum amount of time to
analyze and fix the problem. It is only invoked once a problem cannot be solved
on-board and requires input from the operators to restore nominal operations.
LTL specification We model the specification of the FDIR component in LTL.
Let IFDIR = {mode1, mode2, errnc, errs, reset} and OFDIR = {on1, off1, on2,
off2, safemode} be the Boolean variables corresponding to the input signals and
the output signals of the FDIR component, respectively.
Table 1: Descriptions of inputs and outputs of the FDIR component.
Boolean variable   Description
mode1              true iff S1 is activated
mode2              true iff S2 is activated
errnc              true iff a non-critical error is signaled by S1 or S2
errs               true iff a severe error is signaled by S1 or S2
reset              true iff the FDIR component is reset
on1                true iff S1 shall be switched on
off1               true iff S1 shall be switched off
on2                true iff S2 shall be switched on
off2               true iff S2 shall be switched off
safemode           true iff the FDIR component initiates the safemode of the satellite
lastup             true if the last active system was S1 and false if the last active system was S2
allowswitch        true iff a switch of S1 to S2 or S2 to S1 is allowed
These Boolean variables are abstractions of the real hardware/software imple-
mentation. The values of the Boolean variables are automatically extracted from
the housekeeping information which is periodically collected from EP (mode1,
mode2) and S1 or S2 (errnc, errs). The two error variables encompass multiple
error conditions (e.g. communication timeouts, invalid responses, electrical errors
like over-current or under-voltage, etc.) which are detected by the sub-system.
The reset variable corresponds to a telecommand sent from ground to the FDIR
component. For the output direction the values of the variables are used to
generate commands which are sent to the EP or the satellite mode handling
component. Additionally, we use the auxiliary Boolean variables O′ = {lastup,
allowswitch} to model state information on specification level which does not
correspond to any real signals in the system. These auxiliary variables serve as
unobservable outputs of the FDIR component. In Table 1, we summarize the
Boolean variables involved in the specification and their meaning.
The complete LTL specification of the FDIR component consists of the
assumptions A1-A6 and the guarantees G1-G13. All properties are listed in
Table 2, expressing the following intentions:
A1 Whenever both systems are off, then there is no running system that can
have an error. Thus, the error signals have to be low as well.
A2 The error signals are mutually exclusive. If the environment enforces a reset
then both error signals have to be low, because we assume that ground control
has taken care of the errors.
A3 After a reset enforced by the environment, one of the two systems has to be
running and the other has to be off.
A4 Whenever the FDIR component sends on1, we assume that in the next time
step system number one is running (mode1) and the state of the second system
(mode2) does not change. The same assumption applies analogously for on2.
A5 Whenever the FDIR component sends off1, we assume that in the next time
step system number one is off (¬mode1) and the state of the second system
(mode2) does not change. The same assumption applies analogously for off2.
A6 We assume that the environment, more specifically the electronic power unit,
is not immediately free to change the state of the systems when there is no
message from the FDIR component. It has to wait for one more time step
(with no messages of the FDIR component).
G1 This guarantee stores which system was last activated by the FDIR compo-
nent.
G2 We require the signals on1, off1, on2 and off2 to be mutually exclusively
set to high.
G3 Whenever both systems are off, then the FDIR component eventually requests
to switch on one of the systems (on1, on2) or activates safemode or observes
a reset.
G4 We restrict the FDIR component to not enter safemode as long as the
component can switch to the backup system.
G5 The FDIR component must not request to switch on one of the systems (on1,
on2) as long as one of the systems is running.
G6 Whenever the FDIR component is not allowed anymore to switch to the
backup system, then it must not request to switch the backup system on.
G7 Once the FDIR component switches to the backup system it is not allowed
anymore to switch again (unless the environment performs a reset, see G9).
G8 As long as the FDIR component only restarts the same system it is still
allowed to switch in the future.
G9 A reset by the environment allows the FDIR component again to switch to
the backup system if required.
G10 Whenever the FDIR component is in safemode it must not request to
switch on one of the systems (on1, on2).
G11 Once a switch is not allowed anymore and the environment does not perform
a reset, then the switch is also not allowed in the next time step.
G12 Whenever the FDIR component observes a severe error (errs), it must
eventually switch to the backup system or activate safemode unless the
environment performs a reset or the error disappears by itself (without
restarting the system).
G13 Whenever the FDIR component observes a non-critical error (errnc), it
must eventually switch to the backup system or activate safemode or the
error disappears (restarting the currently running system is allowed).
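The purely propositional guarantees, for instance G2, G5 and G10, are invariants and can therefore be checked step by step on a recorded execution trace. The following Python sketch illustrates this under stated assumptions: the trace format (a list of dictionaries mapping signal names to Boolean values) and all function names are hypothetical and not part of the original tool chain.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Dict[str, bool]  # one time step: signal name -> value (assumed format)


def g2(s: Step) -> bool:
    """G2: at most one of on1, off1, on2, off2 is high."""
    return sum(s[name] for name in ("on1", "off1", "on2", "off2")) <= 1


def g5(s: Step) -> bool:
    """G5: no switch-on request while one of the systems is running."""
    return not (s["mode1"] or s["mode2"]) or not (s["on1"] or s["on2"])


def g10(s: Step) -> bool:
    """G10: no switch-on request while safemode is active."""
    return not s["safemode"] or not (s["on1"] or s["on2"])


INVARIANTS: Dict[str, Callable[[Step], bool]] = {"G2": g2, "G5": g5, "G10": g10}


def first_violation(trace: List[Step]) -> Optional[Tuple[int, str]]:
    """Return (time step, guarantee name) of the first violation, or None."""
    for t, step in enumerate(trace):
        for name, holds in INVARIANTS.items():
            if not holds(step):
                return (t, name)
    return None
```

Liveness guarantees such as G3, G12 and G13 cannot be decided this way on a finite trace; they require a user-chosen bound on how long to wait.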
6.2 Experimental Results
The test strategy computation from the specification is independent of the
implementation. Thus, we first present the experimental results of the strategies
derived from the LTL specification of the FDIR component given in Table 2,
then we execute and evaluate the computed strategies on the implementation of
the specification in the system of the Eu:CROPIS satellite.
Table 2: Temporal specification of system-level FDIR component in LTL.
Assumptions A1–A6
A1   G(¬mode2 ∧ ¬mode1 → ¬errnc ∧ ¬errs)
A2   G(¬errnc ∨ ¬errs) ∧ G(reset → ¬errnc ∧ ¬errs)
A3   G(reset → X(mode2 ⊕ mode1))
A4   G(¬mode1 ∧ on1 ∧ ¬off1 ∧ ¬on2 ∧ ¬off2 ∧ ¬reset ∧ ¬safemode →
       X(mode1) ∧ (mode2 ↔ X(mode2)))
     G(¬mode2 ∧ ¬on1 ∧ ¬off1 ∧ on2 ∧ ¬off2 ∧ ¬reset ∧ ¬safemode →
       X(mode2) ∧ (mode1 ↔ X(mode1)))
A5   G(mode1 ∧ ¬on1 ∧ off1 ∧ ¬on2 ∧ ¬off2 ∧ ¬reset ∧ ¬safemode →
       X(¬mode1) ∧ (mode2 ↔ X(mode2)))
     G(mode2 ∧ ¬on1 ∧ ¬off1 ∧ ¬on2 ∧ off2 ∧ ¬reset ∧ ¬safemode →
       X(¬mode2) ∧ (mode1 ↔ X(mode1)))
A6   G((¬(¬on2 ∧ ¬off1 ∧ ¬on1 ∧ ¬off2) ∧ X(¬on2 ∧ ¬off1 ∧ ¬on1 ∧ ¬off2) ∧
       (¬reset ∧ X(¬reset) ∧ ¬safemode ∧ X(¬safemode))) →
       X((mode2 ↔ X(mode2)) ∧ (mode1 ↔ X(mode1))))
Guarantees G1–G13
G1   G((on1 ∧ ¬on2) → (X(lastup)))
     G((¬on1 ∧ on2) → (X(¬lastup)))
     G((¬on1 ∧ ¬on2) → (lastup ↔ X(lastup)))
G2   G(on1 → ¬off1 ∧ ¬on2 ∧ ¬off2)
     G(off1 → ¬on1 ∧ ¬on2 ∧ ¬off2)
     G(on2 → ¬on1 ∧ ¬off1 ∧ ¬off2)
     G(off2 → ¬on1 ∧ ¬on2 ∧ ¬off1)
G3   G(¬mode2 ∧ ¬mode1 → F(reset ∨ on2 ∨ on1 ∨ safemode))
G4   G(allowswitch → ¬safemode)
G5   G((mode2 ∨ mode1) → ¬on1 ∧ ¬on2)
G6   G(¬allowswitch ∧ lastup → ¬on2)
     G(¬allowswitch ∧ ¬lastup → ¬on1)
G7   G(¬reset ∧ allowswitch ∧ lastup ∧ on2 → X(¬allowswitch))
     G(¬reset ∧ allowswitch ∧ ¬lastup ∧ on1 → X(¬allowswitch))
G8   G((allowswitch ∧ ¬(((lastup ∧ on2) ∨ (¬lastup ∧ on1)))) → X(allowswitch))
G9   G(reset → X(allowswitch))
G10  G(safemode → (¬on1 ∧ ¬on2))
G11  G(¬allowswitch ∧ ¬reset → X(¬allowswitch))
G12  G((errs ∧ mode1 ∧ ¬reset) →
       F(reset ∨ safemode ∨ mode2 ∨ (mode1 U (mode1 ∧ ¬errs))))
     G((errs ∧ mode2 ∧ ¬reset) →
       F(reset ∨ safemode ∨ mode1 ∨ (mode2 U (mode2 ∧ ¬errs))))
G13  G((errnc ∧ mode1 ∧ ¬reset) → F(reset ∨ safemode ∨ mode2 ∨ (mode1 ∧ ¬errnc)))
     G((errnc ∧ mode2 ∧ ¬reset) → F(reset ∨ safemode ∨ mode1 ∨ (mode2 ∧ ¬errnc)))
Table 3: Results for the FDIR specification. The suffix “k” multiplies by 10^3.
Fault     oi        frq  |T|  Time [s]  Peak Memory [MB]
S-a-0     on1       FG   4    1.2k      400
S-a-0     off1      FG   3    517       396
S-a-0     safemode  FG   4    934       324
S-a-1     on1       GF   4    438       222
S-a-1     off1      FG   4    753       378
S-a-1     safemode  GF   3    169       192
Bit-Flip  on1       GF   4    26k       3.6k
Bit-Flip  off1      FG   4    98.9k     4.3k
Bit-Flip  safemode  GF   3    13.1k     4.3k
Test strategy computation
Experimental setting All experiments for computing the test strategies are
conducted in a virtual machine with a 64 bit Linux system using a single core of
an Intel i5 CPU running at 2.60 GHz. We use the synthesis procedure PARTY [30]
as a black box; it implements SMT-based bounded synthesis for full LTL. We
therefore call our tool PARTYStrategy.
Test strategy computation From the previously described LTL specification, we
compute test strategies for the outputs on1, off1 and safemode of the FDIR
component considering the fault models stuck-at-0, stuck-at-1, and bit-flip with
the lowest possible fault frequencies. These are general fault assumptions and cover
faults where the specification is violated with this signal being high (stuck-at-1),
faults where the specification is violated with this signal being low (stuck-at-0)
and faults where the specification is violated with this signal having the wrong
polarity (bit-flip). We do not synthesize test strategies for the outputs on2 and
off2 because they behave identically to on1 and off1, respectively, if the roles of
S1 and S2 are interchanged. For synthesizing test strategies, both the bound on
the maximal number of states of a test strategy and the bound on the maximal
number of test strategies are set to four. We chose four as the state bound
because for this bound there exist strategies for all our chosen fault models
and output signals. The maximum number of strategies per variable and fault
model is set arbitrarily to four and could also be set to a different value.
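To make the three fault models concrete, the following Python sketch shows their effect on the observed values of a single output signal in a finite trace. It is an illustration only: the function name, the trace format and the use of a set of active time steps to approximate the fault frequency are assumptions, and the sketch does not model how a fault inside the system would influence its later behavior.

```python
from typing import Dict, Iterable, List

Step = Dict[str, bool]  # one time step: signal name -> value (assumed format)
FAULTS = ("stuck-at-0", "stuck-at-1", "bit-flip")


def inject(trace: List[Step], signal: str, fault: str,
           active: Iterable[int]) -> List[Step]:
    """Return a copy of `trace` in which `fault` affects `signal` at the
    time steps in `active`; all other steps are left unchanged."""
    if fault not in FAULTS:
        raise ValueError(f"unknown fault model: {fault}")
    active_steps = set(active)
    faulty: List[Step] = []
    for t, step in enumerate(trace):
        step = dict(step)  # copy so that the input trace is not modified
        if t in active_steps:
            if fault == "stuck-at-0":
                step[signal] = False
            elif fault == "stuck-at-1":
                step[signal] = True
            else:  # bit-flip
                step[signal] = not step[signal]
        faulty.append(step)
    return faulty
```

On a finite trace, a fault that is permanent from step k onwards corresponds to `active = range(k, len(trace))`, while a fault that merely recurs corresponds to a sparser set of steps.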
In Table 3, we list the time and memory consumption for synthesizing the
test strategies with our synthesis tool PARTYStrategy. The more freedom there
is for implementations of the specification, the harder it becomes to compute a
strategy. The search for strategies that are capable of detecting a bit-flip is the
most difficult one as we cannot make use of our optimization for full observability
of the output signals. For all signals with a stuck-at-0 fault and for the off1 signal
with one of the other two faults we are able to derive test strategies that can
detect the fault if it is permanent from some point onwards (frequency FG). For
the signals on1 and safemode we are able to derive strategies for stuck-at-1
faults and bit-flips also at a lower frequency, i.e., we can detect those faults
even if they merely occur infinitely often (frequency GF).
Illustration of a computed strategy We illustrate and explain one derived strategy
in detail. The strategy derived for the signal safemode being stuck-at-0 computed
with PARTYStrategy consists of four states. Fig. 12 illustrates the strategy. In the
first state (state 0) we have the first system running (mode1) and set the errnc
flag, i.e., we raise a non-critical error that requires the component to restart
until the error is gone or to switch to the other system. We loop in this state
until the FDIR component, if it behaves according to the specification, switches
off the running system. In the next state (state 1) we do not set any input and
wait for the FDIR component to eventually switch on one of the systems. If the
component switches on the same system, then we go back to the previous state
(state 0); if it switches on the other system, we go into the next state (state 3). In
this state we have the second system running (mode2) and again set the errnc
flag, i.e., we again raise a non-critical error. We loop in this state until the FDIR
component reacts and, if it conforms to the specification, switches off the running
system. Continuing according to the strategy, we always raise a non-critical error
whichever system the FDIR component activates. Eventually the FDIR component
has to activate safemode or violate the specification. State 2 is only entered when
the FDIR violates G5. In this state, it is irrelevant how the test strategy behaves
(as long as the assumptions are satisfied) because the specification has already
been violated (which is easy to detect during test execution).
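The strategy described above can be written down as a small finite-state machine. The following Python sketch is a reconstruction for illustration, not the output of PARTYStrategy: the inputs set in each state follow Fig. 12, the transitions follow the textual description, and two details are assumptions, namely that state 2 is a sink and that any output not mentioned in the description leaves the state unchanged.

```python
from typing import Dict

Signals = Dict[str, bool]


def _inputs(errnc: bool, mode1: bool, mode2: bool) -> Signals:
    # errs and reset are low in every state of this strategy.
    return {"errnc": errnc, "errs": False, "mode1": mode1,
            "mode2": mode2, "reset": False}


# Inputs the strategy feeds to the FDIR component in each state.
STATE_INPUTS: Dict[int, Signals] = {
    0: _inputs(errnc=True, mode1=True, mode2=False),
    1: _inputs(errnc=False, mode1=False, mode2=False),
    2: _inputs(errnc=True, mode1=True, mode2=True),
    3: _inputs(errnc=True, mode1=False, mode2=True),
}


def next_state(state: int, out: Signals) -> int:
    """Successor state for the observed FDIR outputs `out`."""
    on1, off1 = out.get("on1", False), out.get("off1", False)
    on2, off2 = out.get("on2", False), out.get("off2", False)
    if state == 0:                 # S1 running, non-critical error raised
        if on1 or on2:
            return 2               # switch-on while a system runs: G5 violated
        return 1 if off1 else 0
    if state == 1:                 # both systems off, wait for a switch-on
        if on1:
            return 0
        if on2:
            return 3
        return 1
    if state == 3:                 # S2 running, non-critical error raised
        if on1 or on2:
            return 2               # G5 violated
        return 1 if off2 else 3
    return 2                       # assumption: state 2 is never left
```

A run in which the component restarts S1 once and then switches to S2 visits the states 0, 1, 0, 1, 3 in this order.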
Test strategy evaluation
Test setting In the Eu:CROPIS satellite the FDIR component is implemented in
C++. The implementation for the magnetic torquer FDIR handling is not an exact
realization of the specification in Table 2 but extends it by allowing commands
to the EP to be lost (e.g. due to electrical faults). This is accommodated by
adding timeouts for the execution of the switch-on/off commands and reissuing
the commands if the timeout is triggered.
The implementation is designed with testability and portability in mind and
uses an abstract interface to access other sub-systems of the satellite. This allows
us to exchange the interface for a set of test adapters which connect to the
signals generated by the test strategies. As we are only interested in the functional
properties of the implementation, we can run the code on a normal Linux system,
instead of the microprocessor which is used in the satellite. This gives access to
all Linux-based debugging and test tools and allows us to use gcov to measure
the line and branch coverage of the source code.
Fig. 12: Test strategy that tests for a stuck-at-0 fault of signal safemode.
[Inputs set in each state: state 0: errnc, ¬errs, mode1, ¬mode2, ¬reset; state 1: ¬errnc, ¬errs, ¬mode1, ¬mode2, ¬reset; state 2: errnc, ¬errs, mode1, mode2, ¬reset; state 3: errnc, ¬errs, ¬mode1, mode2, ¬reset. Transitions are labeled with the outputs on1, off1, on2, off2, or else.]
A time step of a test run consists of the following operations: request values
for the input variables IFDIR from the test strategy; feed the values to the test
adapter from which they are read by the FDIR implementation; run the FDIR
implementation for one cycle; extract the output values OFDIR from the test
adapter and feed them back to the test strategy to get new input values. For
each time step the execution trace is recorded, i.e., the values assigned to the
inputs IFDIR and outputs OFDIR of the FDIR component.
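The time step described above amounts to a simple closed loop between the test strategy and the system under test. The Python sketch below shows one possible shape of such a harness. All names (`TableStrategy`, `run_test`, `toy_fdir`) are hypothetical; the actual setup uses test adapters around the C++ implementation, which are replaced here by a plain function call.

```python
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, bool]


class TableStrategy:
    """A strategy given by per-state inputs and a transition function."""

    def __init__(self, inputs_by_state: Dict[int, Signals],
                 next_state: Callable[[int, Signals], int], initial: int = 0):
        self._inputs_by_state = inputs_by_state
        self._next_state = next_state
        self.state = initial

    def inputs(self) -> Signals:
        return dict(self._inputs_by_state[self.state])

    def update(self, outputs: Signals) -> None:
        self.state = self._next_state(self.state, outputs)


def run_test(strategy: TableStrategy,
             sut_cycle: Callable[[Signals], Signals],
             steps: int) -> List[Tuple[Signals, Signals]]:
    """Execute `steps` time steps and record the (inputs, outputs) trace."""
    trace = []
    for _ in range(steps):
        inputs = strategy.inputs()        # request input values
        outputs = sut_cycle(inputs)       # feed them in and run one cycle
        strategy.update(outputs)          # feed the outputs back
        trace.append((inputs, outputs))   # record the time step
    return trace


# Toy stand-ins, for illustration only.
def toy_fdir(inputs: Signals) -> Signals:
    return {"off1": inputs["errnc"] and inputs["mode1"]}


toy_strategy = TableStrategy(
    inputs_by_state={0: {"errnc": True, "mode1": True},
                     1: {"errnc": False, "mode1": False}},
    next_state=lambda state, out: 1 if out["off1"] else state,
)
```

Calling `run_test(toy_strategy, toy_fdir, 3)` yields a three-step trace in which the toy component switches off S1 in the first step and stays silent afterwards.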
Mutation testing We apply mutation analysis to assess the effectiveness, i.e.,
fault finding abilities, of a test suite. A test suite kills a mutant program M if it
contains at least one test strategy that, when executed on M and the original
program P , produces a trace where at least one output of M differs in at least
one time step from the respective output of P (for the same input sequence). A
mutant program M is equivalent to the original program P if M does not violate
the specification. For our evaluation we manually identify and remove equivalent
mutants.
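The kill criterion can be stated operationally as a comparison of recorded output traces. The following Python sketch assumes that each test strategy has been executed on both the original program and the mutant and that the outputs per time step were logged as dictionaries; the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

Outputs = Dict[str, bool]


def first_difference(original: Sequence[Outputs],
                     mutant: Sequence[Outputs]) -> Optional[int]:
    """Index of the first time step in which the outputs differ, or None."""
    for t, (o, m) in enumerate(zip(original, mutant)):
        if o != m:
            return t
    return None


def suite_kills(runs: Iterable[Tuple[Sequence[Outputs], Sequence[Outputs]]]) -> bool:
    """A suite kills a mutant if at least one strategy exposes a difference.

    `runs` holds one (original outputs, mutant outputs) pair per strategy.
    """
    return any(first_difference(o, m) is not None for o, m in runs)
```

Only the first difference is meaningful for an adaptive strategy, because the inputs chosen by the strategy may diverge once the outputs differ.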
We generate mutant programs of the C++ implementation of the FDIR
component by systematically introducing the following four mutations in each
line: 1) deletion of the line, 2) replacement of true with false or false with
true, 3) replacement of == with != or != with ==, and 4) replacement of && with
|| or || with &&. In total, 198 mutant programs are generated. We use the GNU
compiler gcc to remove all mutant programs which do not compile and thus do not
conform to the C++ programming language. All mutant programs which fail
at runtime, e.g., by raising a segmentation fault, are removed as well. We analyzed
the remaining 96 mutants manually and identified 23 mutants that are correct with
respect to the specification, i.e., equivalent mutants. Thus, 73 mutants violate the
specification. Moreover, 11 of these 73 mutants can only violate the specification
if the off1 and off2 commands can fail, which contradicts our assumptions on
the EP unit. We keep those mutants to check whether the strategies can kill
them nevertheless. Next, we executed all test strategies on the mutant programs
for 80 time steps each and logged the corresponding execution traces.
Fig. 13: Execution trace from a faulty system under the strategy that tests for a
stuck-at-0 fault of signal safemode. Bold signals are controlled by the strategy.
[Strategy state in time steps 0 to 35: 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 3 1 3 1 3 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1. The waveforms of mode1, mode2, errnc, errs, reset, on1, off1, on2, off2 and safemode are not reproduced here.]
From the 73 mutants that violate the specification, our strategies all together
are able to kill 52, i.e., we achieve a mutation score of 71.23%. If we do not take
the 11 mutants into account that violate our assumptions for the test strategy
generation, then the mutation score increases to 80.65%. We illustrate in Fig. 13
the execution of the test strategy from Fig. 12 on a mutant. This strategy aims at
revealing a stuck-at-0 fault of signal safemode. The test strategy first forces the
FDIR component to eventually switch to the backup system. The switch happens
in time step 14 after several restarts of the system. Then the strategy forces
the FDIR component to eventually activate safemode. However, this mutant is
faulty and instead of activating safemode the system remains silent from time
step 26 onwards, thus violating guarantee G3 (see footnote 5).
As the test strategies are derived only from requirements, without any implementation-
specific knowledge, they are applicable to any system that claims to implement the
given specification. The mutation score of 71.23% illustrates that our strategies,
although computed for only three different faults that are assumed to only affect
a single output signal, are also sensitive to many other faults.
If we only apply one of the four strategies we computed per fault model and
output signal, then the resulting test suite can kill (1) 51 mutants, (2) 51 mutants,
(3) 49 mutants and (4) 49 mutants. While one strategy per fault and output
already achieves a high mutation score, these numbers illustrate the advantage
of computing multiple strategies per fault model and output signal.
Footnote 5: Given that the user has decided that we have waited long enough for
safemode to become true.
Table 4: Mutation coverage by fault models and signal when executing all four
derived strategies.
Output    S-a-0 [%]  S-a-1 [%]  Bit-Flip [%]  All [%]
on1       65.75      39.73      5.48          65.75
off1      5.48       4.11       9.59          9.59
safemode  61.64      6.85       6.85          61.64
All       71.23      39.73      9.59          71.23
In Table 4 we present the mutation score of the individual combinations of
signals and fault models. From all the mutants killed, there were 9 mutants
only killed by a single signal / fault model combination, namely on1 with stuck-
at-0 assumption exclusively killing 7 mutants and safemode with stuck-at-0
assumption exclusively killing 2 mutants.
Random testing We compared the fault finding abilities of the generated test
strategies and random testing executed for 100, 10’000, and 100’000 time steps,
respectively. For random testing we use a similar test setup to the test strategy
setup, but instead of requesting the input values IFDIR from a test strategy
we use uniformly distributed random values. For each time step, the input and
output values are recorded. For each mutant the same input sequence is supplied
and the output sequence of the mutant is compared to the output sequence of
the actual implementation.
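A random input sequence of this kind can be generated once and then replayed on the original program and on every mutant. The Python sketch below is a minimal illustration; the function name and the use of a fixed seed for reproducibility are assumptions, and the sketch does not filter sequences against the assumptions A1-A6.

```python
import random
from typing import Dict, List

INPUTS = ("mode1", "mode2", "errnc", "errs", "reset")  # IFDIR


def random_inputs(steps: int, seed: int) -> List[Dict[str, bool]]:
    """Draw each input independently and uniformly for every time step."""
    rng = random.Random(seed)  # local generator, so runs are repeatable
    return [{name: rng.random() < 0.5 for name in INPUTS}
            for _ in range(steps)]
```

Because the generator is seeded, calling `random_inputs(100, seed)` twice with the same seed returns the same sequence, which is what allows the outputs of a mutant to be compared step by step with those of the original program.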
Random testing for 100 time steps killed 46 mutants (mutation score of 63%),
while random testing for 10’000 time steps killed 69 mutants (mutation score of
94.5%). Increasing the number of time steps further did not change the result:
random testing for 100’000 time steps also killed 69 mutants.
Our strategies are able to kill three mutants that are missed by all of the three
random test sequences. These mutants can only be killed when executing certain
input/output sequences and it is very unlikely for random testing to hit one of
the required sequences. The corresponding sequence requires that a sequence of
errnc, mode1 going low and mode1 going high is executed multiple times before
either errs or reset is triggered.
One mutant is neither covered by the test strategies nor by the random
sequences. This mutant requires a longer sequence as well in order to be executed.
The mutant is not covered by the test strategies because the sequence is about
the timeout of an EP command, which is not covered by the specification from
which the test strategies are derived.
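The mutation scores reported in this section can be re-derived from the mutant counts. The short Python calculation below does so. One value is an inference rather than a number stated in the text: the score of 80.65% on the 62 mutants that respect the assumptions corresponds to 50 killed mutants, which would mean that 2 of the 11 excluded mutants are killed by the strategies.

```python
def score(killed: int, total: int) -> float:
    """Mutation score in percent, rounded to two decimals."""
    return round(100.0 * killed / total, 2)


NON_EQUIVALENT = 73             # mutants that violate the specification
ASSUMPTION_VIOLATING = 11       # need failing off1/off2 commands
KILLED_BY_STRATEGIES = 52
KILLED_BY_RANDOM_100 = 46
KILLED_BY_RANDOM_10K = 69
KILLED_ONLY_BY_STRATEGIES = 3   # missed by all random sequences

strategy_score = score(KILLED_BY_STRATEGIES, NON_EQUIVALENT)
# Inferred: 50 of the remaining 62 mutants are killed.
restricted_score = score(50, NON_EQUIVALENT - ASSUMPTION_VIOLATING)
random_100_score = score(KILLED_BY_RANDOM_100, NON_EQUIVALENT)
random_10k_score = score(KILLED_BY_RANDOM_10K, NON_EQUIVALENT)
combined_score = score(KILLED_BY_RANDOM_10K + KILLED_ONLY_BY_STRATEGIES,
                       NON_EQUIVALENT)

print(strategy_score, restricted_score, random_100_score,
      random_10k_score, combined_score)
```

The combined figure uses 69 + 3 = 72 killed mutants, which is consistent with exactly one mutant being missed by both approaches.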
Table 5: Code coverage by testing approach. The suffix “k” multiplies by 10^3.
Approach       #Steps  Line [%]  Branch [%]
Random         100     80.5      64.8
Random         10k     96.3      85.2
Random         100k    96.3      85.2
Test strategy  80      76.8      64.8
Together               97.6      87.0
Code coverage Table 5 lists the line coverage and branch coverage measured with
gcov for the different testing approaches. The table is built as follows: each line
belongs to one testing approach. The first column names the approach, the second
column lists the number of time steps, and the third and the fourth column present
the line and branch coverage. Overall, the random testing approaches achieve a
higher code coverage than the generated adaptive test strategies when executed on
the source code of the FDIR component. The test strategies are directly derived
from the specification and independent from a concrete implementation. Parts of
the implementation which refine the specification or which are not specified at
all are not necessarily covered. As mentioned above, the implementation
adds timeouts for operations of the EP. Manual analysis revealed that removing
the corresponding instructions would increase the line coverage of the test
strategies to 87.3% and their branch coverage to 74.5%. In combination, random
tests and our strategies achieve a line coverage of 97.6% and a branch coverage of 87%.
7 Conclusion
We presented a new approach to compute adaptive test strategies from temporal
logic specifications using reactive synthesis with partial information. The com-
puted test strategies reveal all instances of a user-defined fault class for every
realization of a given specification. Thus, they do not rely on implementation
details, which is important for products that are still under development or
for standards that will be implemented by multiple vendors. Our approach is
sound but incomplete in general, i.e., it may fail to find test strategies even if they
exist. However, for many interesting cases, we showed that it is both sound and
complete.
The worst-case complexity is doubly exponential in the specification size, but
in our setting, the specifications are typically small. This also makes our approach
an interesting application for reactive synthesis. Our experiments demonstrate
that our approach can compute meaningful tests for specifications of industrial
size and that the computed strategies are capable of detecting faults hidden in
paths that are unlikely to be activated by random input sequences.
We applied our approach in a case study on the fault detection, isolation
and recovery component of the satellite Eu:CROPIS that is currently under
development. Our computed test suite, based only on three different types of
faults, increases the mutation score of random testing from 94.5% to 98.6%.
We can also increase the branch coverage of the code from 85.2% to 87%. In
particular, our approach detects faults that are triggered only by more complex
input sequences, which random testing does not cover.
Current directions for future work include improving scalability, success rate,
and usability of our approach. To this end, we are investigating using random
testing for inputs in the strategies that are not fixed to single values, and best-
effort strategies [18,19] for the case that there are no test strategies that can
guarantee triggering the fault. Another direction for future work is research on
evaluating LTL properties specified on infinite paths on finite traces to improve
the evaluation process when executing the derived strategies.
Acknowledgment
This work was supported in part by the Austrian Science Fund (FWF) through
the research network RiSE (S11406-N23) and by the European Commission
through projects IMMORTAL (317753) and eDAS (608770). We thank Ayrat
Khalimov for helpful comments and assistance in using PARTY.
References
1. Allen Troy Acree, Timothy Alan Budd, Richard A. DeMillo, Richard J. Lipton, and
Frederick Gerald Sayward. Mutation analysis. Technical Report GIT-ICS-79/08,
Georgia Institute of Technology, Atlanta, Georgia, 1979.
2. Bernhard K. Aichernig, Harald Brandl, Elisabeth Jöbstl, Willibald Krenn, Rupert
Schlick, and Stefan Tiran. Killing strategies for model-based mutation testing.
Softw. Test., Verif. Reliab., 25(8):716–748, 2015.
3. Rajeev Alur, Costas Courcoubetis, and Mihalis Yannakakis. Distinguishing tests
for nondeterministic and probabilistic machines. In Frank Thomson Leighton and
Allan Borodin, editors, Proceedings of the Twenty-Seventh Annual ACM Symposium
on Theory of Computing, 29 May-1 June 1995, Las Vegas, Nevada, USA, pages
363–372. ACM, 1995.
4. Paul Ammann, Wei Ding, and Daling Xu. Using a model checker to test safety
properties. In 7th International Conference on Engineering of Complex Computer
Systems (ICECCS 2001), 11-13 June 2001, Skövde, Sweden, pages 212–221. IEEE
Computer Society, 2001.
5. Roy Armoni, Limor Fix, Alon Flaisher, Orna Grumberg, Nir Piterman, Andreas
Tiemeyer, and Moshe Y. Vardi. Enhanced vacuity detection in linear temporal logic.
In Warren A. Hunt Jr. and Fabio Somenzi, editors, Computer Aided Verification,
15th International Conference, CAV 2003, Boulder, CO, USA, July 8-12, 2003,
Proceedings, volume 2725 of Lecture Notes in Computer Science, pages 368–380.
Springer, 2003.
6. Andreas Bauer, Martin Leucker, and Christian Schallhart. Runtime verification for
LTL and TLTL. ACM Trans. Softw. Eng. Methodol., 20(4):14:1–14:64, 2011.
7. Ilan Beer, Shoham Ben-David, Cindy Eisner, and Yoav Rodeh. Efficient detection of
vacuity in temporal model checking. Formal Methods in System Design, 18(2):141–
163, 2001.
8. Andreas Blass, Yuri Gurevich, Lev Nachmanson, and Margus Veanes. Play to test.
In Grieskamp and Weise [25], pages 32–46.
9. Roderick Bloem, Robert Könighofer, Ingo Pill, and Franz Röck. Synthesizing
adaptive test strategies from temporal logic specifications. In Ruzica Piskac and
Muralidhar Talupur, editors, 2016 Formal Methods in Computer-Aided Design,
FMCAD 2016, Mountain View, CA, USA, October 3-6, 2016, pages 17–24. IEEE,
2016.
10. Sergiy Boroday, Alexandre Petrenko, and Roland Groz. Can a model checker
generate tests for non-deterministic systems? Electr. Notes Theor. Comput. Sci.,
190(2):3–19, 2007.
11. Edmund M. Clarke and E. Allen Emerson. Design and synthesis of synchronization
skeletons using branching-time temporal logic. In Dexter Kozen, editor, Logics of
Programs, Workshop, Yorktown Heights, New York, USA, May 1981, volume 131
of Lecture Notes in Computer Science, pages 52–71. Springer, 1981.
12. Alexandre David, Kim Guldstrand Larsen, Shuhao Li, and Brian Nielsen. A game-
theoretic approach to real-time system testing. In Donatella Sciuto, editor, Design,
Automation and Test in Europe, DATE 2008, Munich, Germany, March 10-14,
2008, pages 486–491. ACM, 2008.
13. Giuseppe De Giacomo, Riccardo De Masellis, and Marco Montali. Reasoning on
LTL on finite traces: Insensitivity to infiniteness. In Carla E. Brodley and Peter
Stone, editors, Proceedings of the Twenty-Eighth AAAI Conference on Artificial
Intelligence, July 27-31, 2014, Québec City, Québec, Canada, pages 1027–1033.
AAAI Press, 2014.
14. Giuseppe De Giacomo and Moshe Y. Vardi. Linear temporal logic and linear
dynamic logic on finite traces. In Francesca Rossi, editor, IJCAI 2013, Proceedings
of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China,
August 3-9, 2013, pages 854–860. IJCAI/AAAI, 2013.
15. Richard A. DeMillo, Richard J. Lipton, and Frederick G. Sayward. Hints on test
data selection: Help for the practicing programmer. IEEE Computer, 11(4):34–41,
1978.
16. Isil Dillig, Thomas Dillig, Kenneth L. McMillan, and Alex Aiken. Minimum
satisfying assignments for SMT. In P. Madhusudan and Sanjit A. Seshia, editors,
Computer Aided Verification - 24th International Conference, CAV 2012, Berkeley,
CA, USA, July 7-13, 2012 Proceedings, volume 7358 of Lecture Notes in Computer
Science, pages 394–409. Springer, 2012.
17. Rüdiger Ehlers. Symbolic bounded synthesis. Formal Methods in System Design,
40(2):232–262, 2012.
18. Marco Faella. Best-effort strategies for losing states. CoRR, abs/0811.1664, 2008.
19. Marco Faella. Admissible strategies in infinite games over graphs. In Rastislav
Královic and Damian Niwinski, editors, Mathematical Foundations of Computer
Science 2009, 34th International Symposium, MFCS 2009, Novy Smokovec, High
Tatras, Slovakia, August 24-28, 2009. Proceedings, volume 5734 of Lecture Notes in
Computer Science, pages 307–318. Springer, 2009.
20. Bernd Finkbeiner and Sven Schewe. Bounded synthesis. STTT, 15(5-6):519–539,
2013.
21. Gordon Fraser and Paul Ammann. Reachability and propagation for LTL re-
quirements testing. In Hong Zhu, editor, Proceedings of the Eighth International
Conference on Quality Software, QSIC 2008, 12-13 August 2008, Oxford, UK, pages
189–198. IEEE Computer Society, 2008.
22. Gordon Fraser and Franz Wotawa. Test-case generation and coverage analysis
for nondeterministic systems using model-checkers. In Proceedings of the Second
International Conference on Software Engineering Advances (ICSEA 2007), August
25-31, 2007, Cap Esterel, French Riviera, France, page 45. IEEE Computer Society,
2007.
23. Gordon Fraser, Franz Wotawa, and Paul Ammann. Issues in using model checkers
for test case generation. Journal of Systems and Software, 82(9):1403–1418, 2009.
24. Gordon Fraser, Franz Wotawa, and Paul Ammann. Testing with model checkers: a
survey. Softw. Test., Verif. Reliab., 19(3):215–261, 2009.
25. Wolfgang Grieskamp and Carsten Weise, editors. Formal Approaches to Soft-
ware Testing, 5th International Workshop, FATES 2005, Edinburgh, UK, July 11,
2005, Revised Selected Papers, volume 3997 of Lecture Notes in Computer Science.
Springer, 2006.
26. Klaus Havelund and Grigore Rosu. Monitoring programs using rewriting. In 16th
IEEE International Conference on Automated Software Engineering (ASE 2001),
26-29 November 2001, Coronado Island, San Diego, CA, USA, pages 135–143. IEEE
Computer Society, 2001.
27. Robert M. Hierons. Applying adaptive test cases to nondeterministic implementa-
tions. Inf. Process. Lett., 98(2):56–60, 2006.
28. Yue Jia and Mark Harman. An analysis and survey of the development of mutation
testing. IEEE Trans. Software Eng., 37(5):649–678, 2011.
29. HoonSang Jin, Kavita Ravi, and Fabio Somenzi. Fate and free will in error traces.
STTT, 6(2):102–116, 2004.
30. Ayrat Khalimov, Swen Jacobs, and Roderick Bloem. PARTY parameterized
synthesis of token rings. In Natasha Sharygina and Helmut Veith, editors, Computer
Aided Verification - 25th International Conference, CAV 2013, Saint Petersburg,
Russia, July 13-19, 2013. Proceedings, volume 8044 of Lecture Notes in Computer
Science, pages 928–933. Springer, 2013.
31. Robert Könighofer, Georg Hofferek, and Roderick Bloem. Debugging formal speci-
fications: a practical approach using model-based diagnosis and counterstrategies.
STTT, 15(5-6):563–583, 2013.
32. Orna Kupferman and Moshe Y. Vardi. Advances in Temporal Logic, chapter
Synthesis with Incomplete Information, pages 109–127. Springer Netherlands, 2000.
33. Orna Kupferman and Moshe Y. Vardi. Vacuity detection in temporal model
checking. STTT, 4(2):224–233, 2003.
34. Gang Luo, Gregor von Bochmann, and Alexandre Petrenko. Test selection based
on communicating nondeterministic finite-state machines using a generalized wp-
method. IEEE Trans. Software Eng., 20(2):149–162, 1994.
35. Donald A. Martin. Borel determinacy. Annals of Mathematics, 102(2):363–371,
1975.
36. Aditya P. Mathur. Foundations of Software Testing. Addison-Wesley, second
edition, 2008.
37. Kohei Miyase and Seiji Kajihara. XID: don’t care identification of test patterns for
combinational circuits. IEEE Trans. on CAD of Integrated Circuits and Systems,
23(2):321–326, 2004.
38. Andreas Morgenstern, Manuel Gesell, and Klaus Schneider. An asymptotically cor-
rect finite path semantics for LTL. In Nikolaj Bjørner and Andrei Voronkov, editors,
Logic for Programming, Artificial Intelligence, and Reasoning - 18th International
Conference, LPAR-18, Mérida, Venezuela, March 11-15, 2012. Proceedings, volume
7180 of Lecture Notes in Computer Science, pages 304–319. Springer, 2012.
39. Lev Nachmanson, Margus Veanes, Wolfram Schulte, Nikolai Tillmann, and Wolfgang
Grieskamp. Optimal strategies for testing nondeterministic systems. In George S.
Avrunin and Gregg Rothermel, editors, Proceedings of the ACM/SIGSOFT In-
ternational Symposium on Software Testing and Analysis, ISSTA 2004, Boston,
Massachusetts, USA, July 11-14, 2004, pages 55–64. ACM, 2004.
40. A. Jefferson Offutt. Investigations of the software testing coupling effect. ACM
Trans. Softw. Eng. Methodol., 1(1):5–20, 1992.
41. Alexandre Petrenko, Adenilso da Silva Simão, and Nina Yevtushenko. Generating
checking sequences for nondeterministic finite state machines. In Giuliano Antoniol,
Antonia Bertolino, and Yvan Labiche, editors, Fifth IEEE International Conference
on Software Testing, Verification and Validation, ICST 2012, Montreal, QC, Canada,
April 17-21, 2012, pages 310–319. IEEE Computer Society, 2012.
42. Alexandre Petrenko and Adenilso Simão. Generalizing the DS-methods for testing
non-deterministic FSMs. Comput. J., 58(7):1656–1672, 2015.
43. Alexandre Petrenko and Nina Yevtushenko. Conformance tests as checking experiments
for partial nondeterministic FSM. In Grieskamp and Weise [25], pages
118–133.
44. Alexandre Petrenko and Nina Yevtushenko. Adaptive testing of nondeterministic
systems with FSM. In 15th International IEEE Symposium on High-Assurance
Systems Engineering, HASE 2014, Miami Beach, FL, USA, January 9-11, 2014,
pages 224–228. IEEE Computer Society, 2014.
45. Amir Pnueli. The temporal logic of programs. In 18th Annual Symposium on
Foundations of Computer Science, Providence, Rhode Island, USA, 31 October - 1
November 1977, pages 46–57. IEEE Computer Society, 1977.
46. Amir Pnueli and Roni Rosner. On the synthesis of a reactive module. In Conference
Record of the Sixteenth Annual ACM Symposium on Principles of Programming
Languages, Austin, Texas, USA, January 11-13, 1989, pages 179–190. ACM Press,
1989.
47. Jean-Pierre Queille and Joseph Sifakis. Specification and verification of concurrent
systems in CESAR. In Mariangiola Dezani-Ciancaglini and Ugo Montanari, editors,
International Symposium on Programming, 5th Colloquium, Torino, Italy, April
6-8, 1982, Proceedings, volume 137 of Lecture Notes in Computer Science, pages
337–351. Springer, 1982.
48. Li Tan, Oleg Sokolsky, and Insup Lee. Specification-based testing with linear
temporal logic. In Du Zhang, Éric Grégoire, and Doug DeGroot, editors, Proceedings
of the 2004 IEEE International Conference on Information Reuse and Integration,
IRI - 2004, November 8-10, 2004, Las Vegas Hilton, Las Vegas, NV, USA, pages
493–498. IEEE Systems, Man, and Cybernetics Society, 2004.
49. Massimo Tipaldi and Bernhard Bruenjes. Survey on fault detection, isolation, and
recovery strategies in the space domain. J. Aerospace Inf. Sys., 12(2):235–256, 2015.
50. Mihalis Yannakakis. Testing, optimization, and games. In Josep Díaz, Juhani
Karhumäki, Arto Lepistö, and Donald Sannella, editors, Automata, Languages and
Programming: 31st International Colloquium, ICALP 2004, Turku, Finland, July
12-16, 2004. Proceedings, volume 3142 of Lecture Notes in Computer Science, pages
28–45. Springer, 2004.
