Using Valued Booleans to Find Simpler Counterexamples in Random Testing of Cyber-Physical Systems by Lindström Claessen, Koen et al.
Citation for the original published paper (version of record):
Lindström Claessen, K., Smallbone, N., Lidén Eddeland, J. et al (2018)
Using Valued Booleans to Find Simpler Counterexamples in Random Testing of Cyber-Physical
Systems
IFAC-PapersOnLine, 51(7): 408-415
http://dx.doi.org/10.1016/j.ifacol.2018.06.333
N.B. When citing this work, cite the original published paper.
IFAC PapersOnLine 51-7 (2018) 408–415
2405-8963 © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
Using Valued Booleans to Find Simpler
Counterexamples in Random Testing of
Cyber-Physical Systems
Koen Claessen ∗ Nicholas Smallbone ∗ Johan Eddeland ∗∗,∗∗∗
Zahra Ramezani ∗∗ Knut Åkesson ∗∗
∗Department of Computer Science and Engineering, Chalmers
University of Technology, Gothenburg, Sweden (e-mail: {koen,
nicsma}@chalmers.se)
∗∗Department of Electrical Engineering, Chalmers University of
Technology, Gothenburg, Sweden (e-mail: {johedd, rzahra,
knut}@chalmers.se)
∗∗∗Volvo Car Corporation, Gothenburg, Sweden (e-mail:
johan.eddeland@volvocars.com)
Abstract: We propose a new logic of valued Booleans for writing properties which are not just
true or false but compute how severely they are falsified. The logic is reminiscent of STL or
MTL but gives the tester control over what severity means in the particular problem domain.
We use this logic to simplify failing test inputs in the context of random testing of cyber-physical
systems and show that it improves the quality of counterexamples found. The logic of valued
Booleans might also be used as an alternative to the standard robust semantics of STL formulas
in optimization-based approaches to falsification.
Keywords: Reachability analysis, verification and abstraction of hybrid systems; embedded
computer control systems and applications; logical design, physical design, and implementation
of embedded computer systems; supervision and testing; model-driven systems engineering.
1. INTRODUCTION
Automated systems typically consist of controllers that
interact with a physical environment. These systems are
becoming more complex and are, in many situations, also
safety-critical. Therefore, rigorous methods are needed to
establish that these systems behave according to given
requirements. For finite-state systems, model-checking
(Clarke et al., 2009) can be used to prove properties
of the system. However, for systems that contain both
discrete and continuous dynamics, i.e. hybrid systems, the
problem of determining if a state is reachable is in general
undecidable, as shown by Henzinger et al. (1995).
In software testing, as in cyber-physical systems testing,
test suites are traditionally developed by hand. However,
creating a comprehensive test suite is painstaking work
and, unless the test suite is very large, bugs can easily
slip through. A possible solution is to put the computer in
charge of creating test cases. There are many methods for
doing this (Anand et al., 2013). In this paper we consider
constrained random test case generation as supported by
the tool QuickCheck (Claessen and Hughes, 2000).
QuickCheck generates random test cases to attempt to
falsify a property supplied by the tester. A property can
be, for example, an invariant which must always hold
during the execution of a system. Any test case that fails is
reported as a counterexample to the property. The strength
of QuickCheck is the sheer variety and amount of testing
it does: random test cases often do things that no human
tester would dream of trying, and thousands of test cases
can easily be run. Because of this, QuickCheck is good
at finding complex bugs which human testers miss (Arts
et al., 2015; Hughes, 2016). Our hope is to translate this
advantage to hybrid system testing.
Research supported by Swedish Research Council (VR) project
SyTeC VR 2016-06204, and Swedish Governmental Agency for
Innovation Systems (VINNOVA) project TESTRON 2015-04893.
Random testing may be good at triggering complex bugs,
but the random counterexamples it finds usually contain
a lot of irrelevant features and are therefore hard to
understand for a human tester. Understanding why a test
case fails is vital in any debugging process. After finding
a failing test case, QuickCheck deploys a method called
shrinking to reduce it to its bare essentials; after shrinking,
all features present in the failing test case are needed to
make the test case fail.
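As a rough illustration of shrinking, the following sketch greedily simplifies a failing input for a toy property. It is not QuickCheck's actual implementation: the names `shrink_candidates`, `shrink` and `prop_small_sum` are hypothetical, and the input is assumed to be a plain list of integers.

```python
from typing import Callable, Iterator, List


def shrink_candidates(xs: List[int]) -> Iterator[List[int]]:
    """Yield simpler variants of xs: drop one element, or halve one element towards 0."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [int(x / 2)] + xs[i + 1:]


def shrink(xs: List[int], prop: Callable[[List[int]], bool]) -> List[int]:
    """Greedily replace xs by the first simpler candidate that still falsifies prop."""
    improved = True
    while improved:
        improved = False
        for cand in shrink_candidates(xs):
            if not prop(cand):  # the candidate is still a counterexample
                xs = cand
                improved = True
                break
    return xs


def prop_small_sum(xs: List[int]) -> bool:
    """Toy property (an assumption for this sketch): the sum stays below 10."""
    return sum(xs) < 10


minimal = shrink([7, 0, 12, 3, 5], prop_small_sum)
```

The loop stops at a local minimum: every remaining simplification of `minimal` makes the property hold again, which is the sense in which all features of the shrunk test case are needed.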
QuickCheck has until now mainly been used to test soft-
ware. In this work we take the first steps towards adapting
the QuickCheck approach to hybrid systems. In doing
so, we encounter three main questions: (1) How do we
randomly generate test cases suitable for use in a hybrid
system? (2) How do we specify properties of a hybrid
system in a way that is amenable to automated testing?
(3) How do we simplify failing test cases in hybrid systems
testing?
14th IFAC Workshop on Discrete Event Systems
May 30 - June 1, 2018. Sorrento Coast, Italy
Copyright © 2018 IFAC 408
The main focus of this paper is question (3). We found that
simplifying a failing test case for a hybrid system typically
resulted in a test case that still made a property fail, but
would do so in a minimal, unconvincing way, which we refer
to as a glitch. To solve this, we measure the severity of each
failure, and never simplify a failing test case in a way that
reduces its severity. To measure the severity of failing test
cases, we define a new logic of valued Booleans, or VBools,
which allows the tester to express severity information as
part of a property.
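The rule of never accepting a simplification that reduces severity can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in this paper: severity is assumed to be a non-negative number that is 0 exactly when the property holds, and `LIMIT`, `severity`, `candidates` and `shrink` are hypothetical names for a toy property "no sample exceeds 10".

```python
from typing import Iterator, List

LIMIT = 10  # toy bound, an assumption for this sketch


def severity(xs: List[int]) -> int:
    """How far the largest sample exceeds LIMIT; 0 means the property holds."""
    return max(0, max((x - LIMIT for x in xs), default=0))


def candidates(xs: List[int]) -> Iterator[List[int]]:
    """Simpler variants: remove one sample, or halve one sample towards 0."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [int(x / 2)] + xs[i + 1:]


def shrink(xs: List[int], keep_severity: bool) -> List[int]:
    """Greedy shrinking; optionally reject candidates that are less severe."""
    improved = True
    while improved:
        improved = False
        floor = severity(xs) if keep_severity else 0
        for cand in candidates(xs):
            s = severity(cand)
            if s > 0 and s >= floor:  # still failing, and not less severe
                xs = cand
                improved = True
                break
    return xs


glitch = shrink([3, 25, 4], keep_severity=False)
convincing = shrink([3, 25, 4], keep_severity=True)
```

In this toy setting, plain shrinking keeps halving the offending sample until it only just exceeds the limit, whereas the severity-preserving variant removes the irrelevant samples but leaves the size of the violation untouched.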
1.1 Related work
Falsification of temporal logic properties is an emerging
black-box approach to testing of hybrid systems. The
tools S-TaLiRo (Annpureddy et al., 2011) and Breach
(Donzé, 2010) use Metric Temporal Logic (MTL) and
Signal Temporal Logic (STL), respectively. MTL and STL
are equally expressive; they differ in that STL states
its predicates explicitly, whereas MTL does not.
We therefore introduce only one of the
two temporal logics in this paper, namely STL.
The main idea behind falsification, introduced by Fainekos
and Pappas (2009), is to use a robust semantics (or quan-
titative semantics) of temporal logic to measure how far
away a specification is from being broken. The robustness
value of a temporal logic specification can thus be used as
the objective function for an optimization problem where
the goal is to falsify said specification. Previous work has
proposed alternative definitions of robustness for temporal
logic specifications (Akazaki and Hasuo, 2015) and has
discussed the reasons for such changes (Eddeland et al., 2017).
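To make the notion of a robustness value concrete, the sketch below evaluates a simplified quantitative semantics on a sampled signal: a predicate x ≤ c is given the signed margin c − x, conjunction takes the minimum, disjunction the maximum, negation flips the sign, and "always" takes the minimum over all samples. The function names are hypothetical, and timed operators are left out for brevity.

```python
from typing import Sequence


def rob_le(x: float, c: float) -> float:
    """Robustness of x <= c: positive when satisfied, negative when violated."""
    return c - x


def rob_and(a: float, b: float) -> float:
    return min(a, b)


def rob_or(a: float, b: float) -> float:
    return max(a, b)


def rob_not(a: float) -> float:
    return -a


def rob_always(values: Sequence[float]) -> float:
    """'Always' is as robust as its least robust sample."""
    return min(values)


# Robustness of "always (x <= 1.0)" on a sampled trace.
trace = [0.2, 0.8, 1.5, 0.9]
rho = rob_always([rob_le(x, 1.0) for x in trace])
```

A negative `rho` means the specification is falsified on this trace; a falsification tool searches for inputs that drive this value below zero.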
The valued Booleans presented in this paper are related
to robust semantics and can be seen as an alternative to
them. The main difference between the two approaches
is that VBools allow the tester to control how robustness
is measured for each property, while a robust semantics
imposes the same robustness measure on all properties. As
is shown in Sections 3 and 4, different robustness measures
make sense for different applications.
VBools and robust semantics are not closely related to
fuzzy logic (Driankov et al., 1993), although they may
appear to be at first glance. Fuzzy logics also augment
truth values with numbers, but the purpose of those
numbers is to express the certainty of a value being true or
false. With VBools (and in STL/MTL), it is always clear
whether a given value is true or false; the numerical
aspect only expresses how true or false the value is.
1.2 Contributions
The main contributions of this paper are:
i) adaptation of random testing with QuickCheck to
hybrid systems;
ii) definition of VBools as a way to simplify counterex-
amples when testing hybrid systems;
iii) a comparison with existing falsification tools, to illus-
trate the strengths and weaknesses of random testing
and the importance of simplifying counterexamples.
The rest of this paper is organized as follows. Section 2
uses an example to introduce random testing,
falsification and shrinking. Section 3 introduces valued
Booleans and compares them to signal temporal
logic. Section 4 applies valued Booleans to shrinking
counterexamples for two example models and
discusses how the approach compares to
Breach.
2. EXAMPLE
This section illustrates the difficulties we encounter when
using QuickCheck as-is to test a hybrid system model; we
will see how to solve them in Section 3. It also shows
how falsification works on the same model. For ease of
understanding we have chosen a linear system as the
example, but since the presented testing approach is black-
box, it can be applied to any hybrid system.
2.1 The model
The model considered is a heater which is controlled by a
PID controller. The variables and parameters of the model
are described in Table 1. The only input to the model is
the setpoint temperature r(t).
Table 1. The signals and parameters of the heater example.

Signal     Meaning
r(t)       setpoint temperature
l(t)       pump level
h(t)       heater temperature
y(t)       room temperature

Parameter  Meaning               Value
OT         Outside Temperature   -5
BT         Boiler Temperature    90
HC         Heater Coefficient    0.1
OC         Outside Coefficient   0.05
Kp         Proportional gain     0.012
Ki         Integral gain         1.144689 · 10^-4
Kd         Derivative gain       0.005
The error fed to the PID controller is r(t) − y(t). The
continuous dynamics of the heater are given by equations
(1) and (2), where the initial conditions are h(0) = y(0) =
OT. Time is measured in minutes and temperature
in ℃. The heater is simulated for 300 minutes,
i.e., the start time is t = 0 and the end time is t = 300.
Implementation-wise, the models are discretized with a
fixed sampling time and then simulated.
\[
\dot{h}(t) = \frac{-(l(t) + HC) \cdot h(t) + BT \cdot l(t) + HC \cdot y(t)}{1 + HC} \tag{1}
\]
\[
\dot{y}(t) = \frac{-(HC + OC) \cdot y(t) + HC \cdot h(t) + OC \cdot OT}{1 + HC + OC} \tag{2}
\]
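One possible way to simulate equations (1) and (2) is a forward Euler discretization, sketched below. Several details are assumptions made only for this illustration and are not taken from the model description: the step size `dt = 0.1` minutes, the use of the PID output directly as the pump level l(t), its saturation to the interval [0, 1], and the finite-difference approximation of the derivative term. The function name `simulate` is hypothetical.

```python
from typing import Callable, List, Tuple

# Parameters from Table 1.
OT, BT, HC, OC = -5.0, 90.0, 0.1, 0.05
KP, KI, KD = 0.012, 1.144689e-4, 0.005


def simulate(r: Callable[[float], float], t_end: float = 300.0,
             dt: float = 0.1) -> Tuple[List[float], List[float]]:
    """Return sample times and room temperature y(t) for a setpoint function r."""
    h = y = OT                      # initial conditions h(0) = y(0) = OT
    integral = 0.0
    prev_err = r(0.0) - y           # avoids a derivative kick at t = 0
    ts, ys = [0.0], [y]
    for k in range(int(round(t_end / dt))):
        t = k * dt
        err = r(t) - y              # error fed to the PID controller
        integral += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        l = KP * err + KI * integral + KD * deriv
        l = min(1.0, max(0.0, l))   # assumed actuator saturation
        dh = (-(l + HC) * h + BT * l + HC * y) / (1 + HC)         # equation (1)
        dy = (-(HC + OC) * y + HC * h + OC * OT) / (1 + HC + OC)  # equation (2)
        h += dt * dh
        y += dt * dy
        ts.append((k + 1) * dt)
        ys.append(y)
    return ts, ys


ts, ys = simulate(lambda t: 20.0)   # constant setpoint of 20
```

A sanity check on the equations: with the setpoint equal to OT the error is zero, the pump stays off, and both derivatives vanish, so the room remains at the outside temperature.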
We would like to check the following property.
Property 1. If the setpoint temperature has been constant
(steady) for 50 minutes, then the difference between the
setpoint temperature and the actual room temperature
should be at most 1℃.
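For sampled signals, Property 1 can be checked with a plain Boolean function along the following lines. This is a sketch under one particular reading of the property, namely that the bound applies at every sample whose preceding 50 minutes of setpoint values are all equal; the name `property1_holds` and its parameters are hypothetical.

```python
from typing import Sequence


def property1_holds(rs: Sequence[float], ys: Sequence[float], dt: float,
                    window: float = 50.0, tol: float = 1.0) -> bool:
    """Check |r - y| <= tol wherever r has been constant for `window` minutes."""
    n = int(round(window / dt))   # number of steps spanning the steady window
    steady = 0                    # consecutive steps over which r did not change
    for k in range(len(rs)):
        steady = steady + 1 if k > 0 and rs[k] == rs[k - 1] else 0
        if steady >= n and abs(rs[k] - ys[k]) > tol:
            return False
    return True


# One sample per minute: a setpoint change at minute 30 resets the steady window.
rs = [20.0] * 30 + [22.0] * 30
ys = [0.0] * 60
ok = property1_holds(rs, ys, dt=1.0)
```

A check of this kind only reports whether the property fails, not by how much, so a deviation of 1.01 degrees and one of 10 degrees are indistinguishable to it.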
2.2 Testing and shrinking with QuickCheck
QuickCheck tests properties, like the one above, on ran-
dom test inputs; the tester has control over the distri-
bution of test inputs. To test any piece of software with
QuickCheck, the tester supplies:
• A test generator, which describes how to generate
random test inputs for the software;
• a property, which is a function that takes such a test
input and returns true or false, true to indicate that
the test passed and false to indicate that it failed.
QuickCheck uses the generator to produce a large number
of random test inputs, and reports an error if the property
returns false for any of those test inputs.
In order to use QuickCheck on a hybrid system model, we
first discretise the model to be able to test it. QuickCheck
generates a random input signal, i.e. a sequence of nu-
meric values, runs the model and evaluates the property,
resulting in a pass or fail. The tester determines what sort
of input signals should be generated by choosing a test
generator.
For the heating system, the test input is a discrete sig-
nal representing r. The QuickCheck property applies the
heater model to r to get the output signal y, and returns
the value of the following formula:
$$\Box(\mathit{steady}(r) \Longrightarrow |y - r| \le 1) \qquad (3)$$
assuming that steady has been defined appropriately. The
□ operator is used in STL to denote a property that
should hold globally from that point on; for a more formal
semantics see Section 3.2.
As described above, we must also supply a test generator.
Rather than generating random noise, which would be
extremely unlikely to keep r constant for 50 minutes,
we generate a piecewise constant function, choosing at
random the total simulation time and the number, lengths
and values of the constant pieces.
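A generator of this kind can be sketched in a few lines of Python. The bounds on the number of pieces, their durations and their values are illustrative assumptions, as are the function names; the total simulation time is simply the sum of the randomly chosen piece lengths.

```python
import random

def gen_piecewise_constant(rng, max_pieces=10, max_minutes=80, lo=10.0, hi=25.0):
    """Return a list of (duration_in_minutes, value) pieces.

    All bounds are illustrative assumptions, not the paper's generator.
    """
    n = rng.randint(1, max_pieces)
    return [(rng.randint(1, max_minutes), rng.uniform(lo, hi)) for _ in range(n)]

def sample(pieces, steps_per_minute=1):
    """Expand the pieces into a discrete signal, one value per step."""
    signal = []
    for duration, value in pieces:
        signal.extend([value] * (duration * steps_per_minute))
    return signal

rng = random.Random(0)               # fixed seed so that a run is repeatable
pieces = gen_piecewise_constant(rng)
r = sample(pieces)                   # discrete setpoint signal
```

Keeping the structured `pieces` representation next to the sampled signal is a deliberate choice: simplification steps are much easier to express on the pieces than on the raw samples.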
When we run QuickCheck, it discovers that the property
is false, and finds the counterexample shown in Figure 1.
[Figure 1: temperature [°C] plotted against time [m]; legend: setpoint, room, not ok.]
Fig. 1. Random testing performed on the heater example.
This counterexample clearly shows that the property does
not hold, but it is also rather complicated. For this
specific example, only the last two changes in setpoint are
responsible for the failure, but the tester can only know
this after a manual and time-consuming analysis of the
whole test case. This is a typical problem with random
testing, and in general it is often hard for the tester to
discover why the system failed to satisfy the property,
given a randomly-generated counterexample.
QuickCheck therefore simplifies the failing test case before
presenting it to the tester, a process called shrinking.
Shrinking works by applying a small simplification to the
counterexample, such as removing part of the test input,
and seeing if it still fails. If not, the simplification is undone
and a different one tried. Shrinking continues until none
of the simplification steps that are tried works.
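The shrinking loop described above can be sketched as a greedy search. This is a minimal illustration, not QuickCheck's actual implementation; the names `shrink`, `drop_one` and the toy property are hypothetical.

```python
def shrink(test_input, candidates, fails):
    """Greedy shrinking: accept the first simplification that still fails.

    candidates(x) lists simpler variants of x; fails(x) is True when the
    property is violated on x. Stops when no candidate fails any more.
    """
    current = test_input
    progress = True
    while progress:
        progress = False
        for cand in candidates(current):
            if fails(cand):
                current = cand
                progress = True
                break
    return current

def drop_one(xs):
    """Candidate simplifications: remove one element at a time."""
    return [xs[:i] + xs[i + 1:] for i in range(len(xs))]

# Toy property "no value exceeds 100", which fails on this input.
minimal = shrink([3, 250, 7, 180], drop_one, lambda xs: any(x > 100 for x in xs))
```

Here `minimal` ends up as `[180]`: a single offending value is enough to keep the toy property false. The loop terminates because every candidate is strictly smaller than its parent.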
After shrinking, QuickCheck finds the counterexample
shown in Figure 2. This test input is clearly much simpler,
but the room temperature is out of specification for only
one simulation step (in the next simulation step, the
room temperature would return to within one degree of
the setpoint). The property still fails, but only barely.
Shrinking has thus reduced a serious failure into something
a tester may well consider merely a glitch.
[Figure 2: temperature [°C] plotted against time [m]; legend: setpoint, room, not ok.]
Fig. 2. Shrinking turns the counterexample into a glitch.
In software testing, random testing and shrinking are a
powerful combination. Random test cases provoke bugs by
exercising the software in unusual ways, but produce
complicated counterexamples; shrinking reduces these
counterexamples to their bare minimum, producing simple,
easy-to-understand counterexamples.
In hybrid systems testing, shrinking produces simple coun-
terexamples, but they are often mere glitches. This is
because shrinking removes all irrelevant features of the
test data, and a property is still falsified by test data that
makes it only fail for a short interval. Fixing this problem
is one of the main contributions of this paper.
2.3 Falsification with Breach
The standard falsification procedure used in Breach is
illustrated in Figure 3. The Generator takes the input
parametrization to generate an input to the system under
test. The Simulator generates a simulation trace, which
is used together with the requirement ϕ to evaluate the
robustness function for the simulation. The robustness
function ρ is evaluated to see whether the requirement
is falsified or not. If it is not falsified, new parameters
are sampled and the process is repeated. The Parameter
Optimizer does this in one of two ways: in the first
part of the process, the parameters are sampled in some
structured way (global optimization), while in the second
part, new parameters are found by means of optimization
(local optimization). The algorithm is described in more
detail by Donzé (2010).
[Figure 3: flowchart with the blocks Generator, Simulator, Robustness function, Function evaluation, Parameter optimizer and Stop; labelled signals: parameter initial guess k, input signal parameters k, input u(t), output S(t), requirement ϕ, objective function value ρ, falsified, not falsified.]
Fig. 3. A flowchart describing the optimization-based falsification procedure of Breach.
For the heater example, we choose an input generator that
generates piecewise constant signals, changing signal val-
ues at 9 different times. This means that the optimization
problem has 19 parameters: 10 signal values, each in the
range [10, 25], and 9 time specifications in the range [1, 80].
Every time the signal changes, the new value will hold
for between 1 and 80 minutes. Each time the system is
simulated, the total simulation time is 300 minutes.
The system is modeled in Simulink, and the STL specification is
$$\varphi = \Box_{[0,250]}\bigl(\mathit{steady} \Rightarrow \mathrm{abs}(y(t+50) - r(t+50)) < 1\bigr), \qquad (4)$$
where $\mathit{steady} = \Box_{[0,49.99]}(|x(t+0.01) - x(t)| < \epsilon)$ for some
small $\epsilon > 0$.
Breach is able to falsify the system, with resulting outputs
shown in Figure 4. The counterexample is reminiscent of
the one in Figure 1 – the failure of the specification is clear,
but the test case is quite complicated.
[Figure 4: temperature [°C] plotted against time [m]; legend: setpoint, room, not ok.]
Fig. 4. Falsification of the heater property. The property
is false near t = 150. The complexity of the test case
is highly dependent on the input generator chosen.
3. APPROACH
In Section 2, we saw that shrinking turns counterexam-
ples into glitches. This is a serious problem, as random
testing without shrinking can produce very complicated
counterexamples. Our approach to solve this is as follows:
• The tester will specify how to compute the severity
of any counterexample to their property.
• During shrinking, QuickCheck will not perform any
simplification step which reduces the severity of the
counterexample.
The hope is that this approach will give us counterexam-
ples as severe as Figure 1 but as simple as Figure 2.
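The two rules above amount to a small change in the acceptance test of a greedy shrinking loop. The following Python sketch shows one way to do it; the function names and the toy severity measure are hypothetical and only serve to illustrate the idea.

```python
def shrink_with_severity(test_input, candidates, severity):
    """Greedy shrinking that never accepts a less severe failure.

    severity(x) returns None when x passes the property, and otherwise a
    non-negative number measuring how badly x fails.
    """
    current, current_sev = test_input, severity(test_input)
    if current_sev is None:
        raise ValueError("shrinking must start from a failing test case")
    progress = True
    while progress:
        progress = False
        for cand in candidates(current):
            sev = severity(cand)
            # Accept only simplifications that still fail at least as severely.
            if sev is not None and sev >= current_sev:
                current, current_sev = cand, sev
                progress = True
                break
    return current

def drop_one(xs):
    """Candidate simplifications: remove one element at a time."""
    return [xs[:i] + xs[i + 1:] for i in range(len(xs))]

def excess_over_100(xs):
    """Toy severity: total amount by which values exceed 100, None if none do."""
    total = sum(x - 100 for x in xs if x > 100)
    return total if total > 0 else None

result = shrink_with_severity([3, 250, 7, 180], drop_one, excess_over_100)
```

In this toy run the irrelevant values 3 and 7 are removed, but both offending values survive (`result` is `[250, 180]`), because dropping either of them would lower the severity.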
For the heater model, the tester might consider the severity
to be the integral of the part of the error above 1℃,
$$\int \bigl(|y(t) - r(t)| - 1\bigr)\, dt, \qquad (5)$$
integrated over the time intervals in which the temperature
has been constant for 50 minutes but the error is over 1℃.
If we define those time intervals formally, we will see that
the severity formula almost exactly duplicates what was
written in the property. This means the tester has to write
everything down twice, once as a property and once as a
severity formula. To avoid this problem, we now introduce
valued Booleans, which allow the tester to express severity
information as part of a property, rather than separately.
3.1 Valued Booleans
A valued Boolean, or VBool, is a Boolean value together
with a robustness value, a non-negative real number that
indicates how true or false the VBool is. VBool formulas
look just like Boolean formulas, with the addition of extra
annotations that describe how to compute the robustness.
An important property of VBools is that the annotations
do not affect the Boolean value but only the robustness.
Property (3) can be written using VBools as
$$\Box_+(\mathit{steady}(r) \Longrightarrow_v |y - r| \le_v 1). \qquad (6)$$
All we have done is to replace □ with □+, ⟹ with ⟹v
and ≤ with ≤v. As mentioned above, this property has
the same Boolean value as property (3). We shall see later
that when this property fails, its robustness is precisely
the value of the integral we saw above.
Formally, a VBool is a pair of a Boolean value and a
robustness value, which is a non-negative number:
$$\mathbb{V} = \mathbb{B} \times \mathbb{R}_{\ge 0}$$
The robustness of a VBool represents how much the test
result would have to change for the Boolean value to
change. For a false VBool this roughly coincides with the
severity of the failure, and for a true VBool it roughly
coincides with how convincingly the test passed.
The comparison operator ≤v corresponds to ≤, and takes
the difference between its arguments as its robustness.
This is because, in order for the value of x ≤ y to change,
one of the arguments has to change by at least |x − y|.
$$\le_v \;:\; \mathbb{R} \times \mathbb{R} \to \mathbb{V}$$
$$x \le_v y = \begin{cases} (\top,\; y - x) & \text{if } x \le y \\ (\bot,\; x - y) & \text{otherwise} \end{cases}$$
The other comparison operators are defined in terms of
≤v. ⊤ and ⊥ denote true and false, respectively.
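The definition above translates directly into code. In this Python sketch a VBool is a named tuple; the names `VBool`, `leq_v` and `geq_v` are illustrative, and deriving ≥v by swapping the arguments is one plausible reading of "defined in terms of ≤v", not something the text spells out.

```python
from typing import NamedTuple

class VBool(NamedTuple):
    value: bool        # the Boolean part
    robustness: float  # non-negative: how true or how false the value is

def leq_v(x: float, y: float) -> VBool:
    """x <=_v y: the robustness is the distance between the arguments."""
    return VBool(True, y - x) if x <= y else VBool(False, x - y)

def geq_v(x: float, y: float) -> VBool:
    """x >=_v y, derived as y <=_v x (an assumption of this sketch)."""
    return leq_v(y, x)

within_spec = leq_v(18.5, 20.0)   # true, with a margin of 1.5
too_hot = leq_v(23.0, 20.0)       # false, off by 3.0
```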
We define conjunction ∧+ as follows. The Boolean part
of x ∧+ y is computed as x ∧ y. If both x and y are false,
then we add their robustnesses. This is because in order
for x ∧+ y to become true, both x and y must become true.
$$(\bot, x) \wedge_+ (\bot, y) = (\bot,\; x + y)$$
Similar reasoning shows that if exactly one of x and y is
false, we should take the robustness of the false argument:
$$(\bot, x) \wedge_+ (\top, y) = (\bot, x)$$
$$(\top, x) \wedge_+ (\bot, y) = (\bot, y)$$
When x and y are both true, only one of them has to
become false for x ∧+ y to become false. As it is easier to
make x ∧+ y false than it is to make x false, its robustness
should be lower; the same argument applies to y. The
following formula, inspired by parallel resistance, captures
this idea and leads to satisfactory algebraic properties:
$$(\top, x) \wedge_+ (\top, y) = \left(\top,\; \frac{1}{\frac{1}{x} + \frac{1}{y}}\right)$$
(When computing reciprocals, we adopt the convention
that 1/0 = ∞ and 1/∞ = 0.)
The other Boolean operators are defined as follows:
$$\top_v = (\top, \infty)$$
$$\bot_v = (\bot, \infty)$$
$$\neg_v (b, x) = (\neg b,\; x)$$
$$x \vee_+ y = \neg_v(\neg_v x \wedge_+ \neg_v y)$$
If we want to take the conjunction of two VBools whose
robustnesses may be wildly different, it can be useful to
first scale the two robustness values. We can do that using
the # operator, which multiplies the robustness of a VBool
by a constant:
$$(b, x) \mathbin{\#} k = (b,\; x \cdot k)$$
Implication is defined non-classically, in order to give a
penalty to trivially true implications:
$$x \Longrightarrow_v y = \neg_v(x \mathbin{\#} K) \vee_+ y$$
Here K is an arbitrary constant, 1000 in our system.
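The connectives defined so far fit in a short Python sketch. The function names are illustrative; the definitions follow the equations above, including the reciprocal conventions 1/0 = ∞ and 1/∞ = 0.

```python
import math
from typing import NamedTuple

class VBool(NamedTuple):
    value: bool
    robustness: float

def recip(x: float) -> float:
    """Reciprocal with the conventions 1/0 = inf and 1/inf = 0."""
    if x == 0:
        return math.inf
    return 0.0 if math.isinf(x) else 1.0 / x

def and_plus(a: VBool, b: VBool) -> VBool:
    if not a.value and not b.value:       # both false: add the robustnesses
        return VBool(False, a.robustness + b.robustness)
    if not a.value:                       # exactly one false: keep it
        return a
    if not b.value:
        return b
    # both true: "parallel resistance" of the two robustnesses
    return VBool(True, recip(recip(a.robustness) + recip(b.robustness)))

def not_v(a: VBool) -> VBool:
    return VBool(not a.value, a.robustness)

def or_plus(a: VBool, b: VBool) -> VBool:
    return not_v(and_plus(not_v(a), not_v(b)))

def scale(a: VBool, k: float) -> VBool:
    """The # operator: multiply the robustness by a constant."""
    return VBool(a.value, a.robustness * k)

K = 1000.0  # the value the text reports for the constant K

def implies_v(a: VBool, b: VBool) -> VBool:
    return or_plus(not_v(scale(a, K)), b)

TRUE_V = VBool(True, math.inf)    # identity element of and_plus
FALSE_V = VBool(False, math.inf)  # zero element of and_plus
```

For instance, `and_plus(VBool(True, 2.0), VBool(True, 2.0))` has robustness 1.0, half that of either argument, while two false arguments with robustness 1.0 and 2.0 combine to robustness 3.0.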
Finally, the modal operators such as □+ are defined
in terms of the operators we have already seen. For
reasons of space we present them slightly informally. In
a discrete setting, the semantics of □ can be defined
roughly as follows, where φ is a formula parametrised on
the simulation step:
$$\Box\varphi = \bigwedge \{\varphi(n) \mid 1 \le n \le N\}$$
The semantics of □+ is then
$$\Box_+\varphi = \Bigl(\bigwedge\nolimits_+ \{\varphi(n) \mid 1 \le n \le N\}\Bigr) \mathbin{\#'} \delta t$$
where δt is the simulation step size, and #′ is the following
temporal variant of #:
$$(\bot, x) \mathbin{\#'} k = (\bot,\; x \cdot k)$$
$$(\top, x) \mathbin{\#'} k = (\top,\; x / k)$$
When □+φ is false, its robustness is $\int r(t)\, dt$, where r(t) is
the robustness of φ at time t, over all time intervals where
φ is false. When □+φ is true its robustness is $1 / \int 1/r(t)\, dt$.
For example, (6) is equivalent to the formula
$$\bigwedge\nolimits_+ \{\mathit{steady}(r, n) \Longrightarrow_v |y[n] - r[n]| \le_v 1 \mid 0 \le n \le N\} \mathbin{\#} \delta t$$
The reader can check that the robustness of this formula
when false is equal to the integral given in (5).¹
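The integral interpretation can be checked numerically on a small example. The sketch below models a VBool as a plain `(bool, robustness)` tuple and applies □+ to the bare comparison |y − r| ≤v 1, leaving out the steady precondition so that the arithmetic stays exact; the error samples and the step size are made-up illustrative values.

```python
import math
from functools import reduce

def leq_v(x, y):
    return (True, y - x) if x <= y else (False, x - y)

def and_plus(a, b):
    if not a[0] and not b[0]:
        return (False, a[1] + b[1])
    if not a[0]:
        return a
    if not b[0]:
        return b
    # both true; 1/inf is 0.0 in Python, and 1/0 is mapped to inf by hand
    inv = (math.inf if a[1] == 0 else 1 / a[1]) + (math.inf if b[1] == 0 else 1 / b[1])
    return (True, math.inf if inv == 0 else 1 / inv)

def always_plus(phis, dt):
    """Fold and_plus over all steps, then apply the temporal scaling #'."""
    value, rob = reduce(and_plus, phis, (True, math.inf))
    return (False, rob * dt) if not value else (True, rob / dt)

errors = [0.5, 1.5, 3.0, 0.8]      # |y[n] - r[n]| at four steps (illustrative)
dt = 0.5                           # simulation step size (illustrative)
result = always_plus([leq_v(e, 1.0) for e in errors], dt)

# Riemann sum of the part of the error above 1, over the failing steps.
integral = sum((e - 1.0) * dt for e in errors if e > 1.0)
```

Both `result[1]` and `integral` equal 1.25 here: the two failing steps contribute 0.5 and 2.0, and the sum is scaled by the step size.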
A variant of ∧. The semantics of ∧+, ∨+ and □+ is
not always what we want. For example, take an aircraft
collision avoidance system, where the distance d between
the two aircraft must never be under 1000 m. This property
can be expressed as □+(1000 ≤v d), which computes a
robustness of $\int (1000 - d(t))\, dt$ integrated over the faulty
intervals, i.e. it considers the amount of time the planes
are too close as well as their distance. The tester, however,
may decide that nearer misses are always worse.
To allow this, we introduce a second conjunction operator
∧max. This is defined so that the conjunction of two false
values takes the maximum of the two robustnesses:
$$(\bot, x) \wedge_{\max} (\top, y) = (\bot, x)$$
$$(\top, x) \wedge_{\max} (\bot, y) = (\bot, y)$$
$$(\bot, x) \wedge_{\max} (\bot, y) = (\bot,\; \max(x, y))$$
$$(\top, x) \wedge_{\max} (\top, y) = (\top,\; \min(x, y))$$
We define ∨max and □max the same as we did above, but
replacing ∧+ with ∧max, and not scaling by δt in the case
of □max. The robustness of □max φ, when false, is then the
maximum robustness of φ at a time instant when φ is
false. For the collision avoidance system, we can now write
the property □max(1000 ≤v d), whose robustness is the
maximum value of 1000 − d over all failing time intervals,
exactly what we want.
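A minimal Python sketch of the "max" operators, applied to the collision avoidance property, is given below. VBools are plain `(bool, robustness)` tuples, the function names are illustrative, and the distance samples are made up.

```python
import math
from functools import reduce

def leq_v(x, y):
    return (True, y - x) if x <= y else (False, x - y)

def and_max(a, b):
    if not a[0] and not b[0]:      # both false: the worse failure wins
        return (False, max(a[1], b[1]))
    if not a[0]:                   # exactly one false: keep it
        return a
    if not b[0]:
        return b
    return (True, min(a[1], b[1])) # both true: the weaker margin wins

def always_max(phis):
    """Fold and_max over all steps; no scaling by the step size."""
    return reduce(and_max, phis, (True, math.inf))

distances = [1500.0, 990.0, 700.0, 1200.0]   # metres, illustrative samples
result = always_max(leq_v(1000.0, d) for d in distances)
```

The result is `(False, 300.0)`: the property fails, and its robustness is the largest shortfall 1000 − d, reached at the 700 m sample, regardless of how long the aircraft stayed too close.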
In the rest of this paper, we will refer to a property
which uses ∧+/∨+/□+ as having “+” semantics, while a
property that uses ∧max/∨max/□max has “max” semantics.
Despite this terminology, properties can freely mix and
match both sets of operators.
Properties. VBools satisfy some, but not all, of the usual
properties of Boolean algebra. The most important property
is that the Boolean part of the VBool behaves exactly
as in Boolean algebra; using VBools adds robustness
information but does not change the logical meaning of the
property.
The connectives ∧+ and ∨+ are associative and commutative
and have an identity and zero element. These
properties mean that the conjunction or disjunction of a
set of formulas is a well-defined notion, even when the
set may be empty. The connectives are deliberately not
idempotent because e.g. x ∧+ x, for a false x, has a
robustness twice that of x: a predicate that occurs several
times in the property contributes more to the severity.
Implication x ⟹v y does not satisfy the classical definition
of being ¬x ∨ y, but it does satisfy the property that
x ⟹v (y ⟹v z) is equivalent to (x ∧+ y) ⟹v z. This
means that properties with several preconditions behave
as expected.
¹ When a continuous signal is represented we write it as s(t),
but when the closed-loop system is simulated the system will be
discretized by the solver and approximated as a discrete time signal.
In situations where we would like to emphasize that it is a discrete
time signal we use the notation s[k].
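The law for nested implications can be spot-checked numerically. The sketch below restates the "+" operators on `(bool, robustness)` tuples and tests the law for all eight combinations of truth values; the robustness values 2, 3 and 1 are arbitrary, and the helper names are illustrative. It is a sanity check under these assumptions, not a proof.

```python
import itertools
import math

K = 1000.0  # scaling constant used in the definition of implication

def and_plus(a, b):
    """Conjunction with "+" semantics on (bool, robustness) pairs."""
    if not a[0] and not b[0]:
        return (False, a[1] + b[1])
    if not a[0]:
        return a
    if not b[0]:
        return b
    return (True, 1.0 / (1.0 / a[1] + 1.0 / b[1]))  # assumes finite, non-zero values

def not_v(a):
    return (not a[0], a[1])

def implies_v(a, b):
    # not_v(a # K) or_+ b, which unfolds to not_v((a # K) and_+ not_v(b))
    return not_v(and_plus((a[0], a[1] * K), not_v(b)))

def same(a, b):
    return a[0] == b[0] and math.isclose(a[1], b[1])

def curry_law_holds(x, y, z):
    return same(implies_v(x, implies_v(y, z)), implies_v(and_plus(x, y), z))

checks = [curry_law_holds((bx, 2.0), (by, 3.0), (bz, 1.0))
          for bx, by, bz in itertools.product([True, False], repeat=3)]

doubled = and_plus((False, 2.0), (False, 2.0))   # not idempotent: robustness 4.0
```

All entries of `checks` are true, and `doubled` shows the deliberate lack of idempotence for a false argument.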
3.2 Comparison with Signal Temporal Logic
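We now compare VBools with Signal Temporal Logic. The
syntax of STL formulas is defined as follows (Donzé and
Maler, 2010):
$$\varphi ::= \mu \mid \neg\mu \mid \varphi \wedge \psi \mid \varphi \vee \psi \mid \Box_{[a,b]}\psi \mid \Diamond_{[a,b]}\psi \mid \varphi\, U_{[a,b]}\, \psi, \qquad (7)$$
where the predicate µ is µ ≡ µ(x) > 0; ψ and ϕ are STL
formulae; □[a,b] denotes the globally operator between a
and b; ♦[a,b] denotes the finally operator between a and b;
and U[a,b] denotes the until operator between a and b.
The semantics of STL are shown by considering the signal
x at time t and the satisfaction relation ⊨, where (x, t) ⊨ µ
denotes that µ is true for signal value x at time t (Raman
et al., 2014).
$$\begin{aligned}
(x, t) \models \mu &\Leftrightarrow \mu(x(t)) > 0 && (8)\\
(x, t) \models \neg\mu &\Leftrightarrow \neg((x, t) \models \mu) && (9)\\
(x, t) \models \varphi \wedge \psi &\Leftrightarrow (x, t) \models \varphi \wedge (x, t) \models \psi && (10)\\
(x, t) \models \varphi \vee \psi &\Leftrightarrow (x, t) \models \varphi \vee (x, t) \models \psi && (11)\\
(x, t) \models \Box_{[a,b]}\varphi &\Leftrightarrow \forall t' \in [t+a, t+b],\ (x, t') \models \varphi && (12)\\
(x, t) \models \Diamond_{[a,b]}\varphi &\Leftrightarrow \exists t' \in [t+a, t+b],\ (x, t') \models \varphi && (13)\\
(x, t) \models \varphi\, U_{[a,b]}\, \psi &\Leftrightarrow \exists t' \in [t+a, t+b]\ (x, t') \models \psi \wedge \forall t'' \in [t, t'],\ (x, t'') \models \varphi && (14)
\end{aligned}$$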
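A robust semantics of STL formulas is a real-valued
function ρ of the signal x at time t.
$$\begin{aligned}
\rho(\mu, x, t) &= \mu(x(t)) && (15)\\
\rho(\neg\mu, x, t) &= -\mu(x(t)) && (16)\\
\rho(\varphi \wedge \psi, x, t) &= \min(\rho(\varphi, x, t), \rho(\psi, x, t)) && (17)\\
\rho(\varphi \vee \psi, x, t) &= \max(\rho(\varphi, x, t), \rho(\psi, x, t)) && (18)\\
\rho(\Box_{[a,b]}\varphi, x, t) &= \min_{t' \in [t+a, t+b]} \rho(\varphi, x, t') && (19)\\
\rho(\Diamond_{[a,b]}\varphi, x, t) &= \max_{t' \in [t+a, t+b]} \rho(\varphi, x, t') && (20)\\
\rho(\varphi\, U_{[a,b]}\, \psi, x, t) &= \max_{t' \in [t+a, t+b]} \Bigl(\min\bigl(\rho(\psi, x, t'),\ \min_{t'' \in [t, t']} \rho(\varphi, x, t'')\bigr)\Bigr) && (21)
\end{aligned}$$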
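On a sampled signal, equations (15) to (21) reduce to nested minima and maxima. The following Python sketch assumes integer time steps, so that the interval [t+a, t+b] becomes a range of indices, and represents each formula's robustness as a function of the step t; the helper names and the sample signals are illustrative and do not correspond to any existing tool's API.

```python
def rho_pred(signal):
    """Predicate mu(x) > 0 with mu(x) = x: robustness is the signal value (15)."""
    return lambda t: signal[t]

def rho_not(p):
    return lambda t: -p(t)                                            # (16)

def rho_and(p, q):
    return lambda t: min(p(t), q(t))                                  # (17)

def rho_or(p, q):
    return lambda t: max(p(t), q(t))                                  # (18)

def rho_always(p, a, b):
    return lambda t: min(p(u) for u in range(t + a, t + b + 1))       # (19)

def rho_eventually(p, a, b):
    return lambda t: max(p(u) for u in range(t + a, t + b + 1))       # (20)

def rho_until(p, q, a, b):
    return lambda t: max(                                             # (21)
        min(q(u), min(p(v) for v in range(t, u + 1)))
        for u in range(t + a, t + b + 1))

x = rho_pred([3.0, 1.0, -2.0, 4.0])
z = rho_pred([-1.0, 2.0, 5.0, 0.5])

always_x = rho_always(x, 0, 3)(0)          # worst value of x over steps 0..3
eventually_x = rho_eventually(x, 0, 3)(0)  # best value of x over steps 0..3
x_until_z = rho_until(x, z, 0, 3)(0)
```

Here `always_x` is −2.0, `eventually_x` is 4.0 and `x_until_z` is 1.0. Only the single worst (or best) sample determines each value, which is the behaviour the "max" semantics of VBools mirrors.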
VBools and STL are in many ways quite similar. Indeed,
the semantics of an STL formula is basically the same as
that of the corresponding VBool formula in which we have
used the “max” versions of every operator.² The reader might
wonder why we have introduced VBools for shrinking,
rather than using the existing robust semantics for STL.
The main reason is that STL imposes one particular
definition of robustness, while VBools allow the tester
to choose. The severity of a counterexample depends on
the physical interpretation of the property being tested,
so there is no single semantics that always fits best. We
have seen that both the “+” and the “max” semantics are
useful in different applications, depending on whether we
want the integral of the severity or the maximum severity;
in the next section we present some evidence that the
semantics used makes a real difference to shrinking, and
that shrinking based on VBools works better in some cases
than shrinking based on the robust semantics for STL.
² Apart from the fact that the robustness 0 plays a special role in
STL (it is neither true nor false), which in turn means that STL
cannot express the difference between < and ≤.
We do not suggest that “+” and “max” are the only
reasonable semantics for VBools; our approach makes it
relatively easy to add more variants as needed, as we
encounter new sorts of properties with different physical
interpretations and requirements.
We strongly suspect that using VBools in optimization-based
falsification tools, such as Breach and S-TaLiRo, may
actually improve bug-finding effectiveness. The reason is
that in STL, a change in the robustness on one side of
a ∧ will only affect the result of that ∧ if that side was
the minimum already. In VBools, a change in any of the
arguments of ∧+ will always affect its result. So, a
black-box numerical optimization method gets more information
from formulas built with ∧+ than from STL formulas. However,
we do not yet have enough experimental evidence to support
this claim.
4. EVALUATION
In this section we: 1) compare random testing against
falsification, chiefly using Breach; 2) compare VBools
against STL in the context of shrinking; and 3) examine
the quality of the counterexamples produced by VBool-
aware shrinking.
The two models presented here are available in Simulink.
Falsification with Breach, which is a MATLAB toolbox,
is done directly in Simulink. However, to run QuickCheck
for the models, we generate C code that is called through
Haskell scripts. Because of this, the models in this section
are run with a constant step time of 0.001 minutes (for
the heater example) and 0.01 seconds (for the automotive
transmission example). The complete code is available
from https://github.com/koengit/RealTesting, sub-
directory simulink.
For QuickCheck testing of the models, the test data
generator works as follows. We generate piecewise linear
input signals. The pieces can have any size, but are biased
towards smaller sizes. Each piece is randomly chosen to
be either a line or a constant. The line endpoints have a
10% chance of being extremal values, and a 90% chance of
being chosen uniformly at random.
The shrinking steps QuickCheck tries on a piecewise linear
function are:
• Removing a piece from the function;
• reducing a piece’s duration;
• reducing the numerical values of a line endpoint;
• merging two adjacent pieces (x1, y1)–(x2, y2) and
(x2, y′2)–(x3, y3) into one piece (x1, y1)–(x3, y3);
• flattening a piece which is a line into a constant.
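The shrinking steps listed above can be sketched as a candidate generator in Python. The encoding of a piece as `(duration, y_start, y_end)`, the halving strategy and the truncation towards zero are assumptions made for this illustration; the paper describes pieces by their endpoint coordinates and does not state how far each step reduces a value.

```python
def shrink_candidates(pieces):
    """Yield simpler variants of a piecewise linear signal.

    Each piece is (duration, y_start, y_end); a constant piece has
    y_start == y_end. Encoding and halving are assumptions of this sketch.
    """
    n = len(pieces)
    for i in range(n):                          # remove a piece
        yield pieces[:i] + pieces[i + 1:]
    for i, (d, y0, y1) in enumerate(pieces):
        def with_piece(p, i=i):
            return pieces[:i] + [p] + pieces[i + 1:]
        if d > 1:                               # reduce a piece's duration
            yield with_piece((d // 2, y0, y1))
        if y0 != 0 or y1 != 0:                  # reduce endpoint values
            yield with_piece((d, float(int(y0 / 2)), float(int(y1 / 2))))
        if y0 != y1:                            # flatten a line into a constant
            yield with_piece((d, y0, y0))
    for i in range(n - 1):                      # merge two adjacent pieces
        (d1, y0, _), (d2, _, y1) = pieces[i], pieces[i + 1]
        yield pieces[:i] + [(d1 + d2, y0, y1)] + pieces[i + 2:]

pieces = [(10, 0.0, 8.0), (4, 8.0, 8.0)]        # a ramp followed by a constant
cands = list(shrink_candidates(pieces))
```

For this two-piece input the generator produces eight candidates, including the merged single ramp `[(14, 0.0, 8.0)]`. Truncating halved values towards zero means that repeated value reduction reaches 0 after a few steps instead of halving indefinitely.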
IFAC WODES 2018
May 30 - June 1, 2018. Sorrento Coast, Italy
413
 Koen Claessen  et al. / IFAC PapersOnLine 51-7 (2018) 408–415 413
of x: a predicate that occurs several times in the property
contributes more to the severity.
Implication x =⇒v y does not satisfy the classical defini-
tion of being ¬x ∨ y, but it does satisfy the property that
(x =⇒v (y =⇒v z) is equivalent to (x∧+ y) =⇒v z. This
means that properties with several preconditions behave
as expected.
3.2 Comparison with Signal Temporal Logic
We now compare VBools with Signal Temporal Logic. The
syntax of STL formulas is defined as follows (Donze´ and
Maler, 2010):
ϕ ::= µ| − µ|ϕ ∧ ψ|ϕ ∨ ψ|[a,b]ψ|♦[a,b]ψ|ϕU[a,b]ψ, (7)
where the predicate µ is µ ≡ µ(x) > 0, ψ and ϕ are STL
formulae; [a,b] denotes the globally operator between a
and b; ♦[a,b] denotes the finally operator between a and b;
U [a,b] denotes the until operator between a and b.
The semantics of STL are shown by considering the signal
x at time t and the satisfaction relation |=, where (x, t) |= µ
denotes that µ is true for signal value x at time t (Raman
et al., 2014).
(x, t) |= µ ⇔ µ(x(t)) > 0 (8)
(x, t) |= ¬µ ⇔ ¬((x, t) |= µ) (9)
(x, t) |= ϕ ∧ ψ ⇔ (x, t) |= ϕ ∧ (x, t) |= ψ (10)
(x, t) |= ϕ ∨ ψ ⇔ (x, t) |= ϕ ∨ (x, t) |= ψ (11)
(x, t) |= [a,b]ϕ ⇔ ∀t′ ∈ [t+ a, t+ b], (x, t′) |= ϕ
(12)
(x, t) |= ♦[a,b]ϕ ⇔ ∃t′ ∈ [t+ a, t+ b], (x, t′) |= ϕ
(13)
(x, t) |= ϕ U[a,b]ψ ⇔ ∃t′ ∈ [t+ a, t+ b] (x, t′) |= ψ
(14)
∧ ∀t′′ ∈ [t, t′], (x, t′′) |= ϕ
A robust semantics of STL formulas is a real-valued
function ρ of the signal x at time t.
ρ(µ, x, t) = µ(x(t)) (15)
ρ(¬µ, x, t) = − µ(x(t))) (16)
ρ(ϕ ∧ ψ, x, t) = min(ρ(ϕ, x, t), ρ(ψ, x, t)) (17)
ρ(ϕ ∨ ψ, x, t) = max(ρ(ϕ, x, t), ρ(ψ, x, t)) (18)
ρ([a,b]ϕ, x, t) = mint′∈[t+a,t+b]ρ(ϕ, x, t′) (19)
ρ(♦[a,b]ϕ, x, t) = maxt′∈[t+a,t+b]ρ(ϕ, x, t′) (20)
ρ(ϕ U[a,b]ψ, x, t) = maxt′∈[t+a,t+b](min(ρ(ψ, x, t′),
(21)
mint′′∈[t,t′]ρ(ϕ, x, t′′)))
VBools and STL are in many ways quite similar. Indeed,
the semantics of an STL formula is basically the same as
the corresponding VBool formula in which we have used
the “max” versions of every operator. 2 The reader might
wonder why we have introduced VBools for shrinking,
rather than using the existing robust semantics for STL.
The main reason is that STL imposes one particular
definition of robustness, while VBools allow the tester
2 Apart from the fact that the robustness 0 plays a special role in
STL (it is neither true nor false), which in turn means that STL
cannot express the difference between < and ≤.
to choose. The severity of a counterexample depends on
the physical interpretation of the property being tested,
so there is no single semantics that always fits best. We
have seen that both the “+” and the “max” semantics are
useful in different applications, depending on whether we
want the integral of the severity or the maximum severity;
in the next section we present some evidence that the
semantics used makes a real difference to shrinking, and
that shrinking based on VBools works better in some cases
than shrinking based on the robust semantics for STL.
We do not suggest that “+” and “max” are the only
reasonable semantics for VBools; our approach makes it
relatively easy to add more variants as needed, as we
encounter new sorts of properties with different physical
interpretations and requirements.
We have a strong suspicion that using VBools in
optimization-based falsification, as performed by tools such
as Breach and S-TaLiRo, may actually improve bug-finding
effectiveness. The reason is that in STL, a change in the
robustness on one side of a ∧ will only affect the result of
that ∧ if that side was the minimum already. In VBools, a
change in any of the arguments of ∧+ will always affect its
result. So, a black-box numerical optimization method gets
more information from formulas built with ∧+ than from the
corresponding STL formulas. However, we do not yet have
enough experimental evidence to support this claim.
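The intuition can be illustrated with a small finite-difference experiment. In the Python sketch below, `min` plays the role of the STL conjunction from equation (17), while `additive_and` is a deliberately simplified stand-in that just adds its arguments; it is an assumption made for this example and not the definition of ∧+. The helper `sensitivity` is likewise hypothetical.

```python
from typing import Callable, Tuple

def sensitivity(combine: Callable[[float, float], float],
                a: float, b: float, eps: float = 1e-3) -> Tuple[float, float]:
    """Change in combine(a, b) when each argument is nudged by eps in turn."""
    base = combine(a, b)
    return (combine(a + eps, b) - base, combine(a, b + eps) - base)

def additive_and(a: float, b: float) -> float:
    """Simplified additive stand-in (assumption, not the VBool operator)."""
    return a + b

# Two violated conjuncts with robustness -2.0 and -0.5.
d_min = sensitivity(min, -2.0, -0.5)
d_sum = sensitivity(additive_and, -2.0, -0.5)

print(d_min)  # second component is exactly 0.0: min ignores the non-minimal side
print(d_sum)  # both components are non-zero
```

An optimizer that probes the second argument sees no change at all under `min`, whereas the additive combination reports a change for either argument.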
4. EVALUATION
In this section we: 1) compare random testing against
falsification, chiefly using Breach; 2) compare VBools
against STL in the context of shrinking; and 3) examine
the quality of the counterexamples produced by
VBool-aware shrinking.
The two models presented here are available in Simulink.
Falsification with Breach, which is a MATLAB toolbox,
is done directly in Simulink. However, to run QuickCheck
on the models, we generate C code that is called from
Haskell scripts. Because of this, the models in this section
are run with a constant step time of 0.001 minutes (for
the heater example) and 0.01 seconds (for the automatic
transmission example). The complete code is available
from https://github.com/koengit/RealTesting, in the
subdirectory simulink.
For QuickCheck testing of the models, the test data
generator works as follows. We generate piecewise linear
input signals. The pieces can have any size, but are biased
towards smaller sizes. Each piece is randomly chosen to
be either a line or a constant. The line endpoints have a
10% chance of being extremal values, and a 90% chance of
being chosen uniformly at random.
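A minimal Python sketch of such a generator is given below. Only the 10%/90% endpoint rule and the bias towards short pieces come from the description above; the tuple representation of a piece, the squared-uniform bias, the 50/50 choice between line and constant, and all names are assumptions made for illustration. The actual generator is written in Haskell and may differ.

```python
import random
from typing import List, Tuple

# A piece is (duration, start value, end value); start == end means a constant.
Piece = Tuple[float, float, float]

def gen_endpoint(rng: random.Random, lo: float, hi: float) -> float:
    """10% chance of an extremal value, 90% chance of a uniform value."""
    if rng.random() < 0.1:
        return rng.choice([lo, hi])
    return rng.uniform(lo, hi)

def gen_piece(rng: random.Random, lo: float, hi: float,
              max_duration: float) -> Piece:
    # Squaring a sample from (0, 1] skews durations towards small values
    # (assumed bias scheme).
    duration = max_duration * (1.0 - rng.random()) ** 2
    start = gen_endpoint(rng, lo, hi)
    if rng.random() < 0.5:            # constant piece (50/50 split is assumed)
        return (duration, start, start)
    return (duration, start, gen_endpoint(rng, lo, hi))

def gen_signal(rng: random.Random, lo: float, hi: float,
               total_time: float, max_duration: float) -> List[Piece]:
    """Concatenate random pieces until total_time is covered."""
    pieces: List[Piece] = []
    remaining = total_time
    while remaining > 0:
        duration, y0, y1 = gen_piece(rng, lo, hi, max_duration)
        duration = min(duration, remaining)   # truncate the last piece
        pieces.append((duration, y0, y1))
        remaining -= duration
    return pieces

def sample(pieces: List[Piece], t: float) -> float:
    """Value of the signal at time t (holds the final value past the end)."""
    for duration, y0, y1 in pieces:
        if t < duration:
            return y0 + (y1 - y0) * t / duration
        t -= duration
    return pieces[-1][2]

rng = random.Random(0)
throttle = gen_signal(rng, 0.0, 100.0, total_time=80.0, max_duration=20.0)
print(len(throttle), sample(throttle, 10.0))
```

Adjacent pieces are allowed to be discontinuous, since each piece carries its own start and end value.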
The shrinking steps QuickCheck tries on a piecewise linear
function are:
• removing a piece from the function;
• reducing a piece’s duration;
• reducing the numerical values of a line endpoint;
• merging two adjacent pieces (x1, y1)–(x2, y2) and
(x2, y2′)–(x3, y3) into one piece (x1, y1)–(x3, y3);
• flattening a piece which is a line into a constant.
We intend the above design to be useful for random testing
of hybrid systems in general, rather than a set-up specific
to our examples.
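The shrinking steps listed above can be sketched in Python as follows. The piece representation, the halving of durations and values, the thresholds `min_duration` and `tol`, and the greedy loop in `shrink` are assumptions made for this example. The acceptance rule (a candidate must still fail and must not be less severe than the current test case) is our reading of robustness-aware shrinking, not a transcription of the QuickCheck implementation.

```python
from typing import Callable, Iterator, List, Tuple

Piece = Tuple[float, float, float]    # (duration, start value, end value)
Signal = List[Piece]

def shrink_candidates(sig: Signal, min_duration: float = 0.1,
                      tol: float = 1e-3) -> Iterator[Signal]:
    """Yield signals that are one shrinking step simpler than sig."""
    def replace(i: int, *new: Piece) -> Signal:
        return sig[:i] + list(new) + sig[i + 1:]

    for i in range(len(sig)):                     # remove a piece
        yield replace(i)
    for i in range(len(sig) - 1):                 # merge two adjacent pieces
        (d1, y0, _), (d2, _, y1) = sig[i], sig[i + 1]
        yield sig[:i] + [(d1 + d2, y0, y1)] + sig[i + 2:]
    for i, (d, y0, y1) in enumerate(sig):
        if y0 != y1:                              # flatten a line to a constant
            yield replace(i, (d, y0, y0))
        if d > min_duration:                      # reduce the duration
            yield replace(i, (d / 2, y0, y1))
        if y0 == y1:                              # reduce a constant's value
            if abs(y0) > tol:
                yield replace(i, (d, y0 / 2, y0 / 2))
        else:                                     # reduce a line endpoint
            if abs(y0) > tol:
                yield replace(i, (d, y0 / 2, y1))
            if abs(y1) > tol:
                yield replace(i, (d, y0, y1 / 2))

def shrink(sig: Signal, severity: Callable[[Signal], float],
           max_steps: int = 10_000) -> Signal:
    """Greedily take the first candidate that fails at least as severely."""
    best = severity(sig)
    for _ in range(max_steps):
        for cand in shrink_candidates(sig):
            s = severity(cand)
            if s > 0 and s >= best:
                sig, best = cand, s
                break
        else:
            return sig                            # no candidate was accepted
    return sig

# Toy severity: 1.0 if the signal ever reaches 4.0, else 0.0 (no failure).
def reaches_four(sig: Signal) -> float:
    return 1.0 if any(max(y0, y1) >= 4.0 for _, y0, y1 in sig) else 0.0

result = shrink([(2.0, 0.0, 4.0), (1.0, 4.0, 4.0)], reaches_four)
print(result)   # [(0.0625, 4.0, 4.0)]
```

With this purely Boolean toy severity the shrinker keeps only a very short piece at the failing value, which is the glitch-like outcome that a graded severity is meant to avoid.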
4.1 The heater example
Figure 5 shows the result of shrinking a random
counterexample for the heater property of Section 2,
expressed using VBools with “+” semantics. The test case is
now simple but has not become a glitch.
To achieve similar results in the falsification procedure, we
would have to change our input generators to only generate
signals that do not switch values many times. However,
this also reduces signal expressivity, and thus might not
be preferable when it is unknown which kind of faults or
bugs exist in the system under test.
As described in Section 3, the semantics of STL is essentially
the same as VBools with “max” semantics. We can
therefore test how well an STL-based shrinking method
would have worked by changing our property to use “max”
semantics. When we do so, we get a counterexample which
is cut off early: it stops at the instant in time where the
maximum error value is reached. The resulting counterexample
is not quite a glitch, but the property is often only
false for a minute or two. If the error had happened to
be maximal at the point in time where the property first
became false, the resulting counterexample would have
been shrunk into a glitch. This suggests that STL’s robust
semantics is not always appropriate for shrinking.
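The cut-off effect can be reproduced with a small numerical experiment. The Python sketch below treats a counterexample as a list of sampled error values and compares two severity measures: the maximum error, standing in for the “max” semantics, and the sum of the errors over time, standing in for the “+” semantics. The error trace and both helper functions are hypothetical simplifications and are not the VBool definitions.

```python
from typing import List

def max_severity(errors: List[float]) -> float:
    """Severity as the largest error seen (stand-in for "max" semantics)."""
    return max(errors, default=0.0)

def integral_severity(errors: List[float], dt: float = 1.0) -> float:
    """Severity as the error accumulated over time (stand-in for "+")."""
    return sum(e * dt for e in errors if e > 0)

# Hypothetical error per minute while the property is false.
errors = [0.0, 1.0, 3.0, 5.0, 4.0, 4.0, 2.0]

# Cut the test case off at the instant of maximum error.
peak = errors.index(max(errors))
truncated = errors[:peak + 1]

print(max_severity(errors), max_severity(truncated))            # 5.0 5.0
print(integral_severity(errors), integral_severity(truncated))  # 19.0 9.0
```

Under the maximum-based measure the truncated test case is exactly as severe as the original, so a shrinker that must preserve severity is free to cut it off. Under the integral-based measure the truncation loses severity and would be rejected.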
Fig. 5. Shrinking with valued Booleans. We have included
the VBool value in the graph for clarity, even though
it is not a temperature. (Plot of temperature [°C] against
time [m], 0 to 80; legend: setpoint, room, VBool value,
not ok.)
4.2 Automatic transmission example
The Automatic Transmission model was proposed as a
benchmark by Hoxha et al. (2014) and is a version of a
demo for the Simulink tool by MathWorks.
The model has one input, the throttle, which can vary in
the interval [0, 100]. There are three outputs: the vehicle
speed v in mph, the engine rotation speed ω in RPM, and
the gear g. We attempt to falsify an augmented version
ϕ2′ of property ϕ^AT_2 (Hoxha et al., 2014):
\[
\varphi_{2'} = \Box\big((\omega < 4500) \lor (v < 120)\big). \tag{22}
\]
A counterexample of ϕ2′ must make the speed greater than
120 mph and the engine rotation speed greater than 4500
RPM, at the same time instant. One way to falsify this
is simply to run the vehicle at full throttle, and so both
Breach and QuickCheck are able to falsify the property.
QuickCheck’s counterexample is shown in Figure 6, in
which the vehicle gets up to 160 mph and 5000 RPM.
The counterexample before shrinking, which we omit for
lack of space, was quite complicated. It also only got
up to 140 mph and 4500 RPM – that is, in this case,
shrinking not only preserved but increased the severity of
the counterexample. The property used “+” semantics for
robustness. If we use “max” semantics instead, mimicking
STL, then shrinking does not manage to increase the
severity of the counterexample, and furthermore, the test
case is cut off the instant the vehicle reaches the desired
speed and RPM.
Fig. 6. The counterexample for ϕ2′ from QuickCheck.
The thick red lines indicate where the specification
is broken, i.e., where the speed is greater than 120
and the rotational speed is greater than 4500 at the
same time. (Three panels plotted against time [s], 0 to 80:
Throttle [%], Speed [mph], Engine rotation speed [RPM].)
QuickCheck is also able to falsify property ϕ^AT_3, which
states that the transmission never switches from gear 2 to
gear 1 and back to gear 2 within 2.5 seconds. The
counterexample is shown in Figure 7. Finding the
counterexample takes on average about 20 simulations, and
the shrunk counterexample is indeed simpler than the
original random one. Random testing seems to work well here,
perhaps because complicated test cases are likely to falsify
the property.
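To make the property concrete, the Python sketch below scans a sampled gear trace for the forbidden pattern. It assumes the gear is sampled at a fixed step `dt` and measures the 2.5 seconds from the switch to first gear until the switch back to second; both the function name and this reading of the time window are assumptions, since the STL formula for ϕ^AT_3 is not reproduced here.

```python
from typing import List, Tuple

def find_2_1_2(gears: List[int], dt: float,
               window: float = 2.5) -> List[Tuple[float, float]]:
    """Return (t_down, t_up) pairs where gear goes 2 -> 1 -> 2 within window."""
    violations = []
    i_down = None                      # sample index of the last 2 -> 1 switch
    for i in range(1, len(gears)):
        prev, cur = gears[i - 1], gears[i]
        if prev == 2 and cur == 1:
            i_down = i
        elif prev == 1 and cur == 2 and i_down is not None:
            if (i - i_down) * dt <= window:
                violations.append((i_down * dt, i * dt))
            i_down = None
        elif cur != prev:
            i_down = None              # some other gear change resets the scan
    return violations

# Gear trace sampled every 0.5 s: down-shift at 1.0 s, up-shift at 2.5 s.
fast = find_2_1_2([2, 2, 1, 1, 1, 2, 2], dt=0.5)
# Here the vehicle stays in first gear for 3.5 s, so nothing is reported.
slow = find_2_1_2([2, 1, 1, 1, 1, 1, 1, 1, 2], dt=0.5)

print(fast)   # [(1.0, 2.5)]
print(slow)   # []
```

Working with sample indices and multiplying by `dt` only at the end keeps the comparison against the window free of accumulated rounding error.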
One property that we cannot falsify with QuickCheck is:
\[
\varphi^{AT}_6 = \neg\big(\Diamond_{[0,20]}(v > 120) \land \Box(\omega < 4000)\big), \tag{23}
\]
for which a counterexample means that engine speed ω is
always less than 4000 RPM and the vehicle speed v goes
above 120 within 20 seconds. The reason is that random
test inputs are very unlikely to fulfil the prerequisite ω <
4000. Breach actually manages to falsify this specification,
since the robustness value guides the optimizer to generate
inputs that satisfy the prerequisite ω < 4000.
Fig. 7. The counterexample for ϕ^AT_3 from QuickCheck. The
thick red lines indicate where the gear switches to
second within 2.5 seconds of switching to first. (Two
panels plotted against time [s], 0 to 2.5: Throttle [%],
Gear.)
5. CONCLUSION
QuickCheck testing of hybrid systems is a promising
approach. Random testing is able to falsify some properties
such as ϕ^AT_3 which are difficult for falsification. On the
other hand, it performs poorly in some situations, such as
when the property has a precondition that random inputs
are unlikely to fulfil – in this case, an optimization-based
technique such as falsification is likely more useful. Both
techniques have a place in the tester’s toolbox.
It is important to simplify random counterexamples before
presenting them to the tester, but if not done carefully
this leads to counterexamples with small “glitches”, rather
than clear violations of the specification.
Robustness-aware shrinking, based on the VBool semantics
presented in the paper, is able to simplify random
counterexamples without making them less severe. The VBool
semantics seems more suited to shrinking than the STL robust
semantics, which does not take the length of the failure
into account.
For future work, it would be interesting to use VBools
in the optimization-based falsification procedure, to see
what kind of counterexamples are produced when using
different semantics for e.g. ∧ and ∨. We believe that
this will be of interest for many applications, since the
different semantics can be used to tune the objective of
the optimization problem.
Finally, shrinking seems to result in counterexamples that
are simpler than those found by falsification. For future
work, we will explore using VBool-aware shrinking to
simplify counterexamples found by falsification.
REFERENCES
Akazaki, T. and Hasuo, I. (2015). Time robustness in MTL and expressivity in hybrid system falsification. In International Conference on Computer Aided Verification, 356–374. Springer.
Anand, S., Burke, E.K., Chen, T.Y., Clark, J., Cohen, M.B., Grieskamp, W., Harman, M., Harrold, M.J., and McMinn, P. (2013). An orchestrated survey of methodologies for automated software test case generation. Journal of Systems and Software, 86(8), 1978–2001.
Annpureddy, Y., Liu, C., Fainekos, G.E., and Sankaranarayanan, S. (2011). S-TaLiRo: A tool for temporal logic falsification for hybrid systems. In TACAS, volume 6605, 254–257. Springer.
Arts, T., Hughes, J., Norell, U., and Svensson, H. (2015). Testing AUTOSAR software with QuickCheck. In Software Testing, Verification and Validation Workshops (ICSTW), 2015 IEEE Eighth International Conference on, 1–4. IEEE.
Claessen, K. and Hughes, J. (2000). QuickCheck: A lightweight tool for random testing of Haskell programs. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, ICFP ’00, 268–279. ACM, New York, NY, USA.
Clarke, E.M., Emerson, E.A., and Sifakis, J. (2009). Model checking: algorithmic verification and debugging. Communications of the ACM, 52(11), 74–84.
Donzé, A. (2010). Breach, a toolbox for verification and parameter synthesis of hybrid systems. In CAV, volume 10, 167–170. Springer.
Donzé, A. and Maler, O. (2010). Robust satisfaction of temporal logic over real-valued signals. In FORMATS, volume 6246, 92–106. Springer.
Driankov, D., Hellendoorn, H., and Reinfrank, M. (1993). An Introduction to Fuzzy Control. Springer-Verlag New York, Inc., New York, NY, USA.
Eddeland, J., Miremadi, S., Fabian, M., and Åkesson, K. (2017). Objective functions for falsification of signal temporal logic properties in cyber-physical systems. In International Conference on Automation Science and Engineering, 1326–1331.
Fainekos, G.E. and Pappas, G.J. (2009). Robustness of temporal logic specifications for continuous-time signals. Theoretical Computer Science, 410(42), 4262–4291.
Henzinger, T.A., Kopke, P.W., Puri, A., and Varaiya, P. (1995). What’s decidable about hybrid automata? In Proceedings of the twenty-seventh annual ACM symposium on Theory of computing, 373–382. ACM.
Hoxha, B., Abbas, H., and Fainekos, G. (2014). Benchmarks for temporal logic requirements for automotive systems. Proc. of Applied Verification for Continuous and Hybrid Systems.
Hughes, J. (2016). Experiences with QuickCheck: testing the hard stuff and staying sane. In A List of Successes That Can Change the World, 169–186. Springer.
Raman, V., Donzé, A., Maasoumy, M., Murray, R.M., Sangiovanni-Vincentelli, A., and Seshia, S.A. (2014). Model predictive control with signal temporal logic specifications. In Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on, 81–87. IEEE.
