RERS 2019: Combining Synthesis with Real-World Models by Jasper, M. et al.
PDF hosted at the Radboud Repository of the Radboud University
Nijmegen
 
 
 
 
The following full text is a publisher's version.
 
 
For additional information about this publication click this link.
http://hdl.handle.net/2066/204610
 
 
 
Please be advised that this information was generated on 2019-07-12 and may be subject to
change.
RERS 2019: Combining Synthesis
with Real-World Models
Marc Jasper1, Malte Mues1, Alnis Murtovi1, Maximilian Schlüter1,
Falk Howar1, Bernhard Steffen1, Markus Schordan2(B), Dennis Hendriks3,
Ramon Schiffelers4, Harco Kuppens5, and Frits W. Vaandrager5
1 TU Dortmund University, Dortmund, Germany
{marc.jasper,malte.mues,alnis.murtovi,maximilian.schlueter,falk.howar,
bernhard.steffen}@tu-dortmund.de
2 Lawrence Livermore National Laboratory, Livermore, CA, USA
schordan1@llnl.gov
3 ESI (TNO), Eindhoven, The Netherlands
dennis.hendriks@tno.nl
4 ASML and Eindhoven University of Technology,
Veldhoven/Eindhoven, The Netherlands
ramon.schiffelers@asml.com
5 Radboud University, Nijmegen, The Netherlands
{H.Kuppens,F.Vaandrager}@cs.ru.nl
Abstract. This paper covers the Rigorous Examination of Reactive Systems
(RERS) Challenge 2019. For the first time in the history of RERS,
the challenge features industrial tracks where benchmark programs that
participants need to analyze are synthesized from real-world models.
These new tracks comprise LTL, CTL, and Reachability properties. In
addition, we have further improved our benchmark generation infrastructure
for parallel programs towards a full automation. RERS 2019
is part of TOOLympics, an event that hosts several popular challenges
and competitions. In this paper, we highlight the newly added industrial
tracks and our changes in response to the discussions at and results of
the last RERS Challenge in Cyprus.
Keywords: Benchmark generation · Program verification ·
Temporal logics · LTL · CTL · Property-preservation · Obfuscation ·
Synthesis
This is a U.S. government work and not under copyright protection in the U.S.;
foreign copyright protection may apply 2019
D. Beyer et al. (Eds.): TACAS 2019, Part III, LNCS 11429, pp. 101–115, 2019.
https://doi.org/10.1007/978-3-030-17502-3_7
1 Introduction
The Rigorous Examination of Reactive Systems (RERS) Challenge is an annual
event concerned with software verification tasks, called benchmarks, on which
participants can test the limits of their tools. In its now 9th iteration, the
RERS Challenge continues to expand both its underlying benchmark generator
infrastructure and the variety of its tracks. This year, RERS is part of
TOOLympics [2]. As during previous years [9,12,13], RERS 2019 features tracks
on sequential and parallel programs in programming/specification languages
such as Java, C99, Promela [11], and (Nested-Unit) Petri nets [8,19]. Properties
that participants have to analyze range from reachability queries over linear
temporal logic (LTL) formulae [20] to computation tree logic (CTL) properties
[6]. Participants only need to submit their “true”/“false” answers to these
tasks. As a new addition in 2019, we enrich RERS with industrial tracks in which
benchmarks are based on real-world models.
The main goals of RERS1 are:
1. Encourage the combination of methods from different (and usually disconnected)
research fields for better software verification results.
2. Provide a framework for an automated comparison based on differently
tailored benchmarks that reveal the strengths and weaknesses of specific
approaches.
3. Initiate a discussion about better benchmark generation, reaching out across
the usual community barriers to provide benchmarks useful for testing and
comparing a wide variety of tools.
One aspect that makes RERS unique in comparison to other competitions
or challenges on software verification is its automated benchmark synthesis: The
RERS generator infrastructure allows the organizers to distribute new and challenging
verification tasks each year while knowing the correct solution to these
tasks. In contrast, in similar events such as the Software Verification Competition
(SV-COMP) [3], which focuses on programs written in C and reachability queries,
benchmarks are hand-selected by a committee and most of them are used again
for subsequent challenge iterations. That the solutions to these problems are
already known does no harm because SV-COMP, for example, does not merely focus on
the answers to the posed problems, but also on details of how they are achieved.
To attain this, SV-COMP features a centralized evaluation approach along with
resource constraints where participants submit their tools instead of just their
answers to the verification tasks. During this evaluation phase, which builds on
quite an elaborate competition infrastructure, obtained counterexample traces
are also evaluated automatically [4].
The situation is quite diﬀerent for the Model Checking Contest (MCC) [16], a
veriﬁcation competition that is concerned with the analysis of Petri nets, where
the correct solutions to the selected veriﬁcation tasks are not always known to
the competition organizers. In such cases, the MCC evaluation is often based
on majority voting concerning the submissions by participants, an approach
also followed by a number of other competitions despite the fact that this may
penalize tools of exceptional analysis power. In contrast, the synthesis procedure
of veriﬁcation tasks for RERS also generates the corresponding provably correct
solutions using a correctness-by-construction approach. Both SV-COMP and
MCC have therefore added RERS benchmarks to their problem portfolio.
1 As stated online at http://www.rers-challenge.org/2019/.
As stated above, RERS aims to foster the combination of diﬀerent methods,
and this includes the combination of diﬀerent tools. During last year’s RERS
Challenge for example, one participant applied three diﬀerent available tools in
order to generate his submission2 and thereby won the Method Combination
Award within RERS3. In order to host an unmonitored and free-style challenge
such as RERS on a regular basis—one where just the “true”/“false” answers
need to be submitted—an automated benchmark synthesis is a must.
Potential criticism of such a synthesis approach might be that the generated
verification tasks are not directly connected to any real-world problem: Their size
might be realistic, but their inherent structure might not be. This criticism
very much reﬂects a perspective where RERS benchmarks are structurally com-
pared to handwritten code. On the other hand, being synthesized from temporal
constraints, RERS benchmarks very much reﬂect the structure that arises in gen-
erative or requirements-driven programming. In order to be close to industrial
practice, RERS 2019 also provides benchmarks via a combination of synthesis
with real-world models. For this endeavor, we collaborated with ASML, a large
Dutch semiconductor company.
When developing controller software, over time updates and version changes
inevitably turn originally well-documented solutions into legacy software, which
typically should preserve the original controller behavior. RERS 2019 addresses
this phenomenon by generating legacy-like software from models via a number
of property-preserving transformations that are provided by the RERS infras-
tructure [22]. This results in correct ‘obfuscated’ (legacy) implementations of the
real-world models provided by ASML.
The parallel benchmarks of the last RERS challenge were built on top of
well-known initial systems, dining philosophers of various sizes. As a next step
towards a fully automated benchmark generation process, we created the initial
system in a randomized fashion this year. The subsequent property-preserving
parallel decomposition process, which may result in benchmarks of arbitrary
degrees of parallelism, remained untouched [23]. For RERS 2020 we plan to use
the more involved synthesis approach presented in [15] in order to be able to
also guarantee benchmark hardness.
Moreover, in response to participants’ requests, we implemented a generator
that creates candidates for branching time properties for the parallel bench-
marks. The idea is to syntactically transform available LTL properties into
semantically ‘close’ CTL formulae. This turns out to provide interesting CTL
formulae for the benchmark systems. The validity of these formulae has, of course,
to be established via model checking, as the generation process is not (and cannot be)
semantics-preserving.
In the following, the detailed observations from RERS 2018 are described
in Sect. 2. Section 3 then summarizes improvements within the parallel tracks
of RERS that we implemented for the 2019 challenge, before Sect. 4 introduces
the new industrial tracks with their dedicated benchmark construction. Our
conclusions and outlook to future work can be found in Sect. 5.
2 Details at http://www.rers-challenge.org/2018/index.php?page=results.
3 The reward structure of RERS is described in previous papers such as [12].
2 Lessons Learned: The Sequential Tracks of RERS 2018
For RERS 2018, we received four contributions to the Sequential Reachability
track and two contributions to the Sequential LTL track. Detailed results are
published online.4 The tools that participants used for the challenge are quite
heterogeneous: Their proﬁles range from explicit-state model checking over trace
abstraction techniques to a combination of active automata learning with model
checking [5,10,14,18,25]. During the preparations for the new sequential and
industrial tracks, we started a closer investigation on lessons we might learn
from the results of the RERS 2018 challenge in addition to the valuable feedback
collected during the RERS 2018 meeting in Limassol.
[Figure 1: four bar charts, one per participant, over Problems 10–18 with a vertical axis from 0 to 100. Legend: true, false, unknown, unknown true, unknown false. Panels: (a) University of Freiburg, (b) University of Twente, (c) LLNL, (d) LMU Munich.]
Fig. 1. Reachability results. (Color figure online)
In Fig. 1, the results of the participants of the reachability challenge are
visualized. The blue bars indicate how many properties have not been addressed
by the respective participant for a problem. Hence, these blue bars point to
potential opportunities for achieving better challenge results for each tool. It is
observable that the share of green bars decreases with increasing problem
size and difficulty. This shows that fewer unreachable errors are detected with
increased problem size. In contrast, the purple bars still show a fair number of
results for reachable errors.
4 http://www.rers-challenge.org/2018/index.php?page=results.
It is obvious that showing the absence of a certain error requires a more com-
plicated proof than demonstrating that it is reachable. Therefore, the observed
result is not unexpected. To investigate this further, the blue bars are split up
into the corresponding categories from which the unsolved properties originate.
An orange bar shows the number of unreported reachable errors. A yellow bar
shows the number of unreported unreachable errors. In most cases, the yellow
bar is comparable in size to the blue bar for a problem. On the one hand, this is
evidence which demonstrates that proving unreachable errors is still a hard chal-
lenge no matter which approach has been applied. On the other hand, the charts
indicate that participating tools scale quite well also on the larger problems for
demonstrating the existence of errors.
[Figure 2: two bar charts, one per participant, over Problems 1–9 with a vertical axis from 0 to 100. Legend: true, false, unknown, unknown true, unknown false. Panels: (a) LLNL, (b) University of Twente.]
Fig. 2. LTL results. (Color figure online)
We found a similar situation in the LTL track results reported in Fig. 2. In this
figure, a purple bar indicates that an LTL formula holds. Proving this requires a deep
understanding of major parts of the complete execution graph and is therefore
the counterpart of proving an error unreachable. As expected, it appears to be
much easier for tools to disprove an LTL formula on the given examples, in the
same way as it seems significantly easier to prove error reachability. With a few
exceptions, the blue bars indicating unreported properties for a given problem
are comparable in height with the orange bars for LTL formulae expected to hold
on the given instance. We want to highlight that the tools which participated in
RERS 2018 demonstrated a good scalability for disproving LTL formulae across
the diﬀerent problem sizes.
Based on the results handed in to RERS 2018, we observe some maturity
in tools disproving LTL formulae and finding errors, which are both characterized
by having single paths as witnesses. We appreciate this trend because a
lack of scalability of verification tools was a major motivation to start the RERS
challenge.
As a next step, we intend to motivate future participants to further investi-
gate the direction of proving LTL formulae and error unreachability on systems.
These properties require more complex proofs as it is not possible to verify the
answer with a single violating execution path. Instead it is required to create
a deeper understanding of all possible execution paths in order to give a sound
answer. There is a higher chance to make a mistake and give a wrong answer
resulting in a penalty.
With RERS 2019, we therefore want to encourage people to invest in corresponding
verification tools by acknowledging that verifiable properties are more complicated
to analyze than refutable ones. In the future we will award two points
for each correct report of an unreachable error or a satisfied LTL formula in the
competition-based ranking. The achievement reward system remains unchanged.
3 Improvements in the Parallel Tracks for RERS 2019
The initial model used for the RERS 2018 tracks on parallel programs was
chosen to be the Dining Philosophers Problem in order to feature a well-known
system [13]. With the goal of reflecting the properties of this system as well as
possible, the corresponding LTL and CTL properties were designed manually.
To streamline our generation approach and minimize the amount of manual work
involved, we decided to further automate these steps for RERS 2019.
In [15], a new workﬂow for the generation of parallel benchmarks was pre-
sented that fully automates the generation process while ensuring certain hard-
ness guarantees of the corresponding veriﬁcation/refutation tasks. Due to time
constraints, we could not fully integrate this new approach into our generation
pipeline for RERS 2019. Instead, we combined new and existing approaches to
achieve a full automation (Fig. 3). Our workﬂow for RERS’19 therefore does not
yet guarantee the formal hardness properties presented in [15]. On the other
hand, it integrates the generation of CTL properties, an aspect that was not
discussed in [15].
[Figure 3: workflow diagram. Labels recovered from the diagram: Benchmark Profile, property mining, model checking, LTLs + solution, CTLs + solution, Questionnaire, Questionnaire Solution, alphabet extension, modal refinement, parallel decomposition, parallel MTSs, parallel LTSs, Model or Code.]
Fig. 3. Workflow of the benchmark generation for the RERS’19 parallel programs.
Input to the overall workﬂow (Fig. 3) is a benchmark proﬁle that contains
metadata such as the number of desired veriﬁable/refutable LTL/CTL proper-
ties, number of parallel components in the ﬁnal code, and similar characteristics.
The generation of a parallel benchmark starts with a labeled transition system
(LTS). We chose to randomly generate these for RERS’19, based on parameters
in the input benchmark profile. Alternatively, one could choose an existing
system modeled as an LTS if its size still permits efficient model checking.
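As a rough illustration of this starting point, the following Python sketch generates a random LTS in which every state is reachable from the initial state. The parameter names (`num_states`, `num_labels`, `extra_transitions`, `seed`) and the generation strategy (one incoming transition per new state, followed by additional random transitions) are assumptions made for this example; the text above does not specify how the RERS'19 generator draws its systems.

```python
import random
from collections import deque


def random_lts(num_states, num_labels, extra_transitions, seed=0):
    """Return (initial_state, transitions) of a random LTS.

    transitions is a set of (source, label, target) triples. Every state is
    reachable from state 0 because each state i > 0 first receives one
    incoming transition from a state with a smaller index.
    """
    rng = random.Random(seed)
    labels = [f"a{i}" for i in range(num_labels)]
    transitions = set()
    for target in range(1, num_states):
        source = rng.randrange(target)  # any previously created state
        transitions.add((source, rng.choice(labels), target))
    for _ in range(extra_transitions):
        source = rng.randrange(num_states)
        target = rng.randrange(num_states)
        transitions.add((source, rng.choice(labels), target))
    return 0, transitions


def reachable_states(initial, transitions):
    """Breadth-first search over the transition relation."""
    successors = {}
    for source, _, target in transitions:
        successors.setdefault(source, set()).add(target)
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Fixing the seed makes a generated benchmark reproducible, which is convenient when the same initial system has to be regenerated together with its solution.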
3.1 Property Generation
Given the initial LTS, we randomly select veriﬁable and refutable properties
based on certain LTL patterns. This process is called property mining in Fig. 3
and was previously used to generate the parallel benchmarks of RERS’16 [9] and
some of RERS’17 [12].
As a new addition to the automated workﬂow, we implemented a generation
of CTL formulae based on the following idea:
– Syntactically transform an LTL formula φl to a CTL formula φc. This yields
structurally interesting CTL properties but is not guaranteed to preserve the
semantics.
– Check φc on the input model. This step compensates for the lack of property
preservation of the ﬁrst step.
– Possibly negate φc and then apply de Morgan-like rules to eliminate the
leading negation operator in case the ratio of satisﬁed and violated properties
does not match the desired characteristics. This works for CTL, as in contrast
to LTL, formulae or their negations are guaranteed to hold (law of excluded
middle).
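The negation step in the last item can be sketched as follows. The Python code below pushes a negation through a negation-free CTL formula using the standard dualities (for example ¬AGφ ≡ EF¬φ and ¬A(φUψ) ≡ E(¬ψ W (¬φ ∧ ¬ψ))). The tuple-based formula representation and the operator names (`"dia"` and `"box"` for the modal operators over transition labels) are assumptions of this example, and the exact rule set used by the RERS generator is not given in the text.

```python
# Formulas are nested tuples, e.g. ("AG", ("dia", "a", ("true",))).
DUAL_UNARY = {"AG": "EF", "AF": "EG", "EG": "AF", "EF": "AG"}
DUAL_BINARY = {"AU": "EW", "AW": "EU", "EU": "AW", "EW": "AU"}
DUAL_MODAL = {"dia": "box", "box": "dia"}


def negate(f):
    """Return a formula equivalent to the negation of f without a 'not' node.

    f is assumed to be negation-free itself, as is the case for formulas
    produced by the syntactic LTL-to-CTL transformation.
    """
    op = f[0]
    if op == "true":
        return ("false",)
    if op == "false":
        return ("true",)
    if op == "and":
        return ("or", negate(f[1]), negate(f[2]))
    if op == "or":
        return ("and", negate(f[1]), negate(f[2]))
    if op == "implies":
        # not (p => q)  ==  p and not q
        return ("and", f[1], negate(f[2]))
    if op in DUAL_MODAL:
        # not <a>p == [a](not p), and not [a]p == <a>(not p)
        return (DUAL_MODAL[op], f[1], negate(f[2]))
    if op in DUAL_UNARY:
        return (DUAL_UNARY[op], negate(f[1]))
    if op in DUAL_BINARY:
        # not (p U q) == (not q) W (not p and not q), and dually for W
        not_p, not_q = negate(f[1]), negate(f[2])
        return (DUAL_BINARY[op], not_q, ("and", not_p, not_q))
    raise ValueError(f"unsupported operator: {op}")
```

Because exactly one of φ and its negation holds in the initial state, a single model-checking run per candidate suffices to decide which of the two variants is placed in the questionnaire.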
We realized the transformation from an LTL formula to a corresponding CTL
formula by prepending an A (‘always’) to every LTL operator which requires
the formula to hold on every successor state. For a state to satisfy AGφ for
example, φ has to hold in every state on every path starting in the given state.
Additionally, we introduced a diamond operator for every transition label that
is not negated in the LTL formula and a box for every negated label as detailed
below. The transformation was implemented as follows where the LTL formula
to the left of the arrow is replaced by the CTL formula to the right of the arrow.5
Gφ → AGφ
Fφ → AFφ
φUψ → A(φUψ)
φWψ → A(φWψ)
a → 〈a〉true
¬a → [a]false
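The six rewrite rules above can be sketched as a small recursive function. In the following Python example, formulas are represented as nested tuples and Boolean connectives are passed through unchanged; both the representation and the helper `show` are assumptions made for illustration and not the implementation used by the RERS generator.

```python
def ltl_to_ctl(f):
    """Apply the syntactic LTL-to-CTL rewrite rules to a tuple-based formula."""
    op = f[0]
    if op == "ap":                       # a  ->  <a>true
        return ("dia", f[1], ("true",))
    if op == "not":                      # not a  ->  [a]false
        if f[1][0] != "ap":
            raise ValueError("negation is only handled on transition labels")
        return ("box", f[1][1], ("false",))
    if op in ("G", "F"):                 # G p -> AG p,  F p -> AF p
        return ("A" + op, ltl_to_ctl(f[1]))
    if op in ("U", "W"):                 # p U q -> A(p U q), same for W
        return ("A" + op, ltl_to_ctl(f[1]), ltl_to_ctl(f[2]))
    if op in ("and", "or", "implies"):   # connectives are kept as they are
        return (op, ltl_to_ctl(f[1]), ltl_to_ctl(f[2]))
    raise ValueError(f"unsupported operator: {op}")


def show(f):
    """Render a CTL formula produced by ltl_to_ctl as ASCII text."""
    op = f[0]
    if op in ("true", "false"):
        return op
    if op == "dia":
        return f"<{f[1]}>{show(f[2])}"
    if op == "box":
        return f"[{f[1]}]{show(f[2])}"
    if op in ("AG", "AF"):
        return f"{op}({show(f[1])})"
    if op in ("AU", "AW"):
        return f"A({show(f[1])} {op[1]} {show(f[2])})"
    symbol = {"and": "&", "or": "|", "implies": "=>"}[op]
    return f"({show(f[1])} {symbol} {show(f[2])})"


# Example: the LTL formula G(a => F(b)).
response = ("G", ("implies", ("ap", "a"), ("F", ("ap", "b"))))
print(show(ltl_to_ctl(response)))
```

The result is only a candidate property: whether it holds on the model still has to be determined by model checking, since the rewrite does not preserve the semantics of the LTL formula.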
5 For more details on the syntax of the LTL and CTL properties, see http://rers-challenge.org/2019/index.php?page=problemDescP.
The diamond operator 〈a〉φ holds in a state iff the state has at least one
outgoing transition labeled with an a whose target state satisfies φ. In this
case 〈a〉true holds in a state if it has an outgoing transition labeled with a
because every state satisfies ‘true’. The box operator [a]φ holds in a state iff
the target state of every outgoing transition labeled with an a satisfies φ. The
negation of an atomic proposition a was replaced by [a]false, which is only
satisfied by a state that has no outgoing transitions labeled with an a.
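The semantics of the two modal operators can be made concrete with a minimal Python sketch over an explicit transition relation. Representing the LTS as a set of (source, label, target) triples and the subformula φ as a predicate on states are assumptions of this example.

```python
def diamond(transitions, state, label, holds):
    """<label>phi: some outgoing label-transition leads to a state satisfying phi."""
    return any(holds(target)
               for source, a, target in transitions
               if source == state and a == label)


def box(transitions, state, label, holds):
    """[label]phi: every outgoing label-transition leads to a state satisfying phi."""
    return all(holds(target)
               for source, a, target in transitions
               if source == state and a == label)


# Small example system: 0 -a-> 1, 0 -b-> 2, 1 -a-> 1.
lts = {(0, "a", 1), (0, "b", 2), (1, "a", 1)}
top = lambda state: True       # the formula 'true'
bottom = lambda state: False   # the formula 'false'

print(diamond(lts, 0, "a", top))   # state 0 has an outgoing a-transition
print(box(lts, 2, "a", bottom))    # state 2 has no outgoing a-transition
```

Since `all` over an empty collection is true, [a]false holds exactly in the states without an outgoing a-transition, which matches its use as the replacement for ¬a.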
Based on the previously mentioned steps, we can automatically generate
LTL and CTL properties that are given to participants of the challenge as a
questionnaire (see Fig. 3). Similarly, the corresponding solution is extracted and
kept secret by the challenge organizers until the submission deadline has passed
and the results of the challenge are announced.
3.2 Expansion and Translation of the Input Model
In order to synthesize challenging veriﬁcation tasks and provide parallel pro-
grams, we expand the initial LTS based on property-preserving parallel decom-
positions [23] (see top and right-hand side of Fig. 3). The corresponding pro-
cedure works on modal transition systems (MTSs) [17], an extension of LTSs.
This parallel decomposition can be iterated. During this expansion procedure,
the alphabet of the initial system is extended by artiﬁcial transition labels. More
details including examples can be found in [13,21].
As a last step, the ﬁnal model of the now parallel program is encoded in
diﬀerent target languages such as Promela or as a Nested-Unit Petri net [8] in
the standard PNML format6. The ﬁnal code or model speciﬁcation is presented
to participants of the challenge along with the questionnaire that contains the
corresponding LTL/CTL properties.
Please note the charm of verifying branching time properties: As CTL is
closed under negation, proving whether a formula is satisﬁed or violated can in
both cases be accomplished using standard model checking, and in both cases
one can construct witnesses in terms of winning strategies. Thus there is not
such a strong discrepancy between proving and refuting properties as in LTL.
4 Industrial Tracks
RERS 2019 includes tracks that are based on industrial embedded control sys-
tems provided by ASML. ASML is the world’s leading provider of lithography
systems for the semiconductor industry. Lithography systems are very complex
high-tech systems that are designed to expose patterns on silicon wafers. This
processing must not only be able to deliver exceptionally reliable results with an
extremely high output on a 24/7 basis, it must do so while also being extremely
precise. With patterns becoming smaller and smaller, ASML TWINSCAN lithog-
raphy systems incorporate an increasing amount of control software to compen-
sate for nano-scale physical eﬀects.
6 ISO/IEC 15909-2: https://www.iso.org/standard/43538.html.
To deal with the increasing amount of software, ASML employs a component-
based software architecture. It consists of components that interact via explicitly
speciﬁed interfaces, establishing a formalized contract between those compo-
nents. Such formal interface speciﬁcations not only include syntactic signatures
of the functions of an interface, but also their behavioural constraints in terms
of interface protocols. Furthermore, non-functional aspects, such as timing, can
be described.
Formal interface speciﬁcations enable the full potential of a component-
based software architecture. They allow components to be developed, analyzed,
deployed and maintained in isolation. This is achieved using enabling techniques,
among which are model checking (to prove interface compliance), observers (to
check interface compliance), armoring (to separate error handling from compo-
nent logic) and test generation (to increase test coverage).
For newly developed components, ASML speciﬁes the corresponding interface
protocols. However, components developed in the past often do not have such
interface protocol speciﬁcation yet. ASML aims to obtain behavioral interface
speciﬁcations for such components. Model inference techniques help to obtain
such speciﬁcations in an eﬀective way [1]. Such techniques include, for instance,
static analysis exploiting information in the source code, passive learning based
on execution logs, active automata learning querying the running component,
and combinations of these techniques.
ASML collaborates with ESI7 in a research project on the development of
an integrated tool suite for model inference to (semi-automatically) infer inter-
face protocols from existing software components. This tool suite is applied and
validated in the industrial context of ASML. Recently, this tool suite has been
applied to 218 control software components of ASML’s TWINSCAN lithography
machines [26]. 118 components could be learned in an hour or less. The tech-
niques failed to successfully infer the interface protocols of the remaining 100
components.
Obtaining the best performing techniques to infer behavioral models for these
components is the goal of the ASML-based industrial tracks of RERS 2019.
Any model inference technique, including source code analyzers, passive learn-
ing, (model-based) testers and (test-based) modelers including active automata
learning, and free-style approaches, or combinations of techniques can be used.
The best submissions to the challenge might be used by ASML and ESI and
incorporated into their tool suite.
4.1 ASML Components for RERS
For the RERS challenge, ASML disclosed information about roughly a hundred
TWINSCAN components. We decided to select 30 among them to generate
challenging benchmark problems for RERS 2019, and three additional ones that
are used for training problems. Using these components allows participants to
apply their tools and techniques on components of industrial size and complexity,
evaluating their real-world applicability and performance.
7 ESI is a TNO Joint Innovation Centre, a collaboration between the Netherlands
Organisation for Applied Scientific Research (TNO), industry, and academia.
For the disclosed components, Mealy machine (MM) models and (generated)
Java and C++ source-code exist. The generation of benchmarks for the RERS
challenge is based on the MM models. This allows us to open the industrial tracks
also to tools that analyze C programs. The Java code of the challenge is gen-
erated by the organizers as described later in Sect. 4.4 and does not represent
the originally generated Java code provided by ASML. This prevents partici-
pants from exploiting potential structural patterns in this original Java code
(such structural information does not exist in legacy components). Furthermore,
an execution log is provided for each component. Each execution log contains
a selected number of logged traces, provided by ASML, representing behavior
exhibited by either a unit or integration test.
The remainder of this section provides a brief overview of how properties are
generated for these benchmarks and how code is generated using the obfusca-
tion infrastructure from previous sequential RERS tracks. Figure 4 presents an
overview of the corresponding benchmark generation workﬂow that is described
in the following.
[Figure 4: workflow diagram, organized in columns for Section 4.1, Section 4.2, Section 4.3, and Section 4.4. Labels recovered from the diagram: Benchmark Profile, complete MM w. error transitions, partial MM w. some error transitions, partial MM without error transitions, error selection, property mining, LTLs + solution, CTLs + solution, Questionnaire, Questionnaire Solution, MM expansion, discrimination tree, program model construction, program model elaboration, (initial) code model, Code.]
Fig. 4. Workflow of the benchmark generation for the new industrial tracks.
4.2 Generation of CTL Properties
We compute CTL formulae from Mealy machines using conformance testing
algorithms. We generate a small set of traces that characterizes each state.
Using this, we can deﬁne for each state q a CTL state formula σq that char-
acterizes part of its behavior. If i1/o1, i2/o2 is an IO sequence of state q, then
formula σq takes the form
EX(i1 ∧ EX(o1 ∧ EX(i2 ∧ EXo2))).
These characterizing formulae are the basis for CTL properties, e.g., of the form
AG(σ1 ∨ σ2 ∨ . . . ∨ σn),
AG(σ1 ⇒ EX(i ∧ EXo ∧ σ2)), or
AG(σ1 ⇒ EFσ3),
where i and o denote symbols from the set of inputs and outputs of the Mealy
machine model, respectively. Additionally, we generate CTL formulae that do
not hold in the model using the same approach.
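The construction of a characterizing formula from an IO sequence can be sketched in a few lines of Python. The ASCII operator syntax (`&`, `|`, `=>`) and the function names are assumptions of this example; the IO sequences themselves are taken as given here, whereas the actual workflow derives them with conformance testing algorithms.

```python
def characterizing_formula(io_sequence):
    """Build EX(i1 & EX(o1 & EX(i2 & EX o2))) from [(i1, o1), (i2, o2)]."""
    symbols = [symbol for pair in io_sequence for symbol in pair]
    if not symbols:
        raise ValueError("the IO sequence must not be empty")
    formula = f"EX {symbols[-1]}"            # innermost subformula
    for symbol in reversed(symbols[:-1]):    # wrap from the inside out
        formula = f"EX({symbol} & {formula})"
    return formula


def some_state_always(sigmas):
    """AG(sigma_1 | ... | sigma_n): every reachable state matches some sigma."""
    return "AG(" + " | ".join(f"({s})" for s in sigmas) + ")"


def always_can_reach(sigma_from, sigma_to):
    """AG(sigma_from => EF sigma_to)."""
    return f"AG(({sigma_from}) => EF ({sigma_to}))"


sigma_q = characterizing_formula([("i1", "o1"), ("i2", "o2")])
print(sigma_q)
```

For the two-step sequence i1/o1, i2/o2 this yields the nested EX formula shown above, with inputs and outputs alternating from the outside in.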
4.3 LTL and Reachability Properties
Regarding the new ASML-based benchmarks, we used a property mining app-
roach for the generation of LTL properties. By mining we mean that properties
are extracted from the model without altering this model. As a ﬁrst step, we
temporarily discard all error transitions from the input Mealy machine (MM)
(see Fig. 4): In line with the benchmark deﬁnition used in former editions of the
RERS tracks on sequential programs, our LTL properties only constrain infinite
paths. This nicely reflects the fact that controllers or protocols are typically
meant to run continuously in order to react to arising input.
Having discarded all error transitions, our approach ﬁrst generates random
properties from relevant patterns according to [7]. A model checker is then used
to determine whether or not the generated properties hold on the given input
model. We iterate this process until we ﬁnd a desired ratio between satisﬁed and
violated properties. This mining approach is very similar to the previous LTL
generation in RERS (cf. [22]), with the exception that no properties are used for
synthesizing a MM. Because we have never altered the original MM with regard
to its inﬁnite paths, all extracted LTL properties that are satisﬁed characterize
the input/output behavior of the given real-world model.
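The mining loop described above can be outlined as follows. In this Python sketch the model checker is abstracted as a callable `holds` that receives a formula string, and the four templates are only an illustrative selection in the style of the specification patterns of [7]; the function name, the parameters, and the formula syntax are assumptions of this example.

```python
import random

# Illustrative templates: absence, existence, response, precedence.
PATTERNS = ["G(!{p})", "F({p})", "G({p} => F({q}))", "(!{p}) W {q}"]


def mine_properties(symbols, holds, want_true, want_false,
                    seed=0, max_attempts=10_000):
    """Collect satisfied and violated properties until the desired counts are met.

    symbols: input/output symbols of the model (at least two).
    holds:   callable standing in for a model checker; returns True iff the
             formula is satisfied by the (unmodified) input model.
    """
    rng = random.Random(seed)
    satisfied, violated, seen = [], [], set()
    for _ in range(max_attempts):
        if len(satisfied) == want_true and len(violated) == want_false:
            break
        p, q = rng.sample(symbols, 2)
        formula = rng.choice(PATTERNS).format(p=p, q=q)
        if formula in seen:
            continue
        seen.add(formula)
        if holds(formula):
            if len(satisfied) < want_true:
                satisfied.append(formula)
        elif len(violated) < want_false:
            violated.append(formula)
    return satisfied, violated
```

The model is never changed by this loop, so every formula in the satisfied list describes behavior of the original Mealy machine.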
Similar to the former editions of RERS, the new industrial tracks also provide
reachability tasks (“Is the error labeled x reachable?”). This generation
process is separate from the LTL track generation. While we discarded all error
transitions from the input model during conversion from the input model to code
in the LTL track generation, we select real errors from the given input model and
map them to unique error states before code generation during the reachability
task generation. This way, all included errors are taken from a real system and
are not synthetic. The same input models are used for the generation of the
benchmarks for the reachability track and the LTL track. There are just slightly
different preprocessing steps in place that address the handling of error transitions
during generation. Using real error paths is again in contrast to the benchmark
synthesis of RERS that was applied during previous years where reachability
tasks were artificially added to (already artificial) input models.
As depicted in the top-left corner of Fig. 4, the initial MM model is complete,
meaning that each input symbol that is not supported in a certain state is rep-
resented by an error transition leading to a sink state. We randomly select some
of those error transitions and reroute them so that they each lead to a distinct
error state. At the same time, we introduce unreachable error states to the MM
and enumerate both the reachable and unreachable error states. The resulting
reachability vector is reported back to the challenge organizer as part of the
Questionnaire Solution (Fig. 4). The error states are then rendered as guarded
“veriﬁer errors” in the ﬁnal program (see Sect. 4.4). Unsupported transitions
that were not selected for the reachability tasks are rendered as “invalid input”,
in line with the previous RERS tracks on sequential programs.
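A minimal sketch of this error selection step is given below. It assumes that the complete Mealy machine is stored as a dictionary from (state, input) to (output, target), that unsupported inputs lead to a state named `SINK`, and that every non-sink state is reachable from the initial state; these representation choices and all names are hypothetical.

```python
import random

SINK = "sink"


def select_errors(transitions, num_reachable, num_unreachable, seed=0):
    """Reroute some error transitions to distinct, numbered error states.

    Returns the modified transition map and the reachability vector, where
    reachable[k] is True iff error state k has an incoming transition.
    Unreachable error states are numbers that no transition leads to.
    """
    rng = random.Random(seed)
    error_keys = sorted(key for key, (_, target) in transitions.items()
                        if target == SINK)
    chosen = rng.sample(error_keys, num_reachable)
    total = num_reachable + num_unreachable
    numbers = list(range(total))
    rng.shuffle(numbers)  # interleave reachable and unreachable error numbers
    result = dict(transitions)
    reachable = [False] * total
    for key, number in zip(chosen, numbers):
        output, _ = transitions[key]
        result[key] = (output, f"error_{number}")
        reachable[number] = True
    return result, reachable
```

Error transitions that are not chosen keep pointing to the sink state, which corresponds to the unsupported inputs that are rendered as “invalid input” in the final program.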
4.4 Obfuscation and Code Generation
The obfuscation and code generation steps are reused from the existing RERS
benchmark generator of the sequential tracks. As described in Fig. 4 and in
Section 11 and Section 12 of [22], the translation from the initial MM to the ﬁnal
code is divided into smaller steps, which are implemented as individual modules.
As shown in the right-hand side of Fig. 4, the partial MM is ﬁrst expanded
as done before. Additional states which are clones of existing states are added
such that they are unreachable. Next, a discrimination tree is constructed using
diﬀerent kinds of variables as properties on the nodes and constraints on these
variables on the outgoing edges of the decision tree. Based on the choice of
these variables, the current complexity of the synthetic RERS benchmarks is
controllable. It may range from plain encodings using only integer variables to
encode the subtrees, to options with string variables, arithmetic operations and
array variables in the same fashion as it was done in the past for RERS.
Next, the automaton is randomly mapped to the leaf nodes of the decision
tree. The constraints collected along a path from the decision tree root to a
leaf are used to encode a state of the automaton associated with that leaf. The
automaton transitions are encoded based on the decision tree encoding. The
now completely encoded problem is translated into the target language. While
ASML normally generates C++ code from its automaton models, we decided
to maintain the old RERS tradition of providing a Java and a C encoding for
each problem. The underlying MM is maintained during this obfuscation and
encoding step as it has been in the previous editions of RERS.
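The state encoding described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the actual RERS implementation: the tree is a full binary tree over integer variables with the constraints `var == 0` and `var == 1` on its edges, and all names (`build_leaves`, `encode_states`, `guard`, `render_transition`) as well as the emitted pseudo-C text are hypothetical.

```python
import random


def build_leaves(variables, depth=0, path=()):
    """Enumerate the leaves of a full binary decision tree.
    Each inner node tests one variable; its two outgoing edges carry the
    constraints `var == 0` and `var == 1`. A leaf is represented by the
    tuple of constraints collected on the path from the root."""
    if depth == len(variables):
        return [path]
    var = variables[depth]
    leaves = []
    for value in (0, 1):
        leaves += build_leaves(variables, depth + 1, path + ((var, value),))
    return leaves


def encode_states(states, variables, seed=0):
    """Randomly map automaton states to distinct leaves of the tree."""
    leaves = build_leaves(variables)
    if len(states) > len(leaves):
        raise ValueError("not enough leaves for all states")
    rng = random.Random(seed)
    return dict(zip(states, rng.sample(leaves, len(states))))


def guard(encoding, state):
    """Render the path constraints of a state as a C/Java-style condition."""
    return " && ".join(f"{var} == {value}" for var, value in encoding[state])


def render_transition(encoding, source, symbol, output, target):
    """Render one transition as a guarded block: the guard checks the input
    and the source-state encoding, the body writes the target-state encoding."""
    updates = " ".join(f"{var} = {value};" for var, value in encoding[target])
    return (f"if (input == {symbol} && {guard(encoding, source)}) "
            f'{{ {updates} return "{output}"; }}')


encoding = encode_states(["s0", "s1", "s2"], ["a1", "a2"])
line = render_transition(encoding, "s0", 3, "X", "s1")
```

Leaves that receive no state remain unused in this sketch. Richer variable kinds such as strings or arrays would replace the simple equality constraints on the edges, which is where the difficulty of the generated code could be tuned.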
5 Conclusion and Outlook
With the addition of industrial tracks where benchmarks are based on real-world
models, RERS 2019 combined the strength of automated synthesis with the relevance
of actively used software. Due to these new tracks based on a collaboration
with the company ASML, the variety of different tasks that participants of
RERS can address has again expanded. Independently of this new addition, we
further improved our generation infrastructure and realized a fully-automated
synthesis of parallel programs that feature intricate dependencies between their
components.
In the future, we intend to fully integrate the approach presented in [15] into
our infrastructure in order to guarantee formal hardness properties also for vio-
lated formulae. Future work might include equivalence-checking tasks between
a model and its implementation, for example based on the systems provided by
ASML. Furthermore, we intend to provide benchmark problems for weak bisim-
ulation checking [24] for the RERS 2020 challenge. As a longer-term goal, we
continue our work towards an open-source generator infrastructure that allows
tool developers to generate their own benchmarks.
Acknowledgments. This work was partially performed under the auspices of the U.S.
Department of Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344, and was supported by the LLNL-LDRD Program under Project
No. 17-ERD-023. IM Release Nr. LLNL-CONF-766478.
References
1. Aslam, K., Luo, Y., Schiffelers, R.R.H., van den Brand, M.: Interface protocol
inference to aid understanding legacy software components. In: Proceedings of
MODELS 2018, co-located with ACM/IEEE 21st International Conference on
Model Driven Engineering Languages and Systems (MODELS 2018), Copenhagen,
Denmark, pp. 6–11 (2018)
2. Bartocci, E., et al.: TOOLympics 2019: an overview of competitions in formal
methods. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) TACAS 2019.
LNCS, vol. 11429, pp. xx–yy. Springer, Cham (2019)
3. Beyer, D.: Competition on software verification (SV-COMP). In: Flanagan, C.,
König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 504–524. Springer, Heidelberg
(2012). https://doi.org/10.1007/978-3-642-28756-5_38
4. Beyer, D.: Software verification and verifiable witnesses. In: Baier, C., Tinelli, C.
(eds.) TACAS 2015. LNCS, vol. 9035, pp. 401–416. Springer, Heidelberg (2015).
https://doi.org/10.1007/978-3-662-46681-0_31
5. Blom, S., van de Pol, J., Weber, M.: LTSmin: distributed and symbolic reachability.
In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 354–359.
Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6_31
6. Clarke, E.M., Emerson, E.A.: Design and synthesis of synchronization skeletons
using branching time temporal logic. In: Kozen, D. (ed.) Logic of Programs 1981.
LNCS, vol. 131, pp. 52–71. Springer, Heidelberg (1982). https://doi.org/10.1007/
BFb0025774
7. Dwyer, M.B., Avrunin, G.S., Corbett, J.C.: Patterns in property specifications for
finite-state verification. In: Proceedings of the 1999 International Conference on
Software Engineering (IEEE Cat. No. 99CB37002), pp. 411–420, May 1999
8. Garavel, H.: Nested-unit Petri nets. J. Log. Algebraic Methods Program. 104,
60–85 (2019)
9. Geske, M., Jasper, M., Steffen, B., Howar, F., Schordan, M., van de Pol, J.:
RERS 2016: parallel and sequential benchmarks with focus on LTL verification. In:
Margaria, T., Steffen, B. (eds.) ISoLA 2016. LNCS, vol. 9953, pp. 787–803.
Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47169-3_59
10. Heizmann, M., et al.: Ultimate Automizer with SMTInterpol. In: Piterman,
N., Smolka, S.A. (eds.) TACAS 2013. LNCS, vol. 7795, pp. 641–643. Springer,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-36742-7_53
11. Holzmann, G.: The SPIN Model Checker: Primer and Reference Manual, 1st edn.
Addison-Wesley Professional, Boston (2011)
12. Jasper, M., et al.: The RERS 2017 challenge and workshop (invited paper). In:
Proceedings of the 24th ACM SIGSOFT International SPIN Symposium on Model
Checking of Software, SPIN 2017, pp. 11–20. ACM (2017)
13. Jasper, M., Mues, M., Schlüter, M., Steffen, B., Howar, F.: RERS 2018: CTL,
LTL, and reachability. In: Margaria, T., Steffen, B. (eds.) ISoLA 2018. LNCS, vol.
11245, pp. 433–447. Springer, Cham (2018).
https://doi.org/10.1007/978-3-030-03421-4_27
14. Jasper, M., Schordan, M.: Multi-core model checking of large-scale reactive systems
using different state representations. In: Margaria, T., Steffen, B. (eds.) ISoLA
2016. LNCS, vol. 9952, pp. 212–226. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-47166-2_15
15. Jasper, M., Steffen, B.: Synthesizing subtle bugs with known witnesses. In:
Margaria, T., Steffen, B. (eds.) ISoLA 2018. LNCS, vol. 11245, pp. 235–257.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03421-4_16
16. Kordon, F., et al.: Report on the model checking contest at Petri nets 2011. In:
Jensen, K., van der Aalst, W.M., Ajmone Marsan, M., Franceschinis, G., Kleijn,
J., Kristensen, L.M. (eds.) Transactions on Petri Nets and Other Models of Concurrency
VI. LNCS, vol. 7400, pp. 169–196. Springer, Heidelberg (2012).
https://doi.org/10.1007/978-3-642-35179-2_8
17. Larsen, K.G.: Modal specifications. In: Sifakis, J. (ed.) CAV 1989. LNCS, vol. 407,
pp. 232–246. Springer, Heidelberg (1990).
https://doi.org/10.1007/3-540-52148-8_19
18. Meijer, J., van de Pol, J.: Sound black-box checking in the LearnLib. In: Dutle,
A., Muñoz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol. 10811, pp. 349–366.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77935-5_24
19. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice Hall PTR,
Upper Saddle River (1981)
20. Pnueli, A.: The temporal logic of programs. In: 18th Annual Symposium on Foun-
dations of Computer Science (SFCS 1977), pp. 46–57, October 1977
21. Steffen, B., Jasper, M., Meijer, J., van de Pol, J.: Property-preserving generation
of tailored benchmark Petri nets. In: 17th International Conference on Application
of Concurrency to System Design (ACSD), pp. 1–8, June 2017
22. Steffen, B., Isberner, M., Naujokat, S., Margaria, T., Geske, M.: Property-driven
benchmark generation: synthesizing programs of realistic structure. STTT 16(5),
465–479 (2014)
23. Steffen, B., Jasper, M.: Property-preserving parallel decomposition. In: Aceto, L.,
Bacci, G., Bacci, G., Ingólfsdóttir, A., Legay, A., Mardare, R. (eds.) Models, Algorithms,
Logics and Tools. LNCS, vol. 10460, pp. 125–145. Springer, Cham (2017).
https://doi.org/10.1007/978-3-319-63121-9_7
24. Steffen, B., Jasper, M.: Generating hard benchmark problems for weak bisimulation.
LNCS. Springer (2019, to appear)
25. Wonisch, D., Wehrheim, H.: Predicate analysis with block-abstraction memoiza-
tion. In: Aoki, T., Taguchi, K. (eds.) ICFEM 2012. LNCS, vol. 7635, pp. 332–347.
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34281-3_24
26. Yang, N., et al.: Improving model inference in industry by combining active and
passive learning. In: IEEE 26th International Conference on Software Analysis,
Evolution, and Reengineering (SANER) (2019, to appear)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.
