Abstract. The goal of System Level Formal Verification is to show system correctness notwithstanding uncontrollable events (disturbances), as for example faults, variations in system parameters, external inputs, etc. This may be achieved with an exhaustive Hardware In the Loop Simulation based approach, by considering all relevant scenarios in the System Under Verification (SUV) operational environment.
Introduction
A Cyber-Physical System (CPS) consists of interconnected hardware and software subsystems. As a result, the state of a CPS consists of continuous (e.g., stemming from analog devices) as well as discrete (e.g., stemming from software or digital devices) components. Accordingly, CPSs are typically modelled as Hybrid Systems (see, e.g., [2] and citations thereof). System Level Verification of CPSs has the goal of verifying that the whole (i.e., software + hardware) system meets the given specifications. Hardware In the Loop Simulation (HILS) is the main workhorse for system level verification and is supported by Model Based Design tools like Simulink (http://www.mathworks.com), Modelica (https://www.modelica.org) and VisSim (http://www.vissim.com). In HILS, the CPS software components read/send values from/to mathematical models (simulation) of the CPS physical subsystems (e.g., engines, analog circuits, etc.) they interact with. This allows designers to simulate the whole CPS behaviour on a given simulation scenario (i.e., a sequence of exogenous stimuli, such as faults, to the system).
Of course, in order to rule out the presence of design errors, one would like to consider all possible simulation scenarios, thereby aiming for System Level Formal Verification (SLFV). Since CPSs can be modelled as hybrid systems, one may think of using model checkers for hybrid systems in order to address SLFV for CPSs. Unfortunately, no model checker for hybrid systems can handle SLFV of actual CPSs. For this reason, currently HILS is basically the only approach used to carry out system level verification of CPSs.
Motivations
System Level Formal Verification (SLFV) is an exhaustive HILS, where all relevant simulation scenarios are considered. A methodology enabling exhaustive HILS has been presented in [3, 4, 5, 6]. It works as follows.
First, we note that the CPS to be verified (the System Under Verification (SUV)) can be regarded as a hybrid system whose inputs belong to a finite set of uncontrollable events (disturbances), which model failures in sensors or actuators, variations in the system parameters, etc.
Second, the SUV is a deterministic system (the typical case for control systems). Nondeterministic behaviours (such as faults) are modelled with disturbances.
Third, sequences of inputs to the SUV are of bounded length, thus the problem addressed is indeed bounded SLFV. Accordingly, a simulation scenario is a finite sequence of disturbances.
From the above, it follows that a system (namely, our SUV) is expected to withstand all disturbance sequences that may arise in its operational environment. Correctness of a system (defined in terms of safety properties) is thus defined with respect to such admissible disturbance sequences.
Given a high-level model defining the admissible disturbance sequences (disturbance model), the approach in [3, 4, 5, 6]:
1. Generates the entire set of admissible disturbance sequences from the disturbance model;
2. Evenly splits such a set into k > 0 slices in order to enable parallel verification;
3. Computes (in parallel) an optimised simulation campaign from each slice;
4. Executes (in parallel) the generated simulation campaigns on a set of k independent simulators (e.g., Simulink or Modelica instances).
A simulation campaign is a sequence of simulation instructions, which exploit the capabilities of modern simulators to save and restore previously stored simulation states (much as in explicit model checking). In particular, a simulation campaign consists of the following commands: save a simulation state, restore a saved simulation state, inject a disturbance, advance the simulation for a given time length.
As soon as one of the simulators (running the simulation campaign corresponding to a given slice) finds an error, the whole parallel simulation activity stops, and the disturbance trace which triggered the error is returned as a counterexample. Also, as the generated optimised simulation campaigns (one per slice) randomise the verification order of the traces in the input slice, at any time during the parallel simulation activity it is possible to compute an upper bound to the Omission Probability (OP), i.e., the probability that an error exists, but no error has been found so far, and to give a quite accurate estimate of the completion time.
Algorithms for all the activities described above have been presented in the cited papers. However, an off-the-shelf tool to effectively support companies working in the CPS business in their everyday SUV verification activities is not available. To provide such a tool is exactly the purpose of this paper.
Main Contributions
We present SyLVaaS (see Figure 1 ), a Web-based service taking as input a disturbance model and effectively computing the set of simulation campaigns to be used for a parallel HILS based SLFV. The main features of SyLVaaS can be summarised as follows.
Protection of the SUV Intellectual Property (IP)
HILS based SLFV takes as input three main artefacts: the SUV model, the definition of the property to be verified and the definition of the SUV operational environment (in terms of a disturbance model). In an industry setting, both the SUV model and the property to be verified are subject to IP protection, as they represent the main assets of the company designing the CPS (hence the SUV model). On the other hand, the definition of the operational environment does not require IP protection, as it only encodes the uncontrollable inputs (exogenous stimuli) to the SUV.
SyLVaaS introduces the new Verification as a Service (VaaS) paradigm, allowing verification engineers (SyLVaaS users) to use an external (Web) service (SyLVaaS) to compute the simulation campaigns needed for their HILS based SLFV activities, while fully protecting the IP of their models. In particular, SyLVaaS requires neither the SUV model nor the property to be verified, and takes as input only a disturbance model, defined as a CMurphi [7] model. Also, disturbances in the disturbance model are defined in a way fully decoupled from the SUV model, e.g., by means of integers. The actual verification activity is performed in parallel at the user premises (e.g., on a private cluster) running an arbitrarily large set of Simulink simulators, using the optimised simulation campaigns computed by SyLVaaS and plugging in a Simulink driver downloadable from the SyLVaaS Web site. Before using such a driver, the user must define a suitable correspondence between the opaque values defining disturbances in the disturbance model (e.g., integers) and actual assignments to parameters of the SUV. This further contributes to protecting the IP of the SUV model, as such correspondence is also kept private.
Protection of the Verification Flow IP
SyLVaaS also protects the user verification flow IP from outsiders. In fact, in case an error is found during the verification activity (at the user premises), a counterexample is generated. Such a counterexample can then be used to revise the SUV, thereby producing a new SUV model. At this point, a new SLFV activity can start. Given that the set of admissible operational scenarios (hence: the disturbance model) has not changed, there is no need to interact with SyLVaaS again, as the previously computed simulation campaigns can be reused.
Fast Response Time via Parallel Computation
The operational scenario generation algorithm in [3] is a sequential algorithm, taking about half an hour (on a desktop PC) to generate a few million simulation scenarios. Although this time is negligible with respect to the whole HILS based SLFV activity (which can take weeks of computation), it becomes a major bottleneck in a VaaS context, such as the one provided by SyLVaaS, since scenario generation is the most intensive part of the computation carried out on the SyLVaaS side (i.e., generation of optimised simulation campaigns for parallel HILS, see Figure 1, right).
In order to achieve a fast response time in SyLVaaS, in this paper we present a new parallel algorithm for the generation of operational scenarios from a disturbance model, and discuss its distributed multicore implementation explicitly designed to operate efficiently on a cluster of possibly heterogeneous machines.
Our new operational scenario generation algorithm consists of an Orchestrator process which governs the exploration of the (state space of the finite state automaton defined by the) disturbance model provided by the user, splitting and delegating the work to a battery of available Slaves, whose work load is dynamically balanced. Slave processes are independent from each other and communicate only with the Orchestrator. This minimises coordination overhead.
Experimental Evaluation
We present experimental results on using our parallel algorithm on case studies consisting of disturbance models for two SUVs (namely, the Inverted Pendulum on a Cart (IPC) and the Fuel Control System (FCS) in the Simulink distribution) entailing a number of operational scenarios up to 35 641 501.
Our results show that our new parallel algorithm for operational scenario generation scales well with the number of Slaves. As the operational scenario generation is the most computationally intensive task in the SyLVaaS workflow, and given that the other steps performed by SyLVaaS (computation of optimised simulation campaigns) already exploit an embarrassingly parallel algorithm (i.e., an algorithm with no communication among the processes) from [4], with our new parallel disturbance trace generator the entire SyLVaaS workflow can benefit from a cluster of machines at the SyLVaaS cloud infrastructure.
Background
In this section we give some background notions. Unless otherwise stated, all definitions are based on [8] and [3, 4, 5, 6 ] to which we refer the reader for more in-depth details.
In the following, we denote with R, R ≥0 , R + and N + the sets of, respectively, all real, non-negative real, strictly positive real, and strictly positive natural numbers, and with Bool = {0, 1} the set of Boolean values (where 0 means 'false' and 1 means 'true').
Modelling the SUV
A System Under Verification (SUV) is modelled as a Discrete Event System (DES), namely a continuous time Input-State-Output deterministic dynamical system [8] whose inputs are discrete event sequences. A discrete event sequence is a function u(t) associating to each (continuous) time instant t ∈ R + a disturbance event (or, simply, disturbance). Disturbances, encoded by integers in the interval [0, d] (for a given d ∈ N + ), represent uncontrollable events (e.g., faults). We use event 0 to represent the event carrying no disturbance. As no system can withstand an infinite number of disturbances within a finite time, we require that, in any time interval of finite length, a discrete event sequence u(t) differs from 0 only in a finite number of time points (Figure 2a ).
Modelling the Property to be Verified
The property to be verified is modelled as a continuous time monitor embedded in the SUV (see Figure 2b), which observes the state of the system and checks whether the property under verification is satisfied. The output of the monitor (see Figure 2c) is 0 as long as the property under verification is satisfied and becomes and stays 1 (sustain) as soon as the property fails, thus ensuring that we never miss a property failure report, even when sampling the monitor output only at discrete time points. The use of monitors gives us a flexible approach to model the property to be verified. In particular, it is easy to model bounded safety and bounded liveness properties as monitors.
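The sustain behaviour of the monitor can be illustrated with a minimal discrete-time sketch. The class name `SustainMonitor`, the predicate, and the sampled values are all hypothetical and only stand in for a continuous time monitor embedded in a Simulink model.

```python
class SustainMonitor:
    """Illustrative monitor: output is 0 until the property fails, then stays 1."""

    def __init__(self, prop):
        self.prop = prop    # predicate on the observed state (assumed, not from the paper)
        self.output = 0     # 0 means the property has held so far

    def observe(self, state):
        if not self.prop(state):
            self.output = 1  # sustain: once raised, the output never returns to 0
        return self.output


# Illustrative bounded safety property: the observed value must stay below 10.
monitor = SustainMonitor(lambda x: x < 10)
outputs = [monitor.observe(x) for x in [1, 5, 12, 3, 2]]
print(outputs)
```

Because the output latches, reading it only at the end of the run is enough to detect the violation that occurred at the third sample.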
Modelling the SUV Operational Environment
System level verification follows an assume-guarantee approach aimed at showing that the SUV meets its specification (guarantee) as long as the SUV operational environment behaves as expected (assume).
As we focus on bounded system level verification, we model (Definition 2.2) the SUV operational environment as the sequence of disturbances our SUV is expected to withstand within a finite time horizon. We also bound the time quantum between two consecutive disturbances.
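As it is typically infeasible to define the SUV operational environment by explicitly listing all the admissible disturbance traces, we define it by means of a disturbance model, which is in turn defined as the language accepted by a suitable automaton, called Disturbance Generator (DG) (see Definition 2.1 and Figure 3a).
Definition 2.1 (Disturbance Generator). A DG D consists of the following components:
• Z is a finite set of states;
• Z I ⊆ Z and Z F ⊆ Z are the sets of, respectively, initial and final states;
• d ∈ N + defines the set of disturbance events, represented (without loss of generality) with integers in [0, d], where value 0 represents the event carrying no disturbance;
• adm(z, e), for z ∈ Z and e ∈ [0, d], is true if and only if disturbance e is admissible in state z;
• dist(z, e) is the state reached by applying disturbance e to state z.
Definition 2.2 (Disturbance trace). Let h ∈ N + be a horizon. A disturbance trace of length h for D is a sequence of h disturbances accepted by D. Formally, it is a sequence $\delta = (d_0, \ldots, d_{h-1})$ of disturbances in [0, d] that labels a path of D from an initial state to a final state, where each disturbance is admissible in the state to which it is applied.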
We denote with δ(j) the j-th disturbance occurring in trace δ (0 ≤ j < h).
Given τ ∈ R + (time quantum), to a disturbance trace δ for D we can uniquely associate a discrete event sequence $u^{\tau}_{\delta}$, defined as follows:
$$u^{\tau}_{\delta}(t) = \begin{cases} \delta(j) & \text{if } t = \tau j \text{ for some } j \in [0, h-1], \\ 0 & \text{otherwise.} \end{cases}$$
Thus a disturbance trace δ defines an operational scenario (namely, $u^{\tau}_{\delta}$) for our SUV. Figure 3d shows the discrete event sequence associated to a disturbance trace. We represent our SUV operational environment as a finite set ∆ of disturbance traces: the set $\{u^{\tau}_{\delta} \mid \delta \in \Delta\}$ (for a given τ ∈ R + ) defines the operational scenarios our SUV should withstand. Note that, by taking h large enough (as in Bounded Model Checking (BMC)) and τ small enough (to faithfully model our SUV operational scenarios), we can achieve any desired precision. On such considerations rests the effectiveness of the approach.
System Level Formal Verification
Definition 2.3 formalises our bounded System Level Formal Verification problem.
Definition 2.3 (SLFV problem). An SLFV problem is a tuple (H, D, τ, h), where: H is a DES with an embedded monitor modelling our SUV, D is a DG modelling a set of disturbance traces ∆ over horizon h ∈ N + , and τ ∈ R + is a time quantum.
The answer to an SLFV problem is FAIL if there exists a disturbance trace δ in ∆ such that the SUV monitor output at time τ h is 1, when H is given $u^{\tau}_{\delta}$ (the discrete event sequence associated to δ given time quantum τ ) as input, and PASS otherwise. In case of FAIL, the disturbance trace raising the error is returned as a counterexample.
Note that, notwithstanding the fact that the number of states of our SUV is infinite and we are in a continuous time setting, to answer a SLFV problem we only need to check a finite number of disturbance traces. This is because we are bounding: (a) our time horizon to T = τ h, and (b) the set of time points at which disturbances can take place, by taking τ as the time quantum among disturbance events.
Parallel HILS Based Anytime Random Exhaustive SLFV
We follow a black-box parallel approach to SLFV, where the DES H defining our SUV (plus the property to be verified) is defined using the modelling language of a suitable simulator (namely, MatLab and Stateflow for Simulink). We compute the answer to a SLFV problem (H, D, τ, h) by simulating each disturbance trace δ in the operational environment ∆, thus performing an exhaustive (with respect to ∆) Hardware In the Loop Simulation (HILS).
In order to enable parallel simulation over k ∈ N + machines available in the (private) user cluster, we evenly partition the sequence of disturbance traces ∆ into k ∈ N + sequences of disturbance traces ∆ 0 , . . . , ∆ k−1 . We then use such k slices to compute, in parallel on the SyLVaaS cluster, k highly optimised simulation campaigns, which can be executed in parallel using k independent simulators, each one running (on a different core of the user cluster) a model for H. The answer to the SLFV problem is FAIL if one of the simulation campaigns raises the simulator output function to 1 (in this case the disturbance trace δ which raised the error is returned as a counterexample). The answer is PASS otherwise.
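As a rough illustration of the even partitioning step, the following sketch splits an ordered sequence of traces into k contiguous slices. The function name `split_evenly` is hypothetical, and letting slice sizes differ by at most one when the number of traces is not a multiple of k is an assumption of this sketch, not a statement about the cited algorithms.

```python
def split_evenly(traces, k):
    """Split an ordered list of traces into k contiguous slices of near-equal size."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    base, extra = divmod(len(traces), k)
    slices, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` slices get one more trace
        slices.append(traces[start:start + size])
        start += size
    return slices


# Ten placeholder traces (integers stand in for disturbance traces) over k = 3 slices.
slices = split_evenly(list(range(10)), 3)
print([len(s) for s in slices])
```

Keeping each slice contiguous preserves the order in which the traces were generated, so every trace falls in exactly one slice.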
Each simulator accepts four basic commands: store, load, free, run. Command store(l) stores in memory the current state of the simulator and labels with l such a state. Command load(l) loads into the simulator the stored state labelled with l. Command free(l) removes from the memory the state labelled with l. Command run(e, t) (with e ∈ [0, d] and t ∈ R + ) injects disturbance e and then advances the simulation of time t. A simulation campaign is thus a sequence of simulator commands.
Using commands store and load we can avoid revisiting simulation states (much as in explicit model checking). Using command free we can remove from the memory states that will never be needed in the remaining part of the simulation campaign. This is important, since each state may require many KB of memory (150-300 KB in the case studies presented in this paper).
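The effect of store, load and free can be seen on a toy example. In the sketch below, `ToySimulator` is a hypothetical stand-in for a real simulator (its state is just the elapsed time plus the disturbances injected so far), and the campaign is hand-written for two traces sharing a prefix; it is not the output of the optimiser in [4].

```python
class ToySimulator:
    """Hypothetical simulator whose state is (time, disturbances injected so far)."""

    def __init__(self):
        self.state = (0.0, ())
        self.memory = {}   # label -> stored state
        self.runs = 0      # number of run commands executed

    def store(self, label):
        self.memory[label] = self.state

    def load(self, label):
        self.state = self.memory[label]

    def free(self, label):
        del self.memory[label]

    def run(self, e, t):
        time, history = self.state
        self.state = (time + t, history + (e,))  # inject e, then advance by t
        self.runs += 1


def execute(sim, campaign):
    """Execute a simulation campaign given as a list of (command, *arguments)."""
    for cmd, *args in campaign:
        getattr(sim, cmd)(*args)


# Traces (1, 0, 0) and (1, 0, 2) share the prefix (1, 0), which is simulated only once.
campaign = [
    ("run", 1, 1.0),
    ("run", 0, 1.0),
    ("store", 1),      # save the state reached after prefix (1, 0)
    ("run", 0, 1.0),   # completes trace (1, 0, 0)
    ("load", 1),       # go back to the state after prefix (1, 0)
    ("free", 1),       # the stored state is not needed any more
    ("run", 2, 1.0),   # completes trace (1, 0, 2)
]
sim = ToySimulator()
execute(sim, campaign)
print(sim.runs, sim.state, sim.memory)
```

Simulating the two traces independently would take six run commands; reusing the stored prefix state takes four, and the memory is empty again at the end of the campaign.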
Also, as each computed simulation campaign verifies the disturbance traces in the input slice in a random order, it is possible to compute at any time during the simulation process (along the lines of [6]) an estimate of the simulation completion time and an upper bound to the Omission Probability (OP), i.e., the probability that there is a yet-to-be-simulated disturbance trace which violates the property under verification. This information enables the verification engineer to evaluate whether it is worth continuing the simulation activity, or whether to stop it because the degree of assurance attained can be considered adequate for the application at hand (graceful degradation).
System Level Formal Verification as a Service
In this section we describe SyLVaaS in terms of input and output, and describe how to use the system output.
Input
SyLVaaS requires two inputs:
1. An integer k > 0 describing the number of computational cores available on the user side for parallel execution of simulation campaigns (hence, for parallel verification);
2. A disturbance model defining the operational environment, i.e., the set of disturbance traces the System Under Verification (SUV) should withstand, along with a bounded horizon h.
As it is typically infeasible for a verification engineer to define a SUV operational environment by explicitly listing all its disturbance traces, SyLVaaS takes as input a disturbance model defining a Disturbance Generator (DG) written in the high-level language accepted by the CMurphi [7] model checker. The following example clarifies this point.
Example 3.1. Assume that a SyLVaaS user wants to verify a SUV with two sensors, A and B, which may fail (without repair) at times multiple of 1 second. Fault of any sensor might occur only if the other one did not fail, or failed more than 2 seconds before. The CMurphi description for the DG modelling such operational environment is shown in Figure 4 , where the verification time horizon is 7 seconds.
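To make the example concrete, the following sketch enumerates the disturbance traces of one possible reading of Example 3.1. It is an illustrative Python rendering, not the CMurphi model of Figure 4. The encoding is an assumption of this sketch: disturbance 0 means no fault, 1 means sensor A fails, 2 means sensor B fails; one disturbance is injected per second; the state records how many seconds ago each sensor failed; every state is treated as final.

```python
H = 7                    # horizon: number of 1-second steps
INITIAL = (None, None)   # (seconds since A failed, seconds since B failed); None = not failed


def adm(z, e):
    """True if disturbance e is admissible in state z (assumed encoding)."""
    a, b = z
    if e == 0:
        return True
    if e == 1:   # A may fail only once, and only if B is healthy or failed more than 2 s ago
        return a is None and (b is None or b > 2)
    return b is None and (a is None or a > 2)


def dist(z, e):
    """State reached one second after applying disturbance e in state z."""
    def tick(c):
        return None if c is None else min(c + 1, 3)  # cap the clock to keep Z finite
    a, b = tick(z[0]), tick(z[1])
    if e == 1:
        a = 1
    elif e == 2:
        b = 1
    return (a, b)


def traces(z=INITIAL, prefix=()):
    """Depth-first enumeration of all admissible traces of length H."""
    if len(prefix) == H:
        yield prefix
        return
    for e in (0, 1, 2):
        if adm(z, e):
            yield from traces(dist(z, e), prefix + (e,))


all_traces = list(traces())
print(len(all_traces))
```

Under this encoding the enumeration yields 35 traces: one without faults, 7 where only A fails, 7 where only B fails, and 20 where both fail at least 3 seconds apart. Since disturbances are tried in increasing order, the traces come out in lexicographic order.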
Output
From the value of k and the input disturbance model, SyLVaaS produces k simulation campaigns, which can be executed in parallel on the user premises over k independent simulators, in an embarrassingly parallel fashion (i.e., with no communication among processes).
Each simulation campaign verifies, in a highly optimised way, a disjoint and equally-sized portion of the disturbance traces entailed by the input disturbance model. Conversely, all disturbance traces entailed by the disturbance model are covered by exactly one simulation campaign. This guarantees that the System Level Formal Verification (SLFV) process is both exhaustive (with respect to the set of disturbance traces entailed by the disturbance model) and non-redundant. Also, the verification order of the disturbance traces covered by each simulation campaign is randomised. This, according to [6], enables the computation of an upper bound to the Omission Probability (OP) at any time during the parallel simulation.
The k simulation campaigns are returned to the user via the Web interface, together with an abstract Simulink driver. Such a driver is a MatLab script that reads and executes a SyLVaaS-generated simulation campaign, by sending simulation commands to Simulink. It is "abstract" as it must be plugged into the SUV Simulink model and configured at the user premises (see Figure 1 and Section 3.4).
Web Interface
The Web interface of SyLVaaS is hosted at http://mclab.di.uniroma1.it/sylvaas. It consists of four main pages:
1. A standard login page.
2. A user console page (accessible after login, see Figure 5) showing all current, pending, running and completed user jobs. For each job, the console shows the job unique id, the corresponding input and its status (pending, running, completed or deleted). By selecting a job id, it is possible to see and download the corresponding input. If the job is completed, it is also possible to download the final k simulation campaigns.
3. A page to create a new job request, where the user must fill in a form with the required input: the disturbance model, the horizon for the disturbance model and the number of computational cores available on the user side for parallel execution of simulation campaigns (see Section 3.1). When a job is completed, the user is notified by email and can then proceed to download the simulation campaigns.
How to Use SyLVaaS Output
Given the output downloaded from SyLVaaS, the verification engineer, in order to actually verify the SUV via exhaustive Hardware In the Loop Simulation (HILS), customises and plugs the abstract Simulink driver into the SUV Simulink model. This task only requires properly filling in the template files received from SyLVaaS as part of the abstract driver. Such files define: the SUV model, the SUV property to be verified (as a monitor module), the interface between the driver and the SUV, and the mapping between each disturbance (in the CMurphi disturbance model) and its counterpart in the SUV model.
At this point, the k downloaded simulation campaigns can be executed in parallel on k independent simulators. Given the randomisation of the verification order of the disturbance traces within each simulation campaign, at any time during the simulation process, when ratios $\mathit{done}_1, \mathit{done}_2, \ldots, \mathit{done}_k$ (with $\mathit{done}_i \in [0, 1]$ for all i) of the traces covered by each simulation campaign have been verified successfully (i.e., no error has been raised so far), the Omission Probability (OP), i.e., the probability that a future simulation command raises an error, is upper bounded by $1 - \min_{i \in [1,k]} \mathit{done}_i$ (see [6]).
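The bound above is straightforward to evaluate while the campaigns are running. A minimal sketch follows; the function name and the sample progress ratios are illustrative.

```python
def omission_probability_bound(done):
    """Upper bound to the OP given the completion ratio of each of the k campaigns."""
    if not done or any(not 0.0 <= r <= 1.0 for r in done):
        raise ValueError("done must be a non-empty sequence of ratios in [0, 1]")
    return 1.0 - min(done)


# Three campaigns that have verified 50%, 80% and 65% of their traces without errors.
bound = omission_probability_bound([0.5, 0.8, 0.65])
print(bound)
```

The bound is driven by the slowest campaign, so it reaches 0 only when every campaign has completed.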
Parallel Generation of Disturbance Traces
As reported in [3] , the most computationally intensive step of the workflow for the computation of simulation campaigns is disturbance trace generation starting from the user disturbance model. This task is performed in [3] using a modified version of the CMurphi model checker. As reported there, on a disturbance model entailing about 4 million traces (the same referred to as D 1 FCS in Section 5), trace generation takes about 30 minutes, while the subsequent step (i.e., computation of simulation campaigns) takes about 1 minute, as it can be massively parallelised [5] . The time to generate disturbance traces is anyway negligible if we consider also the time to carry out (in parallel) the actual simulation, which may take days.
However, in a Verification as a Service (VaaS) context such as that of SyLVaaS, the simulation campaigns are actually executed at the user premises, so disturbance trace generation from the user disturbance model would become the dominant step, in terms of time, of the SyLVaaS workflow.
Hence, to achieve a fast response time in SyLVaaS, here we present a new parallel algorithm for distributed trace generation. With this new algorithm, the whole SyLVaaS workflow (i.e., generation of disturbance traces and computation of optimised simulation campaigns) can now benefit from the availability of a cluster in the SyLVaaS cloud infrastructure (see Figure 6).
Algorithm Overview
Our new parallel algorithm for trace generation has been explicitly designed to operate efficiently on a cluster of possibly heterogeneous machines, and consists of a single Orchestrator process and a number S ∈ N + of Slaves. The Orchestrator governs the exploration of the state space of the Disturbance Generator (DG) defined by the disturbance model provided by the user, splitting and delegating the work to the Slaves. To avoid communication as well as data structures shared among the Slaves, the DG state space is regarded as a set of trees, one for each DG initial state. This does not pose any termination problem, as we are looking for disturbance traces of bounded length h.
The Orchestrator performs a Depth-First Search (DFS) up to a bounded level (depth) L < h and delegates the exploration of the subtree rooted at each node at depth L to an idle Slave (see Figure 7). The exploration of each subtree by a Slave s ∈ [1, S] is again carried out by DFS, and is called a computation bunch. Each computation bunch b executed by Slave s gives as output a sequence of traces which is appended to the sequence of traces ∆ s generated by s. Sequence ∆ s contains a subset of the disturbance traces in ∆. The simplicity of the algorithm minimises network communication and coordination among processes. In particular, Slave processes are independent from each other and communicate only with the Orchestrator.
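The division of work can be mimicked with a sequential sketch. Everything below is illustrative: the toy DG (at most one non-zero disturbance per trace, a single initial state 0, all states final) is invented for the example, L is kept fixed, and a round-robin choice stands in for "the first idle Slave". It is not Algorithm 1 or Algorithm 2.

```python
H, D = 4, 2   # toy horizon and largest disturbance id


def adm(z, e):
    """Toy DG: state z counts whether a non-zero disturbance already occurred."""
    return e == 0 or z == 0


def dist(z, e):
    return 1 if e != 0 else z


def explore(z, prefix, depth):
    """DFS from state z, extending prefix up to the given length (lexicographic order)."""
    if len(prefix) == depth:
        return [(z, prefix)]
    found = []
    for e in range(D + 1):
        if adm(z, e):
            found += explore(dist(z, e), prefix + (e,), depth)
    return found


def orchestrate(L, n_slaves):
    """Orchestrator explores up to depth L; each depth-L node becomes a computation bunch."""
    outputs = [[] for _ in range(n_slaves)]             # Delta_1, ..., Delta_S
    for b, (z, prefix) in enumerate(explore(0, (), L)):  # b is the bunch id
        s = b % n_slaves                                 # assumption: round-robin, not idleness
        outputs[s] += [(b, trace) for _, trace in explore(z, prefix, H)]
    return outputs


outputs = orchestrate(L=2, n_slaves=3)
collected = [trace for out in outputs for _, trace in out]
print(len(collected))
```

In this toy run the Slave outputs are pairwise disjoint and together contain all 9 traces of the toy DG, each annotated with the id of the bunch that produced it.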
Distributed Trace Labelling
Both the Orchestrator and the Slaves work in DFS mode, and hence each computation bunch produces a sequence of disturbance traces in lexicographic order. Each disturbance trace prefix identifies a simulator state, and we associate a unique label to all prefixes of disturbance traces (Definition 4.1). 
As a consequence of Definition 4.1, prefixes of disturbance sequences (d 0 , . . . , d p−1 ) common to multiple disturbance traces are followed by the same label l p = λ(d 0 , . . . , d p−1 ). Labels identifying prefixes common to multiple disturbance traces are essential in the efficient computation of highly optimised simulation campaigns, as they represent the only simulator states which might be worth storing, since they may be needed later (for more details, see the optimiser in [4]). Note that, given that both the Orchestrator and the Slaves run in DFS mode, disturbance traces can be labelled at no additional computational cost during generation. In particular, the Orchestrator labels trace prefixes up to level L, while Slaves label trace prefixes longer than L.
Our parallel algorithm uses the following labelling schema, which results in an overall injective map λ for disturbance prefix labels while avoiding communication among the processes. Let S ∈ N + be the number of available Slaves. We set Λ = N + . The Orchestrator associates, to each new disturbance prefix, a label extracted from the set {l | l ∈ N + , l = j(S + 1) + 1, j ≥ 0}, according to the natural order of the labels. Analogously, each Slave s ∈ [1, S] associates, to each new disturbance prefix, a label extracted from the set {l | l ∈ N + , l = j(S + 1) + s + 1, j ≥ 0}. So, for example, if S = 2, the Orchestrator uses labels from set {1, 4, 7, . . .}, Slave 1 uses labels from {2, 5, 8, . . .}, and Slave 2 uses labels from {3, 6, 9, . . .}. Note that, as these sets of labels are disjoint, the resulting overall map is injective.
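The labelling schema can be written as one generator per process. In this sketch the function name `label_stream` and the convention that process 0 denotes the Orchestrator (with 1..S denoting the Slaves) are assumptions made for illustration.

```python
from itertools import count, islice


def label_stream(process, n_slaves):
    """Labels j*(S+1) + process + 1 for j = 0, 1, 2, ...; process 0 is the Orchestrator."""
    if not 0 <= process <= n_slaves:
        raise ValueError("process must be in [0, S]")
    return (j * (n_slaves + 1) + process + 1 for j in count())


# With S = 2, show the first three labels of the Orchestrator and of each Slave.
first = {p: list(islice(label_stream(p, 2), 3)) for p in range(3)}
print(first)
```

Each process draws labels from its own residue class modulo S + 1, so no two processes can ever produce the same label and no coordination is needed.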
Orchestrator
The Orchestrator process, whose pseudocode is shown as Algorithm 1, governs the exploration of the DG state space, by performing a DFS up to a bounded depth (level) 1 ≤ L ≤ h − 1 (whose initial value is given as a parameter), also assigning unique labels (see variable λ) to disturbance trace prefixes. When level L is reached, the Orchestrator delegates the exploration of the subtree rooted at the current state to an idle Slave, forwarding to it the (labelled) prefix (containing exactly L disturbances) of the disturbance trace computed so far. Each such delegated task (computation bunch) is assigned a sequential id (see variable b). As the exploration is done by the Orchestrator using DFS, the disturbance sequences passed to the Slaves are generated in lexicographic order.
In order to keep the whole parallel process highly efficient, the value of L is dynamically and adaptively adjusted by the Orchestrator during exploration, depending on how long the Orchestrator waits to find an idle Slave. Let w be the time the Orchestrator had to wait, in the last iteration, before finding an idle Slave, and let t be the overall time spent by the Orchestrator in the last iteration. If w/t > maxW (the value of maxW is given as a parameter), the Orchestrator increases the value of L by one. This means that, from now on, the Orchestrator will perform DFS one level deeper and will delegate to the Slaves smaller computation bunches (i.e., the exploration of smaller subtrees), as it has evidence that Slaves are overloaded. Conversely, if w/t < minW (the value of minW is given as a parameter), the Orchestrator decreases the value of L by one, hence starts delegating to the Slaves larger computation bunches (i.e., the exploration of larger subtrees), as it has evidence that Slaves are, on average, underloaded.
Together with the fact that the faster Slaves will, on average, execute a higher number of computation bunches than slower Slaves, the above described dynamic and adaptive adjustment of value L provides a simple yet very effective load balancing mechanism among the Orchestrator and the Slaves, which avoids any communication overhead: the communication among the processes is minimal and consists only of the set of one-way messages that the Orchestrator sends to the Slaves to delegate computation bunches to them.
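The adjustment rule for L can be sketched as a small function. The function and parameter names are hypothetical, t is assumed to be strictly positive, and the sketch clamps L to [1, h − 1]. It deliberately leaves out the additional condition under which Algorithm 1 actually applies a decrement, so it should be read as a simplification.

```python
def adjust_level(L, w, t, min_w, max_w, h):
    """Return the new delegation depth given waiting time w and iteration time t > 0."""
    ratio = w / t
    if ratio > max_w:
        return min(L + 1, h - 1)   # long waits: go deeper, delegate smaller bunches
    if ratio < min_w:
        return max(L - 1, 1)       # short waits: stay shallower, delegate larger bunches
    return L


# Illustrative thresholds minW = 0.1 and maxW = 0.3 with horizon h = 7.
print(adjust_level(3, 0.5, 1.0, 0.1, 0.3, 7))
print(adjust_level(3, 0.01, 1.0, 0.1, 0.3, 7))
```

The two calls move L from 3 to 4 and from 3 to 2 respectively; when the ratio lies between the two thresholds, L is left unchanged.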
Slaves
Slave processes follow Algorithm 2. Each Slave waits for an Orchestrator request to perform a computation bunch. Each such request consists of a tuple (z 0 , b, δ λ | l L ), where z 0 ∈ Z I is one of the initial states of D, b is the computation bunch id, and δ λ | l L is a labelled prefix of disturbance traces (containing L disturbances), as computed by the Orchestrator.
Upon reception of (z 0 , b, δ λ | l L ), a Slave s ∈ [1, S]: (i) reaches the root of the subtree which it is in charge of exploring by following δ λ | l L , (ii) starts its own DFS from there, hence limiting its attention to that subtree.
Admissible (complete) disturbance traces found (which have δ λ | l L as a prefix) are appended to the output file ∆ s of Slave s and annotated with the id b of the current computation bunch. During exploration, each Slave also carries out trace labelling using its own (disjoint) set of labels (see variable λ).
Algorithm Correctness
Proof:
To prove (a), let us temporarily ignore the dynamic and adaptive adjustment of value L (lines 28-30 of Algorithm 1). In this case, the Orchestrator expands the computation path tree of DG D using a standard Depth-First Search (DFS) approach up to (fixed) depth level L (storing in stack the search frontier). From level L the DFS expansion of each subtree (and the relevant frontier) is delegated to a Slave. As L < h, every disturbance trace in ∆ (which is an admissible sequence of h disturbances) is generated by exactly one Slave. Thus, (∆ 1 , . . . , ∆ S ) form a partition of ∆. This fact is preserved under the dynamic and adaptive adjustment of value L (between 1 and h − 1). To see why, assume that at some iteration, the Orchestrator pops-out from stack record (z,d, j). If, at line 28, adjL = 0, we must have that adm(z,d) is true and j + 1 = L, i.e., the Orchestrator has just delegated the expansion of the subtree rooted atẑ = dist(z,d) to a Slave. Also, note that if adjL = 0, then adjL = ±1.
When adjL > 0 (and, hence, 1), L is incremented by one. This does not have any impact on the completeness of the algorithm: at the next iteration of the algorithm, when another record is popped out from the stack, the Orchestrator will simply go one more level deeper in the tree before delegating subtrees to the slaves.
On the other hand, decrementing L could in principle cause a disturbance trace to be generated in two different computation bunches. However, when adjL < 0 (and, hence, −1), the Orchestrator decrements L by one (i.e., sets it to value j) only if d = d (see line 28 of Algorithm 1), i.e., only if the Orchestrator has processed the last disturbance possibly applicable to state z. This implies that all records in the stack will be of the form (z′, d′, j′), such that if adm(z′, d′), then dist(z′, d′) (the state reached by applying disturbance d′ to z′) is different from z and is neither an ancestor nor a descendant of z. This prevents two subtrees whose exploration is delegated to the Slaves from sharing a common disturbance sequence.
Proof of (b) follows directly from the observation that both the Orchestrator and the Slaves apply, to each state, disturbances in lexicographic order, as both algorithms push them into the stack in reverse lexicographic order.
Proof of (c) follows from the previous point and from the observation that the sequence of values of the Orchestrator variable b (holding the computation bunch ids) is monotonically increasing.
Theorem 4.2 shows that, from ∆ 1 , . . . , ∆ S , we can easily produce k lexicographically ordered slices (slice 1 , . . . , slice k ) of the same length (where k ∈ N + is the number of parallel cores available at the user side for parallel simulation), as required by [5] .
Once the k slices have been produced, they are independently given to k instances of the optimiser of [3] , which are responsible for generating k output simulation campaigns for them, also randomising the trace verification order, along the lines of [6] . This enables Omission Probability (OP) computation at any time during the simulation activity at the user premises (see Section 2) as well as completion time estimation. As already shown in [5] , the generation of the k simulation campaigns can be scheduled on all the cores available to SyLVaaS in an embarrassingly parallel fashion.
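One way to obtain such slices from the Slave outputs is sketched below in Python. It is an illustration under stated assumptions, not the SyLVaaS code: each Slave output is assumed to be a list of (bunch id, trace) pairs in generation order, the function names `merge_by_bunch` and `split_into_slices` are hypothetical, and when the number of traces is not a multiple of k the slice lengths are allowed to differ by one. The sketch relies on points (b) and (c) of the theorem: traces are lexicographically ordered within a bunch, and ordering the bunches by increasing id yields a globally ordered sequence.

```python
from typing import List, Sequence, Tuple

Trace = Tuple[int, ...]
Record = Tuple[int, Trace]  # (computation bunch id, disturbance trace)


def merge_by_bunch(slave_outputs: Sequence[Sequence[Record]]) -> List[Trace]:
    """Concatenate all Slave outputs and order them by bunch id."""
    records = [rec for output in slave_outputs for rec in output]
    # list.sort is stable, and each bunch is handled by a single Slave,
    # so the order of the traces inside each bunch is preserved.
    records.sort(key=lambda rec: rec[0])
    return [trace for _, trace in records]


def split_into_slices(traces: Sequence[Trace], k: int) -> List[List[Trace]]:
    """Split an ordered trace list into k contiguous slices whose lengths
    differ by at most one (assumption for the non-divisible case)."""
    quotient, remainder = divmod(len(traces), k)
    slices, start = [], 0
    for i in range(k):
        size = quotient + (1 if i < remainder else 0)
        slices.append(list(traces[start:start + size]))
        start += size
    return slices
```

Because the slices are contiguous ranges of an ordered list, each slice is itself lexicographically ordered and every trace in slice i precedes every trace in slice i + 1.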
Algorithm 2: Slave.
Experiments
In this section we experimentally evaluate SyLVaaS, and in particular our new parallel disturbance generation algorithm of Section 4 and the cloud deployment of the overall Verification as a Service (VaaS) infrastructure.
SyLVaaS Experimental Deployment
We deployed SyLVaaS on a heterogeneous cluster of Linux machines, whose configurations are shown in Table 1 . We used a maximum number of 89 CPU cores (7 out of the 8 available cores for machines of categories A and B, 15 out of the 16 available cores for machines of category C, and 1 out of the 2 available cores for the machine of category Z). The single Orchestrator process was always run on the single used core of the single machine of category Z. The SyLVaaS web interface application resides on yet another host (a tiny virtual machine), external to the cluster and directly connected to the Internet.
Case Studies
We experiment with case studies consisting of disturbance models related to the System Level Formal Verification (SLFV) of two system models included in the Simulink distribution, namely the Inverted Pendulum on a Cart (IPC) and the Fuel Control System (FCS). For each system model, we define two disturbance models, whose properties are summarised in Table 2 .
Inverted Pendulum on a Cart (IPC)
The IPC is a control loop system where the controlled system is an inverted pendulum installed on a cart (see Figure 8 ).
The IPC controller (actually a control software) senses the angular position θ of the pendulum, and computes the force F to be applied to the cart to move it left or right along the x axis. The goal is to keep the pendulum in its upright (vertical) unstable position. The physical constraint between the cart and the pendulum gives that both the cart and the pendulum have one degree of freedom each (x and θ, respectively).
The controlled system consists of the cart and the pendulum, whereas the controller consists of the control software computing F from the plant outputs (x and θ). Accordingly, our overall System Under Verification (SUV) model consists of the controlled system and the controller, whose Simulink block diagram is shown in the upper box of Figure 9 . Overall, the Simulink block diagram consists of 52 blocks.
The system level property that we verify is that after 2 seconds the pendulum is in upright position, i.e., angle θ always lies within [−0.1, 0.1]. The monitor checking for this property is shown in the lower box of Figure 9 .
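The property can be read as a check over a sampled trajectory of θ. The Python sketch below is only an illustration of that reading, not the Simulink monitor of Figure 9: the function name `ipc_property_holds`, the representation of the trajectory as (time, θ) samples, and the choice of including the instant t = 2 s in the check are all assumptions.

```python
from typing import Iterable, Tuple


def ipc_property_holds(
    samples: Iterable[Tuple[float, float]],
    settle_time: float = 2.0,
    bound: float = 0.1,
) -> bool:
    """Return True if |theta| <= bound for every sample taken at or
    after settle_time. Samples are (time in seconds, theta) pairs."""
    return all(
        abs(theta) <= bound
        for t, theta in samples
        if t >= settle_time
    )
```

Large excursions of θ before the 2 second mark do not violate the property, whereas a single out-of-range sample afterwards does.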
We introduce disturbances by injecting irregularities in the cart rail. We model such irregularities as a modification of the cart mass m with respect to its nominal value of 0.455 kg. For this, we define three disturbances representing normal rail operation (m = 0.455 kg), abnormal rail operation (m = 1.455 kg), and stressed rail operation (m = 2.455 kg). The properties of the resulting disturbance models are summarised in Table 2 . A more detailed description of such models is not relevant for the evaluation of our experiments below. We only point out that defining such disturbance models and encoding them in the language offered by CMurphi (and taken as input by SyLVaaS) would take an average verification engineer with some knowledge of formal methods about 1 or 2 days.
Fuel Control System (FCS)
The FCS is a controller for a fault tolerant gasoline engine, which has also been used as a case study in [9, 10, 11, 12, 3, 5] .
The FCS has four sensors: throttle angle, speed, EGO (measuring the residual oxygen present in the exhaust gas) and MAP (manifold absolute pressure). The goal of the control system is to maintain the air-fuel ratio (the ratio between the air mass flow rate pumped from the intake manifold and the fuel mass flow rate injected at the valves) close to the stoichiometric ratio of 14.6, which represents a good compromise between power, fuel economy, and emissions.
From the sensor measurements, the FCS estimates the mixture ratio and provides feedback to the closed-loop control, yielding an increase or a decrease of the fuel rate.
The FCS sensors are subject to faults (disturbances), and the whole control system can tolerate single sensor faults. In particular, if a sensor fault is detected, the FCS changes its control law by operating the engine with a higher fuel rate to compensate. In case two or more sensors fail, the FCS shuts down the engine, as the air-fuel ratio cannot be controlled.
The control logic of the FCS is implemented by six automata, each one with a number of states ranging from two to five. The signal flow is further subdivided into three subsystems, which exhibit several different Simulink block types, involving arithmetic, lookup tables, integrators, filters and interpolation [13] . Overall, the Simulink block diagram consists of 246 blocks.
We verify one of the system level specifications for such a model, namely: the fuel_air model variable is never 0 for more than one second. Accordingly, our SUV consists of the Simulink FCS model along with a monitor for the property under verification (such a model is shown in Figure 10 ). The properties of our disturbance models for the FCS are summarised in Table 2 . A more detailed description of such models is not relevant for the evaluation of our experiments below and can be found in [3] . Again, we point out that defining such disturbance models and encoding them in the language offered by CMurphi (and taken as input by SyLVaaS) would take an average verification engineer with some knowledge of formal methods about 1 or 2 days.
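The specification above is a bounded-duration property, and a monitor for it only needs to remember when the current run of zero values started. The following Python sketch illustrates the idea on a sampled signal. It is not the Simulink monitor of Figure 10: the function name `fcs_property_holds` and the representation of the signal as (time, fuel_air) samples are assumptions.

```python
from typing import Iterable, Optional, Tuple


def fcs_property_holds(
    samples: Iterable[Tuple[float, float]],
    max_zero_duration: float = 1.0,
) -> bool:
    """Return True if fuel_air is never 0 for more than
    max_zero_duration seconds. Samples are (time, fuel_air) pairs
    in increasing time order."""
    zero_since: Optional[float] = None  # start time of the current zero run
    for t, fuel_air in samples:
        if fuel_air == 0:
            if zero_since is None:
                zero_since = t
            elif t - zero_since > max_zero_duration:
                return False  # zero for strictly more than the bound
        else:
            zero_since = None  # the run of zeros is interrupted
    return True
```

A run of zeros lasting exactly one second is accepted, since the property only forbids runs lasting more than one second.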
Experimental Results
In this section we outline our experimental results on the four disturbance models presented in Section 5.2. Table 4 shows the time needed by SyLVaaS to generate the disturbance traces entailed by our four disturbance models, when using a varying number S of parallel Slaves.
Parallel Disturbance Trace Generation
As specified in Section 4.3, the Orchestrator algorithm requires in input the following items:
• The disturbance model D and the horizon h. We use all four disturbance models listed in Table 2 , with the corresponding horizons.
• The starting value for the level (depth) L to which the Orchestrator bounds its search and triggers a Slave. We set this value to h/2 after preliminary experiments.
• The number of Slaves S. We set this value so that our cluster is used at 33%, 66% and 100% of its total available number of cores. To neutralise biases due to the heterogeneity of our cluster machines, we kept fixed the ratios between the different types of cores; the chosen allocation of the Slaves on the available cores for the various experimental deployments is shown in Table 3 (Allocation of the Slaves among our cluster computational cores).
• minW , maxW as the minimum and maximum percentage of wall-clock time to be spent waiting for a Slave. After preliminary experiments, we set these values to 1% and 60% respectively.
In order to evaluate the scalability of our parallel disturbance trace generation algorithm with the number S of Slaves, we have also run the algorithm with only one Slave (sequential algorithm, S = 1). In order to neutralise biases due to the heterogeneity of our cluster machines, we have performed 3 runs of the sequential algorithm, with the single Slave running on a core of a machine of category A, B and C. We then computed the sequential time as the weighted average of these three running times, where the weights are the ratios of the number of cores available for the execution of a Slave on machines of each category. Namely:
seq_time = Σ c ∈ {A, B, C} ratio(c) · seq_time(c)
where ratio(c) is the fraction of the overall cores available for Slave execution that belong to machines of category c ∈ {A, B, C} (i.e., 14/88, 14/88 and 60/88 for categories A, B and C respectively; recall that we use up to n − 1 cores on a machine with n cores), and seq_time(c) is the time of our sequential algorithm in which the single Slave was run on a core of a machine of category c (the Orchestrator always runs on the single available core of category Z).
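As a worked illustration of this weighted average, the Python sketch below uses the core counts stated in the text (14, 14 and 60 out of 88). The function name `sequential_time` and the running times used in the usage example are hypothetical and are not measurements from our experiments.

```python
from typing import Dict

# Cores available for Slave execution per machine category (from the text).
SLAVE_CORES: Dict[str, int] = {"A": 14, "B": 14, "C": 60}


def sequential_time(seq_time_by_category: Dict[str, float]) -> float:
    """Weighted average of the per-category sequential running times,
    with weights ratio(c) = SLAVE_CORES[c] / 88."""
    total_cores = sum(SLAVE_CORES.values())  # 88
    weighted_sum = sum(
        SLAVE_CORES[c] * seq_time_by_category[c] for c in SLAVE_CORES
    )
    return weighted_sum / total_cores


# Hypothetical running times in seconds, for illustration only.
example_times = {"A": 880, "B": 440, "C": 88}
example_result = sequential_time(example_times)
```

Since the machines of category C contribute 60 of the 88 cores, the time measured on category C dominates the average.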
For each disturbance model and each value for S, Table 4 reports the overall time for generating the whole set of disturbance traces (columns "time"), the number of computation bunches executed by the algorithm as well as speedup and efficiency with respect to the execution time of the sequential algorithm (the rows in Table 4 referring to S = 1).
As usual in the evaluation of parallel algorithms, for each value of S, the speedup is defined as t 1 /t S , where t 1 and t S are, respectively, the execution times of our disturbance trace generation algorithm when using 1 (sequential algorithm) and S parallel Slaves. For each value of S, the efficiency is computed as the ratio between the speedup and S.
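These two definitions translate directly into code. The Python sketch below is a plain restatement of them; the function names and the timing values in the usage example are hypothetical and are not taken from Table 4.

```python
def speedup(t_1: float, t_s: float) -> float:
    """Speedup t_1 / t_S of a run with S Slaves over the sequential run."""
    return t_1 / t_s


def efficiency(t_1: float, t_s: float, s: int) -> float:
    """Efficiency, i.e., the speedup divided by the number S of Slaves."""
    return speedup(t_1, t_s) / s


# Hypothetical timings in seconds, for illustration only.
example_speedup = speedup(8800.0, 125.0)            # about 70.4
example_efficiency = efficiency(8800.0, 125.0, 88)  # about 0.8
```

An efficiency of 1 would correspond to a perfectly linear speedup, i.e., S Slaves completing the generation S times faster than the sequential algorithm.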
Efficiency is never below 75%, and it is often above 80%, showing that our parallel disturbance trace generation algorithm scales well with the number of available Slaves. The observed efficiency loss, mostly due to network delays, is typical in a cluster setting. In this respect, we note that high-performance parallel simulation typically has efficiency values in the range 40%-80% (see, e.g., [14] ). Accordingly, an efficiency of about 75%-80% is to be considered state-of-the-art.
Finally, Figure 11 shows how the value of L (delegation level, i.e., the depth of the computation path tree at which the Orchestrator delegates the exploration to an available Slave) evolves for our case studies. Namely, such plots show how L/h (on the y-axis) varies as a function of the completion time percentage (on the x-axis), for each of the possible values of S. In our case studies, L tends to decrease as the completion time increases. This is due to the fact that, on average, our disturbance models are "left-unbalanced", in that admissible disturbance traces lie more frequently in the left part of the computation path tree.
Hence, in our case studies, the average time spent by a Slave in completing a computation bunch decreases over time due to pruning. We recall that the exploration is done in lexicographic order, as this simplifies trace labelling and the forthcoming computation of optimised simulation campaigns, see [5] .
Of course, the actual time evolution of value L strongly depends on the structure of the disturbance model at hand. What is important here is that the Orchestrator effectively mitigates any bias during exploration by quickly reacting to any observed unbalanced workload among Slaves.
SyLVaaS Complete Workflow
Table 5 reports the overall SyLVaaS response time (summing up trace generation, splitting, and simulation campaign optimisation times, column "overall time"), for each disturbance model and each value for k. Results in Table 5 have been obtained using S = 88 Slaves during trace generation and 89 cores to compute the k simulation campaigns (thus, on average, each core computed k/89 campaigns).
Download of Simulation Campaigns
SyLVaaS stores simulation campaigns computed as above in .zip archives which are then downloaded by the user. In our experiments, the size of such files is up to the order of a few Gigabytes. Hence, their download into the user cluster can be done seamlessly over a standard broad-band Internet connection.
Table 5 : SyLVaaS time of the entire workflow.
Related Work
The papers closest to ours are [3, 4, 5, 6] , where the algorithms underlying the SyLVaaS workflow are presented. The work in [3, 4, 5, 6] presents a parallel approach to System Level Formal Verification (SLFV) for Cyber-Physical Systems (CPSs) (i.e., for the class of hybrid systems handled by a simulator like Simulink). This is done by effectively decoupling the computation of the set of system runs (operational scenarios) to be exercised during the Hardware In the Loop Simulation (HILS) based SLFV from their actual simulation. In this paper we complement such results by focusing on parallelising the most intensive computation step within the SyLVaaS workflow, namely the generation of the set of all operational scenarios.
System Verification as a Service (also known as Model Checking in the Cloud) is still in its infancy. In [15] , it is argued that ideas may be borrowed from workflow modelling, management and analysis of business processes. In [16] a Map-Reduce algorithm for verification of CTL formulas on a cloud system is proposed. Moreover, panels discussing how to set up a reliable Verification as a Service (VaaS) tool are ongoing in major conferences (see, e.g., recent proceedings of the International Conference on Formal Methods in Computer-Aided Design, FMCAD). However, none of these works proposes an implemented and available tool with the features described in Section 1.2. We also point out that "verification as a service" is sometimes used to refer to a consulting service, where a company rents formal verification experts to another company in order to carry out a certain verification activity. Of course this is not our meaning for VaaS and we note that such a consultant based approach does not provide the same level of Intellectual Property (IP) protection as our SyLVaaS based approach. Furthermore, our proposed approach is fully automatic and does not require formal verification experts.
Thus, to the best of our knowledge, SyLVaaS is the first tool providing a genuine Verification as a Service (VaaS) approach.
HILS-based SLFV has been addressed in [17] . However the approach presented in [17] rests on closely coupling a simulator (SIMSAT) with a model checker (CMurphi, [7] ). Accordingly, such an approach cannot be directly used to develop the VaaS approach described here. Formal verification of Simulink models has been widely investigated, examples are in [18, 19, 20] . Such methods however focus on discrete time models (e.g., Stateflow or Simulink restricted to discrete time operators) with small domain variables. Therefore they are well suited to analyse critical subsystems, but cannot handle complex system level verification tasks (e.g., our case studies). This is indeed the motivation for the development of statistical model checking methods as those in [9, 10] , as well as for the exhaustive HILS based approach in [3] . Simulation based best-effort falsification methods able to handle any Simulink/Stateflow model have been investigated in [21, 22] . Annotated Stateflow models comprising both discrete and continuous variables can be analysed with simulation based tools like C2E2 [23] . We differ from C2E2 by providing a black-box approach that, furthermore, does not require model annotations.
Symbolic approaches (typically based on polyhedra or SMT solving) to hybrid system verification have also been widely investigated. Although they are not black-box approaches, for the sake of completeness we provide a glimpse of some of the available tools in such a context. Timed automata (i.e., hybrid systems whose continuous variables have time derivative equal to 1) can be analysed with UPPAAL [24] . Linear hybrid automata (see, e.g., [25] ) can be analysed with HyTech [26] . Piecewise affine hybrid systems can be analysed with symbolic model checkers like PHAVer [27] and SpaceEx [28, 29, 30] . A symbolic model checker capable of handling nonlinear hybrid systems is presented in [31] . Currently, with respect to our proposed approach, the main limitations of symbolic approaches are: (i) they are not black-box, and (ii) they can handle only hybrid systems of moderate size (whereas our approach does not depend on the size of the system to be verified). Finally, within such a symbolic context, we note that, while we use automata to define the set of scenarios to be simulated, temporal logic could also be used to that end. An example is in [32] .
Random model checking is a formal verification approach closely related to our setting. A random model checker provides, at any time during the verification process, an upper bound to the Omission Probability (OP). Upon detection of an error, a random model checker stops, returning a counterexample. Random model checking algorithms have been investigated, e.g., in [33, 34, 35, 36] . The main differences with respect to our approach are the following. (i) All random model checkers generate simulation scenarios using a sort of Monte-Carlo based random walk. As a result, unlike our algorithm, none of them is exhaustive (within a finite time horizon). (ii) Random model checkers (see, e.g., [34] ) assume availability of a lower bound to the probability of selecting (with a random-walk) an error trace. Of course, being exhaustive, we do not have any such assumption.
Probabilistic (see, e.g., [37, 38] ) and, more specifically, simulation-based statistical model checking approaches (see, e.g., [39, 40, 41, 33, 42, 10, 9, 43, 44, 45] ) are closely related to our work. In particular, [10] addresses statistical model checking of Simulink models and presents experimental results on the Simulink Fuel Control System we use here. The main differences between such approaches and ours are the following. (i) Probabilistic model checking is a white-box approach (a model is available), whereas we are in a black-box setting (only a simulator is available). Thus, only simulation-based statistical model checking approaches can be used in our context. (ii) Statistical model checking is not exhaustive (within a finite time horizon), whereas we are. (iii) Both probabilistic and statistical model checking use a stochastic model for the System Under Verification (SUV), whereas in our setting the SUV is deterministic and disturbances are nondeterministic (i.e., we are looking for the worst case scenario). (iv) None of the available simulation-based statistical model checking approaches addresses the problem of the optimisation of the simulation campaigns, which is an essential step to make our parallel exhaustive HILS based model checking viable.
Synergies between simulation and formal methods have been also widely investigated in digital hardware verification. Examples are in [46, 47, 48, 49] and citations thereof. The main differences between the above approaches and ours are: (i) they focus on finite state systems whereas we focus on infinite state systems (namely, hybrid systems); (ii) they are white-box (requiring availability of the system model) whereas we are black-box. As for hybrid systems, synergies between explicit and symbolic model checking methods have been investigated in [50, 51, 52, 53, 54, 55, 56, 57] in the context of automatic synthesis of controllers for discrete time linear hybrid systems.
Parallel algorithms for explicit state exploration have been widely investigated. Examples are in [58, 59, 60, 61, 62, 63] . The main difference with our approach is that all the above works focus on parallelising the state space exploration engine by devising techniques to minimise locking of the visited state hash table. Conversely, we leave unchanged the state space exploration engine (the simulator in our context), split the set of simulation scenarios into equal size subsets to be simulated on different processors, and stop verification as soon as one of such processors finds an error, thereby enabling an embarrassingly parallel approach.
Conclusions
We have presented SyLVaaS, a Web-based software-as-a-service tool for HILS-based System Level Formal Verification (SLFV). Such a tool allows verification engineers to obtain from a Web service the most important part of their HILS campaigns, i.e. a set of simulation campaigns to exercise the System Under Verification (SUV) on all the relevant operational scenarios (disturbance traces).
As the simulation campaigns are executed at the user premises, SyLVaaS provides full Intellectual Property (IP) protection for the SUV model, the property to be verified, and the user verification flow. The simulation may be carried out in parallel on a user cluster whose machines have Simulink installed.
To achieve a short response time and increase the quality of service provided by SyLVaaS, we also proposed a new algorithm to parallelise the most computationally intensive part of the SyLVaaS workflow, i.e., the generation of disturbance traces. As the other step performed by SyLVaaS (computation of optimised simulation campaigns) already exploits an embarrassingly parallel algorithm, with our new parallel disturbance trace generator the entire SyLVaaS workflow can benefit from a cluster of machines at the SyLVaaS cloud infrastructure.
To the best of our knowledge, SyLVaaS is the first Web-based software-as-a-service tool for HILS-based SLFV.
