A significant challenge to the formal validation of software-based industrial control systems is that system requirements are often imprecise, non-modular, evolving, or even simply unknown. We propose a framework for mining requirements from the closed-loop model of an industrial-scale control system, such as one specified in the Simulink modeling language. The input to our algorithm is a requirement template expressed in Parametric Signal Temporal Logic, a formalism for temporal formulas in which concrete signal or time values are replaced by parameters. Our algorithm is an instance of counterexample-guided inductive synthesis: an intermediate candidate requirement is synthesized from simulation traces of the system and then refined using counterexamples to the candidate, obtained with the help of a falsification tool. The algorithm terminates when no counterexample is found. Mining has many usage scenarios: mined requirements can be used to validate future modifications of the model, to enhance understanding of legacy models, and to guide the process of bug-finding through simulations. We present two case studies for requirement mining: a simple automobile transmission controller and an industrial airpath control model for an engine.
INTRODUCTION
Industrial-scale controllers used in automobiles and avionics are now commonly developed using a model-based design (MBD) paradigm [23, 28]. The MBD process consists of a sequence of steps. In the first step, the designer captures the plant model, i.e., the dynamical characteristics of the physical parts of the system, using logical, differential, and algebraic equations. Examples of plant models include the rotational dynamics model of the camshaft in an automobile engine, the thermodynamic model of an internal combustion engine, and atmospheric turbulence models. The next step is to design a controller that employs specific control laws to regulate the behavior of the physical system. The closed-loop model is the composition of the plant and the controller.
The designer may then perform extensive simulations of such a model. The goal is to analyze the controller design by exciting the exogenous, time-varying inputs to the model and observing the temporal behavior of the signals of interest. An important aspect of this step is validation, i.e., checking whether the temporal behavior of the system matches a set of requirements. Unfortunately, in practice, these requirements are high-level and often vague. Examples of requirements include "better fuel efficiency", "signal should eventually settle", and "resistance to turbulence". If the simulation behavior is deemed unsatisfactory, the designer refines or tunes the controller design and repeats the validation step.
In the formal methods literature, a requirement (also called a specification) is a mathematical expression of the design goals or desirable design properties in a suitable logic. In an industrial design setting, requirements are rarely expressed formally, and it is common to find them written in natural language. Control designers then validate their design manually by comparing experimental time traces to these informal requirements. In some cases, they simply use simulation data and their domain expertise to determine the quality of the design. Moreover, to date, formal validation tools have been unable to handle the format or scale of industrial models. As a result, widespread adoption of formal tools has been restricted to testing syntactic coverage of the controller code, in the hope that higher coverage implies better chances of finding bugs. Even simulation-based tools would clearly benefit from the more semantic notions of coverage offered by formal requirements.
In this paper, we propose a scalable technique to systematically mine requirements from the closed-loop model of a control system using observations of the system behavior. In addition to the model being analyzed, our technique takes as input a template requirement. The final output is a synthesized requirement matching the template. In our current implementation, we assume that the model is specified in Simulink [21], an industry-wide standard that is able to: (1) express complex dynamics (differential and algebraic equations), (2) capture discrete state-machine behavior by allowing both Boolean and real-valued variables, (3) support a layered design approach through modularity and hierarchical composition, and (4) perform high-fidelity simulations. We remark that our technique is not restricted to Simulink models; in principle, it is applicable in any setting where the closed-loop system can be simulated, e.g., hardware-in-the-loop simulations and tests on the physical system.
Formalisms such as Metric Temporal Logic (MTL) [2, 17], and later Parametric Signal Temporal Logic (PSTL) [6], have emerged as logics well-suited to capture both the real-valued and time-varying behaviors of hybrid control systems. As PSTL is equipped with parameters, properties in PSTL naturally express template requirements. As an example, consider the following natural language specification: "eventually between time 0 and some unspecified time τ1, the signal x is less than some value π1, and from that point for some τ2 seconds, it remains less than some value π2". In PSTL, the above property would be expressed as:
$$\Diamond_{[0,\tau_1]}\big(x < \pi_1 \wedge \Box_{[0,\tau_2]}\,(x < \pi_2)\big)$$
Here, we interpret the unspecified values τ1, τ2, π1, π2 as parameters. The subset of PSTL with no parameters is referred to as STL. Robust satisfaction of MTL formulas [13] and quantitative semantics for PSTL [11] allow reasoning about how "close" a system behavior is to satisfying a given specification. Intuitively, a lower satisfaction value corresponds to a stronger property, making it easier for a behavior to violate the property. The proposed mining algorithm is an iterative procedure; in each iteration, it performs the following steps:
1. In the first step, the algorithm synthesizes a candidate requirement from a given PSTL template and a set of simulation traces of the model. The candidate requirement is the strongest STL property satisfied by the given set of traces. It is obtained by instantiating the PSTL template with the parameter values that minimize the satisfaction value of the PSTL property over the given traces.
2. It then tries to falsify the candidate requirement using a global optimization-based search, such as the stochastic search available in the tool S-Taliro [5].
3. If the falsification tool finds a counterexample, we add this trace to the existing set of simulation traces, and go to Step 1 of the next iteration. If no counterexample is found, the algorithm terminates.
At the heart of Step 1 is an efficient search over the space defined by the parameters in the PSTL property in order to generate a candidate requirement. For this purpose, we use the Breach tool [9]. If the number of parameters is n, a naïve search strategy in the parameter space would have a cost exponential in n (e.g., a grid with k values per parameter requires k^n evaluations).
However, we observe that the satisfaction values of certain PSTL properties are monotonic in their parameter values. For example, consider the property φ = ◇[0,τ](x > π). Suppose that the maximum value of a given trace x(t) over [0, τ] is 3. Then, starting from a value less than 3, as π increases, φ becomes a stronger assertion for the trace x(t), i.e., its satisfaction value decreases. Finally, when π exceeds 3, the satisfaction value becomes negative, i.e., φ no longer holds for x(t). Thus, the satisfaction value of φ monotonically decreases in the parameter π. Similarly, it monotonically increases in τ. When monotonicity holds, we can get exponential savings when searching over the parameter space by using methods like binary search. Though the syntactic polarity rules for PSTL identified in previous work [6] ensure satisfaction monotonicity, these rules are not complete. Hence, we provide a general way of reasoning about the monotonicity of arbitrary PSTL properties using Satisfiability Modulo Theories (SMT) solving [7].
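The monotonicity argument above can be checked numerically on a sampled trace. The following is a minimal Python sketch, assuming a trace given as finite lists of time stamps and values; `rho_ev` and `tightest_pi` are hypothetical helpers, not part of Breach. It evaluates the satisfaction value of ◇[0,τ](x > π) as the maximum of x over [0, τ] minus π, and then bisects on π, which is valid only because that value is monotonic in π.

```python
from typing import List


def rho_ev(times: List[float], xs: List[float], pi: float, tau: float) -> float:
    """Satisfaction value at time 0 of eventually_[0,tau] (x > pi) on sampled data:
    the maximum of x(t) - pi over the samples with 0 <= t <= tau."""
    return max(x for t, x in zip(times, xs) if 0.0 <= t <= tau) - pi


def tightest_pi(times: List[float], xs: List[float], tau: float,
                lo: float, hi: float, delta: float = 1e-6) -> float:
    """Binary search for the largest pi with rho >= 0, assuming rho_ev is
    non-increasing in pi, rho(lo) >= 0 and rho(hi) < 0."""
    assert rho_ev(times, xs, lo, tau) >= 0 > rho_ev(times, xs, hi, tau)
    while hi - lo > delta:
        mid = (lo + hi) / 2
        if rho_ev(times, xs, mid, tau) >= 0:
            lo = mid          # still satisfied: the tight value is at or above mid
        else:
            hi = mid          # violated: the tight value is below mid
    return lo


# Example trace whose maximum is 3, as in the text.
T = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
X = [0.0, 1.0, 3.0, 2.0, 2.5, 1.5]
print([rho_ev(T, X, p, 5.0) for p in [0, 1, 2, 3, 4]])  # [3.0, 2.0, 1.0, 0.0, -1.0]
print(tightest_pi(T, X, tau=5.0, lo=0.0, hi=10.0))       # close to 3
```

Each bisection step halves the search interval, so reaching precision δ on a range of width w takes about log2(w/δ) evaluations instead of w/δ for a uniform grid.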
In this paper, we explore two applications for requirement mining. The first application is the obvious one: to generate requirements that serve as high-level specifications for the closed-loop model. In an industrial setting, formalized requirements that can be used for design validation are often unavailable. For example, consider the case of legacy controller code. Such code usually goes through several years of refinement, is developed in a non-formal setting, and is hard to understand for engineers other than its original developers. In this context, mined requirements can enhance understanding of the code and help future code maintenance. The second application explores the use of mining as an enhanced bug-finding procedure. Suppose we wish to check whether the model ever exhibits a signal that oscillates with an amplitude greater than a threshold. Given the huge space of input signals, simply running tests on the closed-loop model is unlikely to detect such behavior. We instead attempt to mine the requirement "the signal settles to a steady value π in time τ" (roughly the negation of the original property). In each step, our algorithm pushes the trajectory-space exploration of the falsification tool into a region not already subsumed by existing traces. Hence, the search for a counterexample is guided by the intermediate candidate requirements. Note that state-of-the-art falsifiers such as S-Taliro would require a concrete STL property encoding the oscillation behavior, which would take tedious manual effort given the many possible forms of such behavior arising from unknowns such as the oscillation amplitude, the frequency, and the time at which oscillations start.
To summarize, our contributions are as follows:
1. We propose a novel counterexample-guided iterative procedure for mining temporal requirements satisfied by signals of interest of an industrial-scale closed-loop control model. Specifically, we target the mining of properties expressible in PSTL.
2. We extend Breach to support Simulink models and the falsification of STL formulas. In addition, we enhance the Breach tool framework with efficient strategies for synthesizing parameters of monotonic PSTL properties. To extend the range of formulas for which we can prove monotonicity, and hence apply these strategies, we formulate the query for monotonicity in a fragment of first-order logic with quantifiers, real arithmetic, and uninterpreted functions, and use an SMT solver to answer the query.
Modeling an Automatic Transmission Controller

A RUNNING EXAMPLE
As an illustrative example throughout the paper, we consider a closed-loop model designed for a four-speed automatic transmission controller of a vehicle (shown in Fig. 1). Although this model is not a real industrial model, it has all the necessary mechanical components: models for the engine, the transmission, and the vehicle. The transmission block computes the transmission ratio (Ti) using the current gear status, and computes the output torque from the engine speed (Ne), the gear status, and the transmission RPM. The other two blocks represent the gear shift logic and the related threshold speed calculation. The model has two inputs: (1) the throttle position, as a percentage, and (2) the brake torque.
We are interested in the following signals: the vehicle speed, the transmission gear position, and the engine speed measured in RPM (rotations per minute). Suppose we want this controller to ensure that the engine speed never exceeds 4500 rpm and that the vehicle never drives faster than 120 mph. Simulating the closed-loop system shows that these requirements are not met, as illustrated in Fig. 2.
However, this negative result does not provide further insight into the model. If a requirement does not hold, we would like to know what does hold for the controller, and how narrowly the controller misses the requirement. Such a characterization would shed more light on the working of the system, especially in the context of legacy systems and for reverse engineering the behavior of a very complex system. In the context of this example, it would help to know the maximum speed and RPM that the model can reach, or the minimum dwell time that the transmission enforces to avoid frequent gear shifts. In the next section, we present a technique to automatically obtain such requirements from the model. 
PRELIMINARIES AND OVERVIEW
Signals and Systems
The systems considered in this paper are hybrid dynamical systems, that is, systems mixing discrete dynamics (such as the gear-shifting logic) and continuous dynamics (such as the rotational dynamics of the car engine). Additionally, the systems are closed-loop, meaning that they are obtained by composing a controller and a plant in a loop.
We define a signal as a function mapping the time domain T = ℝ≥0 to the reals ℝ. Boolean signals, used to represent discrete dynamics, are signals whose values are restricted to false (denoted ⊥) and true (denoted ⊤). Vectors in ℝ^n with n > 1 are denoted in bold font and their components are indexed from 1 to n, e.g., p = (p1, · · · , pn). Likewise, a multi-dimensional signal x is a function from T to ℝ^n such that ∀t ∈ T, x(t) = (x1(t), · · · , xn(t)). A system S (such as a Simulink model) is an input-output state machine: it takes as input a signal u(t) and computes an output signal x(t) = S(u(t)). It is common to drop the time t and write x = S(u). A trace is a collection of output signals resulting from the simulation of a system, i.e., it can be viewed as a multi-dimensional signal. In the following, we use the words trace and signal interchangeably.
Signal Temporal Logic
Temporal logics were introduced in the late 1970s [24] to reason formally about the temporal behaviors of reactive systems, originally input-output systems with Boolean, discrete-time signals. Temporal logics to reason about real-time signals, such as Timed Propositional Temporal Logic [2] and Metric Temporal Logic (MTL) [17], were introduced later to deal with dense-time signals. More recently, Signal Temporal Logic [20] was proposed in the context of analog and mixed-signal circuits as a specification language for constraints on real-valued signals. These constraints, or predicates, can be reduced to the form µ = f(x) ∼ π, where f is a scalar-valued function over the signal x, ∼ ∈ {<, ≤, ≥, >, =, ≠}, and π is a real number.
Temporal formulas are formed using the temporal operators "always" (denoted □), "eventually" (denoted ◇), and "until" (denoted U). Each temporal operator is indexed by an interval of the form (a, b), where a and b are non-negative real-valued constants. If I is an interval, then an STL formula is written using the following grammar:
$$\varphi ::= \top \mid \mu \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1\, \mathbf{U}_I\, \varphi_2$$
The always and eventually operators are defined as special cases of the until operator as follows: □I ϕ ≜ ¬◇I ¬ϕ and ◇I ϕ ≜ ⊤ UI ϕ. When the interval I is omitted, we use the default interval [0, +∞). The semantics of STL formulas can be described informally as follows. The signal x satisfies f(x) > 10 at time t (where t ≥ 0) iff f(x(t)) > 10.
The signal x1 satisfies ϕ = ◇[1,2) x1 > 0.4 iff there exists a time t such that 1 ≤ t < 2 and x1(t) > 0.4. The two-dimensional signal x = (x1, x2) satisfies the formula ϕ = (x1 > 10) U[2.3,4.5] (x2 < 1) iff there is some time u where 2.3 ≤ u ≤ 4.5 and x2(u) < 1, and for all time v in [2.3, u), x1(v) is greater than 10. Formally, the semantics are given as follows:
Extension of the above semantics to other kinds of intervals (open, open-closed, and closed-open) is straightforward.
We write x ⊨ ϕ as shorthand for (x, 0) ⊨ ϕ.
Parametric Signal Temporal Logic (PSTL) [6] is an extension of STL introduced to define template formulas containing unknown parameters. Syntactically speaking, a PSTL formula is an STL formula where numeric constants, either in the constraints given by the predicates µ or in the time intervals of the temporal operators, can be replaced by symbolic parameters divided into two types:
• A Scale parameter π is a parameter appearing in predicates of the form µ = f (x) ∼ π,
• A Time parameter τ is a parameter appearing in an interval of a temporal operator.
An STL formula is obtained by pairing a PSTL formula with a valuation function that assigns a value to each symbolic parameter. For example, consider the PSTL formula ϕ(π, τ) = □[0,τ] x > π, with symbolic parameters π (scale) and τ (time). The STL formula □[0,10] x > 1.2 is an instance of ϕ obtained with the valuation v = {τ → 10, π → 1.2}.
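To make the role of a valuation concrete, here is a minimal Python sketch that treats the template □[0,τ] x > π as a function of a valuation dictionary and checks the resulting STL instance on a sampled signal. It assumes that the signal is given only at a finite set of sample times (between samples nothing is checked); the helper name `holds_always_gt` and the sample data are hypothetical.

```python
from typing import Dict, List


def holds_always_gt(times: List[float], xs: List[float], v: Dict[str, float]) -> bool:
    """Boolean check at time 0 of the STL instance of phi(pi, tau) = always_[0,tau] (x > pi)
    obtained with the valuation v; only the given samples are checked."""
    return all(x > v["pi"] for t, x in zip(times, xs) if 0.0 <= t <= v["tau"])


times = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
xs = [2.0, 1.5, 1.3, 1.8, 2.2, 1.25, 0.9]
print(holds_always_gt(times, xs, {"tau": 10.0, "pi": 1.2}))  # True
print(holds_always_gt(times, xs, {"tau": 12.0, "pi": 1.2}))  # False: x(12) = 0.9
```

The same template yields different STL formulas, and hence different verdicts, depending only on the valuation.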
Example 3.1. For the example from Sec. 2, suppose we want to specify that the speed never exceeds 120 and the RPM never exceeds 4500. The predicate specifying that the speed is above 120 is speed > 120, and the one for RPM is RPM > 4500. The STL formula expressing that these are always false is:
To turn this into a PSTL formula, we rewrite it by introducing the parameters πspeed and πrpm:
The STL formula ψ expressed in (3.1) is then obtained by using the valuation v = {πspeed → 120, πrpm → 4500}.
Problem 3.1. Given (a) a system S with a set U of inputs, and, (b) a PSTL formula with n symbolic parameters ϕ(p1, . . . , pn) where pi could either be scale parameter π or 
Our focus on "tight valuations" is meant to avoid mining trivial or overly conservative requirements, e.g., "the car cannot go faster than the speed of light." We make this notion more precise in Section 5.
Requirement Mining Algorithm: Overview
Our algorithm for mining STL requirements from the closed-loop model in Simulink is an instance of a counterexample-guided inductive synthesis procedure [29], shown in Fig. 3. It consists of two key components:
1. A falsification engine, which, given a formula ϕ, generates an input u such that x = S(u) ⊭ ϕ if such a u exists, and returns ⊥ otherwise. We denote this functionality by FalsifyAlgo.
2. A synthesis function, which, given a set of traces x1, . . . , xk, finds parameters p such that ∀i, xi ⊨ ϕ(p). We denote this function by FindParam.
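The interplay of the two components can be sketched as a short loop. The Python below is a minimal sketch under stated assumptions: a single scale parameter instead of a general valuation, `None` standing in for ⊥, and a hypothetical toy system whose falsifier simply enumerates a finite set of inputs and picks the one of least robustness (a real falsifier such as S-Taliro or Breach optimizes instead). The names `mine`, `simulate`, `find_param`, and `falsify` are illustrative only.

```python
from typing import Callable, List, Optional, Tuple

Trace = List[float]


def mine(initial: List[Trace],
         find_param: Callable[[List[Trace]], float],
         falsify: Callable[[float], Optional[Trace]],
         max_iters: int = 100) -> Tuple[float, List[Trace]]:
    """Counterexample-guided loop: synthesize, try to falsify, add the counterexample, repeat."""
    traces = list(initial)
    for _ in range(max_iters):
        p = find_param(traces)    # FindParam: tightest valuation satisfied by all traces
        cex = falsify(p)          # FalsifyAlgo: a violating trace, or None (standing in for ⊥)
        if cex is None:
            return p, traces
        traces.append(cex)
    raise RuntimeError("no counterexample-free candidate within max_iters")


# Hypothetical toy system: input amplitude u yields the trace x_k = u * k, k = 0..4.
INPUTS = [i / 10 for i in range(11)]


def simulate(u: float) -> Trace:
    return [u * k for k in range(5)]


# Template phi(pi) = always (x <= pi); its satisfaction value on a trace is pi - max_k x_k.
def find_param(traces: List[Trace]) -> float:
    return max(max(tr) for tr in traces)


def falsify(pi: float) -> Optional[Trace]:
    # Exhaustive stand-in for an optimizing falsifier: pick the input of least robustness.
    u = min(INPUTS, key=lambda w: pi - max(simulate(w)))
    tr = simulate(u)
    return tr if pi - max(tr) < 0 else None


pi, traces = mine([simulate(0.0)], find_param, falsify)
print(pi, len(traces))  # 4.0 2
```

Starting from the all-zero trace, the first candidate is π = 0; the falsifier returns the trace for u = 1.0, after which π = 4.0 can no longer be falsified and the loop stops.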
FALSIFICATION PROBLEM
Recall that we need to implement a function FalsifyAlgo that, given ϕ, returns a trace x such that x is a valid output signal of the system S and x ⊭ ϕ, or ⊥ if no such trace exists. Unfortunately, this problem is undecidable for general hybrid systems: letting ϕ be a simple safety property yields a reduction from the reachability problem for general hybrid systems, which is undecidable except for subclasses such as initialized rectangular hybrid automata [16]. For such subclasses, the mining technique can be complete, i.e., the absence of a counterexample means that we have found the strongest requirement. In general, however, the falsification tool may fail to find a counterexample even though one exists. We argue that a requirement mined in this fashion is still useful, as it is something that FalsifyAlgo is unable to disprove even after extensive simulations, and is thus likely to be close to the actual requirement. An alternative is to use a sound verification tool that employs abstraction [15, 30]; however, in our experience, these tools have not scaled to the complex control systems that we consider here. In this paper, we follow the approach taken by the developers of the tool S-Taliro [5] and propose a falsification algorithm based on the minimization of the quantitative satisfaction of a temporal logic formula.
Quantitative Semantics of STL
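The quantitative semantics of STL are defined using a real-valued function ρ of a trace x, a formula ϕ, and a time t satisfying the following property:
$$\rho(\varphi, x, t) > 0 \implies (x, t) \models \varphi \qquad\text{and}\qquad \rho(\varphi, x, t) < 0 \implies (x, t) \not\models \varphi \qquad (4.1)$$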
Quantitative semantics capture the notion of robustness of satisfaction of ϕ by a signal x, i.e., whenever the absolute value of ρ(ϕ, x, t) is large, small perturbations of x do not change whether x satisfies ϕ at time t.
Without loss of generality, an STL predicate µ can be identified with an inequality of the form f(x) ≥ 0 (the use of strict or non-strict inequalities is a matter of choice, and other inequalities can be trivially transformed into this form). From this form, a straightforward quantitative semantics for the predicate µ is defined as
$$\rho(\mu, x, t) = f(x(t)) \qquad (4.2)$$
Then ρ is defined inductively for every STL formula using the following rules:
Then it can be shown [11] that ρ satisfies (4.1) and thus defines a quantitative semantics for STL. Additionally, by combining (4.5) with □I ϕ ≜ ¬◇I ¬ϕ, we get
$$\rho(\Box_I \varphi, x, t) = \inf_{t' \in t + I} \rho(\varphi, x, t') \qquad (4.6)$$
For ◇, we get a similar expression using sup instead of inf.
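On uniformly sampled traces, these rules become simple min/max computations. The following minimal Python sketch shows one way to evaluate them. It is not Breach's implementation, and it assumes a uniform sampling period `DT`, closed intervals [a, b], and windows truncated at the end of the trace; the until operator is omitted, and all names (`pred`, `neg`, `conj`, `always`, `eventually`) are hypothetical.

```python
import math
from typing import Callable, Dict, List

Sample = Dict[str, float]                   # one time step of a trace, e.g. {"x": 0.3}
Rho = Callable[[List[Sample], int], float]  # (trace, sample index) -> satisfaction value

DT = 1.0  # assumed uniform sampling period of the traces


def pred(f: Callable[[Sample], float]) -> Rho:
    """Predicate mu: f(x) >= 0, with rho(mu, x, t) = f(x(t))."""
    return lambda tr, k: f(tr[k])


def neg(phi: Rho) -> Rho:
    """rho(not phi) = -rho(phi)."""
    return lambda tr, k: -phi(tr, k)


def conj(phi: Rho, psi: Rho) -> Rho:
    """rho(phi and psi) = minimum of the two values."""
    return lambda tr, k: min(phi(tr, k), psi(tr, k))


def _window(tr: List[Sample], k: int, a: float, b: float) -> range:
    """Sample indices inside k + [a, b], truncated to the end of the trace."""
    lo = k + math.ceil(a / DT)
    hi = len(tr) - 1 if b == math.inf else min(len(tr) - 1, k + math.floor(b / DT))
    return range(lo, hi + 1)


def always(phi: Rho, a: float = 0.0, b: float = math.inf) -> Rho:
    """Infimum over the window; an empty window gives +inf (vacuously true)."""
    return lambda tr, k: min((phi(tr, j) for j in _window(tr, k, a, b)), default=math.inf)


def eventually(phi: Rho, a: float = 0.0, b: float = math.inf) -> Rho:
    """Supremum over the window; an empty window gives -inf."""
    return lambda tr, k: max((phi(tr, j) for j in _window(tr, k, a, b)), default=-math.inf)


# Example: x > 0.4, written in the form f(x) >= 0.
tr = [{"x": v} for v in [0.0, 0.3, 0.7, 0.2, 0.5]]
p = pred(lambda s: s["x"] - 0.4)
print(eventually(p, 1, 2)(tr, 0))  # about 0.3, from x(2) = 0.7
print(always(p)(tr, 0))            # -0.4, since x(0) = 0 violates the predicate
```

Defining `always` through min and `eventually` through max makes the identity □I ϕ ≜ ¬◇I ¬ϕ hold exactly, because negation just flips the sign.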
Example 4.1. Consider again the STL property:
It has two predicates, say µ1 : speed ≤ 120 and µ2 : RPM ≤ 4500. To put them into the standard form µi : fi(x) ≥ 0, we define x = (speed, RPM), f1(x) = 120 − speed and f2(x) = 4500 − RPM. From (4.2), we get ρ(speed ≤ 120, x, t) = 120 − speed(t).
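Applying rule (4.6) for the semantics of □, we get:
$$\rho(\Box \mu_1, x, 0) = \inf_{t \ge 0}\big(120 - \mathit{speed}(t)\big) = 120 - \sup_{t \ge 0} \mathit{speed}(t)$$
Similarly for µ2,
$$\rho(\Box \mu_2, x, 0) = 4500 - \sup_{t \ge 0} \mathit{RPM}(t)$$
Finally, by applying rule (4.4):
$$\rho(\psi, x, 0) = \min\Big(120 - \sup_{t \ge 0} \mathit{speed}(t),\ 4500 - \sup_{t \ge 0} \mathit{RPM}(t)\Big)$$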
Informally, the satisfaction function ρ looks for the maximum speed and RPMs over time and returns the minimum of the differences with the thresholds 120 and 4500.
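The informal reading just given can be evaluated directly on sampled data. This Python sketch uses made-up speed and RPM samples (hypothetical, not taken from Fig. 2) and the hypothetical helper `rho_psi`.

```python
from typing import List

speed = [0.0, 40.0, 85.0, 118.0, 125.0, 110.0]        # hypothetical samples, mph
rpm = [800.0, 2500.0, 3900.0, 4700.0, 4100.0, 3600.0]  # hypothetical samples, RPM


def rho_psi(speed: List[float], rpm: List[float]) -> float:
    """Minimum of the margins to the thresholds 120 and 4500 at the worst time point."""
    return min(120.0 - max(speed), 4500.0 - max(rpm))


print(rho_psi(speed, rpm))  # -200.0: both margins are negative, the RPM margin is smaller
```

A negative result means that the trace violates the requirement, which is what a falsifier is looking for.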
STL vs. MTL Robust Satisfaction
In this section, we clarify the connection between the quantitative semantics of STL defined above and the notion of robust satisfaction of MTL as defined in [13] and used in S-Taliro. The main difference between STL and MTL lies in the definition of predicates, and so does the difference between their quantitative semantics. The robust semantics of MTL is based on a metric on the state space of signals and on the fact that each predicate is identified with the set where it holds true. Formally, let d be a metric on ℝ^n, with the usual extension to the signed distance from a point x ∈ ℝ^n to a set X ⊆ ℝ^n:
$$\mathrm{Dist}_d(x, X) = \begin{cases} -\inf\{\, d(x, y) \mid y \in X \,\} & \text{if } x \notin X, \\ \inf\{\, d(x, y) \mid y \in \mathbb{R}^n \setminus X \,\} & \text{if } x \in X. \end{cases}$$
For each MTL predicate µ, define its truth set O(µ) as:
and let 
Solving the Falsification Problem
The objective of the falsification problem is: given an STL formula ϕ, find a signal u such that S(u) ⊭ ϕ. Following the above definitions, this is equivalent to finding a trace x such that ρ(ϕ, x, 0) < 0. Hence FalsifyAlgo can be implemented by solving
$$\rho^* = \min_{u \in U} \rho(\varphi, S(u), 0) \qquad (4.9)$$
Then, if ρ* < 0, we return u* = arg min_{u∈U} ρ(ϕ, S(u), 0); otherwise, S ⊨ ϕ. The undecidability of the falsification problem is reflected here in the fact that the minimization problem (4.9) is a general non-linear optimization problem for which no solver can guarantee convergence, uniqueness, or even existence of a solution. On the other hand, many heuristics can be used to find an approximate solution. In a series of recent papers, the authors of S-Taliro proposed and implemented different strategies, such as Monte-Carlo methods [22] and the cross-entropy method [25]. In our implementation, we first instrumented S-Taliro as a falsification tool (made possible by the connection between STL and MTL described in the previous section) and then extended Breach with a new falsification engine, which attacks (4.9) as follows:
1. Define the space of permissible input signals with the help of m input parameters k = (k1, . . . , km) that take values from a set Pu, and a generator function g such that u(t) = g(v(k))(t) is a permissible input signal for S for any valuation v(k) ∈ Pu.
2. Sample signal-parameters in a uniform, random fashion to obtain Ninit distinct valuations vi(k) ∈ Pu.
3. For each i ≤ Ninit, solve min_{v(k) ∈ Pu} ρ(ϕ, S(g(v(k))), 0) using the Nelder-Mead non-linear optimization algorithm with vi(k) as an initial guess.
4. Return the minimum ρ thus found.
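The four steps can be sketched end to end on a toy closed loop. The Python below is a minimal, self-contained sketch and not Breach's engine. Its assumptions are stated here and repeated in comments: a hypothetical first-order plant dx/dt = −x + u integrated with forward Euler, step inputs parameterized by amplitude and switching time (the generator g), the target property □[0,10](x < 0.9), a hand-written basic Nelder-Mead in place of a library optimizer, and clipping of parameters to the box Pu because plain Nelder-Mead is unconstrained.

```python
import math
import random
from typing import Callable, List, Optional, Tuple


def simulate(amp: float, t_step: float, horizon: float = 10.0, dt: float = 0.05) -> List[float]:
    """Hypothetical plant dx/dt = -x + u under forward Euler, driven by the step input
    u(t) = amp for t >= t_step (0 before); (amp, t_step) play the role of v(k)."""
    x, trace = 0.0, []
    for k in range(int(round(horizon / dt)) + 1):
        trace.append(x)
        u = amp if k * dt >= t_step else 0.0
        x += dt * (-x + u)
    return trace


def robustness(trace: List[float], c: float = 0.9) -> float:
    """Satisfaction value of always_[0,10] (x < c) at time 0: min over samples of c - x(t)."""
    return min(c - x for x in trace)


def nelder_mead(f: Callable[[List[float]], float], x0: List[float], step: float = 0.1,
                iters: int = 100, alpha: float = 1.0, gamma: float = 2.0,
                beta: float = 0.5, sigma: float = 0.5) -> Tuple[List[float], float]:
    """Basic Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    vals = [f(p) for p in simplex]
    for _ in range(iters):
        order = sorted(range(n + 1), key=vals.__getitem__)
        simplex, vals = [simplex[i] for i in order], [vals[i] for i in order]
        cen = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        xr = [c + alpha * (c - w) for c, w in zip(cen, worst)]
        fr = f(xr)
        if vals[0] <= fr < vals[-2]:
            simplex[-1], vals[-1] = xr, fr
        elif fr < vals[0]:
            xe = [c + gamma * (r - c) for c, r in zip(cen, xr)]
            fe = f(xe)
            simplex[-1], vals[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            xc = [c + beta * (w - c) for c, w in zip(cen, worst)]
            fc = f(xc)
            if fc < vals[-1]:
                simplex[-1], vals[-1] = xc, fc
            else:  # shrink every vertex toward the best one
                best = simplex[0]
                simplex = [best] + [[b + sigma * (q - b) for b, q in zip(best, p)]
                                    for p in simplex[1:]]
                vals = [vals[0]] + [f(p) for p in simplex[1:]]
    i = min(range(n + 1), key=vals.__getitem__)
    return simplex[i], vals[i]


def falsify(n_init: int = 5, seed: int = 0) -> Tuple[Optional[List[float]], float]:
    lo, hi = [0.0, 0.0], [1.0, 5.0]          # P_u: amp in [0, 1], t_step in [0, 5]

    def clip(k: List[float]) -> List[float]:
        return [min(max(v, l), h) for v, l, h in zip(k, lo, hi)]

    def cost(k: List[float]) -> float:      # rho(phi, S(g(v(k))), 0)
        return robustness(simulate(*clip(k)))

    rng = random.Random(seed)
    best_k: Optional[List[float]] = None
    best_rho = math.inf
    for _ in range(n_init):                  # step 2: uniform random starting valuations
        start = [rng.uniform(l, h) for l, h in zip(lo, hi)]
        k, r = nelder_mead(cost, start)      # step 3: local Nelder-Mead search
        if r < best_rho:
            best_k, best_rho = clip(k), r
    return best_k, best_rho                  # step 4: falsified iff best_rho < 0
```

With a large enough step amplitude, the state exceeds 0.9 within the horizon, so the returned minimum is negative and the corresponding input is a counterexample. As the text notes, Ninit and the optimizer settings trade off global exploration against local refinement.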
For example, if permissible input signals are step functions, then the input parameters would characterize the amplitude of the step and the time at which the step input is applied. Note that g does not necessarily generate all possible inputs to the system. However, it provides a generic way to restrict the search space of possible input signals. One motivation for implementing a falsification module in Breach was to get more flexibility in the definition of input parameters than was available in the version of S-Taliro that we used. In the experimental section, we discuss results using both S-Taliro-based falsification and the above algorithm. In particular, we found that the choice of input parameters and of Ninit, and the tuning of the Nelder-Mead algorithm (which provides a trade-off between global randomized exploration and local optimization), were crucial for the performance of the falsifier.
PARAMETER SYNTHESIS
We now discuss the function FindParam. Recall that, given a trace x, we need to find a valuation v for the parameters p1, . . . , pn of ϕ such that x satisfies ϕ(v(p1), . . . , v(pn)) (abbreviated as ϕ(v) in the following). Here we restrict our attention to one trace, although in the mining process FindParam has to work on a set of traces; the generalization to multiple traces is straightforward. This problem is a dual of the falsification problem (4.9), formulated as:
However, there is an important difference: here the cost function can be expressed as a closed-form expression of the decision variable v, whereas in (4.9) it can only be evaluated as a function of u by simulating S. By taking advantage of this knowledge, (5.1) can be solved more efficiently, in particular, as we will see, if formulas satisfy the important property of monotonicity: Definition 5.1. A PSTL formula ϕ(p1, · · · , pn) is monotonically increasing with respect to pi if for every signal x,
$$\forall v, v' :\ \big(v'(p_i) \ge v(p_i) \ \wedge\ \forall j \ne i,\ v'(p_j) = v(p_j)\big) \implies \big(x \models \varphi(v) \implies x \models \varphi(v')\big)$$
It is monotonically decreasing if this holds when replacing v′(pi) ≥ v(pi) with v′(pi) ≤ v(pi).
In the second part of this section, we characterize this notion more precisely. We impose an additional constraint on the parameter synthesis problem: we require that the STL formula mined be "tightly" satisfied by the system up to a given precision δ > 0. Formally:
The rationale is that, to be useful, a specification should not be too conservative. The implication is that it is not enough to find a satisfying valuation; we also need to optimize it for each parameter to get δ-satisfaction. If there is more than one parameter, the solution is not unique. In fact, all valuations that are at distance δ from the boundary of the validity domain of ϕ and x (the set of valuations v for which x ⊨ ϕ(v)) are valid solutions. In [6], the authors note that if the formula is monotonic, then this boundary has the properties of a Pareto surface, for which there are efficient computational methods, essentially equivalent to multi-dimensional binary search. Here we propose an algorithm (Algorithm 1) for monotonic formulas that takes advantage of this property, and implement it in the Breach tool. Algorithm 1 starts by trying to find a valuation v⊤ that satisfies the property and a valuation v⊥ that violates it in a parameter range P provided by the user. By monotonicity, it is sufficient to check the corners of P for the existence of v⊤ and v⊥. Then, each parameter pi is adjusted using a binary search initialized with v⊤(pi) and v⊥(pi). The user can choose which parameters to optimize first by specifying a priority ordering for the parameters. Note that different orderings can give drastically different results. See Fig. 4. The algorithm will return different values depending on the tightness parameter δ and on whether we order the parameters as (π, τ) or (τ, π).
Here, the order represents the preference in optimizing a parameter over the other when mining for a tight specification.
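A compact Python sketch of the corner check and per-parameter binary search described for Algorithm 1 follows. It is an illustration under assumptions, not the Breach implementation: `sat` is taken to be an exact Boolean satisfaction oracle, each parameter's range is given as an (easy, hard) pair that encodes its known monotonicity direction, and `synth_tight`, the trace x(t) = t, and the template □[0,τ](x < π) are hypothetical. That template is increasing in π and decreasing in τ, so the two orderings pull the result toward opposite corners.

```python
from typing import Callable, Dict, List, Optional, Tuple

Valuation = Dict[str, float]


def synth_tight(sat: Callable[[Valuation], bool],
                ranges: Dict[str, Tuple[float, float]],
                order: List[str],
                delta: float) -> Optional[Valuation]:
    """ranges[p] = (easy, hard): moving p from easy toward hard makes phi harder to satisfy."""
    v_top = {p: easy for p, (easy, _) in ranges.items()}  # candidate satisfying corner
    v_bot = {p: hard for p, (_, hard) in ranges.items()}  # candidate violating corner
    if not sat(v_top):
        return None          # by monotonicity, no valuation in the range is satisfied
    if sat(v_bot):
        return v_bot         # even the strongest corner holds
    v = dict(v_top)
    for p in order:          # priority ordering chosen by the user
        good, bad = v[p], ranges[p][1]
        if sat({**v, p: bad}):
            v[p] = bad
            continue
        while abs(bad - good) > delta:  # keep `good` satisfied and `bad` violated
            mid = (good + bad) / 2
            if sat({**v, p: mid}):
                good = mid
            else:
                bad = mid
        v[p] = good
    return v


TIMES = [0.5 * k for k in range(21)]  # 0, 0.5, ..., 10
XS = list(TIMES)                      # hypothetical trace x(t) = t


def sat(v: Valuation) -> bool:
    """always_[0,tau] (x < pi) on the samples."""
    return max(x for t, x in zip(TIMES, XS) if t <= v["tau"]) < v["pi"]


RANGES = {"pi": (10.0, 0.0), "tau": (0.0, 10.0)}
v1 = synth_tight(sat, RANGES, ["pi", "tau"], 0.01)  # tighten pi first
v2 = synth_tight(sat, RANGES, ["tau", "pi"], 0.01)  # tighten tau first
print(v1, v2)
```

With the order (π, τ), π is pushed down to about δ first, which leaves τ just below the first sample after 0; with (τ, π), τ is pushed almost to 10 and π ends just above 9.5.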
SATISFACTION MONOTONICITY
We first show that checking if an arbitrary PSTL formula is monotonic in a given parameter is undecidable.
Theorem 6.1. The problem of checking if a PSTL formula ϕ(p) is monotonic in a given parameter pi is undecidable.
Proof. First, we observe that STL is a superset of MTL. We know from [1] that the satisfiability problem for MTL is undecidable. Thus, it follows that the satisfiability problem for STL is also undecidable. This, in turn, implies undecidability of the satisfiability problem of PSTL with at most one parameter (denoted as PSTL-1-SAT). We now show that PSTL-1-SAT can be reduced to a special case of the problem of checking monotonicity of a PSTL formula.
Let ϕ(p) be an arbitrary PSTL formula where the set of parameters p is the singleton set containing one time parameter τ (thus, τ ≥ 0). Construct the formula ψ(p) ≐ (τ = 0) ∨ ϕ(p). Consider the monotonicity query for ψ(p) in the parameter τ:
$$\forall v, v', x :\ \big(v'(\tau) \ge v(\tau) \ \wedge\ x \models \psi(v(\tau))\big) \implies x \models \psi(v'(\tau))$$
Consider the specialization of this formula to the case v(τ) = 0. Note that, in this case, ψ(0) = ⊤, and that v′(τ) ≥ 0 for all v′. Thus, the query simplifies to ∀v′, x : x ⊨ ψ(v′(τ)), i.e., checking the validity of the PSTL formula ψ(τ).
Hence, in this special case, checking the monotonicity of the PSTL formula ψ in the single parameter τ amounts to checking that the negation of ψ(τ) is unsatisfiable. This specialization of the problem of checking the monotonicity of PSTL formulas is therefore undecidable, implying undecidability of the general case.
Monotonicity is closely related to the notion of polarity introduced in [6] , in which syntactic deductive rules are given to decide whether a formula is monotonic based on the monotonicity of its subformulae. Thus, one way to tackle undecidability is to first query if the given PSTL formula belongs to the syntactic class described in [6] . Unfortunately, the syntactic rules described therein are not complete; there are monotonic PSTL formulas that do not belong to this syntactic class, for instance, formulas with intervals in which both end-points are parameterized, such as the following:
Next, we show how we can use SMT solving to query monotonicity of a formula. If the SMT solver succeeds, it tells us that the formula is monotonic and allows us to use a more efficient search in the parameter space. For instance, we were able to show that the PSTL formula represented in (6.1) is monotonically decreasing in the parameter τ .
Encoding PSTL as constraints. Given a PSTL formula ϕ, we define the SMT encoding of ϕ in a fragment of first-order logic with real arithmetic and uninterpreted functions. Let E(ϕ) denote the encoding of ϕ, which we define inductively as follows:
• Consider a constraint µ ≐ g(x) > π, where x = (x1, . . . , xn). We model each signal xi as an uninterpreted function χi from ℝ to ℝ. We create a new free variable t of type Real and replace each instance of the signal xi in g(x) by χi(t). We assume that the function g itself has a standard SMT encoding. For example, consider the formula g(x) > π, where x = (x1, x2) and g(x) = 2 * x1 + 3 * x2. Then E(µ) is 2 * χ1(t) + 3 * χ2(t) > π.
• For Boolean operations, the SMT encoding is inductively applied to the subformulas, i.e., if ϕ = ¬ϕ1, then E(ϕ) = ¬E(ϕ1). If ϕ = ϕ1∧ϕ2, then first we ensure that if E(ϕ1) and E(ϕ2) both have a free time-domain variable, then we make it the same variable, and then, E(ϕ) = E(ϕ1) ∧ E(ϕ2). Note that as a consequence, there is at most one free time-domain variable in any subformula.
• Consider ϕ = H(a,b)(ϕ1), where a, b are constants or parameters, and H is a unary temporal operator (i.e., ◇ or □). There are two possibilities:
(1) The SMT encoding E(ϕ1) has one free variable t. In this case, we bound the variable t over the interval (a, b) using a quantifier that depends on the type of the temporal operator H: with ◇ we use ∃, and with □ we use ∀.
(2) The SMT encoding E(ϕ1) has no free variable. This can only happen if ϕ1 is ⊤ or ⊥, or if all variables in ϕ1 are bound. In the former case, the encoding is done exactly as in Case 1. In the latter case, the encoding proceeds as before, but all bound variables in the scope are additionally offset by the top-level free variable. Suppose ϕ = □(0,∞) ◇(1,2) (x > 10). Then the encoding of the inner ◇-subformula has no free variable. Note how the bound variable of this formula is offset by the top-level free variable in the underlined portion of E(ϕ) below:
• Consider ϕ = ϕ1U (a,b) ϕ2, where a, b are constants or parameters. For simplicity, consider the case where ϕ1 and ϕ2 have no temporal operators, i.e., E(ϕ1) and E(ϕ2) both have exactly one free variable each. Let t1 be the free variable in E(ϕ1) and t2 the free variable in E(ϕ2). Then E(ϕ) is given by the formula:
If ϕ1, ϕ2 contain no free variables, then t1, t2 are respectively used to offset all bound variables in their scope as before.
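The encoding rules above can be prototyped as a small recursive translator that emits SMT-LIB2 text for a solver such as Z3. The sketch below is an illustration under our own assumptions, not the implementation used in our experiments: formulas are nested Python tuples (a hypothetical representation), atoms are linear constraints, the until operator is omitted, and instead of tracking a free variable explicitly, each call receives the current time as an SMT term. Passing that term down is what offsets the variables bound by nested ◇ and □ operators by the enclosing variable, as described in Case 2.

```python
"""Sketch of the SMT encoding E(phi) for a PSTL fragment (no until operator).

Formulas are nested tuples (a hypothetical representation):
  ("gt", {"x1": 2, "x2": 3}, "tau")   2*x1 + 3*x2 > tau
  ("not", f)   ("and", f, g)
  ("ev", a, b, f)    eventually over (a, b)
  ("alw", a, b, f)   always over (a, b); b=None means b = infinity
Each signal x is mapped to an uninterpreted function chi_x : Real -> Real.
"""


def _conj(parts):
    # Avoid emitting a one-argument `and`.
    return parts[0] if len(parts) == 1 else f"(and {' '.join(parts)})"


def _offset(now, var):
    # Time term of a bound variable, offset by the enclosing time `now`.
    return var if now == "0" else f"(+ {now} {var})"


def encode(phi, now="0", depth=1):
    """Return an SMT-LIB2 term stating that `phi` holds at time term `now`."""
    op = phi[0]
    if op == "gt":
        _, coeffs, rhs = phi
        terms = [f"(chi_{s} {now})" if c == 1 else f"(* {c} (chi_{s} {now}))"
                 for s, c in coeffs.items()]
        lhs = terms[0] if len(terms) == 1 else f"(+ {' '.join(terms)})"
        return f"(> {lhs} {rhs})"
    if op == "not":
        return f"(not {encode(phi[1], now, depth)})"
    if op == "and":  # both conjuncts share the same time term
        return f"(and {encode(phi[1], now, depth)} {encode(phi[2], now, depth)})"
    if op in ("ev", "alw"):
        _, a, b, sub = phi
        var = f"t{depth}"  # one name per nesting level; siblings get separate binders
        bounds = [f"(< {a} {var})"] + ([] if b is None else [f"(< {var} {b})"])
        body = encode(sub, _offset(now, var), depth + 1)
        if op == "ev":  # eventually -> existential quantifier
            return f"(exists (({var} Real)) {_conj(bounds + [body])})"
        # always -> universal quantifier guarding the body with the interval
        return f"(forall (({var} Real)) (=> {_conj(bounds)} {body}))"
    raise ValueError(f"unsupported operator: {op!r}")


def smt_script(phi, signals, params=()):
    """Wrap E(phi) with declarations so that it can be passed to an SMT solver."""
    decls = [f"(declare-fun chi_{s} (Real) Real)" for s in signals]
    decls += [f"(declare-const {p} Real)" for p in params]
    return "\n".join(decls + [f"(assert {encode(phi)})", "(check-sat)"])


if __name__ == "__main__":
    # always_(0,inf) eventually_(1,2) (x > 10)
    print(smt_script(("alw", 0, None, ("ev", 1, 2, ("gt", {"x": 1}, 10))), ["x"]))
```

For instance, encoding □(0,∞) ◇(1,2) (x > 10) places the inner bound variable t2 shifted by t1 inside the signal function, i.e., the term `(chi_x (+ t1 t2))`.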
Table 2: Results on mining for the automatic transmission control model. We compare runs of the mining algorithm using either S-Taliro or Breach as the falsifier. For each case and each template formula, we give the parameter valuations found, the time spent in falsification and in parameter synthesis, the number of simulations, and the average time spent computing the quantitative satisfaction of the formula by one trace.
Using an SMT solver to check monotonicity. To check whether ϕ is monotonic in a parameter τ, we introduce a second copy τ′ of the parameter and check the satisfiability of the two assertions below:
$\tau < \tau' \wedge E(\varphi(\tau)) \wedge \neg E(\varphi(\tau'))$
$\tau < \tau' \wedge \neg E(\varphi(\tau)) \wedge E(\varphi(\tau'))$
If either of these queries is unsatisfiable, then satisfaction of ϕ is indeed monotonic in τ. If both queries are satisfiable, then there is an interpretation of the (uninterpreted) function representing the signal x and valuations for τ, τ′ that demonstrate the non-monotonicity of ϕ. We conclude by presenting in Table 1 a small sample of formulas for which we could prove or disprove monotonicity using the Z3 SMT solver [8]. The symbols +, -, and * denote monotonically increasing, monotonically decreasing, and non-monotonic formulas, respectively.
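An SMT solver is needed here because simulation can only refute monotonicity, never prove it. To illustrate what the two queries ask for, the sketch below searches sampled traces and parameter values for witnesses τ < τ′ of each query: witnesses for both correspond to a * entry, whereas a missing witness only makes + or - a candidate that still requires the SMT proof. The traces (lists of samples at unit time steps) and the two example formulas are hypothetical.

```python
"""Simulation-based search for non-monotonicity witnesses (a sketch).

A witness for the first query is tau < tau2 with phi(tau) true and
phi(tau2) false; for the second query, the reverse.  Finding both shows
non-monotonicity; finding neither proves nothing.
"""


def always_below(trace, tau):
    """Hypothetical example formula: always (x < tau)."""
    return all(x < tau for x in trace)


def sliding_window(trace, tau):
    """Hypothetical example formula: eventually_[tau, tau+1] (x > 10)."""
    return any(x > 10 for t, x in enumerate(trace) if tau <= t <= tau + 1)


def find_witnesses(sat, traces, taus):
    """Return (witness_1, witness_2) for the two queries, or None for each."""
    w1 = w2 = None
    taus = sorted(taus)
    for trace in traces:
        for i, lo in enumerate(taus):
            for hi in taus[i + 1:]:
                a, b = sat(trace, lo), sat(trace, hi)
                if a and not b and w1 is None:
                    w1 = (trace, lo, hi)  # refutes "monotonically increasing"
                if b and not a and w2 is None:
                    w2 = (trace, lo, hi)  # refutes "monotonically decreasing"
    return w1, w2


def classify(sat, traces, taus):
    """'*' = non-monotonic (shown by witnesses); '+'/'-' = candidates only."""
    w1, w2 = find_witnesses(sat, traces, taus)
    if w1 and w2:
        return "*"
    if w2:
        return "+"  # increasing not refuted on these samples; needs an SMT proof
    if w1:
        return "-"  # decreasing not refuted on these samples; needs an SMT proof
    return "?"      # satisfaction never changed on these samples
```

On a trace with a single spike, the sliding window is satisfied only for τ near the spike, so both witnesses exist and the formula is classified as *.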
CASE STUDIES
In what follows, we present our evaluation of both S-Taliro and an extension of Breach for falsification. We also show the performance of the parameter synthesis algorithm implemented with the robust satisfaction engine of Breach. We use the transmission controller model to benchmark the different options within our approach.
Automatic Transmission Model
For the model described in Sec. 2, we tested different template requirements:
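1. Requirement ϕsp_rpm(π1, π2), specifying that the speed is always below π1 and the RPM is always below π2:
$\varphi_{sp\_rpm}(\pi_1, \pi_2) = \square\,(\mathit{speed} < \pi_1 \wedge \mathit{RPM} < \pi_2)$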
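2. Requirement ϕrpm100(τ, π), specifying that the vehicle cannot reach a speed of 100 mph within τ seconds while keeping the RPM always below π:
$\varphi_{rpm100}(\tau, \pi) = \neg\big(\lozenge_{[0,\tau]}(\mathit{speed} > 100) \wedge \square\,(\mathit{RPM} < \pi)\big)$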
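3. Requirement ϕstay(τ), specifying that whenever the system shifts to gear 2, it dwells in gear 2 for at least τ seconds:
$\varphi_{stay}(\tau) = \square\big((\mathit{gear} \neq 2 \wedge \lozenge_{[0,\varepsilon]}(\mathit{gear} = 2)) \Rightarrow \square_{[\varepsilon,\tau]}(\mathit{gear} = 2)\big)$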
Here, the left-hand side of the implication captures the event of the transition into gear 2 from another gear. The operator ◇[0,ε] here is an MTL substitute for a next-time operator. With dense-time semantics, ε should be an infinitesimal quantity, but in practice, we use a value close to the simulation time-step.
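For experimentation, the quantitative satisfaction of the first two templates, and the tightest τ for ϕstay, can be computed on sampled traces as sketched below. We assume the standard STL robust semantics (the margin of an atom, min for conjunction and □, max for ◇, sign flip for negation), which may differ in details from the engines in Breach and S-Taliro; the function names and the list-based trace representation are hypothetical.

```python
"""Robust satisfaction of the transmission templates on sampled traces (a sketch).

Standard STL robustness is assumed: atom a < c -> c - a, conjunction -> min,
negation -> negation, eventually -> max over the window, always -> min.
"""


def rob_sp_rpm(speed, rpm, pi1, pi2):
    """Robustness of always(speed < pi1 and RPM < pi2)."""
    return min(min(pi1 - s, pi2 - r) for s, r in zip(speed, rpm))


def rob_rpm100(times, speed, rpm, tau, pi):
    """Robustness of not(eventually_[0,tau](speed > 100) and always(RPM < pi))."""
    reach = max(s - 100 for t, s in zip(times, speed) if t <= tau)
    low_rpm = min(pi - r for r in rpm)
    return -min(reach, low_rpm)


def min_dwell(times, gear, g=2):
    """Shortest completed stay in gear g (None if no stay is completed).

    The smallest dwell approximates the tightest tau for phi_stay; stays still
    in progress at the end of the trace are ignored.
    """
    dwells, entered = [], None
    for i in range(1, len(gear)):
        if gear[i] == g and gear[i - 1] != g:
            entered = times[i]  # transition into gear g
        elif gear[i] != g and gear[i - 1] == g and entered is not None:
            dwells.append(times[i] - entered)  # transition out of gear g
            entered = None
    return min(dwells) if dwells else None
```

A robustness value defined on the integer-valued gear signal would be piecewise constant, so the dwell time is computed directly from mode switches in this sketch.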
The above requirements have strong correlation with the quality of the controller. The first is a safety requirement characterizing the operating region for the engine parameters speed and RPM. The second is a measure of the performance of the closed loop system. By mining values for τ , we can determine how fast the vehicle can reach a certain speed, while by mining π we find the lowest RPM needed to reach this speed. The third requirement encodes undesirable transient shifting of gears. Rapid shifting causes abrupt output torque changes leading to a jerky ride.
Results on the mined specifications are given in Table 2. We used the Z3 SMT solver [8] to show that all of the requirements were monotonic. As expected, the FindParam algorithm accounts for only a fraction of the total time of the mining process. For the second template, we tried two possible orderings of the parameters. By prioritizing the time parameter τ, we obtained the δ-tight requirement that the vehicle cannot reach 100 mph in less than 12.2 s (with δ set to 0.1). Since the mined requirement is δ-tight, we also found a trace for which the vehicle reaches 100 mph in 12.3 s. Similarly, by prioritizing the scale parameter π, we found that the vehicle could reach 100 mph in 50 s while keeping the RPM below 3278 (with δ = 5 in that case). For the third requirement, we found that the transmission controller could trigger a transient shift as short as 0.112 s, which corresponds to the up-shifting sequence 1-2-3. Using a variant of the requirement (not shown here), we verified that a (definitely undesirable) short transient sequence of the form 1-2-1 or 3-2-3 was not possible.
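The notion of δ-tightness can be illustrated with a plain bisection over a single monotonic parameter. This is a generic sketch, not the FindParam algorithm: it assumes a hypothetical oracle `holds(tau)` that reports whether the candidate requirement survived falsification, and that satisfaction is monotonically decreasing in the parameter, as for the time parameter τ of ϕrpm100.

```python
def delta_tight(holds, lo, hi, delta):
    """Bisection for one parameter of a monotonically decreasing template.

    `holds` is a hypothetical oracle: holds(v) is True when no counterexample
    to phi(v) is known.  Requires holds(lo) and not holds(hi).  Returns
    (lo, hi) with holds(lo), not holds(hi) and hi - lo <= delta, so phi(lo)
    is a delta-tight requirement and hi comes with a counterexample.
    """
    if not holds(lo) or holds(hi):
        raise ValueError("expected holds(lo) and not holds(hi)")
    while hi - lo > delta:
        mid = (lo + hi) / 2
        if holds(mid):
            lo = mid  # requirement still holds: move the lower end up
        else:
            hi = mid  # counterexample exists: move the upper end down
    return lo, hi
```

Each step halves the interval, so the number of oracle calls (and hence falsification runs) grows only logarithmically in (hi - lo)/δ.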
The comparison between the S-Taliro and Breach falsifiers shows better overall performance for the extended Breach-based falsifier, in the sense that it found stronger requirements using fewer simulations and less computation time. However, we cannot conclude that the new falsifier will always outperform S-Taliro, due to the stochastic nature of the problem and the lack of a thorough comparison with the different flavors of optimization used within S-Taliro. Based on the results shown in Table 2 and our experience, we make some observations:
• The space of input signals needs to be parameterized with a sensible number of signal-parameters. If too many parameters are used, the search space is too big and falsification becomes difficult. For instance, the short transient shifting of ϕstay was found by introducing a signal-parameter controlling the time of initial acceleration, and by preventing acceleration and braking at the same time. We remark that extending Breach to enforce such constraints over the input signal space is a key reason for its better performance, and a fair comparison would be possible only after repeating these steps for S-Taliro.
• Requirements involving discrete modes are challenging because they induce "flat" quantitative satisfaction functions, which are hard for optimizers and thus of limited value in guiding the falsifier. This is related to the problem of finding a good metric between discrete states in hybrid systems, and it was particularly an issue when mining the ϕstay requirement. We were able to tune our falsifier by turning off its local optimization phase and using uniform random sampling, which yielded a tighter requirement than the one obtained with S-Taliro.
• We found that while both falsifiers are expected to exhibit run-times linear in the size of the traces and the formula [11, 14], Breach runs faster in some cases. In particular, S-Taliro is more sensitive to parameter priorities: for the same template ϕrpm100, its performance differs depending on whether τ or π is prioritized. This can be explained by the fact that τ affects the horizon of the temporal operator ◇. We conjecture that the difference in run-times and mined parameter values for the ϕstay template is due to our inability to express signal-parameterization in S-Taliro, but these comparisons require more dedicated studies on various benchmarks before drawing firm conclusions.
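The counterexample-guided loop, combined with a uniform random-sampling falsifier like the one that worked best for ϕstay, can be sketched on a toy problem. Everything below is hypothetical: a one-parameter template □(y < π), a made-up first-order plant, and an input space parameterized by two signal-parameters (step amplitude and switch time). It is neither Breach nor S-Taliro; it only shows the structure of candidate synthesis from traces, falsification, and refinement with counterexamples.

```python
import random


def falsify(simulate, rob, sample_input, n, rng):
    """Uniform random-sampling falsifier: simulate n random inputs and return
    the least robust trace if its robustness is negative, else None."""
    best_r, best_trace = float("inf"), None
    for _ in range(n):
        trace = simulate(sample_input(rng))
        r = rob(trace)
        if r < best_r:
            best_r, best_trace = r, trace
    return best_trace if best_r < 0 else None


def mine_peak_bound(simulate, sample_input, delta=0.01, n=200, seed=0, max_iter=50):
    """Counterexample-guided mining of pi in the toy template always(y < pi)."""
    rng = random.Random(seed)
    traces = [simulate(sample_input(rng))]
    for iteration in range(1, max_iter + 1):
        # Tightest candidate consistent with all traces seen so far.
        pi = max(max(tr) for tr in traces) + delta
        # Robustness of always(y < pi); negative means a violation.
        rob = lambda tr, pi=pi: min(pi - y for y in tr)
        cex = falsify(simulate, rob, sample_input, n, rng)
        if cex is None:
            return pi, iteration  # falsifier found no counterexample: stop
        traces.append(cex)        # refine the candidate with the counterexample
    raise RuntimeError("no candidate survived falsification within max_iter")


def toy_simulate(u):
    """Hypothetical plant: first-order step response of amplitude amp at t_on."""
    amp, t_on = u
    return [amp * (1 - 0.8 ** (k - t_on)) if k >= t_on else 0.0 for k in range(20)]


def toy_inputs(rng):
    """Two signal-parameters: step amplitude in [0, 2], switch time in 0..10."""
    return rng.uniform(0.0, 2.0), rng.randint(0, 10)
```

Keeping the input space to a couple of signal-parameters is what makes uniform sampling viable here; with many more parameters, the same number of samples would cover the space far more sparsely.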
Diesel Engine Model
Next, we consider an industrial-scale, closed-loop Simulink model of an experimental airpath controller for a diesel engine. It has more than 4000 Simulink blocks, such as data store memories, integrators, 2D lookup tables, functional blocks with arbitrary Matlab functions, S-Function blocks, and blocks that induce switching behavior such as level-crossing detectors, multiports, and saturation blocks. The model takes two signals as input: the fuel injection rate and the engine speed. The output signal is the intake manifold pressure, denoted by x. For proprietary reasons, we suppress the mined values of the parameters and the time-domain constants from our requirements, replacing the time-domain constants by symbols such as c1 and c2.
Discussions with control designers revealed that characterizing the overshoot behavior of the intake manifold pressure is important. The inputs to the closed-loop model are a step function for the fuel injection rate at time c1 and a constant value for the engine speed. The first requirement template is:
$\square_{(c_1,\infty)}\,(x < \pi)$
This template characterizes the requirement that the signal x never exceeds π during the time interval (c1, ∞); mining it yields the maximum peak value π of the step response. Our mining algorithm obtained 7 intermediate candidate requirements that were falsified by S-Taliro before, in the 8th iteration, it found a requirement that S-Taliro could not falsify. The total number of simulations was 7000, over a period of 13 hours.
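For the overshoot template, the parameter-synthesis step on a finite set of sampled traces reduces to taking the largest value of x after c1, as sketched below under the assumption that all traces share a common time grid. A small δ > 0 is added so that the strict inequality holds; the helper name is hypothetical, and the returned index identifies the trace that falsifies any candidate π no larger than the observed peak.

```python
def tightest_peak_bound(times, traces, c1, delta):
    """Smallest pi (up to delta > 0) such that every sampled trace satisfies
    always_(c1, inf)(x < pi), plus the index of the trace attaining the peak."""
    if delta <= 0:
        raise ValueError("delta must be positive for the strict inequality")
    # Peak of each trace restricted to the open interval (c1, inf).
    peaks = [max(x for t, x in zip(times, tr) if t > c1) for tr in traces]
    worst = max(range(len(traces)), key=peaks.__getitem__)
    return peaks[worst] + delta, worst
```

Samples at or before c1 (for example, values before the fuel-injection step) do not influence the result.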
Next, we chose to mine the settling behavior of the signal. The settling time is the time after which the amplitude of the signal always stays within a small error of its calculated ideal reference value. We wish to mine both the error and how fast the signal settles. Such a template requirement is given by the following PSTL formula:
$\varphi_{settling\_time}(\tau, \pi) = \square_{(\tau,\infty)}\,(|x| < \pi)$
It specifies that the absolute value of x is always less than π from time τ to the end of the simulation. The smaller the settling time and the error, the more stable the system. We learned from the control designer that a smaller settling time should be prioritized over a smaller error (as long as the error lies within 10% of the signal amplitude), so we prioritize minimizing τ over minimizing π.
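The prioritization of τ over π can be sketched as a lexicographic search on sampled traces: first, with π fixed at the tolerance bound, compute the smallest τ after which every trace stays within the band; then shrink π for that τ. This is an illustrative sketch with hypothetical names; it uses t ≥ τ on samples as an approximation of the open interval, and it returns None when some trace never settles, which resembles the situation reported next for this model.

```python
def settling_time(times, x, pi):
    """Earliest sample time tau with |x(t)| < pi for all samples t >= tau
    (None if the last sample is already outside the band)."""
    tau = None
    for t, v in zip(reversed(times), reversed(x)):
        if abs(v) >= pi:
            break
        tau = t
    return tau


def mine_settling(times, traces, pi_max, delta):
    """Lexicographic mining of always_(tau, end)(|x| < pi): minimize tau first
    (with pi fixed at pi_max), then minimize pi for that tau."""
    taus = [settling_time(times, x, pi_max) for x in traces]
    if any(t is None for t in taus):
        return None  # some trace never settles within the tolerance
    tau = max(taus)  # must hold for every trace
    pi = max(abs(v) for x in traces for t, v in zip(times, x) if t >= tau) + delta
    return tau, pi
```

Fixing π at its largest admissible value in the first step gives τ the most room to shrink, which is what prioritizing the settling time over the error amounts to.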
After 4 iterations, the procedure stops, as the inferred value for τ is very close to the end of the simulation trace, but the error is still larger than the tolerance. The implication is that the algorithm pushed the falsifier toward a behavior of the model that exhibits hunting, i.e., oscillations of magnitude exceeding the tolerance. This output signal is shown in Fig. 5. This behavior was unexpected; discussions with the designers revealed that it was a real bug. Investigating further, we traced the root cause to an incorrect value in a lookup table; such lookup tables are commonly used to reduce computation time by storing pre-computed values approximating the control law.
Fig. 5: The simulation trace x (in blue), denoting the difference between the intake manifold pressure and its reference value, found when mining ϕsettling_time(τ, π); it displays unstable behavior. The maximum error threshold is depicted in red.
This experiment demonstrates the use of requirement mining as an advanced, guided debugging strategy. Instead of verifying correctness with a concrete formal requirement, the process of trying to infer what requirement a model must satisfy can reveal erroneous behaviors that could be otherwise missed. In the course of our experiments, we encountered other suspicious (for instance Zeno-like) behaviors, which we suspect to be either an error in the model, or an improper tuning of the numerical solver leading to discontinuities in the dynamics.
RELATED WORK
Mining requirements from programs and circuits is well-studied in computer science [3, 4, 12, 18, 19, 26, 27, 32]. In computer science, the word "requirement" is often synonymous with "specification". Specification mining techniques vary based on the kind of specifications mined; examples include automata, temporal rules, and sequence diagrams. They also vary based on the input to the miner: techniques based on static analysis or model checking operate on the source code, while dynamic techniques mine from execution traces. Mining temporal rules [3, 27] involves learning an automaton that captures the temporal behavior and typically focuses on API usage in libraries. The individual components within such libraries are often terminating programs, and specification automata encode legal interaction patterns between components. In contrast to the software world, where most programs have discrete-time semantics, the behavioral requirements that we mine are for systems with both continuous and discrete-time semantics. It may be worthwhile to see whether automata-based mining could be adapted to the hybrid systems domain. The work closest to the proposed approach appears in [33], in which the authors introduce Parametric MTL (PMTL), which adds a single time or scale parameter to MTL formulas. This parameter is then estimated using stochastic optimization within S-Taliro. We remark that we provide a way to reason about the monotonicity of PSTL formulas with an arbitrary number of parameters, and we also allow mining non-monotonic PSTL formulas (albeit less efficiently).
To the best of our knowledge, this work is among the first to address the specification mining problem for cyber-physical systems. From a broader perspective, the literature reports several attempts to apply formal methods to industrial-scale block-based design tools such as Simulink. In [10], the authors verify simple safety properties using sensitivity analysis. Other approaches transform Simulink diagrams into models amenable to formal verification [31, 34]. When successful, such approaches provide very strong guarantees. However, the types of blocks that can be handled are usually limited, and we are not aware of scalable analysis tools for models representing hybrid systems.
