Model Checking is well established as a verification technique for finite-state systems. Several important types of systems, such as protocols parameterized by the number of processes, are however inherently infinite-state, hence Model Checking cannot be applied directly to determine correctness of the system. We present here a case study on the verification of such a parameterized protocol, the SAE-J1850 data transfer protocol. This is a standard in the automobile industry, where it is used to transmit data between various sensors and micro-controllers in an automobile. The protocol communicates data over a single-wire bus, and provides on-the-fly arbitration between competing transmissions. Our verification effort is interesting in several respects: it proves correctness for arbitrary instances, is largely automated, and uses abstraction in an essential way. The abstractions used are exact, in the sense that a property is true of the parameterized protocol iff it is true of the finite-state abstraction.
Introduction
Model Checking [CE 81] (cf. [QS 82, CES 86]) is well established as a verification technique for finite-state systems. Many communication protocols, however, are parameterized by the number of processes, which induces an infinite family of (usually) finite-state instances. While Model Checking may be used to verify correctness of individual instances, this does not provide any guarantee that the entire family of instances is correct. Thus it is an important research task to develop algorithms and semi-algorithmic procedures to verify parameterized systems. The general problem is known to be undecidable [AK 86]; however, algorithms exist for specific types of systems (cf. [GS 92, EN 95, EN 96]), and semi-algorithmic procedures have been proposed to deal with general systems (cf. [CG 87, SG 89, KM 89, WL 89, PD 95, CGJ 95]). We present here a case study on the verification of an industrial standard parameterized protocol. The protocol is called the SAE-J1850 protocol [SAE 92], and is an automobile industry standard for transmitting data between various sensors and controllers in an automobile. The system consists of a single-wire bus, to which several controllers (units) are attached. Since the bus is a single wire, the symbols 0 and 1 are encoded by both the length and the value of a bus pulse. For instance, a 0 may be sent with either a long high or a short low pulse. Several units may transmit concurrently; the protocol incorporates a distributed, on-the-fly arbitration mechanism which ensures that only the units transmitting the highest priority message succeed. Priority between messages (strings over {0, 1}) is determined by lexicographic order, where the symbol 0 has priority over the symbol 1. The protocol is correct if it ensures that the arbitration mechanism functions correctly.
We should note here that the protocol as described in [SAE 92] has other higher-level functionality, which we have not considered, in order to concentrate our attention on the core arbitration question. The protocol is further complicated by the presence of arbitrary but bounded delays in the units' sensing of changes in the global bus state. These delays have an electrical origin; they arise typically because of delays in the detection circuitry, and the presence of different bias voltages at the units. To accommodate these delays, "long" and "short" are actually time intervals, whose length is proportional to the maximum delay. Thus the protocol is parameterized both by the maximum delay and by the number of units taking part in it. The verification of the protocol proceeds by two applications of abstraction, one for each parameter. The first abstraction theorem shows a delay independence property of the protocol: an instance of the protocol with n processes and maximum delay $\delta$ (for even $\delta \ge 2$) is correct iff the instance with n processes and maximum delay 2 is correct. Thus, correctness need be proved only for the family of instances with maximum delay 2. The second abstraction uses the algorithm in [EN 96] to handle the parameterization over the number of units in a fully automated manner; the algorithm constructs a finite "abstract graph", which represents the entire family of instances exactly, and over which properties can be model-checked. In the [EN 96] paper, a simplified version of this protocol, without the complexity introduced by the delays, was verified. Modeling the delay not only makes the behavior of the units more complex, but also introduces an additional parameter into the protocol, which is dealt with using the delay independence theorem.
The success of this effort leads us to believe that careful specification of the computational model underlying other protocols will expose constraints that can be utilized, as in this case, for developing decision procedures for large classes of protocols. In addition, this effort exposes a dire need for developing and popularizing notation for expressing such protocols. Remarkably, the SAE-J1850 document does not contain a succinct protocol description; the development of such a description was a major component of this project. The successful verification of the protocol using symbolic methods, despite the theoretical result on PSPACE-completeness of the procedure used [EN 96], is reason to believe that fully automated parameterized verification is feasible for reasonably sized protocols. The rest of the paper is structured as follows: Section 2 describes the various components of the protocol in more detail. Section 3 discusses the abstractions used for handling the parameterization. In Section 4, we describe the implementation of the [EN 96] algorithm and its application to this protocol. Section 5 concludes the paper and provides comparisons with related work.
Protocol Description
The SAE-J1850 protocol is a data transfer protocol over a single-wire bus, which is intended to be used for communication between various sensors and controllers in an automobile. The restriction to a single-wire bus reduces wiring complexity. An instance of the system consists of several units connected to a single bus. The units communicate by broadcasting messages (sequences of symbols from the set {0, 1}) over the bus. Units may transmit concurrently; arbitration takes place during transmission. The arbitration mechanism is defined in terms of priority among symbols: the symbol 0 has higher priority than 1. The priority order among symbols is extended to messages as lexicographic ordering. The key correctness property of this protocol is that the arbitration mechanism works as follows: whenever several units are sending messages concurrently, the message with the highest priority is placed on the bus. As the bus is restricted to a single wire, symbols are encoded essentially by pulses of differing length. For instance, a 0 symbol is encoded either by a "long" high pulse or by a "short" low pulse. The high and low states on the bus are referred to as Dominant and Passive respectively in the SAE-J1850 document [SAE 92], so we will use this terminology in the rest of the paper. The state of the bus is an "or" of the bus states desired by the units: the bus is Dominant if at least one unit drives it Dominant. The protocol is further complicated by non-deterministic, though bounded, delays in the units when sensing a bus change. This delay is caused either by bias voltages or by delays in the detection circuitry. To account for these delays, "long" and "short" are not fixed numbers in the protocol, but are instead non-empty intervals, whose length is proportional to the maximum delay parameter, which we term $\delta$. We will continue to use the symbolic names "long" and "short".
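The wired-"or" bus and the priority order on messages described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the protocol specification: the names `bus_state`, `priority_key` and `highest_priority` are hypothetical, messages are assumed to be equal-length strings over {0, 1}, and pulse lengths and delays are ignored.

```python
from typing import Iterable, List

DOMINANT = "Dominant"  # high bus state
PASSIVE = "Passive"    # low bus state


def bus_state(requests: Iterable[str]) -> str:
    """Wired-"or" bus: Dominant if at least one unit requests Dominant."""
    return DOMINANT if any(r == DOMINANT for r in requests) else PASSIVE


def priority_key(message: str) -> str:
    """Sort key in which a larger key means a higher-priority message.

    Symbol 0 beats symbol 1, so the two characters are swapped and
    ordinary string comparison does the rest (equal-length messages)."""
    return message.translate(str.maketrans("01", "10"))


def highest_priority(messages: List[str]) -> str:
    """Lexicographic maximum of the messages under the order 0 > 1."""
    return max(messages, key=priority_key)


print(bus_state([PASSIVE, DOMINANT, PASSIVE]))     # Dominant
print(highest_priority(["0110", "0101", "1000"]))  # 0101
```

For equal-length messages, `highest_priority` coincides with Python's ordinary `min`, since the character 0 sorts before 1; the explicit key only makes the "0 has priority over 1" convention visible.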
There are four parameters associated with a symbolic length l: Txmin(l), … . Notice that the values increase in increments of $\delta$, and that the least Long value exceeds the largest Short value by $\delta$. The core of the protocol is the following procedure, followed by each unit to transmit a symbol s with symbolic length l at a bus value of b (e.g., 0 as a Long Dominant pulse). Here localbus is the bus value perceived by the unit (which may differ from the actual bus value during a bus transition), request is the bus value desired by the unit at the next cycle, and counter is the internal count of the number of cycles elapsed for this transmission. At the entry to this procedure, localbus = b and counter = 1.
Correctness Properties
The correctness property was stated informally in the protocol description as: whenever several units are transmitting messages concurrently, the message with the highest priority is the one placed on the bus. This property can be stated precisely in CTL as follows. Consider n units connected to the bus, indexed by $i \in [1, n]$. Let M(k) denote the set of message strings (over {0, 1}) of length k. For each i in $[1, n]$, let $msg_i$ denote the message string from M(k) that is associated with unit i. Let $\max_i \{msg_i\}$ denote the maximum message according to the lexicographic order on messages. Let B denote the message that is transmitted on the bus (this may be defined as an auxiliary variable that records symbols as they are transmitted on the bus). Let $tr_i$ be a boolean auxiliary variable that records whether unit i is transmitting. The following CTL formula expresses the property above:
This expression is of finite length for fixed k. Verification of this property for a fixed k requires adding state to each unit to store message contents, which makes the state space intractably large. To solve this problem, we modify the environment of the protocol so that the message sent by a unit is generated on the fly. At any state, let $sent_i$ denote the message sent so far by unit i. The modified correctness property is as follows:
(C1) $AG\big( ((\exists i: tr_i) \wedge B = \varepsilon \wedge (\forall i: tr_i: sent_i = \varepsilon)) \Rightarrow ( AG((\exists i: tr_i \wedge B = sent_i) \wedge (\forall j: sent_j \preceq B)) \wedge \neg(\exists v: AG(B = v \wedge (\exists i: tr_i))) ) \big)$
Informally, this property states that, starting at any state where both the message on the bus and the messages at the units are empty, at any point of time the message on the bus is equal to the lexicographic maximum of the messages sent by the currently transmitting processes. Furthermore, the message on the bus B must increase as long as there is a transmitting process. While the new environment is simpler, the statement of the property still involves several (unbounded) auxiliary variables. Instead of checking this property, which refers to the history of a computation, we check several properties that deal with the transmission of a single symbol. We show in Lemma 1 that their conjunction implies (C1).
A stable state on an execution sequence is the state $\delta$ time units after a bus value change. By the protocol definition, in this state every process perceives the new bus value. Thus, a stable state is the first state after a bus change for which insymbol is true. Consider any reachable state where some process is transmitting and $B = \varepsilon$. We show by induction on the number of stable bus states on any computation from that state that the following claim holds:
(IH) At the start of the kth stable bus state, the message on the bus is the empty sequence if k = 0; otherwise, it is the lexicographic maximum of the messages sent by processes that were transmitting at the start of the (k-1)th stable bus state.
All units that have sent the message on the bus are transmitting, and all units sending messages that are lexicographically strictly smaller have stopped transmission. At the kth stable state, there is at least one transmitting process.
Basis: k = 0. The message on the bus and the message sent by every unit are both the empty sequence, so the claim holds.
Inductive step: Assume that (IH) holds at the start of the kth stable bus state. By (IH), there is at least one unit transmitting at the kth stable bus state. There are two cases to consider. Suppose that some unit attempts to transmit 0 at the kth stable state. By (C2a), the next symbol on the bus is 0. By (C2c), any unit transmitting 0 does not fail before the (k+1)st stable state. By (C2d), all units attempting to transmit 1 fail before the (k+1)st stable state. Otherwise, some unit attempts to transmit 1 at the kth stable state, and no unit attempts to transmit 0 at that state. By (IH), all units with a lower-priority prefix among the first k symbols have failed. By (C2b), the next bus symbol is 1, and by (C2e), every unit transmitting 1 is still transmitting at the (k+1)st stable state. In either case, the inductive hypothesis holds at the (k+1)st stable state. Furthermore, the proof shows that as long as there is a transmitting process, the bus value changes at the next stable state; hence (C1) is true.
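The induction above can be mirrored by a symbol-level simulation in which each loop iteration stands for one interval between stable states, and the per-symbol behavior used in the two cases of the inductive step is taken as given. This is a hedged sketch: the function `arbitrate` is hypothetical, all timing is abstracted away, messages are assumed to have equal length, and the assertions inside the loop check (IH) after every round.

```python
from typing import Dict


def arbitrate(messages: Dict[str, str]) -> str:
    """Symbol-level abstraction of arbitration; returns the message on the bus.

    Each loop iteration stands for the interval between two stable states.
    If some transmitting unit sends 0, the bus symbol is 0 and units sending 1
    drop out; otherwise the symbol is 1 and every transmitting unit continues."""
    if len({len(m) for m in messages.values()}) != 1:
        raise ValueError("messages must be non-empty and of equal length")
    k = len(next(iter(messages.values())))
    transmitting = set(messages)
    bus = ""
    for pos in range(k):
        sending = {messages[u][pos] for u in transmitting}
        symbol = "0" if "0" in sending else "1"
        transmitting = {u for u in transmitting if messages[u][pos] == symbol}
        bus += symbol
        # (IH): some unit is still transmitting, every such unit has sent exactly B,
        # and B is the best length-(pos+1) prefix (0 beats 1, i.e. ordinary min).
        assert transmitting
        assert all(messages[u][: pos + 1] == bus for u in transmitting)
        assert bus == min(m[: pos + 1] for m in messages.values())
    return bus


print(arbitrate({"u1": "0110", "u2": "0101", "u3": "1000"}))  # 0101
```

Because the symbol placed on the bus is always one that some surviving unit is sending, the set of transmitting units never becomes empty, which is the "at least one transmitting process" clause of (IH).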
Abstractions
The protocol described above is parameterized by the maximum delay parameter $\delta$, as well as by the number of processes N taking part in it. Let $P(N, \delta)$ stand for the instance of the protocol with N processes and delay $\delta$. These parameters make the protocol infinite-state, so that Model Checking cannot be applied directly to determine correctness. In order to apply Model Checking, we use two abstractions that reduce the problem to an equivalent finite-state problem. The first abstraction demonstrates a delay insensitivity property of the protocol: for every N and every even $\delta \ge 2$, $P(N, \delta)$ is correct iff $P(N, 2)$ is correct. Hence, protocol correctness need be checked only for the set of instances with maximum delay 2. However, there are infinitely many such instances, so this is still a parameterized problem. This problem can be solved using the algorithm presented in [EN 96]. This algorithm constructs a finite "abstract graph", which encodes exactly the instances of the system. Model Checking the abstract graph is thus equivalent to checking the parameterized system. The experimental details are presented in the following section.
Delay Insensitivity
The timing parameters, as noted in the protocol description, are proportional to the maximum delay parameter of the system. In addition, each test of the counter is of the form $counter \in \langle l\delta, r\delta \rangle$ (the angled brackets indicate either an open or a closed end to the interval), and each assignment to the counter is of the form $counter := choose\langle l\delta, r\delta \rangle$, which assigns to the counter a value from the interval. For such a system working over an underlying dense time, it is easy to show that if the intervals $\langle l\delta, r\delta \rangle$ are changed to $\langle l, r \rangle$ (dividing through by $\delta$), the resulting un-parameterized system has the same computations with respect to the states of the processes as the original one. This is so since global states with identical local states, and clocks related by scaling with $\delta$, are bisimilar. This class of systems thus forms a decidable instance of parameterized real-time reasoning (cf. [AHV 93]). Since our model of the bus system is over integer time (each transition takes 1 time unit), we cannot use this result: dividing an integer counter by $\delta$ need not yield an integer. However, the protocol satisfies additional properties that make a similar reduction possible. We show that any execution of $P(n, d)$ (d even and at least 2) can be simulated by an execution of $P(n, 2)$, in the sense that the sequence of symbols on the bus is the same. The following key lemma is needed for the proof of the theorem below:
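The dense-time scaling argument can be checked mechanically for guard evaluation: with exact rational arithmetic, a clock reading x satisfies $\langle l\delta, r\delta \rangle$ exactly when $x/\delta$ satisfies $\langle l, r \rangle$. This sketch covers only the guards, not the full bisimulation; the helper names `in_interval` and `guard_trace` and the sample interval are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def in_interval(x: Fraction, lo: Fraction, hi: Fraction,
                lo_closed: bool, hi_closed: bool) -> bool:
    """Evaluate x in <lo, hi>, where each end may be open or closed."""
    above = x >= lo if lo_closed else x > lo
    below = x <= hi if hi_closed else x < hi
    return above and below


def guard_trace(readings: Sequence[Fraction], l: Fraction, r: Fraction,
                delta: Fraction, lo_closed: bool = True,
                hi_closed: bool = False) -> List[bool]:
    """Outcomes of the guard counter in <l*delta, r*delta> along a run."""
    return [in_interval(x, l * delta, r * delta, lo_closed, hi_closed)
            for x in readings]


delta = Fraction(6)
readings = [Fraction(k, 3) for k in range(0, 60)]  # dense-time clock values
original = guard_trace(readings, Fraction(5, 2), Fraction(11, 2), delta)
scaled = guard_trace([x / delta for x in readings],
                     Fraction(5, 2), Fraction(11, 2), Fraction(1))
print(original == scaled)  # True
```

Rescaling the readings produces rational clock values, which is harmless in dense time; in the integer-time bus model the rescaled counters would generally not be integers, which is why the reduction goes through Lemma 2 instead.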
Lemma 2. Let $\sigma$ be an execution of $P(n, d)$ (d even and at least 2). Let l be the length of the time interval between the ith and (i+1)st stable bus states in $\sigma$ (i.e., either Long or Short). Then: 1. every process sending a symbol with a different length is aborted by the start of the (i+1)st stable state, and 2. every process sending a symbol with the same length is live at the start of the (i+1)st stable state.
Theorem 1. Let $\sigma$ be an execution of $P(n, d)$ (d even and at least 2). There is an execution $\sigma'$ of $P(n, 2)$ such that the sequence of symbols on the bus is identical in $\sigma$ and $\sigma'$.
Proof.
We construct $\sigma'$ inductively such that, for each i, $\sigma'_i$ is a sequence ending at the ith stable state that matches the prefix of $\sigma$ up to its ith stable state. Let $\sigma'_0$ equal $\sigma_0$. Suppose that $\sigma'_i$ has been constructed so that $\sigma'_i$ ends in the ith stable state, the symbols on the bus in $\sigma'_i$ and in the prefix of $\sigma$ up to and including the ith stable state are identical, and the local states of corresponding processes in the ith stable states are the same except, possibly, for the counter values. … where l is the length at which p sends its symbol, $Txmin(l) = (a/2)\delta$ for some a. The order of counter values in the ith stable state of $\sigma'$ is the same as in $\sigma$. Hence, in every execution starting at the ith stable state in $\sigma'$, process p is still one of the processes that determine the bus change. As the change of bus state occurs at the same multiple of $\delta$, the length, and hence the symbol, is the same. There exists an execution of $P(n, 2)$ in which, after the bus change, the counter values of the processes not aborted are chosen in the order of the counter values at the (i+1)st stable state of $\sigma$. By Lemma 2, the processes not aborted at the (i+1)st stable states in $\sigma$ and $\sigma'$ are the same; hence, the inductive assumption holds at the (i+1)st stable state. We obtain the following theorem as a corollary:
Theorem 2 (Delay Insensitivity). $P(n, d)$ is correct for every even d with $d \ge 2$ iff $P(n, 2)$ is correct.
Proof.
The direction from left to right follows by instantiating d with 2. For the direction from right to left, note that if $P(n, d)$ is incorrect for some d, then it contains a computation where the sequence of symbols on the bus is not the lexicographically greatest message. By Theorem 1, this computation can be simulated by one in $P(n, 2)$ with the same sequence of bus symbols, so $P(n, 2)$ would also be incorrect.
Proof of Lemma 2:
Note that at a stable state, all processes have the same requested bus state, although they may be transmitting different symbols with differing lengths. In the interval between stable states, for any pair of processes p, q, $|counter_p - counter_q| \le \delta$.
(i) The length of the interval is Long. Let p be the process determining the new symbol. As the bus change occurs when p's counter value equals Txmin(Long), $Txmin(Long) - \delta \le counter_q \le Txmin(Long) + \delta$ for any process q, i.e., $6.5\delta \le counter_q \le 8.5\delta$. If q sends a symbol by a short pulse, then, as $Trmax(Short) < 6.5\delta$, q will have aborted by the time that the bus changes state. If q sends by a long pulse, its counter value remains in the interval [Trmin(Long), Trmax(Long)] up to the next stable state, by which time the new bus state is perceived by q. Hence, every process sending a symbol with a different length aborts, and every process sending a symbol with the same length is live at the next stable state.
(ii) The length of the interval is Short. Let p be the process determining the new symbol. As the bus change occurs when p's counter value equals Txmin(Short), $Txmin(Short) - \delta \le counter_q \le Txmin(Short) + \delta$ for any process q, i.e., $2.5\delta \le counter_q \le 4.5\delta$. If q sends by a long pulse, then, as $Trmin(Long) = 6.5\delta$, q will have aborted by the next stable state (at which every counter lies in $[3.5\delta, 5.5\delta]$). If q sends by a short pulse, its counter value remains in the interval [Trmin(Short), Trmax(Short)] up to the next stable state, by which time the new bus state is perceived by q. Hence, every process sending a symbol with a different length aborts, and every process sending a symbol with the same length is live at the next stable state. It is easily seen from the protocol that once the bus value changes, it remains constant up to the next stable state.
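The interval arithmetic in the proof of Lemma 2 can be checked exhaustively for small even delays. In the sketch below, thresholds are stored in units of $\delta/2$, so they become integer counter values whenever $\delta$ is even. Only $Trmin(Long) = 6.5\delta$ and the two counter ranges appear in the text; $Txmin(Long) = 7.5\delta$ and $Txmin(Short) = 3.5\delta$ are inferred from those ranges, while $Trmin(Short) = 2.5\delta$, $Trmax(Short) = 5.5\delta$ and $Trmax(Long) = 9.5\delta$ are assumptions chosen to fit the proof, since the parameter table is not reproduced here. A process is modelled, hypothetically, as live when its counter at the moment it perceives the bus change lies in [Trmin, Trmax] for its own length.

```python
from typing import Dict

# Thresholds in units of delta/2, so e.g. 13 stands for 6.5 * delta.
TXMIN: Dict[str, int] = {"Short": 7, "Long": 15}   # 3.5δ, 7.5δ (inferred from the proof)
TRMIN: Dict[str, int] = {"Short": 5, "Long": 13}   # 2.5δ ASSUMED, 6.5δ (stated)
TRMAX: Dict[str, int] = {"Short": 11, "Long": 19}  # 5.5δ ASSUMED, 9.5δ ASSUMED


def ticks(table: Dict[str, int], length: str, delta: int) -> int:
    """Convert a threshold to integer counter ticks; needs an even delta."""
    if delta < 2 or delta % 2:
        raise ValueError("delta must be even and at least 2")
    return table[length] * delta // 2


def outcome(sender: str, interval: str, offset: int, lag: int, delta: int) -> str:
    """Fate of a process q sending with length `sender` in an `interval` interval.

    The bus changes when the determining process p reaches Txmin(interval);
    q's counter differs from p's by `offset` (|offset| <= delta), and q
    perceives the change `lag` ticks later (0 <= lag <= delta)."""
    seen = ticks(TXMIN, interval, delta) + offset + lag
    low, high = ticks(TRMIN, sender, delta), ticks(TRMAX, sender, delta)
    # Below Trmin: pulse too short; above Trmax: q timed out earlier.
    return "live" if low <= seen <= high else "aborted"


def lemma2_holds(delta: int) -> bool:
    """Exhaustively check both cases of Lemma 2 for one even delta."""
    for interval in ("Short", "Long"):
        for sender in ("Short", "Long"):
            expected = "live" if sender == interval else "aborted"
            for offset in range(-delta, delta + 1):
                for lag in range(0, delta + 1):
                    if outcome(sender, interval, offset, lag, delta) != expected:
                        return False
    return True


print(all(lemma2_holds(d) for d in (2, 4, 6, 8)))  # True
```

The evenness check in `ticks` reflects why Theorem 1 is stated for even d: only then are all half-multiples of $\delta$ integer counter values.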
Many-Process Verification
The delay insensitivity theorem (Theorem 2) shows that it is both necessary and sufficient to check every instance with delay 2 in order to check correctness for instances over all other (even) delay values. While this eliminates consideration of the delay parameter, the reduced system is still infinite-state, as it is parameterized by the number of processes (units) taking part in the protocol. Verification of this parameterized system can be carried out fully automatically using the algorithm described in [EN 96]. This algorithm is based on a synchronous control-user model, where the instances of the parameterized system consist of a fixed control process C and many copies of a fixed user process U. The n-process instance can thus be described by $C \parallel U_1 \parallel \cdots \parallel U_n$, where $\parallel$ denotes synchronous composition. In the SAE-J1850 protocol, the control process models the behavior of the bus, while the user process models the behavior of a single unit, together with some machinery for modeling the delays in detecting bus value changes. The algorithm of [EN 96] constructs, for such a control-user parameterized system, an "abstract graph", which is a finite-state abstraction of the entire family of instances. The states of the abstract graph record only the state of the control process and, for each user local state, whether at least one user process is in that state. The lemma below gives a way of checking safety properties of the family. Liveness properties may be checked in two ways: (a) as the abstract graph simulates every instance, if the liveness property holds of the abstract graph, then it holds of the family; (b) an algorithm is provided in [EN 96] for exactly determining whether the liveness property holds of every instance.
Lemma 3 [EN 96]. The abstract graph simulates every instance of the family. Every finite path in the abstract graph corresponds to a finite computation of some instance.
[EN 96] also shows how to check properties of the form $\bigwedge_i Ag(i)$ by reducing them, using symmetry arguments (cf. [ES 93, CFJ 93]), to checking a property Ag(0) of the control process in a modified control-user system, which has the same user process but has $C' = C \parallel U$ as the new control process.
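The abstract-graph construction described in this section can be sketched as an explicit-state search in Python. This is a simplified illustration, not necessarily the exact construction of [EN 96]: the hypothetical functions `control_next` and `user_next` stand in for the control and user transition relations (any communication between them is folded into these functions), and it is assumed that enough user copies exist for the users in one local state to split over any non-empty set of that state's successors.

```python
from itertools import combinations, product
from typing import Callable, FrozenSet, Hashable, Iterable, List, Set, Tuple

AbsState = Tuple[Hashable, FrozenSet[Hashable]]  # (control state, occupied user states)
ControlNext = Callable[[Hashable, FrozenSet[Hashable]], Iterable[Hashable]]
UserNext = Callable[[Hashable, Hashable], Iterable[Hashable]]


def nonempty_subsets(items: Iterable[Hashable]) -> List[FrozenSet[Hashable]]:
    items = list(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def successors(state: AbsState, control_next: ControlNext,
               user_next: UserNext) -> Set[AbsState]:
    """Abstract successors: the users in each occupied local state s may split
    over any non-empty set of successors of s (enough copies are assumed)."""
    control, occupied = state
    moves = [nonempty_subsets(user_next(control, s)) for s in occupied]
    result = set()
    for c2 in control_next(control, occupied):
        for pick in product(*moves):
            result.add((c2, frozenset().union(*pick)))
    return result


def reachable(init: AbsState, control_next: ControlNext,
              user_next: UserNext) -> Set[AbsState]:
    """Explicit-state reachability over the abstract graph."""
    seen, frontier = {init}, [init]
    while frontier:
        for nxt in successors(frontier.pop(), control_next, user_next):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def toggle(control, occupied):
    """Hypothetical control process that alternates between 0 and 1."""
    return [1 - control]


def user_step(control, s):
    """Hypothetical user process: a -> a | b, b -> a."""
    return {"a": ["a", "b"], "b": ["a"]}[s]


print(len(reachable((0, frozenset({"a"})), toggle, user_step)))  # 6
```

Explicit enumeration of subsets is only practical for toy user processes; with the 254 reachable user states reported for the delay-2 family, the abstract graph has to be handled symbolically.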
Implementation Details
The behavior of the bus and the units as specified in the protocol is coded as an SMV [McM 92] program. The transition relation of the abstract graph is generated automatically by a program which takes the specification of the control and user processes (in C) and generates SMV code for the abstract graph transitions. This is done by enumerating the reachable local states for a single user process, then generating each transition of the abstract graph by inspection of the local transitions in the unit. States of the abstract graph are represented by subsets of the local user state space. Each subset indicates the presence of at least one user process in each of its local states, as discussed in the previous section. Thus, for a local user transition $s \to t$, the corresponding abstract graph transition adds t as a member of the next abstract state, given that s is a member of the current one. For the parameterized family checked with maximum delay 2, each unit has 254 reachable states; thus, the number of boolean variables needed to encode an abstract state is also 254 (subsets are encoded as a boolean membership vector). The correctness properties (C2a) to (C2e) were checked together on the abstract graph. Since some of these properties are liveness properties, they were checked on the abstract graph using the fact that it simulates every instance. Every property succeeds on the abstract graph, so we can infer that properties (C2a) to (C2e) hold of the parameterized system with delay 2, which by Theorem 2 implies that they hold of the completely parameterized system. By Lemma 1, this implies that the desired correctness property (C1) holds of the completely parameterized system. We did not have to invoke the potentially expensive but exact method for checking liveness properties. These checks take about 8 MB and 35 seconds on an Intel Pentium 133 with 32 MB of main memory.
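The boolean membership-vector encoding can be illustrated with Python integers used as bit vectors. This sketch is our reading of the rule described in this section, not the generator used in the implementation (which emitted SMV code from a C specification); it ignores the control process, and the names `members` and `is_abstract_step` as well as the two-state user are hypothetical. Besides the stated rule that t may enter the next state only if some $s \to t$ has s in the current state, it also assumes, as synchronous composition requires, that the users in every occupied state move somewhere.

```python
from typing import Dict, List


def members(mask: int) -> List[int]:
    """Local user states whose membership bit is set."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def is_abstract_step(cur: int, nxt: int, succ: Dict[int, List[int]]) -> bool:
    """Is (cur, nxt) an abstract transition, ignoring the control process?

    Bit i of a mask is set when at least one user is in local state i.
    Every occupied state must send its users to some member of nxt, and
    every member of nxt must be reached from some occupied state."""
    reached = 0
    for s in members(cur):
        targets = 0
        for t in succ[s]:
            targets |= 1 << t
        if targets & nxt == 0:
            return False  # the users in state s would have nowhere to go
        reached |= targets
    return nxt & ~reached == 0  # no member of nxt lacks a predecessor


succ = {0: [0, 1], 1: [0]}  # hypothetical user: state 0 -> 0 | 1, state 1 -> 0
print(is_abstract_step(0b01, 0b10, succ))  # True
print(is_abstract_step(0b10, 0b10, succ))  # False
```

In a symbolic encoding, each of these two conditions becomes a constraint per bit of the 254-bit vector, which avoids ever enumerating subsets explicitly.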
Conjunctive partitioning of the transition relation and pre-computation of the reachable states (the strongest invariant) are used. Computing the reachable state space takes 24 iterations. Incidentally, checking a 15-process instance takes roughly the same amount of time but less space.
… give algorithms (i.e., decision procedures) for model checking the parameterized system. These papers demonstrate the methods on simple verification examples; we believe that our case study is one of the few examples of verification of a large and complex parameterized protocol. It is likely that the delay insensitivity theorem is an instance of a general theorem for such types of systems; given such a theorem, the verification of this protocol could indeed be fully automated. We believe that careful specification of the computational model underlying other protocols will expose constraints that can be utilized, as in this case, for developing decision procedures for large classes of protocols. There is also a need for developing and popularizing notations for expressing such protocols. Remarkably, in the SAE-J1850 document (over 100 pages), there is no succinct protocol description; the description given in Section 2 had to be culled from the entire text. The successful verification of the protocol, despite the theoretical result on PSPACE-completeness of the procedure [EN 96], is reason to believe that fully automated parameterized verification is feasible for reasonably sized protocols.
