Abstract. In this paper we argue that the semantic issues of discrete vs.
Introduction
The analysis of discrete systems such as programs or digital circuits, while taking into account the temporal uncertainty associated with transition delays, is a very challenging and important task. In [MP95] and elsewhere, it has been demonstrated that reasonable models of digital circuits with uncertain delay bounds can be translated systematically into timed automata [AD94], which can then be analyzed using various verification tools. However, this remains a theoretical possibility as long as the performance bottleneck for timed verification remains (see the discussion in [BMPY97] as well as [TKB97]). During the last decade the Verimag laboratory has been engaged in the development of the timing analyzer KRONOS [DOTY96] and in attempts to improve its performance using various techniques.
* This work was partially supported by the European Community Esprit-LTR Project 26270 VHS (Verification of Hybrid systems) and the French-Israeli collaboration project 970maefut5 (Hybrid Models of Industrial Plants).
Timed automata operating on the dense real time axis constitute an instance of hybrid automata and their analysis confronts researchers with problems usually not encountered in "classical" finite-state verification. These problems, such as Zeno's paradox (the possibility of making infinitely many steps in a bounded interval) or the representation of an uncountable number of states, which are related to the foundations of mathematics, sometimes give a false impression of the essence of timing analysis. We argue that this essence does not depend on the time domain, and that the difference between dense and discrete time semantics is "epsilon", so to speak.
We demonstrate these claims by applying discrete and dense techniques to a non-trivial case-study where we obtain the best performance results achieved so far for timed automata. More precisely, we report the application of two techniques: the BDD-based verification using the discrete time semantics [ABK+97, BMPY97], and the "standard" DBM-based method using variable-sized matrices based on clock activity analysis [DY96], to a real hardware design, the STARI chip due to M. Greenstreet [Gre97]. This chip is an asynchronous realization of a FIFO buffer, composed of a sequence of stages, each consisting of two Muller C-elements and one NOR gate. According to the principles laid out in [MP95], and similarly to [TB97], each such stage is modeled as a product of 3 timed automata, each with 4 states and one clock. The (skewed) transmitter and receiver are modeled as two timed automata using a shared clock.
We have modeled the intended behavior of the FIFO buffer operationally as an automaton, and were able to verify that an 18-stage implementation (55 clocks) indeed realizes the specification. These are, to the best of our knowledge, among the best performance results for timed automata verification, and they show that some real circuits behave better than artificial examples of the kind we used in [BMPY97].
The rest of the paper is organized as follows. In section 2 we give a very informal survey of timed automata, their verification techniques and their discrete and dense semantics. In section 3 we describe STARI and its desired properties, which are then modeled using timed automata in section 4. The performance results are reported and analyzed in section 5, and future work is discussed at the end.
2 Verification using Timed Automata
Timed Automata
Timed automata can represent systems in which actions take some unknown, but bounded, amount of time to complete, in a rigorous and verifiable manner. They are essentially automata operating on the continuous time scale, employing auxiliary continuous variables called clocks. These clocks, while in a given state, keep on increasing with time. Their values, when they cross certain thresholds, can enable some transitions and also force the automaton to leave a state. Temporal uncertainty is modeled as the possibility to choose between staying in a state and taking a transition during an interval [l, u]. [...], is a finite-state automaton with a combination of discrete transitions and abstract "time-passage" transitions, indicating the temporal evolution of clock values from one equivalence class to another.
Verification tools for TA either first construct the region automaton (whose size is O(m k^d d!)) and then use standard discrete verification algorithms, or calculate the reachable configurations successively while representing them as unions of polyhedra of a certain restricted form. These "symbolic states", which are generated by the equivalence classes underlying the region automaton, are polyhedra which can be represented by combinations of inequalities of the form x_i ≺ c or x_i - x_j ≺ c, where x_i, x_j are clock variables, ≺ is either < or ≤, and c is an integer constant in {0, 1, ..., k-1}.
For convex regions, there is a canonical form based on an irredundant set of inequalities, which can be efficiently represented using an O(n^2)-sized integer matrix, known as the difference bounds matrix (DBM). The main computational activity in TA verification is the storage and manipulation of sets of these matrices during fixed-point computations. The major bottleneck is due to the fact that the number and size of DBMs grows exponentially with the number of clocks and the size of k (roughly, the size of the largest constant in the TA after normalization). Moreover, the representation of non-convex polyhedra as unions of convex ones is not unique. There have been many attempts to break the computational bottleneck associated with the manipulation of DBMs, such as [WD94, H93, B96, DY96, LLPY97, DT98], to mention a few, and to be able to verify larger timed automata. One approach, [DY96], is based on the observation that not all clocks are active at any configuration (see also [SV96]). A clock x_i is inactive in a configuration (q, x) if it is reset to zero before any future test of its value. In that case its value can be eliminated from the state description of the system. Consequently one can use variable-sized DBMs restricted to the relevant clocks in every region of the TA. In section 5 we will report the results of clock activity analysis of STARI and the performance of the variable-sized DBMs.
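To make the DBM representation concrete, here is a minimal sketch, not the data structure used inside KRONOS. It assumes non-strict bounds only, uses index 0 for a fictitious clock that is always zero, and stores in entry (i, j) the bound c of the constraint x_i - x_j ≤ c. The names `canonical` and `is_empty` are hypothetical. The canonical form is obtained by tightening every bound with an all-pairs shortest-path computation.

```python
import math

INF = math.inf  # stands for "no upper bound"

def canonical(dbm):
    """Return the tightest equivalent matrix (Floyd-Warshall closure).

    dbm[i][j] = c encodes x_i - x_j <= c; index 0 is the constant-zero clock.
    Only non-strict bounds are handled (an assumption of this sketch).
    """
    n = len(dbm)
    m = [row[:] for row in dbm]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m

def is_empty(m):
    """A canonical DBM denotes the empty set iff some diagonal entry is negative."""
    return any(m[i][i] < 0 for i in range(len(m)))

# Two clocks with x1 - x2 <= 1, x2 <= 2 and both clocks non-negative.
zone = [
    [0,   0,   0],    # 0 - x1 <= 0, 0 - x2 <= 0
    [INF, 0,   1],    # x1 unbounded above, x1 - x2 <= 1
    [2,   INF, 0],    # x2 <= 2, x2 - x1 unbounded
]
tight = canonical(zone)   # derives the implied bounds x1 <= 3 and x2 - x1 <= 2
```

The cubic closure is what makes each DBM operation sensitive to the number of clocks, which is why restricting the matrix to active clocks pays off.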
The Joy of Discrete Time
There is, however, an alternative semantics for TA based on discrete (and in fact, integer) time, which has already been discussed in early works about real-time logics (see the survey [AH92]). According to this view, time steps are multiples of a constant, and at every moment the automaton might choose between incrementing time or making a discrete transition. Consider the fragment of a 2-clock timed automaton depicted at the left of [...]. Under this interpretation clocks are nothing but bounded integer variables, whose values are incremented simultaneously by time transitions and some of them are reset to zero by certain discrete transitions. Such systems are finite-state, but some components of the state-space, namely the clocks, have additional structure (addition and linear ordering of clock values), which can be exploited by verification algorithms. In particular, any representation scheme for the dense semantics which is based on clock inequalities can be specialized for the discrete semantics. Since on a discrete order a strict inequality of the form x_i < c can be written as the non-strict inequality x_i ≤ c - 1, discrete regions can be expressed using exclusively non-strict inequalities. Hence even DBM-based methods can be tuned to work better on discrete time since the space of DBMs is smaller. A typical step in the iterative calculation of reachable states is depicted in Figure 2 for the dense (left) and discrete (right) semantics. In addition to these methods one can take advantage of the finite-state nature of discrete TA and apply techniques which cannot be applied directly to dense time. One possibility is to push clock values into states and transform the TA into a finite automaton (either off-line or on-the-fly). This provides for depth-first traversal of the state-space, as well as other search regimes.
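The idea of pushing clock values into states can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of any tool mentioned here: the function name `reachable`, the encoding of guards and invariants as Python predicates, and the saturation bound `cap` (assumed larger than every constant in the automaton) are all hypothetical. A breadth-first search is used, but any other search order would do.

```python
from collections import deque

def reachable(init_loc, n_clocks, edges, invariants, cap):
    """Explore (location, clock values) pairs under integer time.

    edges: list of (src, guard, resets, dst), guard is a predicate on the clock tuple.
    invariants: dict mapping each location to a predicate on the clock tuple.
    cap: clocks saturate at this value, which keeps the state space finite.
    """
    start = (init_loc, (0,) * n_clocks)
    seen = {start}
    queue = deque([start])
    while queue:
        loc, clocks = queue.popleft()
        succs = []
        # Time transition: every clock is incremented simultaneously.
        ticked = tuple(min(c + 1, cap) for c in clocks)
        if invariants[loc](ticked):
            succs.append((loc, ticked))
        # Discrete transitions: the guard must hold; some clocks are reset.
        for src, guard, resets, dst in edges:
            if src == loc and guard(clocks):
                after = tuple(0 if i in resets else c for i, c in enumerate(clocks))
                if invariants[dst](after):
                    succs.append((dst, after))
        for s in succs:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen

# One clock: stay in "unstable" while x <= 3, leave as soon as x >= 2.
edges = [("unstable", lambda c: c[0] >= 2, set(), "stable")]
invariants = {"unstable": lambda c: c[0] <= 3, "stable": lambda c: True}
states = reachable("unstable", 1, edges, invariants, cap=4)
```

Because every state is an ordinary tuple, the result is a plain finite graph to which standard finite-state techniques apply.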
An alternative approach is the one advocated in [ABK+97, BMPY97], in which the clock values are encoded in binary and subsets of them are written as BDDs. The advantage of this approach is that it gives a canonical representation for any subset (convex or not) of the state-space, and that it combines naturally with BDD-based representation of the control states. Most of this paper is a report of one success story of this approach, where a non-trivial system with 55 clocks has been verified. However, before that there is one little point to be clarified: although discrete time verification is inherently more efficient than dense time, it is certainly less expressive, and one might want to know what is sacrificed in order to improve performance. Our answer, based on the results in [HMP92, AMP98], which we explain below, is: "not much".
[...] parts of x_1 and x_2 is the same in x and x' they might reach different squares as time goes by. Only if they belong to the same triangular subset of X, namely a set of the form {x : ⟨x_1⟩ ≤ ⟨x_2⟩}, where ⟨x_i⟩ denotes the fractional part of x_i, will they meet the same squares during time evolution (Figure 3-(b)). Combining these facts we obtain the equivalence relation on X which guarantees that all the members of an equivalence class can exhibit essentially the same behaviors.
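The two ingredients of this equivalence, equal integer parts (the same unit square) and the same ordering of fractional parts (the same triangular subset), can be checked mechanically. The sketch below is an illustration only: it assumes all clock values lie below the largest constant k, uses exact rationals to avoid rounding problems, and the names `region_key` and `equivalent` are hypothetical.

```python
from fractions import Fraction
from math import floor

def region_key(x):
    """Summarize a clock valuation by its integer parts and the
    relative order of its fractional parts (rank 0 means fraction zero)."""
    ints = tuple(floor(v) for v in x)
    fracs = [v - floor(v) for v in x]
    distinct = sorted(set(fracs) | {Fraction(0)})
    ranks = tuple(distinct.index(f) for f in fracs)
    return ints, ranks

def equivalent(x, y):
    """Two valuations are equivalent iff their summaries coincide."""
    return region_key(x) == region_key(y)

a = (Fraction(1, 4), Fraction(3, 4))
b = (Fraction(1, 2), Fraction(2, 3))   # same square, same ordering of fractions
c = (Fraction(3, 4), Fraction(1, 4))   # same square, opposite ordering
```

Here `a` and `b` fall in the same class while `c` does not, although all three lie in the same unit square.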
Why Discrete Time Suffices
This simple (and simplified) story becomes more complicated if transition guards and invariants are allowed to contain strict inequalities. In that case some transitions might be enabled in the interior of a region but not in its boundaries, and the region graph becomes a more involved mathematical object with elements of all dimensionalities from 0 to n. If, however, timing constraints are restricted to be closed (i.e. non-strict), every boundary point satisfies all the constraints satisfied by the regions in its neighborhood. In particular the set of integral points, the grid {0, 1, ..., k-1}^n, "covers" all X in the sense that it intersects the boundaries of all open full-dimensional regions and satisfies all the constraints that they satisfy (Figure 3-(c)). Hence these integral points can be taken as representatives and all the (qualitative) trajectories starting from them cover all the possible behaviors of the system. To be more precise, a discrete run might be a slight variation of some of the dense runs it represents: it may sometimes have a few transitions taken simultaneously while in the dense run these transitions are separated by a small amount of time. Nevertheless, the following results [HMP92, AMP98] underlie the soundness of discrete verification:
Theorem 1 (Emptiness of Closed TA). The set of dense behaviors of a closed TA is non-empty iff it contains a discrete run.
Combining this with the fact that untimed properties can be expressed as [...]
3 STARI Description
STARI (Self-Timed At Receiver's Input) [Gre97] is a novel approach to high-bandwidth communication which combines synchronous and self-timed design techniques. Generally speaking, a transmitter communicates synchronously with a receiver through an asynchronous FIFO buffer (see Figure 4). The FIFO makes the system tolerant to time-varying skew between the transmitter and receiver clocks. An internal handshake protocol using acknowledgments prevents data loss or duplication inside the queue. The functioning of STARI is based on a rather intuitive idea. The FIFO must be initialized to be half-full. During each period of the clock one value is inserted into the FIFO by the transmitter and one value is removed by the receiver. Due to the complementary nature of these actions no control is required to prevent queue underflow or overflow. Short-term fluctuations in the clock rates of the transmitter and the receiver are handled by inserting or removing more items to or from the queue.
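The half-full principle can be illustrated with a small untimed simulation. This is a sketch of an ideal queue, not of the STARI circuit: the function `simulate` and the strings "pg" (put before get) and "gp" (get before put) are hypothetical names chosen for the example, and the two actions of a period are never simultaneous here.

```python
from collections import deque
from itertools import product

def simulate(capacity, orders):
    """Run one put and one get per clock period on a queue that starts half-full.

    orders: one string per period, "pg" or "gp", giving the order of the actions.
    Returns the received values and the final occupancy.
    """
    fifo = deque(range(capacity // 2))        # half-full initial contents
    next_value = capacity // 2
    received = []
    for order in orders:
        for action in order:
            if action == "p":
                assert len(fifo) < capacity, "overflow"
                fifo.append(next_value)
                next_value += 1
            else:
                assert fifo, "underflow"
                received.append(fifo.popleft())
    return received, len(fifo)

# Every interleaving over six periods delivers the same values in order.
results = {tuple(simulate(4, o)[0]) for o in product(["pg", "gp"], repeat=6)}
```

Whatever order is chosen in each period, the occupancy deviates from the half-full level by at most one item, so neither assertion can fail.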
Following the STARI model proposed by Tasiran and Brayton in [TB97], which differs slightly from the original description in [Gre97], we represent the boolean values true and false by dual rail encoding (see Figure 5). An auxiliary empty value is needed to distinguish between the case of two consecutive identical values and the case of one value maintained during more than one clock cycle. The transmitter is constrained to send sequences of true and false where each two occurrences of these values are separated by an occurrence of empty. The STARI chip consists of a linear array of n identical stages, each capable of storing a data value X.
The following two properties need to be proved to ensure the correct operation of the STARI circuit: [...]
We specify the desired behavior of an n-stage STARI as an ideal FIFO buffer combined with a receiver and a transmitter respecting the abovementioned convention (see Figure 6). Note that in this specification, every transition is labeled with a pair of put and get actions, with the intended meaning that they can occur in any order including simultaneously. The goal of the verification is to show that if we hide the internal operations of STARI, the realizable sequences of put's and get's conform with this specification.
The operation principle of a stage k can be summarized as follows: it may copy its predecessor value (X_k := X_{k-1}) when its successor has already copied (and acknowledged) its current value (X_k = X_{k+1}). Using the dual rail encoding of data values, such a behavior can be achieved using two Muller C-elements that hold the X.t and X.f components, and one NOR gate for computing the acknowledgment (see Figure 7).
A Muller C-element works as follows: when the two inputs become identical, after some delay the output takes on their value; otherwise the output maintains its previous value. Consider, for example, a situation where stages k and k+1 hold the empty value, stage k-1 the true value and Ack_{k+1} = 0. When Ack_{k+1} becomes 1, the C-element for X_k.f remains unchanged at 0 because its inputs are different (i.e. Ack_{k+1} = 1, X_{k-1}.f = 0). However, both the inputs of the C-element for X_k.t are equal to 1 (Ack_{k+1} = X_{k-1}.t = 1), and after some delay, it will switch to 1. This way the true value has been copied from stage k-1 to stage k.
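The copy step in this example can be replayed with an untimed sketch of the two C-elements of a stage. Delays are ignored, the dual rail encoding of true, false and empty as (t, f) pairs is an assumption inferred from the example rather than taken from Figure 5, and the function names are hypothetical.

```python
# (t, f) rail pairs; this encoding is assumed for illustration.
EMPTY, TRUE, FALSE = (0, 0), (1, 0), (0, 1)

def c_element(a, b, prev):
    """Output follows the inputs when they agree, otherwise keeps its value."""
    return a if a == b else prev

def stage_step(pred, cur, ack_next):
    """New (t, f) value of stage k, given X_{k-1}, X_k and Ack_{k+1}."""
    t = c_element(ack_next, pred[0], cur[0])
    f = c_element(ack_next, pred[1], cur[1])
    return (t, f)

before = stage_step(TRUE, EMPTY, 0)   # Ack_{k+1} = 0: stage k keeps empty
after = stage_step(TRUE, EMPTY, 1)    # Ack_{k+1} = 1: stage k copies true
```

In the timed model each of the two `c_element` calls corresponds to a separate automaton whose output changes only after a delay.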
Modeling STARI by Timed Automata
The correct functioning of STARI depends on the timing characteristics of the gates (the time it takes, say, for a C-element to switch) and its relation with the central clock period and the skew between the receiver and transmitter. We model the uncertainty concerning the delay associated with gates using the bi-bounded delay model, that is, we associate with every gate an interval [l, u] [...] stage. Its state is characterized by two boolean variables X_k.t and x_k.t: the former stores the gate output and the latter stores the gate internal value, i.e. the value to which the gate "wants" to go after the delay. The stable states are those in which X_k.t = x_k.t. The conditions for staying and leaving stable states are complementary and do not depend on clock values: for example, the automaton leaves state (0, 0) and goes to the unstable state (0, 1) exactly when both its inputs are 1. During this transition the clock variable C_k.t is reset to zero. The automaton can stay at (0, 1) as long as C_k.t < u_C and can change its output and stabilize in (1, 1) as soon as C_k.t ≥ l_C, where [l_C, u_C] is the delay interval associated with a C-element. The automaton for the X.f component (Figure 9) is exactly the same (with different inputs) and the automaton for the NOR gate (Figure 10) [...]
In addition to the automata for modeling the stages, we need three other automata for modeling the transmitter, the receiver and their clock cycle. The global clock cycle is modeled by a simple timed automaton using one clock variable C. Whenever C reaches the cycle size p it is reset to zero (see Figure 11).
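Under the discrete semantics, the four-state gate automaton can be sketched as a successor function on triples (output, internal value, clock). This is an illustration under assumptions, not the model used in the experiments: the inputs are taken to remain constant while the gate is unstable, the clock is normalized to 0 in stable states, and the names `gate_successors` and `switching_delays` are hypothetical.

```python
def gate_successors(state, target, l, u):
    """Successors of a gate state (out, internal, clock) with delay interval [l, u].

    target is the value the gate function currently computes from its inputs.
    """
    out, internal, clock = state
    if out == internal:                       # stable state
        if target != out:
            return [(out, target, 0)]         # excitation: reset the clock
        return [state]                        # inputs agree with output: stay
    succs = []
    if clock < u:                             # may stay while clock < u
        succs.append((out, internal, clock + 1))
    if clock >= l:                            # may switch once clock >= l
        succs.append((internal, internal, 0))
    return succs

def switching_delays(l, u):
    """All integer delays after which an excited gate can change its output."""
    delays = set()
    frontier = [((0, 1, 0), 0)]               # just excited, zero time elapsed
    while frontier:
        state, elapsed = frontier.pop()
        for nxt in gate_successors(state, 1, l, u):
            if nxt[0] == nxt[1]:
                delays.add(elapsed)           # output switched after `elapsed`
            else:
                frontier.append((nxt, elapsed + 1))
    return delays

delays = switching_delays(2, 4)
```

The set of switching delays is exactly the integer points of [l, u], which is the non-determinism the bi-bounded delay model is meant to capture.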
The transmitter is modeled as a 3-state automaton (Figure 11). At each clock cycle it puts a value at the input ports of the first stage (X_0.t and X_0.f), according to the convention that every pair of data items is separated by an empty item. Moreover, the transmission can be done with some skew with respect to the clock cycle, bounded by the constant s_T, that is, the actual time of transmission can be anywhere in the interval [p - s_T, p].
The receiver is a 1-state automaton (see Figure 11) which reads the current output value (i.e. X_n.t and X_n.f) and acknowledges the reception by modifying Ack_{n+1} according to whether or not X_n is empty. As in the transmitter, a skew bounded by s_R is allowed.
Note that the receiver and transmitter skews cannot accumulate during successive cycles. They always range in an interval depending on the (perfect) global clock cycle. However, each one can vary non-deterministically from one cycle to another. This is more general than assuming a fixed skew given in advance, or a fixed skew chosen at start-up from a given interval. The transitions of these automata are annotated by action names such as put and get whose role is explanatory: they have no effect on the functioning of the system.
Verification Results and Performance Analysis

Discrete Time and BDDs
We have modeled various instances of STARI, each with a different number of stages. For each instance we have composed the timed automata and then minimized them by hiding the unobservable transitions. In Figure 12 one can see the automaton obtained for three stages, where in addition to the put and get actions, we left also the tick action which indicates the end of the global clock cycle. After hiding the tick we obtain a realization of the ideal FIFO as specified in Figure 6. We were able to prove that each STARI model with n ≤ 18 stages, right initialized with m distinct values (m ≤ n/2), simulates an ideal buffer of size m.
Moreover, we verified that the transition graphs of the implementation and the specification are equivalent with respect to the branching bisimulation [vGW89], if we consider only the reading and writing to be observable. The equivalence is verified symbolically using the method described in [FKM93]. The time and the memory needed to perform this verification are presented in Figure 13.
Variable-dimension DBMs
We have also verified STARI, interpreted over dense time, using the DBM representation of KRONOS and the forward-analysis technique of [DT98]. To overcome the explosion associated with the size and number of DBMs we have used the techniques of [DY96, DT98], based on the notion of active and inactive clocks.
As one can see in Figure 8, the basic building block which is used to model a timed gate is a four-state automaton with one clock which is active only in the unstable states. So a priori, each clock is active in half of the global control states. However, in real designs, especially when there is some natural order in which information flows in the circuit, the average over the reachable states of the number of active clocks can be much smaller.
The information concerning clock activity has been extracted automatically from the TA description and, using the variable-dimension DBM library of KRONOS, we were able to verify STARI with up to 8 stages (27 clocks). The main reason for the relative inferiority compared to the BDD approach is the large size of the discrete state-space (2^24): using DBMs, discrete variables are enumerated, whereas using discrete time and BDDs all variables (including clocks) are handled uniformly, which results in a more compact representation. Future techniques, combining BDDs for the state variables and DBMs for the clocks, might improve performance significantly. Figure 14 shows the time performance and the number of symbolic states (discrete variables plus DBM) generated for each number of stages. We have also measured the number of active clocks in each symbolic state and the results confirm our expectations that only a small fraction of clocks are active at any time. For instance, in the case of 8 stages, out of 27 clocks at most 9 were active, and this in less than 4% of the total number of DBMs generated (see the diagram on the right of Figure 15). In more than 85% of the symbolic states, only 6 to 8 clocks were active. The distributions have the same shape for other STARI configurations.
6 Discussion
Our performance results are significantly better than those reported by Tasiran and Brayton [TB97], from whom we have adopted the model. They prove, using techniques developed in [TAKB96], that every stage can be abstracted into a timed automaton having 5 states and only one clock. Using this abstract model and the tool Timed-Cospan they were able to verify an 8-stage buffer, while using the detailed model they could not verify more than 3 stages. Another attempt to verify STARI was reported by Belluomini and Myers [BM98], who model the circuit using a variant of timed Petri nets which they verify using the tool POSET, which employs partial-order methods. The largest example they reported was of 10 stages. Yoneda and Ryu [YR99] improve these results significantly using circuit-specific heuristics.
We have demonstrated that a rather large example can be verified by tools based on timed automata, and we hope that this will contribute to the wide adoption of timed automata as a model for quantitative timing analysis. Our results indicate that in certain cases, discretized BDD-based approaches outperform other techniques. In the future we will try to characterize the class of systems for which this is the case. It is clear, however, that in examples where large constants (or equivalently, smaller time granularity) are involved, discrete time becomes less attractive.
