In this paper, we propose a technique for hardware implementation of protocol specifications in LOTOS. For this purpose, we define a new model called synchronous EFSMs, consisting of concurrent EFSMs and a finite set of multi-rendezvous indications among their subsets, and propose a conversion algorithm from a subset of LOTOS. The derived synchronous EFSMs can easily be implemented as a synchronous sequential circuit in which all the modules corresponding to the EFSMs work synchronously with the same clock. Applying our technique to the Abracadabra protocol confirms that the derived circuit handles multi-rendezvous efficiently.
INTRODUCTION
Due to the growth of computer networks, efficient implementation of communication protocols is needed. Thus, techniques for implementing protocols as hardware circuits have received attention in recent years.
To specify hardware circuits formally, the description techniques LOTOS [1, 12], Estelle [15] and SDL [13] have been proposed. With these techniques, we can easily describe schemes for hardware circuits using predefined component libraries, and can verify and validate them. For rapid prototyping, however, techniques for synthesis from the specifications are desirable. Several ideas for hardware synthesis from formal specifications have been proposed [6]. For example, [15] has proposed a synthesis technique from Estelle. However, the technique does not deal with highly structured specifications containing synchronization among concurrent modules, such as multi-rendezvous. In [7], although a technique to convert timed LOTOS specifications to VHDL specifications has been proposed, only two-way rendezvous between two processes is implemented. [11] has also proposed a technique to synthesize hardware circuits from LOTOS specifications, but it focuses only on Basic LOTOS. Since LOTOS specifications are structurally composed of multiple sub-processes which are dynamically invoked, the participants of each multi-rendezvous may be decided dynamically each time. In general, for efficient implementation it is desirable to transform complicated LOTOS specifications into specifications on a flattened model, such as an EFSM, which has no child-parent relationships between processes. It is also important to calculate in advance all the information about the combinations of synchronizing processes, the tuples of synchronizing events and their execution conditions. For this purpose, [10] has proposed a method to derive all possible multi-rendezvous instances from a LOTOS specification. However, the method requires a complete reachability analysis among all parallel processes, which needs time proportional to the product of the numbers of events in those processes.
In this paper, we propose a technique for hardware implementation of protocol specifications in a subclass of LOTOS where choice, synchronous/asynchronous parallel, interruption, sequential composition and dynamic process instantiation are specified with data. For this purpose, we first propose a new model called synchronous EFSMs [16] for representing LOTOS specifications. Synchronous EFSMs consist of concurrent EFSMs and a finite set of multi-rendezvous indications for their subsets (we call the set a multi-rendezvous table). In general, if we use all possible rendezvous instances (i.e., tuples of synchronizing transitions) as the multi-rendezvous table, the number of elements will be quite large. In our model, we reduce the number by composing each rendezvous indication of a tuple of possible event sets. Next, we give an algorithm to derive synchronous EFSMs from a protocol specification in our subclass of LOTOS. In the algorithm, we transform a given LOTOS specification into a parallel composition of EFSMs by introducing internal signals and by replacing each process instantiation with the corresponding behavior expression. We obtain the multi-rendezvous table statically from the information about the transitions in each EFSM and the parallel operators specified among EFSMs, by calculating all combinations of EFSMs synchronizing at each gate and by extracting all possible tuples of transitions.
To implement synchronous EFSMs as a synchronous sequential circuit, we compose a module to evaluate whether each rendezvous indication has an executable transition tuple or not. If several mutually exclusive multi-rendezvous become executable simultaneously for some combinations of EFSMs, we select one of them according to a priority order given in advance.
In Sect. 2, we introduce system design in LOTOS and give the definition of synchronous EFSMs. An algorithm to derive synchronous EFSMs is given in Sect. 3. Sect. 4 and 5 present a hardware implementation technique and its evaluation.
LOTOS AND SYNCHRONOUS EFSMS

System design in LOTOS
In a LOTOS specification, we specify a behavior expression of the protocol consisting of events and their temporal order. To specify the temporal order of events, we use several LOTOS operators such as action prefix (which combines events in sequential order) as well as choice, parallel, sequential and disabling between any two sub-behavior expressions. In a LOTOS specification, we can replace any sub-behavior expression by a process instantiation. This means that we can compose the whole behavior expression as a set of structured modules. (See [3] for details of LOTOS.)
Figure 1 Network switch
Especially, parallel operators and process instantiations make it easy to describe system specifications both structurally and simply. Multi-rendezvous enables group communication among any subset of concurrent processes, which drastically reduces communication actions in specifications. Multi-rendezvous also enables system specifications in constraint/resource oriented style [14] where we can design a system as a set of simple modules, and develop each module independently of the others.
For example, let us design a network switch in LOTOS with the following requirements. The switch has three ports a, b and c, which are connected to different networks A, B and C, respectively (see Fig. 1). Each data packet arriving at the switch should be forwarded to the appropriate network based on its destination. Here, we route the packets based on intervals: if the packet's destination is in the interval $[1, N_1)$, the packet should be directed towards network A (via port a); if in $[N_1, N_2)$, towards network B; if in $[N_2, \infty)$, towards network C. If the switch receives a packet with destination 0 (i.e. broadcast), the packet should be broadcast to all ports except its reception port. In addition, we suppose networks A and B use the same protocol (e.g. IP), but network C uses a different protocol (e.g. AppleTalk). That means the switch should have a facility for protocol conversion of each packet from either A or B to C (and vice versa).
It is desirable to design the behavior of each port independently of the others. For a better response time, the input and output behaviors of each port should be able to work in parallel. In addition, in order to allow asynchrony among ports, we use a FIFO queue shared among them. Here, we introduce three internal ports qi, qo and m for access to the queue. We suppose a new packet can be added to the queue via qi and a packet in the queue can be taken via qo in a FIFO manner (see Fig. 1). Although m is used in the same way as qo, it is dedicated to broadcast.
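The routing requirement can be illustrated with a minimal Python sketch. It is not part of the LOTOS specification: the function name `route`, the parameters `n1` and `n2` (standing for N1 and N2), and the choice of half-open intervals are assumptions made only for this illustration.

```python
from typing import List

PORTS = ["a", "b", "c"]  # ports connected to networks A, B and C


def route(dest: int, in_port: str, n1: int, n2: int) -> List[str]:
    """Return the output ports for a packet received on `in_port`.

    Assumption: the intervals are half-open, i.e. [1, n1) -> a,
    [n1, n2) -> b and [n2, infinity) -> c.
    """
    if dest == 0:
        # Broadcast: forward to every port except the reception port.
        return [p for p in PORTS if p != in_port]
    if 1 <= dest < n1:
        return ["a"]
    if n1 <= dest < n2:
        return ["b"]
    if dest >= n2:
        return ["c"]
    raise ValueError("destination must be a non-negative integer")


print(route(0, "a", 10, 20))   # broadcast from port a
print(route(15, "c", 10, 20))  # unicast into the second interval
```

In the switch itself this decision is spread over the port processes and the queue coordinator; the sketch only fixes the input/output relation that they must realize together.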
According to the above discussion, LOTOS processes (P1 and P2) for ports a and b are described as follows:
Next, we need a coordinator for the FIFO queue. We describe the queue and its operations by an ADT in LOTOS. The coordinator stores a new packet coming to qi in the queue, or outputs the last entry in the queue to either port qo or port m based on its destination. The process Crd for the coordinator can be described as follows:
Finally, we specify the interaction among the above processes. In general, we have to design a mutual exclusion mechanism for the queue since several parallel processes may access it at the same time. However, in LOTOS, we can simply describe such a mechanism with multi-rendezvous as follows (here, INF denotes $\infty$):
In the above specification, one of P1, P2 or P3 synchronizes with Crd to store/get a packet to/from the queue via internal ports qi/qo. When the packet's destination is 0, all of P1, P2 and P3 get the packet at the same time by multi-rendezvous on m.
Synchronous EFSMs
Synchronous EFSMs are a model in which any subset of concurrent EFSMs can communicate with each other via gates by multi-rendezvous [16].
Synchronous EFSMs are given as a set of EFSMs $\{efsm_1, \ldots, efsm_n\}$ and a multi-rendezvous table $R$. We suppose that each EFSM can have a finite number of registers, that an execution condition called a guard expression can be attached to each transition (i.e. edge), and that each transition can perform several substitutions for the registers in parallel.
In LOTOS, multi-rendezvous is specified by just giving abstract relationships among concurrent processes by parallel and other operators. For the efficient implementation of multi-rendezvous, we should calculate in advance the information about the combinations of synchronizing EFSMs, the tuples of synchronizing transitions (synchronization tuples) and their execution conditions. If we represent the multi-rendezvous simply by the set of all the combinations of transitions in synchronizing EFSMs, the number will be $O(k^n)$, where $n$ and $k$ are the number of EFSMs and the number of transitions in an EFSM, respectively.
Therefore, in our model, we represent all possible multi-rendezvous instances by a set of rendezvous indications, where each indication is a tuple of transition sets on a gate for a combination of synchronizing EFSMs. Here, every combination of transitions (called a synchronous tuple) in the sets has the possibility to be executed by multi-rendezvous (i.e. every synchronous tuple satisfies the condition in Table 1).
We denote each rendezvous indication by $\langle (E_1, \ldots, E_m), (A_1, \ldots, A_m) \rangle$, where $(E_1, \ldots, E_m)$ is a tuple of synchronizing EFSMs and each $A_i$ is the synchronous transition set containing the transitions executed in $E_i$ for the rendezvous. We represent the elements of $A_i$ as triples $\langle a, p, I \rangle$. Here, $a$ is the transition name consisting of a gate name and input/output parameters, $p$ is a guard expression, and $I$ is the set of substitutions to undefined variables.
Criteria for each rendezvous indication
To implement multi-rendezvous efficiently, we adopt the following criteria for each rendezvous indication.
The above criteria are not restrictions, since we can automatically obtain rendezvous indications satisfying them, as we will explain in Sect. 3. By using the above technique, each rendezvous indication $\langle (E_1, \ldots, E_m), (A_1, \ldots, A_m) \rangle$ can represent a maximum of $\prod_{i=1}^{m} |A_i|$ rendezvous instances (here, $|A_i|$ means the number of elements in $A_i$). Consequently, the number of elements in the multi-rendezvous table and the time to calculate the table are bounded by $O(p \cdot k_n)$, where $p$ is the maximum number of combinations of the synchronizing EFSMs on a gate (usually $p$ can be considered a constant) and $k_n$ is the sum of the numbers of output transitions with different values in the EFSMs (in the worst case, $k_n$ may be the number of all transitions in the EFSMs). For example, for EFSM1 |[a, b]| EFSM2 in Fig. 4, ten rendezvous instances could be produced. With the above technique, we can reduce the number of tuples to three, as shown in Table 3.
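To make the counting argument concrete, the following Python sketch models a rendezvous indication as a tuple of EFSM names plus one transition set per EFSM, and computes the maximum number of rendezvous instances it can represent (the product of the set sizes). The class and field names are hypothetical, guards are kept as plain strings, and the substitution set I is omitted; the example data is invented and is not taken from Fig. 4 or Table 3.

```python
from dataclasses import dataclass
from math import prod
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Transition:
    name: str            # gate name plus input/output parameter, e.g. "a!1"
    guard: str = "true"  # guard expression p, kept as text in this sketch


@dataclass(frozen=True)
class RendezvousIndication:
    efsms: Tuple[str, ...]                              # (E_1, ..., E_m)
    transition_sets: Tuple[FrozenSet[Transition], ...]  # (A_1, ..., A_m)

    def max_instances(self) -> int:
        # One instance per choice of one transition from every A_i.
        return prod(len(a) for a in self.transition_sets)


r = RendezvousIndication(
    efsms=("EFSM1", "EFSM2"),
    transition_sets=(
        frozenset({Transition("a!1"), Transition("a!2")}),
        frozenset({Transition("a?x"), Transition("a?y", "y > 0"), Transition("a?z")}),
    ),
)
print(r.max_instances())  # 2 * 3 = 6
```

A single table entry therefore stands for several instances, which is why the table can stay small even when the number of instances grows multiplicatively.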
Behavior of synchronous EFSMs
We call each transition $e$ in an EFSM $E_i$ an asynchronous transition if $e \notin A_i$ for every rendezvous indication $\langle (E_1, \ldots, E_m), (A_1, \ldots, A_m) \rangle \in R$. We define an asynchronous transition to be executable when the current state has the corresponding outgoing edge.
Here, we explain how synchronous EFSMs work in cooperation, using the example in Fig. 2. In Fig. 2, the dotted line shows that one of EFSM1, EFSM3 and EFSM5 can synchronize with EFSM7 on gate qi at the same time (one of EFSM2, EFSM4 and EFSM6 on gate qo); the solid line shows that EFSM2, EFSM4, EFSM6 and EFSM7 can synchronize with each other simultaneously on gate m. In the initial state $(s_1, s_1, s_1)$ of EFSM1, EFSM5 and EFSM7, they have the outgoing transitions a?idt, c?idt and qi?data [size(queue) < MAX], respectively. The first two transitions are asynchronous, so they are executed independently of the other EFSMs when the input data come to the gates. When a?idt is executed in EFSM1, the current state is changed to $(s_2, s_1, s_1)$. In this state, EFSM1 and EFSM7 have the outgoing edges qi!idt and qi?data [size(queue) < MAX], respectively. Since the queue contains nothing initially, the execution condition size(queue) < MAX holds. Therefore, the tuple (qi!idt, qi?data [size(queue) < MAX]) can be executed by the rendezvous indication (1) of Fig. 2. When the tuple is executed, the value of idt is assigned to the undefined variable data, and the current state is changed to $(s_1, s_1, s_2)$ in EFSM1, EFSM5 and EFSM7.
In some states, several synchronous tuples may be executable simultaneously (see, for example, Fig. 2).
Figure 2 Example of synchronous EFSMs (multi-rendezvous table columns: No., tuple of EFSMs, tuple of synchronous transition sets)
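The rendezvous step on gate qi described above can be traced with a small Python sketch. It is a hand-written illustration of this one synchronous tuple, not the general execution model: the function name, the capacity `MAX = 4`, and the reading of the guard as size(queue) < MAX are assumptions.

```python
from typing import List, Optional, Tuple

MAX = 4  # hypothetical queue capacity


def try_rendezvous_qi(
    state1: str, state7: str, idt: int, queue: List[int]
) -> Tuple[str, str, Optional[int]]:
    """Attempt the synchronous tuple (qi!idt, qi?data [size(queue) < MAX]).

    Returns the new states of EFSM1 and EFSM7 and the value bound to `data`,
    or the unchanged states and None if the tuple is not executable.
    """
    sender_ready = state1 == "s2"                          # EFSM1 offers qi!idt in s2
    receiver_ready = state7 == "s1" and len(queue) < MAX   # guard of qi?data
    if not (sender_ready and receiver_ready):
        return state1, state7, None
    data = idt  # the output value is assigned to the undefined variable
    return "s1", "s2", data


print(try_rendezvous_qi("s2", "s1", 7, []))            # executable: queue is empty
print(try_rendezvous_qi("s2", "s1", 7, [1, 2, 3, 4]))  # blocked: guard is false
```

Both participants change state in the same step, which is the property the synchronous circuit has to preserve within one clock cycle.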
DERIVING SYNCHRONOUS EFSMS
Preliminaries
In this paper, we consider any LOTOS specifications represented in the class of Table 2 although we impose the following restrictions on process instantiations.
Recursive processes are allowed when they are tail recursive (e.g. P := B >> P).
Recursive processes which may produce infinite behavior such as P := B1 P B2 exit or P := B o p P op 2 f ; j G j; g, are not allowed. However, if the recursive process call is guarded and the guard expression can be evaluated statically (e.g. Px : = Bjjj x 100 , P x + 1), we treat such a process.
Mutually recursive processes are allowed as long as process calls are guarded and the guard expressions can be evaluated statically.
Let $Par(B)$ be the function that represents how many parallel processes can be activated at the same time in a behavior expression $B$. We say a behavior expression $B$ is a sequential behavior expression (SBE) if $Par(B) = 1$ and $B$ includes 
Transformation algorithm
We transform $B_{main}$ into a parallel composition of SBEs by applying the following operations recursively to its sub-behavior expressions: (1) replace each process instantiation with its behavior expression unless the instantiation appears as a tail recursion (i.e. P := B >> P); (2) transform each action-prefixed sequence $B_{act}$ such that $Par(B_{act}) > 1$ into a parallel composition of sequential behavior expressions (SBEs); (3) transform each choice/disabling/sequential composition among sub-behavior expressions into either an SBE or a parallel composition of SBEs. The sub-procedures used above are given below.
In choice, any pair from different groups cannot be executed at the same time.
Therefore, we can assign $B_{cho}$ to $mx$ SBEs. Although there are various ways of assignment, here we extract an SBE from each group $j$ ($1 \le j \le n$), and compose a new SBE as the choice among the extracted $n$ SBEs. For the sake of simplicity, we assign to each new $i$-th SBE the $i$-th elements of all groups (if there is no $i$-th element, exit is used instead). In Fig. 3 ... and those in $B_2$ to be executed simultaneously. So, we assign to a new SBE each pair of the $i$-th elements in $B_1$ and $B_2$. In each new SBE, we combine $sbe_{1,i}$ and $sbe_{2,i}$ with choice operators so that events in $sbe_{1,i}$ are disabled if an event in $sbe_{2,i}$ is executed. Similar to the case of choice expressions, each SBE must be able to detect whether such a disabling event is executed in other SBEs. To do so, we add to each SBE extra events for detecting the disabling events in $St(B_2)$, and specify multi-rendezvous so that those events are synchronized among the new SBEs. By the above assignment, each disabling expression is converted to a parallel composition of $\max(Par(B_1), Par(B_2))$ SBEs.
For the sequential expression $B_1$ >> $B_2$, we similarly extract each pair of SBEs from $B_1$ and $B_2$ and compose a new SBE. Here, we introduce an internal signal to indicate that all events in $B_1$ have been executed and to activate the behavior corresponding to $B_2$. With a multi-rendezvous on this signal, we make the new SBEs finish the execution of $B_1$ at the same time as starting the execution of $B_2$. $B_1$ >> ... >> $B_l$ can be transformed in the same way.
If the sequential expression has tail recursion in the process instantiation, such as P[G](X) := B >> P[G](V), and if $Par(B) > 1$, all the SBEs extracted from $B$ have to return to their initial states when goto(P[G],X:=V) is executed. In that case, for each SBE, we add a transition to the initial state from the state reached after executing the internal signal, with the corresponding subset of the substitutions of X:=V.
The details of the transformation algorithms and a simplified proof for its correctness are given in [16] .
Calculation of Multi-Rendezvous Table
Only synchronization operators are specified among EFSMs after applying the transformation algorithm. In this section, we give a technique to obtain the multi-rendezvous table from the transitions in each EFSM and the operators specified among EFSMs. From the syntax tree of the operators among EFSMs and the gate names used in each EFSM, we can obtain the combinations of synchronizing EFSMs on each gate. If a multi-rendezvous is specified among a subset of EFSMs $E$ on gate $g$, we denote that by $Rend(E, g)$. The finite set of all synchronous tuples as well as their execution conditions is statically determined if $Rend(E, g)$ is given for all combinations of $E$ and $g$. Let $RI$ be the union of $Rend(E, g)$ for all $E$ and $g$. We calculate the set as follows.
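One way to obtain the combinations of synchronizing EFSMs from the operator syntax tree is sketched below in Python. This is an illustration under stated assumptions rather than the algorithm of [16]: the tree contains only parallel operators with a synchronization gate set, a gate outside that set interleaves, and a gate inside it pairs every combination from the left operand with every combination from the right. The class names `Leaf` and `Parallel`, the function `rend`, and the top-level structure used for the switch example are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Union


@dataclass(frozen=True)
class Leaf:
    name: str              # EFSM (or process) name
    gates: FrozenSet[str]  # gates used by this EFSM


@dataclass(frozen=True)
class Parallel:
    left: "Node"
    right: "Node"
    sync: FrozenSet[str]   # gate set G of the operator |[G]|


Node = Union[Leaf, Parallel]


def rend(node: Node, gate: str) -> Set[FrozenSet[str]]:
    """Return every set of EFSMs that may synchronize on `gate`."""
    if isinstance(node, Leaf):
        return {frozenset({node.name})} if gate in node.gates else set()
    left, right = rend(node.left, gate), rend(node.right, gate)
    if gate in node.sync:
        # Both operands must take part: combine one group from each side.
        return {l | r for l in left for r in right}
    # Interleaving on this gate: the groups of each side stay independent.
    return left | right


shared = frozenset({"qi", "qo", "m"})
p1, p2, p3 = (Leaf(n, shared | {g}) for n, g in (("P1", "a"), ("P2", "b"), ("P3", "c")))
crd = Leaf("Crd", shared)
ports = Parallel(Parallel(p1, p2, frozenset({"m"})), p3, frozenset({"m"}))
switch = Parallel(ports, crd, shared)

print(sorted(sorted(s) for s in rend(switch, "qi")))
print(sorted(sorted(s) for s in rend(switch, "m")))
```

Under these assumptions the sketch yields three two-party combinations on qi (each port process with Crd) and one four-party combination on m, which agrees with the informal description of the switch in Sect. 2.1.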
For each $Rend(E, g) \in RI$ with $E = \{E_1, \ldots, E_m\}$: (1) extract all transitions on gate $g$ for each $E_i \in E$ (let $sync\_ev(E_i, g)$ be the extracted transitions for $E_i$); (2) calculate the set of tuples $\{(e_1, \ldots, e_m) \mid e_i \in sync\_ev(E_i, g)\}$ where each tuple can satisfy the synchronization condition in Table 1 (let $Tuples(E, g)$ denote the set).
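Steps (1) and (2) can be sketched in Python as a filtered Cartesian product. Because the synchronization condition of Table 1 is not reproduced here, the predicate `compatible` is a simplified stand-in that only requires all output values in a tuple to agree; it, the event encoding and the example data are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# (gate, kind, value): kind is "!" for output or "?" for input (value None).
Event = Tuple[str, str, Optional[int]]


def sync_ev(transitions: List[Event], gate: str) -> List[Event]:
    """Step (1): extract the transitions of one EFSM on the given gate."""
    return [t for t in transitions if t[0] == gate]


def compatible(tup: Sequence[Event]) -> bool:
    """Simplified stand-in for Table 1: all output values must coincide."""
    outputs = {t[2] for t in tup if t[1] == "!"}
    return len(outputs) <= 1


def tuples(efsms: Dict[str, List[Event]], names: Sequence[str], gate: str):
    """Step (2): all combinations of one transition per EFSM that may synchronize."""
    candidates = [sync_ev(efsms[n], gate) for n in names]
    return [t for t in product(*candidates) if compatible(t)]


efsms = {
    "E1": [("a", "!", 1), ("a", "!", 2), ("b", "?", None)],
    "E2": [("a", "?", None), ("a", "!", 2)],
}
for t in tuples(efsms, ["E1", "E2"], "a"):
    print(t)
```

Of the four candidate pairs on gate a, only the pair of outputs with values 1 and 2 is rejected. No reachability information is used, so some of the remaining tuples may never become executable at run time.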
Next, we convert the synchronous tuples to rendezvous indications as follows, so that they satisfy the criteria in Sect. 2.2. For each $Tuples(E, g)$: (i) calculate the set of output values $OV$ where each value is assigned to undefined variables by the synchronization.
(ii) for each $E_i \in E$ and each $v \in OV$, calculate the set of transitions $A_i$ where each transition satisfies the synchronization conditions in Table 1 ... transition. For example, a!x |[a]| a!y |[a]| a?z is transformed to a!x |[a]| a?w [w = y] |[a]| a?z.
The above procedure could produce some rendezvous indications which contain impossible synchronous tuples since we do not use any reachability analysis technique. A synchronous tuple can be executed as long as the synchronization condition of the tuple holds for a rendezvous indication. Not all synchronous tuples in each rendezvous indication can be executed. That means the multi-rendezvous table itself is a sufficient condition for implying the possible multi-rendezvous instances. In our model, however, only the valid synchronous tuples become executable since the executability of each rendezvous indication is evaluated at each reachable state of EFSMs.
Example of conversion to synchronous EFSMs
After the algorithm Trans is applied to the main behavior expression of Switch in Sect. 2.1, seven SBEs are derived. We show part of them below:
In the above SBEs, process instantiations in the form of tail recursions are converted to the appropriate goto transitions. Each derived SBE can be converted to an EFSM easily. Fig. 2 depicts the resulting EFSMs and the multi-rendezvous table. Here, EFSM1 to EFSM7 are converted from $SBE_{I1}$ to $SBE_{Crd}$, respectively.
HARDWARE SYNTHESIS FROM SYNCHRONOUS EFSMS
In this section, we give the technique to convert given synchronous EFSMs into a synchronous sequential circuit (our preliminary work can be found in [5] ). Hereafter, we suppose the modules corresponding to EFSMs work synchronously with the same clock. In each clock cycle, each EFSM can execute a transition as long as its execution condition holds. We assume the components corresponding to ADT functions (e.g. guard expressions) are provided as combinational logic circuits and they can output the resulting values within a clock cycle. The circuit for each EFSM can be implemented easily by well-known techniques [9] . So, here we concentrate on the implementation of multi-rendezvous among EFSMs.
Given EFSMs and a multi-rendezvous table, we implement multi-rendezvous among EFSMs as a multi-rendezvous circuit consisting of the following three sub-parts: (1) an executability check part, which checks whether there exist executable synchronous tuples for each rendezvous indication at each state; (2) a data transfer part, which transfers the required data from a certain EFSM to the other EFSMs so that each EFSM can calculate the execution condition (guard) of its transition; (3) a conflict avoidance part, which selects a synchronous tuple among mutually exclusive synchronous tuples.
Hereafter, we suppose that synchronous EFSMs are given as $\langle EFSM, R \rangle$, where each rendezvous indication $r \in R$ is represented as $\langle (E_1, \ldots, E_m), (A_1, \ldots, A_m) \rangle$ with each $E_i \in EFSM$.
Constructing executability check and data transfer parts
For the executability check part, every EFSM in each rendezvous indication must check whether some transitions in its synchronous transition set are executable at the current state. So in each $E_i$, for every $r \in R$, we provide a circuit generating an output signal $r_i\_ok$ which becomes true (i.e. 1) only when a transition in $A_i$ becomes executable. Consequently, for the rendezvous indication $r$ there exist executable synchronous tuples if and only if $r_1\_ok, \ldots, r_m\_ok$ are all true (we denote this by $r\_ok$).
For the data transfer part, the EFSMs with input transitions and the EFSM with output transitions can be determined statically for each rendezvous indication by the criteria in Sect. 2.2. Hence, we provide a path $D_r$ among the EFSMs for each $r$ so that one EFSM outputs an appropriate value to the path and the others obtain the value.
Constructing conflict avoidance part
The conflict avoidance part generates the signal r en which becomes true when r has the right to execute its synchronous tuple, avoiding conflicts between r and other exclusive rendezvous indications. Although there can be some policies for avoiding conflicts, we adopt a policy that gives a priority (or total order) among rendezvous indications and selects a rendezvous indication by the priority. Any synchronous tuples of r cannot be executed when another conflicting rendezvous indication with priority higher than r is ready to execute a synchronous tuple. Consequently, we construct r en as follows:
$r\_en = r_1\_ok \wedge \cdots \wedge r_m\_ok \wedge pri_r$, where $pri_r = \neg r^1\_en \wedge \cdots \wedge \neg r^h\_en$. (Here, $\{r^1, \ldots, r^h\}$ are the rendezvous indications with higher priorities than $r$ which conflict with $r$, and $pri_r$ indicates whether $r$ has the right to execute its synchronous tuple.)
Although we have introduced a priority based method, a module generating random numbers can also be used to select one of the conflicting rendezvous.
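The priority-based enable signals can be checked with a short Python model of the combinational logic. The function `enables` and its argument layout are hypothetical; the model simply evaluates each indication in priority order, which mirrors how each enable signal depends only on the enable signals of higher-priority conflicting indications.

```python
from typing import Dict, List, Set


def enables(
    order: List[str],               # rendezvous indications, highest priority first
    ok: Dict[str, List[bool]],      # per-EFSM signals r_1_ok, ..., r_m_ok
    conflicts: Dict[str, Set[str]], # indications that conflict with each r
) -> Dict[str, bool]:
    en: Dict[str, bool] = {}
    for r in order:
        r_ok = all(ok[r])
        # `en` holds exactly the higher-priority indications at this point.
        higher = [q for q in conflicts.get(r, set()) if q in en]
        pri = not any(en[q] for q in higher)
        en[r] = r_ok and pri
    return en


result = enables(
    order=["r1", "r2", "r3"],
    ok={"r1": [True, True], "r2": [True, True], "r3": [True, True]},
    conflicts={"r1": {"r2"}, "r2": {"r1", "r3"}, "r3": {"r2"}},
)
print(result)  # {'r1': True, 'r2': False, 'r3': True}
```

In the example, r2 is suppressed by r1, and r3 may then proceed because its only conflicting indication, r2, is not enabled. This chained dependency is what determines the logic depth of the conflict avoidance part.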
An example of derived circuit
In this section, we explain how we can derive the circuit in Fig. 5 from the synchronous EFSMs in Fig. 4 and the multi-rendezvous table in Table 3 .
Hereafter, we denote the output signal from $EFSM_j$ for the rendezvous indication $r_i$ as $r_{ij}\_ok$. At the initial state $(s_1, s_1)$, EFSM1 first calculates the output value $r_{11}\_ok$ for the rendezvous indication $r_1$ as follows. As EFSM1's current state is $s_1$, EFSM1 calculates the execution condition $p(x_1) \vee q(x_2)$ for the transitions a?x1 [p(x1)] and a?x2 [q(x2)], which are transitions both in its own transition set and in that of $r_1$. Furthermore, since $p(x_1)$ and $q(x_2)$ need external values to calculate the conditions, 
Further optimization
Clock frequency is an important factor for efficient circuits. To what extent the frequency can go up depends on the critical path of the circuit, which is the most time consuming path of the logic gates used in a clock cycle.
In our technique, for each rendezvous indication $r_i$, (i) $r_i\_ok$ and (ii) $r_i\_en$ have to be evaluated in a clock cycle. The evaluation for (i) requires the transfer of data values among the related EFSMs and the evaluation of the guard expression in each EFSM. The evaluation for (ii) requires logic gates of depth $\log h$, where $h$ is the number of conflicting rendezvous indications. To shorten the critical path, we can take the following approach: (1) divide each complicated ADT function into several sub-modules with registers, so that the result is calculated over several clock cycles, based on the technique in [2]; (2) resolve each conflict among multiple rendezvous indications over several clock cycles, for example by dividing them into several groups.
Another topic of optimization is to reduce the number of data paths in order to simplify the resulting circuits. In general, several rendezvous indications can share a data path as long as they do not conflict with each other or cannot be executed at the same time. To share the data path, we allocate a bus available to the related EFSMs for some rendezvous indications. With this technique, all the required data paths can be implemented by just $N$ data buses, where $N$ is the maximum number of conflicting/simultaneous rendezvous indications.
EXPERIMENTAL RESULTS AND DISCUSSION
We have applied our technique to the LOTOS specification of Abracadabra protocol [4] and constructed the hardware circuit to show that the constructed circuit is reasonably small and fast. We have used a hardware synthesis system PARTHENON [8] which has been developed by NTT.
In general, the modules for ADT functions (e.g. execution conditions) depend on how to implement the data types in the target circuit (e.g. the size of each data). Therefore, in this experiment, we have mainly evaluated the derived multi-rendezvous circuit without the modules for calculating ADT functions in each EFSM.
The factor with the greatest effect on performance is the maximum depth of the logic gates in the circuit. Eight EFSMs were derived from the LOTOS specification of the Abracadabra protocol using the algorithm in Sect. 3 (the derived synchronous EFSMs can be found in [5]). The number of rendezvous indications was 85, and the maximum depth of the logic gates for the multi-rendezvous circuit was 7. The maximum depth became 6 after some optimization.
Another criterion should be the size of the resulting circuit. The size of the multirendezvous circuit grows in proportion to the number of rendezvous indications since each rendezvous indication has its own module. The time for selecting a synchronous tuple set grows in proportion to the maximum number of rendezvous indications which may conflict with each other. In general, the number of the logic gates in the circuit depends on the size of data.
We have synthesized the whole circuit with ADT data/functions using 8-bit data. We have used several hardware modules for implementing ADT functions such as integer comparison (e.g. =, <) and addition (e.g. inc, +).
In that case, the whole circuit obtained has about 5000 gates: about 500 gates for ADT functions; about 300 gates for the multi-rendezvous circuit; and the remainder for registers, selectors and control signals for EFSMs.
The maximum depth of logic gates for ADT functions was 15. In the experiment, our technique requires an additional depth of 6 logic gates for the multi-rendezvous circuit. That means the whole circuit could work with a clock frequency at least 70% as high as that of hardware circuits without multi-rendezvous. In fact, as we explained in the previous section, the multi-rendezvous circuit, like the other modules, could be optimized further for practical use. We can approximately estimate the optimized performance from the circuit automatically derived with our technique. According to the above discussion, we think our technique can be used for rapid prototyping.
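The 70% figure can be recovered with a short calculation, assuming (as a simplification not stated explicitly in the text) that the clock period is proportional to the logic depth of the critical path and that the multi-rendezvous logic lies in series with the ADT logic. The symbols $d_{\mathrm{ADT}}$ and $d_{\mathrm{MR}}$ are introduced here only for this estimate.

```latex
\frac{f_{\text{with}}}{f_{\text{without}}}
  \approx \frac{d_{\mathrm{ADT}}}{d_{\mathrm{ADT}} + d_{\mathrm{MR}}}
  = \frac{15}{15 + 6}
  = \frac{15}{21}
  \approx 0.71
```

Under the same assumption, the unoptimized multi-rendezvous depth of 7 would give $15/22 \approx 0.68$.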
CONCLUSION
In this paper, we have proposed a hardware implementation technique for LOTOS specifications. In the technique, by composing each rendezvous indication of a tuple of event sets, we can keep the information about all possible rendezvous instances in a reasonable space. It is important that our conversion algorithm does not require any reachability analysis among parallel processes, since such analysis needs time proportional to the product of the numbers of events in the parallel processes. Through the experiment on the Abracadabra protocol, we have confirmed that our technique can be used for rapid prototyping. We are going to evaluate our technique by implementing various protocols and systems, and possibly to develop optimization techniques that make the synthesized circuits suitable for practical use. Applying the technique to a timed extension of LOTOS is part of future work.
