In this paper, we present a new algorithm that performs automatic interface synthesis between two synchronous hardware modules with incompatible data communication protocols. We introduce the Data Path State Machine (DPSM), which captures data path dependencies. This allows the synthesis of data path control logic that is optimized for bandwidth over multiple transactions.
INTRODUCTION
The recent advent of Systems-On-Chip products, the growing complexity of designs and stringent time-to-market pressures all contribute to the so-called design productivity gap. Unsurprisingly, pressure has therefore been exerted on EDA companies to develop tool environments that encourage the reuse of previous designs. The increasing reuse of RTL hardware blocks makes the interfacing of these blocks important. Communication between them is made possible if proper interface circuits are introduced. Manually adapting these interfaces is a tedious and error-prone process. Instead, methods and algorithms to automatically synthesize interfaces need to be developed.
The problem can be expressed as: given the producer and consumer data communication protocols and a description of the data path that interfaces the two sides, generate an optimal (in terms of performance) interface machine automatically that will synchronize and preserve the meaning of the data between the two sides.
Passerone [PRSV98] showed that if protocols are represented naturally as Deterministic Finite Automata (DFA), the product FSM can be pruned to implement interface control logic. Passerone has argued that DFAs are not as easy for designers to use for protocol specification as regular expressions. In this paper we will assume that the protocol specifications can be translated automatically into equivalent DFAs as shown by [PRSV98]. We will address the problem of automatic interface synthesis from protocol DFAs.
Passerone's algorithm uses acyclic DFAs to simplify synthesis, and so does not optimize latency and bandwidth over multiple transactions. Furthermore, it does not consider data path issues. We present a new algorithm that deals with more realistic interface synthesis in which protocols are represented by cyclic DFAs. The introduction of a Data Path State Machine (DPSM), capturing data path dependencies, allows control logic for data paths to be optimized for bandwidth.
The rest of the paper is organized as follows: Sect. 2 gives a brief description of previous and related work, Sect. 3 presents terminology required in the rest of the paper, Sect. 4 presents the protocol specification and DPSM formalisms, and Sect. 5 presents the algorithm. Finally, Sect. 6 describes the results and Sect. 7 concludes the paper and presents directions for future work.
Figure 1: Problem Definition (Producer, Consumer)
RELATED WORK
Interface synthesis has been addressed in a broad range of literature. The STG is introduced in [Bor88] as a means to establish synchronization between synchronous and/or asynchronous components. However, the protocol specifications are too low level (timing diagrams) and the correspondence between the different data items is not resolved automatically. The authors in [AM91] describe the protocols using two Verilog FSMs, and a non-deterministic Cartesian product is obtained which forms the interface. This is determinized by using a third machine called the C-machine, which describes the intended behavior of the interface. The method does not solve the data correspondence problem mentioned above and does not consider any data path issues.
Passerone et al. [PRSV98] describe the protocols using regular expressions. These are translated into finite automata which are then synthesized into FSMs using a product algorithm. The algorithm resolves the pseudo non-determinism that arises by making the composition causal, non-deadlocking and optimal in terms of its latency. It solves the data correspondence problem but is limited in the form of communication it supports, i.e. only a single transaction, point-to-point communication and a common clock are assumed.
More recently there have been efforts by [PCPK00] to generate hardware interfaces with both sides operating at different clock frequencies by inserting additional states and edges to the product FSM. The authors in [SC02] have recently proposed an interface architecture with three FSMDs (one for each of the producer, consumer and queue) and a data path consisting of a queue, which they believe is general enough to accommodate any component protocols. The protocols are specified using FSMDs and the synthesis algorithm is responsible for mapping these onto the FSMDs on the target architecture. The algorithm does not address the data correspondence problem.
0-7803-7761-3/03/$17.00 ©2003 IEEE
PRELIMINARY TERMINOLOGY
If $c_1, c_2, \ldots, c_n$ are the control ports associated with a certain protocol, assuming values from the sets $\sigma_1, \sigma_2, \ldots, \sigma_n$, the control set of the protocol is defined as the product $C = \prod_{i=1}^{n} \sigma_i$. Elements from the control set are called control symbols.
If $d_1, d_2, \ldots, d_p$ are the data ports associated with a certain protocol, assuming values from the sets $\rho_1, \rho_2, \ldots, \rho_p$, the data set of the protocol is defined as the product $D = \prod_{i=1}^{p} \rho_i$. Elements from the data set are called data symbols.
The alphabet $\Sigma$ of the protocol is defined as the product $\Sigma = C \times D$. The elements of the alphabet are called symbols. A formal language over $\Sigma$ is a set of strings of symbols from $\Sigma$. A protocol is a formal language over $\Sigma$. In other words, a protocol is a set of strings of symbols from $\Sigma$ where each string of symbols represents a legal manifestation of a certain transaction or behaviour.
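As a small illustration of these definitions, the following Python sketch builds the control set, the data set and the alphabet for a hypothetical protocol. The two one-bit control ports, the single data port and its two data symbols are assumptions chosen for the example and do not come from the protocols studied in this paper.

```python
from itertools import product

# Hypothetical value sets: two 1-bit control ports and one data port.
control_value_sets = [(0, 1), (0, 1)]   # sigma_1, sigma_2 (assumed)
data_value_sets = [("d1", "d2")]        # rho_1 (assumed)

# The control set C and data set D are Cartesian products of the port value sets.
control_set = list(product(*control_value_sets))
data_set = list(product(*data_value_sets))

# The alphabet is C x D: each symbol pairs a control symbol with a data symbol.
alphabet = list(product(control_set, data_set))

# A protocol is a set of strings (here: tuples) of symbols over the alphabet.
example_string = (((1, 0), ("d1",)), ((1, 1), ("d2",)))
protocol = {example_string}


def is_over_alphabet(string, symbols):
    """Check that every symbol of the string belongs to the given alphabet."""
    return all(symbol in symbols for symbol in string)
```

With these assumed ports, the control set has four control symbols, the data set has two data symbols, and the alphabet has eight symbols.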
Elements from the protocol set are called tokens. A token represents a complete communication between the producer and consumer. The set of data symbols associated with the token is known as the data type. In a bus transaction a token is broken down into a series of sub-tokens. Each sub-token consists of a string of data and control symbols. The data symbols are associated with the data bus and the control symbols are associated with the control signals.
... figure 3. The serial protocol initially waits for the environment to set its input control signal to 1. An associated data symbol d1 is waiting to be placed onto the data bus. Once a 1 is received from the environment, the producer puts the data onto the bus. The control signal then goes to 0 one clock cycle later and another data symbol d2 is placed onto the bus. The last data symbol in the data type, d4, is placed onto the data bus two clock cycles later.
The 4-phase handshake protocol initially waits for a request signal from the environment. After the environment sets the request signal, it waits for the hardware block to assign the acknowledge signal. After the acknowledge signal is received, the environment puts the data onto the bus. The hardware block reads the data from the bus until the environment drops the request signal.
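The 4-phase handshake described above can be sketched as a small cyclic DFA over (req, ack) control symbols. This is only an illustrative model: the state names are hypothetical, data symbols are omitted, and the final release phase (ack falling before the next transaction) is an assumption that the description above does not spell out.

```python
# Transition table of a hypothetical 4-phase handshake DFA.
# Control symbols are (req, ack) pairs; state names are illustrative only.
TRANSITIONS = {
    ("IDLE", (0, 0)): "IDLE",            # wait for a request
    ("IDLE", (1, 0)): "REQUESTED",       # environment raises req
    ("REQUESTED", (1, 0)): "REQUESTED",  # wait for the acknowledge
    ("REQUESTED", (1, 1)): "TRANSFER",   # hardware block raises ack
    ("TRANSFER", (1, 1)): "TRANSFER",    # data is read while req stays high
    ("TRANSFER", (0, 1)): "RELEASE",     # environment drops req
    ("RELEASE", (0, 1)): "RELEASE",      # assumed: wait for ack to fall
    ("RELEASE", (0, 0)): "IDLE",         # assumed: back edge, DFA is cyclic
}


def run(trace, state="IDLE"):
    """Return the state reached, or None if the trace violates the protocol."""
    for symbol in trace:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return None
    return state


legal_trace = [(0, 0), (1, 0), (1, 1), (1, 1), (0, 1), (0, 0)]
illegal_trace = [(0, 0), (1, 1)]  # ack asserted without a request phase
```

Because the last transition returns to the initial state, the same table accepts any number of back-to-back transactions, which is the cyclic behaviour that the synthesis algorithm in this paper takes as input.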
Definition 2 A Data Path State Machine (DPSM) is a directed acyclic graph defined by the tuple $DPSM := \langle R, \delta, C, r_0, F \rangle$, where $R$ denotes the set of protocol states, with $r_0 \in R$ being the initial state and $F \in R$ the final state, $\delta \subseteq R \times R$ is the set of state transitions, and $C$ is the set of all possible DFS conditions for the producer and the consumer. The states in the DPSM emphasize the acceptance or rejection of a sequence of DFS conditions. The edges are labelled with these conditions.
Definition 3 ... data communication as input. The DPSM is used by the ... of the DPSM. It ... Figure 4.
We assume that the register is tied to the producer and that the various data is clocked into the register when it is first made available. State 1 in the DPSM is entered when the producer has written the entire data type to the register and is waiting for the consumer to relinquish the use of the data contents in the register. If, at any time while the data path is in state 1, the producer writes a new set of data symbols to the register, the contents of the register will be corrupted and the consumer will thereafter read incorrect data.
A possible refined version of the abstract register is shown in Figure 5. It will consist of D modified registers, where D represents the size of the data type. These registers are modified to optimize latency. X and Y represent the input and output port sizes respectively, in terms of the number of data symbols that can be associated with them.
SYNTHESIS ALGORITHM
The inputs to the synthesis algorithm are the LNDFSMs for the producer and consumer data communication protocols and the DPSM for the abstract register. The resulting composition is the Labelled Deterministic Finite State Machine (LDFSM), a subset of the Cartesian product of the two LNDFSMs and the DPSM. The LDFSM will control the data flow between the producer and consumer, and preserve the meaning of the data according to their LNDFSM specifications. The LDFSM coupled with the abstract register of figure 5 will form the interface.
The Explore function returns one of three objects: Success, Fail, Loop. It returns Success for a transition that will definitely lead to a successful transfer of data, and Loop if the state already exists on the stack and has already been visited. Fail is returned if the interface is in a state where the data transfer is non-causal, the buffer has overflowed, or the state is uncontrollable (i.e. it will lead to either one of these states in an unfriendly environment).
The product is computed by performing a depth first recursive search with backtracking on all possible states in the product machine. The required subset of the product machine is constructed by starting from no states and then adding states. Each new state in the tree is explored and the pseudo non-determinism that arises is resolved by choosing the transitions which make the resulting composition causal, controllable and optimal in terms of its cycle length. In particular, we define a final state as a state which contains backedges to states previously marked on the stack, and the minimum cycle lengths of all the states are computed with reference to these final states. If a final state contains multiple backedges which are non-deterministic, the backedge which results in the smaller cycle length is chosen.
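The depth first search with backtracking can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Explore function of Figure 6: product states are opaque labels in a hypothetical toy graph, every successor is treated as a choice of the interface (so controllability against an unfriendly environment is not modelled), and the cycle length is approximated by the number of transitions until a backedge is reached.

```python
from enum import Enum


class Result(Enum):
    SUCCESS = "Success"
    FAIL = "Fail"
    LOOP = "Loop"


def explore(state, successors, is_bad, stack, chosen):
    """Depth first exploration with backtracking (illustrative sketch).

    successors(state) -> candidate next states (assumed interface choices)
    is_bad(state)     -> True for non-causal or overflowed states
    stack             -> states on the current search path
    chosen            -> filled with the selected next state per state
    Returns (Result, transitions needed to reach a backedge).
    """
    if is_bad(state):
        return Result.FAIL, None
    if state in stack:
        return Result.LOOP, 0          # backedge to a state on the stack
    stack.append(state)
    best = None
    for nxt in successors(state):
        result, length = explore(nxt, successors, is_bad, stack, chosen)
        if result is Result.FAIL:
            continue                   # backtrack: discard this transition
        if best is None or length + 1 < best[1]:
            best = (nxt, length + 1)   # keep the shortest way to a backedge
    stack.pop()
    if best is None:
        return Result.FAIL, None       # every choice leads to failure
    chosen[state] = best[0]
    return Result.SUCCESS, best[1]


# Hypothetical product machine: "overflow" stands for a buffer overflow state.
GRAPH = {"s0": ["s1", "s2"], "s1": ["overflow"], "s2": ["s3", "s0"], "s3": ["s0"]}

chosen = {}
result, length = explore(
    "s0", lambda s: GRAPH.get(s, []), lambda s: s == "overflow", [], chosen
)
```

In this toy graph the branch through s1 is pruned because it can only reach the overflow state, and from s2 the direct backedge to s0 is preferred over the longer path through s3.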
EXPERIMENTAL RESULTS AND DISCUSSION
In the first experiment, a non-stallable serial protocol is interfaced to a 4-phase request acknowledge protocol (see figure 3). In the second experiment, the same protocols are interfaced to one another as in the first experiment, only this time the order in which the producer sends the data symbols to the consumer is reversed. In the third experiment, the 4-phase request acknowledge protocol is interfaced to the non-stallable serial protocol. This involves translating the 4-phase request acknowledge protocol input signal ack (as observed by the interface) into an output, and the output signal req into an input. The non-stallable serial protocol transfers one data symbol at a time without interruption until the entire token has been transferred. The start of the next transaction is regulated by the interface. The request acknowledge side uses a bus four times larger in size, which is regulated by the request-acknowledge signals.
Figure 6: Explore Function Pseudo Code
As expected, the resulting controller FSMs are cyclic. For the same examples, Passerone's algorithm [PRSV98] produced acyclic FSMs because only a single transaction is considered. The main results are summarized in figure 7. The first experiment contains many states because the output of the data is concurrent with the input of the data. The possibility to stop and resume the protocol anywhere during the data transfer gives rise to a large number of states. Also note that concurrency leads to an exponential increase in the number of state explorations with increasing data type size. This problem can be overcome and is currently being dealt with.
Clearly, one way to significantly reduce the number of explorations is to record the explored successful product states along with their minimum cycle length whilst performing the recursive search. In this way re-exploration of the same states can be avoided. Despite the current large number of explorations, the generated controller is optimal in terms of bandwidth.
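The effect of recording successful states can be illustrated with a small Python sketch. It is a toy model rather than the implementation evaluated here: the graph is hypothetical, failure states are given as a plain set, and the sketch assumes that a recorded minimum length stays valid whichever path later reaches the state, a condition that is not examined in this paper.

```python
def explore(state, graph, bad, stack, memo, counter):
    """Return the minimum number of transitions to a backedge, or None on failure.

    memo    -> dict of recorded successful states, or None to disable recording
    counter -> one-element list counting how many explorations were performed
    """
    counter[0] += 1
    if state in bad:
        return None                    # Fail
    if state in stack:
        return 0                       # Loop: backedge to a state on the stack
    if memo is not None and state in memo:
        return memo[state]             # reuse a recorded successful state
    stack.append(state)
    lengths = []
    for nxt in graph.get(state, []):
        length = explore(nxt, graph, bad, stack, memo, counter)
        if length is not None:
            lengths.append(length + 1)
    stack.pop()
    if not lengths:
        return None
    best = min(lengths)
    if memo is not None:
        memo[state] = best             # record the state with its minimum length
    return best


# Hypothetical product graph with shared sub-structure (two diamonds in series).
GRAPH = {
    "a": ["b", "c"], "b": ["d"], "c": ["d"],
    "d": ["e", "f"], "e": ["g"], "f": ["g"], "g": ["a"],
}

calls_plain, calls_memo = [0], [0]
plain = explore("a", GRAPH, set(), [], None, calls_plain)
memoized = explore("a", GRAPH, set(), [], {}, calls_memo)
```

Both runs find the same minimum length, but the run that records successful states performs fewer explorations because the shared states d and g are expanded only once.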
In the second experiment, the interface has to wait until the end of the input phase to begin the output phase. Less choice leads to fewer states than in the first experiment. Also note that there are far fewer state explorations. This is because all the illegal states are found to be close to the initial state. The existence of a protocol violation in the third experiment reduces the number of choices for the interface machine, which results in far fewer states than the first machine. The resulting interface is non-optimal in terms of the bandwidth. To optimize the bandwidth, the capacity of the interconnecting buffer was increased to store two tokens. The number of states increases with increasing buffer size. This is because increasing the buffer size increases the state space and also reduces the prospect of a protocol violation.
CONCLUSIONS AND FUTURE WORK
We presented a novel extension to Passerone's algorithm [PRSV98] which supports multiple transactions. The algorithm finds the optimal solution in terms of the bandwidth. Still, there are a number of interesting extensions. An obvious extension to this work is to devise a DPSM formalism to synthesize optimal control logic for different register configurations, to allow more complex data path synthesis. Another interesting extension would be to optimize the interface for data transfer latency and to determine a means to generate the best interface in terms of both data transfer latency and bandwidth.
