Abstract - A hardware logic simulation engine based on decision diagrams is presented. For the data structure of the engine, we propose PMDDs (Paged reduced ordered Multi-valued Decision Diagrams). A unit of this engine consists of memory (RAMs) and control circuits: the RAMs store the PMDD data, and the control circuits trace the edges according to the input vectors. The engine consists of several units, and is accelerated by pipelining. Experimental results using a prototype are shown.
I. INTRODUCTION
In this paper, we propose a cycle-based hardware logic simulation engine, or an engine for short. The paper is organized as follows: Section 2 introduces logic simulation based on decision diagrams. Section 3 presents an engine based on a PMDD (Paged reduced ordered Multi-valued Decision Diagram). Section 4 shows a performance evaluation and a preliminary experimental result.
II. LOGIC SIMULATIONS BASED ON DECISION DIAGRAMS
Here, we review simulation methods based on DDs (decision diagrams). These methods are theoretically much faster than LCC-based (levelized compiled code) ones. In a BDD (binary decision diagram) [2], each node corresponds to a variable, and the edges labeled 0 and 1 represent low(v) and high(v), respectively. We only consider ordered decision diagrams, where the input variables appear in a fixed order on all the paths from the root node to a terminal node.
We can evaluate the function by traversing the BDD from the root node to a terminal node. Clearly, the evaluation time for an n-input logic function is O(n). Because the simulation time of an LCC-based gate-level simulator is O(n^2), a DD-based simulator is O(n) times faster than an LCC-based one. By using an MDD (multi-valued decision diagram) [6], we can make the simulator faster than one based on a BDD. An MDD(k) is derived easily from the corresponding BDD [3].
Example 1 Fig. 1(a) shows the BDD for an 8-input, …-output function. Partition the input variables into (X1, X2, X3, X4), where X1 = (x1, x2), X2 = (x3, x4), X3 = (x5, x6), and X4 = (x7, x8). Then, we have the MDD(2) in Fig. 1(b). By using the MDD(2), the simulation time is halved, because the path length from the root node to the terminal nodes is half that of the BDD.
III. HARDWARE SIMULATION ENGINE
To speed up the engine, we use the following three methods:
1. Use DD-based simulation. By this, the engine will be O(n) times faster than gate-level logic simulators.
2. Use MDD(k)s instead of BDDs. By this, the engine will be k times faster.
3. Use a pipeline of r processing units. By this, the engine will be r times faster.
A. Operation of Simulation Engine
Fig. 1 shows the concept of the simulation system. The proposed engine consists of memories and control circuits. The host computer prepares the data for the MDDs representing the simulation target, and sends them to the memories of the engine. The host computer also generates input vectors, and sends them to the engine. The engine traverses the MDD according to the values of the input vectors, and returns the value of the function to the host computer. Although the simulation speed is bounded by the communication speed between the engine and the host computer, we will not consider it here. First, we illustrate the operation of the BDD-based engine using only one processing unit. Fig. 2 shows the single-unit engine. The RAMs store the BDD data: each non-terminal node has its index and two next addresses, one for the 0-edge and one for the 1-edge.
For an m-output function, we have to traverse the BDD m times. Thus, the engine shown in Fig. 2 requires O(n · m) memory accesses.
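As a rough software analogue of the single-unit engine, the following sketch stores a BDD as a table in which each entry holds an index and two next addresses, and traces the edges for one input vector. The table layout, the names `Node`, `RAM`, `evaluate`, and `evaluate_outputs`, and the example functions are assumptions made for illustration; they are not the RAM format of the actual engine.

```python
from typing import NamedTuple, Sequence

TERMINAL = -1  # index value used here to mark a terminal entry (assumption)

class Node(NamedTuple):
    index: int   # 0-based input variable tested by this node, or TERMINAL
    next0: int   # next address for the 0-edge (output value at a terminal)
    next1: int   # next address for the 1-edge (unused at a terminal)

# Hypothetical RAM contents for f1 = (x1 AND x2) OR x3 with order x1, x2, x3.
# The sub-diagram at address 2 alone represents f2 = x3.
RAM = [
    Node(0, 2, 1),          # addr 0: test x1
    Node(1, 2, 4),          # addr 1: test x2 (reached when x1 = 1)
    Node(2, 3, 4),          # addr 2: test x3
    Node(TERMINAL, 0, 0),   # addr 3: terminal 0
    Node(TERMINAL, 1, 1),   # addr 4: terminal 1
]

def evaluate(ram: Sequence[Node], root: int, vector: Sequence[int]):
    """Trace edges from root to a terminal; return (value, memory accesses)."""
    addr, accesses = root, 0
    while True:
        node = ram[addr]
        accesses += 1                      # one RAM read per visited entry
        if node.index == TERMINAL:
            return node.next0, accesses
        addr = node.next1 if vector[node.index] else node.next0

def evaluate_outputs(ram, roots, vector):
    """For an m-output function, traverse once per output root."""
    values, total = [], 0
    for root in roots:
        value, accesses = evaluate(ram, root, vector)
        values.append(value)
        total += accesses
    return values, total
```

For example, `evaluate_outputs(RAM, [0, 2], [1, 1, 0])` traverses the table twice, once per output, which is where the factor m in the access count comes from.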
B. Speedup of the Engine
In this part, we show two methods to speed up the engine. First, by using an MDD(k), we make it k times faster. Second, by using r processors, we make it r times faster.
B.1. MDD(k)
The data for an MDD(k) are stored in the memory in the same way as for BDDs. Because the number of memory accesses in an MDD(k) is reduced by a factor of k, the engine will be k times faster than the BDD-based one. However, in general, MDD(k)s require more memory than BDDs. Fig. 3 shows the MDD(2) for Example 1. Note that each non-terminal node in an MDD(k) requires 2^k next addresses.
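The derivation of an MDD(k) from a BDD, and the resulting trade-off between memory accesses and next addresses per node, can be sketched as follows. This is a minimal illustration under our own assumptions: the dictionary layout, the names `bdd_to_mdd` and `evaluate_mdd`, and the example function f = x1·x2 ∨ x3·x4 are hypothetical, and the procedure of [3] may differ.

```python
from itertools import product

TERMINAL = -1  # marks a terminal entry (assumption of this sketch)

# Hypothetical BDD for f = x1*x2 OR x3*x4: addr -> (index, next0, next1).
# Terminal entries carry the output value in both address fields.
BDD = {
    0: (0, 2, 1), 1: (1, 2, 5), 2: (2, 4, 3), 3: (3, 4, 5),
    4: (TERMINAL, 0, 0), 5: (TERMINAL, 1, 1),
}

def bdd_to_mdd(bdd, root, k):
    """Group k consecutive binary variables into one 2**k-valued variable.

    Returns addr -> (group, nexts); a non-terminal node has 2**k next addresses.
    """
    mdd, todo = {}, [root]
    while todo:
        a = todo.pop()
        if a in mdd:
            continue
        index, n0, _ = bdd[a]
        if index == TERMINAL:
            mdd[a] = (TERMINAL, [n0])
            continue
        group, nexts = index // k, []
        for bits in product((0, 1), repeat=k):   # first bit is the MSB
            b = a
            # Follow BDD edges while the node still tests a variable of this group.
            while bdd[b][0] != TERMINAL and bdd[b][0] // k == group:
                idx, lo, hi = bdd[b]
                b = hi if bits[idx - group * k] else lo
            nexts.append(b)
        mdd[a] = (group, nexts)
        todo.extend(nexts)
    return mdd

def evaluate_mdd(mdd, root, vector, k):
    """Return (value, memory accesses); each access consumes k input bits."""
    addr, accesses = root, 0
    while True:
        group, nexts = mdd[addr]
        accesses += 1
        if group == TERMINAL:
            return nexts[0], accesses
        value = 0
        for bit in vector[group * k:(group + 1) * k]:
            value = (value << 1) | bit
        addr = nexts[value]
```

In this example the BDD has four non-terminal nodes with two next addresses each, while the MDD(2) has two non-terminal nodes with four next addresses each, and the longest path is halved.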
B.2. Speedup by Pipelining
When only one memory system is used to store a decision diagram, we can evaluate the value for only one input vector at a time. In this part, we propose a PMDD(k,r) (Paged reduced ordered Multi-valued Decision Diagram), which is an MDD(k) partitioned into r pages. Fig. 4 shows the concept of the engine having r processing units. In the PMDD-based engine, each processing unit has an independent memory system and a control circuit. Since the number of nodes in each page is smaller than in a single MDD(k), the memory for storing the index and the next addresses can be reduced. Since these units work in parallel, the engine consisting of r processing units has r-fold throughput.
Fig. 5. PMDD(2,2) for the function in Fig. 1(b).
A PMDD(k,r) denotes an MDD(k) partitioned into r pages. Note that a PMDD(1,1) is a BDD, while a PMDD(1,n) is a QROBDD (Quasi-Reduced Ordered BDD). In the QROBDD, every variable appears along every path from the root node to the constant nodes [7]. A PMDD(k,r) is an MDD(k) consisting of r pages in which nodes always exist at the first level of each page.
Theorem 1 $\mathrm{size}(\mathrm{PMDD}(1,1), f) \le \mathrm{size}(\mathrm{PMDD}(1,r), f) \le \mathrm{size}(\mathrm{PMDD}(1,n), f)$, where $1 \le r \le n$.
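Theorem 1 can be checked numerically on a small function. The sketch below counts the non-terminal nodes of a PMDD(1,r) built from a truth table, under the following assumptions of ours: r divides n, the pages contain n/r consecutive levels each, and the only difference from an ordinary reduced BDD is that a redundant node is kept when it lies at the first level of a page. The names `truth_table` and `pmdd_size` and the example function are hypothetical.

```python
from itertools import product

def truth_table(f, n):
    """Truth table of f over n variables, with x1 as the most significant bit."""
    return tuple(f(bits) for bits in product((0, 1), repeat=n))

def pmdd_size(tt, n, r):
    """Number of non-terminal nodes of a PMDD(1, r) for truth table tt.

    Assumes r divides n. Nodes at the first level of each page are never
    eliminated, even when both edges lead to the same successor.
    """
    forced = {p * (n // r) for p in range(r)}   # first level of each page
    unique = {}                                 # (level, lo, hi) -> node

    def build(level, sub):
        if level == n:
            return ('T', sub[0])                # terminal node
        half = len(sub) // 2
        lo = build(level + 1, sub[:half])       # cofactor with x_level = 0
        hi = build(level + 1, sub[half:])       # cofactor with x_level = 1
        if lo == hi and level not in forced:
            return lo                           # redundant node is removed
        key = (level, lo, hi)
        return unique.setdefault(key, key)      # share equivalent nodes

    build(0, tuple(tt))
    return len(unique)

# Example: f = x1*x2 OR x3*x4 OR x5*x6 OR x7*x8 (n = 8).
f = lambda b: (b[0] & b[1]) | (b[2] & b[3]) | (b[4] & b[5]) | (b[6] & b[7])
TT = truth_table(f, 8)
SIZES = [pmdd_size(TT, 8, r) for r in (1, 2, 4, 8)]
```

For this function, `SIZES` is non-decreasing in r, with the BDD (r = 1) the smallest and the QROBDD (r = n) the largest, in agreement with the theorem.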
A PMDD(k,r) has the following merits:
1. It is a data structure suitable for pipelining.
2. Since the next addresses are limited to a page, the size for the next addresses and the index can be smaller.
Example 2 Let us partition the MDD(2) in Fig. 1(b) into two pages. By using a PMDD(k,r), we can make the simulation r · k times faster than with a PMDD(1,1).
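The pipelined operation can be modeled in software as follows. This is a simplified sketch under stated assumptions: every page holds exactly one level, so each unit performs one memory access per vector; all units advance in lock-step; and the page tables, the example function f = x1·x2 ∨ x3·x4, and the names `PAGES`, `digit`, and `run_pipeline` are hypothetical rather than taken from the prototype.

```python
K = 2  # number of binary inputs grouped into one multi-valued variable

# Hypothetical PMDD(2,2) for f = x1*x2 OR x3*x4. Each page is a private table:
# local address -> list of 2**K next values. In page 0 these are entry
# addresses of page 1; in the last page they are the output values.
PAGES = [
    [[0, 0, 0, 1]],                 # page 0: one node testing X1 = (x1, x2)
    [[0, 0, 0, 1], [1, 1, 1, 1]],   # page 1: X2 = (x3, x4); addr 1 means f = 1
]

def digit(vector, page):
    """Value of the K-bit variable tested by this page (first bit is the MSB)."""
    value = 0
    for bit in vector[page * K:(page + 1) * K]:
        value = (value << 1) | bit
    return value

def run_pipeline(pages, vectors):
    """Advance all units in lock-step; return (outputs, number of steps)."""
    r = len(pages)
    regs = [None] * r                 # regs[p]: (vector, local address) at unit p
    pending = list(vectors)
    outputs, steps = [], 0
    while pending or any(reg is not None for reg in regs):
        if pending:
            regs[0] = (pending.pop(0), 0)       # unit 0 starts at its root
        new_regs = [None] * r
        for p in range(r):                      # all units work in the same step
            if regs[p] is None:
                continue
            vector, addr = regs[p]
            nxt = pages[p][addr][digit(vector, p)]   # one memory access
            if p == r - 1:
                outputs.append(nxt)             # last page yields the value
            else:
                new_regs[p + 1] = (vector, nxt) # hand over to the next unit
        regs = new_regs
        steps += 1
    return outputs, steps
```

With V input vectors and r units, this model finishes in V + r - 1 steps, whereas a single unit traversing both pages would need V · r memory accesses. Note that address 1 in page 1 is a node that would be redundant in an ordinary MDD; it is kept so that every path enters the page at its first level.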
B. Prototype of the Simulation Engine
We developed a prototype of an engine based on PMDD(1,2)s. Each unit consists of 160 kilobytes of SRAM and control circuits implemented with Xilinx CPLDs (XC95108-10PC84). The benchmark functions are represented by PMDD(1,2)s. Input vectors are random patterns generated by an LFSR (linear feedback shift register). The prototype engine works at 18 MHz. Table 4 compares performance of the engine with the software
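Random input patterns of this kind can be produced by a few lines of code. The sketch below is an illustration only: the register width, the feedback polynomial x^16 + x^14 + x^13 + x^11 + 1 (a commonly used maximal-length choice), the seed, and the names `lfsr16` and `input_vectors` are our assumptions, since the LFSR configuration of the prototype is not specified.

```python
def lfsr16(seed=0xACE1):
    """16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (assumed polynomial).

    The seed must be non-zero; yields the successive register states.
    """
    state = seed & 0xFFFF
    while True:
        # XOR of the tapped bits becomes the new most significant bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def input_vectors(n_inputs, count, seed=0xACE1):
    """Yield `count` input vectors of n_inputs bits (n_inputs <= 16)."""
    gen = lfsr16(seed)
    for _ in range(count):
        word = next(gen)
        yield [(word >> i) & 1 for i in range(n_inputs)]
```

A call such as `list(input_vectors(8, 4))` gives four 8-bit vectors that can be fed to a decision diagram evaluator in place of exhaustive patterns.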
V. CONCLUSION
In this paper, we proposed a hardware logic simulation engine based on PMDDs. It consists of pipelined MDDs with control circuits, and can be faster than the corresponding software implementation. The preliminary experiment using a prototype showed promising results. Based on these results, we are building a larger-scale engine. Experimental results for larger configurations will be reported in the final paper.
