Abstract-An M-user B-server synchronous arbitration circuit is built on a single chip using NMOS technology. The VLSI layout is modular and consists of three basic blocks: Type-1, Type-2, and Type-3. Given the VLSI layout of each block, only a few interconnections are needed to build an M-user B-server arbiter on a single chip. We have justified this statement by building 16-user 4-server, 4-user 2-server, and 8-user 2-server arbiters. Implementing the arbiter on a single chip can considerably reduce the space taken by the arbitration circuit in a multiprocessor system. At the same time it is faster and consumes less power.
I. INTRODUCTION
In any hardware design, the basic design criteria include speed, power consumption, space, and cost, and we always try to optimize these parameters. The efficiency of a hardware design depends not only on the algorithm and logic behind the hardware, but also on the technology used to realize it. Recent developments in VLSI have greatly improved all of these parameters. In this paper we propose a VLSI implementation of the 8-user 4-server arbitration circuit presented in [1] (an M-user B-server circuit can be built very easily from the same blocks). NMOS and CMOS technologies each have their own advantages. One of the major advantages of CMOS technology is lower power consumption, whereas one of the major advantages of NMOS technology is high speed. Our aim is to provide the fastest possible arbitration circuit in order to reduce the mean arbitration time, which is why we selected NMOS to implement the arbitration circuit. Another important aspect of any design is its modularity. If a design consists of a small number of independent modules, it can be reconfigured very easily. Our VLSI design of an 8-user 4-server arbitration circuit is modular.
It consists of three basic blocks named Type-1, Type-2, and Type-3. The logic diagram of each block is shown in Figures 4 and 5, which were originally presented in [1]. To make a VLSI design of an M-user B-server arbitration circuit, one only has to pick the appropriate block types and connect them, either manually or with the routing tool of the VLSI software.
II. ARBITER
Generally there are two types of arbiters: synchronous and asynchronous. In a synchronous arbiter, the arbitration process repeats (if there is any request) after a fixed interval of time, whereas in an asynchronous arbiter the arbitration process starts as soon as a device makes a request for the shared resource(s). We present a VLSI design of a synchronous arbitration circuit for a multiple bus multiprocessor system. A number of arbiter designs for multiprocessor systems have been reported in the literature [14]. For a multiple bus multiprocessor system with N processors, M memory modules, and B buses, the complete arbitration hardware can be realized with M arbiters of the N-to-1 type and one arbiter of the M-user B-server type. An N-to-1 arbiter can be realized by a binary tree of depth ⌈log2 N⌉ built from 2-to-1 arbiters. A 2-to-1 arbiter is shown in Figure 1. The notation ⌈X⌉ indicates the ceiling of X, that is, the smallest integer greater than or equal to X. Our M-to-B arbiter is constructed by a suitable combination and interconnection of the three basic types of arbiter blocks: Type-1, Type-2, and Type-3. The interconnection mechanism among the basic arbiter blocks resembles a multistage interconnection network of a multiprocessor system, but the logic circuit and the control strategy of the basic arbiter blocks are much simpler than those of a switching box in such a network.
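As a small illustration of the sizing rule above, the following Python sketch computes the depth of the binary tree for an N-to-1 arbiter and the number of arbiters of each kind needed for a system with N processors, M memory modules, and B buses. The function names are ours and are not taken from the paper.

```python
def tree_depth(n: int) -> int:
    """Depth of the binary tree of 2-to-1 arbiters: ceil(log2(n))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - 1).bit_length() is an exact integer form of ceil(log2(n)).
    return (n - 1).bit_length()


def arbiter_inventory(n: int, m: int, b: int) -> dict:
    """Arbiters needed for N processors, M memory modules, B buses."""
    return {
        "n_to_1_arbiters": m,      # one N-to-1 arbiter per memory module
        "m_to_b_arbiters": 1,      # a single M-user B-server arbiter
        "n_to_1_tree_depth": tree_depth(n),
        "buses": b,
    }


if __name__ == "__main__":
    print(arbiter_inventory(n=16, m=8, b=4))
```

The integer form avoids floating-point rounding in `math.log2` for large N.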
OPERATION OF THE ARBITER:
A multiple bus multiprocessor system with N processors, M memory modules, and B buses generally requires two types of arbiters: M arbiters of the N-to-1 type, each selecting among N request inputs (one per processor), and one M-to-B arbiter that assigns buses to the memory modules that have been successfully accessed by the requesting processors.
CH 3381-1/93/$01.00 ©1993 IEEE
Figure 1. Logic diagram of a 2-to-1 arbiter
N-to-1 ARBITER:
An N-to-1 arbiter design consists of a binary tree of 2-to-1 arbiters. The circuit diagram of a 2-to-1 arbiter is shown in Figure 1. It has two request lines R0 and R1, two grant lines G0 and G1, a cascaded request output Rc, and a cascaded grant input Gc. If a request is made by raising an input request line, say R0, then the corresponding grant line (G0) will be high if the cascaded grant input Gc is high; otherwise G0 will be low.
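The behaviour of one 2-to-1 arbiter cell can be sketched in Python as follows. The text only describes the case of a single request, so two points in this sketch are assumptions of ours: that the cascaded request Rc is the OR of the two request lines, and that simultaneous requests are resolved by a hypothetical `prefer_r1` flag.

```python
def two_to_one(r0: bool, r1: bool, gc: bool, prefer_r1: bool = False):
    """Behavioural sketch of a 2-to-1 arbiter cell; returns (rc, g0, g1).

    Assumptions (not stated in the text): rc = r0 OR r1, and the
    tie-break between simultaneous requests is set by `prefer_r1`.
    """
    rc = r0 or r1                      # request passed up the tree
    if r0 and r1:
        r0_wins = not prefer_r1        # assumed tie-break rule
    else:
        r0_wins = r0
    g0 = gc and r0 and r0_wins         # grant follows Gc for the winner
    g1 = gc and r1 and not r0_wins
    return rc, g0, g1


if __name__ == "__main__":
    # Only R0 requests: G0 follows the cascaded grant input Gc.
    print(two_to_one(True, False, gc=True))
    print(two_to_one(True, False, gc=False))
```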
M-to-B ARBITER:
The M-to-B arbiter is constructed as a multistage arbitration network, where each stage is built from a number of arbiter blocks of the same type. Figure 2 shows the design of an 8-to-4 arbitration circuit, where the lowest stage is built using Type-1 blocks, the next two stages are built using Type-2 blocks, and the highest stage is built using Type-3 blocks. The Type-1, Type-2, and Type-3 arbiter blocks are described below. Their block diagrams, with input and output signals, are shown in Figure 3.
Type-1 Block: The circuit diagram of this block is shown in Figure 4. It consists of a state flip-flop plus additional logic gates to handle the grant lines and the granted bus number. Each block has two request input lines, R0 and R1, and two grant input lines from the next higher stage of the network: a primary grant input line Gp and a secondary grant input line Gs. There are four main output lines from each block: a primary request line Rp, a secondary request line Rs, and two grant lines, G0 and G1. In addition to these input/output lines, there are two input ports (B0 and B1) and two output ports (Bp and Bs) to transmit the bus numbers received from the next higher stage to the next lower stage. If a request for a shared resource is made by raising only one of the two lines R0 and R1, then this request is always transmitted to the next higher stage through the higher priority line Rp. In the case of two simultaneous requests (i.e. R0=1 and R1=1), the transmission of the request signals to the next higher stage depends on the status (Q) of the state flip-flop. If both R0 and R1 are high and Q is reset, then R0 gets priority over R1, which means R0 goes through the output line Rp and R1 goes through the output line Rs. But if Q is set, then R1 gets priority over R0. If a request propagates through the line Rp then the corresponding grant output will be equal to the grant input Gp.
For example, if R0 goes through the output line Rp then G0 will be equal to Gp.
Figure 3. Type-1, Type-2, and Type-3 arbiter blocks
Type-2 Block: The basic difference between a Type-1 block and a Type-2 block is that a Type-2 block contains only combinational circuits, as shown in Figure 5. The input and output lines of this block are the same as those of a Type-1 block. In a Type-2 block, request R0 always has higher priority than request R1. That means, if both R0 and R1 are high then R0 will go through the output line Rp and R1 will go through the output line Rs. But if a request is made through only one of the two lines R0 and R1, then that request always goes to the next higher stage through the higher priority line Rp. If a request goes through the output line Rp then the grant signal of that request is equal to Gp; otherwise the grant signal is equal to Gs. The Boolean equations of the output lines Rp and Rs of this block are the same as those of a Type-1 block, while the equations of the lines G0 and G1 are given below. The Boolean equations of the bus port bits can be derived from the above equations following the procedure described for the Type-1 block.
Type-3 Block: A Type-3 block is similar to a Type-2 block except that it contains one request output line, one grant input line, and one input port to transmit the available bus number. The governing equations of this block are similar to those of a Type-2 block with Gs and Bs(i), for all i, equal to zero. The circuit diagram of this block is shown in Figure 3.
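The request and grant rules described for the three block types can be summarized in a short behavioural sketch in Python. This is our reading of the prose and not the authors' Boolean equations. It omits the bus-number ports, treats the flip-flop state Q as a plain input because its update rule is not described here, and drives the grant of a non-requesting input low although the design treats it as a don't-care.

```python
def type1_block(r0: bool, r1: bool, q: bool, gp: bool, gs: bool):
    """Type-1 block sketch; returns (rp, rs, g0, g1).

    q is the state flip-flop output: reset (False) favours R0,
    set (True) favours R1 when both requests are high.
    """
    rp = r0 or r1                      # primary line carries any request
    rs = r0 and r1                     # secondary line used only on a tie
    r0_on_rp = r0 and (not r1 or not q)
    r1_on_rp = r1 and (not r0 or q)
    # A request routed through Rp is answered by Gp, otherwise by Gs.
    g0 = r0 and (gp if r0_on_rp else gs)
    g1 = r1 and (gp if r1_on_rp else gs)
    return rp, rs, g0, g1


def type2_block(r0: bool, r1: bool, gp: bool, gs: bool):
    """Type-2 block sketch: purely combinational, R0 always has priority."""
    return type1_block(r0, r1, False, gp, gs)


def type3_block(r0: bool, r1: bool, gp: bool):
    """Type-3 block sketch: a Type-2 block with Gs tied to zero."""
    rp, _rs, g0, g1 = type2_block(r0, r1, gp, False)
    return rp, g0, g1


if __name__ == "__main__":
    # Both inputs request, Q reset: R0 uses Rp/Gp and R1 uses Rs/Gs.
    print(type1_block(True, True, q=False, gp=True, gs=False))
    # Same requests with Q set: the priorities swap.
    print(type1_block(True, True, q=True, gp=True, gs=False))
```

Modelling the Type-2 block as a Type-1 block with Q held reset mirrors the statement that both share the same Rp and Rs equations.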
III. VLSI IMPLEMENTATION
We have made VLSI designs for 8-user 4-server, 16-user 4-server, 4-user 2-server, and 8-user 2-server arbiters. The layout follows the Mead-Conway design rules for 1 µm channel silicon gates with a single metal layer. We used the VLSI CAD tool MAGIC to develop the layout. The layouts for the Type-1, Type-2, and Type-3 blocks are made separately. These basic modules are then used to build M-user B-server arbiters for different values of M and B. Figure 6 shows an NMOS circuit design for the Type-2 block. Similar NMOS designs are made for the Type-1 and Type-3 blocks. Figure 7 shows how to build a 16-to-4 arbiter using 8-to-4 arbiters and an 8-to-2 arbiter using 4-to-2 arbiters. Since we have not used any pass transistors, the pull-up to pull-down ratio (Zpu/Zpd) for all the gates is 4:1 in our design. Figure 8 lists some of the performance parameters of each arbiter: the number of transistors, the chip area, and the mean arbitration time. To measure the mean arbitration time we need to know how long it takes to get a grant after a request is made by any device. This delay consists of two parts: (1) the time the request takes to reach the granting device once the requesting device has raised it, and (2) the time the grant signal takes to reach the requesting device once the granting device has issued it.
We found that the delay reported by the VLSI simulator IRSIM is larger than the delay reported by logic-level simulation of the same circuit using VERILOG. Most of the delay in the arbiter is due to long interconnection paths. In recent technology, the interconnections in VLSI chips are the main obstacle to chip speed. Scaling does not reduce the delay due to interconnections, so a design with a 1 µm channel and one with a 0.6 µm channel will give almost the same delay in the interconnection paths. Let us look at how much delay the interconnection paths contribute in our VLSI layout. For example, in MAGIC a single 2-input NAND gate gave a delay of 1.5 ns and a single 2-input NOR gate gave a delay of 1.3 ns. But when a long interconnection path connects the output of these gates to the input of other gates, the typical delay at the end of the interconnection path is 13 ns. Our design has several stages connecting several Type-1, Type-2, and Type-3 blocks, so the largest contribution to the mean arbitration time comes from the delay in the interconnection paths. One easy remedy is to widen the interconnection paths, but this requires larger transistors to drive the interconnections. C. Svensson and M. Afghahi [4] have suggested a separate metal layer dedicated to interconnection paths. A lot of research is going on to reduce the delay in interconnection paths. If a solution emerges in this direction, the mean arbitration time can be reduced drastically. For now we have to accept the interconnection path delay, which still gives a very good mean arbitration time. Figure 8 shows the worst case arbitration delay for all four arbiters, as simulated by IRSIM. A sequence of random requests was generated and fed to IRSIM, the simulator of MAGIC. The output of IRSIM was scanned by a C program to find the average arbitration delay for each of the four arbiter designs. These average arbitration delays are shown in Figure 8.
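The post-processing step can be illustrated with a short Python sketch that computes the average and worst case arbitration delay from pairs of request and grant times. It does not reproduce the authors' C program or parse the IRSIM output format. The input is assumed to be already extracted as (request time, grant time) pairs in nanoseconds, and the sample values are placeholders, not measured results.

```python
from statistics import mean


def arbitration_delays(events):
    """Return (average_delay, worst_case_delay) in ns.

    `events` is an iterable of (request_time_ns, grant_time_ns) pairs.
    This input format is an assumption made for the sketch.
    """
    delays = [grant - request for request, grant in events]
    if not delays:
        raise ValueError("no request/grant pairs supplied")
    if any(d < 0 for d in delays):
        raise ValueError("grant observed before its request")
    return mean(delays), max(delays)


if __name__ == "__main__":
    # Placeholder timings for illustration only, not simulation data.
    sample = [(200.0, 213.0), (400.0, 409.5)]
    average, worst = arbitration_delays(sample)
    print(f"average = {average} ns, worst case = {worst} ns")
```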
Here we have shown the request and grant signals for the case when the request pattern makes a transition from 11011111 to 10000111. At t = 200.0 ns, the request pattern is 11011111. For an 8-to-4 arbiter, at most four requests are granted at any time. Here the granted requests are R1, R3, R5, and R7, which can be seen from the high G1, G3, G5, and G7. At t = 400 ns the request pattern switches to 10000111. Here we have only four requests, so all four are granted. This can be seen in the timing diagrams. The grant signal for an input that has not made a request is treated as a don't-care, which reduces the hardware without any adverse effect. We have also plotted timing diagrams for the 4-to-2, 8-to-2, and 16-to-4 arbiters.
Figure 8 shows the total number of transistors used for each arbiter design. We have also calculated the approximate chip area for each design, without considering the pad area.
Suppose the input request pattern is 00001111 at some time. The grant pattern will then be 1111. Now suppose the input request pattern changes from 00001111 to 11110000. In this case the grant pattern should remain 1111, but it takes some time for the grant pattern to settle at 1111. This time is due to gate delays and interconnection path delays, and it would be much smaller without the contribution of the interconnection paths. The typical value of this time for the 8-to-4 arbiter is 113 ns. When the circuit was simulated with VERILOG we did not see much delay. When the circuit was simulated using IRSIM from the MAGIC CAD tool, the delay was larger, because interconnection delay is not taken into account by the VERILOG simulation.
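The grant behaviour described above can be checked against a simple invariant: among the requesting inputs, the number of grants equals the smaller of the number of requests and the number of servers B. The Python sketch below encodes this check. It assumes that patterns are written with R0 as the leftmost bit, which is our inference from the example in which R1, R3, R5, and R7 are granted for the pattern 11011111. Grants on non-requesting inputs are ignored as don't-cares.

```python
def parse_pattern(bits: str):
    """Convert a pattern such as '11011111' to booleans (R0 leftmost)."""
    return [c == "1" for c in bits]


def grants_are_valid(requests: str, grants: str, servers: int) -> bool:
    """Check that min(number of requests, servers) requesters are granted."""
    req = parse_pattern(requests)
    gnt = parse_pattern(grants)
    if len(req) != len(gnt):
        raise ValueError("request and grant patterns differ in length")
    # Grants on non-requesting inputs are don't-cares and are not counted.
    granted = sum(1 for r, g in zip(req, gnt) if r and g)
    return granted == min(sum(req), servers)


if __name__ == "__main__":
    # Seven requests, four buses: R1, R3, R5 and R7 are granted.
    print(grants_are_valid("11011111", "01010101", servers=4))
    # Four requests, four buses: every request is granted.
    print(grants_are_valid("10000111", "10000111", servers=4))
```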
IV. CONCLUSION
We presented the VLSI design of an arbitration circuit in this paper. We also presented simulation results, which mainly show the arbitration time, a direct measure of the speed of the circuit. The maximum synchronous frequency at which the circuit can be operated was calculated. We also indicated the main limitation, the delay in the interconnection paths between different modules, which makes the overall delay somewhat larger. This delay is, however, inevitable in any VLSI layout. The design presented in this paper gives a smaller arbitration time and saves space in the arbitration circuit of any multiprocessor system.
