Conventional delay-insensitive (DI) data encodings require 2N+1 wires for transferring N-bit data. To reduce the complexity and power dissipation of wires in large-scale chip design, a DI data transfer mechanism based on current-mode multiple-valued logic (CMMVL), in which N-bit data transfer can be performed with only N+1 wires, is proposed. The effectiveness of the proposed mechanism is validated by comparison with conventional data transfer mechanisms using dual-rail and 1-of-4 encodings through simulation in a 0.25-µm CMOS technology. Simulation results with wire lengths of 4 mm or larger demonstrate that the CMMVL scheme significantly reduces the delay-power product relative to the dual-rail encoding at data rates of 5 MHz or more and relative to the 1-of-4 encoding at data rates of 18 MHz or more.
key words: delay-insensitive data transfer, globally asynchronous locally synchronous system, current-mode multiple valued logic
Introduction
In an SoC design based on a globally asynchronous locally synchronous (GALS) system, data transfer mechanisms that assume matched-delay lines, such as the bundled-data protocol [1], can cause fatal malfunctions in large-scale chips. A large number of wires is required for communication among many function blocks, and it is nearly impossible to estimate the respective delays of the signals between modules before placement and routing. To make matters worse, the high design complexity arising from complicated wire interconnection generates a wide range of signal delays. Therefore, one desirable property of data transfer mechanisms for various wire lengths is delay insensitivity. Delay-insensitive (DI) data transfer mechanisms such as dual-rail data encoding [2] and 1-of-4 data encoding [3] have been studied in the past. However, these methods generally suffer from increased wire cost and power dissipation, because 2N+1 physical wires are needed to transfer just N-bit data.
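As a quick illustration of the wire-count argument, the following sketch tabulates the number of wires for a few data widths. The function names are hypothetical, and reading the extra "+1" wire as the acknowledge line is an assumption made here for illustration.

```python
def wires_conventional_di(n_bits: int) -> int:
    """Dual-rail or 1-of-4 encoding: 2N+1 wires for N-bit data.

    Assumption: the extra wire is the acknowledge line.
    """
    return 2 * n_bits + 1


def wires_cmmvl(n_bits: int) -> int:
    """CMMVL scheme: N+1 wires for N-bit data (same assumption)."""
    return n_bits + 1


if __name__ == "__main__":
    for n in (2, 8, 32):
        conv = wires_conventional_di(n)
        mvl = wires_cmmvl(n)
        print(f"N={n:2d}: conventional DI={conv:3d} wires, CMMVL={mvl:3d} wires")
```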
Data transfer based on multiple-valued logic (MVL) [4] is an effective choice for reducing the number of wires. It is a widely accepted merit of MVL circuits that the same amount of data symbols can be transferred over fewer internal interconnections. MVL circuits, however, have critical drawbacks such as slower switching speed caused by increased complexity, and larger [...]

Manuscript received August 13, 2004

On the other hand, CMMVL circuits [5] can guarantee stable operation even at a lower supply voltage by effectively controlling the amount of internal current. In this paper, the CMMVL circuits are discussed in detail. In order to reduce the wire cost and power dissipation of the conventional DI data transfer mechanisms, a CMMVL-based global interconnection scheme in which N-bit data is transferred with only N+1 wires is suggested. In addition, the performance and power consumption of the proposed CMMVL circuits are compared with those of the conventional DI data transfer mechanisms, specifically dual-rail and 1-of-4 encodings.
Proposed Encoder and Decoder Based on CMMVL Circuit
Encoder and decoder modules in CMMVL circuits convert a voltage to a current value and a current to a voltage value, respectively. In general, in each local module of a GALS system, all data are formatted as single-rail. Thus, the signals interfacing the proposed CMMVL circuits with the external environment are based on the single-rail bundled-data handshake protocol [1]. Because of the complexity of implementing completion detection, the conventional DI data transfer schemes, dual-rail and 1-of-4 encodings, employ 4-phase signaling. In this paper, 4-phase signaling between the encoder and decoder modules is likewise used for DI data transfers. Strictly speaking, the CMMVL circuits proposed in this paper are based on the quasi-delay-insensitive (QDI) model [6]. In a QDI circuit, both gate delays and wire delays are presumed to be unbounded, with the additional assumption of isochronic forks. Therefore, the difference in delay between the destinations of each forking wire in the proposed encoder and decoder is assumed to be negligible.
Basic circuit elements such as the current source (CS), current mirror (CM), and voltage switch current generator (VSCG) [5] are used for implementing CMMVL circuits. Various current-mode circuits have been proposed using these basic circuit elements [7], [8].
The proposed encoder and decoder modules for one-bit DI data transfers, built from these basic circuit elements, are shown in Fig. 1. Figure 1(a) shows a schematic of the encoder module. Transistors P0 and N0 in a CS generate a constant current Is. Since the gate-source voltage V_GS is identical to the drain-source voltage V_DS, P0 and N0 always operate in the saturation region and the constant current Is flows through their drains. The current Is is duplicated at the drains of P1 and P2, which work as a CM. This drain current can be scaled by varying the sizes of P1 and P2. The encoder's two binary logic signals, req_in and data_in, are inputs to a VSCG composed of pass transistors N1, N2, and N3. These transistors determine the current level mapped to each combination of input signals by selecting the drain current of the CM. The current levels used for the CMMVL circuit are shown in Table 1. In the 4-phase bundled-data protocol, the data_in signal is valid only while the req_in signal holds logical '1'. Therefore, current levels 0 and 2I are assigned to logical '0' and '1' of the data_in signal, respectively. Current level I is mapped to the return-to-zero phase, which is initiated by logical '0' of the req_in signal. Following the current mapping in Table 1, the designer should make the current through N1 and N2 roughly twice as large as that flowing through N3. Those two individual current levels through N2 and N3 are added together to form the combined current Iout.
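The mapping from (req in, data in) to the three current levels can be sketched behaviourally as follows. This is only a logical model of the mapping described above, not of the transistor circuit; the function names are hypothetical, and the unit current of 54 µA is the reference value reported later in the paper.

```python
UNIT_CURRENT_A = 54e-6  # reference current I (value reported in the paper)


def encode(req_in: int, data_in: int, unit: float = UNIT_CURRENT_A) -> float:
    """Return the wire current for one bit.

    req_in = 0          -> I   (return-to-zero phase)
    req_in = 1, data 0  -> 0
    req_in = 1, data 1  -> 2I
    """
    if req_in == 0:
        return unit
    return 2 * unit if data_in else 0.0


def four_phase_levels(bits, unit: float = UNIT_CURRENT_A):
    """Current levels seen on the wire for a bit sequence.

    Each bit is followed by the return-to-zero level I, as in 4-phase signaling.
    """
    levels = []
    for bit in bits:
        levels.append(encode(1, bit, unit))  # valid data phase
        levels.append(encode(0, bit, unit))  # return-to-zero phase
    return levels


if __name__ == "__main__":
    for level in four_phase_levels([0, 1, 1]):
        print(f"{level * 1e6:6.1f} uA")
```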
The decoder's schematic is illustrated in Fig. 1(b) . The three-valued input current Iin is applied to N3. The transistor N3 then copies Iin to drains of N1 and N2 which jointly act as a CC through the coupling with P1 and P2. A current mirror with P1 and P2 should create threshold currents 0.5I 
Implementations
In the simulation environment containing the CMMVL circuit in Fig. 2, the delay-power product (D*P) is employed as the performance metric and is measured over various wire lengths. The data width used in the simulation environment is 2 bits, which is the minimal data size for checking delay insensitivity. The simulation environment is composed of two distinct pairs of encoder and decoder modules, a 2-input C-element for the completion detection of the data transfer, and a latch block in the receiver. The ack signal acknowledges the receipt of data and is transmitted to the sender module, which initiates a data transfer by creating the req_in and data_in signals. The wire model used for the simulations is a distributed RC model composed of five cascaded RC cells at the third metal layer of the ANAM 0.25-µm technology [9], [10]. From the parameter values in [10], the resistance and capacitance per 1 mm of metal wire are calculated as 81.25 ohm and 10 fF, respectively. These values are used for modeling the various wire lengths.
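The wire model can be sketched numerically as below, splitting the per-millimetre resistance and capacitance evenly over five cascaded RC cells. The Elmore delay estimate is an addition made here for illustration only (the paper measures delay with circuit simulation); it covers the wire alone and ignores driver resistance and load capacitance. All function names are hypothetical.

```python
R_PER_MM_OHM = 81.25   # wire resistance per mm (from the paper)
C_PER_MM_F = 10e-15    # wire capacitance per mm (from the paper)
N_CELLS = 5            # number of cascaded RC cells (from the paper)


def rc_cells(length_mm: float, n_cells: int = N_CELLS):
    """Return a list of (R, C) pairs, one per cell, for a wire of given length."""
    r_cell = R_PER_MM_OHM * length_mm / n_cells
    c_cell = C_PER_MM_F * length_mm / n_cells
    return [(r_cell, c_cell)] * n_cells


def elmore_delay(cells) -> float:
    """Elmore delay of an RC ladder: each C sees the sum of all upstream R."""
    delay = 0.0
    upstream_r = 0.0
    for r, c in cells:
        upstream_r += r
        delay += upstream_r * c
    return delay


if __name__ == "__main__":
    for length in range(0, 12, 2):  # 0 mm to 10 mm in 2 mm steps
        delay_ps = elmore_delay(rc_cells(length)) * 1e12
        print(f"{length:2d} mm wire: Elmore delay = {delay_ps:6.2f} ps")
```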
At the transistor level, the average delay from the data_in signal to the data_out signal was measured using HSPICE, and the root-mean-square (RMS) dissipated power was observed using the NanoSim tool.
To minimize the static power caused by the wasted current in the CS blocks of both the encoder and decoder modules, the CS generates a current Is that is as small as possible. The CM, however, must scale up Is to produce the reference currents I and 2I. In addition, since the CMMVL circuit inherently guarantees stable operation at a low supply voltage, the nominal 2.5 V of the given 0.25-µm technology was not needed. In this paper, a 2 V supply voltage is used for the CMMVL circuit area. D*P values were measured while the internal transistors were tuned repeatedly, and the minimal D*P value for each wire length was obtained when a reference current I of about 54 µA was generated at the 2 V supply voltage. Simulation results for delay and power are explained in detail in Sect. 4. The measured static power is about 560 µW, regardless of data transfers.
To verify the correctness of the CMMVL circuit in an environment with power supply noise, the delay and the reference current were observed as the supply voltage of the encoder was changed from 1.7 V to 2.3 V in steps of 0.1 V. As shown in Table 2, the current I depends on the supply voltage. The delay in Table 2 varies because the differential current changes with the varying Iin (I or 2I) of the decoder. The overall delay of the CMMVL circuit in Fig. 2 is critically determined by the delay of the decoder in Fig. 1(b) and the C-element. The delay from the decoder to the output of the C-element is mainly decided by the minimum differential current driving the gates connected to the output of the decoder. The minimum differential current is the smaller of the two differential currents formed by the input current Iin and the two threshold currents (0.5I and 1.5I). The delay from the decoder to the output of the C-element increases as the minimum differential current decreases, and vice versa. Within the voltage range from 1.7 V to 2.3 V, the minimum differential current at 2 V is larger than that at any other voltage level. As a result, the delay at the 2 V supply voltage is the smallest in the voltage range considered.
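The dependence of the minimum differential current on the encoder's unit current can be illustrated with the sketch below. It assumes that the decoder thresholds stay fixed at their nominal values (0.5I and 1.5I with I = 54 µA) while only the encoder current drifts; the off-nominal currents of 50 µA and 60 µA are hypothetical inputs chosen for illustration and are not values from Table 2.

```python
I_NOMINAL_A = 54e-6
# Assumption: decoder thresholds are fixed at the nominal 0.5I and 1.5I.
THRESHOLDS_A = (0.5 * I_NOMINAL_A, 1.5 * I_NOMINAL_A)


def min_differential_current(i_unit: float, thresholds=THRESHOLDS_A) -> float:
    """Smallest |Iin - threshold| over the input levels I and 2I."""
    levels = (i_unit, 2 * i_unit)
    return min(abs(level - th) for level in levels for th in thresholds)


if __name__ == "__main__":
    for i_unit in (50e-6, 54e-6, 60e-6):  # 50 and 60 uA are hypothetical
        margin = min_differential_current(i_unit)
        print(f"I = {i_unit * 1e6:4.1f} uA -> min differential = {margin * 1e6:4.1f} uA")
```

Under these assumptions the margin peaks at the nominal current and shrinks on either side, which is in line with the observation that the delay is smallest at the 2 V supply.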
The values of the current I for supply voltages between 1.8 V and 2.2 V satisfy the condition 27 µA < I < 81 µA, which guarantees correct operation of the CMMVL circuit, whereas the circuit does not operate correctly at supply voltages of 1.7 V and 2.3 V. This ±0.2 V margin around the nominal supply voltage (2 V) is a reasonable range for the given 0.25-µm technology, according to the experiments in [11].
The timing waveform of the CMMVL circuit is depicted in Fig. 3. Two-bit input data (data_in0, data_in1) is transformed into current levels (Iin0, Iin1), and all four possible 2-bit data patterns, i.e., '00,' '01,' '10,' and '11,' are reconstructed correctly.

Fig. 3 Results obtained from the simulation environments in Fig. 2.
Comparisons with Other Schemes
Two DI schemes, the dual-rail encoding and the 1-of-4 encoding, are compared with the CMMVL circuit to evaluate its effectiveness. In the simulation environment for the dual-rail and 1-of-4 schemes in Fig. 4, the transferred data width is again assumed to be 2 bits. The s2d module converts single-rail 2-bit data symbols into DI 4-bit dual-rail or 1-of-4 data encodings. The d2s module, which converts back to single-rail 2-bit data symbols, includes a 2-input C-element for the completion detection of each 2-bit data item. Note that the number of required wires in Fig. 4 is twice that of the CMMVL circuit in Fig. 2 for the same data size. The area (width * length) of the three schemes, excluding commonly used blocks such as the wire model, latch block, and delay cell, was measured for 2-bit data. The areas of the dual-rail, 1-of-4, and CMMVL schemes are 18.65 µm², 24.78 µm², and 27.04 µm², respectively. Design parameters for the CMMVL circuit include a 2 V supply voltage, a current I = 54 µA, and threshold currents 0.5I = 27 µA and 1.5I = 81 µA. The area of the CMMVL scheme is the largest among the three schemes. However, it is only about 1.09 times that of the 1-of-4 scheme.
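For reference, the two conventional encodings of a 2-bit symbol can be sketched as follows. The wire ordering (true rail first in each dual-rail pair, wire index equal to the binary value for 1-of-4) is an assumed convention, and the function names are hypothetical. Counting the wires that rise from the all-zero spacer shows two rising wires for dual-rail and one for 1-of-4.

```python
def dual_rail(bits):
    """Encode each bit on a (true, false) wire pair; one wire per pair is high."""
    wires = []
    for bit in bits:
        wires += [1, 0] if bit else [0, 1]
    return wires


def one_of_four(bits):
    """Encode two bits (b1, b0) by raising exactly one of four wires."""
    b1, b0 = bits
    index = 2 * b1 + b0
    return [1 if i == index else 0 for i in range(4)]


def is_complete_dual_rail(wires):
    """Completion detection: every wire pair carries exactly one high wire."""
    pairs = zip(wires[0::2], wires[1::2])
    return all(t + f == 1 for t, f in pairs)


if __name__ == "__main__":
    for symbol in ((0, 0), (0, 1), (1, 0), (1, 1)):
        dr = dual_rail(symbol)
        o4 = one_of_four(symbol)
        print(symbol, "dual-rail:", dr, "rising:", sum(dr),
              "| 1-of-4:", o4, "rising:", sum(o4))
```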
The plots in Fig. 5(a) illustrate the simulated delay of the CMMVL scheme (delay_mvl), the dual-rail scheme (delay_dual), and the 1-of-4 scheme (delay_four) for wire lengths ranging from 0 mm to 10 mm in steps of 2 mm. The CMMVL scheme demonstrates superior delay performance over the others with wire lengths of 2 mm or larger. The slope of the delay curve of the CMMVL circuit is smaller because the current varies less with wire length. This is mainly due to the much larger resistance of the current mirror of the encoder in the CMMVL circuit, as compared to the equivalent resistance of the encoder in voltage-mode circuits.
Power dissipation at various charging and discharging rates was measured to compare the CMMVL scheme with the voltage-mode schemes, namely the dual-rail and 1-of-4 schemes. Average dynamic power P_d can be computed as

$$P_d = C_L V_{dd}^{2} N f$$

where C_L is the load capacitance, V_dd is the supply voltage, N is the switching activity, and f is the clock frequency [12]. Since the clock frequency is not specified in asynchronous circuitry, the data rate is defined here as the product of N and f (N × f), meaning the average frequency of the request signal. Figure 5(b) shows the power dissipation of the dual-rail (power_dual), 1-of-4 (power_four), and CMMVL (power_mvl) schemes for data rates from 2.5 MHz to 100 MHz and a wire length of 10 mm. With a 10 mm wire, the delays from data_in to data_out of the dual-rail and 1-of-4 schemes are 5.34 ns and 5.21 ns, but the real turnaround time between the req and ack signals is less than 5 ns, because the data_in signal is generated 0.5 ns earlier than the req signal in the simulations. These measurements confirm the correct operation of the circuits at data rates of up to 100 MHz with the 4-phase handshake protocol.
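The standard CMOS dynamic-power relation, P_d = C_L · V_dd² · N · f, with the data rate taken as N × f, can be evaluated with a short sketch. The example below is only a rough per-wire illustration: it assumes the load is the wire capacitance alone (10 mm at 10 fF/mm) and a 2.5 V supply for a voltage-mode scheme, neither of which is stated as a measurement condition in the paper; it does not reproduce the simulated power figures.

```python
def dynamic_power(c_load_f: float, vdd_v: float, data_rate_hz: float) -> float:
    """Average dynamic power P_d = C_L * Vdd^2 * (N * f), with data rate = N * f."""
    return c_load_f * vdd_v ** 2 * data_rate_hz


if __name__ == "__main__":
    c_wire = 10e-15 * 10  # assumed load: 10 mm of wire at 10 fF/mm
    vdd = 2.5             # assumed supply for a voltage-mode scheme
    for rate_mhz in (2.5, 10, 33, 100):
        p_w = dynamic_power(c_wire, vdd, rate_mhz * 1e6)
        print(f"{rate_mhz:6.1f} MHz: P_d = {p_w * 1e6:7.3f} uW per wire")
```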
The power consumption of the dual-rail scheme is larger than that of the 1-of-4 scheme over the whole range of data rates, since the dual-rail scheme requires twice as many signal transitions for the data conversion as the 1-of-4 scheme. The variation of power dissipation in the CMMVL scheme is, in general, smaller than that of the two other schemes.
This is because dynamic power accounts for a much smaller portion of the total power in the CMMVL scheme than in the dual-rail and 1-of-4 schemes. The portion of dynamic power in the total power consumption of the CMMVL scheme varies with the data rate. For example, dynamic power accounts for about 14.2% and 58.0% of the total at data rates of 2.5 MHz and 100 MHz, respectively.
At data rates of 10 MHz or higher and 33 MHz or higher, the power consumption of the CMMVL scheme is smaller than that of the dual-rail and 1-of-4 schemes, respectively, as shown in Fig. 5(b). Consequently, with a 10 mm wire, the CMMVL scheme is superior to the dual-rail and 1-of-4 schemes in terms of D*P values at data rates above 5 MHz and 18 MHz, respectively (see Fig. 6(f)).
Comparisons of D*P values among the three schemes for other wire lengths are presented in Fig. 6. Based on these comparisons, the minimal data rates at which the CMMVL scheme achieves lower D*P values than the two other schemes are listed in Table 3 for different wire lengths. In addition, although it is not easy to estimate switching activities between IPs in a GALS system without a target application, the minimal switching activities are calculated based on 183 MHz, which is the operational clock frequency of a state-of-the-art on-chip bus [13], [14].
Compared with the dual-rail and 1-of-4 schemes, the CMMVL scheme offers no benefit at any data rate with a 0 mm or 2 mm wire. However, with input data arriving every 2.44 cycles (switching activity = 1/2.44), the CMMVL scheme is superior to the dual-rail scheme for wire lengths greater than 2 mm. Relative to the 1-of-4 scheme, when the switching activity is about 1/3.98, the CMMVL scheme yields a gain for wire lengths of 4 mm or more. The amount of D*P reduction achieved by the CMMVL scheme depends on the wire length and data rate. For example, with a 10 mm wire and a 100 MHz data rate, the CMMVL scheme reduces the D*P values of the dual-rail and 1-of-4 schemes by 51.4% and 27.0%, respectively.
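The relation between switching activity and data rate used above is simple arithmetic on the 183 MHz bus clock, sketched below. The function names are hypothetical; the only inputs are the clock frequency and the cycle counts quoted in the text.

```python
BUS_CLOCK_HZ = 183e6  # on-chip bus clock frequency quoted in the paper


def data_rate_from_cycles(cycles_per_input: float,
                          clock_hz: float = BUS_CLOCK_HZ) -> float:
    """Data rate when one input arrives every `cycles_per_input` clock cycles."""
    return clock_hz / cycles_per_input


def switching_activity(data_rate_hz: float,
                       clock_hz: float = BUS_CLOCK_HZ) -> float:
    """Switching activity N such that data rate = N * f."""
    return data_rate_hz / clock_hz


if __name__ == "__main__":
    for cycles in (2.44, 3.98):
        rate = data_rate_from_cycles(cycles)
        print(f"one input per {cycles} cycles -> {rate / 1e6:.2f} MHz "
              f"(activity {switching_activity(rate):.3f})")
```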
Conclusions
A DI data transfer mechanism for GALS-based SoC design is proposed, using CMMVL circuits with N+1 wires for transferring N-bit data. With conventional DI data transfer mechanisms, 2N+1 wires are required. The proposed CMMVL circuits are compared with the traditional dual-rail and 1-of-4 DI schemes through delay and power simulations at the transistor level.
The simulation results show that, by reducing the number of transmission wires, the proposed CMMVL scheme is superior to the conventional DI schemes in terms of D*P, and that this improvement depends on the wire length and data rate. Simulation results for 2-bit data with wire lengths over 4 mm demonstrate that the CMMVL scheme reduces the D*P of the dual-rail and 1-of-4 encoding schemes by a maximum of 51.4% and 27.0%, respectively.
