## High-speed completion detection for current sensing on-chip interconnects

## E. Nigussie, J. Plosila and J. Isoaho

A novel completion detection technique for delay insensitive current sensing on-chip interconnects is presented. The scheme is based on sensing the currents on the data wires and comparing the sum of these currents to an appropriately set reference. The goal is to remove the performance bottleneck caused by conventional voltage-mode detection methods. With a channel width of 64 bits, the proposed method is 4.65 times faster and takes 36% less area than the voltage-mode scheme. Furthermore, its speed does not degrade as the channel bit width increases. It is implemented in a 65 nm CMOS technology.

Introduction: The semiconductor industry has moved from single-core to multi-core systems and from core-centric to interconnect-centric design flows, emphasising the importance of high-performance and reliable on-chip inter-module communication. Process variations in sub-90 nm technologies result in considerable delay variations and thereby make timing closure difficult to guarantee. At run time, power supply variation and crosstalk cause additional delay unpredictability on communication channels. Timing uncertainties on channels can be efficiently tackled by using self-timed delay insensitive signalling instead of synchronous (clocked) signalling based on the worst-case timing approach. A delay insensitive channel assumes nothing about the delays in the wires and logic except that they are finite and positive, and therefore the reliability of communication is unaffected by the delay variations. Validity of the data is encoded within the data itself at the transmitter, and the validity test, i.e. completion detection, as well as data decoding is performed at the receiver. The 1-of-2 (dual-rail) and 1-of-4 (quad-rail) codes are the simplest and most commonly used delay insensitive codes, requiring two signal wires per transmitted bit [1]. Conventionally, completion detection is carried out by sensing either voltage transitions or levels on each data wire. This requires logic circuitry whose delay increases drastically as the channel bit width increases, making delay insensitive interconnects problematic for high performance systems. Optimal repeater insertion together with pipelining is the method used to achieve high throughput in long voltage-mode on-chip interconnects. If such a pipelined channel is delay insensitive, each pipeline stage (including the receiver) additionally contains area, power and time consuming completion detection logic.
The larger the channel bit width, the longer the overall time spent in completion detection, because each detection circuit becomes a tree of logic elements. A delay insensitive current mode channel requires neither repeaters nor pipelining to boost its throughput, so completion detection is carried out only once, at the receiver, and the channel therefore achieves higher performance and better power efficiency than a pipelined voltage mode interconnect. This has been shown in [2-5]; in those designs, however, wire currents are first converted to voltages and the actual completion detection is carried out in the voltage mode, resulting in a significant speed penalty.

Fast current sensing completion detection: The proposed completion detection technique directly uses the current on each data wire and carries out completion detection in the current mode. The idea is to sum the currents on all the data wires of a channel and then compare this sum current to a reference current. Implementation requires only current mirrors, a current source and a current comparator. The comparator takes the sum current and the reference as inputs and outputs a full-swing completion detection signal. This signal becomes high when the sum current is greater than the reference current, indicating the validity of every received data signal. Unlike the conventional voltage-mode scheme, the speed of the proposed scheme is not affected by the channel bit width, because the current summation is carried out by wiring and the delay is due only to the current comparison.

The completion detection circuit is shown in Fig. 1. It supports detection of 1-of-2 and 1-of-4 encoded data, both of which use 2*N* wires to convey *N*-bit data, so that the number of active wires per transmission is *N* in the 1-of-2 case and *N*/2 in the 1-of-4 case. The diode connected transistors Mw(1)–Mw(2*N*) (one transistor per wire) are used to input the currents on the wires and mirror them to the transistors Ms(1)–Ms(2*N*), respectively, which are connected together to generate the sum current $I(sum) = \text{Code } S^{-1}NI(w)$. Here I(w) is the nominal current on a single data wire, *N* is the number of bits, S is the current down-scaling factor ($S \ge 1$) indicating the current drive ratio between the transistors Mw(i) and Ms(i), and Code is either 1 (1-of-2 code) or 0.5 (1-of-4 code) indicating the number of active wires per transmitted bit. By using a scaling factor (S) larger than 1 the power consumption of the circuit can be efficiently reduced. The reference current $I(ref) = \text{Code } S^{-1}(N - 0.5)I(w)$ is generated using an addition based process invariant current source [6]. The comparator transistor MpC1 mirrors I(sum) to MpC2, and MnC1 mirrors I(ref) to MnC2. The comparator output becomes high (low otherwise) when the current of MpC2 is greater than that of MnC2.
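The comparison described above can be illustrated with a purely behavioural sketch. It is not a circuit model: mirrors and the comparator are treated as ideal, currents are plain numbers in µA, and the function names are hypothetical. Only the expressions for I(sum) and I(ref) are taken from the text.

```python
def sum_current(active_wires, i_w, s):
    """Ideal wired sum: each active wire contributes i_w / s after mirroring."""
    return active_wires * i_w / s


def reference_current(n_bits, code, i_w, s):
    """I(ref) = Code * S^-1 * (N - 0.5) * I(w), as given in the text."""
    return code * (n_bits - 0.5) * i_w / s


def completion_detected(active_wires, n_bits, code, i_w, s):
    """Ideal comparator: output is high when I(sum) exceeds I(ref)."""
    i_sum = sum_current(active_wires, i_w, s)
    i_ref = reference_current(n_bits, code, i_w, s)
    return i_sum > i_ref


if __name__ == "__main__":
    # Assumed example: 64-bit 1-of-4 channel, I(w) = 210 uA, S = 5.
    n_bits, code, i_w, s = 64, 0.5, 210.0, 5
    full = int(code * n_bits)  # 32 active wires when all data has arrived
    print("I(ref) [uA]:", reference_current(n_bits, code, i_w, s))
    print("all wires active :", completion_detected(full, n_bits, code, i_w, s))
    print("one wire missing :", completion_detected(full - 1, n_bits, code, i_w, s))
```

In this idealised model the output goes high only once every expected wire carries current, which is the behaviour the comparator is intended to provide.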



Fig. 1 Proposed completion detection circuit

Owing to process and supply voltage variations, the sum and reference currents may vary from their nominal values, affecting the reliability of completion detection. Correct operation requires that the sum of the variations in I(sum) and I(ref) is less than $S^{-1}I(w)/2$. For I(sum) the variation can be expressed as $\text{Code } S^{-1}NI(we)$, where I(we) is the worst-case variation of the current on a single wire. For I(ref), according to an extensive analysis of the circuit for N = 2 to 64 bits, I(w) = 200 to 300 $\mu$A, and S = 1 to 5, it is safe to assume that the variation is always less than $S^{-1}I(w)/6$. Requiring that the sum of these variations is less than $S^{-1}I(w)/2$, i.e. $\text{Code } S^{-1}NI(we) + S^{-1}I(w)/6 < S^{-1}I(w)/2$, and solving for N yields the following constraints for the 1-of-2 (Code = 1) and 1-of-4 (Code = 0.5) codes

$$N < \frac{1}{3} SNR \quad 1 \text{-of-}2 \tag{1}$$

$$N < \frac{2}{3}SNR \quad 1 \text{-of-4} \tag{2}$$

where SNR = I(w)/I(we) is the signal-to-noise ratio of a single data wire. Hence, the higher the SNR, the larger the number of bits (*N*) that can be reliably transmitted and detected. Furthermore, for a given SNR, a 1-of-4 encoded channel can be twice as wide as a 1-of-2 encoded channel, because its number of active wires is half that of the 1-of-2 case. The relation between *N* and SNR for the 1-of-2 and 1-of-4 encoded channels is shown in Fig. 2.
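Constraints (1) and (2) are both instances of N < SNR/(3 Code). The following minimal sketch evaluates that bound in both directions; the function names are hypothetical, and the code only restates the inequalities above without modelling the circuit.

```python
import math


def max_bit_width(snr, code):
    """Largest integer N satisfying the strict bound N < SNR / (3 * Code)."""
    bound = snr / (3.0 * code)
    # ceil(bound) - 1 is the largest integer strictly below the bound,
    # including the case where the bound itself is an integer.
    return max(math.ceil(bound) - 1, 0)


def min_snr(n_bits, code):
    """SNR value that must be exceeded for an N-bit channel: SNR > 3 * Code * N."""
    return 3.0 * code * n_bits


if __name__ == "__main__":
    for snr in (50.0, 100.0, 200.0):
        print(snr, max_bit_width(snr, 1.0), max_bit_width(snr, 0.5))
    print("64-bit 1-of-2 needs SNR above", min_snr(64, 1.0))
    print("64-bit 1-of-4 needs SNR above", min_snr(64, 0.5))
```

Under these constraints a 64-bit 1-of-4 channel needs a per-wire SNR above 96, whereas a 64-bit 1-of-2 channel needs an SNR above 192.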



Fig. 2 Bit-width against SNR of wire for reliable detection

Simulation results: The performance of the proposed current sensing completion detection circuit was examined along with a voltage-mode reference design for different channel bit widths. The simulation was carried out in Cadence Spectre using a 65 nm CMOS technology from STMicroelectronics with a 1 V supply voltage. The data wire current I(w) was set to 210 $\mu$A and scaled down by five (S = 5) in the detection circuit. The reference current I(ref) was set to $0.5 \times 5^{-1}(N - 0.5) \times 210\ \mu$A, corresponding to a 1-of-4 encoded channel (Code = 0.5). The results are shown in Table 1. The delay of the circuit was measured to be a constant 52 ps for all considered channel bit widths from 2 to 64 bits. Consequently, the wider the channel, the larger the speed advantage over the voltage-mode reference design; for instance, the proposed circuit is 4.65 times faster in the case of 64 bits. Moreover, an area saving of 36-51% is gained. As a trade-off, the power consumption is greater, by an amount that depends on the bit width (as shown in Table 1) and the selected scaling factor. However, for a global on-chip interconnect this overhead is tolerable, since power is mainly consumed in driving the data wires rather than in completion detection. As an example, for a 64-bit channel of 3 mm length ($I(w) = 210\ \mu$A, $S = 5$), the proposed detection circuit consumes only 6.83% of the total power.

 Table 1: Simulation results: delay and power consumption

| N (bits) | Delay, prop. (ps) | Delay, conv. (ps) | Gain in speed | Power, prop. (mW) | Power, conv. (mW) | Power overhead (%) |
|----------|-------------------|-------------------|---------------|-------------------|-------------------|--------------------|
| 2        | 52         | 100   | 1.92X    | 0.108      | 0.067 | 61.2         |
| 4        | 52         | 139   | 2.67X    | 0.206      | 0.137 | 50.4         |
| 8        | 52         | 168   | 3.23X    | 0.396      | 0.278 | 42.4         |
| 16       | 52         | 194   | 3.73X    | 0.790      | 0.558 | 41.6         |
| 32       | 52         | 217   | 4.17X    | 1.575      | 1.120 | 40.6         |
| 64       | 52         | 242   | 4.65X    | 3.050      | 2.242 | 36.0         |
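The derived columns of Table 1 can be recomputed from the measured delay and power values. The short sketch below assumes that the speed gain is the ratio of conventional to proposed delay and that the power overhead is the relative increase in power over the conventional design. These definitions are inferred from the tabulated numbers and are not stated explicitly in the text.

```python
# (N, proposed delay ps, conventional delay ps, proposed power mW, conventional power mW)
TABLE_1 = [
    (2, 52, 100, 0.108, 0.067),
    (4, 52, 139, 0.206, 0.137),
    (8, 52, 168, 0.396, 0.278),
    (16, 52, 194, 0.790, 0.558),
    (32, 52, 217, 1.575, 1.120),
    (64, 52, 242, 3.050, 2.242),
]


def derived_columns(rows):
    """Return (N, speed gain, power overhead in %) for each table row."""
    out = []
    for n, d_prop, d_conv, p_prop, p_conv in rows:
        gain = d_conv / d_prop                         # assumed: conv. delay / prop. delay
        overhead = 100.0 * (p_prop - p_conv) / p_conv  # assumed: relative power increase
        out.append((n, gain, overhead))
    return out


if __name__ == "__main__":
    for n, gain, overhead in derived_columns(TABLE_1):
        print(f"N={n:2d}  gain={gain:.2f}X  overhead={overhead:.1f}%")
```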

*Conclusion:* An ultra-high-speed completion detection technique, based on sensing and summing wire currents, was presented for global current mode on-chip interconnects. Its speed is independent of the channel bit width, which is not the case with conventional voltage-mode completion detection schemes. The proposed detection circuit clearly outperforms its voltage-mode counterparts, enabling area efficient implementation of high speed delay insensitive communication. As a trade-off, it introduces a power overhead which, however, represents only a fraction of the power consumed by the whole signalling circuitry.

Acknowledgment: This work is partially supported by a Nokia Foundation Scholarship.

© The Institution of Engineering and Technology 2009

12 February 2009

doi: 10.1049/el.2009.0403

E. Nigussie, J. Plosila and J. Isoaho (Department of Information Technology, University of Turku, Joukahaisenkatu 3-5 B, 20014 Turun Yliopisto, Finland)

E-mail: ethnig@utu.fi

## References

- 1 Martin, A.J., and Nyström, M.: 'Asynchronous techniques for system-on-chip design', *Proc. IEEE*, 2006, **94**, (6), pp. 1089–1120
- 2 Nigussie, E., Lehtonen, T., Tuuna, S., Plosila, J., and Isoaho, J.: 'High-performance long NoC link using delay-insensitive current-mode signaling', *J. VLSI Des.*, 2007, **2007**, pp. 13
- 3 Oh, M.-H., and Har, D.-S.: 'A novel mechanism for delay-insensitive data transfer based on current-mode multiple valued logic'. PATMOS, 2004, Santorini, Greece, pp. 691–700
- 4 Takahashi, T., and Hanyu, T.: 'Implementation of a high-speed asynchronous data-transfer chip based on multiple-valued current-signal multiplexing', *IEICE Trans. Electron.*, 2006, **E89-C**, (11), pp. 1598–1604
- 5 Nigussie, E., Plosila, J., and Isoaho, J.: 'Area efficient delay-insensitive and differential current sensing on-chip interconnect'. IEEE Int. SoC Conf., California, USA, September 2008, pp. 143–146
- 6 Pappu, A.M., Zhang, X., Harrison, A.V., and Apsel, A.B.: 'Process-invariant current source design: methodology and examples', *IEEE J. Solid-State Circuits*, 2007, **42**, (10), pp. 2293–2302