Abstract-This paper describes a novel global on-chip interconnect scheme in which a symbol delayed by one unit interval (UI) is sent along with the current symbol to ease the sensing operation at the receiver end. With this approach, the voltage swing on the channel required for reliable sensing can be reduced, improving power consumption, peak current, and delay spread due to PVT variations as compared to conventional repeater insertion schemes. Evaluation of on-chip interconnects of various lengths in a 130 nm CMOS process indicated that the proposed on-chip interconnect scheme achieved a power reduction of up to 71.3%. The peak current during data transmission and the delay spread due to PVT variations were also reduced by as much as 52.1% and 65.3%, respectively.
I. INTRODUCTION
As silicon technology continues to scale, the dimensions of devices and interconnects have been steadily miniaturized [1]. Smaller devices in a scaled technology have reduced parasitic capacitances, resulting in a decrease of logic gate delay [2]. Since technology scaling also leads to an increase in chip density, local on-chip interconnects get smaller and shorter, allowing their speed to remain relatively constant [3]. However, the dimensions of global on-chip interconnects have not been scaling down accordingly, since overall chip size tends to be constant or marginally increasing due to the increasing number of functional modules in a chip. With this trend, we see an ever-increasing disparity between the delays of logic gates and global interconnects, increasing the possibility that global on-chip interconnects determine the overall speed performance of ICs. Therefore, handling the latency problem of global on-chip interconnects is becoming a challenge in future system-on-a-chip (SoC) design [1, 4].
Conventional repeater insertion techniques have been used for relieving the latency problem of on-chip interconnects [4] [5] [6] [7] [8]. Properly inserted repeaters in a global on-chip interconnect allow the quadratic relationship between the delay and length of the interconnect to become linear by trading off energy and area [9, 10]. The size and the number of inserted repeaters can be optimized for minimum delay, achieving lower latency and higher throughput. However, it is well known that the optimal number of repeaters to achieve minimum delay increases as technology scales down. Repeater schemes with multiple uniform large-sized repeaters along the wire therefore lead to critical issues in terms of power consumption, silicon area, and design effort. Plain buffers inserted closely as repeaters to optimize the latency cause a larger switching current, due to increased parasitic capacitance, and a short-circuit current in the repeaters, resulting in significantly increased power consumption. The full swing of the repeaters also causes a large peak current as the signal propagates through the interconnect. Closely spaced repeaters also increase the delay variation of global interconnects due to process, voltage, and temperature (PVT) variations, limiting the achievable data rate. Bidirectional repeaters used to communicate in both directions also require additional power and area.
In this paper, a novel global on-chip interconnect scheme based on delayed symbol transmission is proposed [11], which provides a higher data rate with lower power consumption, reduced peak current, and smaller delay variation than conventional repeater insertion schemes. The rest of the paper is organized as follows. Section II describes the proposed on-chip interconnect scheme. The results and comparisons are discussed in Section III. Finally, the conclusions are presented in Section IV.
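The quadratic-to-linear trade-off mentioned above can be illustrated with a first-order distributed-RC (Elmore-style) estimate. The following is a minimal sketch and not the optimization procedure of the cited references; the per-millimeter resistance and capacitance, the segment length, and the lumped repeater delay are hypothetical placeholder values, not figures from any particular process.

```python
# Hypothetical wire and repeater parameters (illustrative only).
R_PER_MM = 100.0      # wire resistance in ohm/mm (assumed)
C_PER_MM = 0.2e-12    # wire capacitance in F/mm (assumed)
T_REPEATER = 30e-12   # lumped delay of one repeater stage in s (assumed)


def unrepeated_delay(length_mm):
    """Elmore-style estimate 0.5 * R * C for a distributed RC wire.

    Both R and C grow with length, so the result is quadratic in length.
    """
    r_total = R_PER_MM * length_mm
    c_total = C_PER_MM * length_mm
    return 0.5 * r_total * c_total


def repeated_delay(length_mm, segment_mm=1.0):
    """Delay with one repeater per fixed-length segment.

    Each segment contributes the same delay, so the total is linear in length.
    """
    n_segments = length_mm / segment_mm
    return n_segments * (unrepeated_delay(segment_mm) + T_REPEATER)


for length in (5, 10, 15):
    print(length, unrepeated_delay(length), repeated_delay(length))
```

Doubling the length quadruples the unrepeated estimate but only doubles the repeated one, at the cost of the extra repeater stages.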
II. PROPOSED GLOBAL ON-CHIP INTERCONNECT SCHEME
Signals propagating through a global on-chip interconnect are usually distorted by band-limiting effects such as inter-symbol interference (ISI). Fig. 1 shows signal waveforms at the transmitting and receiving ends of a 10-mm point-to-point on-chip interconnect. As expected, the symbol response at the receiving end is severely distorted due to ISI. Assuming that the signal at the receiving end is compared with a reference voltage for accurate sensing, it must cross the reference level with a sufficient voltage margin in each symbol cycle. When the latency of the interconnect is large, the receiver can fail to sense the signal because the voltage difference between the reference and the received signal is smaller than the sensitivity of the receiver, as shown in Fig. 1(a). To overcome this problem and achieve reliable sensing at higher data rates, a one-cycle-delayed symbol can be sent along the interconnect together with the original symbol. The signal waveforms at both ends of the interconnect for this case are shown in Fig. 1(b), where the waveforms in solid lines represent the original symbols and those in dotted lines represent the delayed symbols. Since the delayed signal, which serves as the reference voltage for receiver sensing, is not fixed but follows the original signal with a time difference of one symbol, a relatively constant voltage difference appears between these signals, enabling reliable sensing on the receiver side.
The structure of the proposed global on-chip interconnect scheme is shown in Fig. 2. It consists of a transmitter pair, a pair of interconnects, and a hysteresis receiver. The transmitter pair consists of a delay element, which delays the signal to be sent by one symbol time, and a pair of drivers, which drive the current and delayed symbols into the interconnect pair. The global on-chip interconnects are resistively terminated to VDD at the receiver side to increase the bandwidth of the interconnection. The hysteresis receiver senses the received signal by comparing the voltages on the two interconnects, one carrying the current symbol and the other the delayed symbol.
The structures of the transmitter and receiver are shown in Fig. 3. The transmitter in Fig. 3(a) is built from two simple inverters whose transistors are relatively small, since the hysteresis function at the receiver effectively cancels the channel ISI. The hysteresis receiver in Fig. 3 into the gate of transistors MP1 and MP2. The receiver makes the speed and offset voltage independent of the input common-mode voltage. When the voltage difference between the received current and delayed signals is smaller than a specific amount, the receiver takes the previously received data as the current data. Otherwise, the received data is determined by the signal values on the channels.
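The benefit of using the delayed symbol as a moving reference can be illustrated numerically. The sketch below is a simplification under stated assumptions: each wire is modeled as an identical first-order RC low-pass channel sampled at the end of every symbol, voltages are normalized to the full swing, and the ratio of symbol time to time constant (0.5) and the bit pattern are arbitrary choices. It computes the worst-case sensing margin at data transitions for a fixed mid-level reference and for the delayed-symbol reference.

```python
import math


def channel_samples(bits, t_over_tau, v0=0.0):
    """End-of-symbol samples of a first-order RC channel (normalized swing)."""
    settle = 1.0 - math.exp(-t_over_tau)  # fraction of a step settled per symbol
    v, samples = v0, []
    for b in bits:
        v += settle * (b - v)
        samples.append(v)
    return samples


def worst_case_margins(bits, t_over_tau, vref=0.5):
    """Return (fixed-reference margin, delayed-reference margin) at transitions.

    A positive margin means the symbol lies on the correct side of the reference.
    """
    current = channel_samples(bits, t_over_tau)
    delayed = channel_samples([0] + bits[:-1], t_over_tau)  # one-symbol delay
    fixed_margins, delayed_margins = [], []
    for n in range(1, len(bits)):
        if bits[n] == bits[n - 1]:
            continue  # only symbols where the data toggles are examined
        sign = 1.0 if bits[n] else -1.0
        fixed_margins.append(sign * (current[n] - vref))
        delayed_margins.append(sign * (current[n] - delayed[n]))
    return min(fixed_margins), min(delayed_margins)


bits = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
fixed_min, delayed_min = worst_case_margins(bits, t_over_tau=0.5)
print(fixed_min, delayed_min)
```

In this toy model the fixed-reference margin becomes negative after a long run of identical symbols, whereas the margin against the delayed symbol stays positive at every transition.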
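The decision rule described above can be written as a short behavioral model. This is only a sketch of the rule as stated in the text, not of the transistor-level circuit; the function name, the threshold value, the polarity (a higher voltage on the current-symbol wire decoded as 1), and the sample voltages are all assumptions made for illustration.

```python
def hysteresis_receive(v_current, v_delayed, threshold, initial_bit=0):
    """Behavioral model of the hold-or-decide rule.

    If the two wires differ by less than `threshold`, the previous decision
    is kept; otherwise the sign of the difference sets the new decision.
    """
    decided = []
    last = initial_bit
    for vc, vd in zip(v_current, v_delayed):
        diff = vc - vd
        if abs(diff) >= threshold:
            last = 1 if diff > 0 else 0
        decided.append(last)
    return decided


# Hypothetical sampled voltages near VDD (volts), one pair per symbol.
v_cur = [1.20, 1.19, 0.92, 0.90, 1.18]
v_dly = [0.90, 1.20, 1.19, 0.92, 0.90]
print(hysteresis_receive(v_cur, v_dly, threshold=0.05))
```

With these sample values the second and fourth symbols fall inside the threshold band, so the receiver repeats its previous decision for them.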
III. COMPARISON AND DISCUSSION
To evaluate the performance of the proposed global on-chip interconnect scheme, a set of global on-chip interconnects of several line lengths, using both the conventional and the proposed schemes, was designed in a 130 nm CMOS technology. In the conventional repeater insertion scheme, inverters are inserted as repeaters at regular intervals along a single-ended wire, as shown in Fig. 4. The size, number, and position of the inserted inverters are optimized for minimum delay [5], so that a 10-mm wire is divided into ten segments, each having an nMOS driver size of 20 µm. The layout of the test chip is shown in Fig. 5, where the on-chip interconnects, whose lengths are 5 mm, 10 mm and 15 mm, can be driven either by the conventional repeaters or by the proposed transmitters and receivers described in Section II. Metal4 is used for interconnect wires whose For the proposed scheme, with a transmitter swing of 1.2 V, the voltage swing at the receiver end is limited to 300 mV. Fig. 6 shows the signal waveforms of the on-chip interconnect adopting the proposed scheme, indicating reliable transmission of the current and delayed symbols and reliable receiver detection with hysteresis. Fig. 7 shows the power consumption of the conventional and proposed schemes for various line lengths. These results are obtained for 5-mm, 10-mm and 15-mm on-chip interconnects whose data rates are 2 Gbps, 1 Gbps and 666 Mbps, respectively, when the data switching activity is 100%. For each line length, the transistor sizes are individually optimized to provide higher data throughput. The power consumption of the conventional repeater scheme is proportional to wire length, because both the wire parasitic capacitance and the repeater count increase for a longer wire. It is also proportional to operating frequency, because the switching current increases for a higher data rate.
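The two proportionalities above follow from the usual switching-power estimate, P ≈ α·C·V²·f. The sketch below assumes that the total switched capacitance (wire plus repeaters) scales linearly with length; the capacitance per millimeter is a hypothetical placeholder, and a 1.2 V full-rail supply is assumed for the conventional scheme. It evaluates the estimate for the three length and data-rate pairs used in the evaluation.

```python
C_PER_MM = 0.4e-12  # assumed switched capacitance per mm (wire + repeaters), F
VDD = 1.2           # assumed full-rail supply for the conventional scheme, V


def dynamic_power(length_mm, data_rate_bps, activity=1.0):
    """Switching-power estimate: activity * C * V^2 * f, with C linear in length."""
    c_total = C_PER_MM * length_mm
    return activity * c_total * VDD ** 2 * data_rate_bps


# Length (mm) and data rate (bit/s) pairs taken from the evaluation setup.
configs = [(5, 2e9), (10, 1e9), (15, 666e6)]
powers = [dynamic_power(length, rate) for length, rate in configs]
for (length, rate), p in zip(configs, powers):
    print(length, rate, p)
```

Because the product of length and data rate is nearly the same for all three pairs, the estimate is nearly the same as well, independent of the assumed capacitance value.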
In our evaluation, since the product of the line length and the operating frequency is set to be constant across configurations, the power consumption of the conventional repeater scheme remains constant with increasing line length and decreasing data rate. Meanwhile, for the proposed scheme, the power consumption decreases with increasing wire length. This is mainly due to the reduction of the static current caused by the increase in channel resistance. As described in the previous section, since the interconnects in the proposed scheme are resistively terminated, a static current flows through the wire, and its amount decreases with wire length. The smaller overall power consumption of the proposed scheme is achieved by a substantial reduction of the voltage swing, with no additional parasitic line capacitance contributed by extra circuits such as repeaters. The decrease of the static current of the proposed scheme for a longer line also plays a major role in reducing the overall power consumption. Combining these effects, the power consumption of the proposed scheme for 5-mm, 10-mm and 15-mm wire lengths is reduced by 39.9%, 60.9% and 71.3%, respectively, as compared to the conventional repeater scheme with identical wire lengths, as seen in Fig. 7.
To give a more detailed picture of the power saving, Fig. 8 shows the power consumption of the conventional and proposed on-chip interconnect schemes as a function of data switching activity for various line lengths and data rates. Due to the full-rail swing and the large parasitic capacitance contributed by the lines and the repeaters, the power consumption of the conventional repeater scheme is strongly proportional to the switching activity factor. Meanwhile, the reduced voltage swing with no additional parasitic capacitance on the wire makes the power consumption of the proposed scheme less dependent on switching activity. As in Fig. 7, the power consumption of the conventional scheme at the maximum switching activity is the same for all three line lengths, whereas that of the proposed scheme decreases with increasing line length for the reason described above. For the conventional scheme, almost no power is consumed at zero switching activity. For the proposed scheme, some power is consumed at zero switching activity due to the static current. For the 5-mm wire operating at 2 Gbps (Fig. 8(a)), the proposed scheme has a smaller power consumption than the conventional scheme for data switching activities above 0.5, and achieves the maximum power reduction of 39.9% at the maximum data switching activity. As shown in Fig. 8(b) and Fig. 8(c), where the wire lengths are 10 mm and 15 mm and the data rates are 1 Gbps and 666 Mbps, the power cross points occur at lower switching activities, and larger power reductions of up to 60.9% and 71.3% are achieved, respectively.
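The cross-point behavior can be reasoned about with a simple linear model. The sketch below assumes, purely for illustration, that the conventional power is strictly proportional to the switching activity and that the proposed power is a static term plus a term proportional to activity, both normalized to the conventional power at full activity. Using the 5-mm figures quoted above (a 39.9% reduction at full activity and a cross point near 0.5), it solves for the static and dynamic fractions that such a model would imply. These fractions are consequences of the assumed model, not measured values.

```python
def fit_proposed_model(reduction_at_full, crossover_activity):
    """Fit P_prop(a) = p_static + a * p_dyn, with P_conv(a) = a (normalized).

    Constraints: P_prop(1) = 1 - reduction_at_full, and
                 P_prop(crossover_activity) = crossover_activity.
    """
    p_full = 1.0 - reduction_at_full
    p_dyn = (p_full - crossover_activity) / (1.0 - crossover_activity)
    p_static = p_full - p_dyn
    return p_static, p_dyn


def proposed_power(activity, p_static, p_dyn):
    """Normalized power of the proposed scheme under the assumed linear model."""
    return p_static + activity * p_dyn


p_static, p_dyn = fit_proposed_model(reduction_at_full=0.399,
                                     crossover_activity=0.5)
print(p_static, p_dyn)
```

Under this model, a lower static term moves the cross point to lower activities, which is consistent with the trend described for the longer wires.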
The simulated current profiles during the transmission operation of the on-chip interconnects are shown in Fig. 9. The result is obtained for 10-mm interconnects operating at a 1 Gbps data rate with 100% data switching activity. For the conventional repeater scheme, a large current with repeated high peaks occurs due to the full-rail line swing driven by a number of repeaters. In contrast, for the proposed scheme, the peak current, which occurs twice due to the operations of the transmitter and the receiver, respectively, is much lower, resulting in a peak current reduction of as much as 52.1%. Table 1 shows the delay spread of the two schemes under PVT variations. Since the conventional repeater scheme has many inverters along the wire, it has a larger delay variation, and the variation increases with wire length. On the other hand, the proposed scheme uses fewer transistors and therefore has less delay variation under PVT variations: the delay spread is reduced by 48.7%, 58.9% and 65.3% for 5-mm, 10-mm and 15-mm wires, respectively.
Fig. 7. Power consumption of proposed scheme and conventional repeater (vertical axis: power dissipation in mW).
IV. CONCLUSIONS
A novel on-chip interconnect scheme based on delayed symbol transmission is presented. With this approach, the voltage swing on the channel can be reduced, resulting in substantial improvements in power consumption, peak current, and delay spread due to PVT variations. According to the evaluation results in a 130 nm CMOS process for global on-chip interconnects of various lengths, the power consumption was reduced by up to 71.3% as compared to the conventional repeater insertion scheme. The peak current during data transmission and the delay spread due to PVT variations were also reduced by as much as 52.1% and 65.3%, respectively.
Won-Hwa Shin received the B.S.,
ACKNOWLEDGMENTS

