Abstract: This paper discusses the signaling performance of wave pipelining over on-chip transmission lines, comparing it with conventional signaling with CMOS static repeater insertion. We experimentally reveal that wave pipelining over on-chip transmission lines is about ten times superior in maximum throughput and latency, and dissipates several times less energy per bit than the conventional signaling, whereas the required interconnect resource is comparable.
Introduction
With advances in LSI fabrication technology, circuit operating frequency is predicted to increase continuously, and local clock frequency will be 15GHz at the 45nm technology node [1]. Although interconnect performance will improve due to the lower dielectric constant of the interlevel metal insulator, global signaling suffers from the increase in the RC time constant of long wires.
High-speed and large-capacity signal transmission is a big challenge in the near future. Recently, to attack this problem, high-speed signaling and throughput-driven interconnection have become a hot research topic in both the design and EDA communities [2], and flip-flop (latch) pipelining [3, 4] has been studied. The problem of flip-flop pipelining is its large latency and power dissipation. Another approach to increasing signaling throughput is wave pipelining, which is widely used in chip-to-chip and cable serial communication [5].
In this paper, we focus on wave pipelining, and evaluate the performance of on-chip global signaling. References [6, 7] discuss the limitation of interconnect performance; however, the signaling performance with driver and receiver is not shown. Interconnect modeling is widely studied both theoretically and experimentally, such as Ref.
We assume a 45nm process in the Roadmap [1]. Figure 1 is the interconnect structure used for the Repeater case, and Figure 2 is the structure for Single-End and Differential. M10 means the tenth metal layer, and we assume M11 and M12 are the special thick layers for on-chip transmission lines or power/ground wires. In Figure 1, there are four signal lines ("S") and five ground wires ("G").
A shield ground wire is inserted between signal wires to suppress crosstalk noise. In the M9 layer, we assume that there are orthogonal interconnects in all wire tracks. In Figure 2, there are eight 4µm-wide signal lines in the M12 layer and twenty ground wires in M10 used for the power grid. Shielding ground wires are placed for every eight signal wires in the M12 layer. When Figure 2 is used for Single-End, the S2, S4, S6 and S8 wires are used as shielding wires. As for differential signaling, S1-S2, S3-S4, S5-S6 and S7-S8 are differential pairs. The ground wires in the lower layers also affect the characteristics of the signal wires due to the return current distribution. Therefore the ground wires in the M10 layer are taken into consideration. The interconnect characteristics are modeled by a frequency-dependent coupled transmission-line model [13] implemented in a circuit simulator [14].
... repeater insertion is 0.5mm. (Figure 3) ... CMOS static XI inverter. (Figure 4) ... terminations. The receiver is the same as the driver.
In interconnect modeling, we do not consider shunt conductance G, because the dielectric loss of the insulator is negligible. The attenuation constant of an RLGC transmission line is expressed as follows when $R \ll \omega L$ and $G \ll \omega C$ [15].
$$\alpha \simeq \frac{R}{2}\sqrt{\frac{C}{L}} + \frac{G}{2}\sqrt{\frac{L}{C}} \qquad (1)$$
where R, L and C are resistance, inductance and capacitance per unit length. The first term corresponds to conductor loss and the second term is dielectric loss. Figure 6 shows the conductor loss and the dielectric loss versus frequency. The evaluation situation is also shown in Figure 6. We here assume that resistance R and conductance G are expressed as follows for simplicity.
$$R \simeq R_{dc} + k\sqrt{f}, \qquad G = \tan\delta \cdot \omega C,$$
where $R_{dc}$ is the dc resistance, $k$ is a fitting coefficient, $f$ is the frequency and $\omega = 2\pi f$. $\tan\delta$ is the loss tangent of the insulator, and we assume that it is a constant of 0.0006; although it depends on frequency, its magnitude varies by at most a factor of two or three. In the on-chip situation, we can see that the conductor loss dominates the dielectric loss, and the conductor loss is over one hundred times larger than the dielectric loss even at 1THz. We therefore do not consider the shunt conductance in the experiments shown in the next section.
We use a transistor model of 50nm technology based on the ITRS roadmap [16]. The supply voltage is 0 . N. We evaluate the eye diagram at the receiver output. Each receiver has an output loading of fanout 2. The input pulses of the signal wires are random non-return-to-zero patterns that are independent of each other. The pulse shape is a trapezoidal pulse with pulse period T and transition time T/10.
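As a rough illustration of the loss comparison behind Eq. (1) and Figure 6, the following Python sketch evaluates the conductor-loss and dielectric-loss terms of the low-loss attenuation formula. It assumes that R follows a dc term plus a term proportional to the square root of frequency, and that G = tan δ · ωC. Only tan δ = 0.0006 is taken from the text; the per-unit-length values `R_DC`, `K_SKIN`, `L_LINE` and `C_LINE` are hypothetical placeholders, so the printed ratios are not expected to reproduce Figure 6.

```python
import math

TAN_DELTA = 0.0006  # loss tangent, assumed constant as in the text

# Hypothetical per-unit-length line parameters (placeholders, not from the paper).
R_DC = 4.0e3      # dc resistance [ohm/m]
K_SKIN = 0.04     # fitting coefficient of the sqrt(f) term [ohm/(m*sqrt(Hz))]
L_LINE = 4.0e-7   # inductance [H/m]
C_LINE = 1.0e-10  # capacitance [F/m]


def resistance(freq_hz):
    """Series resistance per unit length, assumed to follow R_dc + k*sqrt(f)."""
    return R_DC + K_SKIN * math.sqrt(freq_hz)


def alpha_conductor(freq_hz):
    """Conductor-loss term (R/2)*sqrt(C/L) in Np/m."""
    return 0.5 * resistance(freq_hz) * math.sqrt(C_LINE / L_LINE)


def alpha_dielectric(freq_hz):
    """Dielectric-loss term (G/2)*sqrt(L/C) in Np/m, with G = tan(delta)*omega*C."""
    omega = 2.0 * math.pi * freq_hz
    conductance = TAN_DELTA * omega * C_LINE
    return 0.5 * conductance * math.sqrt(L_LINE / C_LINE)


if __name__ == "__main__":
    for freq_hz in (1e10, 1e11, 1e12):
        a_c = alpha_conductor(freq_hz)
        a_d = alpha_dielectric(freq_hz)
        print(f"{freq_hz:.0e} Hz: conductor {a_c:.3g} Np/m, "
              f"dielectric {a_d:.3g} Np/m, ratio {a_c / a_d:.3g}")
```

The two terms are only meaningful where the low-loss conditions (R much smaller than ωL, G much smaller than ωC) hold for the chosen values, so the placeholders should be replaced with extracted line parameters before drawing any conclusion.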
In this paper, we assume that a 0.7T eye opening in time is necessary for all signaling, and a 0.15Vdd eye opening in voltage is required for differential signaling. We evaluate the energy per bit dissipated inside the dashed squares in Figures 3-5. We also measure the latency from the driver input to the receiver output.
Figure 7 shows the maximum throughput of the three signaling methods per channel (one signal line in Single-End and Repeater, or one differential pair in Differential). The throughput of Differential is the largest, at 40Gbps. The throughputs of Single-End and Repeater are 20Gbps and 4Gbps respectively. The eye diagrams of each signaling method at the maximum throughput are shown in Figures 8, 9 and 10. The performance of signaling over repeater-inserted interconnects is poor, and it is not sufficient for 45nm technology, whose predicted local clock frequency is 15GHz [1]. Every repeater injects crosstalk noise through mutual capacitance and inductance, and jitter accumulates even though each signal line has shielding wires. The limitation of Differential comes from the attenuation of on-chip transmission lines. Inter-symbol interference (ISI) gets severe, and the eye closes. There are some techniques to reduce ISI, such as pre-emphasis and equalization, which are common in chip-to-chip or cable signal transmission, although we do not evaluate them in this paper. As shown in Eq. (1), the attenuation of transmission lines depends on capacitance C. Therefore the reduction of the dielectric constant helps to improve the signaling throughput in Differential and Single-End. If the relative dielectric constant remains 4.1, that of pure SiO2, we experimentally observe that the throughputs of Differential and Single-End decrease to 20Gbps and 10Gbps respectively.
Figure 11 shows the latency of the three methods. We can see that the latency of Repeater is over 700ps, and it is about 10 times larger than those of Differential and Single-End.
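To make the eye-opening criterion stated above concrete, the following Python sketch converts a per-channel throughput into the bit period T and the required 0.7T eye width, and checks a measured eye against the 0.7T and 0.15Vdd thresholds. The function names and the idea of passing a measured eye width and height are illustrative assumptions; the supply voltage is left as a parameter.

```python
EYE_WIDTH_FRACTION = 0.7    # required eye opening in time, as a fraction of T
EYE_HEIGHT_FRACTION = 0.15  # required eye opening in voltage, as a fraction of Vdd


def unit_interval_ps(throughput_gbps):
    """Bit period T in picoseconds for a given per-channel throughput."""
    return 1000.0 / throughput_gbps


def required_eye_width_ps(throughput_gbps):
    """Minimum eye opening in time (0.7T) in picoseconds."""
    return EYE_WIDTH_FRACTION * unit_interval_ps(throughput_gbps)


def meets_eye_criterion(eye_width_ps, throughput_gbps,
                        eye_height_v=None, vdd_v=None):
    """Check the timing criterion, plus the voltage criterion when a height is given."""
    width_ok = eye_width_ps >= required_eye_width_ps(throughput_gbps)
    if eye_height_v is None:
        return width_ok
    if vdd_v is None:
        raise ValueError("vdd_v is required when eye_height_v is given")
    return width_ok and eye_height_v >= EYE_HEIGHT_FRACTION * vdd_v


if __name__ == "__main__":
    # Maximum throughputs reported for the three methods in the text.
    for name, gbps in (("Differential", 40), ("Single-End", 20), ("Repeater", 4)):
        print(f"{name}: T = {unit_interval_ps(gbps):.1f} ps, "
              f"required eye width = {required_eye_width_ps(gbps):.1f} ps")
```

At 40Gbps the bit period is 25ps, so the timing criterion leaves only 7.5ps in total for jitter and slow edges, which gives a sense of why attenuation and ISI become the limiting factors.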
From Figures 7 and 11, global signaling based on repeater insertion ...
Figure 12 shows the relation between energy per bit and throughput. Differential dissipates static power, which is determined by the current sources of the CML driver and receiver, and hence energy per bit becomes large when the throughput is low.
Experimental Results
However, as the throughput increases, energy per bit gets small, and at 40Gbps signaling it becomes 0.073 pJ/bit, which is less than the 0.13 pJ/bit of the Repeater case at 4Gbps, since energy per bit is constant in the Repeater case. Repeater is worse in energy efficiency as well as in maximum throughput. In power dissipation, Single-End signaling is the most efficient. The energy per bit is 0.031 pJ/bit. The end of the transmission line is open, and hence static power is not consumed. Also, as throughput increases, the energy per bit decreases, because the wire is not charged fully at every transition [9, 11].
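The relation between energy per bit, throughput and average power used in this comparison can be checked with a short Python sketch. The energy and throughput figures are the ones quoted in the text; treating the Differential power as purely static and independent of throughput is a simplifying assumption made here for illustration, not a result of the paper.

```python
def average_power_mw(energy_per_bit_pj, throughput_gbps):
    """Average power in mW: pJ/bit times Gbit/s (1e-12 J * 1e9 1/s = 1e-3 W)."""
    return energy_per_bit_pj * throughput_gbps


def static_energy_per_bit_pj(static_power_mw, throughput_gbps):
    """Energy per bit in pJ when a constant static power is shared by all bits sent."""
    return static_power_mw / throughput_gbps


if __name__ == "__main__":
    p_diff = average_power_mw(0.073, 40)  # Differential at 40Gbps
    p_rep = average_power_mw(0.13, 4)     # Repeater at 4Gbps
    print(f"Differential: {p_diff:.2f} mW, Repeater: {p_rep:.2f} mW")

    # Assumption: Differential power is purely static, so energy per bit
    # scales as 1/throughput.
    for gbps in (4, 10, 20, 40):
        e_bit = static_energy_per_bit_pj(p_diff, gbps)
        print(f"{gbps} Gbps: {e_bit:.3f} pJ/bit")
```

Under this assumption, a link burning constant power pays the same energy whether it sends 4 or 40 Gbit in a second, which is why the energy per bit of Differential falls as the throughput rises.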
We next discuss the required interconnect resource. Suppose we need to design a 160Gbps interconnection between memory and a processor. Figure 14 shows the comparison in interconnect resource. In the case of Differential, four differential pairs are necessary to achieve 160Gbps signaling, and hence a total width of 72µm, which includes metal width and spacing, is necessary. Single-End requires eight channels, and the total width becomes 144µm. It is twice that of Differential signaling. As for the Repeater case, it becomes 153.5µm. Although it is difficult to compare the required interconnect resource due to the difference in the metal thickness, the signaling over on-chip transmission lines is not so expensive even in wire resource.
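The channel counts in this comparison follow from dividing the 160Gbps target by the per-channel maximum throughput, as the Python sketch below shows. The throughputs and total widths are those quoted in the text. The Repeater channel count of 40 is not stated in the text; it is derived here under the assumption that the minimum number of channels is used, and the width per channel is likewise a derived figure.

```python
import math

TARGET_GBPS = 160

# (per-channel maximum throughput [Gbps], total width [um]) as quoted in the text.
REPORTED = {
    "Differential": (40, 72.0),
    "Single-End": (20, 144.0),
    "Repeater": (4, 153.5),
}


def channels_needed(target_gbps, per_channel_gbps):
    """Smallest number of channels whose aggregate throughput meets the target."""
    return math.ceil(target_gbps / per_channel_gbps)


def width_per_channel_um(name):
    """Total width divided by the derived channel count (includes spacing and shields)."""
    per_channel_gbps, total_width_um = REPORTED[name]
    return total_width_um / channels_needed(TARGET_GBPS, per_channel_gbps)


if __name__ == "__main__":
    for name, (gbps, width_um) in REPORTED.items():
        n = channels_needed(TARGET_GBPS, gbps)
        print(f"{name}: {n} channels, {width_um} um total, "
              f"{width_per_channel_um(name):.2f} um per channel")
```

The Repeater wires are much narrower per channel, but the tenfold larger channel count brings the total width to roughly the same range as the transmission-line cases.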
Figures 15, 16 and 17 show the current supplied by the power supply. The current in Differential signaling is almost constant. On the other hand, the current in Repeater and Single-End varies drastically, which causes severe di/dt noise.
We can see that Differential signaling is much friendlier to the power supply network, and it may mitigate a potential problem of on-chip simultaneous switching noise.
Conclusion
In this paper, we experimentally evaluate and compare the performance of three on-chip global signaling methods. Signaling over on-chip transmission lines by wave pipelining is about ten times superior in maximum throughput and latency to the conventional signaling with CMOS static repeater insertion. Also, in the required energy per bit, signaling over on-chip transmission lines is several times better, and the required interconnect resource is comparable. The results reveal that wave pipelining using on-chip transmission lines should replace the conventional global signaling method based on repeater insertion. In the comparison of single-end signaling with CMOS static driver and receiver without termination to the differential signaling with CML driver and the single-end termination, the former is superior in energy per bit, whereas the latter has the good characteristic of flat current consumption.
