As integrated-circuit geometries continue to shrink into the deep nanometer regime, on-chip interconnects increasingly dominate overall system performance. This paper explores the power-delay trade-off in alternate repeater insertion techniques. Repeaters are placed along global on-chip interconnects to compensate for the loss in the wires and to regenerate the signal strength. All the repeater insertion techniques are implemented with a 3-pi distributed RC interconnect model at the 45nm and 180nm technology nodes, with 1.0V and 1.8V supply voltages and an operating frequency of 1GHz. The performance metrics used to compare the alternate repeated interconnects are power dissipation, propagation delay and power-delay product (PDP).
Introduction
Due to aggressive scaling of VLSI circuits, overall system performance is increasingly dominated by the on-chip interconnects. In VLSI design, power and delay are the figures of merit when selecting and implementing a device for chip fabrication. As VLSI technology advances, the number of transistors per chip increases rapidly. This rise in transistor count adds routing complexity, which in turn increases the length of the interconnect. The number of repeaters required is proportional to the length of the interconnect, so repeater design is of great importance in VLSI chip design. A repeater should operate at high frequencies, dissipate little power and introduce little delay for better chip performance.
Repeater insertion has become an increasingly common design methodology for driving long resistive interconnects. Since the propagation delay has a square dependence on the length of an RC interconnect line, subdividing the line into shorter sections by inserting repeaters is an effective strategy to reduce the total propagation delay. A second important advantage of repeater insertion within interconnect structures is that it decouples a large capacitance from the critical path, which minimizes the overall delay of that path (Y.I. Ismail et al., 1999).
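The square dependence can be illustrated with a short numerical sketch. It assumes the common 0.38·R·C approximation for the 50% delay of a distributed RC line and borrows the per-millimetre wire values quoted for the testbench in Section 2. The function names and the `repeater_delay` parameter are illustrative only and are not taken from the original work. With ideal (zero-delay) repeaters, splitting a line into k equal segments reduces the wire delay by a factor of k.

```python
# Illustrative sketch only: wire delay with and without segmentation.
# Assumption: 50% delay of a distributed RC line is approximately 0.38 * R * C.
R_PER_MM = 300.0      # ohm/mm, wire resistance quoted for the testbench
C_PER_MM = 0.23e-12   # F/mm, wire capacitance quoted for the testbench


def wire_delay(length_mm: float) -> float:
    """Approximate 50% delay (seconds) of an unbuffered distributed RC line."""
    r_total = R_PER_MM * length_mm
    c_total = C_PER_MM * length_mm
    return 0.38 * r_total * c_total


def segmented_delay(length_mm: float, segments: int,
                    repeater_delay: float = 0.0) -> float:
    """Delay of the same line split into equal segments.

    repeater_delay is a hypothetical fixed delay per inserted repeater
    (segments - 1 of them); the default of zero models ideal repeaters.
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    per_segment = wire_delay(length_mm / segments)
    return segments * per_segment + (segments - 1) * repeater_delay


if __name__ == "__main__":
    for length in (3.0, 9.0, 12.0, 15.0):  # lengths used in the paper
        unbuffered = wire_delay(length)
        split = segmented_delay(length, segments=3)
        print(f"{length:5.1f} mm: {unbuffered * 1e9:.3f} ns unbuffered, "
              f"{split * 1e9:.3f} ns in 3 segments")
```

Passing a non-zero `repeater_delay` shows why the number of segments cannot be increased indefinitely: each added repeater contributes its own delay, and also its own power.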
For all the designs considered in this paper, we use the following performance metrics: power dissipation, propagation delay and power-delay product (PDP). Power dissipation is an important metric because it affects the feasibility, cost and reliability of a design, so the power consumed by the driver, interconnect segments, repeaters and receiver needs to be optimized. Propagation delay is of prime concern for system speed; it can be reduced by introducing accelerators or repeaters along the global RC wires. The power-delay product specifies the energy consumed by the interconnect system. Together, these metrics characterize the overall performance of the interconnect system.
This paper is organized as follows. Section 2 describes the testbench architecture along with the distributed 3-pi RC interconnect model and its physical properties. Section 3 presents the alternate repeater circuits and their importance in on-chip interconnects. Section 4 presents the simulation results in terms of power dissipation, propagation delay and power-delay product for variable interconnect lengths, and discusses how these metrics vary with supply voltage and temperature. Finally, Section 5 concludes the paper and summarizes the outcomes of this work.
Testbench Architecture and Distributed RC Interconnects
A fair comparison of interconnect systems using the various repeater insertion techniques presented in this paper requires a common testbench architecture. Fig. 1 illustrates the schematic of our benchmark interconnect circuit (Jose C. Garcia Montesdeoca et al., 2009). It consists of the driver, the interconnect with repeaters inserted, and the receiver. The driver converts a full-swing input into a reduced-swing interconnect signal, which the receiver converts back to a full-swing output. The interconnect line is a metal-3 layer wire with lengths of 3mm, 9mm, 12mm and 15mm, modeled by a 3-pi distributed RC interconnect model (Rw = 300 Ω/mm and Cw = 0.23 pF/mm) with an extra capacitive load CL of 0.25 pF per mm of wire length, distributed along the wire to represent fanout. To compare the delays of the different schemes fairly, we deliberately add an inverter prior to the driver and an inverter after the receiver with a 20 fF capacitive load. The total energy includes the contributions from both the driver and the receiver.
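A minimal sketch of how a 3-pi ladder could be built from these per-millimetre values and checked with the Elmore delay is given below. It is not the simulation setup used in this work: the function names are illustrative, and the `r_driver` and `c_load` parameters are hypothetical additions. Each pi section carries one third of the wire resistance in series and one sixth of the wire capacitance at each end. For an ideal source and no extra load, the Elmore delay of this ladder equals R·C/2, the value for a fully distributed line.

```python
# Illustrative sketch only: N-pi RC ladder and its Elmore delay.
R_W = 300.0          # ohm/mm, from the text
C_W = 0.23e-12       # F/mm, from the text
C_FANOUT = 0.25e-12  # F/mm, distributed fanout load from the text


def pi_ladder(length_mm, sections=3, c_per_mm=C_W + C_FANOUT):
    """Return (series_resistances, node_capacitances) of an N-pi ladder.

    Each pi section has R/N in series and C/(2N) at each end, so interior
    nodes see C/N and the two end nodes see C/(2N).
    """
    r_sec = R_W * length_mm / sections
    c_sec = c_per_mm * length_mm / sections
    resistances = [r_sec] * sections
    caps = [c_sec / 2] + [c_sec] * (sections - 1) + [c_sec / 2]
    return resistances, caps


def elmore_delay(resistances, caps, r_driver=0.0, c_load=0.0):
    """Elmore delay (seconds) from the source to the far end of the ladder.

    r_driver and c_load are hypothetical driver resistance and lumped
    far-end load; both default to zero.
    """
    node_caps = list(caps)
    node_caps[-1] += c_load
    delay = 0.0
    upstream_r = r_driver
    for i, cap in enumerate(node_caps):
        if i > 0:
            upstream_r += resistances[i - 1]  # resistance from source to node i
        delay += upstream_r * cap
    return delay


if __name__ == "__main__":
    for length in (3.0, 9.0, 12.0, 15.0):
        r, c = pi_ladder(length)
        print(f"{length:5.1f} mm: Elmore delay {elmore_delay(r, c) * 1e9:.3f} ns")
```

The Elmore delay is a first-order estimate; a circuit simulator applied to the same ladder would give the actual waveform and 50% delay.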
Alternate Repeater Circuits

Buffer insertion
The buffer insertion technique (S. Dhar et al., 1991) has been shown to be effective for interconnect delay optimization. When the resistance of an interconnect is comparable to or larger than the on-resistance of the driver, signal propagation delay increases in proportion to the square of the interconnect length, because both capacitance and resistance increase linearly with interconnect length. Reducing the interconnect length therefore reduces the interconnect delay. Buffer insertion applies this principle by dividing an interconnect into short segments separated by CMOS buffers.
Schmitt Trigger Insertion
Schmitt trigger insertion (Sandeep Saini et al., 2009) is an alternative to buffer insertion. With shrinking technology, power consumption is increasing in all CMOS devices, and hence low-voltage, low-power Schmitt trigger designs have been proposed. The conventional buffer is replaced with a Schmitt trigger for the following reasons: 1) The Schmitt trigger can act as a signal-restoring circuit. 2) The lower threshold voltage of the Schmitt trigger reduces the rise time and hence the total delay. 3) Bus coding techniques can be eliminated with the use of Schmitt trigger insertion.
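The signal-restoring property in point 1 can be sketched with a purely behavioural model. This is not the cited circuit: the model is non-inverting for simplicity, and the threshold values and the sample waveform are hypothetical. A single-threshold comparator glitches when a slow, noisy edge crosses its threshold more than once, whereas the two thresholds of a Schmitt trigger produce one clean transition.

```python
# Illustrative behavioural sketch only; thresholds and samples are hypothetical.
def simple_comparator(samples, threshold):
    """Single-threshold decision: output follows every threshold crossing."""
    return [1 if v >= threshold else 0 for v in samples]


def schmitt_trigger(samples, v_low, v_high, initial_state=0):
    """Non-inverting hysteresis: switch high at v_high, switch low at v_low."""
    if v_low >= v_high:
        raise ValueError("v_low must be below v_high")
    state = initial_state
    out = []
    for v in samples:
        if state == 0 and v >= v_high:
            state = 1
        elif state == 1 and v <= v_low:
            state = 0
        out.append(state)
    return out


# A slow rising edge (normalised to a 1.0 V swing) with a noise dip mid-way.
noisy_ramp = [0.0, 0.3, 0.55, 0.45, 0.55, 0.8, 1.0]

if __name__ == "__main__":
    print(simple_comparator(noisy_ramp, 0.5))     # [0, 0, 1, 0, 1, 1, 1]
    print(schmitt_trigger(noisy_ramp, 0.3, 0.7))  # [0, 0, 0, 0, 0, 1, 1]
```

The sketch covers only noise rejection; the delay benefit in point 2 depends on the threshold levels of the specific circuit and is not modeled here.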
Swing Limited Interconnect Circuit (SLIC)
The swing limited interconnect accelerator (Vishak Venkatraman, 2006) has a three-stage cascaded inverter configuration with keepers. The two keepers M1 and M2 limit the swing on the interconnect from full-rail to reduced-rail. Inverter 1 performs the first stage of level restoration by converting the reduced-rail swing on its input. The output of inverter 1 also provides the necessary operating voltage for M1 and M2. Inverter 2 restores the signal level to full-rail.
The swing-limited interconnect accelerator works by using M1 and M2 to restrict the swing at the input of inverter 1, which is also the output of the interconnect. M1 is an NMOS device tied to the supply Vdd and M2 is a PMOS device tied to ground. The NMOS is tied to Vdd and the PMOS to ground so that they hold the input of inverter 1 just around the switching threshold. When the output of the interconnect is pulled up, the output of inverter 1 is pulled down. This turns on the PMOS device, which pulls down the output of the interconnect. When the output of the interconnect is pulled below the midpoint of the supply by the PMOS device, the output of inverter 1 is pulled up, which turns on the NMOS device. This in turn pulls up the output of the interconnect. Thus the output of the interconnect is never allowed to discharge or charge to its full rail. The swing-limited interconnect accelerator is used in long global on-chip interconnects to reduce delay.
Transient Sensitive Accelerator Insertion
The Transient Sensitive Accelerator (TSA) (Tomofumi Iima et al., 1996) is used to address both the delay time and the crosstalk voltage of a highly resistive interconnect. The TSA is connected to the interconnect at some point. It receives data from this point and, after some processing, returns some information to the same point. The interconnect voltage is assumed to have long rise and fall times stemming from the large RC constant. The TSA contains a transient sensitive trigger (TST) circuit. The TST senses the transition of the interconnect voltage at an early stage, while the Schmitt trigger detects the change at the end of the transition. This timing difference in the sensing can be used as an acceleration period if we design a circuit that turns on during the interval. This causes the voltage level of the wire to rise to its final value more rapidly during the transition period. In the same way, a circuit that maintains the interconnect voltage during a steady state can be designed. Since the main purpose of this circuit is to prevent the TST from reacting to a small voltage fluctuation, it slightly delays the signal detection of the TST.
Repeater insertion using the TSA reduces power consumption compared with the Schmitt trigger, buffer, SLIC and DTSL repeaters. One advantage of the TSA is its self-timed operation; a further advantage is that it can also be applied to bidirectional communication.
Dynamic Threshold Swing Limited (DTSL) Repeater
The dynamic threshold swing limited (DTSL) circuit is a modified version of the swing limited interconnect circuit. In this design, dynamic threshold voltage MOS (DTMOS) transistors (Fariborz Assaderaghi et al., 1997) are used to reduce the power consumption. In a DTMOS transistor, the floating body and the gate are tied together. The DTSL interconnect circuit is shown in Fig. 3. Because it is designed using DTMOS transistors, the DTSL repeater consumes less power than a repeater circuit built from conventional MOS transistors.
Transparent Repeater Buffer
The transparent repeater (TR) buffer (Radu M. Secareanu et al., 2000) is an amplifier circuit designed to minimize the delay in interconnect lines. This circuit operates as a controlled current source which sources or sinks current on the interconnect line at a specific insertion point. The following requirements are necessary for a TR buffer: 1) the input of the TR buffer is connected to the output, 2) the output is driven in the same sense as the input transition as a response to any input transition, sourcing or sinking current on the interconnect line at the insertion point, 3) the output must auto tri-state after a delay from when the output is driven so that the output does not create a conflict with the following signal transition, 4) the buffer should have minimal delay from the input to the output to increase the insertion efficiency, and 5) the buffer should detect the input transitions at low threshold voltages.
The output is tri-stated by a delayed input signal. This simple circuit has the disadvantage that the output is tri-stated by the input signal, and there is no control on the signal propagation inside the buffer. As a result, the output may be active for an uncontrolled duration: either too long, creating a conflict between the output and the next transition, or too short, which is insufficient for a full output transition. The correct timing is controlled through proper transistor sizing.
Simulation Results
The performance of the alternate repeaters discussed in Section 3, together with segmented RC wires of variable lengths, is analyzed. Designs using the various repeater insertion techniques (buffer, Schmitt trigger, transient sensitive accelerator, swing limited interconnect circuit and dynamic threshold swing limited circuit), incorporated with segmented RC wires of 3mm, 9mm, 12mm and 15mm lengths, are compared to obtain their performance metrics. The target technologies are GPDK 45nm and 180nm CMOS, implemented in the Cadence analog design environment.
Tables I, II and III show the performance comparison of power, delay and PDP for the various repeater insertion techniques operated at 1.0V and 1.8V supply voltages and 1GHz frequency using the GPDK 45nm and 180nm technology nodes. Tables IV and V show the comparison of the various repeater insertion techniques for varied supply voltage and temperature conditions, respectively. From Table I, it is evident that the transient sensitive accelerator and the transparent repeater are power efficient compared to the conventional buffer insertion and Schmitt trigger insertion techniques. From Table II, it is observed that the swing limited interconnect accelerator is better in terms of delay than the buffer and Schmitt trigger repeater insertion techniques. From Table III, it is noted that the transparent repeater is more energy efficient than the conventional schemes.
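For reference, the power-delay product used as the energy metric is simply the product of average power and propagation delay. The sketch below shows how a set of schemes could be ranked by it. The helper names are illustrative, and the two entries in `placeholder` are made-up values included only to exercise the code; they are not measurements from this work.

```python
# Illustrative sketch only: computing and ranking by power-delay product.
def power_delay_product(power_w: float, delay_s: float) -> float:
    """Return PDP in joules from average power (W) and delay (s)."""
    if power_w < 0 or delay_s < 0:
        raise ValueError("power and delay must be non-negative")
    return power_w * delay_s


def rank_by_pdp(measurements):
    """Sort {name: (power_w, delay_s)} into [(name, pdp)] by ascending PDP."""
    scored = ((name, power_delay_product(p, d))
              for name, (p, d) in measurements.items())
    return sorted(scored, key=lambda item: item[1])


# Placeholder values, NOT results from this paper.
placeholder = {
    "scheme_a": (2.0e-3, 1.0e-9),   # 2 mW, 1.0 ns
    "scheme_b": (1.0e-3, 1.5e-9),   # 1 mW, 1.5 ns
}

if __name__ == "__main__":
    for name, pdp in rank_by_pdp(placeholder):
        print(f"{name}: {pdp * 1e12:.2f} pJ")
```

In this placeholder case the slower scheme has the lower PDP, which shows why delay and energy have to be compared separately as well as through their product.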
