In this paper, the impact of the wire grid size on the power-delay-area tradeoff of VLSI digital circuits with differential routing is analyzed. To this aim, differential MOS current-mode logic (MCML) is adopted as the reference logic style, and a complete differential design flow is used. Analysis shows that the choice of the grid size has a much stronger impact on the power-delay-area tradeoff in differential routing than in the usual single-ended case. Hence, the grid size is an important knob that must be carefully selected when differential routing is adopted. The dependence of power, delay and area on the grid size is discussed in detail through simple models and appropriate metrics. To validate the analysis and show basic dependencies in practical circuits, 30 benchmark circuits were synthesized and routed in 0.18 μm CMOS technology with an in-house designed MCML cell library.
Introduction
Interconnects heavily influence the power-delay-area tradeoff in deep-submicron VLSI digital circuits, due to the strong contribution of their parasitics. The impact of interconnects is usually managed with automated CAD tools that perform interconnect-aware physical synthesis and place and route [1, 2]. Such automated design flows are usually available for single-ended logic styles, whereas differential logic styles are not explicitly supported [3, 4]. Accordingly, the adoption of differential logic styles requires further work to properly adapt commercial tools.
Until now, differential logic styles such as MOS current-mode logic (MCML) have been widely recognized to provide considerable advantages in terms of power supply noise compared to conventional CMOS logic [5] . From an application point of view, the reduced supply noise in MCML circuits enables a number of applications, such as digital signal processing or error correction in high-accuracy mixed-signal circuits, where substrate noise reduction is key to improving the dynamic range of noise-sensitive analog circuits. As another example, the low supply noise feature is very useful also in cryptographic devices with high level of security, since it makes differential power analysis (DPA) attacks much harder, thereby considerably increasing the level of protection of the secret key [6] . In these applications, the advantage offered by the MCML logic style over standard CMOS circuits has been experimentally demonstrated to be in the order of 2-3 orders of magnitude at least, although this comes at the cost of a power and area penalty [5, 6] . In addition, to make MCML a practical option for commercial chips, the design effort has to be kept close to that of standard CMOS circuits, hence manual design of MCML digital blocks is not a viable approach. Accordingly, the use of standard-cell based automated design methodologies for MCML circuits is mandatory.
In differential logic styles such as MCML, each signal is carried by a pair of wires that switch in opposite directions, thus canceling out the power supply and substrate noise to a large extent [5-13]. The maximum benefits are obtained when each differential signal pair is routed as a bundle (usually named "fully differential pair"), in which the two complementary wires have exactly the same length [3-7, 11]. Until now, a few methodologies have been developed to allow the implementation of fully differential logic circuits with standard CAD tools [3-7]. In the first step, these methodologies rely on a fictitious single-ended representation of differential signals, so that commercial CAD tools can be used. Then, in a post-processing step, the fictitious single-ended cells and wires are turned into their fully differential and logically equivalent counterparts. In such methodologies, it was shown that timing integrity throughout all steps of the design flow requires fully differential routing, which matches the lengths and parasitics of the two wires belonging to the same differential signal pair. In other words, the two wires belonging to the same pair must always be routed in parallel to each other, as will be discussed in detail in Section 2.
As is well known, automated routing of VLSI circuits is efficiently performed by restricting the possible decisions that the tool can make. In particular, the tool is allowed to place and route wires only at discrete positions in the die, according to a routing grid [8, 9] . In single-ended design flows, the wire grid pitch (i.e., the grid step) is often set to the minimum value allowed by the technology in order to provide maximum integration density. Nevertheless, non-minimum wire grid pitch can bring limited benefits (in the order of 10%) in terms of speed and power consumption [10] , since coupling capacitances between adjacent wires are reduced when their distance is increased. Moreover, current routing tools are able to automatically spread neighboring wires apart when routing space is available. Therefore, the choice of the wire grid pitch is not critical in the case of single-ended routing, and can bring only a modest improvement compared to the case of minimum pitch.
As opposed to single-ended design flows, the impact of wire grid pitch in differential design flows is expected to be strong, since wires belonging to the same differential pair are necessarily close to each other, and tools are not able to freely adjust their spacing. In addition, wires within the same pair always experience opposite transitions, hence their effective coupling capacitance is always increased by a factor of two due to the Miller effect [2, 11-13]. For these reasons, the wire grid pitch is expected to be a critical design variable in differential design flows, and further investigation is needed.
In this paper, the impact of the wire grid size in fully differential design flows is analyzed. In particular, the impact of the wire grid pitch on the power-delay-area tradeoff is analyzed in detail through simple models and design considerations, adopting a differential MOS current-mode logic standard cell library and a previously developed fully differential design flow. Simple design metrics to optimize the grid pitch are also introduced. According to the above premises, our analysis is focused on local wires that connect standard cells within the same module, hence effects typically associated with global wires (e.g., wire inductance) will not be considered (see footnote 2). Analysis of 30 benchmark circuits in 0.18-μm technology is performed to validate the above considerations. Results show that the proper choice of the wire grid pitch in differential design flows significantly reduces power and area for a given delay constraint. Interestingly, the optimum wire grid pitch was found to be almost independent of the specific circuit under design, hence pitch optimization can be performed only once and used for a large number of different designs.
The paper is structured as follows. In Section 2, a complete fully differential design flow is introduced. Qualitative considerations on the impact of the wire grid pitch and a comparison between differential and single-ended routing are reported in Section 3, whereas a design metric is derived in Section 4. Validation and simulation results are discussed in Section 5, and conclusions are drawn in Section 6.
Review of a fully differential automated design flow
In order to implement circuits based on differential logic styles, the two wires belonging to the same differential pair must be routed as a bundle [3] [4] [5] [6] [7] , i.e. they must be routed in parallel to ensure that they have the same length and parasitics. This fully differential routing approach has obvious advantages in terms of signal integrity, which is an important aspect in nanometer technologies, especially in the case of low-swing differential logic styles with reduced noise margin [4, 11] . However, the main reason for using fully differential routing is related to timing analysis. Indeed, in fully differential logic styles, the switching of logic gates is triggered by the variations in the differential input voltage; hence the timing arcs should relate input and output differential voltages during the timing analysis of the circuit. Unfortunately, current commercial timing analyzers are not able to model timing of differential signals, as they support only single-ended timing relationships.
The problem is illustrated in Fig. 1, where the switching of a pair of complementary signals is represented in the case of independently routed wires, i.e. with different length and parasitics, thereby violating the premise of fully differential design flows. Because of the difference in the parasitics associated with each wire, the transitions of the driving gate have different time constants. In other words, from Fig. 1 the two complementary signals OUT and OUT′ cross the 50% threshold at different points in time, and the point in time where the difference of the two signals crosses the 50% threshold is located somewhere in between (in fact, it can be easily seen that it is close to the average of the two individual points, for small differences between the two time constants). Therefore, if the input-to-output delay is evaluated as the delay at only one of the two single-ended outputs (as allowed by current CAD tools), it underestimates or overestimates the actual delay evaluated on the differential waveform. These timing errors can accumulate and lead to a considerable error when evaluating a path delay. Clearly, such timing errors are not acceptable in high-speed applications, since the speed constraint is then likely not to be met. Analogously, such errors are not acceptable in low-power applications, since delay overestimation leads to circuit overdesign, thereby degrading the power efficiency.
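The size of this timing error can be illustrated numerically. The following is a minimal sketch, assuming ideal first-order exponential transitions with normalized swing (an assumption made here for illustration, not a model taken from the design flow); the function names and the time constants 1.0 and 1.1 are hypothetical. It compares the 50% crossing of each single-ended output with the crossing of the differential waveform.

```python
import math

def crossing_time(f, lo, hi, tol=1e-12):
    """Bisection for the root of an increasing function f on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def crossing_points(tau_rise, tau_fall):
    """Return the 50% crossing times of OUT, OUT' and of their difference.

    ASSUMPTION: OUT rises as 1 - exp(-t/tau_rise) and OUT' falls as
    exp(-t/tau_fall), i.e. ideal first-order transitions.
    """
    t_out = tau_rise * math.log(2.0)    # OUT reaches 50% of the swing
    t_outb = tau_fall * math.log(2.0)   # OUT' reaches 50% of the swing

    def diff(t):
        # Differential voltage OUT - OUT'; its midpoint is zero.
        return (1.0 - math.exp(-t / tau_rise)) - math.exp(-t / tau_fall)

    t_diff = crossing_time(diff, 0.0, 10.0 * max(tau_rise, tau_fall))
    return t_out, t_outb, t_diff

# Hypothetical 10% mismatch between the two time constants.
t_out, t_outb, t_diff = crossing_points(1.0, 1.1)
print(t_out, t_outb, t_diff)
```

Under these assumptions the differential crossing falls between the two single-ended crossings and very close to their average, so a tool that times only one output is off by roughly half the difference between the two single-ended delays.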
According to the above considerations, commercial CAD tools can accurately estimate the delay of differential logic gates only if the two outputs of a differential pair have exactly the same delay. This can be achieved by balancing the parasitics of the two wires within that same pair as much as possible, i.e. routing them as a bundle.
Fig. 1. 50% crossing points for two wires belonging to the same differential pair and the corresponding differential voltage.
Footnote 2: Observe that issues related to global interconnects are completely different from local (intra-module) interconnects both in terms of the impact of wire parasitics and design issues. Indeed, local interconnects are mainly capacitive and easily prone to routing congestion, whereas global wires also exhibit resistive/inductive behavior and typically do not suffer from serious congestion [20].
A complete fully differential design flow was recently developed, based on the above discussed concept [4] . With no loss of generality, in the following this design flow will be applied to a standard cell library based on MOS current-mode logic (MCML) style [14, 15] . The design flow for differential MCML standard cells is briefly illustrated in the flowchart in Fig. 2 . Essentially, two different views of the cell library are necessary: a logical view, where each pair of complementary signals (differential inputs and output of the cell) is represented as a single port, and a physical view which includes both polarities for each signal. Once the cell layouts are created, they are characterized for timing and power. Logical and physical models are generated for simulation, synthesis (timing library) and placement and routing (abstracts). Then, a number of variants are generated for each cell by inverting the inputs and output in all possible combinations, to take advantage of the free signal inversion available with differential cells (in differential cells, logic inversion is performed by simply swapping pins).
In the automated circuit design, wire capacitance values are properly evaluated to reflect the higher effective capacitance seen in differential wires (more details are provided in Section 3), in order to ensure accurate timing analysis throughout the flow. Based on a standard-cell logic library and a standard HDL description, circuits are then synthesized, placed and routed using standard tools. The resulting circuit is made of fictitious single-ended cells and wires, where each wire actually represents a pair of complementary signals, according to appropriate design rules that accommodate the increased wire width. Then, a script translates the single-ended design into a physically equivalent differential design, by splitting each wire into a differential pair, and replacing each cell by its physical counterpart. To correctly connect each wire to the corresponding cell pins, the resulting design is fed back to the router to complete the connections.
Fig. 2. Flowchart of the fully differential design flow [4].
Summarizing, thanks to the joint adoption of commercial CAD tools and appropriate scripts, the above design flow permits the automated design of differential digital circuits from their VHDL/Verilog description to their detailed physical-level design.
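As an illustration of the wire-splitting step, the sketch below turns one fictitious single-ended segment into two parallel wires whose centers are one pitch apart. It is only a sketch under stated assumptions: the function names, the coordinate representation and the restriction to single axis-aligned segments are hypothetical choices made here, and the actual script of the flow in [4] may work differently (for instance, it must also handle corners and vias).

```python
def split_segment(x0, y0, x1, y1, pitch):
    """Split one axis-aligned centerline segment into two parallel wires.

    ASSUMPTION: the fictitious single-ended wire is described by its
    centerline, and the two wires of the pair sit at +/- pitch/2 from it.
    """
    half = pitch / 2.0
    if y0 == y1:  # horizontal segment: offset the pair along y
        return ((x0, y0 - half, x1, y1 - half),
                (x0, y0 + half, x1, y1 + half))
    if x0 == x1:  # vertical segment: offset the pair along x
        return ((x0 - half, y0, x1 - half, y1),
                (x0 + half, y0, x1 + half, y1))
    raise ValueError("only Manhattan segments are handled in this sketch")

def segment_length(seg):
    """Manhattan length of a segment given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = seg
    return abs(x1 - x0) + abs(y1 - y0)

# Example: a 10 um horizontal wire split at a pitch of 1.08 um.
wire_a, wire_b = split_segment(0.0, 0.0, 10.0, 0.0, pitch=1.08)
print(wire_a, wire_b)
```

For a straight segment the two resulting wires have identical length by construction, which is the property that fully differential routing relies on.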
Understanding the impact of the routing grid pitch
When using a design flow that includes automated place and routing, the designer has to preliminarily choose the wire grid pitch. Unfortunately, until now no criteria or guidelines have been provided to assist this choice. For this reason, in the following the impact of wire grid pitch is analyzed in detail for fully differential routing, highlighting the interdependence of fundamental design parameters, such as speed, power consumption and area.
Analysis of the power-delay-area tradeoff versus wire grid pitch
In any type of automated routing, as shown in Fig. 3, the wire grid size is set by the pitch $P$, which is defined as the distance between the middle sections of adjacent wires. In the same figure, capacitance $C_{coupling,INT}$ represents the intrinsic coupling capacitance between the considered wires, whereas $C_{GND}$ represents the grounded capacitive contribution at each wire (i.e., the contribution of the bottom plate, as well as the fringing capacitance to ground of the lateral faces). For a given wire width, $C_{coupling,INT}$ and $C_{GND}$ are proportional to the wire length $L$ via the capacitances per unit length $c_{coupling}$ and $c_{GND}$, respectively (i.e., $C_{coupling,INT} = c_{coupling} L$, $C_{GND} = c_{GND} L$). Analogously, the external capacitance towards the adjacent wires $C_{coupling,EXT}$ in Fig. 3 is proportional to the overlap length $L_{ov}$ (i.e., the length of the overlapping section of the adjacent wires) via the capacitance per unit length $c_{coupling}$ (i.e., $C_{coupling,EXT} = c_{coupling} L_{ov}$). In deep-submicron technologies, the capacitances associated with the lateral face (i.e., $C_{coupling,INT}$ and $C_{coupling,EXT}$) are well known to dominate over the grounded capacitance $C_{GND}$, as the lateral face area scales down more slowly than the bottom face of wires [2].
It is useful to observe that the wires belonging to the same differential pair always experience opposite transitions, hence the in-between coupling capacitance $C_{coupling,INT}$ is always affected by the full Miller effect, i.e. it can be modeled as a grounded capacitance (in parallel to $C_{GND}$) equal to $2C_{coupling,INT}$ [11]. On the other hand, the full Miller effect takes place between the considered wire and the adjacent ones only if they switch at the same time, whereas no effect is observed if they switch at different points in time. Hence, the capacitive contribution between each wire of the differential pair and the adjacent one can be modeled as a grounded capacitance equal to $\alpha_{Miller} C_{coupling,EXT}$, where $\alpha_{Miller}$ is the well-known Miller effect coefficient, which equals 2 if the full Miller effect takes place, and is lower than 2 if this effect occurs only partially [2]. Accordingly, the overall capacitance to ground $C_{wire}$ associated with each wire of a differential pair is proportional to $L$ via the wire capacitance per unit length $c_{wire}$, according to
$$C_{wire} = c_{wire} L \tag{1a}$$
$$c_{wire} = c_{GND} + 2 c_{coupling} + \alpha_{Miller} c_{coupling} \frac{L_{ov}}{L} \tag{1b}$$
Relationships (1a) and (1b) can be used to understand the impact of the wire grid pitch $P$ on the wire capacitance, which is related to performance, power and area. If the grid size $P$ is small (i.e., close to its lower bound $P_{min}$ set by the technology), $c_{coupling}$ and hence $c_{wire}$ tend to be very high due to the short distance between adjacent wires, thereby degrading speed and power efficiency. At the same time, low values of $P$ provide the maximum possible integration density. When $P$ is increased with respect to $P_{min}$, capacitance $c_{coupling}$ tends to decrease. As an example, this is shown by the plot of $c_{coupling}$ versus $P/P_{min}$ in Fig. 4, where the capacitive contributions of the intermediate-level (metal 2-4) layers in 0.18-μm CMOS technology are considered. This is easily explained by considering that an increase in $P$ spreads the lateral faces of two adjacent wires apart, thereby reducing the capacitance associated with the parallel plates of the capacitor $C_{coupling,INT}$. At the same time, a small increase in $P$ does not significantly affect the routing density, as long as no congestion occurs in routed wires, hence the wire length $L$ is roughly unaffected by $P$. Accordingly, from (1a) and (1b) the net effect of a moderate increase in $P$ is a reduction in $C_{wire}$, which in turn improves both speed and power efficiency.
On the other hand, if $P$ is strongly increased with respect to $P_{min}$, the distance between differential wires becomes so large that the routing density is severely degraded and routing congestion occurs. Due to congestion, wires follow longer paths than necessary, hence their length $L$ tends to rapidly increase when increasing $P$. Hence, despite the small reduction in $c_{wire}$ (since $c_{coupling} \propto 1/P$ decreases slowly for high values of $P$), the fast increase in $L$ determines an increase of $C_{wire}$, according to (1a) and (1b). This effect is further emphasized for high values of $P$, as the increase in $C_{wire}$ forces the synthesis tool to increase the cell strength for a targeted speed, which in turn further increases the circuit area and hence the wire length.
The above discussed dependence of $C_{wire}$ on the pitch $P$ is summarized in Fig. 5, from which it is apparent that there is an optimum grid size $P_{opt}$ that minimizes $C_{wire}$. Observe that this optimum choice of grid size improves speed and power at the same time, and can also slightly reduce the area occupied by the circuit (as was observed in note 1). In other words, speed, power and area are not conflicting requirements in the optimum choice of the grid size $P$: indeed, the optimum grid size improves the routing efficiency, thereby bringing benefits to speed, power and area at the same time.
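The qualitative behavior described in this subsection can be reproduced with a toy model. The sketch below is a rough illustration only: the per-unit-length capacitance combines the grounded term, the Miller-doubled intrinsic coupling and the external coupling weighted by the overlap ratio, as discussed above, with coupling assumed to scale as $1/P$. The congestion model for the wire length and all numerical values are hypothetical and were chosen only to show how an interior minimum can arise; they are not fitted to the considered technology.

```python
def c_wire(p, c_gnd=0.08, c_cpl_min=0.08, alpha_miller=1.0, overlap_ratio=0.1):
    """Per-unit-length capacitance (fF/um) of one wire of a differential pair.

    p is the pitch normalized to the minimum pitch. All default values are
    illustrative ASSUMPTIONS, not data from the paper.
    """
    c_cpl = c_cpl_min / p  # assumed 1/P dependence of the coupling capacitance
    # grounded term + Miller-doubled intrinsic coupling + external coupling
    return c_gnd + 2.0 * c_cpl + alpha_miller * c_cpl * overlap_ratio

def wire_length(p, l0=100.0, p_congestion=1.4, growth=5.0):
    """HYPOTHETICAL congestion model: constant length up to p_congestion,
    then quadratic growth as routing resources run out."""
    excess = max(0.0, p - p_congestion)
    return l0 * (1.0 + growth * excess ** 2)

def total_c_wire(p):
    """Total wire capacitance: per-unit-length capacitance times length."""
    return c_wire(p) * wire_length(p)

# Scan normalized pitches from 1.00 to 2.00 in steps of 0.01.
pitches = [n / 100 for n in range(100, 201)]
p_opt = min(pitches, key=total_c_wire)
print(p_opt, total_c_wire(1.0), total_c_wire(p_opt), total_c_wire(2.0))
```

With these assumed numbers the minimum lies slightly above the pitch where congestion sets in, and the total capacitance is higher both at minimum pitch and at very large pitch.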
Single-ended and differential routing: qualitative considerations and differences
Until now, some results have been published on the impact of the wire pitch only in the case of single-ended routing [10, 16-18]. In particular, to the best of the authors' knowledge, only [10] explicitly discusses the optimization of the wire grid pitch herein considered. More specifically, [10] shows that an optimum pitch exists, and that a modest improvement in power consumption and performance can be achieved (within 10%). Moreover, the optimum pitch is shown to significantly depend on the specific circuit under design. On the other hand, papers [16-18] do not explicitly consider the wire grid pitch optimization, but they target the design of the interconnect hierarchy at the process level, and propose guidelines to select the geometrical dimensions of wires. Results in these papers agree well with the qualitative considerations reported in the previous subsection, but they do not provide any information on how to size the wire grid pitch in differential routing, once the process is defined.
In general, it is expected that fully differential routing can also take advantage of the wire grid pitch optimization, although no work in the literature has been devoted to this particular case until now. To understand the differences with respect to the single-ended case, let us observe that the intrinsic coupling capacitance (i.e., the second term in (1b)) dominates over the external coupling capacitance (i.e., the third term in (1b)), since $L_{ov} \ll L$ in well-designed circuits. Physically, this is because the external contribution is due only to the generally short overlap between adjacent wires belonging to different pairs, whereas the intrinsic contribution has the largest possible value (since every wire within a differential pair runs parallel to the complementary wire for its entire length). Interestingly, the intrinsic contribution is constant in the design since $c_{coupling}$ depends only on the process, whereas the external contribution depends on the ratio $L_{ov}/L$, which clearly depends on the specific design. Since the latter contribution is negligible, it is expected that the wire capacitance per unit length in differential routing is almost design-independent; hence the wire grid pitch optimization impacts the capacitance of all wires almost in the same way, regardless of the considered design. In other words, the wire grid pitch optimization is expected to be almost unaffected by the specific design, i.e. a design-independent optimum pitch can be found. This qualitative result will be shown to agree well with simulation results in Section 5.
Fig. 3. Cross section of a differential pair of wires (dark grey) and two adjacent wires (light grey).
Fig. 4. Capacitance contributions per unit length as a function of the routing pitch $P$ normalized to the minimum allowed by technology $P_{min}$.
From the above considerations, the intrinsic contribution in differential routing is significantly greater than that of single-ended wires, whereas the grounded contribution (i.e., $c_{GND}$ in (1b)) is almost the same in both cases. Hence, the reduction of $c_{coupling}$ obtained with the pitch increase (see Fig. 5) has a stronger effect on $c_{wire}$ when considering differential routing. As a consequence, the power/delay improvement achieved with the pitch optimization in differential routing is expected to be much greater than that of single-ended wires. This consideration will also be validated through comparison with simulations in Section 5.
Summarizing, the wire grid pitch optimization in differential circuits is expected to significantly impact the power-delay-area tradeoff, and the resulting optimum pitch is expected to be roughly design-independent, in contrast to previous results on single-ended routing.
Metrics to estimate the impact of grid size
As discussed in Section 3, a tradeoff between the wire capacitance $C_{wire}$ and area exists in the choice of the wire grid pitch $P$. In the following, simple metrics that provide information on this tradeoff are discussed.
A reasonable metric that can express the capacitance-area tradeoff should include the product of capacitance and area, or a power of them if we want to put more weight on one of them. To achieve a general metric that allows finding the optimum wire grid pitch $P_{opt}$ that leads to the best capacitance-area tradeoff (see Fig. 5), it is sufficient to derive a simple expression of capacitance and area that is valid for $P$ lower than (or comparable to) the optimum pitch $P_{opt}$, according to Fig. 5. As was discussed in Section 3.1, in Fig. 4, for $P \le P_{opt}$ the wire length $L$ is roughly constant, hence the dependence of $C_{wire}$ on $P$ in (1) is approximately due only to the factor $c_{wire}$. In regard to area, from Fig. 3 the area occupied by a pair of differential wires is proportional to the grid pitch $P$ and the wire length $L$, the latter of which can again be assumed to be approximately independent of $P$ when evaluating $P_{opt}$. According to these considerations, the dependence of the capacitance (area) on $P$ is simply captured by $c_{wire}$ ($P$). Hence, a suitable metric to describe the capacitance-area tradeoff is $c_{wire}^{i} P$, where exponent $i$ is set to a value greater (lower) than unity if capacitance is more (less) important than area. In this regard, observe that the wire area is not a serious issue in the region of interest where $P \le P_{opt}$, since from Fig. 5 the wire length is independent of $P$, whereas the reduction in $c_{wire}$ is crucial. For this reason, more weight should be put on capacitance in the capacitance-area metric. This can be done by introducing an exponent $i = 2$ in the term $c_{wire}$, thereby yielding the following capacitance-area figure of merit (FOM)
$$FOM = c_{wire}^{2} P \tag{2}$$
In (2), the dependence of the wire capacitance per unit length $c_{wire}$ on $P$ can be easily extracted from technology data or from simulations on 3-D field solvers [2]. For example, the dependence of $c_{wire}$ on $P$ is shown in Fig. 4 for the considered 0.18 μm CMOS technology, which has $P_{min} = 0.72$ μm, and $c_{wire} = 0.24$ fF/μm (0.19 fF/μm) for the differential (single-ended) routing under $P = P_{min}$ (this difference is due to the additional coupling capacitance contribution between the differential wire pair).
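The capacitance-area metric with exponent $i = 2$ is easy to explore numerically once $c_{wire}(P)$ is known. The sketch below assumes a simple two-term model, with a pitch-independent part plus a coupling part that scales as $1/P$; the split of 0.10 and 0.14 fF/μm is a hypothetical choice made here whose sum matches the 0.24 fF/μm quoted for differential routing at $P_{min}$, and it is not the extracted dependence of Fig. 4. The function and parameter names are likewise illustrative.

```python
def c_wire(p, c_fixed=0.10, c_cpl_at_pmin=0.14):
    """Per-unit-length wire capacitance in fF/um versus normalized pitch p.

    ASSUMPTION: a pitch-independent term plus a coupling term scaling as 1/p.
    The 0.10/0.14 split is hypothetical; only their sum (0.24 fF/um at
    minimum pitch) is taken from the text.
    """
    return c_fixed + c_cpl_at_pmin / p

def fom(p, exponent=2):
    """Capacitance-area metric c_wire**exponent * P (lower is better)."""
    return c_wire(p) ** exponent * p

# Scan normalized pitches from 1.00 to 2.00 in steps of 0.01.
grid = [n / 100 for n in range(100, 201)]
p_best = min(grid, key=fom)
p_best_linear = min(grid, key=lambda p: fom(p, exponent=1))
print(p_best, fom(p_best) / fom(1.0), p_best_linear)
```

Under this assumed model the metric with exponent 2 has a shallow interior minimum, whereas with exponent 1 it is minimized at the minimum pitch, which illustrates why the weight placed on capacitance matters when selecting the pitch.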
The resulting metric in (2) is plotted in Fig. 6 versus $P/P_{min}$ for the differential and single-ended routing cases. In this figure, $c_{wire}$ and $P$ are normalized to the values obtained for the minimum grid size $P_{min}$ allowed by the technology. Fig. 6 reveals that differential routing can benefit significantly more from pitch optimization than single-ended routing. This observation confirms that the optimization of $P$ is crucial in differential routing, and agrees well with the qualitative considerations reported in Section 3.2. Inspection of Fig. 6 also shows that the figure of merit in (2) for differential routing has a rather flat minimum between $1.5P_{min}$ and $1.6P_{min}$, hence it is reasonable to set $P$ to $P_{opt} = 1.5P_{min}$ in circuits implemented with the considered technology. This flat minimum around $P_{opt}$ ensures that designs around the optimum grid pitch are robust against moderate process variations. In Section 5, it will be shown that this value of $P_{opt}$ agrees well with the optimum found experimentally in several designs.
Finally, it is interesting to compare results obtained for differential routing with the single-ended case. From Fig. 6, the FOM under single-ended routing is apparently less sensitive to $P$, i.e. the choice of the grid size in differential routing is more critical than in the single-ended case. This is due to the increased coupling capacitance associated with each differential wire pair, as discussed in Section 3.1, and agrees well with the qualitative considerations in Section 3.2. For the same reason, $P_{opt}$ for single-ended routing is lower than that of the differential case ($P_{opt} \approx 1.2P_{min}$ from Fig. 6), and is close to the minimum value allowed by technology.
Analysis of test circuits and validation
In order to evaluate the impact of routing grid size $P$ on the power-delay-area tradeoff, 30 circuits (ISCAS 85 and 89) taken from the IWLS'2005 benchmark suite [19] were synthesized under different values of the grid pitch. The considered benchmarks are summarized in the first column of Tables 1-3. Each test circuit was synthesized using Synopsys Design Compiler Topographical, which performs logic synthesis and physical optimization according to the wire technology parameters. Routing was performed using Metal-1 to Metal-4 layers. Each circuit was synthesized under several speed constraints in order to validate the results for different performance targets. To this end, each circuit was preliminarily characterized to obtain the minimum delay by performing five synthesis runs (with minimum grid size $P_{min}$), starting with very tight timing constraints, and updating the timing constraint for the next run with the result of the previous one. This allowed for obtaining the very minimum delay achievable in the critical path. Then, in order to evaluate the impact of the routing grid at different speed constraints, synthesis runs were performed for a delay constraint equal to 1×, 1.25×, 1.5×, 2× and 5× the minimum value, and with interconnect parasitic data corresponding to the various routing grid pitches adopted (ranging from $P_{min}$ to $1.7P_{min}$). For each of these circuits and for each speed constraint, power and area were also evaluated. The resulting values of the critical path delay, area and power normalized to the case with minimum pitch are reported in Tables 1-3, which respectively refer to the case of 1×, 2× and 5× delay constraint.
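The characterization procedure described above can be sketched as a small driver script. This is a hedged illustration: `synthesize` is a hypothetical callable standing in for one synthesis run (no real tool interface is implied), `fake_synthesize` is a stand-in used only to make the example executable, and the pitch step of $0.1P_{min}$ is an assumption, since only the range from $P_{min}$ to $1.7P_{min}$ is stated.

```python
from itertools import product

def characterize_min_delay(synthesize, runs=5, first_constraint=0.0):
    """Iteratively find the minimum delay at minimum pitch.

    Each run is constrained by the delay achieved in the previous run,
    starting from a very tight (here zero) constraint.
    """
    constraint = first_constraint
    for _ in range(runs):
        constraint = synthesize(delay_constraint=constraint, pitch_ratio=1.0)
    return constraint

def build_sweep(min_delay,
                multipliers=(1.0, 1.25, 1.5, 2.0, 5.0),
                pitch_ratios=(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7)):
    """List every (delay constraint, P/P_min) pair to be synthesized.

    ASSUMPTION: the 0.1 step between pitch ratios is illustrative.
    """
    return [(m * min_delay, r) for m, r in product(multipliers, pitch_ratios)]

def fake_synthesize(delay_constraint, pitch_ratio):
    """HYPOTHETICAL stand-in for a synthesis run: returns the achieved
    critical-path delay, which cannot go below 2.0 (arbitrary units)."""
    return max(delay_constraint, 2.0)

min_delay = characterize_min_delay(fake_synthesize)
sweep = build_sweep(min_delay)
print(min_delay, len(sweep), sweep[0], sweep[-1])
```

Keeping the synthesis call behind a plain callable makes it possible to swap the stand-in for a wrapper around the actual tool without changing the sweep logic.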
The results in Tables 1-3 are summarized in Figs. 7-9. From Figs. 7-9, both power and area are always minimized for $P_{opt} = 1.45P_{min}$ to $1.5P_{min}$ under any design, which is close to the optimum value of $1.5P_{min}$ that was theoretically obtained in Section 4 from the minimization of the figure of merit in (2). Hence, the proposed metric in (2) consistently describes the power-delay-area tradeoff, and can be used for design purposes. Moreover, the optimum pitch is almost independent of the considered design, which agrees very well with the qualitative considerations in Section 3.2. This is very interesting from a design point of view: indeed, this means that the optimum pitch can be found once and for all, then the same value can be used in different designs.
According to Fig. 7a, the adoption of minimum pitch leads to a 1.7× increase in power and 1.3× in area for the 1× delay constraint, compared to the optimum case, thereby confirming that the optimization of $P$ under differential routing is critical and has a strong effect on the power-delay-area tradeoff.
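For comparison with savings expressed as percentages, these ratios can be converted into reductions relative to the minimum-pitch case. The following arithmetic is a simple restatement of the two figures quoted above, not an additional measurement:

```latex
\[
\text{power saving} = 1 - \frac{1}{1.7} \approx 0.41,
\qquad
\text{area saving} = 1 - \frac{1}{1.3} \approx 0.23
\]
```

That is, a 1.7× power penalty and a 1.3× area penalty at minimum pitch correspond to savings of about 41% and 23%, respectively, when moving to the optimum pitch.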
Comparison of Figs. 7-9 shows that the optimum pitch is also independent of the delay constraint. However, the benefits of the pitch optimization tend to be reduced when the delay constraint is relaxed. Indeed, the power (area) under the optimum pitch is reduced by 20-45% (10-30%) when a 1× or 2× delay constraint is assumed, compared to the minimum-pitch case. The power (area) saving reduces to 5-35% (less than 10%) when considering the 5× delay constraint. This means that the pitch optimization is effective in reducing power and area for realistic cases where a high or moderate performance is required, whereas it is less advantageous in designs with a very loose delay constraint. This can be intuitively explained by observing that tight delay constraints force the synthesis tool to use high-strength cells, which suffer from high power consumption and area. Equivalently, when the pitch is optimized, the resulting decrease in the wire capacitance leads to the adoption of cells with smaller strength, thereby significantly reducing the overall power and area (see note 1). On the other hand, under a loose delay constraint, minimum-strength cells are usually adopted; hence the wire capacitance reduction due to the pitch optimization does not lead to a reduction in the cell power and area, because cells are already minimum-sized.
Finally, a moderate reduction of the gate count (in the order of 10%) was observed under the optimum pitch (curves are omitted for the sake of compactness). This can be explained by observing that, under minimum pitch, the wire capacitance is so high that it is advantageous to split each wire into several shorter wires, i.e. to use a larger number of gates. For the same above reasons, the gate count is largely independent of the grid pitch for loose delay constraints.
Conclusion
In this paper, the impact of routing grid pitch on the power-delay-area tradeoff has been analyzed in the case of intra-module fully differential routing. Analysis has shown that the wire grid pitch must be carefully set in circuits with differential routing, as opposed to traditional single-ended circuits, whose power-delay-area tradeoff is much less sensitive to the grid pitch. To quantitatively evaluate this tradeoff, a simple metric was introduced, and various interesting properties were derived from design considerations. The optimum grid pitch predicted by this metric agrees well with the optimum obtained in real designs, and is almost independent of the specific circuit under design.
The design of 30 test circuits in 0.18 μm technology has shown that the pitch optimization can lead to power and area savings at the same time, which range from 20% to 45% and from 10% to 30%, respectively, for an assigned delay constraint. Reduced advantages are observed in circuits with a very loose delay constraint.
