Abstract

In recent times the ratio of the cross-coupling capacitance between adjacent on-chip wires on the same metal layer to the total capacitance of any wire is becoming quite large. As a consequence, signal wires exhibit significant delay variation and noise immunity problems. This problem is aggravated for long on-chip buses. In this paper, we develop memory-based crosstalk canceling CODECs for on-chip buses. We describe a Reduced Ordered Binary Decision Diagram (ROBDD) based methodology to accurately compute the bus area overhead of the CODECs. We report the asymptotic overhead for CODECs which cancel three kinds of [...]

1. Introduction

Crosstalk has become a significant problem in deep sub-micron (DSM) VLSI design [8]. The aggressive scaling of processes that lies at the heart of the relentless drive towards smaller and faster ICs results in an increase in wiring delays due to increasing wire sheet resistivities. To reduce this effect, recent processes have scaled wires only in the horizontal dimension, effectively creating 'tall' wires. As a consequence, the cross-coupling capacitance ($C_x$) between two minimally spaced adjacent wires on the same metal layer is much greater than the substrate capacitance ($C_{sub}$) of any of the wires.

If the ratio $r = C_x / C_{sub}$ is large, crosstalk between adjacent wires on the same metal layer manifests in ways that make designs unpredictable. In particular, it results in a significant delay variation in a wire, depending on the state of its neighbors. Also, it can result in signal integrity problems, since a static wire can suffer a glitch caused by capacitively coupled voltages from its switching neighbors. Crosstalk has therefore become a critical design issue in modern IC design.

Consider three adjacent wires in an on-chip bus, which are driven by signals $b_{i-1}$, $b_i$ and $b_{i+1}$. The total effective (switched) capacitance of driver $b_i$ is dependent on the state of $b_{i-1}$ and $b_{i+1}$. In the best case, the total effective capacitance of $b_i$ is $C_{min} = C_{sub}$, and in the worst case, the effective capacitance is $C_{max} = 4 C_x + C_{sub}$. With $r > 1$, we observe that $C_{max}/C_{min} \gg 1$, and hence the delay of bus signals strongly depends on the data pattern being transmitted on the bus. As a result of this large delay variation, the worst case delay of a signal in an on-chip bus is also increased, limiting system performance. The problems due to crosstalk are aggravated in long on-chip buses, since bus signals are longer and therefore more capacitive, resulting in larger worst case delays. Therefore [17], [6] [...] selective reduction of the crosstalk effects in on-chip buses.

By providing immunity from crosstalk in buses, our encoding based techniques reduce the crosstalk induced delay variation effect in on-chip buses. This has the important benefit of reducing the maximum delay as well as reducing signal integrity problems in the bus signals. The bus size overheads of our techniques are high when a greater crosstalk immunity is desired. This allows the designer to effectively trade off the degree of crosstalk control desired with the bus size overhead. In the sequel, we refer to bus overhead as the additional number of bits required in order to encode a bus in [...]

In recent times, with wiring delays increasing compared to gate delays [8], it is often the case that the critical delay in a circuit is determined by long buses. In such a case, buses could be encoded with the techniques described in this paper, allowing the design to be operated at a much greater frequency. A designer would gladly tolerate the bus size overhead involved with the use of our approach in such a scenario.

This paper is organized as follows. Section 2 provides definitions used in the rest of the paper. This section also provides a classification (similar to that of [5]) of bus data patterns based on the maximum amount of crosstalk incurred by such patterns. In Section 3, we discuss previously published approaches for solving this problem. In Section 4, we describe our approach of creating memory-based crosstalk canceling CODECs. In Section 5 we report the results of experiments that we have performed to quantify the tradeoff between the degree of crosstalk immunity achieved by the above techniques, and the bus size overhead incurred. We compare our bus size overheads with those reported in [5, 4], in which memoryless CODECs to eliminate 4C, 3C and 2C crosstalk patterns were described. We conclude the paper in Section 6.

2. Preliminaries

In this section, we introduce the classification scheme for bus data transitions which we will utilize in the sequel. Our classification is largely borrowed from that introduced in [5].

Consider an $n$-bit bus, consisting of signals $b_1, b_2, b_3, \ldots, b_{n-1}, b_n$.

DEFINITION 1. A vector $v$ is an assignment to the signals $b_i$ as follows: $b_i = v_i$ (where $1 \le i \le n$ and $v_i \in \{0, 1\}$).

[...] not a 4C, 3C, 2C, or 1C sequence.

DEFINITION 7. A $pC$ crosstalk canceling CODEC (or $pC$ crosstalk free CODEC) transforms an arbitrary $m$-bit vector sequence into an $n$-bit vector sequence ($m < n$) such that the output vector sequence is a $(p-1)C$ sequence.

DEFINITION 8. A set $C_n$ of $n$-bit vectors is said to be a $pC$ crosstalk free clique if any vector sequence $v_1 \to v_2$ made up of vectors $v_1, v_2 \in C_n$ is an $lC$ sequence (where $l < p$), and there exist $v^*, v' \in C_n$ such that $v' \to v^*$ is a $(p-1)C$ sequence.

If a sequence of vectors on a bus is a $pC$ sequence ($0 \le p \le 4$), then the physical interpretation of this is that:

* This vector sequence has at least one bit $b_i$ for which there exist consecutive vectors that require the driver of this bit to charge a capacitance $p \cdot C_x + C_{sub}$. Note that $C_x \gg C_{sub}$.
* For this sequence, there does not exist any bit such that the driver of this bit is required to charge a capacitance greater than $p \cdot C_x + C_{sub}$.

A memoryless CODEC simply encodes an $m$ bit vector with a unique $n$ bit vector. A memory-based CODEC encodes an $m$ bit vector with an $n$ bit vector. The encoding depends on the $k$ previous $n$ bit vectors that were transmitted on the bus (for a memory depth $k$).

Note that in the sequel, if we say that a CODEC is $kC$-free, we mean that it results in crosstalk of magnitude $(k-1)C$ or less, for any bus transition.

3. Previous Work

Crosstalk reduction for on-chip buses has been the focus of some recent research. In [15], the main contribution of the authors was to extend the Elmore delay model to account for the distributed nature of self and cross-coupling capacitances in on-chip buses. They suggest the possibility of using CODECs to eliminate certain bus transitions. They also suggest that encoding could speed up buses by 2x (this would be achieved by ensuring that the bus never exhibits 4C or 3C transitions). In [5], the authors classify bus data transitions from a crosstalk viewpoint, and describe memoryless CODECs to eliminate 4C and 3C transitions on the bus. They show that the asymptotic overhead when eliminating 3C transitions is about 44%. In [4], the authors describe 2C and 1C crosstalk canceling memoryless CODECs. For the CODECs described in [5, 4], the overheads due to CODEC implementation were not quantified. Further, the algorithm to determine the bus overhead required an explicit enumeration of all $2^{2n}$ vector transitions. In contrast, we employ implicit enumeration, resulting in a more compact representation and therefore a more efficient computation. In [7], the authors reduce crosstalk induced delay variation in buses by selectively skewing bus data signals. Finally, [11] proposes a bus repeater sizing methodology which accounts for crosstalk induced delays and controls them by upsizing the drivers. This could result in driver circuits with large power and area requirements.

In [15], [5], [7], [11] and [16], the goal was to avoid 4C and 3C transitions. In contrast, our techniques can speed up a bus even further, by ensuring that the bus never exhibits 2C transitions as well. In other words, our 2C free approach can exploit crosstalk to speed up a bus further. Additionally, unlike the above papers as well as [4], our techniques utilize memory-based CODECs, resulting in much lower bus size overheads. Our algorithm to find the bus overhead utilizes ROBDDs [3], and thereby represents the vector transitions implicitly and therefore compactly. We report the bus size overheads for 4C, 3C and 2C free memory-based CODECs.

4. Memory-based Crosstalk Cancellation

For a memory-based code, let $v_r$ be the vector present on the bus at time $t_r$. Let $v_{r+1}$ be the vector present on the bus at time $t_{r+1}$. If it is guaranteed that for any $r$, $v_r \to v_{r+1}$ is a $pC$ transition, then the sequence is a $pC$ sequence (sufficient condition). For an $m$ bit bus, such a sequence satisfies the property that at any given time $t_r$, there must be at least $2^m$ distinct $(p+1)C$ free transitions available. In other words, for any $v_r$, there must be at least $2^m$ distinct $v_{r+1}$'s which are $(p+1)C$ free with respect to $v_r$.

To decode the data, the receiving decoder needs to know both the current received symbol and the previously received symbol. The encoder generates the next symbol based on the data input and the previously transmitted symbol. As a consequence, memory elements are needed in both the encoder and decoder.

A memory-based code will satisfy the $(p+1)C$ free condition iff for each vector $v$ in the set, there are at least $2^m$ vectors (including $v$ itself) that are $(p+1)C$ free with respect to $v$. It is not required that every pair of vectors in the set is a $(p+1)C$ free pair.

4.1 Summary of our Approach

Our approach to determine the effective bus of width $m$ that can be encoded in a $kC$ free manner, using a physical bus of width $n$, consists of two steps:

* First, we construct an ROBDD $G_n^{kC\text{-free}}$ which encodes all vector transitions on the $n$-bit bus that are $kC$ free.
* From $G_n^{kC\text{-free}}$, we find the effective bus width $m$, such that an $m$ bit bus can be encoded in a $kC$ free manner using $G_n^{kC\text{-free}}$.

These steps are described in the sequel.

[...] Since the ROBDD of a function and its complement contain the same number of nodes (except for a complement pointer), this enables an efficient construction of $G_n^{kC\text{-free}}$. Further, the ROBDD allows us to represent the legal (and illegal) $kC$ free crosstalk transitions on the bus implicitly, sharing nodes maximally and in a canonical manner.

Suppose we want to construct $G_n^{kC\text{-free}}$. In that case, we allocate $2n$ ROBDD variables. The first $n$ variables correspond to the vector from which a transition is made (referred to as $v = \{v_1, v_2, \ldots, v_n\}$). The next $n$ variables correspond to the vector to which a transition is made (referred to as $w = \{w_1, w_2, \ldots, w_n\}$). If a vector sequence $v^* \to w^*$ is legal with respect to $kC$ crosstalk, then $w^* \to v^*$ is also legal. In other words, $G_n^{kC\text{-free}}(v^*, w^*) = G_n^{kC\text{-free}}(w^*, v^*)$.

The resulting encoder is memory-based. Given $G_n^{kC\text{-free}}$, we find $m$ using Algorithm 2. The inputs to the algorithm include $G_n^{kC\text{-free}}$. We first find the out-degrees (self-edges are counted) of each $v \in V$ (where $V$ is a hash [...]

Figure 1.
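As a rough illustration of the two steps above, the following sketch enumerates the legal transitions explicitly for very small $n$ and then searches for the effective width $m$. It is not the ROBDD-based procedure of this paper: it uses plain Python sets and visits all $2^{2n}$ transitions. The function names are hypothetical. The transition model is also an assumption, since the full classification of Section 2 is not reproduced here: a switching wire $i$ is assumed to see a capacitance $C_{sub} + C_x (|\Delta_i - \Delta_{i-1}| + |\Delta_i - \Delta_{i+1}|)$, static wires are not classified, and the two edge wires have a single neighbor. The search repeatedly discards vertices that have fewer than $2^m$ surviving successors, which is one simple way to look for a closed vertex set.

```python
from itertools import product


def transition_class(v, w):
    """Return p such that v -> w is a pC transition (assumed model, see lead-in)."""
    n = len(v)
    # +1 for a rising wire, -1 for a falling wire, 0 for a static wire.
    delta = [b - a for a, b in zip(v, w)]
    worst = 0
    for i, d in enumerate(delta):
        if d == 0:
            continue  # assumption: only switching wires are classified
        cost = 0
        for j in (i - 1, i + 1):
            if 0 <= j < n:  # assumption: edge wires have one neighbor
                cost += abs(d - delta[j])
        worst = max(worst, cost)
    return worst


def legal_graph(n, k):
    """Explicit stand-in for G_n^{kC-free}: map each vector to its kC-free successors."""
    vecs = list(product((0, 1), repeat=n))
    # kC-free means crosstalk of magnitude (k - 1)C or less.
    return {v: {w for w in vecs if transition_class(v, w) <= k - 1} for v in vecs}


def closed_set(graph, m):
    """Largest vertex set in which every vertex keeps at least 2^m successors."""
    need = 2 ** m
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            # Self-edges count, since v -> v is always a 0C transition.
            if len(graph[v] & alive) < need:
                alive.discard(v)
                changed = True
    return alive  # empty when no closed set exists for this m


def effective_width(n, k):
    """Largest m such that an m-bit bus fits on n wires in a kC-free manner."""
    graph = legal_graph(n, k)
    m = 0
    for cand in range(1, n + 1):
        if len(closed_set(graph, cand)) >= 2 ** cand:
            m = cand
    return m


if __name__ == "__main__":
    for n in range(2, 6):
        m = effective_width(n, 4)
        print(n, m, f"overhead={(n - m) / m:.0%}" if m else "no code")
```

The overhead printed in the last line assumes the definition $(n - m)/m$, which matches the way the percentages are quoted in this paper but is not stated in the surviving text. Explicit enumeration of this kind is only practical for small $n$, which is the motivation for the implicit ROBDD representation.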
If an $m$-bit ($m < n$) bus can be encoded using the legal transitions in $G_n^{kC\text{-free}}$, then there must exist a closed set of vertices $V_c \subseteq \{0,1\}^n$ in the $v$ space of $G_n^{kC\text{-free}}(v, w)$ such that:

* Each source vertex $v_s \in V_c$ has at least $2^m$ outgoing edges $(v_s, w_d)$ to destination vertices $w_d$ (including the self edge), such that the destination vertex $w_d \in V_c$.
* The cardinality of $V_c$ is at least $2^m$.

The irregularity of the curves arises from the fact that sometimes, two buses (of width $n$ and $n + 1$) have an identical effective bus width of $m$. In that case, the overhead as defined above is larger for the bus of real width $n + 1$.

Note that this figure indicates that the asymptotic bus size overheads for the memory-based CODECs are much lower than the memoryless CODEC overheads reported in [5, 4]. The overhead for a 2C free memory-based CODEC is about 117% compared to 146% for a memoryless CODEC. The overhead for a 3C free memory-based CODEC is about 30% compared [...] and each segment be encoded and decoded independently. In such a situation, we could choose a bus width $n$ that yields the lowest overhead, by referring to Figure 1. In particular, the choice of 4 or 6 bit segments is preferable over 5 or 7 bit segments, if we were trying to eliminate 2C crosstalk.

The standard cell based implementation of the encoder and decoder results in a delay of 280ps, using a 0.1µm bsim100 process [1] (this is the worst delay among all encoders and decoders required for any of the 2C-free, 3C-free and 4C-free approaches). A Programmable Logic Array (PLA) based realization may reduce this delay further.

We demonstrate that the asymptotic bus size overheads for memory-based CODECs that are free of 2C, 3C and 4C transitions are respectively around 117%, 30% and 8%. This is in contrast to the memoryless CODEC overheads for the same transitions, which are 146%, 44% and 33% respectively. The overall area reduction (compared to memoryless CODECs) of our approach is about 25%, 10% and 20% for 2C, 3C and 4C free encoded buses respectively. The user can trade off the bus size overhead against the crosstalk immunity that is desired. According to our experiments, this tradeoff can result in a bus speed up of above 6x.
The implementation of the memory-based CODECs is more complex
