New Crosstalk Avoidance Codes Based on a Novel Pattern Classification by Shi, Feng et al.
arXiv:1209.2672v1 [cs.IT] 12 Sep 2012
New Crosstalk Avoidance Codes
Based on a Novel Pattern Classification
Feng Shi, Student Member, IEEE, Xuebin Wu, and Zhiyuan Yan, Senior Member, IEEE
Abstract—The crosstalk delay associated with global on-chip
interconnects becomes more severe in deep submicron technology,
and hence can greatly affect the overall system performance.
Based on a delay model proposed by Sotiriadis et al., transition
patterns over a bus can be classified according to their delays.
Using this classification, crosstalk avoidance codes (CACs) have
been proposed to alleviate the crosstalk delays by restricting the
transition patterns on a bus. In this paper, we first propose a
new classification of transition patterns, and then devise a new
family of CACs based on this classification. In comparison to the
previous classification, our classification has more classes and
the delays of its classes do not overlap, both leading to more
accurate control of delays. Our new family of CACs includes
some previously proposed codes as well as new codes with
reduced delays and improved throughput. Thus, this new family
of crosstalk avoidance codes provides a wider variety of tradeoffs
between bus delay and efficiency. Finally, since our analytical
approach to the classification and CACs treats the technology-
dependent parameters as variables, our approach can be easily
adapted to a wide variety of technologies.
Index Terms—Crosstalk avoidance codes, delay, interconnects
I. INTRODUCTION
The recent International Technology Roadmap for Semiconductors (ITRS) [1] has shown a troubling trend: while
gate delay decreases with scaling, global wire delay increases.
This is because with the process technologies scaling down
into deep submicrometer (DSM), the crosstalk delay becomes
dominant in global wire delay due to the increasing coupling
capacitance between adjacent wires. Hence, the crosstalk delay
has become a serious bottleneck of the overall system perfor-
mance.
The analytical model proposed by Sotiriadis et al. [2], [3],
a widely used delay model, gives upper bounds on the delay
of all wires on a bus. According to [2], [3], the delay of the
k-th wire (k ∈ {1, 2, · · · ,m}) of an m-bit bus is given by
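$$
T_k =
\begin{cases}
\tau_0\left[(1+\lambda)\Delta_1^2 - \lambda\Delta_1\Delta_2\right], & k = 1 \\
\tau_0\left[(1+2\lambda)\Delta_k^2 - \lambda\Delta_k(\Delta_{k-1}+\Delta_{k+1})\right], & k \neq 1, m \\
\tau_0\left[(1+\lambda)\Delta_m^2 - \lambda\Delta_m\Delta_{m-1}\right], & k = m,
\end{cases}
\tag{1}
$$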
where λ is the ratio of the coupling capacitance between ad-
jacent wires and the ground capacitance, τ0 is the propagation
delay of a wire free of crosstalk, and ∆k is 1 for 0 → 1
transition, -1 for 1 → 0 transition, or 0 for no transition on
the k-th wire. In this model, the delay of the k-th wire depends
F. Shi and Z. Yan are with the Department of Electrical and Computer
Engineering, Lehigh University, Bethlehem, PA 18015 (e-mails: {fes209,
yan}@lehigh.edu). X. Wu is with LSI Corporation in Milpitas, CA, USA
(e-mail: xuebin.wu@lsi.com).
on the transition patterns of at most three wires, k− 1, k, and
k+1 only. The transition patterns over these three wires can be
classified based on Eq. (1) into five classes, denoted by Di for
i = 0, 1, 2, 3, 4, and the patterns in Di have a worst-case delay
(1 + iλ)τ0. This classification enables one to limit the worst-
case delay over a bus by restricting the patterns transmitted on
the bus. That is, by avoiding all transition patterns in Di for
i > i0, one can achieve a worst-case delay of (1+ i0λ)τ0 over
the bus. Based on this principle, crosstalk avoidance codes
(CACs) of different worst-case delays have been proposed
(see, for example, [4]–[6]). For example, forbidden overlap
codes (FOCs), forbidden transition codes (FTCs), forbidden
pattern codes (FPCs), and one lambda codes (OLCs) achieve
a worst-case delay of (1 + 3λ)τ0, (1 + 2λ)τ0, (1 + 2λ)τ0,
and (1 + λ)τ0, respectively. Based on Eq. (1), a worst-case
delay of τ0 can be achieved by assigning two protection
wires to each data wire [5]. Other types of CACs, such as
those with equalization [7] or two-dimensional CACs [8], have
been proposed in the literature. For CACs, since the area and
power consumption of their encoder/decoder (CODECs) are
all overheads, the complexities of the CODECs are important
to the effectiveness of CACs. Thus, efficient CODECs have
been proposed for CACs [9]–[11].
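To make the three-wire model concrete, the following minimal sketch evaluates Eq. (1) for an arbitrary transition vector and returns the class index i of D_i for an interior wire. The function names `wire_delays` and `class_index` are illustrative choices, not taken from [2], [3], and the sketch assumes a bus of at least two wires.

```python
from typing import List, Sequence


def wire_delays(deltas: Sequence[int], lam: float, tau0: float = 1.0) -> List[float]:
    """Upper-bound delay of every wire of an m-bit bus according to Eq. (1).

    deltas[k] is +1 for a 0->1 transition, -1 for 1->0, and 0 for no transition.
    """
    m = len(deltas)
    if m < 2:
        raise ValueError("Eq. (1) is stated for buses with at least two wires")
    delays = []
    for k, d in enumerate(deltas):
        if k == 0:            # first wire: a single neighbor
            t = (1 + lam) * d * d - lam * d * deltas[1]
        elif k == m - 1:      # last wire: a single neighbor
            t = (1 + lam) * d * d - lam * d * deltas[m - 2]
        else:                 # interior wire: two neighbors
            t = (1 + 2 * lam) * d * d - lam * d * (deltas[k - 1] + deltas[k + 1])
        delays.append(tau0 * t)
    return delays


def class_index(left: int, mid: int, right: int) -> int:
    """Index i such that an interior, transitioning wire has delay (1 + i*lam)*tau0."""
    if mid == 0:
        raise ValueError("the class index is defined here only for a transitioning wire")
    return 2 - mid * (left + right)
```

For instance, `class_index(-1, 1, -1)` corresponds to the pattern in which both neighbors switch against the middle wire, which Eq. (1) places in D4.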
The classification of transition patterns based on the model
in [2], [3] has two drawbacks. First, the model in [2], [3]
has limited accuracy because of its dependence on only three
wires: the model overestimates the delays of patterns in D1
through D4, while it underestimates the delays of patterns
in D0. For this reason, the scheme with a worst-case delay
of τ0 in [5] is invalid since its actual delay is much greater.
Second, the actual delay ranges in some classes overlap with
others. This, plus the overestimation of delays for D1 through
D4, implies that the delays of existing CACs are not tightly
controlled. These drawbacks motivate us to include more wires
and to classify the transition patterns without overlapping
delay ranges.
In [12], we have proposed a new analytical five-wire delay
model. Two extra neighboring wires are included in the
delay model [12], and the delay of the middle wire of five
neighboring wires is determined by the transition patterns on
all five wires. This five-wire model has better accuracy than
the model in [2], [3] for Di for i = 0, 1, 2, 3, 4 [12]. This work
confirms that using more wires leads to improved accuracy.
There are two main contributions in this paper:
• First, we approximate the crosstalk delay in a five-wire
model and propose a new classification of transition
patterns.
• Second, we propose a family of CACs based on our
classification.
The work in this paper is different from previous works,
including our previous works, in several aspects:
• First, although the delay approximation in this paper is
also based on a five-wire model, it is different from that
in our previous work [12]. The delay approximation in
this paper is carried out by extending the approach in
[13] from a three-wire model to a five-wire one.
• Second, our classification of transition patterns is different
from that in [2], [3] (based on Eq. (1)), in two aspects.
First, our classification has seven classes as opposed to
five based on Eq. (1). Second, while the delays of some
classes overlap for the classification based on Eq. (1), all
classes in our classification have non-overlapping delays.
These two key differences allow us to have a more
accurate control of delays for transition patterns.
• Our new family of CACs is also different from previ-
ously proposed CACs, all of which are based on the
classification in [2], [3] (based on Eq. (1)). While some
codes in this new family are shown to be the same as
existing CACs (OLCs, FPCs, and FOCs), this family also
includes new codes that achieve smaller worst-case delays
and higher throughput than OLCs, which have the
smallest worst-case delays among all existing CACs.
The rest of the paper is organized as follows. In Section II,
we first propose our classification and compare it with that
in [2], [3]. We then present our new family of CACs in
Section III and compare their performance with existing CACs
in Section IV. Some concluding remarks are provided in
Section V.
II. INTERCONNECT DELAYS AND
CLASSIFICATION
A. Interconnect Modeling
Since the functionality and performance in DSM technology
are greatly affected by the parasitics, distributed RC models
are widely employed to analyze on-chip interconnects. In this
paper, we consider the distributed RC model of five wires
shown in Fig. 1, where Vi(x, t) denotes the transient signal
at time t and position x (0 ≤ x ≤ L) over wire i for
i ∈ {1, 2, 3, 4, 5}, and r and c denote the resistance and ground
capacitance per unit length, respectively. Also, λc denotes the
coupling capacitance per unit length between two adjacent
wires. The value of λ depends on many factors, such as the
metal layer in which we route the bus, the wire width, the
spacing between adjacent wires, and the distance to the ground
layer. We consider a uniformly distributed bus with the same
parameters r, c, and λ for all the wires.
B. Derivation of Closed-form Expressions
When determining the delay of a wire, the model in [2], [3]
considers only the effects of either one or two neighboring
wires (cf. Eq. (1)). To address the drawbacks of the model
in [2], [3] described above, additional neighboring wires need
to be accounted for. In our delay derivation below, whenever
possible we consider four neighboring wires of a wire, two
Fig. 1. A distributed RC model for five wires.
neighboring wires on each side, to determine its delay. To
approximate the delay of a side wire (wires 1, 2, n−1 or n) of
an n-wire bus, three neighboring wires are considered. This
is because the side wires are affected by fewer neighboring
wires. This scheme is similar to the model in [2], [3] and
appears to work well. We focus on the 50% delay, which is
defined as the time required for the unit step response to reach
50% of its final value.
In [13], the crosstalk of two coupled lines was described
by partial differential equations (PDEs), and a technique
for decoupling these highly coupled PDEs was introduced
by using eigenvalues and corresponding eigenvectors. In our
work, we extend this approach from a three-wire model to a
five-wire one. Specifically, we first use the technique in [13] to
decouple the PDEs that describe the crosstalk of four coupled
wires, then solve these independent PDEs for closed-form
expressions, and finally approximate the delays of each wire.
The PDEs characterizing five wires with length L are given
by:
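$$
\frac{\partial^2}{\partial x^2}\mathbf{V}(x,t) = \mathbf{R}\mathbf{C}\,\frac{\partial}{\partial t}\mathbf{V}(x,t), \tag{2}
$$
where $\mathbf{R} = \mathrm{diag}\{r\ r\ r\ r\ r\}$, $\mathbf{V}(x,t) = [V_1(x,t)\ V_2(x,t)\ V_3(x,t)\ V_4(x,t)\ V_5(x,t)]^T$, and
$$
\mathbf{C} = c
\begin{bmatrix}
1+\lambda & -\lambda & 0 & 0 & 0 \\
-\lambda & 1+2\lambda & -\lambda & 0 & 0 \\
0 & -\lambda & 1+2\lambda & -\lambda & 0 \\
0 & 0 & -\lambda & 1+2\lambda & -\lambda \\
0 & 0 & 0 & -\lambda & 1+\lambda
\end{bmatrix}.
$$
The eigenvalues of $\mathbf{C}/c$ are given by $p_1 = 1$, $p_2 = 1 + \frac{5+\sqrt{5}}{2}\lambda$, $p_3 = 1 + \frac{5-\sqrt{5}}{2}\lambda$, $p_4 = 1 + \frac{3+\sqrt{5}}{2}\lambda$, and $p_5 = 1 + \frac{3-\sqrt{5}}{2}\lambda$. Their corresponding eigenvectors $e_i$ are given by
$e_1 = [1\ \ 1\ \ 1\ \ 1\ \ 1]^T$,
$e_2 = \left[\frac{\sqrt{5}-1}{4}\ \ -\frac{1+\sqrt{5}}{4}\ \ 1\ \ -\frac{1+\sqrt{5}}{4}\ \ \frac{\sqrt{5}-1}{4}\right]^T$,
$e_3 = \left[-\frac{\sqrt{5}+1}{4}\ \ \frac{\sqrt{5}-1}{4}\ \ 1\ \ \frac{\sqrt{5}-1}{4}\ \ -\frac{\sqrt{5}+1}{4}\right]^T$,
$e_4 = \left[-1\ \ \frac{\sqrt{5}+1}{2}\ \ 0\ \ -\frac{\sqrt{5}+1}{2}\ \ 1\right]^T$, and
$e_5 = \left[-1\ \ -\frac{\sqrt{5}-1}{2}\ \ 0\ \ \frac{\sqrt{5}-1}{2}\ \ 1\right]^T$, respectively.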
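The eigenpairs of the five-wire matrix can be checked numerically. The sketch below builds C/c for a given λ and measures the residual of (C/c)e = pe for each quoted pair; the helper names are hypothetical and only the Python standard library is used.

```python
import math
from typing import List, Sequence, Tuple


def coupling_matrix(n: int, lam: float) -> List[List[float]]:
    """C/c for an n-wire bus: tridiagonal, with 1 + (number of neighbors)*lam on the diagonal."""
    rows = []
    for i in range(n):
        row = [0.0] * n
        neighbors = (i > 0) + (i < n - 1)
        row[i] = 1 + neighbors * lam
        if i > 0:
            row[i - 1] = -lam
        if i < n - 1:
            row[i + 1] = -lam
        rows.append(row)
    return rows


def residual(matrix: Sequence[Sequence[float]], p: float, e: Sequence[float]) -> float:
    """Largest entry of |matrix @ e - p * e|; zero for an exact eigenpair."""
    return max(
        abs(sum(a * x for a, x in zip(row, e)) - p * ei)
        for row, ei in zip(matrix, e)
    )


def five_wire_eigenpairs(lam: float) -> List[Tuple[float, List[float]]]:
    """Eigenvalues p1..p5 and eigenvectors e1..e5 as quoted in the text."""
    s5 = math.sqrt(5.0)
    return [
        (1.0, [1, 1, 1, 1, 1]),
        (1 + (5 + s5) / 2 * lam, [(s5 - 1) / 4, -(1 + s5) / 4, 1, -(1 + s5) / 4, (s5 - 1) / 4]),
        (1 + (5 - s5) / 2 * lam, [-(s5 + 1) / 4, (s5 - 1) / 4, 1, (s5 - 1) / 4, -(s5 + 1) / 4]),
        (1 + (3 + s5) / 2 * lam, [-1, (s5 + 1) / 2, 0, -(s5 + 1) / 2, 1]),
        (1 + (3 - s5) / 2 * lam, [-1, -(s5 - 1) / 2, 0, (s5 - 1) / 2, 1]),
    ]
```

Calling `residual(coupling_matrix(5, lam), p, e)` for each pair returned by `five_wire_eigenpairs(lam)` should give values at the level of floating-point round-off.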
With a technique for decoupling partial differential equa-
tions similar to [13], Eq. (2) is transformed into
$$
\frac{\partial^2}{\partial x^2}U_i(x,t) = r c\, p_i\,\frac{\partial}{\partial t}U_i(x,t), \quad \text{for } i = 1, 2, 3, 4, 5, \tag{3}
$$
where $U_i(x,t) = \mathbf{V}^T(x,t)\,e_i$ denotes the transformed signals.
The decoupled PDEs in Eq. (3) are independent of each
other. Each Ui(x, t) describes a single wire with a modified
capacitance $c\,p_i$. The solution to $U_i(L,t)$ is given by a series of
the form $U_i(L,t) = V_{dd} + \sum_{k=0}^{\infty} r_k e^{-t/(s_k\tau)}$. As shown in [13],
a single-exponent approximation $V_{dd}\left(1 + r_0 e^{-t/(s_0\tau)}\right)$ is enough
for t/τ > 0.1, where r0 and s0 are the coefficients of the most
significant term.
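As a worked step, the single-exponent approximation yields the 50% delay in closed form. The numerical example assumes r0 = −4/π and s0 = 1 (the coefficients listed in Tab. I for the pattern in which all five wires rise), together with τ = (8/π²)τ0 and τ0 = 1.42 ps from the caption of Tab. I.

```latex
V_{dd}\left(1 + r_0 e^{-t/(s_0\tau)}\right) = \tfrac{1}{2}V_{dd}
\;\Longrightarrow\;
t_{50\%} = s_0\,\tau\,\ln(-2r_0), \qquad r_0 < -\tfrac{1}{2}.

% Example: r_0 = -4/\pi, s_0 = 1, \tau = (8/\pi^2)\tau_0, \tau_0 = 1.42 ps
t_{50\%} = \tau\,\ln\frac{8}{\pi} \approx 0.935\,\tau \approx 0.758\,\tau_0 \approx 1.08\ \text{ps}.
```

The result agrees with the evaluated delay of 1.08 ps listed for that pattern in Tab. I.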
For different transitions, we solve Eq. (3) for $U_i(x,t)$ and
obtain $V_3(L,t) = \frac{1}{5}\left[U_1(L,t) + 2U_2(L,t) + 2U_3(L,t)\right]$, which
is given by a sum of a constant and three exponent terms,
$V_{dd}\left(1 - c_0 e^{-t/(a_0\tau)} - c_1 e^{-t/(a_1\tau)} - c_2 e^{-t/(a_2\tau)}\right)$. Then the 50% delay
of wire 3 can be evaluated by solving $V_3(L,t) = 0.5V_{dd}$.
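The evaluation step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' MATLAB code: each transformed signal is taken to contribute a single exponential with amplitude (4/π)(Δ·e_i), which reproduces the coefficients listed in Tab. I for the patterns checked; τ = (8/π²)τ0 is taken from the table caption; and the root is found by plain bisection. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

SQRT5 = math.sqrt(5.0)
# Eigenvectors e2 and e3 of C/c as quoted in the text (modes p2 and p3).
E2 = ((SQRT5 - 1) / 4, -(1 + SQRT5) / 4, 1.0, -(1 + SQRT5) / 4, (SQRT5 - 1) / 4)
E3 = (-(SQRT5 + 1) / 4, (SQRT5 - 1) / 4, 1.0, (SQRT5 - 1) / 4, -(SQRT5 + 1) / 4)


def wire3_coefficients(pattern: Sequence[int]) -> Tuple[float, float, float]:
    """Return (c0, c1, c2) for five steps in {+1, 0, -1} with +1 on wire 3."""
    if len(pattern) != 5 or pattern[2] != 1:
        raise ValueError("expected five entries with a rising transition on wire 3")
    u1 = float(sum(pattern))
    u2 = sum(d * e for d, e in zip(pattern, E2))
    u3 = sum(d * e for d, e in zip(pattern, E3))
    scale = 4.0 / (5.0 * math.pi)  # assumed 4/pi mode amplitude times the 1/5 weight
    # c1 multiplies exp(-t/(a1*tau)) with a1 = p3, so it comes from the e3 mode.
    return scale * u1, 2 * scale * u3, 2 * scale * u2


def delay_50(pattern: Sequence[int], lam: float = 12.24, tau0: float = 1.42,
             tol: float = 1e-9) -> float:
    """50% delay of wire 3, in the same time unit as tau0."""
    tau = 8.0 / math.pi ** 2 * tau0
    a = (1.0, 1 + (5 - SQRT5) / 2 * lam, 1 + (5 + SQRT5) / 2 * lam)
    c = wire3_coefficients(pattern)

    def f(t: float) -> float:
        # V3(L, t)/Vdd - 0.5
        return 0.5 - sum(ci * math.exp(-t / (ai * tau)) for ci, ai in zip(c, a))

    lo, hi = 0.0, tau
    while f(hi) < 0:          # expand until the 50% level is bracketed
        hi *= 2
    while hi - lo > tol:      # bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the default parameters, `delay_50((1, 1, 1, 1, 1))` lands close to the 1.08 ps listed in Tab. I; small deviations for other patterns are expected because the parameters are rounded.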
For side wires, PDEs characterizing four wires with length
L are given by:
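$$
\frac{\partial^2}{\partial x^2}\mathbf{V}(x,t) = \mathbf{R}\mathbf{C}\,\frac{\partial}{\partial t}\mathbf{V}(x,t), \tag{4}
$$
where $\mathbf{R} = \mathrm{diag}\{r\ r\ r\ r\}$, $\mathbf{V}(x,t) = [V_1(x,t)\ V_2(x,t)\ V_3(x,t)\ V_4(x,t)]^T$, and
$$
\mathbf{C} = c
\begin{bmatrix}
1+\lambda & -\lambda & 0 & 0 \\
-\lambda & 1+2\lambda & -\lambda & 0 \\
0 & -\lambda & 1+2\lambda & -\lambda \\
0 & 0 & -\lambda & 1+\lambda
\end{bmatrix}.
$$
The eigenvalues of $\mathbf{C}/c$ are given by $p_1 = 1$, $p_2 = 1 + (2-\sqrt{2})\lambda$, $p_3 = 1 + 2\lambda$, and $p_4 = 1 + (2+\sqrt{2})\lambda$. Their
corresponding eigenvectors $e_i$ are given by $e_1 = [1\ \ 1\ \ 1\ \ 1]^T$,
$e_2 = [-1\ \ (1-\sqrt{2})\ \ -(1-\sqrt{2})\ \ 1]^T$, $e_3 = [1\ \ -1\ \ -1\ \ 1]^T$,
and $e_4 = [-1\ \ (1+\sqrt{2})\ \ -(1+\sqrt{2})\ \ 1]^T$, respectively.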
By decoupling the PDEs in Eq. (4), we have
$$
\frac{\partial^2}{\partial x^2}U_i(x,t) = r c\, p_i\,\frac{\partial}{\partial t}U_i(x,t), \quad \text{for } i = 1, 2, 3, 4. \tag{5}
$$
The expressions of wires 1 and 2 are given by
$V_1(L,t) = \frac{1}{4}U_1(L,t) - \frac{2+\sqrt{2}}{8}U_2(L,t) + \frac{1}{4}U_3(L,t) - \frac{2-\sqrt{2}}{8}U_4(L,t)$ and
$V_2(L,t) = \frac{1}{4}U_1(L,t) - \frac{\sqrt{2}}{8}U_2(L,t) - \frac{1}{4}U_3(L,t) + \frac{\sqrt{2}}{8}U_4(L,t)$,
respectively. Then the 50% delays of wires 1 and 2 can be
evaluated by solving $V_i(L,t) = 0.5V_{dd}$ for i = 1, 2.
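The weights in the expressions for V1 and V2 follow from the orthogonality of the eigenvectors of the symmetric matrix C: with U_i = V^T e_i, each wire voltage is recovered as V_k = Σ_i (e_i[k] / ‖e_i‖²) U_i. The short check below recomputes those weights from the quoted four-wire eigenvectors; the function name is an illustrative choice.

```python
import math
from typing import List

SQRT2 = math.sqrt(2.0)
# Eigenvectors e1..e4 of the four-wire C/c as quoted in the text.
EIGVECS = [
    (1.0, 1.0, 1.0, 1.0),
    (-1.0, 1 - SQRT2, -(1 - SQRT2), 1.0),
    (1.0, -1.0, -1.0, 1.0),
    (-1.0, 1 + SQRT2, -(1 + SQRT2), 1.0),
]


def projection_weights(wire: int) -> List[float]:
    """Weights w_i such that V_wire = sum_i w_i * U_i (wire is 1-based)."""
    return [e[wire - 1] / sum(x * x for x in e) for e in EIGVECS]
```

`projection_weights(1)` and `projection_weights(2)` reproduce the four coefficients of U1 to U4 in the expressions for V1 and V2, respectively.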
C. Pattern Classification
First, we consider the classification of transition patterns
over five wires with respect to the delay of the middle wire
(wire 3). In this paper, we use “↑” to denote a transition
from 0 to the supply voltage Vdd (normalized to 1), “-” no
transition, and “↓” a transition from Vdd to 0. We first focus
on patterns with a ↑ transition on wire 3 in a five-wire bus
and derive V3(L, t) for each pattern as described in Sec. II-B.
There are 3^4 = 81 different transition patterns, which can be
partitioned into 25 subclasses according to the expressions of
the output signals on wire 3: All transition patterns in each
subclass have the same expression V3(L, t). The expressions
of all 25 subclasses are shown in Tab. I. Then the expressions
V3(L, t) of all patterns in the 25 subclasses are evaluated for
their 50% delays. By grouping subclasses with close delays
into one class, we can divide the 81 transition patterns into
seven classes Ci for i = 0, 1, · · · , 6 shown in Tab. I. For all
25 subclasses, simulated delays are also provided in Tab. I.
For all seven classes, the difference between evaluated delay
and simulated delay in Tab. I is small.
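The count of 25 subclasses can be reproduced by enumerating the 81 patterns and grouping them by their coefficient triple (c0, c1, c2). The sketch assumes the same mode amplitudes as Tab. I, namely (4/π)(Δ·e_i) for each transformed signal, and rounds the coefficients to six decimals to form a grouping key; both choices are assumptions of this illustration.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

SQRT5 = math.sqrt(5.0)
E2 = ((SQRT5 - 1) / 4, -(1 + SQRT5) / 4, 1.0, -(1 + SQRT5) / 4, (SQRT5 - 1) / 4)
E3 = (-(SQRT5 + 1) / 4, (SQRT5 - 1) / 4, 1.0, (SQRT5 - 1) / 4, -(SQRT5 + 1) / 4)

Pattern = Tuple[int, int, int, int, int]


def coefficients(pattern: Pattern) -> Tuple[float, float, float]:
    """(c0, c1, c2) of the wire-3 expression for one five-wire pattern."""
    scale = 4.0 / (5.0 * math.pi)
    u1 = float(sum(pattern))
    u2 = sum(d * e for d, e in zip(pattern, E2))
    u3 = sum(d * e for d, e in zip(pattern, E3))
    return scale * u1, 2 * scale * u3, 2 * scale * u2


def subclasses() -> Dict[Tuple[float, ...], List[Pattern]]:
    """Group all patterns with a rising wire 3 by their (rounded) coefficients."""
    groups: Dict[Tuple[float, ...], List[Pattern]] = defaultdict(list)
    for d1, d2, d4, d5 in product((-1, 0, 1), repeat=4):
        pattern = (d1, d2, 1, d4, d5)
        key = tuple(round(c, 6) for c in coefficients(pattern))
        groups[key].append(pattern)
    return groups
```

The grouping yields 25 keys, and the largest group contains the nine patterns that share the expression of the pattern with a single rising wire.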
All evaluations and simulations are based on a freePDK
45nm CMOS technology with 10 metal layers [14]. We assume
that the top two metal layers, layers 9 and 10, are used for
routing global interconnects, and that metal layer 8 is used as
Fig. 2. Delays of the middle wire for all patterns with respect to λ in a five-wire bus (τ0 = 1.42 ps). Horizontal axis: λ; vertical axis: delay/τ0; curves grouped by classes C0 to C6.
the ground layer. An interconnect model in [15] is used for
parasitic extraction. For a 5 mm bus in the top metal layer, the
key parasitics (resistance, ground capacitance, and coupling
capacitance) are given by R = 68.75 Ω, Cgnd = 41.32 fF, and
Ccouple = 505.68 fF, respectively. The bus is modeled by a
distributed RC model as shown in Fig. 1 with 100 segments.
The two important parameters used in our delay approximation
are τ0 = 0.5RCgnd = 1.42 ps and λ = Ccouple/Cgnd = 12.24.
Since the crosstalk delay on the bus constitutes a major part of
the whole delay, the delays introduced by buffers are ignored.
We assume that ideal step signals are applied on the bus
directly. The closed-form expressions are evaluated for 50%
delays via MATLAB and the simulation is done by HSPICE.
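The two model parameters follow directly from the extracted parasitics. The snippet below redoes that arithmetic; the function and constant names are illustrative only.

```python
from typing import Tuple


def model_parameters(r_ohm: float, c_gnd_f: float, c_couple_f: float) -> Tuple[float, float]:
    """Return (tau0 in picoseconds, coupling factor lambda)."""
    tau0_ps = 0.5 * r_ohm * c_gnd_f * 1e12   # tau0 = 0.5 * R * Cgnd, converted from s to ps
    lam = c_couple_f / c_gnd_f               # lambda = Ccouple / Cgnd
    return tau0_ps, lam


# Parasitics quoted in the text for a 5 mm bus in the top metal layer.
TAU0_PS, LAMBDA = model_parameters(68.75, 41.32e-15, 505.68e-15)
```

Rounded to two decimals, these values are the τ0 = 1.42 ps and λ = 12.24 used throughout the tables.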
From Tab. I, it can be easily verified that C5 and C6 are the
same as D3 and D4 in [2], [3], respectively. That is, the middle
three wires of the transition patterns in C5 (C6, respectively)
constitute D3 (D4, respectively). The transition patterns in
D0, D1, and D2 are divided into five classes C0 to C4 in our
classification with the following relations: C4 ⊂ D2, C3 ⊂ D1 ∪
D2, C2 ⊂ D0 ∪ D1, C1 ⊂ D0 ∪ D1 ∪ D2, and C0 ⊂ D0 ∪ D1.
Note that the coefficients ci for i = 0, 1, 2 of the expression
of wire 3 are independent of technology and determined by
different patterns. For a given pattern, the coefficients ci are
fixed and the delay is a function of τ0 and λ. Since the ratio
t/τ0 appears in the exponent term, varying τ0 would scale
delays in all classes. Thus, the classification does not depend
on τ0. The coupling factor λ could affect the delay differently.
In the following, we verify our classification for technology
with different coupling factor, λ = 1, 2, · · · , 13, and show the
results in Fig. 2. Different classes are denoted by different line
styles. Each class contains multiple lines, each of which represents a
subclass. Patterns in each subclass have the same delay. For
λ ≥ 3, the ranges of delays in all classes do not overlap.
Also, the delay in each subclass increases linearly with λ.
This implies that our classification is valid provided that the
coupling factor λ is at least 3.
TABLE I
CLOSED-FORM EXPRESSIONS FOR THE OUTPUT SIGNALS ON WIRE 3 IN A FIVE-WIRE BUS WITH EVALUATED AND SIMULATED 50% DELAYS
(τ0 = 1.42 ps, τ = (8/π²)τ0, λ = 12.24, a0 = 1, a1 = 1 + ((5−√5)/2)λ, and a2 = 1 + ((5+√5)/2)λ for all classes).
Closed-form expression for the output signal on wire 3: Vdd(1 − c0 e^(−t/(a0τ)) − c1 e^(−t/(a1τ)) − c2 e^(−t/(a2τ))).

Class i | Patterns | c0 | c1 | c2 | Evaluated delay (ps) | Sim. delay (ps)
0 | ↑↑↑↑↑ | 4/π | 0 | 0 | 1.08 | 1.18
0 | -↑↑↑↑, ↑↑↑↑- | 16/(5π) | 2(1+√5)/(5π) | 2(1−√5)/(5π) | 1.41 | 1.50
0 | ↑-↑↑↑, ↑↑↑-↑ | 16/(5π) | 2(1−√5)/(5π) | 2(1+√5)/(5π) | 1.41 | 1.50
1 | -↑↑↑-, ↓↑↑↑↑, ↑↑↑↑↓ | 12/(5π) | 4(1+√5)/(5π) | 4(1−√5)/(5π) | 2.35 | 2.40
1 | --↑↑↑, ↑↑↑--, -↑↑-↑, ↑-↑↑- | 12/(5π) | 4/(5π) | 4/(5π) | 2.35 | 2.40
1 | ↑-↑-↑, ↑↑↑↓↑, ↑↓↑↑↑ | 12/(5π) | 4(1−√5)/(5π) | 4(1+√5)/(5π) | 2.35 | 2.45
2 | -↑↑↑↓, ↓↑↑↑- | 8/(5π) | 6(1+√5)/(5π) | 6(1−√5)/(5π) | 6.17 | 6.84
2 | --↑↑-, -↑↑--, ↓-↑↑↑, ↓↑↑-↑, ↑-↑↑↓, ↑↑↑-↓ | 8/(5π) | 2(3+√5)/(5π) | 2(3−√5)/(5π) | 9.62 | 9.21
2 | ↓↑↑↑↓ | 4/(5π) | 8(1+√5)/(5π) | 8(1−√5)/(5π) | 9.90 | 10.70
3 | --↑↑↓, ↓↑↑--, -↑↑-↓, ↓-↑↑- | 4/(5π) | 4(2+√5)/(5π) | 4(2−√5)/(5π) | 14.07 | 14.22
3 | ↓-↑↑↓, ↓↑↑-↓ | 0 | 2(5+3√5)/(5π) | 2(5−3√5)/(5π) | 16.91 | 17.18
3 | --↑-↑, ↑-↑--, -↑↑↓↑, ↑↑↑↓-, -↓↑↑↑, ↑↓↑↑- | 8/(5π) | 2(3−√5)/(5π) | 2(3+√5)/(5π) | 19.24 | 18.47
4 | --↑--, ↑-↑-↓, ↓-↑-↑, -↑↑↓-, ↑↑↑↓↓, ↓↑↑↓↑, -↓↑↑-, ↑↓↑↑↓, ↓↓↑↑↑ | 4/(5π) | 8/(5π) | 8/(5π) | 22.67 | 22.60
4 | --↑-↓, ↓-↑--, -↑↑↓↓, ↓↑↑↓-, -↓↑↑↓, ↓↓↑↑- | 0 | 2(5+√5)/(5π) | 2(5−√5)/(5π) | 24.58 | 24.68
4 | ↓-↑-↓, ↓↑↑↓↓, ↓↓↑↑↓ | −4/(5π) | 4(3+√5)/(5π) | 4(3−√5)/(5π) | 25.84 | 26.03
5 | ↓↓↑-↓, ↓-↑↓↓ | −8/(5π) | 2(7+√5)/(5π) | 2(7−√5)/(5π) | 36.63 | 36.91
5 | --↑↓↓, ↓↓↑--, -↓↑-↓, ↓-↑↓- | −4/(5π) | 12/(5π) | 12/(5π) | 37.24 | 37.52
5 | --↑↓-, -↓↑--, ↑-↑↓↓, ↑↓↑-↓, ↓-↑↓↑, ↓↓↑-↑ | 0 | 2(5−√5)/(5π) | 2(5+√5)/(5π) | 38.07 | 38.35
5 | --↑↓↑, ↑↓↑--, -↓↑-↑, ↑-↑↓- | 4/(5π) | 4(2−√5)/(5π) | 4(2+√5)/(5π) | 39.22 | 39.47
5 | ↑-↑↓↑, ↑↓↑-↑ | 8/(5π) | 6(1−√5)/(5π) | 6(1+√5)/(5π) | 40.87 | 41.11
6 | ↓↓↑↓↓ | −12/(5π) | 16/(5π) | 16/(5π) | 48.43 | 48.85
6 | ↓↓↑↓-, -↓↑↓↓ | −8/(5π) | 2(7−√5)/(5π) | 2(7+√5)/(5π) | 50.43 | 50.86
6 | -↓↑↓-, ↑↓↑↓↓, ↓↓↑↓↑ | −4/(5π) | 4(3−√5)/(5π) | 4(3+√5)/(5π) | 52.78 | 53.25
6 | ↑↓↑↓-, -↓↑↓↑ | 0 | 4(5−3√5)/(5π) | 4(5+3√5)/(5π) | 55.48 | 55.97
6 | ↑↓↑↓↑ | 4/(5π) | 8(1−√5)/(5π) | 8(1+√5)/(5π) | 58.52 | 59.04
Then, we consider the classification of transition patterns
over four wires with respect to the delays of the side wires.
We classify patterns by considering the worst-case delays of
wires 1 and 2, respectively. Note that the classification with
respect to the delays of wires 4 and 5 would be the same
by symmetry. We first focus on patterns with a ↑ transition
on wire 2 in a four-wire bus. There are 3^3 = 27 different
transition patterns. As described in Sec. II-B, we first derive
the expressions V2(L, t) of these 27 patterns shown in Tab. II.
By evaluating these patterns for their 50% delays, we group
patterns with close delays into one class, and form 5 classes
jC for j = 0, 1, 2, 3, 4 as shown in Tab. II. Then, we focus
on patterns with a ↑ transition on wire 1. There are 3^3 = 27
different transition patterns. As described in Sec. II-B, we first
derive the expressions V1(L, t) of these 27 patterns shown in
Tab. III. By evaluating these patterns for their 50% delays, we
group patterns with close delays into one class, and form 3
classes jC for j = 0, 1, 2 as shown in Tab. III. When both
wires 1 and 2 have transitions, the delay on wire 2 is larger
than that of wire 1, which can be verified from Tabs. II and III.
In this case, we focus on the delay of wire 2. When only wire 1
has a transition, we focus on the delay of wire 1. The difference
between evaluated delay and simulated delay is small as shown
in Tabs. II and III with one exception (the pattern ↑↑↓↑ in 1C
in Tab. II), which does not change our classification.
From Tabs. II and III, the classes 3C and 4C of our
classification are exactly the same as D3 and D4 in [2], [3],
respectively. The class 1C and 2C of our classification are
subsets of D1 and D2 in [2], [3], respectively. The class 0C
is a subset of D0 ∪D1 in [2], [3].
Similar to the classification of middle wires, we conclude
that the classification on side wires does not depend on τ0. To
verify our classification for technology with different coupling
effects, we consider coupling factor λ = 1, 2, · · · , 13, and
show the results in Fig. 3. Each class contains multiple lines,
each of which represents a pattern in Tabs. II and III. For
λ ≥ 1, the ranges of delays in all classes do not overlap.
Also, the delay in each subclass increases linearly with λ. This
TABLE II
CLOSED-FORM EXPRESSIONS FOR THE OUTPUT SIGNALS ON WIRE 2 IN A FOUR-WIRE BUS WITH EVALUATED AND SIMULATED 50% DELAYS
(τ0 = 1.42 ps, τ = (8/π²)τ0, λ = 12.24, a0 = 1, a1 = 1 + (2−√2)λ, a2 = 1 + 2λ, and a3 = 1 + (2+√2)λ for all classes).
Closed-form expression for the output signal on wire 2: Vdd(1 − c0 e^(−t/(a0τ)) − c1 e^(−t/(a1τ)) − c2 e^(−t/(a2τ)) − c3 e^(−t/(a3τ))).

jC | Patterns | c0 | c1 | c2 | c3 | Evaluated delay (ps) | Sim. delay (ps)
0 | ↑↑↑↑ | 4/π | 0 | 0 | 0 | 1.08 | 1.18
0 | ↑↑↑- | 3/π | √2/(2π) | 1/π | −√2/(2π) | 1.55 | 1.61
0 | ↑↑-↑ | 3/π | (2−√2)/(2π) | −1/π | −(2+√2)/(2π) | 1.55 | 1.62
0 | -↑↑↑ | 3/π | −√2/(2π) | 1/π | √2/(2π) | 1.55 | 1.64
1 | ↑↑↑↓ | 2/π | √2/π | 2/π | −√2/π | 3.33 | 3.22
1 | ↑↑-- | 2/π | 1/π | 0 | 1/π | 4.54 | 3.48
1 | -↑↑- | 2/π | 0 | 2/π | 0 | 7.21 | 5.15
1 | ↑↑-↓ | 1/π | (2+√2)/(2π) | 1/π | (2−√2)/(2π) | 9.70 | 9.38
1 | ↑↑↓↑ | 2/π | 0 | (2−√2)/(2π) | −2/π | 9.98 | 3.92
1 | -↑↑↓ | 1/π | √2/(2π) | 3/π | −√2/(2π) | 12.89 | 13.03
2 | ↑↑↓- | 1/π | (4−√2)/(2π) | −1/π | (4+√2)/(2π) | 17.02 | 16.05
2 | -↑-↑ | 2/π | (1−√2)/π | 0 | (1+√2)/π | 19.67 | 18.79
2 | ↑↑↓↓ | 0 | 2/π | 0 | 2/π | 20.05 | 19.85
2 | -↑-- | 1/π | (2−√2)/(2π) | 1/π | (2+√2)/(2π) | 22.59 | 22.48
2 | -↑-↓ | 0 | 1/π | 2/π | 1/π | 24.12 | 24.22
2 | ↓↑↑↑ | 2/π | −√2/π | 2/π | √2/π | 26.02 | 26.06
2 | ↓↑↑- | 1/π | −√2/(2π) | 3/π | √2/(2π) | 26.89 | 27.06
2 | ↓↑↑↓ | 0 | 0 | 4/π | 0 | 27.45 | 27.68
3 | -↑↓↓ | −1/π | (4−√2)/(2π) | 1/π | (4+√2)/(2π) | 37.44 | 37.74
3 | -↑↓- | 0 | (2−√2)/π | 0 | (2+√2)/π | 38.61 | 38.89
3 | ↓↑-↓ | −1/π | (2−√2)/(2π) | 3/π | (2+√2)/(2π) | 39.06 | 39.40
3 | -↑↓↑ | 1/π | (4−√2)/(2π) | −1/π | (4+√2)/(2π) | 40.12 | 40.39
3 | ↓↑-- | 0 | (1−√2)/π | 2/π | (1+√2)/π | 40.21 | 40.55
3 | ↓↑-↑ | 1/π | (2−3√2)/(2π) | 1/π | (2+3√2)/(2π) | 41.63 | 41.98
4 | ↓↑↓↓ | −2/π | (2−√2)/π | 2/π | (2+√2)/π | 50.92 | 51.36
4 | ↓↑↓- | −1/π | (4−3√2)/(2π) | 1/π | (4+3√2)/(2π) | 52.99 | 53.44
4 | ↓↑↓↑ | 0 | (2−2√2)/π | 0 | (2+2√2)/π | 55.28 | 55.79
Fig. 3. Delays of side wires for all patterns with respect to λ in a four-wire bus (τ0 = 1.42 ps). Horizontal axis: λ; vertical axis: delay/τ0; curves grouped by classes 0C to 4C.
implies that our classification on side wires is valid provided
that the coupling factor λ is at least 1.
In addition to being a finer classification, the new classi-
fication has no overlapping delays among different classes.
Fig. 4 compares the simulated delays of different classes based
on the classification in [2], [3] and our new classification. In
Fig. 4, the grey bars identify the minimum and maximum
simulated delays in every class. Note that only two extremes
are important, and not all delay values in the grey bars are
achievable by some transition patterns. In Fig. 4(a), the thick
line segments denote the upper bounds for delay of each class
based on Eq. (1). The upper bounds by the model in [2], [3]
overestimate the delays of D1 through D4 and underestimate
the delay of D0. As shown in Fig. 4(a), the actual delays
in D0, D1, and D2 overlap with each other. Some patterns
with smaller delays have potential to transmit information
at a higher speed, but are categorized into a class with a
larger delay bound. Thus, the classification by the model
in [2], [3] does not result in effective crosstalk avoidance
codes. In contrast, the delays of different classes in our new
classification do not overlap as shown in Fig. 4(b), 4(c), and
4(d). By classifying patterns this way, we have a more accurate
TABLE III
CLOSED-FORM EXPRESSIONS FOR THE OUTPUT SIGNALS ON WIRE 1 IN A FOUR-WIRE BUS WITH EVALUATED AND SIMULATED 50% DELAYS
(τ0 = 1.42 ps, τ = (8/π²)τ0, λ = 12.24, a0 = 1, a1 = 1 + (2−√2)λ, a2 = 1 + 2λ, and a3 = 1 + (2+√2)λ for all classes).
Closed-form expression for the output signal on wire 1: Vdd(1 − c0 e^(−t/(a0τ)) − c1 e^(−t/(a1τ)) − c2 e^(−t/(a2τ)) − c3 e^(−t/(a3τ))).

jC | Patterns | c0 | c1 | c2 | c3 | Evaluated delay (ps) | Sim. delay (ps)
0 | ↑↑↑↑ | 4/π | 0 | 0 | 0 | 1.08 | 1.18
0 | ↑↑↑- | 3/π | −(2+√2)/(2π) | −1/π | (2−√2)/(2π) | 1.55 | 1.59
0 | ↑↑-↑ | 3/π | √2/(2π) | 1/π | −√2/(2π) | 1.55 | 1.61
0 | ↑-↑↑ | 3/π | −√2/(2π) | 1/π | √2/(2π) | 1.55 | 1.64
1 | ↑↑↑↓ | 2/π | (2+√2)/π | −2/π | (2−√2)/π | 2.50 | 2.70
1 | ↑↑-- | 2/π | (1+√2)/π | 0 | (1−√2)/π | 2.83 | 2.90
1 | ↑↑↓↑ | 2/π | √2/π | 2/π | −√2/π | 3.33 | 3.20
1 | ↑↑-↓ | 1/π | (4+3√2)/(2π) | −1/π | (4−3√2)/(2π) | 4.65 | 4.99
1 | ↑-↑- | 2/π | 1/(2π) | 0 | 1/(2π) | 4.54 | 3.49
1 | ↑↑↓- | 1/π | (2+3√2)/(2π) | 1/π | (2−3√2)/(2π) | 5.53 | 5.88
1 | ↑↑↓↓ | 0 | (2+2√2)/π | 0 | (2−2√2)/π | 7.03 | 7.39
1 | ↑--↑ | 2/π | 0 | 2/π | 0 | 7.21 | 5.15
1 | ↑-↑↓ | 1/π | (4+√2)/(2π) | −1/π | (4−√2)/(2π) | 7.41 | 6.89
1 | ↑--- | 1/π | (2+√2)/(2π) | 1/π | (2−√2)/(2π) | 9.70 | 9.35
1 | ↑--↓ | 0 | (2+√2)/π | 0 | (2−√2)/π | 10.68 | 10.54
1 | ↑-↓↑ | 1/π | √2/(2π) | 3/π | −√2/(2π) | 12.89 | 13.03
1 | ↑-↓- | 0 | (2+2√2)/(2π) | 2/π | (2−2√2)/(2π) | 13.03 | 13.14
1 | ↑-↓↓ | −1/π | (4+3√2)/(2π) | 1/π | (4−3√2)/(2π) | 13.11 | 13.21
2 | ↑↓↑↓ | 0 | 2/π | 0 | 2/π | 20.05 | 19.85
2 | ↑↓-↓ | −1/π | (4+√2)/(2π) | 1/π | (4−√2)/(2π) | 21.86 | 21.91
2 | ↑↓↑- | 1/π | (2−√2)/(2π) | 1/π | (2+√2)/(2π) | 22.59 | 22.48
2 | ↑↓↓↓ | −2/π | (2+√2)/π | 2/π | (2−√2)/π | 23.10 | 23.23
2 | ↑↓-- | 0 | 1/π | 2/π | 1/π | 24.12 | 24.22
2 | ↑↓↓- | −1/π | (2+√2)/(2π) | 3/π | (2−√2)/(2π) | 25.10 | 25.30
2 | ↑↓↑↑ | 2/π | −√2/π | 2/π | √2/π | 26.02 | 26.06
2 | ↑↓-↑ | 1/π | −√2/(2π) | 3/π | √2/(2π) | 26.89 | 27.06
2 | ↑↓↓↑ | 0 | 0 | 4/π | 0 | 27.45 | 27.68
control of delays for transition patterns.
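The non-overlap property for the middle wire can be checked directly against the simulated delays listed per subclass in Tab. I (λ = 12.24, τ0 = 1.42 ps). The dictionary below copies those values; the helper names are illustrative.

```python
from typing import Dict, List, Tuple

# Simulated 50% delays (ps) of the subclasses of each class, copied from Tab. I.
SIM_DELAYS_PS: Dict[str, List[float]] = {
    "C0": [1.18, 1.50, 1.50],
    "C1": [2.40, 2.40, 2.45],
    "C2": [6.84, 9.21, 10.70],
    "C3": [14.22, 17.18, 18.47],
    "C4": [22.60, 24.68, 26.03],
    "C5": [36.91, 37.52, 38.35, 39.47, 41.11],
    "C6": [48.85, 50.86, 53.25, 55.97, 59.04],
}


def class_ranges(delays: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """(min, max) delay of every class."""
    return {name: (min(values), max(values)) for name, values in delays.items()}


def ranges_are_disjoint(ranges: Dict[str, Tuple[float, float]]) -> bool:
    """True if, after sorting, every range ends before the next one starts."""
    ordered = sorted(ranges.values())
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))
```

`ranges_are_disjoint(class_ranges(SIM_DELAYS_PS))` evaluates to `True` for the values above.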
III. NEW MEMORYLESS CROSSTALK AVOIDANCE
CODES
A. Previous CAC Design
CACs reduce the crosstalk delay for on-chip global intercon-
nects by encoding a k-bit data word (x1x2 · · ·xk) into an n-bit
(n > k) codeword (c1c2 · · · cn). Two kinds of CACs, CACs
with memory and memoryless CACs, have been investigated in
the literature. CACs with memory, as shown in Fig. 5(a), need
to store all codebooks corresponding to different codewords
(c1c2 · · · cn), since the encoding depends on the data word
(x1x2 · · ·xk) as well as the preceding codeword. In contrast,
memoryless CACs, as shown in Fig. 5(b), require a single
codebook to generate codewords for transmission, because the
encoding depends on the data word only. Hence, memoryless
CACs are simpler to implement than CACs with memory. We
focus on memoryless CACs in this paper.
The codebook of a memoryless CAC satisfies the property
that each codeword must be able to transition to every other
codeword in the codebook with a delay less than the require-
ment. Most memoryless CACs in the literature are based on
the model in [2], [3]. The key idea is to eliminate undesirable
patterns for transmission. Existing memoryless CACs include
OLCs, FPCs, FTCs, and FOCs [4]–[6], [16], which achieve a
worst-case delay of (1 + λ)τ0, (1 + 2λ)τ0, (1 + 2λ)τ0, and
(1+3λ)τ0, respectively. As mentioned above, the scheme that
was proposed to achieve a worst-case delay of τ0 is invalid
since the model in [2], [3] underestimates the delays for 0C.
Thus, OLCs achieve the smallest worst-case delay (1 + λ)τ0
among existing CACs.
There exist several methods to obtain a memoryless code-
book based on pattern pruning, transition pruning, or recursive
construction. The pattern pruning technique is straightforward:
it gives a codebook with a smaller worst-case delay
by eliminating some patterns. For example, FOCs cannot have
both 010 and 101 patterns around any bit position, and FPCs
are free of 010 and 101 patterns [16]. The transition pruning
technique [6] is based on graph theory. This method first builds
a transition graph with all possible codewords as nodes and all
valid transitions as edges, and then finds a maximum clique.
A clique is defined as a subgraph in which every pair of nodes
is connected by an edge. A maximum clique is defined as a
clique of the largest possible size in a given graph. Since every
pair of nodes is connected, a maximum clique in this graph
Fig. 4. Simulated delays (ps) of different classes of transition patterns using (a) classification based on (1), classes D0 to D4; (b) classification with respect to the delay of the middle wire in a five-wire bus, classes C0 to C6; (c) classification with respect to the delay of wire 2 in a four-wire bus, classes 0C to 4C; (d) classification with respect to the delay of wire 1 in a four-wire bus, classes 0C to 2C (λ = 12.24 and τ0 = 1.42 ps).
constitutes a memoryless codebook with the largest size. The
codebook generation method is based on exhaustive search.
Although it is easy to get a maximum clique from a transition
graph with a small n, the complexity increases rapidly with
n. This is because the number of edges in an n-bit transition
graph is upper bounded by $2^{n-1}(2^n - 1)$, which increases
exponentially with n. In fact, finding a maximum clique under
given constraints is an NP-hard problem [17]. The recursive
technique constructs an (n + 1)-bit codebook from an n-bit
codebook [4], [5]. Since for a small n, a largest codebook can
be obtained easily via the second method, a codebook for an
n-wire bus can be constructed recursively.
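Pattern pruning and transition pruning can be illustrated on a very small bus. The sketch below filters codewords that are free of 010 and 101 (the FPC condition quoted above) and, separately, finds a maximum clique of the transition graph by exhaustive search. It uses the three-wire model of Eq. (1) as the delay constraint, not the new classification, and the brute-force search is only practical for very small n; all names are illustrative.

```python
from itertools import combinations, product
from typing import Callable, List


def words(n: int) -> List[str]:
    """All n-bit words as strings of '0' and '1'."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def is_fpc_word(word: str) -> bool:
    """Pattern pruning: keep words that contain neither 010 nor 101."""
    return "010" not in word and "101" not in word


def worst_class(u: str, v: str) -> int:
    """Largest i with some wire at delay (1 + i*lam)*tau0 under Eq. (1) for u -> v."""
    deltas = [int(b) - int(a) for a, b in zip(u, v)]
    n = len(deltas)
    worst = 0
    for k, d in enumerate(deltas):
        if d == 0:
            continue
        if k == 0:
            i = 1 - d * deltas[1]
        elif k == n - 1:
            i = 1 - d * deltas[n - 2]
        else:
            i = 2 - d * (deltas[k - 1] + deltas[k + 1])
        worst = max(worst, i)
    return worst


def max_clique(nodes: List[str], allowed: Callable[[str, str], bool]) -> List[str]:
    """Transition pruning by exhaustive search over subsets, largest first."""
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(allowed(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []
```

For example, `max_clique(words(3), lambda a, b: worst_class(a, b) <= 2)` searches the 3-bit transition graph for the largest codebook whose transitions stay within (1 + 2λ)τ0 under Eq. (1).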
B. CAC Design with New Classification
Since our classification of patterns is different from that in
[2], [3], the CAC designs should be reconsidered with our new
classification. In the following, we first introduce a recursive
method for codebook construction under different constraints,
and then derive the size of codebooks.
In our work, we use the recursive method to obtain a
memoryless codebook for the following two reasons. First,
it is complex to apply the pattern pruning technique, since our
new classification is based on transitions over five wires, and
it is not clear which patterns have larger worst-case delays
and should be removed. Second, it is hard to find a maximum
clique for a transition graph with a large n. In our method,
we first start with a 5-bit codebook, obtained by searching
for maximum cliques in a five-wire bus, and then build an
(n+ 1)-bit codebook by appending ’0’ and ’1’ to codewords
of an n-bit codebook while satisfying delay constraints.
Our new classifications partition patterns over five adjacent
wires into seven classes, C0 to C6, and patterns over four
adjacent wires into five classes, 0C to 4C. Similar to the CAC
design based on the model in [2], [3], the new classifications
are conducive to the design of CACs by eliminating undesir-
able transition patterns with large worst-case delays.
To get valid 5-bit codebooks, we first assume the allowed
patterns are from C0 to Ci for i = 0, 1, · · · , 6 in our
classification for middle wires. Then, for the side wires, we
assume patterns are from 0C to jC based on the classification
for side wires. Under these two assumptions, there are many
configurations of constraints, which are referred to as (Ci, jC),
where i ∈ {0, 1, · · · , 6} and j ∈ {0, 1, · · · , 4}.
Since the worst-case delay of a bus is determined by the
largest delays among all wires, for an n-bit (n ≥ 5) bus under
(Ci, jC) we require that the worst-case delays on middle
wires and side wires are close enough. By our classifications,
we find 0C is close to C0, 1C close to C2 and C3, 2C
close to C4, 3C close to C5, and 4C close to C6. Hence,
among all configurations of constraints (Ci, jC), we only
focus on (C0, 0C), (C2, 1C), (C3, 1C), (C4, 2C), (C5, 3C),
and (C6, 4C). When n ≤ 4, the constraint Ci cannot be
enforced. Hence, the constraint (Ci, jC) reduces to jC. The
constraint (C0, 0C) appears to be too restrictive, and hence
we do not investigate it in this paper. The last configuration
(C6, 4C) is trivial, since it allows arbitrary transitions.
In the following, we propose a scheme for finding an n-bit
codebook C(Ci,jC)(n). For simplicity, we denote C(Ci,jC)(n)
as C(n) when there is no ambiguity about the constraint.
First, for a five-wire bus under constraint (Ci, jC), a pattern
transition graph is obtained. We search the graph for the largest
5-bit codebooks. One or two 5-bit codebooks of maximum
sizes exist for each constraint in Tab. IV, where we denote
an n-bit binary codeword (c1c2 · · · cn) as the decimal number
$\sum_{i=1}^{n} c_i 2^{n-i}$ for simplicity. In [6], a bit boundary in a set of
codewords is said to be 01-type if only codewords with 00, 01,
and 11 are allowed across that boundary, and a bit boundary is
said to be 10-type when only codewords with 00, 10, and 11
are allowed across that boundary. It is shown that the largest
clique for a given constraint has alternating boundary types.
Thus, there are two largest cliques. Similarly, from Tab. IV,
we conjecture that the largest codebooks have alternating
constraints, $C_5^0$ and $C_5^1$, for every five consecutive wires. For
constraint (C4, 2C), only one maximum 5-bit codebook exists.
We assume $C_5^1$ is the same as $C_5^0$ for constraint (C4, 2C).
Fig. 5. System model for (a) CACs with memory; (b) Memoryless CACs.
Since we have two types of constraints, two largest codebooks
for each constraint can be obtained, except for (C4, 2C), where
the two codebooks are the same. Then we apply Alg. 1 to
obtain C(n). In the initialization, we pick a 5-bit codebook
C(5) = $C_5^0$. Then, the algorithm recursively appends one bit
to the codewords in the codebook in each iteration. For
$\mathbf{c}_k = (c_1 c_2 \cdots c_k)$, the appended bit x must be such that
the last five bits $(c_{k-3} c_{k-2} c_{k-1} c_k x)$ form a codeword in $C_5^s$,
which alternates between $C_5^0$ and $C_5^1$. If we pick the other
5-bit codebook, C(5) = $C_5^1$, we would obtain another codebook.
Algorithm 1 Codebook design under (Ci, jC)
Input: $C_5^0$, $C_5^1$, n;
Initialize: k = 5, C(5) = $C_5^0$, s = 1;
while k ≤ n − 1 do
    for all $\mathbf{c}_k = (c_1 c_2 \cdots c_k)$ ∈ C(k) do
        if $(c_{k-3} c_{k-2} c_{k-1} c_k 0)$ ∈ $C_5^s$ then
            append 0 to $\mathbf{c}_k$ and add the new codeword to C(k + 1);
        end if
        if $(c_{k-3} c_{k-2} c_{k-1} c_k 1)$ ∈ $C_5^s$ then
            append 1 to $\mathbf{c}_k$ and add the new codeword to C(k + 1);
        end if
    end for
    s = 1 − s;
    k = k + 1;
end while
Output: C(n).
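As an illustration, the construction in Alg. 1 can be sketched in Python. This is a minimal sketch under our own assumptions: codewords are stored as integers in the decimal encoding of Tab. IV, both appended bits are tested independently, and the names `extend_codebook`, `C0_C3_1C`, and `C1_C3_1C` are hypothetical helpers that do not appear in the paper.

```python
def extend_codebook(c0, c1, n):
    """Grow a 5-bit codebook to n bits, one bit per iteration (sketch of Alg. 1).

    c0, c1: the two 5-bit codebooks as sets of integers (Tab. IV encoding).
    Returns the n-bit codebook as a sorted list of integers.
    """
    if n < 5:
        raise ValueError("this sketch only handles n >= 5")
    books = (set(c0), set(c1))
    codebook = set(c0)   # C(5) is initialised to the first 5-bit codebook
    s = 1                # index of the codebook the next 5-bit window must lie in
    for _ in range(5, n):
        grown = set()
        for word in codebook:
            for bit in (0, 1):
                candidate = (word << 1) | bit
                # The last five bits of the candidate form the sliding window.
                if (candidate & 0b11111) in books[s]:
                    grown.add(candidate)
        codebook = grown
        s = 1 - s
    return sorted(codebook)


# 5-bit codebooks for constraint (C3, 1C), copied from Tab. IV.
C0_C3_1C = {0, 3, 14, 15, 24, 30, 31}
C1_C3_1C = {0, 1, 7, 16, 17, 28, 31}

sizes = [len(extend_codebook(C0_C3_1C, C1_C3_1C, n)) for n in range(5, 11)]
print(sizes)
```

Swapping the two arguments corresponds to starting from the other 5-bit codebook, which yields the second codebook mentioned above.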
The recursive construction allows us to derive the size of
the codebooks. Let $V_{(Ci,jC)}$ be an all-one m-dimensional
row vector ($m = |C_5^0|$) under constraint (Ci, jC). Let
$\mathbf{c}_k^s$ be a k-bit codeword whose last five consecutive bits satisfy
$(c_{k-4} c_{k-3} c_{k-2} c_{k-1} c_k) \in C_5^s$ for s = 0 or 1. If a 0 or
1 can be appended to $\mathbf{c}_k^s$ to form a (k + 1)-bit codeword
whose last five bits satisfy $(c_{k-3} c_{k-2} c_{k-1} c_k c_{k+1}) \in C_5^{1-s}$, such an
expansion is called a valid expansion. Otherwise, it is called
an invalid expansion. An expansion matrix is an
m × m matrix $D^s_{(Ci,jC)}$, where $D^s_{(Ci,jC)}(i, j) = 0$ denotes
an invalid expansion and $D^s_{(Ci,jC)}(i, j) = 1$ a valid expansion
from the i-th codeword in $C_5^s$ to the j-th codeword in $C_5^{1-s}$
under constraint (Ci, jC). Each row of $D^s_{(Ci,jC)}$ has at most
two ones, since each k-bit codeword can be extended to
form at most two (k + 1)-bit codewords whose last five bits
satisfy the appropriate constraints. Let Y be an m × m anti-diagonal
matrix with all ones. Due to symmetry between $C_5^0$
and $C_5^1$, $D^0$ and $D^1$ satisfy $D^1_{(Ci,jC)} = Y D^0_{(Ci,jC)} Y$. Define
$D_{(Ci,jC)} = D^0_{(Ci,jC)} Y = Y D^1_{(Ci,jC)}$. We denote $V_{(Ci,jC)}$
and $D_{(Ci,jC)}$ as V and D, respectively, when there is no
ambiguity about the constraint. Then, for n ≥ 5, the number
of codewords for an n-bit bus equals the number of valid
expansion sequences and is given by
$$
|C(n)| = V D^0 D^1 \cdots V^T
= \begin{cases}
V (D^0 Y Y D^1)^{\frac{n-5}{2}} V^T & \text{if } n \text{ is odd;} \\
V (D^0 Y Y D^1)^{\frac{n-6}{2}} D^0 Y Y V^T & \text{if } n \text{ is even;}
\end{cases}
= V D^{n-5} Y V^T. \qquad (6)
$$
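The expansion matrix and the count in Eq. (6) can be reproduced with a few lines of Python. The sketch below is only illustrative: it assumes the codewords of each 5-bit codebook are indexed in increasing decimal order, and the helper names `expansion_matrix` and `count_codewords` are our own. Because V is the all-one vector, $Y V^T = V^T$, so the trailing Y in Eq. (6) is dropped in the code.

```python
def expansion_matrix(c0, c1):
    """Return D = D0 * Y for the 5-bit codebooks c0 and c1.

    D0[i][j] = 1 when appending one bit to the i-th word of c0 yields a
    5-bit window equal to the j-th word of c1. Right-multiplying by the
    anti-diagonal matrix Y reverses the column order.
    """
    rows, cols = sorted(c0), sorted(c1)
    m = len(rows)
    d0 = [[0] * m for _ in range(m)]
    for i, word in enumerate(rows):
        for bit in (0, 1):
            window = ((word << 1) | bit) & 0b11111
            if window in cols:
                d0[i][cols.index(window)] = 1
    return [row[::-1] for row in d0]


def count_codewords(d, n):
    """Evaluate V D^(n-5) V^T, with V the all-one row vector."""
    vec = [1] * len(d)   # starts as V^T
    for _ in range(n - 5):
        vec = [sum(a * b for a, b in zip(row, vec)) for row in d]
    return sum(vec)


# (C3, 1C) codebooks from Tab. IV.
d_c3_1c = expansion_matrix({0, 3, 14, 15, 24, 30, 31}, {0, 1, 7, 16, 17, 28, 31})
for row in d_c3_1c:
    print(row)
print([count_codewords(d_c3_1c, n) for n in range(5, 9)])
```

For the (C3, 1C) codebooks, the matrix printed by this sketch coincides with D(C3,1C) in Tab. V.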
In the following, we first focus on constraints (C3, 1C),
(C4, 2C), and (C5, 3C). The codes based on these constraints
are shown to have the same codebooks as OLCs, FPCs, and
FOCs, respectively. Then, we consider constraint (C2, 1C),
which would lead to codes with a smaller delay at the expense
of a lower code rate.
C. Codes Under (C3, 1C)
The one Lambda codes have a worst-case delay of (1 + λ)τ.
According to [16], the worst-case delay (1 + λ)τ is achieved
if and only if the transitions ↑↓×, -↑-, and ↑-↑, plus their
symmetric and complement versions (e.g., ↑↓× and ×↓↑ are
symmetric, and -↓- is the complement of -↑-), are avoided,
where ↑, ↓, ×, and - denote 0→1, 1→0,
don’t care, and no transition, respectively. The first constraint
of avoiding ↑↓× ensures that a transition between any two
TABLE IV
LARGEST 5-BIT CODEBOOK(S) UNDER CONSTRAINT (Ci, jC).
Constraint (C5, 3C):
    $C_5^0$ = {0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 24, 25, 26, 27, 28, 30, 31}
    $C_5^1$ = {0, 1, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 28, 29, 30, 31}
Constraint (C4, 2C):
    $C_5^0$ = $C_5^1$ = {0, 1, 3, 6, 7, 12, 14, 15, 16, 17, 19, 24, 25, 28, 30, 31}
Constraint (C3, 1C):
    $C_5^0$ = {0, 3, 14, 15, 24, 30, 31}
    $C_5^1$ = {0, 1, 7, 16, 17, 28, 31}
Constraint (C2, 1C):
    $C_5^0$ = {0, 3, 15, 24, 30, 31}
    $C_5^1$ = {0, 1, 7, 16, 28, 31}
TABLE V
EXPANSION MATRIX FOR (C3, 1C), (C4, 2C), AND (C5, 3C).
D(C3,1C) =


0 0 0 0 0 1 1
0 0 0 0 1 0 0
0 1 0 0 0 0 0
1 0 0 0 0 0 0
0 0 1 1 0 0 0
0 1 0 0 0 0 0
1 0 0 0 0 0 0

, D(C4,2C) =


0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0


, D(C5,3C) =


0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


.
codewords does not cause opposite transitions on adjacent wires. This
condition is referred to as a forbidden-transition (FT) condition.
The second constraint of avoiding -↑- ensures that 2C patterns
are removed. This constraint ensures that two adjacent bit boundaries
cannot both be 01-type or 10-type, and is referred to as a
forbidden adjacent boundary pattern (FABP) condition [16].
The last two forbidden patterns give the constraint that no
patterns 010 and 101 appear in the codeword, which is referred to
as a forbidden-pattern (FP) condition [16]. Codes satisfying
these necessary and sufficient conditions are called one
Lambda codes (OLCs). We denote the largest OLC codebook
size for an n-bit bus as $G_n$, which is given by
$G_n = G_{n-1} + G_{n-5}$ (7)
with initial conditions $G_1 = 2$, $G_2 = 3$, $G_3 = 4$, $G_4 = 5$, and
$G_5 = 7$ [18].
With our classification, we explore codes under constraint
(C3, 1C). From Tab. IV, the two largest 5-bit codebooks are
given by $C_5^0$ = {0, 3, 14, 15, 24, 30, 31} and $C_5^1$ = {0, 1, 7, 16,
17, 28, 31}. An n-bit codebook C(n) can be obtained via
Alg. 1. The number of codewords is given by
$|C(n)| = V D_{(C3,1C)}^{n-5} V^T$ for n ≥ 5, (8)
where V is a seven-dimensional all one vector and D(C3,1C)
is a 7 × 7 expansion matrix as shown in Tab. V. We further
establish that the largest codebook sizes under constraint
(C3, 1C) satisfy the recursion:
Lemma III.1. For n ≥ 8, |C(C3,1C)(n)| is given by a recur-
sion |C(C3,1C)(n)| = |C(C3,1C)(n − 2)|+ |C(C3,1C)(n − 3)|,
with initial conditions |C(C3,1C)(n)| =7, 9, 12, for n =5, 6,
7, respectively.
See the appendix for the proof. In fact, we can further relate
these codes with OLCs by the following:
Theorem III.1. The codes under (C3, 1C) have the same
codebooks as OLCs. Hence, Gn = |C(C3,1C)(n)|.
See the appendix for the proof. Theorem III.1 implies that
the codes under constraint (C3, 1C) are equivalent to the class
of OLC codes.
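The agreement between the two recursions can be checked numerically. The following Python sketch simply iterates Eq. (7) and the recursion of Lemma III.1 from their stated initial conditions and compares the results for 5 ≤ n ≤ 16; the function names `olc_sizes` and `c3_1c_sizes` are hypothetical.

```python
def olc_sizes(n_max):
    """G_n from Eq. (7): G_n = G_{n-1} + G_{n-5}, with G_1..G_5 = 2, 3, 4, 5, 7."""
    g = {1: 2, 2: 3, 3: 4, 4: 5, 5: 7}
    for n in range(6, n_max + 1):
        g[n] = g[n - 1] + g[n - 5]
    return g


def c3_1c_sizes(n_max):
    """|C(n)| from Lemma III.1: |C(n)| = |C(n-2)| + |C(n-3)| for n >= 8."""
    c = {5: 7, 6: 9, 7: 12}
    for n in range(8, n_max + 1):
        c[n] = c[n - 2] + c[n - 3]
    return c


g = olc_sizes(16)
c = c3_1c_sizes(16)
agree = all(g[n] == c[n] for n in range(5, 17))
print(agree, [g[n] for n in range(5, 17)])
```

The resulting sequence matches the OLC codebook sizes listed in Tab. VIII.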
D. Codes Under (C4, 2C)
The (1 + 2λ) codes have a worst-case delay of (1 + 2λ)τ .
No necessary and sufficient condition is known for a code to
be a (1 + 2λ) code. Two sufficient conditions FT and FP are
found, which lead to two families of (1+2λ) codes, FTC and
FPC, respectively. The size of an FTC codebook for an n-wire
bus is given by Fn+2, where Fn is the Fibonacci sequence
that satisfies Fn+2 = Fn+1 + Fn and has initial conditions
F1 = F2 = 1 [6]. The FPCs for an n-wire bus have a larger
codebook size 2Fn+1 [4].
With our classification, we explore codes under constraint
(C4, 2C). From Tab. IV, only one largest 5-bit codebook is
found: $C_5^0$ = {0, 1, 3, 6, 7, 12, 14, 15, 16, 17, 19, 24, 25, 28,
30, 31}. An n-bit codebook C(n) can be obtained via Alg. 1
by setting $C_5^1 = C_5^0$. The number of codewords is given by
$|C(n)| = V D_{(C4,2C)}^{n-5} V^T$ for n ≥ 5, (9)
where V is a 16-dimensional all one vector and D(C4,2C) is a
16 × 16 expansion matrix as shown in Tab. V. We further
establish that the largest codebook sizes under constraint
(C4, 2C) satisfy the recursion:
Lemma III.2. For n ≥ 9, |C(C4,2C)(n)| satisfies the
recursion |C(C4,2C)(n)| = 2|C(C4,2C)(n − 1)| −
|C(C4,2C)(n − 2)| + |C(C4,2C)(n − 4)|, with boundary
conditions |C(C4,2C)(n)| = 16, 26, 42, 68 for n = 5, 6, 7, 8,
respectively.
See the appendix for the proof. Again, we can relate these
codes to existing CACs by the following:
Theorem III.2. The codes under (C4, 2C) have the same
codebooks as FPCs. Hence, 2Fn+1 = |C(C4,2C)(n)|.
See the appendix for the proof. Since FPCs and our codes
under (C4, 2C) can be obtained by excluding D3 plus D4
patterns and C5 plus C6 patterns, respectively, Theorem III.2
is not surprising given that C5 and C6 are the same as D3
and D4, respectively. Theorem III.2 implies that results in the
literature regarding FPCs are also applicable to codes under
constraint (C4, 2C).
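For small n, the relation between the FP condition and the size 2F_{n+1} can be checked by brute force. The Python sketch below enumerates all n-bit words that contain neither 010 nor 101 and compares the count with twice the Fibonacci number; the helper names are our own, and the exhaustive search is only meant for short buses.

```python
def is_fp_codeword(word, n):
    """True if the n-bit word contains neither 010 nor 101 (FP condition)."""
    bits = format(word, "0{}b".format(n))
    return "010" not in bits and "101" not in bits


def fpc_codebook(n):
    """All n-bit words satisfying the FP condition, in increasing order."""
    return [w for w in range(2 ** n) if is_fp_codeword(w, n)]


def fibonacci(k):
    """F_k with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


for n in range(5, 11):
    print(n, len(fpc_codebook(n)), 2 * fibonacci(n + 1))
```

For n = 5 the enumerated set is exactly the (C4, 2C) codebook of Tab. IV.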
E. Codes Under (C5, 3C)
The (1 + 3λ) codes have a worst-case delay of (1 + 3λ)τ,
which is achieved if and only if ↓↑↓ and ↑↓↑ are avoided.
So the necessary and sufficient condition for the (1 + 3λ)
codes is that the codebook cannot have both 010 and 101
appearing centered around any bit position, which is referred
to as a forbidden-overlap (FO) condition. Codes satisfying the FO
condition are called FOCs. It is shown that the size of the largest FOC
codebook for an n-bit bus is given by $T_{n+2}$, where $T_n =
T_{n-1} + T_{n-2} + T_{n-3}$ is the tribonacci number sequence with
initial conditions $T_1 = 1$, $T_2 = 1$, and $T_3 = 2$ [16].
With our classification, we explore codes under constraint
(C5, 3C). Two largest 5-bit codebooks, $C_5^0$ = {0, 1, 2, 3, 6, 7,
8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 24, 25, 26, 27, 28,
30, 31} and $C_5^1$ = {0, 1, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 17,
19, 20, 21, 22, 23, 24, 25, 28, 29, 30, 31}, are found. Via
Alg. 1, an n-bit codebook C(n) can be obtained. The number
of codewords is given by
$|C(n)| = V D_{(C5,3C)}^{n-5} V^T$ for n ≥ 5, (10)
where V is a 24-dimensional all one vector and D(C5,3C) is
a 24× 24 expansion matrix as shown in Tab. V.
We further establish that the largest codebook sizes under
constraint (C5, 3C) satisfy the recursion:
Lemma III.3. For n ≥ 8, |C(C5,3C)(n)| satisfies the
recursion |C(C5,3C)(n)| = |C(C5,3C)(n − 1)| +
|C(C5,3C)(n − 2)| + |C(C5,3C)(n − 3)|, with boundary
conditions |C(C5,3C)(n)| = 24, 44, 81 for n = 5, 6, 7, respectively.
See the appendix for the proof. Again we can relate these
codes to existing CACs by the following:
Theorem III.3. The codes under (C5, 3C) have the same
codebooks as FOCs. Hence, Tn+2 = |C(C5,3C)(n)|.
See the appendix for the proof. Theorem III.3 is not
surprising, since FOCs and our codes under (C5, 3C) can be
obtained by excluding D4 and C6 patterns, respectively, and
D4 and C6 have been shown to be the same. Theorem III.3
implies that results in the literature regarding FOCs are also
applicable to codes under constraint (C5, 3C).
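One way to see the FO condition at work is to forbid 101 and 010 alternately at successive bit positions and enumerate the surviving words. The Python sketch below does this by brute force; the alternating assignment and the helper names `fo_codebook` and `tribonacci` are our own illustrative choices, not a construction taken from [16].

```python
def fo_codebook(n, first_forbidden="101"):
    """All n-bit words avoiding the forbidden 3-bit pattern at each position.

    The forbidden pattern alternates between "101" and "010" from one
    3-bit window to the next, so both patterns never appear centered
    around the same bit position.
    """
    other = {"101": "010", "010": "101"}
    words = []
    for w in range(2 ** n):
        bits = format(w, "0{}b".format(n))
        forbidden = first_forbidden
        valid = True
        for start in range(n - 2):
            if bits[start:start + 3] == forbidden:
                valid = False
                break
            forbidden = other[forbidden]
        if valid:
            words.append(w)
    return words


def tribonacci(k):
    """T_k with T_1 = 1, T_2 = 1, T_3 = 2."""
    t = [1, 1, 2]
    while len(t) < k:
        t.append(t[-1] + t[-2] + t[-3])
    return t[k - 1]


print(fo_codebook(5, "101"))
print(fo_codebook(5, "010"))
print(len(fo_codebook(6)), tribonacci(8))
```

For n = 5, the two choices of the first forbidden pattern give the two (C5, 3C) codebooks of Tab. IV.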
F. Codes Under (C2, 1C)
With our classification, we explore codes under constraint
(C2, 1C). From Tab. IV, the two largest 5-bit codebooks are
given by $C_5^0$ = {00000, 00011, 01111, 11000, 11110, 11111}
and $C_5^1$ = {00000, 00001, 00111, 10000, 11100, 11111}. An
n-bit codebook C(n) can be obtained via Alg. 1. The number
of codewords is given by
$|C(n)| = V D^{n-5} V^T$ for n ≥ 5, (11)
where V is a six-dimensional all-one vector and
$$
D = \begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
$$
We further establish that the largest codebook sizes under
constraint (C2, 1C) satisfy the recursion:
Lemma III.4. For n ≥ 10, |C(C2,1C)(n)| satisfies the
recursion |C(C2,1C)(n)| = |C(C2,1C)(n − 2)| + |C(C2,1C)(n − 5)|,
with initial conditions |C(C2,1C)(n)| = 6, 7, 9, 11, 14 for
n = 5, 6, 7, 8, 9, respectively.
See the appendix for the proof.
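The codebook sizes under (C2, 1C) can be evaluated directly from Eq. (11). The short Python sketch below uses the 6 × 6 expansion matrix D given above and plain nested lists (no external library); the names `D_C2_1C` and `codebook_size` are our own.

```python
# Expansion matrix D for (C2, 1C), as given after Eq. (11).
D_C2_1C = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]


def codebook_size(d, n):
    """Evaluate V D^(n-5) V^T, with V the all-one row vector."""
    vec = [1] * len(d)   # starts as V^T
    for _ in range(n - 5):
        vec = [sum(a * b for a, b in zip(row, vec)) for row in d]
    return sum(vec)


sizes = {n: codebook_size(D_C2_1C, n) for n in range(5, 17)}
print(sizes)
```

The values obtained this way satisfy the recursion of Lemma III.4 and agree with the unpruned (C2, 1C) column of Tab. VIII.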
Lemma III.5. The codebook under (C2, 1C) is a subset of
OLC.
See the appendix for the proof.
G. Pruned Codes Under (C2, 1C)
For (C2, 1C), the restriction on the side wires is more
relaxed than that on the middle wires, which results in larger
worst-case delays for the side wires. Hence, we prune the
CACs under constraint (C2, 1C) by removing codewords with
larger delays on the side wires in order to achieve a smaller
worst-case delay. Since the pruned codes have a smaller delay
than OLCs, we call these pruned CACs improved one Lambda
codes (IOLCs). We obtain IOLCs by first finding an n-bit
codebook via Alg. 1 as in Sec. III-F, and then pruning the
codebook with Alg. 2. To prune the codebook C(n), we search
for maximum subsets of $C_5^i$ (i = 0, 1) with smaller delays
on the side wires. For $C_5^0$, two maximum subsets, $C_5^{0,0}$ = {0,
3, 15, 30, 31} and $C_5^{0,1}$ = {0, 15, 24, 30, 31}, are found with
smaller worst-case delays on wires 1 and 2 and wires 4 and
5, respectively. For $C_5^1$, a maximum subset $C_5^{1,1}$ = {0, 1, 7,
16, 31} is found with smaller worst-case delays on wires 4
and 5. Finally, a valid n-bit codebook is obtained with the
leftmost five bits belonging to $C_5^{0,0}$, and the rightmost five
bits belonging to $C_5^{0,1}$ or $C_5^{1,1}$ depending on whether n is odd
or even.
The pruning algorithm for CACs under (C2, 1C) on an n-
bit bus is shown in Alg. 2. By pruning all codewords cn in
C(n), the algorithm removes codewords with larger delay on
side wires. With Alg. 2, we get an n-bit IOLC under constraint
(C2, 1C), and its size is given by
$|C_{IOLC}(n)| = W_1 D^{n-5} Y W_2^T$ for n ≥ 5, (12)
where W1 = [1 1 1 0 1 1], W2 = [1 0 1 1 1 1], and D is
the same as that in Eq. (11). Note that W1 and W2 are used
instead of V, because of the pruning of valid patterns on side
wires.
Algorithm 2 Pruning CACs under (C2, 1C)
Input: $C_5^{0,0}$, $C_5^{0,1}$, $C_5^{1,1}$, C(n);
if n is odd then
    i = 1;
else
    i = 0;
end if
for all $\mathbf{c}_n = (c_1 c_2 \cdots c_n)$ ∈ C(n) do
    if $(c_1 c_2 c_3 c_4 c_5) \notin C_5^{0,0}$ or $(c_{n-4} c_{n-3} c_{n-2} c_{n-1} c_n) \notin C_5^{1-i,1}$ then
        eliminate $\mathbf{c}_n$ from C(n);
    end if
end for
Output: C(n).
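The pruning step can be sketched in Python as follows. This is an illustrative sketch only: it regenerates the unpruned (C2, 1C) codebook with a compact version of the bit-appending construction and then applies the two membership tests of Alg. 2. The function names and the integer encoding of codewords are our own assumptions, and only short buses are shown.

```python
def grow(c0, c1, n):
    """Unpruned n-bit codebook obtained by appending one bit at a time."""
    books = (set(c0), set(c1))
    codebook, s = set(c0), 1
    for _ in range(5, n):
        codebook = {
            (word << 1) | bit
            for word in codebook
            for bit in (0, 1)
            if (((word << 1) | bit) & 0b11111) in books[s]
        }
        s = 1 - s
    return sorted(codebook)


def prune(codebook, n, left, right_odd, right_even):
    """Keep words whose leftmost and rightmost 5-bit windows are allowed."""
    right = right_odd if n % 2 == 1 else right_even
    kept = []
    for word in codebook:
        leftmost = word >> (n - 5)      # bits c_1 .. c_5
        rightmost = word & 0b11111      # bits c_{n-4} .. c_n
        if leftmost in left and rightmost in right:
            kept.append(word)
    return kept


C0 = {0, 3, 15, 24, 30, 31}       # C_5^0 under (C2, 1C), Tab. IV
C1 = {0, 1, 7, 16, 28, 31}        # C_5^1 under (C2, 1C), Tab. IV
C00 = {0, 3, 15, 30, 31}          # C_5^{0,0}
C01 = {0, 15, 24, 30, 31}         # C_5^{0,1}, used when n is odd
C11 = {0, 1, 7, 16, 31}           # C_5^{1,1}, used when n is even

iolc = {n: prune(grow(C0, C1, n), n, C00, C01, C11) for n in (5, 6, 7)}
print({n: len(words) for n, words in iolc.items()})
```

For n = 7, for example, the nine unpruned codewords are reduced to seven, since 1100000 fails the left-window test and 0000011 fails the right-window test.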
We further establish that the largest codebook sizes of
IOLCs satisfy the recursion:
Lemma III.6. For n ≥ 10, |CIOLC(n)| can be simplified as
recursion |CIOLC(n)| = |CIOLC(n − 2)|+ |CIOLC(n− 5)|,
with initial conditions |CIOLC(n)| =4, 5, 7, 8, 11, for n =5,
6, 7, 8, 9, respectively.
This recursion is the same as that in Lemma III.4.
It can be proved in the same fashion as Lemma III.4, and
hence its proof is omitted.
Lemma III.7. The IOLC codebook is a subset of OLC.
See the appendix for the proof.
TABLE VI
SIMULATED DELAYS OF OUR IOLC, UNPRUNED (C2, 1C) CODE, AND
OLC [5] FOR A 10-BIT BUS (λ = 12.24 AND τ0 = 1.42 PS).
          Delays (ps)
Wire i    IOLCs    (C2, 1C)    OLCs
1         10.08    5.49        10.55
2         7.03     9.13        2.92
3         9.31     9.31        5.94
4         9.31     9.45        6.09
5         9.59     9.36        10.73
6         9.41     9.41        13.64
7         10.14    10.14       14.06
8         9.65     10.57       14.84
9         8.97     9.14        8.99
10        5.28     13.50       14.84
IV. PERFORMANCE EVALUATION
In this section, we evaluate the performance of CACs based
on our classification with extensive simulations, and compare
them with existing CACs. Each CAC has two key performance
metrics: delay and rate. The delay of a CAC is the worst-
case delay when the codewords from the CAC are transmitted
over the bus. Codebook size and code rate are often used to
measure the overhead of CACs. The codebook size of a CAC
is simply the number of codewords. Suppose a CAC of size
M is transmitted over an n-bit bus; then its rate is defined
as $\lfloor \log_2 M \rfloor / n$. A CAC of rate k/n implies that n − k extra
wires are used in addition to k data wires so as to reduce the
crosstalk delay. Hence, the code rate measures the area and
TABLE VII
SIMULATED DELAYS OF OUR IOLC, UNPRUNED (C2, 1C) CODE, AND
OLC [5] FOR A 16-BIT BUS (λ = 12.24 AND τ0 = 1.42 PS).
          Delays (ps)
Wire i    IOLCs    (C2, 1C)    OLCs
1         10.32    13.92       15.95
2         7.43     9.51        10.03
3         9.57     10.88       15.54
4         9.83     10.21       15.75
5         10.16    10.16       15.02
6         10.33    10.34       15.57
7         10.39    10.39       15.70
8         10.23    10.23       15.48
9         9.87     10.25       15.57
10        10.40    10.39       15.66
11        10.34    10.33       15.52
12        10.17    10.21       14.88
13        10.25    10.39       15.85
14        9.98     10.92       15.59
15        9.61     9.62        10.13
16        5.58     13.92       16.11
power overhead of CACs: the higher the rate, the smaller the
overhead. Obviously, there is a tradeoff between the code rate
and delay of a CAC: typically a lower rate code is needed
to achieve a smaller delay. To measure the overall effects of
both rate and delay, we also define the throughput of a CAC
as the ratio of code rate and delay. The assumptions for this
definition are: (1) the clock rate of the bus is determined by
the inverse of the worst-case delay; (2) the throughput of the
bus is linearly proportional to k, the number of data wires.
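The rate and throughput definitions above translate into a few lines of Python. The sketch below is illustrative: the function names are our own, and the example inputs (codebook sizes and worst-case delays for the 10-bit and 16-bit buses) are taken from Tabs. VI to VIII.

```python
def data_bits(codebook_size):
    """floor(log2(M)), computed exactly with integer arithmetic."""
    return codebook_size.bit_length() - 1


def code_rate(codebook_size, n):
    """Rate floor(log2(M)) / n of a CAC of size M on an n-bit bus."""
    return data_bits(codebook_size) / n


def throughput(codebook_size, n, worst_case_delay):
    """Throughput, defined as the ratio of code rate and worst-case delay."""
    return code_rate(codebook_size, n) / worst_case_delay


def throughput_gain(ours, reference, n):
    """Ratio of throughputs; each argument is (codebook size, delay in ps)."""
    return throughput(ours[0], n, ours[1]) / throughput(reference[0], n, reference[1])


# 10-bit bus: IOLC (12 words, 10.14 ps), unpruned (17 words, 13.50 ps),
# OLC (28 words, 14.84 ps).
gain_iolc_10 = throughput_gain((12, 10.14), (28, 14.84), 10)
gain_unpruned_10 = throughput_gain((17, 13.50), (28, 14.84), 10)
# 16-bit bus: IOLC (41 words, 10.40 ps), unpruned (61 words, 13.92 ps),
# OLC (151 words, 16.11 ps).
gain_iolc_16 = throughput_gain((41, 10.40), (151, 16.11), 16)
gain_unpruned_16 = throughput_gain((61, 13.92), (151, 16.11), 16)
print(round(gain_iolc_10, 2), round(gain_unpruned_10, 2),
      round(gain_iolc_16, 2), round(gain_unpruned_16, 2))
```

Since both codes are compared on the same n-bit bus, the gain reduces to the ratio of data bits multiplied by the inverse ratio of worst-case delays.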
Since codes under (C3, 1C), (C4, 2C), and (C5, 3C) have
exactly the same codebooks as OLCs, FPCs, and FOCs, their
delay, rate, and throughput are also the same. Under constraint
(C2, 1C), we propose two kinds of codes, unpruned codes and
pruned codes (IOLCs). In the following, we compare their
performance with OLCs in [5] with extensive simulations.
To compare the worst-case delay of our IOLCs, unpruned
(C2, 1C) codes, and OLCs, we simulate two buses, a 10-
bit bus and a 16-bit bus, with all transitions between any
two codewords in their codebooks and obtain the worst-case
delays of each wire. The simulation environment has been
explained in Sec. II-C. Both buses have a length of 5mm, and
τ0 = 1.42ps and λ = 12.24. The simulation results are shown
in Tabs. VI and VII, where for each CAC the largest delays
among all wires are in boldface. As commented above for
unpruned (C2, 1C) codes, the delays of the two outmost wires
are significantly greater than those of other wires. For a 10-bit
bus, the worst-case delays of our IOLC, unpruned (C2, 1C)
code, and an OLC are given by 10.14ps, 13.50ps, and 14.84ps,
respectively. The worst-case delay of our IOLC and unpruned
(C2, 1C) code are 31.67% and 9.03% smaller than that of
the OLC, respectively. For a 16-bit bus, the worst-case delays
of our IOLC, unpruned (C2, 1C) code, and an OLC are given
by 10.40ps, 13.92ps, and 16.11ps, respectively. The worst-case
delay of our IOLC and unpruned (C2, 1C) code are 35.44%
and 13.59% smaller than that of the OLC, respectively.
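The percentages quoted above follow directly from the worst-case delays in Tabs. VI and VII. A minimal Python check (the helper name `delay_reduction` is our own) is:

```python
def delay_reduction(delay, reference):
    """Relative reduction of `delay` with respect to `reference`, in percent."""
    return 100.0 * (reference - delay) / reference


# Worst-case delays in ps, taken from Tabs. VI and VII.
print(round(delay_reduction(10.14, 14.84), 2))   # IOLC vs. OLC, 10-bit bus
print(round(delay_reduction(13.50, 14.84), 2))   # unpruned vs. OLC, 10-bit bus
print(round(delay_reduction(10.40, 16.11), 2))   # IOLC vs. OLC, 16-bit bus
print(round(delay_reduction(13.92, 16.11), 2))   # unpruned vs. OLC, 16-bit bus
```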
For all simulations, our IOLCs have better delay per-
formance than OLCs. Although both IOLCs and unpruned
(C2, 1C) codes have almost the same code rate and better
TABLE VIII
COMPARISON OF CODEBOOK SIZE AND THROUGHPUT OF IOLC, UNPRUNED (C2, 1C) CODE, AND OLC [5] (λ = 12.24 AND τ0 = 1.42 PS).
            IOLC                                 (C2, 1C)                             OLC
# of wires  # of words  # of bits  Thr. gain     # of words  # of bits  Thr. gain     # of words  # of bits
5           4           2          1.55          6           2          1.10          7           2
6           5           2          1.07          7           2          0.78          9           3
7           7           2          1.02          9           3          1.14          12          3
8           8           3          1.12          11          3          0.84          16          4
9           11          3          1.10          14          3          0.84          21          4
10          12          3          1.10          17          4          1.10          28          4
11          16          4          1.18          21          4          0.88          37          5
12          18          4          1.19          26          4          0.89          49          5
13          23          4          1.03          32          5          0.96          65          6
14          27          4          1.02          40          5          0.95          86          6
15          34          5          1.27          49          5          0.95          114         6
16          41          5          1.11          61          5          0.83          151         7
delay performance than OLCs, the delay performance of
IOLCs is much better than the unpruned (C2, 1C) codes.
With a more advanced technology where the coupling effect
is significant, the improvement of our IOLCs is bigger.
The comparisons of the codebook size between our IOLCs,
unpruned (C2, 1C) codes, and OLCs [5] and the throughput
gain with respect to OLCs are shown in Tab. VIII. The
throughput gain of our CACs with respect to OLCs is given
by the ratio between the throughput of our CACs and the
throughput of OLCs. The codebook sizes of the three codes
are close. In all cases, the difference of the number of bits
between our IOLCs and unpruned (C2, 1C) codes is within 1
bit. The difference of the number of bits between our IOLCs
and OLCs [5] is within 2 bits for n ≤ 16. With respect to
throughput, our IOLCs always have a greater throughput than
OLCs, and their throughput gain ranges from 1.02 to 1.55
for an n-wire bus (5 ≤ n ≤ 16). The unpruned (C2, 1C)
codes have better throughput than OLCs in some cases, and
their throughput gain ranges from 0.78 to 1.14 for an n-wire
bus (5 ≤ n ≤ 16), as listed in Tab. VIII. When unpruned (C2, 1C) codes have a
lower throughput than OLCs, IOLCs can be used.
Our IOLCs and unpruned (C2, 1C) codes provide additional
options for the tradeoff between code rate and code delay. In
addition to achieving higher throughputs, the new CACs are
also appropriate for interconnects where the delay is of top
priority.
It has been shown that the encoding and decoding of OLCs,
FPCs, and FOCs have quadratic complexity based on numeral
systems [11]. Since codes under (C3, 1C), (C4, 2C), and
(C5, 3C) have exactly the same codebooks as OLCs, FPCs,
and FOCs, their CODECs also have quadratic complexity.
Also, it is expected that the encoding and decoding of our
IOLCs and unpruned (C2, 1C) codes have a quadratic com-
plexity, since the codebooks of our IOLCs and unpruned
(C2, 1C) codes are proper subsets of OLCs.
We remark that the simulation results in Sections II-C and
IV are all based on a 45nm CMOS technology. We have also
run the same set of simulations based on a 0.1-µm technology
(omitted for brevity). Between the two sets of simulation
results, the main conclusions of the manuscript and the key
features of our proposed classification and CACs remain the
same. For instance, the delays of the patterns in different
classes do not overlap, regardless of the technology. The
proposed CACs based on the new classification are also the
same. This demonstrates that our approach to delay
classification and CACs is applicable to a wide variety of
technologies. This is because in our approach, the dependency of
the crosstalk delay on the technology is represented by the two
parameters, the propagation delay τ0 of a wire free of crosstalk
and the coupling factor λ. Since our analytical approach to
the classification and CACs treats these two parameters as
variables, our approach can be easily adapted to a wide variety
of technology.
V. CONCLUSIONS
In this paper, we propose a new classification of transition
patterns. The new classification has finer classes and the
delays do not overlap among different classes. Hence the new
classification is conducive to the design of CACs. To illustrate
this, we design a family of CACs with different constraints.
Some codes of the family are the same as existing codes,
OLCs, FPCs, and FOCs. We also propose two new CACs with
a smaller worst-case delay and better throughput than OLCs.
Since our analytical approach to the classification and CACs
treats the technology-dependent parameters as variables, our
approach can be easily adapted to a wide variety of technology.
APPENDIX
Proof of Lemma III.1: The eigenvalues of D are given
by solving det(λI − D) = 0, that is,
$\lambda^7 - \lambda^5 - \lambda^4 = 0$.
By the Cayley-Hamilton theorem, D satisfies its own characteristic equation. Then,
$D^7 = D^5 + D^4$
⇒ $V D^7 V^T = V D^5 V^T + V D^4 V^T$
⇒ |C(n)| = |C(n − 2)| + |C(n − 3)|.
For n = 5, 6, 7, the boundary conditions can be obtained
by Eq. (8) as |C(5)| = 7, |C(6)| = 9, and |C(7)| = 12. Thus,
the lemma holds for n ≥ 8.
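The matrix identity used in this proof can be checked numerically. The Python sketch below takes D(C3,1C) from Tab. V, verifies $D^7 = D^5 + D^4$ with plain integer arithmetic, and confirms the resulting recursion on the counts $V D^{n-5} V^T$; the helper names are our own.

```python
# D(C3,1C), copied from Tab. V.
D = [
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]


def mat_mul(a, b):
    """Product of two square integer matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(a, exponent):
    """a raised to a non-negative integer power (exponent 0 gives I)."""
    size = len(a)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(exponent):
        result = mat_mul(result, a)
    return result


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def size(n):
    """V D^(n-5) V^T, i.e. the sum of all entries of D^(n-5)."""
    return sum(map(sum, mat_pow(D, n - 5)))


identity_holds = mat_pow(D, 7) == mat_add(mat_pow(D, 5), mat_pow(D, 4))
recursion_holds = all(size(n) == size(n - 2) + size(n - 3) for n in range(8, 17))
print(identity_holds, recursion_holds, [size(n) for n in range(5, 11)])
```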
Proof of Theorem III.1: It has been shown that an (n+1)-
bit OLC codebook C(n+1) can be constructed from an n-bit
codebook C(n) [5]. The necessary and sufficient condition
for OLCs defines the same expansion matrix as our codes.
The OLC construction is the same as that of our codes under
(C3, 1C) shown in Alg. 1. For n = 5, the OLC codebooks
are the same as our codes under (C3, 1C). So, for an n-bit
bus (n ≥ 5), codes under constraint (C3, 1C) are the same
as OLCs. For an n-bit bus (n ≤ 4), the constraint (C3, 1C)
reduces to 1C, and leads to the same codebooks as OLCs.
Hence, our codes under (C3, 1C) have the same codebooks
as OLCs, which implies that Gn = |C(n)|.
Proof of Lemma III.2: The eigenvalues of D are given
by solving det(λI − D) = 0. Then,
det(λI − D) = 0
⇒ $D^{16} = 2D^{15} - D^{14} + D^{12}$
⇒ $V D^{16} V^T = 2 V D^{15} V^T - V D^{14} V^T + V D^{12} V^T$
⇒ |C(n)| = 2|C(n − 1)| − |C(n − 2)| + |C(n − 4)|.
For n = 5, 6, 7, 8, the boundary conditions can be obtained
by Eq. (9) as |C(5)| = 16, |C(6)| = 26, |C(7)| = 42, and
|C(8)| = 68. Thus, the lemma holds for n ≥ 9.
Proof of Theorem III.2: It has been shown that an (n+1)-
bit FPC codebook C(n+1) can be constructed from an n-bit
codebook C(n) [4]. The sufficient condition (FP condition)
for FPCs defines the same expansion matrix as our codes.
The FPC construction is the same as that of our codes under
(C4, 2C) shown in Alg. 1. For n = 5, the FPC codebooks
are the same as our codes under (C4, 2C). So, for an n-bit
bus (n ≥ 5), codes under constraint (C4, 2C) are the same
as FPCs. For an n-bit bus (n ≤ 4), the constraint (C4, 2C)
reduces to 2C, and leads to the same codebooks as FPCs.
Hence, our codes under (C4, 2C) have the same codebooks
as FPCs, which implies that 2Fn+1 = |C(n)|.
Proof of Lemma III.3: The eigenvalues of D are given
by solving det(λI − D) = 0. Then,
det(λI − D) = 0
⇒ $D^{24} = D^{23} + D^{22} + D^{21}$
⇒ $V D^{24} V^T = V D^{23} V^T + V D^{22} V^T + V D^{21} V^T$
⇒ |C(n)| = |C(n − 1)| + |C(n − 2)| + |C(n − 3)|.
For n = 5, 6, 7, the boundary conditions can be obtained
by Eq. (10) as |C(5)| = 24, |C(6)| = 44, and |C(7)| = 81.
Thus, the lemma holds for n ≥ 8.
Proof of Theorem III.3: It has been shown that an (n+1)-
bit FOC codebook C(n+1) can be constructed from an n-bit
codebook C(n) [4]. The necessary and sufficient condition
(FO condition) for FOCs defines the same expansion matrix
as our codes. The FOC construction is the same as that of our
codes under (C5, 3C) shown in Alg. 1. For n = 5, the FOC
codebooks are the same as our codes under (C5, 3C). So, for
an n-bit bus (n ≥ 5), codes under constraint (C5, 3C) are
the same as FOCs. For an n-bit bus (n ≤ 4), the constraint
(C5, 3C) reduces to 3C, and leads to the same codebooks
as FOCs. Hence, our codes under (C5, 3C) have the same
codebooks as FOCs, which implies that Tn+2 = |C(n)|.
Proof of Lemma III.4: The eigenvalues of D are given
by solving det(λI − D) = 0. By the Cayley-Hamilton theorem, D satisfies its own characteristic equation. Then,
det(λI − D) = 0
⇒ $D^6 = D^4 + D$
⇒ $V D^6 V^T = V D^4 V^T + V D V^T$
⇒ |C(n)| = |C(n − 2)| + |C(n − 5)|.
For n = 5, 6, 7, 8, 9, the boundary conditions can be
obtained by Eq. (11) as |C(5)| = 6, |C(6)| = 7, |C(7)| = 9,
|C(8)| = 11, and |C(9)| = 14. Thus, the lemma holds for
n ≥ 10.
Proof of Lemma III.5: As shown in Tab. IV, $C_5^i$ under
(C2, 1C) is a subset of $C_5^i$ under (C3, 1C) for i = 0, 1. Thus,
the valid expansions from $C_5^i$ to $C_5^{1-i}$ under (C2, 1C) are a subset
of those under (C3, 1C). So, for an n-bit bus, C(C2,1C)(n) ⊂
C(C3,1C)(n). According to Thm. III.1, the n-bit codebook
C(C2,1C)(n) is a subset of an OLC codebook.
Proof of Lemma III.7: Since the IOLC codebook is a
subset of the unpruned codebook under (C2, 1C), this follows
from Lemma III.5.
REFERENCES
[1] [Online], “International technology roadmap for semiconductors,” avail-
able at http://www.itrs.net/Links/2011ITRS/Home2011.htm.
[2] P. P. Sotiriadis and A. Chandrakasan, “Reducing bus delay in submicron
technology using coding,” Proceedings of the Asia and South Pacific
Design Automation Conference, pp. 109–114, February 2001.
[3] P. P. Sotiriadis, “Interconnect modeling and optimization in deep sub-
micron technologies,” Ph.D. Dissertation, Massachusetts Institute of
Technology, 2002.
[4] C. Duan, A. Tirumala, and S. Khatri, “Analysis and avoidance of cross-
talk in on-chip buses,” The Ninth Symposium on High Performance
Interconnects (HOTI ’01), pp. 133–138, August 2001.
[5] C. Duan and S. Khatri, “Exploiting crosstalk to speed up on-chip
buses,” Proceedings of the Conference on Design Automation and Test
in Europe, vol. 2, pp. 778–783, February 2004.
[6] B. Victor and K. Keutzer, “Bus encoding to prevent crosstalk delay,”
Proc. IEEE/ACM International Conference on Computer-Aided Design,
pp. 57–63, 2001.
[7] S. Sridhara, G. Balamurugan, and N. Shanbhag, “Joint equalization and
coding for on-chip bus communication,” IEEE Trans. VLSI Systems,
vol. 16, no. 3, pp. 314–318, March 2008.
[8] X. Wu, Z. Yan, and Y. Xie, “Two-dimensional crosstalk avoidance
codes,” in Proc. IEEE Workshop on Signal Processing Systems (SiPS),
pp. 106–111, October 2008.
[9] C. Duan, C. Zhu, and S. P. Khatri, “Forbidden transition free crosstalk
avoidance codec design,” Proceedings of annual Design Automation
Conference, pp. 986–991, 2008.
[10] C. Duan, V. H. C. Calle, and S. P. Khatri, “Efficient on-chip crosstalk
avoidance codec design,” IEEE Trans. VLSI Systems, vol. 17, no. 4, pp.
551–560, April 2009.
[11] X. Wu and Z. Yan, “Efficient CODEC designs for crosstalk avoidance
codes based on numeral systems,” IEEE Trans. VLSI Systems, vol. 19,
no. 4, pp. 548–558, April 2011.
[12] F. Shi, X. Wu, and Z. Yan, “Improved analytical delay models for
coupled interconnects,” in Proc. IEEE Workshop on Signal Processing
Systems (SiPS), pp. 134–139, October 2011.
[13] T. Sakurai, “Closed-form expressions for interconnection delay, cou-
pling, and crosstalk in VLSI’s,” IEEE Transactions on Electron Devices,
vol. 40, no. 1, pp. 118–124, January 1993.
[14] [Online], “Pdk for the 45nm technology,” available at
http://www.eda.ncsu.edu/wiki/FreePDK.
[15] [Online], “Predictive technology model (PTM),” available at
http://ptm.asu.edu.
[16] S. Sridhara, A. Ahmed, and N. Shanbhag, “Coding for reliable on-
chip buses: A class of fundamental bounds and practical codes,” IEEE
Transactions on Computer Aided Design Integrated Circuits System,
vol. 26, no. 5, pp. 977–982, May 2007.
[17] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide
to the Theory of NP-Completeness. W. H. Freeman and Company, New
York, 1979.
[18] S. R. Sridhara, A. Ahmed, and N. R. Shanbhag, “Area and energy
efficient crosstalk avoidance codes for on-chip buses,” in Proc. Int.
Conference on Computer Design, pp. 12–17, 2004.
