Staggered latch bus: A reliable offset switched architecture for long on-chip interconnect by Eze, M. et al.
Staggered Latch Bus: A Reliable Offset Switched
Architecture for Long On-Chip Interconnect
Melvin Eze
Dept. of Comp Sci and Eng.
Pennsylvania State University
University Park, PA 16802, USA
email: eze@cse.psu.edu
Ozcan Ozturk
Dept. of Comp Eng.
Bilkent University
06800 Bilkent, Ankara, Turkey
email: ozturk@cs.bilkent.edu.tr
Vijaykrishnan Narayanan
Dept. of Comp Sci and Eng.
Pennsylvania State University
University Park, PA 16802, USA
email: vijay@cse.psu.edu
Abstract—Due to architectural complexity and process costs,
circuit-level solutions are often the preferred means of resolving
signal integrity issues that affect the performance and reliability
of on-chip interconnect. In this paper, we consider multi-segment
bit-lines used in wide on-chip interconnect, and explore in detail
the effect of signal transition skew on the delay and time of flight
in the presence of crosstalk. We present the relationship between
segment delay, signal transition skew and the injected noise pulse
and propose a novel staggered latch bus architecture to explicitly
exploit transition skew for improved speed and performance. Our
proposed SLB architecture achieves an average of 2.5X (2.3X)
improvement in speed for fully-aligned (mis-aligned) buffering
schemes with no increase in area, power or additional wires
needed.
I. INTRODUCTION
Feature scaling has been key to sustaining the exponential
growth in chip performance over the decades. However, as on-
chip dimensions cross the 100 nm threshold, various signal
integrity challenges are threatening to limit this trend [1].
The adoption of high aspect ratio metal layers, to mitigate
the inverse relationship between process scaling and metal
resistance, leads to the formation of large implicit coupling
capacitances between physically separated metal traces. These
large capacitances create non-negligible electrical interference
and crosstalk noise that can distort signals on neighboring
metal wires. The effect of this crosstalk noise is particularly
critical in the design and performance of multi-bit on-chip
interconnect structures such as buses, network links and memory
bit-lines, which provide communication between functional
blocks, memory elements, I/O pins, etc.
Crosstalk-induced delay refers to the change in the effective
switching speed of a coupled metal line caused by signal
activity on neighboring lines. Line delay varies with the
direction and the relative temporal overlap of the neighboring
transitions. Traditionally, the use of simultaneous latching in
synchronous circuits forces a built-in temporal overlap in
signal transitions. When transition directions on neighboring
lines coincide, line delay falls below nominal; when they
diverge, line delay rises above nominal; otherwise line delay
is nominal. In multi-bit interconnect design, worst-case delay
margins are necessary to guarantee transmission reliability, at
the cost of a small performance tradeoff. Future technology
nodes promise increasingly higher inter-metal coupling, which
will require larger delay margins to guarantee reliability and
will therefore impose even larger tradeoffs.
[Figure 1 inset: ITRS data on Cu interconnect]
Tech (nm)            16     22     32     45
Cc/Cm ratio          2.0    1.7    1.5    1.2
Permittivity (εr)    2.3    2.5    2.8    2.9
[Figure 1 annotations: negative offset (α < 0), α = 0, positive offset]
Fig. 1. Delay as a function of inter-signal offset for a 1mm copper line with
both neighbors switching in opposite direction.
In this paper, we analyze the effect of offset switching
on line delay in comparison to the traditional simultaneous
switching approach. We show that offset switching is superior
in the context of highly coupled, segmented bit-lines. We further
present a novel staggered latching mechanism for seamless
synchronous operation and demonstrate the improved error
rates over traditional methods. Our solution requires zero
additional wires, and no appreciable increase in power and
area. The rest of this paper is organized as follows: Section
2 discusses some selected related work, Section 3 introduces
the effect of switching offset in electrically coupled bit-lines.
Section 4 introduces staggered latching as a novel clocking
scheme. Section 5 presents experiments and results and Section
6 concludes the paper.
II. RELATED WORK
Signal crosstalk in on-chip interconnect due to adjacent-wire
capacitive coupling has received much attention in the
literature. Efficient methods for extracting and characterizing
wire resistance, ground capacitance and coupling capacitance
for both local and global wires are well known [2]. Closed-form
expressions for modeling local interconnect delay in the
presence of coupling have been proposed, and numerically
efficient methods for electronic design automation (EDA)
purposes have also been published [3], [4]. The use of Miller
capacitance models for inter-metal coupling capacitance, as
proposed for fast delay calculations in [5], introduces
non-negligible inaccuracies at feature sizes below 50 nm. As a
result, complex noise superposition models, which have been
shown to offer more reliable delay estimates in the presence
of crosstalk, have been developed [4], [6]. In [7], closed-form
expressions for the total noise waveform due to all active
neighbors of a wire were proposed for local wires, but they
extend seamlessly to long-wire interconnect structures.

Fig. 2. Segment with maximum overlap to its adjacent neighbors. Transitions
at A will generate a response transition at B composed of the superposition
of the direct response and any noise injected from adjacent segments.

978-1-4799-0524-9/13/$31.00 ©2013 IEEE

Crosstalk-induced delay is skew-dependent. Skew-dependent
delay fluctuations are due to variations in the temporal
overlap between a transitioning signal and the noise waveform
from various neighboring aggressors: the larger the overlap,
the larger the change in the delay [8]. In [9], a similar bus
delay reduction technique is proposed that deliberately
introduces transition skew between adjacent wires on a bus.
The authors used the Miller capacitance method and assumed a
one-way aggressor-victim model for the key delay analysis. This
approach over-simplifies the multi-way aggressor-victim reality
and, as a result, is insufficient for sub-50 nm processes.
Our proposed approach makes two key contributions: First,
using the noise model proposed in [7] for the crosstalk noise
waveform, we propose an efficient corresponding delay model
as a function of inter-signal input skew for a typical segment
in a multi-segment interconnect (MSI). Second, unlike in [9],
we propose staggered latching, a novel synchronous clocking
strategy that efficiently leverages skewed switching with no
additional bus wire overhead. This improves performance in
the presence of large coupling capacitance in wide MSI.
III. SIGNAL SWITCHING OFFSET AND LINE DELAY
In this section, we develop analytical models for line delay
as a function of the switching offset between closely spaced
metal traces.
A. Signal Response Model
The signal delay of an n-bit MSI is limited by the slowest
bit-line. The response time on the slowest line depends on the
resistance and capacitance measured both to the ground plane
and to the adjacent lines.
A generalized coupling structure for a multi-segment
interconnect is shown in Figure 2. The variable m represents
the mis-alignment factor between adjacent segments on
neighboring bit lines; setting m = 0 or m = 0.5 allows for the
modeling of either a fully-aligned or mis-aligned strategy
respectively. The test segment of interest is in the middle.
The goal is to obtain analytical expressions of the overlap
threshold for both arrangement strategies.

Fig. 3. Lumped RC model for the general coupled segment. Immediate
neighbors are shown; farther segments are assumed grounded.
[Figure 3 annotation: v4 = 1 − v2(t − T0)]

B. Nominal Response
An RC model for the coupled segments in Figure 2 is
shown in Figure 3. The total response vb in Figure 3 is a
superposition of the direct response due to the primary input
va, and the total noise injected by the secondary inputs v1, v2,
v3 and v4 through the coupling capacitances. We first obtain
the noise-free response and the analytical expression for the
corresponding nominal delay.
1) Drive buffer/Repeater: The segment driver is a large
inverter with minimum-length (λ) transistors. The pull-up
PMOS transistor of size Wp is selected to approximately match
the performance of the pull-down NMOS transistor of size Wn.
If we define the ratio of transistor widths Xp = Wp/Wn and
the capacitance per square for a minimum-length MOSFET
(C′ox), we can obtain the values shown in Table I from ITRS
data [1], Figure 1 and HSPICE characterization runs.
The gate capacitance (Cgate), diffusion capacitance (Cdiff)
and drive resistance (Rdrv) are then modeled by the equations
$C_{gate} = 1.5 \cdot C'_{ox} W_n (1 + X_p)$, $C_{diff} = C'_{ox} W_n (1 + X_p)$, $R_{drv} = R'_n / W_n$.
The characteristic driver delay (tDdrv) is given by the model
in equation 1.
$t_{Ddrv} = 2.5 \cdot R'_n C'_{ox} (1 + X_p)$ (1)
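Equation 1 is consistent with reading the characteristic driver delay as the drive resistance multiplied by the driver's own gate plus diffusion capacitance. That reading is an assumption made here for illustration, since the text states only the result; under it, the three component models combine as follows.

```latex
\begin{align}
t_{Ddrv} &= R_{drv}\,\bigl(C_{gate} + C_{diff}\bigr) \\
         &= \frac{R'_n}{W_n}\,\bigl(1.5 + 1\bigr)\,C'_{ox} W_n (1 + X_p) \\
         &= 2.5 \cdot R'_n C'_{ox} (1 + X_p)
\end{align}
```

The width Wn cancels, so in this model the characteristic driver delay depends only on the process constants R′n and C′ox and on the width ratio Xp.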
2) Metal trace: The direct response is obtained from an
s-domain analysis of the circuit in Figure 3. The secondary
inputs are set to zero, and a unit step is applied to the primary
input va. The product of the total lumped resistance (RT) and
the lumped capacitance to ground (CT) is the intrinsic RC
constant (τ) of the line segment. The coupling capacitance is
defined in terms of CT and a weighting factor (η), Cc = η·CT.
$V_{bN}(s) = \pm \frac{1 + s(1 + 2\eta)\tau}{s \cdot D(\eta)}$ (2)






ox pm arm arv
45 nm 8 18.4k 3.3 54.7 aF 102n 1.8 1.6
32 nm 6 22.8k 3.2 32.4 aF 61n 1.9 1.7
22 nm 10 28.7k 1.7 20.8 aF 43n 2.0 1.8













Fig. 4. General shape and characteristics of the Injected noise pulse for (a)
aligned segments and (b) misaligned segments
where τ = RTCT and in the denominator
D(η) =
(
1 + 2s(1 + 2η)τ + s2
(





The total resistance RT, shown in equation 4, is the sum of the
metal resistance Rm and the driver switching resistance Rdrv.
Likewise, the total capacitance to ground for a given segment,
shown in equation 5, is the sum of the contributions from the
metal and the driver.
$R_T = R_m + R'_n / W_n$ (4)
$C_T = C_m + 2.5 \cdot C'_{ox} W_n (1 + X_p)$ (5)
Applying the inverse Laplace transform to VbN(s), we obtain
the general, normalized, time-domain signal form
$v_{bN}(t) = \pm\left(1 - A_0 \cdot e^{-t/(\tau G_0)}\right) u[t]$ (6)
where the constants G0 and A0 are obtained via Padé
approximation and coefficient matching of the s-domain
polynomials (see Table II). The nominal delay is obtained by
solving for the Vdd/2 crossing point of equation 6.
$t_{Dnom} = G_0 \tau \cdot \ln(2A_0)$ (7)
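As an illustration of equations 6 and 7, the short Python sketch below evaluates G0 and A0 from their Table II expressions and checks that the noise-free response crosses 0.5 at the nominal delay. The function names and the 10 ps value of τ are hypothetical and chosen only for the example; they are not taken from the paper.

```python
import math


def model_constants(eta):
    """Return (G0, A0) from Table II for a coupling weight eta, where Cc = eta * CT."""
    g0 = 1.0 + 2.0 * eta
    a0 = (1.0 + 4.0 * eta + 5.0 * eta ** 2) / (1.0 + 4.0 * eta + 6.0 * eta ** 2)
    return g0, a0


def noise_free_response(t, tau, eta):
    """Normalized rising response of equation 6 for t >= 0."""
    g0, a0 = model_constants(eta)
    return 1.0 - a0 * math.exp(-t / (tau * g0))


def nominal_delay(tau, eta):
    """Vdd/2 crossing time of equation 6, i.e. equation 7."""
    g0, a0 = model_constants(eta)
    return g0 * tau * math.log(2.0 * a0)


if __name__ == "__main__":
    tau = 10e-12  # hypothetical intrinsic RC constant of 10 ps
    for eta in (0.0, 1.0, 2.0):
        t_d = nominal_delay(tau, eta)
        # The response evaluated at the nominal delay should be 0.5.
        print(eta, t_d, noise_free_response(t_d, tau, eta))
```

With η = 0 the constants reduce to G0 = A0 = 1 and the delay becomes the familiar τ·ln 2 of an uncoupled single-pole RC segment.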
C. Noise Response
The RC model in Figure 3 is easily modified for noise
signal extraction by grounding the primary input va and driving
the secondary inputs v1, v2 and v3, v4 with unit step signals.
Note, however, that because inverting repeaters are used, the
transition direction of v3 (v4) is opposite to that of v1 and
v2 and shifted in time by T. The injected noise vb thus
comprises two components, one in-phase and the other
counter-phase (see fig. 4(b)). The noise transient parameters
depend on the segment RC characteristics, the coupling
capacitance (Cc), and the actual number of switching neighbors
(SW). For a planar 2D layout with a maximum of two closest
neighbors (i.e. SW = 0, 1, 2), the general noise response in
eqn. 8 is obtained from an s-domain analysis of the modified
fig. 3 circuit.









If we define a generalized noise pulse response for the variable
t and the model constants τ and G1(η) as vη(t) in eqn 9
$v_\eta(t) = \pm\, t \cdot e^{-t/(\tau G_1(\eta))} \cdot u[t]$ (9)
then the inverse Laplace transform of eqn. 8 yields a
corresponding normalized, time-domain noise response vb in
eqn. 10.
$v_b(t) = A_1(\eta_A) \cdot v_{\eta_A}(t) - A_1(\eta_M) \cdot v_{\eta_M}(t - T)$ (10)
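TABLE II. SUMMARY OF MODEL CONSTANTS AS FUNCTION OF η
$G_0 = 1 + 2\eta$
$A_0 = \frac{1 + 4\eta + 5\eta^2}{1 + 4\eta + 6\eta^2}$
$G_1 = \ldots\ (1 + 2\eta)(1 + 4\eta + 2\eta^2)$
$G_2 = \frac{1 + 2\eta(2 + 3\eta + SW(1 + 2\eta))}{1 + (2 + SW)\eta}$
$A_2 = \frac{1 + (2 + SW)\eta}{G_2}$
$G_3 = \ldots$
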
In general, ηA = (1 − m) · η and ηM = m · η. However,
focusing on fully-aligned (m = 0) or mis-aligned (m = 0.5)
segments, the model constants A1 and G1 are obtained by
substituting (η = ηA) or (η = ηM) in the expressions in Table II.
1) Noise Duration: The noise pulse vb in eqn 10 has a last
zero-crossing time z0 and a last absolute maximum (ñ) at time
t̃. The pulse has a duration (dk) measured in terms of non-zero
integer (k) multiples of G1τ, i.e. for a specified noise limit
(Nlim) and for all integers k larger than k0, vb(dk) ≤ Nlim.






These parameters can be calculated from vb(t) for any chosen
value of m.
$\tilde{t} = z_0 + G_1\tau, \quad z_0 = \begin{cases} 0 & m = 0 \\ T + T/(e^{T/G_1\tau} - 1) & m = 0.5 \end{cases}$ (12)
The constant T is obtained by analyzing the circuit models in
fig 2 and fig 3.
$T = t_{Ddrv} + G_0\tau \cdot \ln(2A_0)$ (13)
If we also express the time shift as T = j · G1τ, where j > 0,
then the last maximum value (ñ) of |vb(t)| can be calculated
(see eqn 14).
$\tilde{n} = \begin{cases} A_1 G_1 \tau \left|1/e\right| & m = 0 \\ A_1 G_1 \tau \left| \frac{1 - e^{j}}{e} \cdot e^{-j\left(\frac{e^{j}}{e^{j} - 1}\right)} \right| & m = 0.5 \end{cases}$ (14)
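The closed forms in equations 12 and 14 can be checked numerically against the pulse of equation 10. The Python sketch below does this under two stated assumptions: the positive sign is taken in equation 9, and for m = 0.5 both terms share the same A1 and G1 (which follows from ηA = ηM = η/2). The function names are hypothetical, and `g` stands for the product G1·τ.

```python
import math


def noise_pulse(t, a1, g, shift=None):
    """Equation 10 with g = G1 * tau.

    shift=None models the single in-phase pulse (m = 0); otherwise a
    counter-phase copy delayed by `shift` (the constant T) is subtracted,
    using the same a1 and g for both terms (m = 0.5).
    """
    def v(x):
        # Generalized pulse of equation 9 with the positive sign.
        return x * math.exp(-x / g) if x >= 0.0 else 0.0

    total = v(t)
    if shift is not None:
        total -= v(t - shift)
    return a1 * total


def last_zero_crossing(g, shift):
    """z0 of equation 12 for m = 0.5."""
    return shift + shift / (math.exp(shift / g) - 1.0)


def last_peak(a1, g, shift=None):
    """Return (time, magnitude) of the last extremum, equations 12 and 14."""
    if shift is None:
        return g, a1 * g / math.e
    j = shift / g
    ej = math.exp(j)
    magnitude = a1 * g * abs((1.0 - ej) / math.e * math.exp(-j * ej / (ej - 1.0)))
    return last_zero_crossing(g, shift) + g, magnitude


if __name__ == "__main__":
    a1, g, shift = 1.0, 1.0, 1.5  # hypothetical, unit-free values
    t_peak, n_peak = last_peak(a1, g, shift)
    print(last_zero_crossing(g, shift), t_peak, n_peak)
    print(abs(noise_pulse(t_peak, a1, g, shift)))  # should match n_peak
```

Evaluating the pulse at the predicted peak time reproduces the closed-form magnitude, which supports reading the first factor of equation 14 as (1 − e^j)/e.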
D. Delay, Offset and Overlap Threshold
In general, signal delay on the middle segment in figure 2 is
defined as the time difference between the last 0.5Vdd crossing
points measured from the signal va to vb. Since the signal
vb is a superposition of the direct response and the injected
noise, the signal delay is a function of the degree of temporal
overlap between them. If we define a variable alpha (α) as the
offset between switching events at the input of the signal va
and of any adjacent segments, then the signal-to-noise overlap
at the output vb, and consequently the signal delay tD(α), can
be expressed in terms of α. For large enough absolute offset
values, the overlap at the output between the transition event
of the direct response and the duration of the injected noise
is zero. This results in a signal delay that is indistinguishable
from a noise-free delay. The smallest absolute offset value for
which this condition is true is defined as the offset Overlap
Threshold (αOS). We can calculate this value by solving for t
using the normalized voltage eqn 15.
$v_b(t, \alpha) = r(t) + n(t - \alpha) = 0.5$ (15)
Using an intermediate variable sigma (σ = t − α), we can define
a parametric relationship r(t(σ)) = 0.5 − n(σ). The signal
r(t) is the noise-free response from eqn 6. The noise signal
n(t), depending on the design, is either the fully aligned or
the misaligned noise pulse signal from eqn 10. Solving for t and
α in terms of the variable σ, we obtain the parameterized delay
and offset eqns 16.
$t(\sigma) = t_{Dnom} + G_0\tau \cdot \ln\left(1/(1 - 2 \cdot n(\sigma))\right)$
$\alpha(\sigma) = t(\sigma) - \sigma$ (16)
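The parametric pair in equation 16 can be traced numerically to see how delay varies with offset. The Python sketch below is a minimal illustration, assuming n(σ) is supplied as the magnitude of the opposing noise (which is what the 1 − 2n term implies) and that it stays below 0.5. The function names, the aligned pulse shape and all numeric values are hypothetical.

```python
import math


def nominal_delay(g0, a0, tau):
    """Equation 7."""
    return g0 * tau * math.log(2.0 * a0)


def delay_and_offset(sigma, noise, g0, a0, tau):
    """Equation 16: return (t, alpha) for the parameter sigma.

    `noise` is a callable giving the normalized noise magnitude n(sigma).
    """
    n = noise(sigma)
    if n >= 0.5:
        raise ValueError("noise magnitude must stay below 0.5 for a Vdd/2 crossing")
    t = nominal_delay(g0, a0, tau) + g0 * tau * math.log(1.0 / (1.0 - 2.0 * n))
    return t, t - sigma


if __name__ == "__main__":
    # Hypothetical constants: eta = 1 gives G0 = 3 and A0 = 10/11 (Table II).
    g0, a0, tau = 3.0, 10.0 / 11.0, 10e-12
    a1, g = 5e9, 20e-12  # hypothetical A1 and G1 * tau for an aligned pulse

    def aligned_pulse(sigma):
        # Single in-phase pulse of equation 9 used as the noise magnitude.
        return a1 * sigma * math.exp(-sigma / g) if sigma >= 0.0 else 0.0

    for k in range(0, 11):
        sigma = k * 20e-12
        t, alpha = delay_and_offset(sigma, aligned_pulse, g0, a0, tau)
        print(f"sigma={sigma:.2e}  delay={t:.3e}  offset={alpha:.3e}")
```

Setting n(σ) = Nlim in the same expression gives tDnom + G0τ·ln(1/(1 − 2·Nlim)) = G0τ·ln(2A0/(1 − 2·Nlim)), which is the OS bound that appears in equation 19.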
For a given design and a specified noise limit Nlim, the cor-
responding dk can be obtained using equation 11. Substituting
into eqn 16 the following values: σ = dk and n(dk) ≤ Nlim
we obtain an expression for the overlap threshold for a chosen








In any MSI, regardless of alignment strategy, αOS represents
the minimum mutual signal-transition offset between any set
of coupled segments that assures nominal signal delay on
both segments. For comparative analysis, the 0−90% segment
transition time (αSS) is derived for simultaneously switched
MSI using eqn 6 and the constants from Table II. The
constant tuple (G, A) for a noise-free nominal transition is
chosen as (G0, A0). For noisy transitions, (G2, A2) and (G3, A3) are








The worst case segment delay for simultaneous/offset switching,
considering all coupling noise, is shown in equation 19.
$t_{Dmax} \le \begin{cases} \ldots & \text{SS} \\ G_0\tau \cdot \ln\left(2A_0/(1 - 2 \cdot N_{lim})\right) & \text{OS} \end{cases}$ (19)
Using these analytical models, the potential speedup can be
estimated for an M5, 0.25mm long, 6-line (4 signal, 2 grounded
dummy), 5-segment MSI, using only metal and drive buffer
RC parameters from current/predictive BEOL processes (see
Table I). Choosing Nlim = 0.05, a data stream with a regular
Gaussian transition distribution (fig. 6) and an offset > αM,
the model estimates an average speedup of 2.05X (1.70X) over
an SS-MSI for an aligned (misaligned) OS-MSI. Table III
shows the OS speedup results compared to $F_{SS} = t_{Dmax}^{-1}$
across sub-50nm processes.
IV. MULTIPLE PHASE STAGGERED LATCHING
In this section, we propose a b-bit wide Multiple Phase
Staggered Latch (MPSL[b]) interconnect architecture that ex-
ploits offset switching to achieve improved crosstalk perfor-
mance.
TABLE III. PREDICTED FREQ SPEEDUP (OS/SS) FOR A 5-SEG OS BUS
             Aligned MSI                  Misaligned MSI
Tech (λ)     FSS (Hz)   OS/SS   αM        FSS (Hz)   OS/SS   αM
45 nm        0.96G      1.8X    92ps      1.44G      1.53X   89ps
32 nm        0.76G      2.0X    97ps      1.14G      1.68X   90ps
22 nm        1.08G      2.1X    64ps      1.63G      1.75X   58ps











[Figure 5 labels: SEND-SIDE, RECV-SIDE, MSI b-BITLINES, LATCH CONTROL CIRCUIT, LATCH CONTROL SIGNALS (φm, φb, φb,1, φb,2, φb,b-1), reset, l - l/b, b-1]
Fig. 5. b-bit Multiple Phase Staggered Latch (MPSL) architecture for
systematic offset switching in synchronous data transmission
The top level architecture of an MPSL interconnect, shown
in figure 5, has two key sets of clocked latches: the Interfacing
(IF) latches and the Offset-Tuning (OT) latches. The IF latches
are two sets of single-stage latches, each b bits wide, with
one set placed at each boundary to bridge the clock transition
points of the enclosed structure with the send-side and the
receive-side logic. They are included to provide (where
absent) explicit electrical isolation and to avoid signal races.
The OT latches connect the IF latches with the physical bit
lines. Using numbered bit positions [1,2,..,j,j + 1,...,b], each
bit line contains exactly b OT latch stages in total.
Specifically, the OT latches are arranged on the j-th bit line
such that j and (b − j) latches are placed at the send-side and
receive-side respectively. This results in a staggered
configuration and effectively achieves offset insertion at the
send-side and resynchronization/offset removal at the
receive-side. Note that the total number of latches traversed,
end-to-end, is exactly equal for every bit position. The
parallel MSI bit lines that form the physical connection
between the send and receive side can be arranged either in an
aligned or in a misaligned configuration.
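The staggered OT latch arrangement described above can be tabulated with a few lines of Python. This is a behavioral sketch only: it assumes that every OT latch stage contributes exactly one phase time of delay and that all bit lines share the same line delay, neither of which is stated in this exact form in the text. The function names and the integer picosecond values are hypothetical.

```python
def ot_latch_plan(b):
    """Return (bit position j, send-side latches, receive-side latches) for j = 1..b."""
    return [(j, j, b - j) for j in range(1, b + 1)]


def launch_and_arrival(b, phase_time, line_delay):
    """Launch and arrival time per bit position.

    Assumes (for illustration) one phase_time of delay per OT latch stage
    and an identical line_delay on every bit line.
    """
    rows = []
    for j, send, recv in ot_latch_plan(b):
        launch = send * phase_time                       # offset inserted at the send side
        arrival = launch + line_delay + recv * phase_time  # offset removed at the receive side
        rows.append((j, launch, arrival))
    return rows


if __name__ == "__main__":
    # Hypothetical MPSL[4] with a 100 ps phase time and a 450 ps line delay.
    for j, launch, arrival in launch_and_arrival(4, 100, 450):
        print(f"bit {j}: launches at {launch} ps, arrives at {arrival} ps")
```

Under these assumptions consecutive bit positions launch one phase time apart, while every bit reaches the receive-side IF latch at the same time because each path traverses b latch stages in total.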
All latches are two-state, sample/hold, clock level-sensitive
latches. The latch control signals are periodic with identical
period (Tclk). However, Tclk is sub-divided into multiple
phases and specific clocking signals are generated to operate
the MPSL structure. For the IF latches, a two-phase control
signal identical to the system clock signal is used to control
data ingress and egress. For the OT latches, all stages use a
b-phase control signal. In order to implement offset switching
however, a stage dependent phase offset is added to the control
signals between consecutive OT latch stages, forcing a b-by-
1 bit transmission/reception exclusivity across the b-bit wide
physical bit lines.
A. Clocking and Latch Control
At each bit position, the critical latch stage from a timing
perspective is the last latch before the X-segment MSI bit line.
Therefore, the relationship of the clock period Tclk to this
latch stage across all bit positions determines the performance
of the MPSL interconnect. For a general b-bit design with i
consecutive bits-in-flight (biF), if the MSI has a maximum
bit line delay (tDM) and a minimum bit-to-bit separation
(αM) at each position, we can calculate the key parameters. For
an X-segment bit line with SWmax as the maximum possible
number of switching neighbors, we use equation 19 for
the worst case segment delay tDmax and, with tDdrv from
equation 1, obtain the maximum bit line delay.
$t_{DM} = X(t_{Ddrv} + t_{Dmax})$ (20)

Data Properties              W = (w1, w2, w3)
Pattern    Distribution      Stacked MPSL[2]       Stacked MPSL[3]
Random     Uniform           (0.33, 0.33, 0.33)    (0.50, 0.50, 0)
Regular    Gaussian          (0.18, 0.65, 0.18)    (0.50, 0.50, 0)
Burst      Skewed Gaussian
Fig. 6. Statistical model constants for random, regular or burst data patterns
in stacked MPSL[b]. (b) Double-level-sensitive latch with reset.

For the minimum bit-to-bit separation αM and a clock period
Tclk divided into b phases (with b = 1 for simultaneous
switching), we can write in general a scalar dot product of
two vectors W and αSW, shown in equation 21.
$\alpha_M = W \cdot \alpha_{SW}$ (21)
where αSW = [αSW0, αSW1, αSW2] is the array of offset
threshold values, from eqns 17, 18, associated with noise
injections from neighboring switching activity. The vector W
contains the weight of each threshold value derived from the
statistical distribution of transitions in a data stream. We
also obtain that
$\ldots \le T \le \frac{t_{DM}}{i}, \quad \text{where } T \ne T_{clk} \text{ and } i \ge 1$ (22)
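Equations 20 and 21 are simple enough to evaluate directly. The Python sketch below shows the arithmetic; the function names are hypothetical, the per-neighbor thresholds αSW0..αSW2 and the delay values are made-up numbers, and the element-wise pairing of (w1, w2, w3) with (αSW0, αSW1, αSW2) is an assumption. The weights are the first tuple of the Regular/Gaussian row of Fig. 6.

```python
def max_bitline_delay(segments, t_ddrv, t_dmax):
    """Equation 20: t_DM = X * (t_Ddrv + t_Dmax)."""
    return segments * (t_ddrv + t_dmax)


def min_bit_separation(weights, alpha_sw):
    """Equation 21: alpha_M as the dot product of W and alpha_SW."""
    if len(weights) != len(alpha_sw):
        raise ValueError("W and alpha_SW must have the same length")
    return sum(w * a for w, a in zip(weights, alpha_sw))


if __name__ == "__main__":
    # Weights from the Regular/Gaussian row of Fig. 6 (first tuple).
    weights = (0.18, 0.65, 0.18)
    # Hypothetical offset thresholds in ps for SW = 0, 1, 2 switching neighbors.
    alpha_sw = (0.0, 60.0, 120.0)
    print("alpha_M (ps):", min_bit_separation(weights, alpha_sw))
    # Hypothetical 5-segment line with 10 ps driver and 50 ps segment delay.
    print("t_DM (ps):", max_bitline_delay(5, 10.0, 50.0))
```

Because αM is a weighted average, data patterns whose transitions concentrate on cases with fewer switching neighbors yield a smaller required separation.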
The Latch Control Circuit (LCC) generates the actual
multiple-phase control signals for the IF and OT latch stages.
It is composed of l double-level-sensitive (DLS) latches (with
configurable reset), where l is the least common multiple
LCM(2, b). A DLS latch samples its input and holds its output
in every phase of the control clock signal's period. l is chosen
to guarantee latching synchronization at the clock boundary
between the 2-phase IF latches and the b-phase OT latches. The
l DLS latches are connected in a single loop and controlled
by a single clock signal (φ) with a phase time (pφ), where
pφ = Tclk/l. Configuring the reset mode of the LCC latches
by setting the first l/2 (or l/b) as reset-to-one and the rest
as reset-to-zero generates the 2-phase (b-phase) signal φm
(φb). All other latch signals are variants of the primary
(φm, φb) signals with one or more added phase offsets. They
are generated via appropriate taps along the length of the
DLS latch loop of the particular multi-phase LCC. Note that
the phase time (pb = pφ·(l/b) ≥ αM) for the b-phase signals is
the switching offset inserted between consecutive bit positions
in the MPSL[b] structure. For hardware implementations, each
output tap of the LCC circuit shown in fig. 5 can be distributed
to the specific latch stage via a delay-equalized buffer tree
network (not shown).
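The signal generation described above can be mimicked with a behavioral Python sketch that treats the DLS latch loop as a ring of l one-bit stages advancing by one stage per phase. This abstraction, the function names, and the choice of taps spaced l/b stages apart for the offset variants of φb are assumptions made for illustration and are not taken from the paper.

```python
from math import gcd


def lcc_length(b):
    """Number of DLS latches in the loop: l = LCM(2, b)."""
    return 2 * b // gcd(2, b)


def ring_waveform(l, ones, tap, steps):
    """Output at stage `tap` of a ring of l stages.

    The first `ones` stages reset to 1 and the rest to 0; the ring
    advances by one stage per phase (behavioral abstraction).
    """
    state = [1] * ones + [0] * (l - ones)
    out = []
    for _ in range(steps):
        out.append(state[tap])
        state = [state[-1]] + state[:-1]  # rotate the ring by one stage
    return out


def control_signals(b, steps=None):
    """Return (l, phi_m, [phi_b, phi_b,1, ..., phi_b,b-1]) over `steps` phases."""
    l = lcc_length(b)
    steps = l if steps is None else steps
    phi_m = ring_waveform(l, l // 2, 0, steps)  # first l/2 stages reset to one
    stride = l // b
    # Assumed tap placement: one tap every l/b stages of the b-phase loop.
    phi_b = [ring_waveform(l, stride, k * stride, steps) for k in range(b)]
    return l, phi_m, phi_b


if __name__ == "__main__":
    l, phi_m, phi_b = control_signals(3)
    print("l =", l)
    print("phi_m  :", phi_m)
    for k, wave in enumerate(phi_b):
        print(f"phi_b,{k}:", wave)
```

For b = 3 the three b-phase waveforms are mutually exclusive, with exactly one of them high in every phase. For b = 2 the sketch returns a φb equal to φm and a φb,1 equal to its complement, which matches the observation that the SLB needs no explicit LCC.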
B. Staggered Latch Bus (SLB)
The MPSL implementation of an N-bit bus is the stacked-
MPSL[b], where N is subdivided into b-bit sections, with each
assigned to an MPSL[b]. The simplest form is the stacked-
MPSL[2] or Staggered Latch Bus (SLB). In this configuration,
the signals φb and φm are identical, and likewise the signals
φ̄m and φb,1. No additional logic area is required and an
explicit LCC is therefore not necessary.

Fig. 7. Comparing (a) the classical SS n-bit bus and (b) the proposed stacked
MPSL[2] or Staggered Latch Bus (SLB).

V. EXPERIMENTS AND RESULTS
In this section, we compare the data transmission error
rates of two switching methods, simultaneous switching (SS)
and offset switching (OS), over an increasing clock frequency.
SS is the traditional strategy widely used in synchronous
interconnect design, while OS is based on the MPSL
architecture. We present an experimental validation of a 32-bit
MPSL in an SLB-16 configuration and analyze the design
cost with the aid of various tools. Although the outputs (φb and
φb,1) of a 2-bit long LCC are identical to the logic clocks (φm
and φ̄m) respectively, an explicit LCC (only needed for b > 2)
is included in the experiment for completeness. Our approach
combines trace data and detailed HSPICE simulations.
For the HSPICE simulations, the MSI setup consists of
two planar arrays of 32, 5-segment, closely spaced parallel
bit lines, one array with fully-aligned segments the other with
misaligned segments. Each bit line segment consists of a strip
of M5 copper, 0.25mm long, driven by an optimally sized
inverting buffer. Metal sizing, spacing, resistivity and inter-
metal dielectric constants, are taken from the ITRS forecast
[1]. Device model files for 45 nm Predictive Technology Model
(PTM) process [10] are used for the buffer. The wire
resistance and ground capacitance were modeled with
distributed-π RC sections, with the coupling capacitances
between corresponding sections on adjacent segments modeled
similarly.
Bit Error Rate (BER) per word versus data clock frequency
(fclk = 1/Tclk) comparisons are performed for a 5-segment,
45 nm MSI and shown in figure 8(a). Operated in
either single or multiple biF mode, an SLB-16 based on the
OS scheme shows a 2.5X speed improvement over a similarly
sized traditional SS scheme. When MSI-misaligned segments
are used (figure 8(b)), we obtain a speedup of up to 2.1X
compared to misaligned SS. Note that multiple biF
(2-biF) modes of operation are possible, which allows support
for even higher operating frequencies. The eye diagram in
figure 8(d) illustrates this: it shows an SLB-16 MSI-misaligned
bus operated in 2-biF mode with an eye opening of
approximately 60% at a data clock frequency of approximately
5 GHz. At similar frequencies, the eye diagram in figure 8(c)
shows the inability of the SS MSI-misaligned bus to match
the performance of an OS MSI-misaligned bus.
Fig. 8. Bit Error Rate (BER) and Eye diagram analysis for a 32-bit, 5 segment
per bit line interconnect in 45 nm tech: (a) and (b) BER vs Frequency (1-6
GHz) comparing simultaneous switching (SS) and offset switching (OS); (c) and (d)
Eye opening @ 5GHz for SS and OS respectively using mis-aligned segments.

Scaling the SLB-16 design to 32, 22, and 16 nm nodes, a
similar BER vs frequency comparative analysis between the SS
and OS schemes was performed. A summary of the results
shown in figures 9(a) and 9(b) demonstrates similar
performance improvements for both aligned and misaligned MSI,
with an average speedup of 2.5X and 2.3X respectively. This
result is obtained with an average dynamic power gain of about
0.5 dB (figures 9(c), 9(d)), a slight deviation from 0 dB
primarily due to the use of (optional) LCC latches.
Although similar as a comparative measure, the difference
in nominal values between the simulated average speedup
2.5X (2.3X) and the predicted values 2.04X (1.70X) presented
in section III-D is attributable to the constraint imposed
by the selection of Nlim used in the analytical model. By
contrast, the maximum operating frequency reported in the
simulation results is the speed fclk at which the BER per word
first exceeds zero. Nevertheless, for quick design space
exploration, especially across process nodes, the analytical
model provides a realistic, efficient speedup estimate
for offset-switched MSI designers and EDA tool vendors.
In general, MPSL[b] based designs for b > 2 require
an explicit LCC, careful control signal distribution planning,
and additional latch hardware and area. This is unnecessary for
the MPSL[2] based SLB used in the experiment. Note that except
for the latch rearrangements, the total latch count and control
signals in the SLB are identical to the latch count and clock
signals respectively in a traditional SS bus.
Fig. 9. Plots of SS vs. OS for operating frequency and power on a 32-bit, 5
segment aligned and misaligned MSI in sub-50nm technologies: (a) and (b)
max clock frequency and SS-OS speedup; (c) and (d) power and gain.

VI. SUMMARY AND CONCLUSIONS
In this paper, we explored offset-switched interconnect and its
performance, power and area characteristics. We proposed a
staggered latch bus as a simple implementation of a more
general multi-phase staggered latch interconnect architecture.
We performed a comparative analysis with the classical
simultaneously switched interconnect. The results show that
offset switching in the form of the simple SLB can achieve over
2X improvement in line delay for a given line length and
segment size, with no appreciable increase in power or need for
extra wires.
ACKNOWLEDGMENT
This work was supported in part by NSF Awards 1205618,
0916887, 1213052.
REFERENCES
[1] ITRS, “Interconnect,” in International Technology Roadmap for
Semiconductors, 2011.
[2] T. Sakurai, “Approximation of wiring delay in MOSFET LSI,” Solid-State
Circuits, Jan 1983.
[3] T. Sakurai, “Closed-form expressions for interconnection delay, coupling,
and crosstalk in VLSIs,” Electron Devices, IEEE Transactions on, vol. 40,
no. 1, pp. 118–124, 1993.
[4] T. Xiao and M. Marek-Sadowska, “Efficient delay calculation in
presence of crosstalk,” Quality Electronic Design, 2000. ISQED 2000.
Proceedings. IEEE 2000 First International Symposium on, pp. 491–497,
2000.
[5] J. Rubinstein, P. Penfield, and M. A. Horowitz, “Signal delay in RC
tree networks,” Computer-Aided Design of Integrated Circuits and
Systems, IEEE Transactions on, vol. 2, no. 3.
[6] Devgan, “Efficient coupled noise estimation for on-chip interconnects,”
in Proceedings of IEEE International Conference on Computer Aided
Design (ICCAD).
[7] L. Chen and M. Marek-Sadowska, “Closed-form crosstalk noise metrics
for physical design applications,” Design, Automation and Test in
Europe Conference and Exhibition, 2002. Proceedings, pp. 812–819,
2002.
[8] M. Celik, L. Pileggi, and A. Odabasioglu, IC Interconnect Analysis, Jan
2002.
[9] K. Hirose and H. Yasuura, “A bus delay reduction technique considering
crosstalk,” Design, Automation and Test in Europe Conference and
Exhibition 2000. Proceedings, pp. 441–445, 2000.
[10] NIMO-Group-ASU, “Predictive technology model,” http://ptm.asu.edu,
2012.
