Evaluation of Energy-Recovering Interconnects for Low-Power 3D Stacked ICs by Asimakopoulos P et al.
Evaluation of Energy-Recovering Interconnects for
Low-Power 3D Stacked ICs
P. Asimakopoulos1, G. Van der Plas2, A. Yakovlev1 and P. Marchal2
1School of EECE, Newcastle University, UK
2IMEC, Belgium
Abstract—Energy-recovering schemes have been proposed in
the literature as an alternative approach to low-power design,
while their performance has been demonstrated to be extremely
promising when driving large capacitive loads, such as clock
distribution networks [1]. This work investigates the potential
of the energy-recovering methodology for improving the energy
efficiency of through-silicon via (TSV) interconnects in 3D ICs.
I. INTRODUCTION
Energy dissipation is a major concern for battery-powered
mobile systems. In 3D stacked ICs, TSV interconnects enable
low-parasitic direct connections between tiers and can allow
for considerable energy savings when compared to traditional
PCB chip-to-chip interconnections [2]. However, TSV parasitic
capacitance can still become an important source of
energy dissipation in large, densely interconnected 3D SoCs,
since the combined capacitance, and thus the energy required
to drive the TSVs, increases linearly with the number of tiers
and interconnections.
Energy-recovering logic has demonstrated great potential
when driving large capacitive loads and circuits utilizing this
technique have been successfully implemented in the past [1],
[5]. Energy-recovering designs can break the $C V_{DD}^2$ energy
limit of conventional static CMOS, by spreading out charge
transfer more evenly over an entire switching period and
thus making energy dissipation proportional to the operating
frequency [3]. The result is very low energy dissipation which
can asymptotically approach zero at low operating frequencies
[4].
In this paper, an energy-recovering configuration for 3D ICs
is presented and an analysis is attempted based on theoretical
models. The proposed circuit is evaluated against conventional
static CMOS, while the energy efficiency dependence on
design parameters is extracted.
II. PROPOSED CONFIGURATION
In energy-recovering systems, load capacitances (CL) are
typically driven by resonant sinusoidal waves which charge
nodes and recover part of the charge in the falling half-period
of the wave. It can be proven [6] that if a resistive load R is
present in the current flow path, the energy dissipated on that
load during a full charge/discharge cycle ($1/f$) would be:
$$E_{DISS} = \frac{\pi^2}{2}\,(R\,C_L f)\,C_L V_{DD}^2 \qquad (1)$$
If an equivalent capacitive load was driven by conventional
CMOS logic while switching at the same frequency f , the
energy dissipation per cycle would be:
$$E_{CONV} = \frac{1}{2}\,C_L V_{DD}^2 \qquad (2)$$
Therefore, as long as the switching frequency is $f < 1/(\pi^2 R C_L)$,
energy-recovering designs can save energy when compared to
conventional CMOS logic.
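The break-even condition can be checked numerically with a minimal sketch of (1) and (2). The resistance value below is a hypothetical placeholder chosen only for illustration; the 160fF load and 1.2V supply are the values used later in the evaluation.

```python
import math


def e_recovering(r_ohm, c_load, f_hz, vdd):
    """Eq. (1): energy dissipated on R per charge/discharge cycle (sinusoidal drive)."""
    return (math.pi ** 2 / 2) * (r_ohm * c_load * f_hz) * c_load * vdd ** 2


def e_conventional(c_load, vdd):
    """Eq. (2): energy dissipated per cycle by conventional CMOS."""
    return 0.5 * c_load * vdd ** 2


def breakeven_frequency(r_ohm, c_load):
    """Frequency at which (1) equals (2): f = 1 / (pi^2 * R * C_L)."""
    return 1.0 / (math.pi ** 2 * r_ohm * c_load)


R = 500.0       # ohm, hypothetical path resistance (assumption, not from the paper)
C_L = 160e-15   # F, load capacitance
VDD = 1.2       # V, supply voltage

f_be = breakeven_frequency(R, C_L)
print(f"break-even frequency: {f_be / 1e6:.1f} MHz")
for f in (0.1 * f_be, 0.5 * f_be, f_be):
    ratio = e_recovering(R, C_L, f, VDD) / e_conventional(C_L, VDD)
    print(f"f = {f / 1e6:8.1f} MHz -> E_DISS / E_CONV = {ratio:.2f}")
```

Because (1) is linear in f, the ratio E_DISS/E_CONV is simply f divided by the break-even frequency, which is why the savings grow as the operating frequency is lowered.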
In a 3D IC, TSVs will provide an interconnection path
for signals crossing adjacent tiers. Since TSVs can have an
appreciable parasitic capacitance [7], which will increase as
additional TSVs are connected in series, ratioed buffer stages
will be required so as to ensure a sharp-rising signal at the
output of the TSVs. To reduce energy dissipation, an energy-
recovering configuration could replace the required buffering
stages allowing only sinusoidal signals crossing TSVs and thus
saving energy according to (1).
In the proposed 3D interconnecting configuration (Figure
1), each tier is clocked with the assistance of conventional
buffer stages whereas data signals, prior to crossing a TSV,
are converted to sinusoidal waves with the use of adiabatic
drivers (Figure 2).
Figure 1. Proposed configuration. (Block diagram labels: Tier1, Tier2, Tier3; DATA IN, f-f, ADIABATIC DRIVER, TSV, P2LC, f-f, DATA OUT; CLK IN, CLK BUFFER.)
The excellent energy efficiency of sinusoidal charging and
energy recovery can considerably reduce energy dissipation
for interconnecting signals. At the same time, compatibility
with digital logic is retained by converting locally in each tier,
the resulting sinusoidal dual-rail pulses back to level signals
through Pulse-to-Level converters (P2LC) (Figure 3).
978-1-4244-4512-7/09/$25.00 ©2009 IEEE
Authorized licensed use limited to: Newcastle University. Downloaded on May 21,2010 at 15:25:58 UTC from IEEE Xplore.  Restrictions apply. 
Figure 2. Adiabatic driver. (Schematic; terminal labels: IN, OUT.)
Figure 3. A Pulse-to-Level Converter implementation. (Schematic; labels: VDD, IN, OUT.)
III. ANALYSIS
Sinusoidal waveforms can be efficiently generated with
the use of a single resonant-tank inductor [8], which when
combined with the adiabatic driver’s resistance (RTG) and the
TSV capacitance (CTSV ), forms an RLC oscillator (Figure 4)
resonating at:
$$f = \frac{1}{2\pi\sqrt{L_{ind}\,C_{TSV}}} \qquad (3)$$
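As a rough illustration of (3), the sketch below inverts the resonance condition to estimate the inductance needed for a given operating frequency. It assumes a single 160fF TSV load (the value used in the evaluation) and ignores any other capacitance on the resonant node; the function names are introduced here for illustration only.

```python
import math


def resonant_frequency(l_ind, c_tsv):
    """Eq. (3): resonant frequency of the L_ind / C_TSV tank."""
    return 1.0 / (2 * math.pi * math.sqrt(l_ind * c_tsv))


def resonant_inductance(f_hz, c_tsv):
    """Eq. (3) solved for L_ind at a target frequency."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * c_tsv)


C_TSV = 160e-15  # F, TSV load assumed in the evaluation section

for f in (100e6, 300e6, 500e6):
    l_ind = resonant_inductance(f, C_TSV)
    print(f"f = {f / 1e6:5.0f} MHz -> L_ind = {l_ind * 1e6:.2f} uH")
```

Since the required inductance scales as 1/(f² C), a larger combined load capacitance or a higher operating frequency both reduce the inductor value.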
Figure 4. Resonant pulse generator. (Schematic labels: VDD/2, L_ind, R_ind, M1, Adiabatic Driver, R_TGATE, C_TSV; T = 1/f.)
The bulk of the energy dissipation in the proposed configu-
ration will occur on the adiabatic driver, the inductor’s parasitic
resistance (Rind) and transistor M1. Since the sinusoidal pulses
will have to be converted back to level-signals after crossing
a TSV, the P2L converters will contribute to the total energy
dissipation as well.
A. Adiabatic driver
The adiabatic driver used in this configuration (Figure 2)
is based on adiabatic circuits presented in [9], [10], with the
load capacitance replaced by the parasitic capacitance of TSVs
(CTSV ).
Energy in the adiabatic driver is dissipated both on the
transmission-gate resistance (RTG) and for driving its input
capacitance. Assuming that both nFET and pFET transistors
are equally sized (Wn), the cross-coupled pFETs reduce the
conventionally-driven input capacitance by ½, however their
gate capacitance (Cn) will appear as an additional capacitive
load to the driver. Furthermore, drain/source diffusion capac-
itance (CD) can be an important portion of the load since in
each cycle 6CD will be present in the current flow path (4
contributed by the ON T-gate and 2 by the OFF T-gate).
Therefore the combined load capacitance will be:
CL = CTSV + Cn + 6CD (4)
The resistance of the T-gate (RTG) can be related to the gate
capacitance by a “Device technology factor” (κTG) [11], which
we can define for our convenience:
$$\kappa_{TG} = R_{TG}\,C_n \;\Rightarrow\; R_{TG} = \frac{\kappa_{TG}}{C_n} \qquad (5)$$
Combining (1), (4) and (5) gives the total dissipated energy
per cycle in the adiabatic driver:
$$E_{AD} = C_n V_{DD}^2 + \frac{\pi^2}{2}\,\frac{\kappa_{TG}}{C_n}\,f\,\left[C_{TSV} + C_n + 6C_D\right]^2 V_{DD}^2 \qquad (6)$$
The second term of (6) has a consistent contribution to
the energy dissipation on every cycle, while the first term is
dependent on the data switching activity (D).
We can also further simplify this equation by defining the
diffusion capacitance as a fraction of the input capacitance,
CD = bCn (7)
and equating the term $\frac{\pi^2}{2}\kappa_{TG} f$ to a variable,
$$y = \frac{\pi^2}{2}\,\kappa_{TG}\,f \qquad (8)$$
Equation (6) then becomes:
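$$\begin{aligned}
E_{AD} &= D \cdot C_n V_{DD}^2 + \frac{y}{C_n}\left[C_{TSV} + (6b+1)\,C_n\right]^2 V_{DD}^2 \\
&= \left[C_n\left(\frac{D}{y} + (6b+1)^2\right) + \frac{1}{C_n}\,C_{TSV}^2\right] \cdot y V_{DD}^2 + (12b+2)\,C_{TSV} \cdot y V_{DD}^2
\end{aligned} \qquad (9)$$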
Since in (9) Cn is the free parameter, the first bracketed term
of (9) is proportional to Cn while the second is inversely
proportional to it, and EAD is minimized when the two become
equal. The value of the gate capacitance at that point is
calculated as:
$$C_{n(opt)} = \sqrt{\left[\frac{D}{y} + (6b+1)^2\right]^{-1}}\; C_{TSV} \qquad (10)$$
Combining (9) and (10) results in minimum energy dissi-
pation for the adiabatic driver, which is:
$$E_{AD(min)} = \left[\sqrt{\frac{D}{y} + (6b+1)^2} + (6b+1)\right] \cdot 2y\,C_{TSV} V_{DD}^2 \qquad (11)$$
Replacing variable y in (11), we can observe the dependence
of the energy dissipation on parameters f and κTG:
$$E_{AD} = \left[\sqrt{\frac{2D}{\pi^2 \kappa_{TG} f} + (6b+1)^2} + (6b+1)\right] \cdot \pi^2 \kappa_{TG}\,f\,C_{TSV} V_{DD}^2 \qquad (12)$$
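The optimum in (10) and the minimum in (11) can be cross-checked against a direct evaluation of (9). The sketch below does this under stated assumptions: the values of κTG and b are hypothetical placeholders (the extracted 130nm values are not listed in the text), while the 160fF load, 1.2V supply, D = 1 and 300MHz are taken from the evaluation setup.

```python
import math


def y_factor(kappa_tg, f_hz):
    """Eq. (8): y = (pi^2 / 2) * kappa_TG * f."""
    return (math.pi ** 2 / 2) * kappa_tg * f_hz


def e_ad(c_n, c_tsv, d, y, b, vdd):
    """Eq. (9), first line: driver energy per cycle for a given gate capacitance C_n."""
    c_load = c_tsv + (6 * b + 1) * c_n
    return d * c_n * vdd ** 2 + (y / c_n) * c_load ** 2 * vdd ** 2


def c_n_opt(c_tsv, d, y, b):
    """Eq. (10): gate capacitance that minimizes E_AD."""
    return c_tsv / math.sqrt(d / y + (6 * b + 1) ** 2)


def e_ad_min(c_tsv, d, y, b, vdd):
    """Eq. (11): minimum driver energy per cycle."""
    root = math.sqrt(d / y + (6 * b + 1) ** 2)
    return (root + (6 * b + 1)) * 2 * y * c_tsv * vdd ** 2


C_TSV = 160e-15    # F
VDD = 1.2          # V
D = 1.0            # data switching activity
F = 300e6          # Hz
KAPPA_TG = 2e-12   # s, hypothetical device technology factor (assumption)
B = 0.5            # hypothetical ratio C_D / C_n (assumption)

y = y_factor(KAPPA_TG, F)
c_opt = c_n_opt(C_TSV, D, y, B)

# Evaluate (9) below, at, and above the predicted optimum.
scales = (0.5, 0.9, 1.0, 1.1, 2.0)
sweep = [e_ad(c_opt * k, C_TSV, D, y, B, VDD) for k in scales]
for k, e in zip(scales, sweep):
    print(f"C_n = {k:.1f} x C_n(opt) -> E_AD = {e * 1e15:.2f} fJ")
print(f"closed-form minimum (11): {e_ad_min(C_TSV, D, y, B, VDD) * 1e15:.2f} fJ")
```

The sweep is only a sanity check around the closed-form optimum; it is not a substitute for the transistor-level sizing performed in simulation.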
B. Switch M1
Similarly, the energy dissipation on M1, which is switched on
briefly to recover the energy dissipated each cycle on
Rtotal = RTG + Rind, is a trade-off between dissipation
on its on-resistance RM1 and on its input capacitance CM1. Since
M1 is a fairly large transistor, the ratioed stages driving it will
have a significant energy consumption. For that reason, a factor m
is used to account for the additional losses. IM1(rms) is the
rms current passing through the transistor while turned on and
VGM1 is the peak gate voltage. A methodology for deriving
optimum values for both these parameters is proposed in [8].
The total dissipated energy on M1 can be calculated as:
$$E_{M1} = m\,C_{M1} V_{GM1}^2 + \frac{I_{M1(rms)}^2\, R_{M1}}{f} \qquad (13)$$
For transistor M1 we can also define a “Device technology
factor”:
$$\kappa_{M1} = R_{M1}\,C_{M1} \;\Rightarrow\; R_{M1} = \frac{\kappa_{M1}}{C_{M1}} \qquad (14)$$
Substituting RM1 into (13):
$$E_{M1} = m\,C_{M1} V_{GM1}^2 + \frac{1}{C_{M1}}\,\frac{I_{M1(rms)}^2\, \kappa_{M1}}{f} \qquad (15)$$
Minimum energy consumption will occur when the two
terms of (15) are equal:
$$C_{M1(opt)} = \sqrt{\frac{\kappa_{M1}}{m f}}\;\frac{I_{M1(rms)}}{V_{GM1}} \qquad (16)$$
$$E_{M1(min)} = 2\,I_{M1(rms)}\,V_{GM1}\,\sqrt{\frac{m\,\kappa_{M1}}{f}} \qquad (17)$$
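The sizing rule in (16) and the resulting minimum in (17) follow from equating the two terms of (15). A minimal sketch of that step is given below; every numerical value is a hypothetical placeholder, since the text defers the choice of IM1(rms) and VGM1 to the methodology in [8].

```python
import math


def e_m1(c_m1, i_rms, v_g, kappa_m1, m, f_hz):
    """Eq. (15): gate-drive term plus conduction term for switch M1."""
    return m * c_m1 * v_g ** 2 + (i_rms ** 2 * kappa_m1) / (c_m1 * f_hz)


def c_m1_opt(i_rms, v_g, kappa_m1, m, f_hz):
    """Eq. (16): input capacitance at which the two terms of (15) are equal."""
    return math.sqrt(kappa_m1 / (m * f_hz)) * i_rms / v_g


def e_m1_min(i_rms, v_g, kappa_m1, m, f_hz):
    """Eq. (17): minimum energy dissipated on M1 per cycle."""
    return 2 * i_rms * v_g * math.sqrt(m * kappa_m1 / f_hz)


# All values below are illustrative assumptions, not data from the paper.
I_RMS = 100e-6     # A, rms current through M1
V_G = 1.2          # V, peak gate voltage
KAPPA_M1 = 1e-12   # s, device technology factor of M1
M = 1.3            # overhead factor for the stages driving M1
F = 300e6          # Hz

c_opt = c_m1_opt(I_RMS, V_G, KAPPA_M1, M, F)
gate_term = M * c_opt * V_G ** 2
conduction_term = I_RMS ** 2 * KAPPA_M1 / (c_opt * F)
print(f"C_M1(opt) = {c_opt * 1e15:.2f} fF")
print(f"gate term = {gate_term * 1e15:.2f} fJ, conduction term = {conduction_term * 1e15:.2f} fJ")
print(f"E_M1(min) = {e_m1_min(I_RMS, V_G, KAPPA_M1, M, F) * 1e15:.2f} fJ")
```

At the optimum the gate-drive and conduction terms are equal, so (17) is simply twice either term.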
C. Inductor’s parasitic resistance
The inductor’s (Lind) parasitic resistance Rind is inversely
proportional to the Qind factor, which is typically dependent on
the implementation technology. For the purposes of this analysis
it can be estimated as:
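$$Q_{ind} = \frac{1}{R_{ind}}\sqrt{\frac{L_{ind}}{C_L}} \;\Rightarrow\; R_{ind} = \frac{1}{Q_{ind}}\sqrt{\frac{L_{ind}}{C_L}} \;\Rightarrow\; R_{ind} = \frac{1}{Q_{ind}\,C_L\,2\pi f} \qquad (18)$$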
Energy dissipation on Rind can be estimated using (1) as:
$$E_{ind} = \frac{\pi^2}{2}\,(R_{ind}\,C_L f)\,C_L V_{DD}^2 \qquad (19)$$
Combining (18) and (19) we calculate the inductor’s energy
dependence on the Qind factor:
$$E_{ind} = \frac{\pi}{4}\,\frac{C_L}{Q_{ind}}\,V_{DD}^2 \qquad (20)$$
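The substitution of (18) into (19) cancels the frequency, which the following sketch confirms numerically. It approximates CL by the 160fF TSV capacitance alone, neglecting the driver's gate and diffusion capacitance; that simplification is an assumption made here for illustration.

```python
import math


def r_ind(q_ind, c_load, f_hz):
    """Eq. (18): parasitic resistance implied by the inductor quality factor."""
    return 1.0 / (q_ind * c_load * 2 * math.pi * f_hz)


def e_ind_from_r(r, c_load, f_hz, vdd):
    """Eq. (19): energy dissipated on R_ind per cycle."""
    return (math.pi ** 2 / 2) * (r * c_load * f_hz) * c_load * vdd ** 2


def e_ind(q_ind, c_load, vdd):
    """Eq. (20): closed form, independent of frequency."""
    return (math.pi / 4) * (c_load / q_ind) * vdd ** 2


C_L = 160e-15  # F, approximated by the TSV capacitance (assumption)
VDD = 1.2      # V

for q in range(5, 20, 2):
    print(f"Q = {q:2d} -> E_ind = {e_ind(q, C_L, VDD) * 1e15:.2f} fJ")
```

Doubling Q halves the inductor loss, which is consistent with the strong Q dependence noted in the evaluation.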
D. Total energy dissipation
The total dissipated energy for the adiabatic driver, switch
M1 and inductor’s resistance can be calculated by combining
(12), (17) and (20):
Etotal = EAD + EM1 + Eind (21)
Nevertheless, for (21) to be complete the energy contri-
bution of the P2L converters has to be included as well,
which cannot be theoretically derived and is addressed in the
following sections.
IV. EVALUATION
For the evaluation of the proposed method, the κTG, κM1
and CD parameters were extracted using simulation models
for a 130nm process. The TSV capacitance was assumed to be
160fF, a value which can be either derived from a single TSV
or a series combination of TSVs distributed among subsequent
tiers in a 3D IC. The supply voltage was at 1.2V and data
switching activity was assumed to be equal to 1.
All parameters were inserted into (21) and the energy
dissipation per bit per cycle was calculated, with the operating
frequency and Q factor as free variables. An identical circuit
configuration was simulated as well in a commercial SPICE
program and the simulation data showed good correlation with
the theoretical estimations.
The inductor’s quality factor can have a considerable effect
on the energy dissipation and this can be observed in Figure 5,
where the estimated energy dissipation is plotted for various
operating frequencies.
Figure 5. Estimated energy dissipation at 500, 200, 100 MHz. TSV=160fF. (Plot title: Energy Dissipation - 130nm - 160fF Load; series: 500 MHz, 300 MHz, 100 MHz; x-axis: Q factor, 5 to 19; y-axis: Energy (fJ), 20 to 120.)
V. COMPARISON
The proposed configuration was compared to a conventional
CMOS buffer with ratioed stages driving an equivalent TSV
load capacitance. Since the conventional buffer when trans-
mitting data will produce just one charging event per 2 cycles
of the operating frequency, its energy dissipation per cycle
is calculated as in (2). If we also include the data switching
activity, then:
$$E_{CONV} = D \cdot \frac{1}{2}\,C_L V_{DD}^2 \qquad (22)$$
The estimated energy performance improvement of the
proposed configuration over the conventional buffer for D = 1,
can be seen in Figure 6.
Figure 6. Energy performance improvement of the proposed energy-recovering configuration over a conventional buffer driving an equivalent TSV load capacitance. (Plot title: Energy Improvement (%) - 130nm - 160fF Load; series: 500 MHz, 300 MHz, 100 MHz; x-axis: Q factor, 5 to 19; y-axis: Improvement (%), 0 to 80.)
Figure 7. Energy dissipation reduction at 300MHz when P2LC is included and the TSV load is variable. (Plot title: Energy Reduction - 130nm - with P2L - 300MHz; series: 80fF, 120fF, 160fF; x-axis: Q factor, 5 to 19; y-axis: Improvement (%), 0 to 60.)
Varying the value of the TSV load capacitance has no
effect on the estimated energy improvement percentage, as
the energy dissipation for both the proposed method and the
conventional buffer is linearly related to the load capacitance.
However, the data plotted in Figures 5 and 6 can be considered
as the theoretical maximum performance attainable by the
energy-recovering method, since the energy dissipation of the
P2L converters is not included in these estimations.
To include the effect of the P2L converters in the energy
calculations, a simple circuit topology is chosen (Figure 3) and
its energy dissipation is extracted from simulation data. Since
the P2L converter has constant energy dissipation regardless
of the value of the TSV capacitance, the linear relation of
the total dissipated energy to the load capacitance value is
no longer valid for the energy-recovering circuit. This can be
observed in Figure 7, where the estimated energy improvement
is plotted at 300MHz operating frequency and the TSV load
capacitance is variable. It can be expected that as the load
capacitance increases, energy performance would approach the
estimations in Figure 6.
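The effect of a constant P2LC overhead on the improvement percentage can be illustrated with a deliberately simplified model. The sketch assumes that the energy-recovering circuit dissipates alpha · C · VDD² per cycle plus a fixed P2LC energy, and compares it with (22) at D = 1. Both alpha and the P2LC energy are hypothetical values chosen for illustration; in the analysis above the corresponding coefficient depends on f, Q and the device parameters.

```python
def improvement_pct(c_tsv, alpha, e_p2l, vdd=1.2):
    """Percentage energy saving over the conventional buffer of (22) with D = 1.

    alpha  : hypothetical dimensionless coefficient, E_rec = alpha * C * VDD^2
    e_p2l  : hypothetical constant P2LC energy per cycle, in joules
    """
    e_rec = alpha * c_tsv * vdd ** 2 + e_p2l
    e_conv = 0.5 * c_tsv * vdd ** 2
    return 100.0 * (1.0 - e_rec / e_conv)


ALPHA = 0.15     # assumption, not a value from the paper
E_P2L = 10e-15   # J, assumption, not a value from the paper
LOADS = (80e-15, 120e-15, 160e-15)  # F, the loads plotted in Figure 7

without_p2l = [improvement_pct(c, ALPHA, 0.0) for c in LOADS]
with_p2l = [improvement_pct(c, ALPHA, E_P2L) for c in LOADS]

for c, a, b in zip(LOADS, without_p2l, with_p2l):
    print(f"C_TSV = {c * 1e15:5.0f} fF: {a:.1f}% without P2LC, {b:.1f}% with P2LC")
```

Without the P2LC term the percentage is identical for every load, since both energies scale linearly with capacitance; with the constant term the saving grows with the load and approaches the load-independent value from below.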
Switching activity can also be a significant factor affecting
energy performance. Since in the energy-recovering circuit
the sinewave oscillation cannot be halted, all capacitances
in the current flow path will charge and discharge on each
cycle regardless of data activity. In contrast, static CMOS
ideally dissipates energy only when switching and thus the
energy-recovering circuit can compare favorably only at high
switching rates. In Figure 8, the estimated effect of the
switching activity on energy performance is plotted for an
operating frequency of 300MHz.
Figure 8. Improvement in energy dissipation when P2LC is included and switching activity is variable. f=300MHz, TSV=160fF. (Plot title: Energy Dissipation - 130nm - with P2L - 300MHz, 160fF; series: switching activity 0.6, 0.8, 1; x-axis: Q factor, 5 to 19; y-axis: Improvement (%), -40 to 60.)
Figure 9. Improvement in energy dissipation when P2LC is included and process technology is variable. f=300MHz, TSV=160fF. (Plot title: Energy Dissipation - with P2L - 300MHz - 160fF Load; series: 130nm, 65nm; x-axis: Q factor, 5 to 19; y-axis: Improvement (%), 0 to 70.)
Since the technology factors κTG and κM1 were extracted
for the 130nm process, reducing their value by ½ can
also provide an estimate of the circuit’s energy
performance for the 65nm node. The result is plotted in
Figure 9 and, as can be observed, technology scaling has a
positive effect on energy dissipation when compared to the
conventional buffer.
VI. CONCLUSIONS
In this work, a theoretical analysis was developed to in-
vestigate the potential of the energy-recovering methodology,
as used in adiabatic logic and resonant clock distribution
networks, for reducing the energy dissipation of 3D IC inter-
connects. The total energy dissipation per cycle and optimum
device sizing were extracted for the proposed method using the
theoretical models. Simulation data showed good correlation
with the theoretical estimations on a 130nm process.
The proposed configuration was compared against a con-
ventional CMOS buffer, driving an equivalent TSV load ca-
pacitance and its energy performance was evaluated. Analysis
revealed energy dependence on Q, f , D and TSV capacitance
parameters and the results demonstrated favorable energy
performance for high Q factors/switching activities/TSV ca-
pacitances and low operating frequencies. Furthermore, an
estimation was provided on the energy performance behavior
of the energy-recovering design in an advanced technology
node, where improved efficiency was demonstrated.
REFERENCES
[1] H. Mahmoodi, V. Tirumalashetty, M. Cooke and K. Roy, “Ultra Low-
Power Clocking Scheme Using Energy Recovery and Clock Gating”,
IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
Jan. 2009, Volume: 17, Issue 1, pp. 33-44.
[2] M. Facchini et al., “System-level Power/performance Evaluation of 3D
stacked DRAMs for Mobile Applications”, 2009, Design, Automation,
and Test in Europe (DATE) conference 2009.
[3] C. L. Seitz et al., “Hot Clock nMOS”, Technical Report, California
Institute of Technology, 1985, [CaltechCSTR:1985.5177-tr-85].
[4] S. Kim, C. H. Ziesler, M. C. Papaefthymiou, “Charge-Recovery Com-
puting on Silicon”, IEEE Trans. Computers 54(6): 651-659 (2005).
[5] V. S. Sathe, J. C. Kao and M. C. Papaefthymiou, “Resonant-Clock Latch-
Based Design”, IEEE Journal of Solid-State Circuits, Volume 43, Issue
4, April 2008, pp. 864 - 873.
[6] B. Voss and M. Glesner, “A low power sinusoidal clock”, The 2001
IEEE International Symposium on Circuits and Systems, Volume 4, 6-9
May 2001, pp. 108 - 111, vol. 4.
[7] P. Marchal et al., “3D Technology Assessment: Path-finding the tech-
nology/Design Sweet-spot”, Proceedings of the IEEE, January 2009, vol
97, pp. 96-107.
[8] D. Maksimovic, V. G. Oklobdzija, “Integrated power clock generators
for low energy logic”, 26th Annual IEEE Power Electronics Specialists
Conference, 18-22 Jun 1995, pp. 61-67, vol.1.
[9] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and E.
Y.-C. Chou, “Low-power digital systems based on adiabatic-switching
principles”, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2,
no. 4, pp. 398-407, April 1994.
[10] C. C. Yeh, J. H. Lou, and J. B. Kuo, “1.5V CMOS Full-Swing Energy
Efficient Logic (EEL) Circuit Suitable for Low-Voltage and Low-Power
VLSI Application,” Elec. Lett., Vol. 33, No. 16, pp. 1375-1376, 1997.
[11] D. Maksimovic, “A MOS gate drive with resonant transitions”, 22nd
Annual IEEE Power Electronics Specialists Conference, 24-27 Jun 1991,
pp.527-532.
