Online Fault Tolerance Technique for TSV-Based 3-D-IC by Zhao, Yi et al.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS                                                                                                     
0000–0000/00$00.00 © 2013 IEEE 
Online fault tolerance technique for TSV-based 3D-IC 
Yi Zhao, Saqib Khursheed, Bashir M. Al-Hashimi, Fellow, IEEE 
 
Abstract—This paper presents the design, validation and 
evaluation of an efficient online fault tolerance technique for 
fault detection and recovery in the presence of three TSV defects: 
voids, delamination between TSV and landing pad, and TSV 
short-to-substrate. The technique employs a transition delay test 
for TSV fault detection. Fault recovery is achieved by employing 
redundant TSVs and rerouting signals to fault-free TSVs. The 
technique is efficient because it requires only a small number of 
clock cycles (2 x number of TSVs per group) for fault detection and 
recovery. Synthesis results using a 130-nm design library show 
that 100% repair capability can be achieved with low area 
overhead (4% in the best case). 
 
Index Terms—Online test, fault tolerance, 3D, TSV, delay test.  
I. INTRODUCTION 
Through-silicon via (TSV) based vertical 
interconnection is the most popular method for implementing 
3D ICs [1]. Recent research has shown that the yield of TSV 
based 3D-ICs is affected by TSV manufacturing defects [2, 3] 
and that their reliability is affected by thermal stress induced 
during the fabrication process and normal operation [4-6]. TSV 
manufacturing defects are introduced in the bonding stage of 
the fabrication process, when different dies are bonded together, 
and one defective TSV can potentially fail the entire design 
along with its known-good dies. These challenges have been 
highlighted and novel solutions have been proposed for improving 
testability [7-10], yield and reliability [11-16]. Various types 
of TSV defects caused by the manufacturing process and thermal 
stress are described in [4, 5]. Of these defects, this 
paper focuses on three: voids, delamination 
between TSV and landing pad, and TSV short-to-substrate, as 
these have been extensively studied by the test community 
in recent literature [7-10]. The electrical models 
for these defect types are shown in Fig. 1. 
  One known issue with pre-bond testing is that it does not 
scale well, because defects can also be introduced during the 
bonding stage and during normal operation, for example due to 
thermal stress. Reference [5] shows that thermal stress can 
damage TSV interconnects, leading to delamination at the TSV 
interface with the bonding pad. Void growth can also occur 
during normal operation due to thermal load [6]. This 
motivates the need for improving in-field reliability. A 
popular method for improving yield is to introduce redundant 
TSVs and associated control logic [11-16]. References [12-14] 
use redundant TSVs in a TSV block to improve yield 
by repairing defective TSVs. Reference [11] employs 
redundant TSVs (as in [12-14]) and partitions regular 
and redundant TSVs into TSV groups using a grouping ratio, 
where the redundant TSVs are used to repair defective TSVs in a 
group. This improves yield and reduces hardware 
overhead. Reference [15] proposes a dedicated switch 
structure for TSV repair across TSV groups, thus increasing 
repair efficiency. The only work that focuses on improving 
in-field TSV reliability is presented in [16], which uses an on-chip 
processor for online fault detection and recovery. The aim of 
this paper is to present an efficient and cost-effective online 
fault tolerance technique capable of TSV fault detection and 
recovery for designs where such an on-chip processor is not 
available. The proposed technique is efficient because it 
provides a hardware-based solution that requires only 2 clock 
cycles per TSV for fault detection and recovery, leading to 
faster detection and recovery than available methods [15, 16]. 
It is cost-effective because the hardware overhead is 
minimized (without affecting repair capability) by selecting 
the best grouping ratio through an exhaustive search method 
[11]. Using Synopsys Design Compiler, it is shown that the 
hardware overhead of the proposed technique is lower than 
that of available techniques [7, 10, 15, 16], while also achieving 
100% repair capability. 

  Manuscript received on 11th July, 2013, and revised on 27th Feb, 2014, and 
3rd Jun 2014. Y. Zhao and B. M. Al-Hashimi are with the School of 
Electronics and Computer Science, University of Southampton, UK 
(email: {yz2g08, bmah}@ecs.soton.ac.uk). 
S. Khursheed is with the Dept. of Electrical Engineering and Electronics, 
University of Liverpool, UK (email: S.Khursheed@liverpool.ac.uk). 

The rest of the paper is structured as follows: Sec. II briefly 
describes the electrical models of TSV defects. The proposed 
online fault tolerance technique is described in Sec. III. 
Simulation results are presented in Sec. IV and Sec. V 
concludes the paper. 
II. PRELIMINARIES 
In this work, the TSV model is based on a transmission line 
T-model (Fig. 1(a)) [17], where Rtsv and Ctsv denote the TSV 
resistance and capacitance respectively, Rpull denotes the 
resistance of the pull-up network of the driving gate and Cp 
denotes the parasitic capacitance of the circuit. We assume 
that the signal is transmitted from the TSV terminal on die 1 
(referred to as terminal t1) by a driving gate, to the TSV terminal 
on die 2 (referred to as terminal t2). Three defects are considered 
in this work: voids, delamination at the interface and 
short-to-substrate. Void defects are caused by improper TSV filling 
and thermal stress during normal operation of the device [6]. 
This defect increases TSV resistance [9] and causes delay 
faults. The second defect type (TSV delamination) is either 
due to misalignment of the bonding pad and TSV during 
fabrication or due to thermal stress induced on the TSV in thermal 
processing and normal operation. This thermal stress can 
cause delamination at the interface of the TSV structure (TSV and 
its landing pad) and increase TSV resistance. These two 
types of defects (void and delamination) can both be modelled 
as a resistive open defect, which increases signal delay. The 
equivalent electrical model of this defect is shown in Fig. 1(b) 
[10], where a resistor (Ropen) represents the open defect. The third 
defect type (short-to-substrate) is due to a pinhole in the 
dielectric layer that is deposited to form the side wall between 
the TSV and the substrate. It is modelled as a resistive path from 
TSV to substrate due to non-conformal sidewall insulation [7]. 
The electrical equivalent of this defect is shown in Fig. 1(c), 
in which a short resistor, denoted by Rshort, 
represents the leakage current path between TSV and 
substrate and reduces the current available for charging the TSV.  
 
(a) TSV T-model [17] 
 
(b) Model: Void/delamination defect [10]        (c) Model: Short to substrate [7] 
Fig. 1. Electrical equivalent circuit model for TSV with defects. 
III. TSV FAULT TOLERANCE TECHNIQUE 
  Fig. 2 shows the block diagram of the proposed fault 
tolerance technique to test and repair a single TSV group with 
a grouping ratio of 4:2. It consists of three blocks: detection 
block, recovery block and routing block. These blocks are 
used to test and repair a group of TSVs (referred to as a TSV 
group). A TSV group with a grouping ratio of m:n consists of 
m input (output) signals, m regular TSVs and n redundant 
TSVs, and can tolerate up to n TSV 
defects. The generic architecture (grouping ratio m:n) is 
shown in reference [18] (Fig. 8 and associated text) and 
achieves 100% repair capability, as demonstrated in Sec. IV-C 
of reference [18] (TABLE III and associated text). For 
illustration, a grouping ratio of 4:2 is shown in Fig. 2. The 
number of redundant TSVs in a design has an effect on yield, 
repair capability and hardware cost. For a given fault rate, 
recent papers have proposed algorithms to determine the 
grouping ratio that minimizes hardware cost and maximizes yield 
[11], [15]. In this work, it is assumed that TSVs are divided 
into groups at design time. 
  The detection block (Fig. 2) is used for testing each TSV in a 
group. Input test patterns are applied from one die (Die 1) and 
the output test response is observed through the test observation 
block located on the subsequent die (Die 2). A double TSV 
interconnection is used to update the TSV status register on Die 1. 
This concept was also used in [2] for error communication 
between dies. The detection block uses a delay test to 
differentiate between faulty and fault-free TSVs, where each 
TSV is tested for the three defect types (Fig. 1). The status of each 
TSV is updated in the TSV status registers, which are located on 
both dies and hold the locations of all faulty TSVs in a group. 
Note that the detection block does not distinguish between 
different defect types, as that is typically required only for 
diagnosis. Once fault detection is complete, recovery is 
initiated to reroute signals through fault-free TSVs (replacing 
defective TSVs) by reconfiguring the routing block between 
signals and TSVs (Fig. 2). 
[Figure: TSV group 4:2 with TSV1 to TSV4 and Redundant TSV1 and TSV2 between Die 1 and Die 2; signal lines 1 to 4 on each die; detection, recovery (TSV status register and control) and routing blocks on both dies; test observation on Die 2; double-TSV interconnect between the dies.]
Fig. 2. Detection and recovery blocks for a grouping ratio of 4:2 
The recovery block is implemented on both dies as shown 
in Fig. 2, and it consists of a TSV status register and a control unit. 
The TSV status register holds the fault status of each TSV. The 
control unit provides the control signals for the 
multiplexers inside the routing block, to connect signal lines 
with appropriate TSVs. The control unit also reports when 
the number of defective TSVs is higher than the maximum 
tolerance limit of a TSV group. 
The routing block consists of a set of multiplexers (for output 
signals, Die 2) and de-multiplexers (for input signals, Die 1) 
that connect each signal line to a TSV according to the selection 
signals provided by the control unit of the recovery block. 
The connection boxes (Fig. 2) represent the de-Mux (Mux) 
terminals for input (output) signal lines. For a grouping ratio 
of 4:2, each signal can use one of three possible TSVs; 
hence a 1-to-3 de-multiplexer is needed (Fig. 3). Fig. 2 
illustrates the detection and recovery blocks for a grouping 
ratio of 4:2, assuming two defective TSVs (TSV2 and TSV4). 
See Sec. III of reference [18] for an example that illustrates 
the working of these three blocks. 
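As a minimal sketch of the routing options described above, the following Python snippet lists the TSVs reachable from each signal line. It assumes that signal line i is wired to TSV i and to the n TSVs that follow it, with regular TSVs numbered first and redundant TSVs last; this adjacency is inferred from Fig. 3 and is not stated explicitly in the text. The function name `candidate_tsvs` is hypothetical.

```python
def candidate_tsvs(signal, m, n):
    """Return the 1-based TSV indices that signal line `signal` can use.

    Assumption (not from the paper's text): signal i reaches TSVs i..i+n,
    where TSVs 1..m are regular and TSVs m+1..m+n are redundant.
    """
    if not 1 <= signal <= m:
        raise ValueError("signal must be in 1..m")
    return list(range(signal, signal + n + 1))


if __name__ == "__main__":
    m, n = 4, 2  # grouping ratio 4:2
    for s in range(1, m + 1):
        print(f"signal line {s}: TSVs {candidate_tsvs(s, m, n)}")
```

Under this assumption every signal line has n+1 = 3 options for a 4:2 group, which is why a 1-to-3 de-multiplexer suffices.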
A. Detection Block 
Fig. 3 shows the detection block for a single TSV, as an 
example. It consists of an input signal unit for test patterns and 
input signals, where transition signals are stored for test 
application. Fig. 3 also shows the test observation block (part 
of the detection block; Fig. 2), where the test output is observed in a 
flip-flop and stored in the TSV status registers. The SI signal and 
NAND gate are used to initialize the TSV status registers. The 
detection block applies a transition signal on one die (Die 1) and 
the output is observed on the subsequent die (Die 2). We next 
explain the working of the detection block for the three 
defects (Fig. 1). 
[Figure labels: Die 1 with input unit (test patterns, input signals), signal line 1, de-Mux terminal and In_TSV1 at terminal t1; TSV1, TSV2 and TSV3 spaced at dpitch; Die 2 with observation point at terminal t2, test observation logic (Lvth, SI, test clock, flip-flop D/clk/Q), Test_result and TSV status register; nodes marked (a), (b), (c).]
Fig. 3 Detection block for a single TSV 
As described in Sec. II, void and delamination defects 
increase TSV resistance, forming a higher-resistance TSV path 
and thus increasing RC delay. To derive the RC delay at TSV 
terminal t2 (Fig. 3), we employ the TSV electrical model with void or 
delamination defects (Fig. 1). The RC delay at t2 is: 
$$\left(R_{pull} + \tfrac{1}{2}R_{TSV} + \tfrac{1}{2}R_{open}\right)C_{TSV} + \left(R_{pull} + R_{TSV} + R_{open}\right)C_{p} \qquad (1)$$
where Ropen denotes the open resistance due to a void or 
delamination defect, Rpull denotes the resistance of the pull-up 
network driving the TSV (de-multiplexers, Fig. 3) and Cp 
denotes the parasitic capacitance of the test circuit. When the 
TSV is fault-free, Ropen~0 and the TSV resistance is small 
(hundreds of mΩ), so it can be ignored when compared to the 
pull-up resistance of the driving gate Rpull, which is usually several kΩ, 
such that the path delay is not affected by the TSV resistance. 
However, in case of void or delamination defects, the open 
resistance of a TSV (Ropen) can be up to 1 MΩ [9], which is 
significantly higher than the combined effect of RTSV and Rpull. 
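To make the orders of magnitude concrete, the sketch below evaluates Eq. (1) for a fault-free TSV and for a 1 MΩ open defect. Only the magnitudes of Rpull, RTSV and Ropen come from the text; the specific resistance values and both capacitance values are assumptions chosen for illustration, and the function name `rc_delay_t2` is hypothetical.

```python
def rc_delay_t2(r_pull, r_tsv, r_open, c_tsv, c_p):
    """RC delay at terminal t2 per Eq. (1); ohms and farads in, seconds out."""
    return ((r_pull + 0.5 * r_tsv + 0.5 * r_open) * c_tsv
            + (r_pull + r_tsv + r_open) * c_p)


# Illustrative values only (assumptions, not taken from the paper).
R_PULL = 5e3    # "several kOhm" pull-up resistance, assumed 5 kOhm
R_TSV = 0.2     # "hundreds of mOhm" TSV resistance, assumed 200 mOhm
C_TSV = 50e-15  # assumed TSV capacitance
C_P = 10e-15    # assumed parasitic capacitance

fault_free = rc_delay_t2(R_PULL, R_TSV, 0.0, C_TSV, C_P)
without_tsv_r = rc_delay_t2(R_PULL, 0.0, 0.0, C_TSV, C_P)
open_1meg = rc_delay_t2(R_PULL, R_TSV, 1e6, C_TSV, C_P)

if __name__ == "__main__":
    print(f"fault-free delay:       {fault_free * 1e12:.3f} ps")
    print(f"ignoring R_TSV:         {without_tsv_r * 1e12:.3f} ps")
    print(f"1 MOhm open defect:     {open_1meg * 1e9:.3f} ns")
    print(f"slow-down factor:       {open_1meg / fault_free:.1f}x")
```

With these assumed values, dropping RTSV changes the fault-free delay by well under 0.01%, while a 1 MΩ open increases it by more than two orders of magnitude, which is the contrast the delay test relies on.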
  Consider the NAND gate (Fig. 3) with logic threshold 
voltage Lvth, where Lvth of a gate input is the input 
voltage at which the output voltage reaches half of the supply 
voltage while the other gate input(s) are at non-controlling 
value(s) [19]. A rising transition is applied to the TSV from 
In_TSV1 (Fig. 3). Since the delay at the t2 end depends on the 
value of Ropen, a larger Ropen makes the rising transition at t2 
slower, such that at a given capture time the voltage at t2 is lower 
than Lvth, as illustrated in Fig. 4. Therefore, if the TSV open 
resistance due to a void or delamination defect exceeds a critical value 
Ropen-critical, the voltage at t2 is lower than Lvth at the given 
signal capture time and the test reports a fault, 
Test_result=1 (Fig. 4). The signal capture time is set by 
the test clock frequency, which is applied to the flip-flop 
shown in Fig. 3. Note that the internal clock may be used as the 
test clock to avoid the overhead of a separate DFT clock as used 
in [19]. The TSV open critical resistance Ropen-critical is a function of 
the logic threshold voltage Lvth and the signal capture time (set 
by the test clock frequency Fclock). Here Lvth is kept at 50% of Vdd 
for illustration; in general it varies per gate input and is also 
affected by process variation [19].  
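The dependence of Ropen-critical on Lvth and capture time can be made explicit with a rough first-order sketch. It assumes, beyond what the text states, that the voltage at t2 follows a single-pole exponential whose time constant is the delay expression of Eq. (1), written here as τ(Ropen), and that the capture time t_cap equals one test clock period, 1/Fclock. The symbols τ, τ_crit and t_cap are introduced here for illustration only.

$$
\begin{aligned}
V_{t2}(t) &\approx V_{dd}\left(1 - e^{-t/\tau(R_{open})}\right), \\
\tau(R_{open}) &= \left(R_{pull} + \tfrac{1}{2}R_{TSV} + \tfrac{1}{2}R_{open}\right)C_{TSV} + \left(R_{pull} + R_{TSV} + R_{open}\right)C_{p}, \\
V_{t2}(t_{cap}) < L_{vth} &\iff \tau(R_{open}) > \tau_{crit} = \frac{t_{cap}}{\ln\!\left(\dfrac{V_{dd}}{V_{dd} - L_{vth}}\right)}, \\
R_{open\text{-}critical} &= \frac{\tau_{crit} - R_{pull}\left(C_{TSV} + C_{p}\right) - R_{TSV}\left(\tfrac{1}{2}C_{TSV} + C_{p}\right)}{\tfrac{1}{2}C_{TSV} + C_{p}}
\end{aligned}
$$

With Lvth = Vdd/2 this gives τ_crit = t_cap / ln 2, so under these assumptions a higher Fclock (shorter t_cap) lowers Ropen-critical and smaller open resistances become detectable.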
The testing method for the short-to-substrate TSV defect is 
similar to that for void or delamination defects. For a 
short-to-substrate TSV defect, a resistive path between TSV and 
substrate causes current leakage (Fig. 1(c)), leading to reduced 
TSV charging current. Assume a rising transition is applied 
from In_TSV1 (Fig. 3). The charging current can be expressed as 
Icharge = I1 - Ileakage, where I1 is the input current at t1 and 
Ileakage is the leakage current from TSV to substrate through the 
short resistor (Fig. 1(c)). Due to the lower TSV charging current 
(Icharge), the rising transition time observed at t2 increases with 
increasing defect size (Fig. 4). When a rising 
transition is applied at In_TSV1, a TSV with a short-to-substrate 
defect also exhibits a degraded voltage level at the TSV end (Fig. 4), 
because Rshort forms a voltage divider with 
Rpull and Rtsv (Fig. 1(c)). We have analysed the defect behaviour 
using HSpice and a 65-nm design library. It is found that the 
detection block is capable of detecting all three defects and, as 
expected, the range of detectable defect sizes increases with test 
clock frequency. See Sec. IV-A of reference [18] for an illustrative 
example, which shows the characterization of the TSV open (short) 
critical resistance and the increase of the detectable resistance 
range with applied test clock frequency Fclock.  
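The voltage-divider effect can be illustrated with a simple steady-state estimate. This sketch assumes that Rshort connects the midpoint of the T-model to the substrate (ground) and that no DC current flows into the capacitive load at t2; the exact placement of Rshort in Fig. 1(c) may differ. The steady-state symbol V_t2(∞) is introduced here for illustration.

$$
V_{t2}(\infty) \approx V_{dd}\,\frac{R_{short}}{R_{pull} + \tfrac{1}{2}R_{TSV} + R_{short}},
\qquad
V_{t2}(\infty) < L_{vth} \iff R_{short} < \left(R_{pull} + \tfrac{1}{2}R_{TSV}\right)\frac{L_{vth}}{V_{dd} - L_{vth}}
$$

With Lvth = Vdd/2 the condition reduces to Rshort < Rpull + ½RTSV. Under these assumptions such shorts never reach Lvth and fail the test at any capture time, while larger Rshort values can only be caught through the slower rising transition.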
[Figure: two plots of voltage (V) versus time (ns). Top: signal at node (b) in Fig. 3 (t2), showing defect-free, faulty (void/delamination) and faulty (short) responses against Lvth at the signal capture time. Bottom: signal at node (c) in Fig. 3 (Test_result), with Test_result = 1 for faulty and Test_result = 0 for fault-free TSVs.]
Fig. 4 Test pattern for detection of void or delamination defect 
B. Recovery and Routing Blocks 
The recovery block (Fig. 2) is used to bypass defective 
TSVs with fault-free TSVs and it is implemented on both dies 
that are connected by the TSV group. Recovery is initiated 
after testing and reconfigures the connections between 
input/output signals and fault-free TSVs. This section 
describes the reconfiguration process by 
considering a design with a grouping ratio of 4:2. 
The circuits for reconfiguring input and output signals are 
similar and therefore only the input part is shown in Fig. 5. As can 
be seen, it consists of the following six components: 1) a 
routing block consisting of de-multiplexers to connect signal 
lines with TSVs; 2) a latch chain that controls the 
de-multiplexers; 3) a TSV status register, which stores the fault 
status of each TSV, where logic ‘0’ indicates fault-free 
and ‘1’ indicates faulty; 4) a signal line counter, 
which indicates the number of signals that have been configured and 
is also used to update the latch chain through “enable”; 5) an 
adder, the “Faulty TSV accumulator”, which counts the number of 
faulty TSVs and provides the input to the latch chain; 6) a comparator, 
which compares the current number of faulty TSVs with the 
tolerance limit of the TSV group and reports an error if 
the tolerance limit is exceeded. See Sec. III-B (and Fig. 7(b)) 
of reference [18] for an illustrative example and a set of test 
vectors for detecting these defects. 
[Figure: input side (Die 1) of a 4:2 TSV group with TSV1 to TSV4 and Redundant TSV1 and TSV2, showing the routing block (de-Mux terminals for signal lines 1 to 4), latch chain, TSV status registers on Die 1 and Die 2 linked by the double TSV interconnection, test observation, signal line counter with enable, faulty TSV accumulator, and comparator against the tolerance limit ("Exceed tolerance limit"). Annotated select values (Si(1), Si(0)) for signal lines 1 to 4: (0 0), (0 1), (1 0), (1 0).]
Fig. 5 Reconfiguring a faulty design with a grouping ratio of 4:2 
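The interplay of the status register, signal line counter, faulty TSV accumulator and comparator can be sketched behaviourally as follows. This is an illustrative software model under stated assumptions, not the authors' hardware: it assumes status bits are scanned serially in TSV order, that each signal line takes the next fault-free TSV, and that the latched select value equals the number of faulty TSVs seen so far. The function name `reconfigure` is hypothetical.

```python
def reconfigure(status, m, n):
    """Behavioural sketch of recovery for one TSV group (ratio m:n).

    status -- list of m+n bits in TSV order (regular TSVs first);
              0 = fault-free, 1 = faulty, as in the TSV status register.
    Returns (selects, exceeded): selects[i] is the de-multiplexer select
    value for signal line i+1 (number of TSVs skipped); exceeded is True
    when more than n TSVs are faulty.
    """
    if len(status) != m + n:
        raise ValueError("status must hold m+n bits")
    faulty = 0       # models the "Faulty TSV accumulator"
    configured = 0   # models the "Signal line counter"
    selects = []
    for bit in status:               # one status bit per clock cycle
        if bit:
            faulty += 1              # skip the defective TSV
        elif configured < m:
            selects.append(faulty)   # latch the current offset for this signal
            configured += 1
    return selects, faulty > n       # comparator against the tolerance limit


if __name__ == "__main__":
    # TSV2 and TSV4 faulty in a 4:2 group (the case assumed for Fig. 2).
    selects, exceeded = reconfigure([0, 1, 0, 1, 0, 0], m=4, n=2)
    print(selects, exceeded)   # prints: [0, 1, 2, 2] False
```

For TSV2 and TSV4 faulty, the offsets 0, 1, 2, 2 correspond to select bits (0 0), (0 1), (1 0), (1 0), which appears consistent with the values annotated in Fig. 5. The loop visits each of the m+n status bits once, in line with the m+n repair cycles discussed in Sec. III-C.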
C. Scalability and Hardware Cost  
Based on the modelling method for each block, this technique 
can be easily scaled to a design with a grouping ratio of m:n. In 
such a configuration each group contains m+n TSVs with m 
input/output signal lines. The TSV status register consists of 
m+n bits. Each signal line can use n+1 TSVs for 
communication, such that a 1-to-(n+1) de-multiplexer is needed. 
The selection signal for signal line i needs k=⌈log2(n+1)⌉ bits, 
which are Si(0), Si(1), …, Si(k-1). Therefore, for each signal 
line the latch chain consists of k latches, which hold the 
de-multiplexer selection signal. The signal line counter needs to 
generate m latch-update enable signals. Overall, for a 
grouping ratio of m:n, this technique requires m+n clock 
cycles to test m regular and n redundant TSVs serially and 
m+n clock cycles for repairing all TSVs in the presence of 
defects. Therefore, in total it requires only 2(m+n) clock 
cycles for fault detection and recovery. The theoretical lower 
bound to test and repair all TSVs in a design is 2 clock cycles, 
assuming the availability of an infrastructure to test and repair 
all TSVs in parallel. The proposed technique approaches the 
theoretical lower bound by using only 2(m+n) clock cycles. 
See Fig. 8 of reference [18] for the generic architecture. The 
area overhead of the proposed technique (detection, recovery and 
routing blocks on both dies) is: 
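$$
\begin{aligned}
Area &= A_{detection} + A_{routing} + A_{recovery} + A_{redundant\ TSV} \\
&= (m+n)\ \text{Nand gates} + \{3(m+n) + 2m\lceil \log_2(n+1) \rceil\}\ \text{FlipFlop} + (m)\ demux_{1\text{-to-}(n+1)} + (m)\ mux_{(n+1)\text{-to-}1} \\
&\quad + (2)\ \text{signal line counter}_{m\text{-bit}} + (2)\ \text{accumulator} + \text{comparator} + A_{redundant\ TSV} \qquad (2)
\end{aligned}
$$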
where “A” denotes the area overhead of a TSV group with a 
grouping ratio of m:n; all other notations have their usual 
meaning. It can be seen that this technique can be easily 
scaled to suit a generic design with any specified grouping 
ratio. The wirelength overhead, which can be understood from Fig. 3, 
increases due to the alternative route paths from signals to 
TSVs. For a general grouping ratio of m:n, one signal has 
n+1 possible route paths (n alternative routes). The lower 
bound of the wirelength overhead is obtained under the 
assumption that, within a group, TSVs are located next to each 
other with a minimum pitch, denoted as dpitch, and that this 
applies to each group (Fig. 3). Then, for a total of M 
signals (M regular TSVs) organized with a grouping 
ratio of m:n, the wirelength overhead is $M \cdot \sum_{i=1}^{n} i \cdot d_{pitch}$. 
This wirelength overhead covers the routing block, which 
dominates the wirelength overhead due to TSV 
redundancy. The wirelength overhead due to the recovery block is 
not included in this expression. 
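The scaling rules, the component counts of Eq. (2) and the wirelength lower bound can be tabulated with a short script. This is a sketch that only counts components; it does not assign cell areas, which depend on the library. The function and dictionary key names are hypothetical, and the count of n redundant TSVs per group follows from the definition of the grouping ratio.

```python
from math import ceil, log2


def group_sizing(m, n):
    """Register widths and cycle counts for one TSV group with ratio m:n."""
    return {
        "status_register_bits": m + n,
        "select_bits_per_signal": ceil(log2(n + 1)),  # k = ceil(log2(n+1))
        "demux_fanout": n + 1,
        "test_cycles": m + n,
        "repair_cycles": m + n,
        "total_cycles": 2 * (m + n),
    }


def component_count(m, n):
    """Component counts per TSV group, term by term from Eq. (2)."""
    k = ceil(log2(n + 1))
    return {
        "nand": m + n,
        "flip_flop": 3 * (m + n) + 2 * m * k,
        "demux": m,                 # 1-to-(n+1) de-multiplexers
        "mux": m,                   # (n+1)-to-1 multiplexers
        "signal_line_counter": 2,
        "accumulator": 2,
        "comparator": 1,
        "redundant_tsv": n,
    }


def wirelength_lower_bound(total_signals, n, d_pitch):
    """Lower bound M * sum_{i=1..n}(i * d_pitch), in the units of d_pitch."""
    return total_signals * sum(i * d_pitch for i in range(1, n + 1))


if __name__ == "__main__":
    print(group_sizing(4, 2))
    print(component_count(4, 2))
    print(wirelength_lower_bound(1000, 2, 1.0))
```

For the 4:2 example this gives k = 2 select bits per signal line, 34 flip-flops and 12 clock cycles in total; the sum in the wirelength bound equals n(n+1)/2 times dpitch per signal.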
IV. SIMULATION RESULTS 
  Two sets of simulations are conducted to validate and 
evaluate the fault tolerance technique using the 
STMicroelectronics 130-nm cell library. For validation of the 
proposed technique using HSPICE and ModelSim, see Sec. 
IV-A and Sec. IV-B of reference [18]. 
  The cost-effectiveness of this fault tolerance technique is 
evaluated using five benchmark designs from IWLS 2005 [20]. 
Table I shows the hardware cost of this 
technique for the benchmark designs. The diameter of a TSV is 
5 um [17], such that for each redundant TSV the area 
overhead is 25 um2. The number of regular TSVs for each design is 
obtained from [21], where the selected circuits are also synthesised 
using a 130-nm cell library. For a design with a given 
number of regular TSVs, the best grouping ratio can be found 
through an exhaustive search algorithm [11] to minimize area 
overhead without affecting the targeted repair capability, as 
shown in the fourth column of Table I. The area overhead of the 
proposed technique is calculated using Eq. 2 (shown as 
“calculation” results in Table I) and compared with synthesis 
results using the 130-nm gate library and Synopsys Design 
Compiler (synthesis results, Table I). The area overhead due to 
each block (Fig. 2) of the proposed fault tolerance technique is 
shown individually in Table I for the calculation results. Synthesis 
results indicate that the area overhead of the proposed technique 
is only 4% (best case, netcard). Note that this area overhead 
results from targeting 100% repair capability with an expected 
fault rate of 0.001 (for illustration purposes). See Sec. IV-C of 
reference [18] for more detail on how 100% repair capability 
is achieved with the selected grouping ratio. 
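The arithmetic behind two of the Table I columns can be checked with a few lines of Python. The figures below are copied from Table I (calculation columns); the script itself is only an illustrative check, and the names `TABLE_I` and `overhead_percent` are hypothetical.

```python
TSV_AREA_UM2 = 25  # area per redundant TSV, as stated in the text

# name: (design area, spare TSVs, spare TSV area, total calculated overhead)
# All areas in um2, values copied from Table I.
TABLE_I = {
    "aes_core": (818_750, 34, 850, 129_264),
    "ethernet": (2_858_975, 94, 2_350, 358_920),
    "des_perf": (3_428_571, 92, 2_300, 349_079),
    "vga_lcd": (4_400_000, 93, 2_325, 754_857),
    "netcard": (28_034_722, 114, 2_850, 935_005),
}


def overhead_percent(total_overhead, design_area):
    """Area overhead as a percentage of the design area."""
    return 100.0 * total_overhead / design_area


if __name__ == "__main__":
    for name, (design, spares, spare_area, total) in TABLE_I.items():
        # Spare TSV area is the spare count times 25 um2.
        assert spares * TSV_AREA_UM2 == spare_area
        print(f"{name}: {overhead_percent(total, design):.2f}%")
```

The spare TSV area column is the spare count multiplied by 25 um2, and the overall percentage column is the total overhead divided by the design area.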
The comparison with recently reported techniques [7, 10, 15, 
16] is shown in Table II. For illustration purposes, the number of 
regular TSVs is assumed to be 1000. In [7, 10], test 
mechanisms for detecting open TSV defects and 
short-to-substrate defects are reported. As can be seen (Table II), the 
proposed technique requires lower area overhead for detection 
logic than the techniques reported in [7, 10]. The 
ideas presented in [15, 16] utilise an on-chip microprocessor 
to implement the control logic for repair. The proposed work is 
the first to show a detailed hardware solution for designs where 
such an on-chip microprocessor is either not employed or 
cannot be used for TSV repair. 
V. CONCLUSION 
This paper has presented a cost-effective and efficient online 
fault tolerance technique for improving in-field reliability of 
TSV based 3D ICs. Hardware cost analysis using 130-nm 
design library shows that 100% repair capability is possible 
with low area overhead. 
ACKNOWLEDGEMENTS 
This work is supported in part by EPSRC (UK) under grant no. 
EP/K000810/1, and by the Dept. of Electrical Engineering & 
Electronics, Uni. of Liverpool (UK).  
REFERENCES 
[1] K. Banerjee et al., "3-D ICs: A novel chip design for improving deep-
submicrometer interconnect performance and systems-on-chip integration", 
Proc. IEEE, vol. 89, no. 5, pp. 602-633, 2001. 
[2] N. Miyakawa et al., "Multilayer stacking technology using wafer-to-wafer 
stacked method," J. Emerg. Technol. Comput. Syst. vol. 4, no. 4, 2008. 
[3] A. W. T. et al., “Enabling SOI based assembly technology for three 
dimensional integrated circuits,” Electron Devices Meeting, Technical Digest. 
IEEE International (IEDM), 2005. 
[4] A. D. Trigg, et al., "Design for reliability in via middle and via last 3-D 
chipstacks incorporating TSVs," Electronics Packaging Technology 
Conference (EPTC), 2010 
[5] L. Gyujei et al., "Interfacial reliability and micropartial stress analysis 
between TSV and CPB through NIT and MSA," Electronic Components and 
Technology Conference (ECTC), 2011.  
[6] T. Frank et al., "Reliability approach of high density Through Silicon Via 
(TSV)," Electronics Packaging Technology Conference (EPTC), 2010.  
[7] Minki Cho et al., "Design method and test structure to characterize and 
repair TSV defect induced signal degradation in 3D system," Computer-Aided 
Design (ICCAD), pp.694-697, Nov. 2010. 
[8] Po-Yuan Chen et al., "On-Chip TSV Testing for 3D IC before Bonding 
Using Sense Amplification," Asian Test Symposium, pp.450-455, Nov. 2009. 
[9] Shi-Yu Huang et al., "Small delay testing for TSVs in 3-D ICs," Design 
Automation Conference (DAC), 2012. 
[10] Ye Fangming and K. Chakrabarty, "TSV open defects in 3D integrated 
circuits: Characterization, test, and optimal spare allocation," Design 
Automation Conference (DAC), 2012. 
[11] Y. Zhao, et al., "Cost-Effective TSV Grouping for Yield Improvement of 
3D-ICs", Asian Test Symposium (ATS), Nov 2011.  
[12] I. Loi et al., "A low-overhead fault tolerance scheme for TSV-based 3-D 
network on chip links", International Conference on Computer-Aided Design 
(ICCAD), 2008. 
[13] U. Kang, et al. “8Gb 3-D DDR3 DRAM using through-silicon-via 
technology”. IEEE Journal of Solid-State Circuits, 45(1):111–119, 2010 
[14] A.-C. Hsieh et al., “TSV Redundancy: Architecture and Design Issues in 
3D IC,” pp. 166-171, Design, Automation & Test in Europe (DATE), 2010. 
[15] L. Jiang et al., "On effective TSV repair for 3D-stacked ICs," 
Design, Automation & Test in Europe (DATE), pp.793-798, 2012.  
[16] L. Jiang et al. "On effective and efficient in-field TSV repair for stacked 
3D ICs", Design Automation Conference (DAC), 2013.  
[17] G. Katti et al., "Electrical Modeling and Characterization of Through 
Silicon via for Three-Dimensional ICs," Electron Devices, IEEE Transactions 
on, vol.57, no.1, pp.256-262, Jan 2010. 
[18] Y. Zhao, "Design and Validation of Online Fault Tolerance Architecture 
for TSV-based 3D-IC," Technical Report-URL: 
http://eprints.soton.ac.uk/id/eprint/362558. 
[19] S. Khursheed et al., "Delay Test for Diagnosis of Power Switches," IEEE 
Trans. on Very Large Scale Integration (VLSI) Systems,2013. 
[20] IWLS 2005 Benchmark circuits.-URL: 
http://iwls.org/iwls2005/benchmarks.html 
[21] J. Cong et al., "Thermal-aware cell and through-silicon-via co-placement 
for 3D ICs," Design Automation Conference (DAC), pp.670-675, 2011. 
TABLE II: Comparison between proposed technique and [7, 10, 15, 16] 
(regular TSV number is 1000) 
| Technique | Proposed technique | TSV repairing [15, 16] | Test method [7] | Test method [10] |
| --- | --- | --- | --- | --- |
| Objective | Detection (open and short to substrate defect) and Repairing | Repairing | Detection (short to substrate) | Detection (open) |
| Cost: No. redundant TSV | 25 (grouping ratio 80:2) | 128 | N/A | N/A |
| Cost: Routing | 1000 Mux | 3000 Mux | N/A | N/A |
| Cost: Recovery | 13*Signal line counter + 13*Faulty TSV adder + 13*comp + 3025*FF | On-chip microprocessor + Router configuration block | N/A | N/A |
| Cost: Detection (testing logic) | 1025*Nand + 1025*FF | On-chip test block | 1k*Voltage comp + 2k*INV + 1k*FF + 2k*Mux | 1k*voltage comp + 1k*INV + 1k*FF + 1k*Mux |
TABLE I: Area overhead analysis of the proposed fault tolerance technique (TSV failure rate 0.001). 
Columns "Spare TSV area" to "Overall percentage (calculation)" give the area overhead per die from calculation; the last two columns give synthesis results. All areas in um2.
| Circuits | Design area (um2) | Regular TSV (signal TSV) | Best grouping ratio [11] | No. of spare TSV | Spare TSV area | Double TSV structure | Detection | Recovery | Routing | Total (calculation) | Overall percentage (calculation) | Total (synthesis) | Overall percentage (synthesis) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| aes_core | 818,750 | 1,362 | 80:2 | 34 | 850 | 850 | 16,857 | 81,832 | 28,874 | 129,264 | 15.79% | 143,007 | 17.4% |
| ethernet | 2,858,975 | 3,782 | 80:2 | 94 | 2,350 | 2,350 | 46,809 | 227,232 | 80,178 | 358,920 | 12.55% | 397,080 | 13.8% |
| des_perf | 3,428,571 | 3,678 | 80:2 | 92 | 2,300 | 2,300 | 45,522 | 220,983 | 77,974 | 349,079 | 10.18% | 386,190 | 11.2% |
| vga_lcd | 4,400,000 | 7,356 | 240:3 | 93 | 2,325 | 1,550 | 89,934 | 408,738 | 252,311 | 754,857 | 17.16% | 828,748 | 18.8% |
| netcard | 28,034,722 | 9,112 | 240:3 | 114 | 2,850 | 1,900 | 111,403 | 506,310 | 312,542 | 935,005 | 3.34% | 1,133,868 | 4.0% |
