Built-in Self Repair Approach by Module Relocation for FPGA Based Reconfigurable Systems  by Eapen, Madhuri Elsa et al.
 Procedia Technology  24 ( 2016 )  1587 – 1594 
2212-0173 © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of ICETEST – 2015
doi: 10.1016/j.protcy.2016.05.146 
International Conference on Emerging Trends in Engineering, Science and Technology  
(ICETEST - 2015) 
Built-in Self Repair Approach by Module Relocation for FPGA 
Based Reconfigurable Systems 
Madhuri Elsa Eapen a,*, Pradeep C. b, Anila Ann Varghese c, Jisha M. Nair d
a,b,c,d Dept. of Electronics and Communication, SAINTGITS College of Engineering, Kottayam, India, 686532
Abstract 
Systems installed in harsh environments are continuously exposed to radiation and to temperature and pressure
variations, which cause rapid circuit degradation and malfunction. FPGAs are the core component of many such
systems, especially in mission-critical and safety-critical applications. To ensure reliable and prolonged operation
until mission completion, suitable fault recovery techniques must be incorporated into the system during the design
phase itself. Traditional self-repairing schemes use spare cells to replace faulty cells, so the number of spare cells
grows with the number of faults to be repaired, which creates a high area overhead. Dynamic runtime partial
reconfiguration is considered a promising technique for improving the flexibility and efficiency of FPGA based
systems. The key concept behind the self-repairing scheme discussed in this paper is relocation of faulty modules,
which enables better use of resources, together with scheduling of repair for different modules so that the system
keeps operating until mission completion or up to the required lifetime with maximum efficiency. The paper presents
an efficient self-repairing scheme for FPGAs that can handle a higher number of faults with better resource
utilization and lower overheads.
 
Keywords: relocation; self-repair; placement; dynamic run-time partial reconfiguration
 
 
* Corresponding author. Tel.: +91-8547370462 
E-mail address: madhurilseapen@gmail.com
1. Introduction 
Most modern VLSI based electronic systems used in safety-critical and mission-critical applications have a Field
Programmable Gate Array (FPGA) as their central working component. The working environment of almost all these
systems is subject to continuous adverse changes (very high or very low temperature, pressure, radiation, etc.), so
the circuits are more likely to degrade and malfunction, which adversely affects the performance and reliability of
the system. Such failures also cause yield loss and can sometimes be a threat to property and even human life. To
maintain continuous, uninterrupted operation, measures must be taken from the design stage of the system. Chief
among them is a proper fault handling and recovery method that can detect and then correct or repair the faults that
cause system malfunction, in good time and with minimum overhead. This calls for fault-tolerant schemes built into
the system, whose main aim is to keep the system function unaffected even after a permanent fault is detected.
FPGAs were introduced to industry more than two decades ago, and ever since they have attracted constant interest
in many areas of VLSI based system design. Owing to their increased logic capacity, FPGAs are considered an
alternative for implementing complex, computation-intensive designs and systems [1]. The ability to reconfigure at
runtime further increases the flexibility and adaptability of FPGAs in harsh working environments. The main
limitation of reconfiguration is the time it takes. Reconfiguring the entire FPGA takes a long time, and even a small
change in one internal module requires the entire FPGA to be configured again. Dynamic partial reconfiguration, an
attractive feature of modern FPGAs (for example the Xilinx Virtex 5, 6 and 7 series) [2], was introduced as a solution
to this problem. It allows a part of the FPGA to be configured while the rest continues normal operation; when this
is done at runtime it is referred to as runtime dynamic partial reconfiguration. Although partial reconfiguration has
some latency, it is small compared to the time taken to reconfigure the entire FPGA. This feature is currently the
focus of several FPGA self-repairing techniques.
Self-repair, or self-healing, is a property exhibited by living organisms [3]. It is the ability of a living organism to
continue functioning normally even after serious damage has occurred. Organisms also overcome the damage over
time by growing fresh cells in place of damaged ones. Repair to this extent is almost impossible in silicon based
electronic systems. Researchers in the field of self-repair nevertheless attempt to adopt this ability at least partially,
so that normal operation is retained after a fault occurs, with the least overhead, until the specified mission is
completed. Traditional methods of repair in FPGAs rely mainly on spare modules or hardware redundancy. Spare
modules replace faulty cells, which demands a spare cell for every faulty cell and increases the area overhead when
the number of faults to be handled is high. In redundancy methods such as DMR or TMR the same module is
implemented multiple times; this enables fast repair but brings higher area and power consumption. Some of the
most commonly used self-repairing methods are discussed in the following section.
This paper deals with a self-repairing method that can repair a larger number of faults with higher resource
utilization. The main concepts behind it are relocation of faulty modules and fine-grained isolation of the faulty
unit, i.e. the faulty CLB instead of the entire module containing it. This enables better resource utilization and
frees space for future relocation after each repair.
The paper is organized as follows. Section 2 discusses some of the relevant literature on FPGA self-repair.
Section 3 describes the self-repairing method. Results demonstrating the proposed method are presented in
Section 4. Section 5 concludes the paper and outlines future scope.
2. Previous Works 
Since the importance of repairing FPGA faults was recognized, many papers have been published proposing
different self-repairing schemes, architectures and algorithms. Some of them are discussed in this section. In [4],
Kim et al. discuss a fast fault recovery method that uses both redundant cells and spare cells, with a detection
unit allotted to every cell. The paper demonstrates an architecture with sixteen cells, each having a redundant cell
and four spare cells. The presence of spare and redundant cells makes fast fault recovery possible. However, the
area overhead is very high, and decision-making circuitry is necessary because the inputs to the spares and the
working cell are pre-routed, so it must be decided which of them is to function at a given time.
The most commonly used method to achieve fault tolerance in FPGAs is TMR (Triple Modular Redundancy), a
hardware redundancy technique. It uses three copies of the same hardware and the output is taken through a
majority gate; it can handle only a single fault but requires no separate fault detection unit [5]. Variants of TMR
have also been proposed, such as TMR with alternative computing and partial TMR, all aimed at reducing the
overhead of regular TMR. TMR with alternative computing uses three copies of the hardware, but each is
implemented differently internally (for example, one using gates and another using multiplexers). Partial TMR was
proposed mainly to limit the large area overhead of TMR [6]. In this case, only those sub-units that demand higher
fault coverage or are more important are implemented with TMR; all others are implemented normally, without
redundancy. TMR has two main disadvantages: high area overhead and high power consumption. It can also handle
only a limited number of faults, in most cases only one.
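The majority voting on which TMR relies can be sketched in a few lines of Python. This is only an illustration of the principle, not code from the cited works; the function name `majority_vote` and the 8-bit example words are assumptions chosen for the demonstration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs."""
    return (a & b) | (b & c) | (a & c)


# Three replicas of the same module should produce the same output word.
good = 0b1011_0010
faulty = good ^ 0b0000_1000  # hypothetical fault: one flipped bit

# A single faulty replica is outvoted by the two healthy ones.
masked = majority_vote(good, faulty, good)

# Two replicas failing in the same bit position defeat the voter.
unmasked = majority_vote(faulty, faulty, good)
```

Here `masked` equals `good`, while `unmasked` equals `faulty`, which illustrates why plain TMR tolerates only a single faulty replica.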
Lala et al. proposed another method of repair inspired by the human immune system [7]. In this method one spare
cell is allotted for every four working cells, and the cells are connected through routing cells. The cells are arranged
so that every working cell always has two spare cells adjacent to it. When a fault is detected in a working cell, one
of the adjacent spares replaces it, as decided by the router cells.
A fault recovery algorithm using king spare allocation of spare cells is presented in [3]. King spare allocation means
that one spare cell is allotted for every eight working cells and is located at the centre of those eight cells. The
spare can be used by any of the eight working cells when it becomes faulty. If no spare is available in that tile (eight
working cells are considered a tile), Dijkstra's algorithm is used to locate the nearest spare, and cells are shifted to
bring the spare closer to the faulty cell. The shifting is very time consuming, as it requires correspondingly more
reconfigurations.
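The nearest-spare search can be pictured with a small Dijkstra sketch on a grid of cells. It is not the algorithm of [3] itself: the grid encoding ('W' working cell, 'S' free spare, 'U' used spare), the function name `nearest_spare`, unit step costs and edge adjacency are all assumptions made for illustration.

```python
import heapq


def nearest_spare(grid, start):
    """Return (distance, position) of the closest free spare, or None.

    grid[r][c] is 'W' (working cell), 'S' (free spare) or 'U' (used spare).
    Steps go to edge-adjacent cells at unit cost (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale queue entry, a shorter path was already found
        if grid[r][c] == "S":
            return d, (r, c)
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None


# Two 3 x 3 tiles side by side; the left tile's spare is already used.
tiles = ["WWWWWW",
         "WUWWSW",
         "WWWWWW"]
result = nearest_spare(tiles, (0, 0))
```

For a fault at the top-left cell the search returns the spare of the neighbouring tile at distance 5. In a shifting scheme the distance gives a rough idea of how many cells would have to be shifted, and therefore reconfigured.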
3. Proposed Self-repairing Method 
This work focuses on developing a better self-repairing strategy capable of handling a larger number of faults with
better resource utilization and limited overhead. The main technique used is dynamic runtime partial
reconfiguration, and module relocation is used to improve its efficiency. The distinguishing concept of this method
is relocation of a reconfigurable module to another location within the reconfigurable area of the FPGA that is not
affected by any fault. The relocation is done so that the best resource utilization is achieved and more area remains
available to repair future faults. In addition, once a faulty module has been repaired, its old copy is deleted and only
the smallest unit containing the fault (according to the resolution chosen by the designer) is isolated. This enables
better area utilization.
3.1. FPGA area 
The FPGA area is divided into two main parts: the static region and the reconfigurable region. The static region
consists of permanent units and cannot be configured again. The reconfigurable region is the area where the
reconfigurable modules are placed. These modules are logic units whose configuration can be changed individually
at runtime, without disturbing the rest of the system, using dynamic partial reconfiguration.
3.2. Reconfigurable tile 
The reconfigurable area is further divided into tiles, within which the reconfigurable modules are arranged. In the
simplest case the entire reconfigurable area is considered a single tile.
3.3. Reconfigurable module  
Reconfigurable modules are components implemented in the FPGA that can be placed, deleted or reconfigured
dynamically at runtime. They are located in the reconfigurable area of the FPGA.
3.4. Fragmentation 
Fragmentation is a major issue that arises inside the FPGA reconfigurable area due to inefficient placement and to
the creation and deletion of reconfigurable modules at runtime. The free area inside the FPGA becomes divided into
many small regions, so that even though sufficient total area is available for a module, it cannot be used because it
is split up and spread out; no contiguous free area is available. This leads to poor resource utilization [8].
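The effect can be reproduced on a toy occupancy grid. The sketch below is illustrative only: the 4 x 4 grids, the 0/1 encoding (0 free, 1 occupied) and the helper names `free_area` and `fits` are assumptions, not part of the scheme described in the paper.

```python
def free_area(grid):
    """Total number of free CLBs (cells equal to 0)."""
    return sum(row.count(0) for row in grid)


def fits(grid, w, h):
    """True if some w-by-h rectangle of free CLBs exists in the grid."""
    rows, cols = len(grid), len(grid[0])
    for top in range(rows - h + 1):
        for left in range(cols - w + 1):
            if all(grid[r][c] == 0
                   for r in range(top, top + h)
                   for c in range(left, left + w)):
                return True
    return False


# Same amount of free area, very different usability.
fragmented = [[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]]
compact = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [1, 1, 0, 0],
           [1, 1, 0, 0]]
```

Both grids have eight free CLBs, yet a 2 x 2 module fits only in `compact`; in `fragmented` the free area is scattered and cannot host it.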
3.5. Relocation 
Relocation is a method used to improve the efficiency of dynamic partial reconfiguration. It moves a reconfigurable
module to another location within the reconfigurable area itself. It can be done through software (PARBIT) [9] or
hardware filters (REPLICA) [10].
Defragmentation and module relocation are shown in [11] to be two concepts that can improve the efficiency of
partial reconfiguration, and [12] describes a priority based algorithm with dynamic partial reconfiguration that can
repair a larger number of faults. For ease of demonstration, the entire reconfigurable area is treated as one tile and
is modeled as a two-dimensional array of CLBs. The finest resolution of the repair scheme discussed here is the CLB
level. Eight different reconfigurable modules are placed arbitrarily within this area, with one spare located at the
centre of the reconfigurable area. Some area inside the reconfigurable area is left unoccupied for module relocation.
Modules of different sizes are considered, to avoid the area that would be wasted if all modules were assumed to be
of equal size.
For example, suppose there are $n$ reconfigurable modules of sizes $i_1, i_2, i_3, \dots, i_n$. A spare cell of size $i$
is then allotted, where $i \ge \max\{i_1, i_2, i_3, \dots, i_n\}$. Maximum fault coverage can be achieved by also
leaving a contiguous free area of size $i$ for relocation of modules. This can sometimes make the initial placement
and design of the modules difficult. Overall, to ensure more than 90% fault coverage, a free area of size greater than
or equal to $x$ is enough, where $x = (i_1 + i_2 + i_3 + \dots + i_n)/n$, i.e. the average size of the reconfigurable
modules. The amount of free area can be chosen at design time according to the customer's needs and the required
longevity of the system.
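The two sizing rules can be evaluated as follows. The module sizes in the example are hypothetical (the paper does not list individual module sizes), and the function names and the rounding up to a whole number of CLBs are assumptions.

```python
import math


def spare_size(module_sizes):
    """Smallest spare able to replace any module: i >= max(i_1, ..., i_n)."""
    return max(module_sizes)


def average_free_area(module_sizes):
    """Free area x = (i_1 + ... + i_n) / n, rounded up to whole CLBs."""
    return math.ceil(sum(module_sizes) / len(module_sizes))


# Hypothetical sizes, in CLBs, of eight reconfigurable modules.
sizes = [12, 20, 16, 30, 24, 18, 28, 12]

i = spare_size(sizes)         # 30 CLBs
x = average_free_area(sizes)  # 20 CLBs
```

With these sizes, reserving a free area of x = 20 CLBs instead of i = 30 CLBs saves 10 CLBs, at the cost of not being able to relocate the largest modules into the reserved area alone.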
When a fault is detected, an attempt is made to replace the faulty cell with the spare. If the spare cell is already
used, the faulty cell is relocated to a new location made up of unoccupied CLBs. To free more space, the original
location of the faulty cell is marked unoccupied and only the faulty CLBs are marked dead, so that the healthy
CLBs can be used for future repair by relocation.
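A minimal sketch of this repair step, using sets of CLB coordinates, is given below. The `state` dictionary layout, the helper `block`, the function name `repair` and the small example placement are assumptions made for illustration, not the authors' implementation; in particular, the caller is assumed to have already chosen the relocation target.

```python
def block(x0, y0, w, h):
    """Set of CLB coordinates covered by a w-by-h rectangle at (x0, y0)."""
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}


def repair(state, cell, faulty_clbs, target):
    """Repair `cell` and return the action taken ('spare' or 'relocate').

    state: dict with 'spare' (CLBs of the unused spare, or None), 'free' and
           'dead' (sets of CLB coordinates) and 'cells' (name -> CLB set).
    target: free CLBs chosen for relocation; ignored if the spare is used.
    """
    old = state["cells"][cell]
    if state["spare"] is not None:
        state["cells"][cell] = state["spare"]
        state["spare"] = None
        action = "spare"
    else:
        if not target <= state["free"]:
            raise ValueError("relocation target is not entirely free")
        state["cells"][cell] = target
        state["free"] -= target
        action = "relocate"
    # Only the faulty CLBs are retired; the healthy part of the old
    # footprint is returned to the free pool for later relocations.
    state["dead"] |= faulty_clbs
    state["free"] |= old - faulty_clbs
    return action


state = {
    "spare": block(4, 4, 2, 2),
    "free": block(0, 2, 2, 4),  # 8 free CLBs
    "dead": set(),
    "cells": {"A": block(0, 0, 2, 2), "B": block(2, 0, 2, 2)},
}
first = repair(state, "A", {(0, 0)}, target=set())
second = repair(state, "B", {(3, 1)}, target=block(0, 2, 2, 2))
```

The first fault consumes the spare and the second is handled by relocation. After both repairs 10 CLBs are free and only the two faulty CLBs are dead.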
A demonstration of this repair scheme is given in the next section. Each reconfigurable module is treated as a
cell, and each cell is characterized by a fixed set of attributes.
3.6. Cell (p, w, h, a, pos(x), pos(y)) 
• p = priority of the cell
• w = width of the cell
• h = height of the cell
• a = area occupied by the cell (in terms of CLBs)
• pos(x) = X-position within the reconfigurable area of the FPGA
• pos(y) = Y-position within the reconfigurable area of the FPGA
 
Each reconfigurable module within the reconfigurable area is a dynamic component of the system implemented on
the FPGA. Each is implemented as a rectangular module whose size and location are chosen by the designer.
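The cell description above maps naturally onto a small record type. The sketch below is one possible encoding, not the authors' data structure: the field names `pos_x` and `pos_y`, deriving `a` as `w * h` (valid for the rectangular modules assumed here) and the helper `clbs` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    p: int       # priority of the cell
    w: int       # width in CLBs
    h: int       # height in CLBs
    pos_x: int   # X-position within the reconfigurable area
    pos_y: int   # Y-position within the reconfigurable area

    @property
    def a(self) -> int:
        """Area in CLBs; for a rectangular module this is w * h."""
        return self.w * self.h

    def clbs(self):
        """Coordinates of all CLBs covered by the cell."""
        return {(x, y)
                for x in range(self.pos_x, self.pos_x + self.w)
                for y in range(self.pos_y, self.pos_y + self.h)}


# Hypothetical cell: 4 x 3 CLBs placed at (10, 5).
cell = Cell(p=1, w=4, h=3, pos_x=10, pos_y=5)
```

For this example `cell.a` is 12 and `cell.clbs()` lists the same 12 coordinates, which is the information needed to mark CLBs as occupied or free.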
3.6.1. Priority of a cell 
The priority of a cell indicates the importance of a particular unit within the system. A high priority cell is one of
the highest importance; without it the entire system may stop functioning. Low priority cells are less important.
For example, in a processor the MMU is a high priority unit while a timer may be a low priority module.
3.6.2. Width and height of the cell 
These indicate the physical dimensions of the rectangle.
3.6.3. Area occupied by the cell 
Our algorithm has a finest resolution of one configurable logic block (CLB), so the area occupied by a cell is
expressed as a number of CLBs. Each CLB is in turn characterized mainly by its position and its status.
3.6.4. CLB (position(x, y), status(occupied, rm))
position(x, y) gives the x and y coordinates of the CLB. The status indicates whether the CLB is occupied (1 if
occupied, else 0) and, if occupied, which reconfigurable module (rm) is currently using it. The monitor unit keeps
the details of the CLBs up to date, recording which are occupied and which are available for relocation if
necessary.
Monitor = details of all unoccupied cells 
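One way to picture the CLB record and the monitor is sketched below. The class layout, the use of `None` for "no module", the function name `monitor` and the 3 x 2 example fabric are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CLB:
    x: int
    y: int
    occupied: int = 0         # 1 if the CLB is in use, else 0
    rm: Optional[int] = None  # id of the reconfigurable module using it


def monitor(clbs):
    """Return the positions of all unoccupied CLBs."""
    return [(c.x, c.y) for c in clbs if c.occupied == 0]


# Hypothetical 3 x 2 fabric in which module 7 uses the first two CLBs.
fabric = [CLB(x, y) for y in range(2) for x in range(3)]
for clb in fabric[:2]:
    clb.occupied, clb.rm = 1, 7

available = monitor(fabric)
```

In this example `available` holds the four positions (2, 0), (0, 1), (1, 1) and (2, 1), which a relocation step could then search for a free rectangle.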
4. Demonstration of the Proposed Repair Scheme 
The total FPGA area is taken to be 50 x 40. For ease of demonstration the entire reconfigurable area is treated as
one tile and modeled as a two-dimensional array of CLBs. Eight reconfigurable modules of different sizes are placed
arbitrarily within this area, with one spare located at the centre of the reconfigurable area. Some area inside the
reconfigurable area is left unoccupied for module relocation. Figure 1 shows the initial placement of the cells within
the reconfigurable area of the FPGA. Cells 1 to 8 are the normal working cells and cell 9 is the spare; it is the largest
cell so that it can replace any faulty cell. In addition, free area exceeding the average module size is left for
relocation.
Table 1. Simulation Parameters

| Parameters | Values |
|---|---|
| Fault_info | 1 |
| FPGA Area | 50 x 40 |
| Reconfigurable Area | 40 x 30 |
| No. of Tiles | 1 |
| No. of Cells | 9 |
| No. of working cells | 8 |
| No. of spares | 1 |
| Working cells | 1, 2, 3, 4, 5, 6, 7, 8 |
| Spare cells | 9 |
| Faulty cell | 1 |
| Total no. of CLBs | 300 |
| Occupied CLBs | 194 |
 
 
Fig. 1. Placement of the reconfigurable modules 
When a fault is detected, an attempt is made to repair the affected cell. After the faulty cell has been repaired by
relocating it to a non-defective location, only the faulty CLB within the original cell is marked isolated, instead of
the entire cell.
 
Fig. 2. Cell 1 is marked faulty 
Thus, each relocation of a faulty cell creates more area for the next relocation. The original location of the faulty
module is marked unoccupied so that it can be used for further fault correction by relocation, which enables better
resource utilization. The spare cell is used only to repair a high priority cell, as repair by spare is faster. When no
spare is available, further faults are repaired by relocation. The sequence of operations is shown in Figures 2, 3 and
4 for a fault detected in cell 1; as it is a low priority cell, repair is done by relocation.
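The priority rule stated above can be written as a small decision function. This is a sketch under assumptions: the priority encoding (`HIGH`, `LOW`), the function name `choose_repair` and the "unrepairable" outcome for the case where neither option exists are not taken from the paper.

```python
HIGH, LOW = 1, 0  # assumed priority encoding


def choose_repair(priority, spare_available, relocation_possible):
    """Pick a repair action for a faulty cell.

    The spare is reserved for high priority cells; everything else,
    and any fault after the spare is used, is repaired by relocation.
    """
    if priority == HIGH and spare_available:
        return "spare"
    if relocation_possible:
        return "relocate"
    return "unrepairable"


# Cell 1 of the demonstration is low priority, so it is relocated
# even though the spare (cell 9) is still unused.
action_for_cell_1 = choose_repair(LOW, spare_available=True,
                                  relocation_possible=True)
```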
 
Fig. 3. Cell 1 is regenerated and undergoes self test 
 
Fig. 4. Faulty CLB is isolated 
Figure 2 shows a fault detected in cell 1. Only the faulty CLB is isolated from the faulty module, cell 1; the rest of
the cell's area can be used for future relocation, which enables better resource utilization. Figure 3 shows the
relocation of the faulty cell 1 to another location; the relocated cell 1 is drawn with dashed edges. The original cell
area undergoes a self test to find the faulty CLB (shown as ST:1). After the self test only the faulty CLB is isolated
and all other CLBs are marked unoccupied, as shown in Figure 4. This enables better resource utilization, and more
faults can be repaired because all the non-defective CLBs can be reused for other relocations. In other repairing
schemes, by contrast, the entire faulty cell undergoes apoptosis and is no longer used. Also, the first high priority
cell to become faulty uses the spare cell for repair, which likewise frees some area for relocation. A comparison
between the previous works and the proposed scheme is given in Table 2; the number of spare or redundant cells
used in each method and the resulting overheads show the advantages of the proposed work over the others.
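The CLB bookkeeping behind the relocation step described above can be checked with simple arithmetic. In the sketch below the 300 total and 194 occupied CLBs are taken at face value from Table 1, while the cell area of 20 CLBs, the single faulty CLB and the function name `free_after_relocation` are hypothetical.

```python
def free_after_relocation(free, cell_area, faulty_clbs, isolate_whole_cell):
    """Free CLBs left after relocating one faulty cell.

    The relocated copy consumes `cell_area` free CLBs. The old footprint
    is either discarded entirely (module-level isolation) or returned to
    the free pool minus the faulty CLBs (CLB-level isolation).
    """
    remaining = free - cell_area
    if not isolate_whole_cell:
        remaining += cell_area - faulty_clbs
    return remaining


free_clbs = 300 - 194  # total minus occupied CLBs, from Table 1

module_level = free_after_relocation(free_clbs, 20, 1, isolate_whole_cell=True)
clb_level = free_after_relocation(free_clbs, 20, 1, isolate_whole_cell=False)
```

Under these assumptions module-level isolation leaves 86 free CLBs, whereas CLB-level isolation leaves 105: each repair costs only the faulty CLB rather than the whole cell.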
Table 2. Comparison with Previous Techniques

| Approach | Resolution | Total no. of cells | No. of working cells | No. of spare cells per working cell | No. of redundant cells per working cell | Area overhead | Resource utilization |
|---|---|---|---|---|---|---|---|
| TMR | Module level (low resolution) | 3 | 3 | 0/1 | 2/1 | High (3 times) | Low |
| Paralogous genes [4] | Module level (low resolution) | 6 | 4 | 1/4 | 1/1 | High | Low |
| Proposed method | CLB level (high resolution) | 9 | 8 | 1/8 | 0/1 | Low | High |
5. Concluding Remarks and Future Scope 
An efficient FPGA self-repair method with high fault handling capability and better resource utilization has been
proposed and demonstrated. The main concept that enables better resource utilization is relocation of faulty
reconfigurable modules, which makes use of both the unoccupied space and the non-defective portion of a faulty
module. Compact placement of the relocated modules gathers more contiguous space for future relocations should
more faults occur. If the customer has requirements on routing complexity, placement after relocation can be done
accordingly. In short, this concept can be extended into a self-repair algorithm for FPGA based systems. A further
direction for future work is to incorporate this self-repairing scheme into a system and implement it on an FPGA
that supports dynamic runtime partial reconfiguration.
References 
[1] Farooq U, Marrakchi Z, Mehrez H. FPGA architectures: An overview. In Tree-based Heterogeneous FPGA Architectures. Springer New 
York; 2012. p. 7-48. 
[2] Hagemeyer J, Kettelhoit B, Koester M, Porrmann M. Design of homogeneous communication infrastructures for partially
reconfigurable FPGAs. In Proc. of the Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA'07); 2007.
[3] Pradeep C, Radhakrishnan R, Samuel P. Fault recovery algorithm using king spare allocation and shortest path shifting for reconfigurable 
systems. Journal of Theoretical and Applied Information Technology 61, no. 2; 2014.  
[4] Kim S, Chu H, Yang I, Hong S, Jung SH, Cho KH. A hierarchical self-repairing architecture for fast fault recovery of digital systems 
inspired from paralogous gene regulatory circuits. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 20, no. 12; 2012. p.  
2315-2328.  
[5] Lyons RE, Vanderkulk W. The use of triple-modular redundancy to improve computer reliability. IBM Journal of Research and 
Development 6, no. 2; 1962. p. 200-209. 
[6] Pratt B, Caffrey M, Graham P, Morgan K, Wirthlin M. Improving FPGA design robustness with partial TMR. In Reliability Physics 
Symposium Proceedings, 2006. 44th Annual., IEEE International. IEEE, 2006. p. 226-232. 
[7] Lala PK, Kumar BK, Parkerson JP. On self-healing digital system design. Microelectronics journal 37, no. 4; 2006. p.  353-362. 
[8] Joseph S, Baskaran K. Performance Analysis of Various Fragmentation Techniques in Runtime Partially Reconfigurable FPGA. 
International Journal of Computer Applications 94, no. 8; 2014.  
[9] Horta EL, Lockwood JW. Automated method to generate bitstream intellectual property cores for Virtex FPGAs. In Field Programmable
Logic and Application. Springer Berlin Heidelberg; 2004. p. 975-979.
[10] Kalte H, Lee G, Porrmann M, Rückert U. Replica: A bitstream manipulation filter for module relocation in partial reconfigurable systems. 
In Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International. IEEE, 2005. p. 151b-151b. 
[11] Compton K, Li Z, Cooley J, Knol S, Hauck S. Configuration relocation and defragmentation for run-time reconfigurable computing. Very 
Large Scale Integration (VLSI) Systems, IEEE Transactions on 10, no. 3; 2002. p. 209-220. 
[12] Pradeep C, Radhakrishnan R, Baby N, Samuel P. Multiobjective Built In Self Repair Algorithm With Multiple Fault Detection For 
Reconfigurable Systems. Journal of Theoretical & Applied Information Technology 69, no. 2; 2014. 
