Self-Repairing Hardware Architecture for Safety-Critical
  Cyber-Physical-Systems by Khairullah, Shawkat & Elks, Carl
 
Research Article 
 
A Self-Repairing Hardware Architecture for Safety-Critical Cyber-Physical-
Systems 
 
 
Shawkat S. Khairullah 1*, Carl R. Elks 2 
 
1 Department of Computer Engineering, University of Mosul, Mosul, Iraq 
2 Department of Electrical and Computer Engineering, Virginia Commonwealth University, 601 West Main 
Street, Richmond, United States of America 
*khairullahss@mymail.vcu.edu 
 
 
Abstract: Digital embedded systems in safety-critical cyber-physical-systems require high levels of resilience and robustness 
against different fault classes. In recent years, self-healing concepts based on biological physiology have received attention for 
the design and implementation of reliable systems. However, many of these approaches have not been architected from the 
outset with safety in mind, nor have they been targeted for the safety-related automation industry where significant need 
exists. This paper presents a new self-healing hardware architecture inspired by integrating biological concepts, fault 
tolerance techniques, and IEC 61131-3 operational schematics to facilitate adaption in automation and critical infrastructure. 
The proposed architecture is organized in two levels: the critical functions layer used for providing the intended service of the 
application and the healing layer that continuously monitors the correct execution of that application and generates health 
syndromes to heal any failure occurrence inside the functions layer. Finally, two industrial applications have been mapped on 
this architecture to date and we believe the nexus of its concepts can positively impact the next generation of critical cyber-
physical-systems in industrial automation. 
 
1. Introduction 
As we move toward a universe of networkable objects, 
cyber physical production systems, distributed 
manufacturing, and distributed automation emerge as a set
of loosely coupled, smart, autonomous units working 
together to achieve plant operations in ways we could not 
imagine a few decades ago.  To fully achieve the promise of 
autonomic behaviour and resiliency in advanced automation, 
these Cyber Physical Systems (CPSs) will need to rely on 
controllers that have properties of resilience, self-healing 
and agility. System resilience is “the ability of 
organizational, hardware and software systems to mitigate 
the severity and likelihood of failures or losses, to adapt to
changing conditions, and to respond appropriately after the 
fact”[1]. System resilience and autonomic computing 
methods are emergent technologies organized around a 
concept of self-governance – much in the way biological 
systems have evolved. From the broadest stance, system 
resilience can be seen as four interwoven characteristics: 
self-configuration, self-optimization, self-healing, and
self-protection. Of these four attributes, self-healing and
self-protection capabilities are necessary requirements for
critical CPSs, where production downtime, damage to
expensive machinery, and serious injury can
occur [2], [3]. Over the last two decades, new research on
self-healing digital systems inspired by biology has been 
growing steadily – to enhance fault tolerance, resilience, and 
survivability properties [4], [5], [6], [7], [8], [9], [10]. To 
date, most of these approaches have not addressed 
automation or Instrumentation and Control (I&C) 
applications. Specifically, we found the “programmability” 
aspects of the previous designs to be too low level for 
application engineers to understand (e.g., VHDL and bit 
level implementations), or they lacked the computational 
capacity required for control applications (i.e., simple logic). 
To address these issues, among others, this paper presents a
new self-healing hardware architecture that makes four main
contributions: 1) achieving high levels of tolerance
against different classes of failures and faults; 2) providing
multiple layers of defense, healing, and graceful
reconfiguration; 3) managing complexity to enhance the
verification and validation (V&V) properties; and 4)
providing programmability via function block constructs
to enhance accessibility for practicing engineers.
    Our research does not aim to blindly mimic biological
processes. Rather, it is motivated by
comprehending their robust operation and using it as a
foundation to build a new self-healing architecture
that leverages existing programmable and
configurable hardware technologies such as Field
Programmable Gate Arrays (FPGAs) and
Application Specific Integrated Circuits (ASICs). Many
studies (and marketplace products) have confirmed that 
FPGA technology is a viable and competitive option for 
various application domains (aerospace, the nuclear industry,
and industrial control systems) owing to its
re-programmability, high performance,
concurrency, and low-cost development. The other main
advantage of using FPGA technology is the large
ecosystem of design, programming, and verification tools 
that allow users to build custom designs directly onto 
FPGAs [11], [12].  
    Aggressive technology and power scaling have, over the 
past few decades, led to nanoelectronic devices that are
faster, more power efficient and cheaper with each new 
generation. However, the reliability of highly scaled digital 
systems has been on a gradual decline for the past decade. 
This can be attributed in part to the increased sensitivity of 
highly scaled devices to transient faults arising from random 
events such as high-energy particle strikes, High Intensity 
Fields, electrostatic discharge and power fluctuations.  
Accordingly, contemporary IC technology (45 nm and below)
devices tend to manifest more Single Event Upsets (SEUs),
Multi Event Upsets (MEUs), and transient faults
than previous technology generations [13], [14].
    The proposed architecture in this paper is guided by a 
combination of two design principles: biologically inspired 
concepts and architectural principles. Firstly, we present the 
biological techniques: the cell life cycle and the immune 
system used by living organisms to achieve self-diagnostic 
and self-healing against the invaders. The cell life cycle is 
considered to be a self-diagnostic mechanism used to make 
the cell capable of continuously monitoring its chemical 
internal state through four phases of testing integrity. 
Furthermore, if the living cell is attacked by an invader
that causes a weak mutation, the effects of this mutation are
tolerated using a special enzyme. However, if the mutation
is so strong that it cannot be repaired inside the cell and leads
to cell death, the immune system responds by
generating two types of stem cells: adult cells (B-cells and
T-cells), which divide to produce more stem cells
(self-renewal), and embryonic cells, which can divide to
generate terminally differentiated cells [15].
    Secondly, architectural principles establish the basis of 
the formulation of operational rules for organization, and 
rules for composition for any architecture. Jackson [16] lists
four categories of biologically inspired resilience attributes,
namely capacity, flexibility, tolerance, and inter-element
collaboration, to achieve resilience in architecting systems.
We extend the aforementioned attributes with three additional
principles:
1) Reorganization. Flexibility is the ability of a system to 
undergo changes with relative ease in operation while 
experiencing faults and disturbances. In contrast, the
reorganization principle states that the system should be able
to restructure itself and/or migrate functionality in response 
to disruptions.  
2) Separation of Concerns or Partitioning. Building self-
contained units allows the disentangling of separate 
functions. In turn, they can be grouped in self-contained 
architectural units to generate stable forms. Partitioning in 
the proposed architecture separates the healing layers and 
computational activities from communication and I/O 
activities.  
3) Independence. Reducing the architectural units to their 
minimum representation as required by the application. 
     
    The design principles listed above are used throughout the 
design of the proposed bio-inspired digital system to achieve 
high levels of self-healing capacity and to architect an efficient,
self-healing, engineer-accessible hardware architecture.
2. Related work  
Over the last two or three decades, research on the
design and realization of self-healing digital systems 
inspired by biological concepts has been steadily growing to 
enhance dependability properties of safety-critical 
applications against different classes of faults: transient, 
intermittent, and permanent. In this section, a brief 
discussion on bio-inspired hardware-based digital systems 
that can be realized on FPGA fabric is presented. A
relatively new field that is closely related to our
research on the realization of resilient digital
systems is bio-inspired system design. It attempts to go
beyond traditional approaches of fault tolerant computing
and modular redundancy by learning from characteristics of
living things and adapting them to digital electronic systems
[17], [18]. This research area aims to achieve high
levels of resilience and self-healing properties by utilizing
the power of reconfigurable hardware computing. The
self-healing mechanism can basically be partitioned into five
stages: (1) fault model, (2) fault detection, (3) faulty
component isolation, (4) system reconfiguration,
and (5) system self-healing. In [19], [20], a new
programmable cellular architecture that performs logic and 
arithmetic operations for a self-repairing FPGA has been 
designed by Mange et al. This architecture includes four 
hierarchical levels of organization: molecular, cellular,
organismic, and population to tolerate the transient faults in 
the molecular layer. In [4], Tyrrell et al. have presented a 
different architecture which embeds a logic block 
performing the functions by a 2-1 multiplexer and a D flip-
flop. This approach is a two-level hierarchical architecture 
consisting of a cellular level and organism level. Two modes 
of operation were used in this work to control the operation 
of the organism in a fault tolerant manner. Zhang et al. have
developed an architecture that also works at two levels: 
cellular and organism, but they have increased the level of 
fault coverage by adding more levels of reliability to detect 
the transient faults in the configuration memory besides the 
detection of permanent faults [5]. In [6], the authors have 
designed a new architecture inspired by the biological 
process of the human immune system. Their approach used 
several routing cells and spare cells distributed among an 
array of functional cells. In [21], Wang et al. have designed 
a different approach to realize the self-healing cellular 
architecture. Their design was based on using a Look Up 
Table (LUT) as a building block for the functional unit. In 
[7], a novel self-repairing hardware architecture inspired by 
paralogous gene regulatory circuits was designed to achieve 
fast fault recovery with an efficient use of hardware 
resources. Finally, in [22], [23] the researchers
developed a new self-healing hardware architecture that has
some similarities with PLC-based architectures (e.g., use of
function blocks for programming, hardware/FPGA based);
however, the proposed architecture is a completely bio-inspired
architectural approach that includes fault-aware function
blocks, reconfiguration ability, and the ability to migrate
functionality.
    Most of the previous related designs presented in this 
section have focused on the latter two stages of the self-
healing mechanism, in particular stage 4 and stage 5, which
include the reconfiguration strategy of the system and how 
its structure is reconfigured so that the system can adapt to 
the fault occurrence. However, our aim in this research is to 
design a new self-healing hardware architecture that is 
inherently resilient, capable of being verified for safety 
properties, and accessible to the industrial automation
community. This architecture aims to provide resilience against
multiple classes of faults and to verify safety and functional
properties for the critical aspects of the design using
concepts of formal design assurance, in an effort to qualify
the novel self-healing system for use in safety-critical
applications.
3. Expected Contributions to the Field 
The research on bio-inspired self-healing digital systems is 
expected to be noteworthy to a number of stakeholders in 
the extreme environment digital Instrumentation and 
Control (I&C) systems, industrial automation, and those 
areas concerned with safety-related Cyber Physical Systems 
(CPSs). As these application domains
become pervasive, more CPS applications will be deemed
critical for public services, such as smart traffic
automation, smart energy management, and smart cities.
Digital embedded devices operating in these diverse
application areas may experience harsh operating conditions
and environmental changes in which disturbances from
random events such as High-Intensity Radiated
Electromagnetic (EM) Fields (HIRF), extreme temperatures,
radiation, or cosmic particle strikes pose an increased threat.
In all these cases, the simultaneous occurrence of transient
and permanent faults affecting digital embedded
devices or nodes is a significant concern. As a consequence,
the ability to detect and repair the experienced failure
modes is important. In order to support all the objectives
mentioned above, the work presented in this paper can be 
categorized in three contributions as described below:  
I. This research extends the state-of-the-art in
resilient Very-Large-Scale Integration (VLSI) design by
developing a new fault management approach to support
resiliency that has not previously been reported in the
literature. To the best of our knowledge, the majority of work
on bio-inspired digital systems achieves the self-healing
objective at a decentralized level by embedding
self-diagnostic modules inside the functional cells. These
modules are used to configure an internal control unit or
notify a neighboring spare cell as a recovery mechanism.
In our approach, we instead place health monitoring and
recovery units in a dedicated healing layer.
II. To date, most of the traditional biologically-
inspired self-healing approaches have not seriously 
addressed industrial automation or safety related 
Instrumentation and Control (I&C) applications. 
Specifically, we found the “programmability” aspects of 
the previous designs to be too low level for application 
engineers to understand (e.g., Very-High-Speed 
Integrated Circuit Hardware Description Language 
(VHDL) and bit level implementations), or they lacked 
the computing capacity required for control applications 
(i.e., simple logic). As a result, a unique
hierarchical self-healing architecture is designed in which
resilience principles are derived from a heterogeneous
perspective, combining concepts from biological systems
(immune system, stem cells, living cell cycle, and
genetic expression) and computer organization to
provide a well-formed self-healing hardware
architecture.
III. To the best of our knowledge, this
architecture is the first to employ PLC programming
semantics accompanied by traditional fault tolerance
techniques in a bio-inspired self-healing architecture.
Programming semantics for PLC controllers are typically
specified by standards for PLC software, and these
standards can define the PLC configuration,
programming, and data storage. PLC vendors typically
support the five basic programming languages of
the IEC 61131 standard: Function Block Diagram (FBD),
Ladder Diagram (LD), Sequential Function Chart (SFC),
Structured Text (ST), and Instruction List (IL).
4. Overview of the proposed hardware 
architecture  
The proposed architecture is designed to achieve high levels 
of self-healing capacity against different classes of faults 
(transient, permanent, and hardware Common Cause
Failures (CCFs)) while using a minimum amount of
hardware resources. It also aims to keep hardware
area overhead low and to facilitate verification and validation
(V&V). The basic architecture shown in Fig. 1 is based in
part on the way biological organisms achieve resiliency and
on our architectural design principles. It comprises two
principal divisions: (1) the Critical Functions Layer, which is
responsible for providing the intended functionality of the
critical application, and (2) the Healing Layer, which is
responsible not only for monitoring the healthy behaviour of the
functions but also for triggering the required
recovery mechanisms to heal any defective T cells present in
the critical layer. Furthermore, the critical layer embeds two
sublayers: the active 61131-based functional sublayer (AFL),
corresponding to the B cells, and the passive redundant cellular
sublayer (PRCL), imitating the T cells in the immune system.
The healing layer, in turn, corresponds to the life cycle
of the living cell and the embryonic stem cells. We utilize
these two main layers to create interacting functional and 
self-healing partitions to achieve overall system resilience 
(Fig. 2).  
    The critical functions layer executes the
safety-critical application functions. Specifically, it contains
sixteen functional cells. Eight active B cells (designated F in
Fig. 2) are distributed between two active function sublayers,
the left (AFL) and the right (AFR), and execute the
application-based functions. The same layer also contains eight
passive, pre-generated redundant T cells (designated R in
Fig. 2), organized as two passive redundant sublayers, PRCL and
PRCR, that serve as a healing mechanism for faulty B cells.
    
Fig. 1.  High level perspective of the proposed architecture 
 
    The correct execution of each B cell is monitored 
continuously by its neighboring healing layer, and once a 
fault is detected and determined to be transient inside the 
cell, it is masked/tolerated using an embedded hybrid 
redundancy unit. The hybrid redundancy unit (see Fig. 3) 
represents a first line of defense against
transient faults, defined as temporary deviations in the input
register values. It is designed as an active redundancy
technique to tolerate the effects of transient faults that may
corrupt the input registers of the Fault-Tolerant Generic
Function Blocks (FTGFBs).
 
    Each hybrid redundancy unit (HRU; see Fig. 3) consists of eight
hardware components: three registers, three error detection
units, a monitoring switch, and a comparator circuit. In our
experiments, a transient fault was injected simultaneously into
the four input registers of the four HRUs
embedded inside the FTGFB. Such transient faults can be
tolerated sequentially an unlimited number of times
using a self-monitoring switching unit.
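
The masking behaviour of one hybrid redundancy unit can be sketched in Python as follows. This is a minimal behavioural sketch, not the HDL design: the text specifies only the component inventory (three registers, three error detection units, a monitoring switch, and a comparator), so the even-parity error detection, the class name `HybridRedundancyUnit`, and its methods are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def parity(value: int) -> int:
    """Even-parity bit of an integer (assumed error-detection code)."""
    return bin(value).count("1") % 2


@dataclass
class HybridRedundancyUnit:
    """Three register copies, each guarded by its own error detector."""
    values: List[int] = field(default_factory=lambda: [0, 0, 0])
    parities: List[int] = field(default_factory=lambda: [0, 0, 0])

    def load(self, value: int) -> None:
        # Write the same input word and its parity into all three registers.
        self.values = [value] * 3
        self.parities = [parity(value)] * 3

    def inject_transient(self, index: int, bit: int) -> None:
        # Flip one bit of one register copy without updating its parity.
        self.values[index] ^= 1 << bit

    def read(self) -> Optional[int]:
        # Error detection units: keep only copies whose parity still matches.
        healthy = [v for v, p in zip(self.values, self.parities) if parity(v) == p]
        if not healthy:
            return None  # uncovered error: no trustworthy copy is left
        # Monitoring switch: select the first healthy copy.
        selected = healthy[0]
        # Comparator: every healthy copy must agree with the selected one.
        if any(v != selected for v in healthy):
            return None
        # Refresh all copies so that later transients can be masked as well.
        self.load(selected)
        return selected
```

Because every successful read rewrites all three copies, a later single-bit upset in any register is masked in the same way, which mirrors the sequential tolerance described above. A single parity bit cannot detect an even number of flips in one register, so this sketch covers single-bit transients only.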
    Because this self-healing architecture is designed to realize
the functionality of critical systems operating in harsh
environments, radiation-induced transient faults, which can
occur at unpredictable times, are the most prevalent fault
type [24]. Hardening the digital device at
the circuit level [25] is therefore more effective, and subsequent
fault tolerance can still take place even with an uncovered error.
If such faults are not tolerated at the block level, their erroneous
values will propagate to the output signals at the cell level, which
are directly connected to the external world through a network
of digital I/O ports, and can impact the safety of the public or
the environment.
    On the other hand, any permanent failure
inside the functional B cells can be detected using a passive
duplication-with-comparison unit embedded inside the same
cell, which transmits an error flag to the healing layer.
Consequently, the failure monitoring unit embedded in the
healing layer senses that error and generates a health
syndrome that acts as a self-healing mechanism. This
mechanism sends one digital control signal to the
affected AFL sublayer to deactivate the output of the faulty
B cell (cell death). Another signal is transmitted to a routing
unit, which includes two switching units and reroutes
digital input data from the faulty B cell to the input
of the healthy T cell (reorganization).
Finally, the third control signal selects a genetic code stored
in a configuration memory of the T cell located in the
neighboring PRCL unit so that the functionality of the
defective cell is restored and performed by this healthy T cell
(restoration). As a result, the failure monitoring unit
embedded in the healing layer represents the second line of
defense against permanent faults affecting one or all of
the eight B cells.
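
The second line of defense can be illustrated with a small behavioural sketch. It assumes a two-input function per cell and models a permanent fault as a stuck output. The class and field names (`BCell`, `TCell`, `HealthSyndrome`, `run_with_healing`) are hypothetical and chosen only to mirror the three control signals described above (cell death, reorganization, restoration).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Function = Callable[[int, int], int]


@dataclass
class HealthSyndrome:
    """Three control signals named in the text; the field names are assumed."""
    deactivate_b_cell: int   # cell death: address of the B cell to silence
    reroute_to_t_cell: int   # reorganization: T cell that receives the inputs
    genetic_code: str        # restoration: function selected in the T cell


class BCell:
    """Active cell with duplication-with-comparison fault detection."""

    def __init__(self, address: int, genetic_code: str, fn: Function):
        self.address = address
        self.genetic_code = genetic_code
        self.fn = fn
        self.enabled = True
        self.stuck_at: Optional[int] = None  # models a permanent output fault

    def execute(self, a: int, b: int):
        primary = self.fn(a, b) if self.stuck_at is None else self.stuck_at
        duplicate = self.fn(a, b)             # second copy, assumed fault-free
        return primary, primary != duplicate  # (value, error flag)


class TCell:
    """Passive spare whose configuration memory stores several genetic codes."""

    def __init__(self, address: int, code_memory: Dict[str, Function]):
        self.address = address
        self.code_memory = code_memory
        self.active_code: Optional[str] = None

    def execute(self, a: int, b: int) -> int:
        return self.code_memory[self.active_code](a, b)


def run_with_healing(b_cell: BCell, t_cell: TCell, a: int, b: int):
    """Return (output, syndrome); syndrome is None when no fault is flagged."""
    if b_cell.enabled:
        value, error_flag = b_cell.execute(a, b)
        if not error_flag:
            return value, None
    # Failure monitoring unit: form the syndrome and apply its three signals.
    syndrome = HealthSyndrome(b_cell.address, t_cell.address, b_cell.genetic_code)
    b_cell.enabled = False                       # cell death
    t_cell.active_code = syndrome.genetic_code   # restoration
    return t_cell.execute(a, b), syndrome        # inputs rerouted to the T cell
```

In this sketch a stuck output is flagged only when it differs from the duplicate's result, so a fault can stay latent until an input pattern exposes it. That is a property of comparison-based detection in general, not a claim about the measured behaviour of the architecture.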
    As a third line of defense against
additional permanent faults that can affect the T cells, four
hardware components were added to the design of the
healing layer, which is responsible for fault management for
the entire critical functions layer. These components are: 1)
a health syndrome forming unit, which continuously reads
the error signals from the two PRCL sublayers embedded
inside the critical functions layer and generates eight
syndromes used to differentiate eight 61131-based execution
units embedded in four embryonic stem cells; 2) a
syndrome switching circuit, which selects which one of the eight
syndromes generated by the health unit will be used to
differentiate the embryonic stem cell and its two embedded
execution units; and 3) two healing sublayers, LHS and RHS,
each containing two embryonic stem cells (S0,
S2 or S1, S3) that can be differentiated to repair any type of T
cell in the critical layer. The healing layer fault
management process is aimed at managing the processing 
capacity of the critical functions layer. If too many faults 
occur, then we have too few resources to maintain 
operations. In addition, this process imitates how the
embryonic stem cell is differentiated when the immune system
fails to generate the T cells as a
first line of defense against the invader.

Fig. 2.  The internal structure of the architectural concept

Fig. 3. Hybrid Redundancy Unit

5. Critical applications applied to the proposed 
architecture 
Two safety-critical cyber-physical applications have been
mapped onto the proposed architecture: Emergency Diesel
Generator (EDG) startup for nuclear power backup energy
and automotive cruise control. These two applications are
modest steps toward a planned at-scale application related to
complex distributed industrial control. The Quartus Prime 15.1
Lite Edition software from Altera, a design tool that supports
several FPGA device families, was used, and the system was
implemented on an Altera Cyclone V (5CGXFC7C7F23C8)
FPGA.
 
5.1. Emergency Diesel Generator  
A classic example of model-based control
functional logic for the EDG, published in an Electric Power
Research Institute (EPRI) technical report, is illustrated in
Fig. 4. The EDG receives a total of fourteen digital input
signals and produces two output signals. The output signals 
are calculated from the input signals using basic 
combinational logic AND, OR, and NOT operations. The 
EDG digital control system within a Nuclear Power Plant 
(NPP) is a safety critical system required for reactor cooling 
and other safety functionalities. While the functionality of
the EDG is rather simple, it is a highly critical system
that must be fault tolerant. To demonstrate the resilience 
properties of the proposed architecture, the EDG critical 
application has been implemented on the proposed 
architecture. This implementation required sixteen 
functional cells to be connected together in such a way that 
two critical functions layers are interconnected with each 
other (see Fig. 4). Each one of the two functional cells (B 
cells) embedded inside the critical functions layer has to 1) 
activate a different genetic code (DNA expression) based on 
the current address of the functional cell and 2) receive 
different digital input data through the I/O routing units 
connected to the input and output ports of the EDG 
application. When the EDG is subjected to two sequential
permanent faults in the functional units of both the F0 cell and
the R0 cell in the critical functions layer, the EDG application is
healed from the first fault by 345 ns and from the second
fault by 455 ns. This is an increase of about 82 % in time delay
to handle two sequential permanent faults, and this delay
remains relatively constant as the number of handled faults
increases.
 
5.2. Cruise control system 
A classic example that illustrates mode-based control seen 
in process automation applications is the automotive cruise 
control system (CCS) illustrated in Fig. 5. The CCS is a 
closed loop control system that keeps the vehicle tracking
a constant speed, without the driver depressing the accelerator
pedal, in spite of external disturbances. This can be achieved by
measuring the vehicle speed, comparing it to the desired 
speed, and then adjusting the throttle output value based on 
specific control rules like the Proportional Integral (PI) 
controller. The CCS receives a total of six digital input 
signals and produces two output signals. The output signals 
are calculated from the input signals using a combination of 
some digital control logic and a PI controller. 
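
The measure, compare, and adjust loop described above can be sketched as a discrete-time PI controller. This is a generic textbook sketch under stated assumptions, not the function-block implementation mapped onto the architecture: the gains `kp` and `ki`, the sample time `dt`, and the first-order vehicle model with a `drag` coefficient are all hypothetical values chosen only so that the loop is stable.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    kp: float              # proportional gain (assumed value)
    ki: float              # integral gain (assumed value)
    dt: float              # sample time in seconds (assumed value)
    integral: float = 0.0  # accumulated error

    def update(self, target_speed: float, actual_speed: float) -> float:
        # Compare the measured speed with the desired speed.
        error = target_speed - actual_speed
        # Integrate the error and form the throttle command.
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def simulate(target_speed: float, steps: int = 600) -> float:
    """Run the loop against a toy first-order vehicle model."""
    controller = PIController(kp=0.8, ki=0.4, dt=0.1)
    speed, drag = 0.0, 0.1  # hypothetical plant: dv/dt = throttle - drag * v
    for _ in range(steps):
        throttle = controller.update(target_speed, speed)
        speed += (throttle - drag * speed) * controller.dt
    return speed
```

The integral term is what removes the steady-state error caused by the constant drag disturbance. A purely proportional controller would settle below the target speed in this toy model.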
 
    A block diagram of the PI controller, which is used in many
industrial control systems, has been implemented on the
proposed hardware architecture with a modest investment in
time. To implement the CCS application on the architecture,
the application has been partitioned into three levels: level 1
(top control logic), level 2 (PI controller), and level 3 (bottom
control logic). Table 1 shows the operations that
are required to map the CCS
application and how they are distributed over 17 functional
cells of the proposed architecture, and the four operational
semantics of the designed CCS application are shown in
Table 2. The functionalities of these three
levels were distributed among three critical
functions layers of the architecture illustrated in Fig. 2, and
at each layer eight functional cells (designated F) are
triggered at a specified time.

Fig. 4. Logic diagram for starting the EDG system.

Fig. 5. A closed loop cruise control system CCS.

Table 1 CCS functionality mapping on 17 functional cells
Implementation Level          Functional Cell           Operation
Level_1 Top Control Logic     FC1, FC2                  NOT, Addition
                              FC3, FC4                  Delay, OR
                              FC5, FC6                  Multiplexing, Subtraction
Level_2 PI Controller         FC12, FC13, FC14, FC17    Multiplication, Addition
                              FC15, FC16                Comparison, Multiplication
                              FC7, FC8                  Multiplexing
Level_3 Bottom Control Logic  FC9                       Delay
                              FC10                      Addition
                              FC11                      Subtraction

Table 2 Operational semantics of CCS application
Condition      State               Operation
Set            Operation is set    Target speed = Actual speed
Decrement      Speed is decreased  Target speed = Target speed - 1
Increment      Speed is increased  Target speed = Target speed + 1
Cancel/Brake   Speed is cancelled  Target speed = 0
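
The four operational semantics of Table 2 can be written as a single update function. The function name `next_target_speed` and the use of plain strings for the conditions are illustrative assumptions. Table 2 does not say what happens when no condition is active, so the sketch rejects any other input rather than guessing.

```python
def next_target_speed(condition: str, target_speed: int, actual_speed: int) -> int:
    """Apply one Table 2 condition and return the new target speed."""
    if condition == "Set":
        return actual_speed        # Target speed = Actual speed
    if condition == "Decrement":
        return target_speed - 1    # Target speed = Target speed - 1
    if condition == "Increment":
        return target_speed + 1    # Target speed = Target speed + 1
    if condition == "Cancel/Brake":
        return 0                   # Target speed = 0
    # Any other condition is outside Table 2 and is not modelled here.
    raise ValueError(f"unknown condition: {condition!r}")


# Example: set the target at the current speed, then nudge it up twice.
target = next_target_speed("Set", 0, 48)
target = next_target_speed("Increment", target, 48)
target = next_target_speed("Increment", target, 48)
```

After these three calls `target` holds 50, which the error computation would then compare against the actual speed.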
    The experimental simulation results show that the CCS
takes at least 35 ns of execution time to produce the value “50”
on the “Target” output signal. The clock cycle time
is 10 ns, and this signal is generated only by the
functional cell FC5, in which the execution of the control flow
embedded inside the GFB needs only 3.5 clock cycles
(3.5 × 10 ns = 35 ns).
Finally, the FC6 output, which represents the error signal
based on the difference between the actual speed and the target
speed, is always connected as an input signal to the PI
controller. This controller computes the “Throttle” output
signal from the error value and generates the throttle
output value at time 430 ns.
    In the first fault injection case
study, three sequential transient faults were
injected into the three input registers of the FTGFB of the first
bio operational-cell (designated F0 in Fig. 2) in the
architecture at times 180 ns, 240 ns, and 300 ns. The simulation
results show that the hybrid redundancy units tolerate their
impacts quickly, and
the output signal is generated without any
erroneous value at the “Dat_Out_B” digital output port. This
signal represents the result of the GFB executing on the four
digital input signals: “North”, “West”, “East”, and “South”.
In the second case study, two sequential permanent
faults were injected into the GFB units of two cells: a B
cell (designated F0 in Fig. 2) and a T cell (designated R0 in
Fig. 2). In all cases, faults are identified by embedded
self-checking units, and the system is repaired successfully. This
type of multiple-fault scenario typically occurs in industrial
automation systems when there is a cascading disturbance
effect due to power fluctuations, Electromagnetic
Interference (EMI), or latch-up.
 
6. Design practices and verification methods 
From the outset, we adopted a model-based design 
perspective for this work. Model-based design is a design 
method that establishes a useful framework for the 
development and integration of formal executable models of the
system and its environment early in the design cycle. We 
chose the MathWorks Simulink Toolchain for the presented 
project. The main tools we used for the design were: 
• Simulink and Simulink Stateflow. 
• Simulink Design Verifier. 
• Simulink HDL coder (automatic code generation). 
• Altera Quartus for the analysis, simulation and 
synthesis of HDL designs. 
   A critical tool in our verification scheme is
the Simulink Design Verifier (DV) toolbox. Design Verifier
is a formal verification tool combining both model checking
and limited automatic theorem proving [26]. Model
checking, as the name implies, checks whether a given model
of a system meets a given property
specification. Usually this consists of exploring all states
and transitions in the state model. Although model checking is
restricted to finite state spaces, the number of states that can be
efficiently searched is enormous, making it practical and
applicable to
real systems. If properties hold, the model checker outputs a
confirmation. If a property fails to hold for some possible 
event sequences, the tool produces counterexamples, i.e., 
traces of event sequences that lead to failure of the property 
in question [27]. 
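
The idea of exploring all reachable states and returning a counterexample trace can be sketched with a small explicit-state checker. This is a generic illustration of invariant checking by breadth-first search. It does not describe how Simulink Design Verifier works internally, and the function name `check_invariant` and the toy counter systems are assumptions.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def check_invariant(
    initial: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    prop: Callable[[Hashable], bool],
) -> Optional[List[Hashable]]:
    """Return None if prop holds in every reachable state,
    otherwise a shortest trace from the initial state to a violation."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not prop(state):
            # Rebuild the event sequence that leads to the failure.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:  # visit each state only once
                parent[nxt] = state
                queue.append(nxt)
    return None


# Toy systems: a modulo-8 counter that steps by 2 or by 1.
holds = check_invariant(0, lambda s: [(s + 2) % 8], lambda s: s != 5)
fails = check_invariant(0, lambda s: [(s + 1) % 8], lambda s: s != 5)
```

Here `holds` is `None` because the step-by-2 counter only ever visits even states, while `fails` is the trace `[0, 1, 2, 3, 4, 5]`, the kind of counterexample a model checker reports when a property is violated.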
    In Simulink DV, a proof objective is generally specified 
as illustrated in Fig. 6. We have a function F for which we 
would like to prove a certain property P. As shown in Fig. 6, 
the output of function F is specified as input to block P. 
Property P is a predicate, which should always return true 
when hypotheses H set on the input data flows of the model 
are satisfied. P is therefore connected to an Assertion block, 
while H is connected to a Proof Assumption block. 
Whenever an Assertion block is used, DV attempts to verify 
that its specified input data flow is always true. Proof 
Assumption blocks serve to constrain the input data flows 
of the model during proof construction. Proof Assumption 
blocks are not always required, especially if the input 
space does not need to be constrained [28].  

Fig. 6. Generic Proof Structure of Simulink DV 

Fig. 7. Functional Property Proving 

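The roles of F, P, and H in a proof objective can be sketched in a few lines. The example is an assumption-laden illustration, not the Simulink DV API: the function `prove`, the saturating 4-bit adder used as F, and the exhaustive enumeration that stands in for the prover are all hypothetical.

```python
from itertools import product

def prove(F, P, H, domain, n_inputs):
    """Check that P(x, F(x)) holds for every input x that satisfies H(x).

    Exhaustive enumeration stands in for the prover here, so this only
    works for small finite input domains. Returns (True, None) or
    (False, counterexample_input).
    """
    for x in product(domain, repeat=n_inputs):
        if H(x) and not P(x, F(x)):
            return False, x
    return True, None

# Hypothetical function under test: a 4-bit saturating adder.
F = lambda x: min(x[0] + x[1], 15)
# Property P (the Assertion): the output equals the arithmetic sum.
P = lambda x, y: y == x[0] + x[1]
# Hypothesis H (the Proof Assumption): the inputs do not overflow 4 bits.
H = lambda x: x[0] + x[1] <= 15

with_assumption = prove(F, P, H, range(16), 2)
without_assumption = prove(F, P, lambda x: True, range(16), 2)
print(with_assumption)      # proved under the assumption
print(without_assumption)   # fails once the input space is unconstrained
```

The second call illustrates why a Proof Assumption may be needed: the same property is falsified as soon as inputs outside the intended operating range are admitted.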
    Simulink DV has been used to check a number of 
properties for the architecture and its two applications. 
Here we show one example of functional property proving: 
the verification model of the FTGFB for one specific 
functional property proof (see Fig. 7). Specifically, the 
functional specification in English for proper sequencing 
of the FTGFB is:  
“If four digital input data lines of the FTGFB are read at 
the same time in parallel while state machine-based control 
flow is triggered, the Data-out-B signal will always produce 
a correct value with the rising edge of the done signal”.  
    This functional specification is transformed into a DV 
property model, and the formal temporal logic expression of 
this requirement is 
G (P1 ∧ P2 ∧ P3 ∧ P4 ⇒ F (Q)) 
where P1, P2, P3, and P4 denote the four digital input 
lines and G is the temporal operator "globally", i.e., 
universal quantification of the expression over all time 
steps. F is read here as the standard temporal operator 
"eventually", and Q as the consequent of the specification 
(a correct Data-out-B value at the rising edge of the done 
signal). 
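To make the meaning of this expression concrete, the sketch below evaluates it on a finite recorded trace. It is only an illustration under stated assumptions: F is read as the standard "eventually" operator, Q is taken to mean "Data-out-B is correct at the rising edge of done", finite-trace semantics are used (Q must occur before the trace ends), and the helper names `step` and `holds_on_trace` are hypothetical.

```python
INPUTS = ("P1", "P2", "P3", "P4")

def step(inputs_read, q):
    """Build one trace step; all four input lines share the same value here."""
    s = {name: inputs_read for name in INPUTS}
    s["Q"] = q
    return s

def holds_on_trace(trace):
    """Evaluate G (P1 & P2 & P3 & P4 => F Q) on a finite trace.

    For every position where all four inputs are asserted, Q must be true
    at that position or at some later position in the trace.
    """
    for i, current in enumerate(trace):
        if all(current[name] for name in INPUTS):
            if not any(later["Q"] for later in trace[i:]):
                return False
    return True

# Inputs are read at step 1 and the correct output appears at step 2.
good = [step(False, False), step(True, False), step(False, True)]
# Inputs are read again at the last step, but no output ever follows.
bad = [step(True, False), step(False, True), step(True, False)]

print(holds_on_trace(good))
print(holds_on_trace(bad))
```

A formal tool proves the property for all possible traces of the model; checking individual traces as above is closer to runtime monitoring and is shown only to clarify how the operators combine.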
 
7. Comparative Self-healing Capacity and Area 
Overhead Assessment  
In this section we perform a qualitative comparison of the 
proposed architecture and comparable systems found in the 
literature. These reference cases include a voting-by-majority 
Triple Modular Redundancy (TMR) self-healing 
architecture, the re-routing self-healing architecture by 
Lala et al., and the self-healing architecture inspired by 
endocrine cellular communication by Yang et al. [29], [8], 
[30], [7]. Table 3 summarizes the comparison results when 
implementing the Emergency Diesel Generator (EDG) 
application with an array of N*N functional cells. In 
Table 3, all functional cells used for either rerouting or 
recovery are counted as additional hardware overhead. In 
addition, the maximum number of defective functional cells 
that can be healed under a number of sequential or 
concurrent permanent faults is used to calculate the 
self-healing capacity coverage (C). Each of the four 
self-repairing strategies uses a different cell replacement 
and rerouting process, so each has different advantages and 
disadvantages, which are presented in Table 3. The 
self-healing capacity coverage (C) is calculated using the 
following formula: 
$$\text{Self-healing capacity coverage } (C) = \frac{\mathit{SCs}}{\mathit{SPF}} \qquad (1)$$
    where SCs is the total number of spare functional cells 
available for self-healing at a given instant, and SPF is 
the maximum number of sequential or concurrent permanent 
fault occurrences that may affect the self-healing 
architecture at a given time. For the self-healing capacity 
coverage, we assume a maximum of 12 permanent fault 
occurrences (SPF = 12), which can strike the self-healing 
architecture either sequentially at different times or 
concurrently at one instant. The self-healing capacity 
coverage (C) was calculated for the proposed architecture 
and compared with that of the other three architectures, as 
shown in Fig. 8(a).  

Table 3 Self-healing capacity coverage and area overhead comparison in the EDG implementation with N*N array of cells 

| Architecture | Biological concept | Functionality | Advantages & Disadvantages | No. of F Cells | No. of Spare Cells | No. of Re-routing Cells | Self-healing Capacity Coverage | Area Overhead |
|---|---|---|---|---|---|---|---|---|
| The Proposed Self-Healing Architecture | Embryonic cells + immune system + DNA expression | 61131-based unit capable of implementing one function of up to N variables | 1) Tolerating transient faults in the registers, and the defective cell can be used an unlimited number of times; 2) Healing against permanent faults in the 61131-based unit; 3) Recovery time is demonstrated | (N*N)/2 | (N*N)/2 + (N*N)/4 | ---- | 1 | 150% |
| Re-routing Self-healing (Lala), Elimination Strategy | Immune system | LUT-based, can implement any function of up to 3 variables | 1) Healing against transient faults in the contents of RAMs; 2) Cells can tolerate only one fault; 3) System cannot use the defective cell a second time; 4) (tolerates only one fault for each F cell) 100% if no. of faults = no. of spares = 8; 5) 66% if no. of faults = 12 > no. of spares; 6) Recovery time isn't verified | (N*N)/2 | (N*N)/4 | (N*N)/2 | 0.333 | 150% |
| Gene Control (Yang), Elimination Strategy | Endocrine cellular communication | LUT-based | 1) Monitoring and detecting the soft errors only inside the gene memory; 2) Re-routing with, but not after, cell replacement; 3) Proposed to be used in outer space or deep sea; 4) Four sequential permanent faults for one working cell and two simultaneous faults in two cells | (N*N)/2 | (N*N)/2 | ---- | 0.666 | 100% |
| Voting-by-Majority TMR, Elimination Strategy | Paralogous gene regulatory circuits | LUT-based | 1) Five permanent faults and an unlimited number of transient faults in a single working cell, with time delay, reconfiguring four spare cells and one redundant cell | (N*N)/2 | (N*N)/2 + (N*N)/8 | ---- | 0.833 | 125% |

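As a worked check of Equation (1), the sketch below evaluates C from the spare-cell expressions listed in Table 3. The array size N = 4 is an assumption: it is not stated explicitly, but it is the size for which the re-routing architecture has four spare cells, as mentioned in the text. The function names are hypothetical.

```python
def spare_cells(n):
    """Spare-cell counts per architecture, using the expressions in Table 3."""
    total = n * n
    return {
        "proposed": total // 2 + total // 4,
        "re-routing (Lala)": total // 4,
        "gene control (Yang)": total // 2,
        "voting-by-majority TMR": total // 2 + total // 8,
    }

def coverage(spares, spf):
    """Equation (1): C = SCs / SPF."""
    return spares / spf

N = 4      # assumed array size (N*N = 16); not stated explicitly in the text
SPF = 12   # maximum number of permanent faults assumed in the text

C = {name: coverage(s, SPF) for name, s in spare_cells(N).items()}
for name, value in C.items():
    print(f"{name:24s} C = {value:.3f}")
```

Under these assumptions the ratios are 12/12, 4/12, 8/12, and 10/12, which match the coverage column of Table 3 (the table truncates 0.6667 and 0.8333 to three digits).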
    These results show that the proposed architecture has 
the potential to achieve a high self-healing capacity 
coverage (C approaching 1), at least as high as the 
coverage of the voting-by-majority architecture and the 
gene control architecture by Yang, as the system size 
increases. In addition, our proposed architecture requires 
150% hardware area overhead, which is comparable to the 
overhead consumed by the voting-by-majority elimination 
architecture. The gene control self-repairing architecture 
requires 100% hardware area overhead and the re-routing 
self-healing architecture consumes 150%, as shown in 
Fig. 8(b). The hardware area overhead shown in Table 3, 
calculated using Equation (2), is considered efficient for 
our architecture when compared with the re-routing 
self-healing architecture approach, and the self-healing 
capacity coverage is considered high.  
$$\text{Area overhead} = \frac{\text{No. of spare cells} + \text{No. of routing cells}}{\text{No. of functional cells}} \times 100 \qquad (2)$$
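Equation (2) can be checked against the cell counts in Table 3 with a short calculation. As before, N = 4 is an assumed array size and the function name `area_overhead` is hypothetical; because every count in Table 3 is proportional to N*N, the resulting percentages do not depend on N as long as the counts divide evenly.

```python
def area_overhead(spare, routing, functional):
    """Equation (2): extra cells as a percentage of the functional cells."""
    return 100.0 * (spare + routing) / functional

N = 4                      # assumed array size, for illustration only
total = N * N
functional = total // 2    # "No. of F Cells" column of Table 3

# (spare cells, re-routing cells) per architecture, from Table 3
cells = {
    "proposed": (total // 2 + total // 4, 0),
    "re-routing (Lala)": (total // 4, total // 2),
    "gene control (Yang)": (total // 2, 0),
    "voting-by-majority TMR": (total // 2 + total // 8, 0),
}

overhead = {name: area_overhead(s, r, functional) for name, (s, r) in cells.items()}
for name, value in overhead.items():
    print(f"{name:24s} {value:.0f} %")
```

With these inputs the calculation gives 150, 150, 100, and 125 percent, in agreement with the area overhead column of Table 3.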
    The reason is that pre-generated T cells are 
distributed throughout the system structure in such a way 
that each functional B cell has its own T cell. As a 
result, no row or column elimination strategy, which other 
systems use and which is inefficient in terms of hardware 
area resources, is needed to recover the system from a 
failure. When a cell becomes faulty, only one of its 
surrounding redundant T cells can replace it in a 
self-healing system. As a drawback, however, the faulty 
cell is removed from the system after only a single fault 
occurrence in one component, so the remaining fault-free 
hardware resources embedded in that cell go unused, which 
is inefficient. 
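The replacement policy described above can be sketched as a small data structure. This is a simplified behavioral model written for illustration, not the hardware design: it assumes exactly one dedicated T cell per B cell, and the class name `HealingArray` and its methods are hypothetical.

```python
class HealingArray:
    """Toy model: every functional B cell owns one dedicated spare T cell."""

    def __init__(self, n_functional):
        # slot -> name of the cell currently providing the function
        self.active = {i: f"B{i}" for i in range(n_functional)}
        # slot -> unused spare T cell reserved for that slot
        self.spare = {i: f"T{i}" for i in range(n_functional)}

    def permanent_fault(self, slot):
        """Replace the faulty cell in `slot` with its own T cell.

        Only the affected slot changes, so no row or column of healthy
        cells is eliminated. Returns False when the spare is used up.
        """
        if slot not in self.spare:
            return False
        self.active[slot] = self.spare.pop(slot)
        return True


array = HealingArray(8)
first = array.permanent_fault(3)    # healed by the dedicated T cell
second = array.permanent_fault(3)   # no spare left for this slot
print(first, second, array.active[3], array.active[2])
```

The sketch also exposes the drawback noted in the text: the whole faulty cell is retired after one fault, so any still-working resources inside it are lost.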
    Regarding the re-routing architecture, self-healing can 
be achieved for the same number of cells made faulty by 
permanent faults as in the other three architectures. 
However, its self-healing capacity coverage is lower, equal 
to 0.333 (see Table 3), because only four spare cells are 
available in a network of sixteen functional cells. The 
results presented in Table 3 depend on the fault classes 
that were assumed and for which the proposed architecture 
was designed; some other self-healing hardware 
architectures work only for tolerating transient faults, 
soft errors, intermittent faults, etc. Consequently, a 
comprehensive comparison of the self-healing architectures 
proposed in the literature is challenging. However, we can 
state that the proposed architecture is superior to other 
research works in terms of addressing multiple fault 
classes, self-healing coverage, and area overhead.  

Fig. 8(a). Self-healing capacity coverage between different architectures as system size increases. 

Fig. 8(b). Hardware Area overhead between different architectures as system size increases. 

8. Conclusion and future work 
This paper has presented the architecture, design, and 
application of a new biologically inspired self-healing 
hardware machine. This machine is intended for the design 
of cyber-physical-systems where (1) resilience to failures 
and fault tolerance are important, (2) efficiency in the 
hardware overhead is needed, and (3) flexibility of 
reconfiguration and accessibility to automation I&C 
engineers are required.  
The adoption of Function Block Programming allows 
existing PLC applications to be translated into the 
proposed system more easily and promotes understanding by 
the automation community. The proposed architecture is a 
unique approach in that its resilience principles are 
derived from a heterogeneous perspective, combining 
concepts from both biologically inspired self-healing 
attributes and system organization properties to achieve 
efficient and effective fault tolerance to multiple classes 
of faults. This architecture has been demonstrated on 
several systems to date, with confirmatory evidence that 
the approach is feasible and practical.  
    Immediate future work will include several important 
research tasks necessary for translating the research into 
pre-commercial evaluation. The first step is to select and 
develop a means to network the architecture modules to 
facilitate distributed autonomous control. The second is to 
conduct a large-scale fault injection campaign to collect 
fault handling data. Thirdly, critical components of the 
architecture will be formally verified to check both the 
data flow and the control flow at runtime for the FTGFB. As 
a final observation, we note that the experience of 
designing and using a self-healing hardware architecture 
has yielded more information than just a quantification of 
the self-healing and resiliency aspects of the system. The 
process itself was an iterative learning experience, 
offering insight into how bio-inspired systems can be 
pragmatic solutions for achieving more dependable systems. 
With this project, we have therefore attempted to bring new 
insights into the design of bio-inspired autonomic systems 
for a range of stakeholders, from automation to the 
Internet of Things.  
           
9. References 
[1] S. Jackson, “A Multidisciplinary Framework For 
Resilience To Disasters And Disruptions,” J. Integr. 
Des. Process Sci., vol. 11, no. 2, pp. 91–108, Apr. 2007. 
[2] N. R. Storey, Safety Critical Computer Systems. 
Boston, MA, USA: Addison-Wesley Longman 
Publishing Co., Inc., 1996. 
[3] N. Leveson, Engineering a safer world: systems 
thinking applied to safety. Cambridge, Mass.: The 
MIT Press, 2012. 
[4] C. Ortega and A. Tyrrell, “Biologically inspired 
reconfigurable hardware for dependable applications,” 
in Hardware Systems for Dependable Applications 
(Digest No: 1997/335), IEE Half-Day Colloquium on, 
1997, pp. 3–1. 
[5] X. Zhang, G. Dragffy, and A. G. Pipe, “Embryonics: 
A Path to Artificial Life?,” Artif. Life, vol. 12, no. 3, 
pp. 313–332, Jul. 2006. 
[6] P. K. Lala and B. K. Kumar, “An Architecture for 
Self-Healing Digital Systems,” J. Electron. Test., vol. 
19, no. 5, pp. 523–535, Oct. 2003. 
[7] S. Kim, H. Chu, I. Yang, S. Hong, S. H. Jung, and K.-
H. Cho, “A Hierarchical Self-Repairing Architecture 
for Fast Fault Recovery of Digital Systems Inspired 
From Paralogous Gene Regulatory Circuits,” IEEE 
Trans. Very Large Scale Integr. VLSI Syst., vol. 20, 
no. 12, pp. 2315–2328, Dec. 2012. 
[8] I. Yang, S. H. Jung, and K. H. Cho, “Self-Repairing 
Digital System With Unified Recovery Process 
Inspired by Endocrine Cellular Communication,” 
IEEE Trans. Very Large Scale Integr. VLSI Syst., vol. 
21, no. 6, pp. 1027–1040, Jun. 2013. 
[9] I. Yang, S. H. Jung, and K. H. Cho, “Self-Repairing 
Digital System Based on State Attractor Convergence 
Inspired by the Recovery Process of a Living Cell,” 
IEEE Trans. Very Large Scale Integr. VLSI Syst., vol. 
25, no. 2, pp. 648–659, Feb. 2017. 
[10] S. Yin, Y. Li, Y. l Qian, and X. b Liu, “A novel cell 
mutual detection method for bio-inspired electronic 
array,” in 2017 Prognostics and System Health 
Management Conference (PHM-Harbin), 2017, pp. 
1–5. 
[11] E. Monmasson, L. Idkhajine, M. N. Cirstea, I. Bahri, 
A. Tisan, and M. W. Naouar, “FPGAs in Industrial 
Control Applications,” IEEE Trans. Ind. Inform., vol. 
7, no. 2, pp. 224–243, May 2011. 
[12] E. Monmasson and M. Cirstea, “Guest Editorial 
Special Section on Industrial Control Applications of 
FPGAs,” IEEE Trans. Ind. Inform., vol. 9, no. 3, pp. 
1250–1252, Aug. 2013. 
[13] M. Wilkening, V. Sridharan, S. Li, F. Previlon, S. 
Gurumurthi, and D. R. Kaeli, “Calculating 
Architectural Vulnerability Factors for Spatial Multi-
Bit Transient Faults,” in 2014 47th Annual 
IEEE/ACM International Symposium on 
Microarchitecture, 2014, pp. 293–305. 
[14] N. J. George, C. R. Elks, B. W. Johnson, and J. Lach, 
“Transient fault models and AVF estimation revisited,” 
in 2010 IEEE/IFIP International Conference on 
Dependable Systems and Networks (DSN), 2010, pp. 
477–486. 
[15] B. Alberts, Molecular biology of the cell, Sixth 
edition. New York, NY: Garland Science, Taylor and 
Francis Group, 2015. 
[16] S. Jackson, “Resilience Architecting,” in Architecting 
Resilient Systems, John Wiley & Sons, Inc., 2009, pp. 
159–186. 
[17] D. Ghosh, R. Sharman, H. Raghav Rao, and S. 
Upadhyaya, “Self-healing systems — survey and 
synthesis,” Decis. Support Syst., vol. 42, no. 4, pp. 
2164–2185, Jan. 2007. 
[18] H. Psaier and S. Dustdar, “A survey on self-healing 
systems: approaches and systems,” Computing, vol. 
91, no. 1, pp. 43–73, Jan. 2011. 
[19] D. Mange, E. Sanchez, A. Stauffer, G. Tempesti, P. 
Marchal, and C. Piguet, “Embryonics: a new 
methodology for designing field-programmable gate 
arrays with self-repair and self-replicating properties,” 
IEEE Trans. Very Large Scale Integr. VLSI Syst., vol. 
6, no. 3, pp. 387–399, Sep. 1998. 
[20] G. Tempesti, D. Mange, and A. Stauffer, “Bio-
inspired computing architectures: the embryonics 
approach,” in Computer Architecture for Machine 
Perception, 2005. CAMP 2005. Proceedings. Seventh 
International Workshop on, 2005, pp. 3–10. 
[21] Z. Zhang, Y. Wang, S. Yang, R. Yao, and J. Cui, “The 
research of self-repairing digital circuit based on 
embryonic cellular array,” Neural Comput. Appl., vol. 
17, no. 2, pp. 145–151, Mar. 2008. 
[22] S. S. Khairullah and C. R. Elks, “A Bio-Inspired, Self-
Healing, Resilient Architecture for Digital 
Instrumentation and Control Systems and Embedded 
Devices,” Nucl. Technol., vol. 202, no. 2–3, pp. 141–
152, Jun. 2018. 
[23] S. S. Khairullah, T. Bakker, and C. R. Elks, “Toward 
biologically inspired self-healing digital embedded 
devices: Bio-SymPLe,” presented at the 10th 
International Topical Meeting on Nuclear Plant 
Instrumentation, Control, and Human Machine 
Interface Technologies, San Francisco, CA, 2017. 
[24] I. Polian, J. P. Hayes, S. M. Reddy, and B. Becker, 
“Modeling and Mitigating Transient Errors in Logic 
Circuits,” IEEE Trans. Dependable Secure Comput., 
vol. 8, no. 4, pp. 537–547, Jul. 2011. 
[25] F. L. Kastensmidt, Fault-tolerance techniques for 
SRAM-based FPGAs. Dordrecht: Springer, 2006. 
[26] S. Romanov, “The Future of Railway Interlocking | 
Prover - Engineering a Safer World,” Prover. [Online]. 
Available: https://www.prover.com/. [Accessed: 08-
Dec-2017]. 
[27] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and 
Y. Zhu, “Bounded Model Checking,” in Advances in 
Computers, vol. 58, Elsevier, 2003, pp. 117–148. 
[28] M. Sheeran, S. Singh, and G. Stålmarck, “Checking 
Safety Properties Using Induction and a SAT-Solver,” 
in International Conference on Formal Methods in 
Computer-Aided Design, Berlin, Heidelberg, 2000, 
pp. 127–144. 
[29] P. K. Lala and B. K. Kumar, “An architecture for self-
healing digital systems,” in Proceedings of the Eighth 
IEEE International On-Line Testing Workshop 
(IOLTW 2002), 2002, pp. 3–7. 
[30] Z. Qingqi, Q. Yanling, L. Yue, W. Nantian, and L. 
Tingpeng, “Embryonic electronics: State of the art 
and future perspective,” in Electronic Measurement & 
Instruments (ICEMI), 2013 IEEE 11th International 
Conference on, 2013, vol. 1, pp. 140–146. 
 
 
 
