Fault tolerance in co-evolutionary communication of EHW modules  by Baleghi Damavandi, Yasser & Mohammadi, Karim
Computers and Mathematics with Applications 57 (2009) 1730–1735
Fault tolerance in co-evolutionary communication of EHW modules
Yasser Baleghi Damavandi ∗, Karim Mohammadi
Iran University of Science and Technology, Narmak, Tehran 16846, Iran
Article info
Keywords:
Fault tolerant design
Co-evolution
Emergent communication
Bio-inspired fault tolerance
Evolvable hardware
Abstract
Evolvable Hardware (EHW) is a new concept that applies evolutionary algorithms to
hardware design. Based on previous work on co-evolutionary communication of EHW
modules, this paper investigates the new feature of fault tolerance for this model. A
fault model is built for the communication line between EHW modules. The experiment
presented is a simulation in which stuck-at and bridging faults are injected into
a previously developed EHW-based serial adder. The outcomes indicate that
the system tolerates these faults with 100% fault coverage, which paves
the way for bio-inspired approaches to fault tolerant design as an alternative to the classic ones.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Fault tolerance is the ability of electrical components to continue to function despite faults in the hardware. The goal of
this paper is to develop a system that is resistant to, or tolerant of, permanent faults that may occur in the communication
lines, mostly between peripheral cards. The possible faults in this work include permanent stuck/bridging faults. The result
is a system that can suffer damage and recover without human intervention.
The approach to fault tolerance in this work is based on the concepts of Evolvable Hardware (EHW). EHW is a technique
that has led to some radical new designs for analog & digital circuits [1–6]. EHW applies special algorithms called
Evolutionary Algorithms (EA) to hardware that consists of programmable elements. In EHW, a researcher specifies the
desired output from the system and then the ‘‘EA’’ evolves a circuit on the programmable device that gives the desired
output.
This paper builds on several emerging concepts within EHW. The first concept is using EHW to provide fault tolerant
hardware. Other researchers have investigated the use of EHW techniques to provide robust hardware [7–10]. The second
concept is that the EHW, like any digital hardware, can be described by the Hardware Description Language (HDL). The
evolutionary process can be simulated as Genetic Programming (GP) of the HDL descriptors of the EHW. Thomson and Arslan [11]
addressed the development of a sequential digital filter via automatically generated Verilog code. The same approach
is utilized in this work with VHDL description for EHW to simulate the overall system. The third concept is that the EHW
modules can be used in a distributed system to co-evolutionarily enhance the system properties. This application can also
lead to the emergence of a protocol [12–18].
Based on these concepts, this paper reports the behavior of interconnected EHW modules in the presence of injected
faults. The simulated model was introduced by the present authors in [19]. The paper is organized as follows: the model of the overall
system is introduced in Section 2. Section 3 describes the fault model for the communication lines between EHW modules.
Based on this model, the single stuck/bridging fault simulations are reported in Section 4. In light of these results, the paper
concludes in Section 5.
∗ Corresponding author.
E-mail addresses: baleghi@ee.iust.ac.ir (Y. Baleghi Damavandi), mohammadi@iust.ac.ir (K. Mohammadi).
0898-1221/$ – see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.camwa.2008.10.023
Fig. 1. System diagram of an EHW-based, co-evolutionary serial-adder.
Fig. 2. The interface signals between the agents.
2. System model description
The model presented in [19] consists of two EHW modules. The only way for these two EHW cores to perform correctly
as a serial adder is to communicate with each other via their I/O ports, and they must work out for themselves how
to do so. This was the goal of the primary simulation. For this purpose EHW1 is organized to perform the arithmetic
operations and EHW2 drives the only control output signal: ready_out. Fig. 1 shows the overall diagram of the system,
where the communication ports are fixed between the two agents. As Fig. 1 shows, this is a one-way communication in which
the ready_out signal should be high only while the output of the 8-bit serial adder is correctly available.
EHW1 has to make EHW2 aware of this state through the two accessible wires.
In [19], the model was shown to be able to build up a self-organized communication and evolve new hardware
configurations to reach the ideal performance. From the EHW point of view we are interested in what has happened between
the two agents. This can be observed by tracking the signals on the I/O ports between the two EHWs. These signals (I1, I2)
are depicted together with the clock and ready_out signals in Fig. 2. The I1 and I2 signals are generated by the sender (EHW1) sequential
circuit, while the EHW2 ready_out signal is triggered by these signals.
After 11 generations (about 100 s) the system produced a valid ready_out signal rising exactly on the 9th rising clock edge,
which is ideal for an 8-bit serial adder.
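The target behaviour described above can be written down as a reference waveform. The following is only a minimal sketch and not the fitness function of [19], which is not reproduced here. The names target_ready_out and matching_cycles, the 12-edge window, and the assumption that ready_out stays high after the 9th rising edge are all illustrative choices.

```python
def target_ready_out(n_bits=8, n_edges=12):
    """Ideal ready_out sampled at each rising clock edge (edge 1 is index 0).

    Assumption: ready_out is low for the first n_bits edges and high from
    edge n_bits + 1 onward, until the end of the simulated window.
    """
    return [1 if edge > n_bits else 0 for edge in range(1, n_edges + 1)]


def matching_cycles(observed, target):
    """Hypothetical score: number of edges where observed equals target."""
    return sum(o == t for o, t in zip(observed, target))


target = target_ready_out()
first_high_edge = target.index(1) + 1  # 1-based number of the edge where ready_out rises
```

Under these assumptions a receiver that never raises ready_out still matches the target on the first 8 edges, so a score of this kind only separates good from bad candidates on the edges where ready_out should be high.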
3. Fault model
In this section the idea of fault tolerance in the communication of EHW modules is put to the test. For this purpose the
previously described model is subjected to different types of faults on the communication lines (I1, I2). Assuming that the EHW
modules are two electronic boards that communicate with each other via a connector, the fault model can contain the related
cases. A typical fault affecting these connections is a short between two pins in the connector. Depending on the type
of these pins, one of the following logical faults may occur.
Fig. 3. The modeled bridging AND/OR, s_a_1/0 faults in EHW modules communication.
Stuck at 1/0: A short between ground or power and the signal line can make the signal remain at a fixed voltage level.
The corresponding logical fault consists of the signal being stuck at a fixed logic value v (v ∈ {0, 1}), and it is denoted by s-a-v.
Since there are some power pins and usually more ground pins in such a connector, a short between them and the signal lines
will cause s-a-1/0 faults, and thus these faults are included in the fault model.
Bridging AND/OR: A short between two signal lines in the connector creates a new logic function. The logical fault
representing such a short is referred to as a bridging fault. According to the function introduced by the short, we distinguish
between AND bridging faults and OR bridging faults. For example, if the signals are driven by open-collector gates, a short
between them will cause an AND function, whereas ECL gates will lead to OR bridging faults. In many cases, the effect of
an open on a signal line with only one fan-out is to make the input that has become unconnected assume
a constant logic value and hence appear as a stuck fault [20]. Assuming a shielded, short connection, for which
radiation noise and delay faults are negligible, the introduced model can cover many of the physical faults that may
occur in the EHW communication line. Fig. 3 shows the modeled bridging AND, bridging OR, s_a_1 and s_a_0 faults
occurring in EHW modules when they are put into the slot for communication.
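The four logical faults of this model can be expressed as a small function on the two line values. This is a sketch under stated assumptions: the function name apply_fault, its string labels and the line argument are hypothetical, and the code models only the logical effect described in this section, not the VHDL simulation used in the paper.

```python
def apply_fault(i1, i2, fault=None, line="I1"):
    """Return the (I1, I2) values seen by the receiver for one clock cycle.

    fault: None, "s_a_0", "s_a_1", "bridge_and" or "bridge_or".
    line:  which line a stuck-at fault affects ("I1" or "I2").
    """
    if fault is None:
        return i1, i2
    if fault in ("s_a_0", "s_a_1"):
        # Short to ground or power: the line is fixed at 0 or 1.
        stuck = 1 if fault == "s_a_1" else 0
        return (stuck, i2) if line == "I1" else (i1, stuck)
    if fault == "bridge_and":
        # Short between signal lines driven by open-collector gates.
        shorted = i1 & i2
        return shorted, shorted
    if fault == "bridge_or":
        # Short between signal lines driven by ECL gates.
        shorted = i1 | i2
        return shorted, shorted
    raise ValueError(f"unknown fault: {fault}")
```

For instance, apply_fault(1, 0, "bridge_and") yields (0, 0): after the short both wires carry the same value, so the receiver effectively has a single input left.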
4. Fault simulation
Fault simulation consists of simulating a circuit in the presence of faults. In classic fault simulation [20] the stage of fault
detection takes place in the test process. In the bio-inspired approach of this work, however, the evolution mechanism
detects and recovers from the faults simultaneously. The model described in Section 2 was simulated and reached 100% fault
tolerant communication after an evolution time that was different for each fault. Notably, the system arrived at
a different protocol, and consequently different encoder/decoder hardware, for each of the faults.
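The idea that evolution detects and recovers at the same time can be illustrated with a deliberately small toy. It is not the GA of [19]. It assumes a two-phase abstraction (before and after the sum is ready) in place of the clocked sequential circuit, a single 8-bit genome holding both sender and receiver in place of two co-evolving modules, and hypothetical names and parameters throughout. Its generation counts are not comparable with those reported in this paper.

```python
import random

LOW, HIGH = 0, 1  # the two phases: ready_out should be 0, then 1


def channel(i1, i2, fault):
    """Line values seen by the receiver under one injected fault."""
    if fault == "s_a_1":       # I1 shorted to power
        return 1, i2
    if fault == "s_a_0":       # I1 shorted to ground
        return 0, i2
    if fault == "bridge_and":
        return i1 & i2, i1 & i2
    if fault == "bridge_or":
        return i1 | i2, i1 | i2
    return i1, i2              # fault-free


def fitness(genome, fault):
    """Number of phases (0 to 2) in which ready_out has the wanted value.

    genome[0:4]: sender, the (I1, I2) pair driven in phase LOW, then HIGH.
    genome[4:8]: receiver, a 2-input truth table indexed by 2*I1 + I2.
    """
    score = 0
    for phase in (LOW, HIGH):
        r1, r2 = channel(genome[2 * phase], genome[2 * phase + 1], fault)
        ready_out = genome[4 + 2 * r1 + r2]
        score += ready_out == phase
    return score


def evolve(fault, rng, pop_size=20, max_generations=500):
    """Elitist search; returns (best genome, generations used)."""
    def random_genome():
        return [rng.randint(0, 1) for _ in range(8)]

    population = [random_genome() for _ in range(pop_size)]
    for generation in range(max_generations + 1):
        population.sort(key=lambda g: fitness(g, fault), reverse=True)
        if fitness(population[0], fault) == 2:
            return population[0], generation
        elite = population[: pop_size // 4]
        # Children are bit-flip mutants of randomly chosen elite genomes.
        children = [
            [bit ^ (rng.random() < 0.125) for bit in rng.choice(elite)]
            for _ in range(pop_size // 2)
        ]
        # Fresh random genomes keep the search from stalling.
        immigrants = [random_genome()
                      for _ in range(pop_size - len(elite) - len(children))]
        population = elite + children + immigrants
    return max(population, key=lambda g: fitness(g, fault)), max_generations


# A genome that is perfect when fault-free: sender drives (0,1) then (1,1),
# and the receiver computes I1 AND I2.
healthy = [0, 1, 1, 1, 0, 0, 0, 1]
```

In this toy, fitness(healthy, None) is 2 but fitness(healthy, "s_a_1") drops to 1. The fault shows up only as lost fitness, and calling evolve("s_a_1", random.Random(0)) then searches for a new sender and receiver pair that works over the damaged channel, with no separate test or fault-location step.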
4.1. S_a_1 fault simulation
With the same GA parameters and fitness function as in [19], the system reached optimum performance in 44 generations
when the I1 signal line was in the s_a_1 fault condition. Fig. 4 illustrates the interface signals between the agents.
Fig. 4 shows that the I/O signals have changed in comparison with those in Fig. 2. EHW2 turned to another
function to cope with the s_a_1 fault on I1. Ignoring the faulty I1, the ready_out signal is driven by an evolved NOT gate in
EHW2, which inverts the I2 value.
4.2. S_a_0 fault simulation
Connecting I1 to ground produces the s_a_0 fault of the model. Under these conditions the system recovered in 32
generations.
This time, as Fig. 5 shows, EHW2 changed to a buffer that transfers the I2 signal to the ready_out signal.
4.3. Bridging AND fault
In the third stage the bridging AND fault is injected. As Fig. 6 shows, I1 and I2 form a new signal, I1 AND I2, which is the input
to EHW2.
As in Section 4.2, the system evolved to buffer the bridged signal onto the ready_out port.
4.4. Bridging OR fault
The last fault from the fault model that is applied to the communication line of the EHW modules is the bridging OR fault. In
this case I1 and I2 are connected to each other in such a way that an OR function is created.
Fig. 4. The interface signals between agents in s_a_1 condition.
Fig. 5. The interface signals between agents in s_a_0 condition.
In Fig. 7 it is clear that EHW2 has inverted the bridged signal to produce the proper ready_out signal.
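The receiver functions reported in Sections 4.1 to 4.4 can be composed with the corresponding faults to see what ready_out becomes as a function of the values that EHW1 drives. This is a sketch: the dictionary names are hypothetical, the stuck-at faults are placed on I1 as in the text, and the fault-free AND function is assumed to be I1 AND I2.

```python
from itertools import product

# Channel under each fault, as seen by EHW2 (stuck-at faults are on I1).
CHANNEL = {
    "s_a_1":      lambda i1, i2: (1, i2),
    "s_a_0":      lambda i1, i2: (0, i2),
    "bridge_and": lambda i1, i2: (i1 & i2, i1 & i2),
    "bridge_or":  lambda i1, i2: (i1 | i2, i1 | i2),
    "non_faulty": lambda i1, i2: (i1, i2),
}

# EHW2 function reported for each case.
EHW2 = {
    "s_a_1":      lambda r1, r2: 1 - r2,   # NOT of I2
    "s_a_0":      lambda r1, r2: r2,       # buffer of I2
    "bridge_and": lambda r1, r2: r2,       # buffer of the bridged signal
    "bridge_or":  lambda r1, r2: 1 - r2,   # NOT of the bridged signal
    "non_faulty": lambda r1, r2: r1 & r2,  # AND (assumed: I1 AND I2)
}


def ready_out_table(case):
    """ready_out for each (I1, I2) driven by EHW1, in the order 00, 01, 10, 11."""
    return [EHW2[case](*CHANNEL[case](i1, i2))
            for i1, i2 in product((0, 1), repeat=2)]


tables = {case: ready_out_table(case) for case in CHANNEL}
```

Every table contains both a 0 and a 1, so in each case EHW1 still has some pair of line values that lowers ready_out and another that raises it. Under these assumptions the bridging AND table equals the fault-free one, since buffering I1 AND I2 computes the same function as the fault-free AND receiver.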
5. Conclusion & remarks
5.1. Conclusion
The results of Section 4 are compared in Table 1. The Generations row shows the number of generations
that it took for the system to evolve to optimum performance. The Time row shows the corresponding run time on a Pentium
4 computer with a 2.8 GHz CPU and 512 MB of RAM. The logic function evolved in EHW2 is given in the next row, and the
last row shows that the system was able to detect the injected fault in all tests.
Fig. 6. The interface signals between agents in bridging AND fault.
Fig. 7. The interface signals between agents in bridging OR fault.
Table 1
Fault simulation results.
Parameters       Fault
                 s_a_1    s_a_0    Bridging AND    Bridging OR    Non-faulty
Generations      44       32       28              14             11
Time (s)         215      160      147             113            100
EHW2 function    NOT      BUFFER   BUFFER          NOT            AND
Fault coverage   100%     100%     100%            100%           -
The logic function of EHW1 was too long to be shown in the table. The co-evolutionary process, in which EHW1 and EHW2
reconfigured to form new hardware that functions well in a faulty environment, achieved ideal fault coverage. Considering the
main goal of the plan, which is protocol-free communication between peripheral cards, the result can be interpreted
as follows: if some of the communication routes become damaged, the system may find other ways to perform, but at the
expense of more evolution time.
5.2. Remarks and future work
Most of the communication protocols in computer peripheral devices are not immune to the permanent faults that may
occur in the communication channel. For example, one can assume a ‘‘stuck at 1’’ fault in one of the communication lines
of RS232, RS485, USB or I2C, which will undoubtedly lead to a failure. As this paper’s simulation highlights, a new way
of tolerating permanent faults can be introduced, because this method is able to reconfigure the sender/receiver hardware
architecture so that it can overcome the permanent faults (as shown in Section 4).
Of course there are some limitations. For example, in the model of Section 2, if there were fewer than 3 flip-flops in the
structure of EHW1, the GA could never achieve optimum performance because it needs to count 8 sequences. Similarly, in the case of
multiple faults, if all communication lines are faulty, a failure will occur. Another remarkable property is that there is no need
to follow the classic fault tolerant algorithms, which comprise fault detection, location and recovery in that order. For example,
classic fault detection requires a test pattern generator (TPG), whereas in this work the fault is detected in the fitness evaluation
process of the EHW modules.
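Both limitations can be checked with a few lines of arithmetic. The sketch below assumes a combinational receiver, as the EHW2 functions in Table 1 are, so that a one-bit ready_out needs at least two line states that EHW2 can tell apart. The function names and the treatment of multiple faults are illustrative, not taken from the paper.

```python
from itertools import product


def reachable_states(fault_i1=None, fault_i2=None, bridge=None):
    """Set of (I1, I2) values the receiver can observe.

    fault_i1 / fault_i2: None, 0 or 1 (stuck-at value on that line).
    bridge: None, "and" or "or" (short between the two signal lines).
    """
    seen = set()
    for i1, i2 in product((0, 1), repeat=2):
        if bridge == "and":
            i1 = i2 = i1 & i2
        elif bridge == "or":
            i1 = i2 = i1 | i2
        if fault_i1 is not None:
            i1 = fault_i1
        if fault_i2 is not None:
            i2 = fault_i2
        seen.add((i1, i2))
    return seen


def recoverable(**fault):
    """True if at least two distinguishable line states remain."""
    return len(reachable_states(**fault)) >= 2


# Smallest number of flip-flops n with 2**n >= 8 counter states.
flip_flops_needed = (8 - 1).bit_length()
```

With a single stuck-at or bridging fault two states remain, which is enough for one control bit. With both lines stuck, for example recoverable(fault_i1=0, fault_i2=1), only one state is left and no receiver function can help.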
Although solving this simple problem does not guarantee generalization, it is an encouraging start. The next step is to
address more complex problems. Testing the behavior of two processors under a faulty communication condition will give a
better sense of how useful the idea is.
References
[1] A. Thompson, I. Harvey, P. Husbands, The natural way to evolve hardware, in: Proc. IEEE Int. Symp. Circuits Syst., Atlanta, GA, USA, 1996, pp. 37–40.
[2] X. Yao, T. Higuchi, Promises and challenges of evolvable hardware, IEEE Transactions on Systems, Man and Cybernetics, Part C, Applications and
Reviews 29 (1) (1999) 87–97.
[3] T. Higuchi, M. Iwata, D. Keymeulen, H. Sakanashi, M. Murakawa, I. Kajitani, E. Takahashi, K. Toda, M. Salami, N. Kahihara, N. Otsu, Real-world
applications of analog and digital evolvable hardware, IEEE Transactions on Evolutionary Computation 3 (3) (1999) 220–235.
[4] J. Koza, M. Keane, M. Streeter, W. Mydlowec, J. Yu, G. Lanza, Genetic Programming IV: Routine Human-Competitive Machine Intelligence, Kluwer,
Norwell, MA, 2003.
[5] J. Hereford, C. Pruitt, Robust sensor systems using evolvable hardware, in: Proc. NASA/DoD Conf. Evolvable Hardware, Seattle, WA, 2004, pp. 161–168.
[6] J. Hereford, Fault-tolerant sensor systems using evolvable hardware, IEEE Transactions on Instrumentation and Measurement 55 (3) (2006) 846–853.
[7] C. Ortega, A. Tyrrell, Biologically inspired fault tolerant architectures for real-time control applications, Control and Engineering Practice 7 (5) (1999)
673–678.
[8] D. Keymeulen, A. Stoica, R. Zebulum, Y. Jin, V. Duong, Fault tolerant approaches based on evolvable hardware and using reconfigurable electronic
devices, in: Proc. IEEE Int. Integr. Reliab. Workshop, 2000, pp. 32–39.
[9] R.O. Canham, A. Tyrrell, Evolved fault tolerance in evolvable hardware, in: Proc. IEEE Congr. Evol. Comput., Honolulu, HI, 2002, pp. 1267–1271.
[10] R. Zebulum, D. Keymeulen, V. Duong, X. Guo, M. Ferguson, A. Stoica, Experimental results in evolutionary fault-recovery for field programmable analog
devices, in: Proc. NASA/DoD Conf. Evolvable Hardware, Chicago, IL, 2003, pp. 182–186.
[11] R. Thomson, T. Arslan, Evolvable hardware for the generation of sequential filter circuits, in: Proceedings of the 2002 NASA/DOD Conference on
Evolvable Hardware, EH’02, 2002, pp. 17–25.
[12] A. Cangelosi, Evolution of communication and language using signals, symbols, and words, IEEE Transactions On Evolutionary Computation 5 (2)
(2001) 93–101.
[13] N. Neubauer, Emergence in a multi-agent simulation of communicative behavior, Publications of the Institute of Cognitive Science 11 (2004).
[14] J. Gmytrasiewicz, M. Summers, D. Gopal, Toward automated evolution of agent communication languages, in: Proceedings of the 35th Annual Hawaii
International Conference on System Sciences (HICSS-35.02), 2002, pp. 10.
[15] J. Thangavelautham, D.T. Barfoot, G. Eleuterio, Coevolving communication and cooperation for lattice formation tasks, in: 7th European Conference
on Artificial Life Dortmund, Germany, 2003, pp. 14–17.
[16] B.J. MacLennan, The emergence of communication through synthetic evolution, Technical Report UT-CS-99-431, 1999.
[17] C.H. Yong, R. Miikkulainen, Cooperative co-evolution of multi-agent systems, Technical Report AI01-287, 2001.
[18] L. Perlovsky, J. Fontanari, Evolution of communication in a community of robots, in: 2005 IEEE Workshop on Advanced Robotics and its Social Impacts,
2005, pp. 2–7.
[19] Y. Baleghi Damavandi, K. Mohammadi, Co-evolution for communication: An EHW approach, Journal of Universal Computer Science 13 (9) (2007)
1300–1308.
[20] M. Abramovici, M.A. Breuer, A.D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
