Design diversity has long been used to protect redundant systems against common-mode failures. The conventional notion of diversity relies on "independent" generation of "different" implementations. This concept is qualitative and does not provide a basis to compare the reliabilities of two diverse systems. In this paper, for the first time, we present a metric to quantify diversity among several designs. Based on this metric, we derive analytical reliability models that show a simple relationship among design diversity, system failure rate, and mission time. We also perform availability analysis of redundant systems using our metric. In addition, we present simulation results to demonstrate the effectiveness of design diversity in duplex systems. For common-mode failures and design faults, there is a significant gain in using different implementations; however, as our analysis shows, the gain diminishes as the mission time increases. For independent multiple-module failures, we show that the mere use of different implementations does not always guarantee higher reliability compared to redundant systems with identical implementations; it is important to analyze the reliability of redundant systems using our metric. Our simulation results also demonstrate the usefulness of diversity in enhancing the self-testing properties of redundant systems.
INTRODUCTION
The use of redundancy techniques for designing systems with high data-integrity and availability has been studied extensively [Siewiorek 92][Pradhan 96]. A duplex system in the form of a self-checking pair is an example of a classical redundancy scheme (Fig. 1.1). As long as only one module fails, the system either produces correct results or indicates an error. In a redundant system, common-mode failures (CMFs) result from failures that affect more than one module at the same time, generally due to a common cause. These include operational failures that may be due to external causes (such as EMI, power-supply disturbances and radiation) or internal causes. In addition to these common-mode failures, design mistakes are becoming very significant as design complexity increases. It has been pointed out in [Avizienis 84] that, although the use of redundant copies of hardware has proven to be quite effective in the detection of physical faults and subsequent system recovery, design faults are reproduced when redundant copies are made. Simple replication therefore fails to enhance the system reliability against design faults.
Design diversity has been proposed in the past to protect redundant systems against common-mode failures. In [Avizienis 84 ], design diversity was defined as the independent generation of two or more software or hardware elements (e.g., program modules, VLSI circuit masks, etc.) to satisfy a given requirement. Design diversity was also proposed in [Lala 94 ] as an avoidance technique against common-mode failures.
Design diversity has been applied to both software and hardware systems. In this paper, we address problems related to design diversity and examine their effects on the reliability of a redundant system. Some preliminary ideas related to this work were reported in [Saxena 98] and [Mitra 99]. Our main contributions are: (1) developing a metric to quantify diversity among several designs; and (2) using this metric to perform reliability and availability analysis of redundant systems. In Sec. 2, we introduce a design diversity metric and perform reliability and availability analysis of redundant systems using this metric. Section 3 presents some preliminaries related to the stuck-at fault model and illustrates our analysis with the help of an example. We present simulation results in Sec. 4. Section 5 examines the effect of design diversity on the self-testing properties of a duplex system. We present experimental results demonstrating the advantages of using design diversity in configurable systems in Sec. 6. Finally, we conclude in Sec. 7.
Design Diversity Metric And Reliability Analysis

D: A Design Diversity Metric
In this section, we introduce a metric to quantify diversity among several designs.
We define the metric for a system with two designs implementing the same function.
The metric has application in estimating the reliability of NMR systems with masking redundancy. Before defining the diversity metric, we first define the notion of diversity between two implementations with respect to a fault pair.
For two designs implementing the same function, the diversity with respect to a fault pair $(f_i, f_j)$, denoted $d_{i,j}$, is the probability that the designs do not produce identical error patterns, in response to a given input sequence, when $f_i$ and $f_j$ affect the first and the second implementations, respectively.
For a given fault model, the design diversity metric, D, between two designs is the expected value of the diversity with respect to different fault pairs. Mathematically, we have

$$D = \sum_{(f_i, f_j)} P(f_i, f_j)\, d_{i,j},$$

where $P(f_i, f_j)$ is the probability of the fault pair $(f_i, f_j)$.
D is the probability that, in response to a given input sequence, the two implementations either produce error-free outputs or produce different error patterns on their outputs. In the first case, the duplex system produces correct outputs. In the second case, the system reports a mismatch so that appropriate recovery actions can be taken.
However, in the remaining third case, where both implementations produce identical erroneous outputs, the system produces an incorrect output without reporting a mismatch; the integrity of the system is then lost due to the presence of faults in the two implementations.
The above illustration of the design diversity metric can also be extended to sequential circuits and software programs. For small or medium-sized systems, the exact value of the diversity metric can be calculated manually or using computer programs.
For large systems, the value can be estimated by using simulation techniques.
For two identical implementations of the same function, a common-mode failure (e.g., a design mistake) can be modeled as the same fault f i affecting the two implementations. Let m be the number of input sequences for which these two implementations produce identical error patterns at the outputs. If the second implementation is different from the first, for any fault f j affecting the second implementation (and f i affecting the first implementation), we cannot have more than m input sequences that produce identical error patterns at the outputs of the two implementations. Hence, d i,i ≤ d i,j . This property is useful for enhancing the reliability of a redundant system against common-mode failures by using diversity.
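As a minimal sketch of how the metric might be computed by exhaustive enumeration for a very small circuit, the following Python fragment evaluates $d_{i,j}$ and D for two hypothetical implementations of a two-input AND function. The netlists, line names, the choice of equally likely input vectors, and the uniform fault-pair probability are all assumptions made for illustration; they are not taken from the paper's experiments.

```python
from itertools import product

def _line(fault, name, value):
    # A stuck-at fault (name, v) forces the named line to the constant v.
    return fault[1] if fault and fault[0] == name else value

def and_gate(a, b, fault=None):
    """Hypothetical implementation 1: one AND gate with lines a, b, out."""
    a, b = _line(fault, "a", a), _line(fault, "b", b)
    return _line(fault, "out", a & b)

def nor_of_inverted(a, b, fault=None):
    """Hypothetical implementation 2: AND built as NOR(NOT a, NOT b)."""
    a, b = _line(fault, "a", a), _line(fault, "b", b)
    na, nb = _line(fault, "na", 1 - a), _line(fault, "nb", 1 - b)
    return _line(fault, "out", 1 - (na | nb))

def stuck_at_faults(lines):
    # All single stuck-at-0 and stuck-at-1 faults on the given lines.
    return [(name, v) for name in lines for v in (0, 1)]

FAULTS_1 = stuck_at_faults(["a", "b", "out"])
FAULTS_2 = stuck_at_faults(["a", "b", "na", "nb", "out"])
INPUTS = list(product((0, 1), repeat=2))  # assumed equally likely

def d(impl1, f1, impl2, f2):
    """Fraction of inputs on which the two faulty modules do NOT
    produce the same erroneous output."""
    identical_errors = 0
    for a, b in INPUTS:
        good = a & b
        o1, o2 = impl1(a, b, f1), impl2(a, b, f2)
        if o1 == o2 and o1 != good:
            identical_errors += 1
    return 1 - identical_errors / len(INPUTS)

def diversity(impl1, faults1, impl2, faults2):
    """D as the mean of d over all fault pairs (uniform P is an assumption)."""
    pairs = [(f1, f2) for f1 in faults1 for f2 in faults2]
    return sum(d(impl1, f1, impl2, f2) for f1, f2 in pairs) / len(pairs)

D_same = diversity(and_gate, FAULTS_1, and_gate, FAULTS_1)
D_diff = diversity(and_gate, FAULTS_1, nor_of_inverted, FAULTS_2)
print("D (identical):", D_same, "D (different):", D_diff)
```

In this toy setting, the same fault in both copies of `and_gate` never yields a larger `d` than pairing that fault with any fault in `nor_of_inverted`, which is the $d_{i,i} \le d_{i,j}$ property discussed above.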
Reliability Analysis
In this section, we calculate the reliability of duplex systems using the diversity metric described in Sec. 2.1. We define the reliability of a duplex system as the probability that the system is fault-secure. The reliability calculation is independent of whether the redundant components are exact replicas or different implementations. We assume a discrete time model for the system. In such a model, the time axis is broken up into discrete time cycles and we apply inputs and observe outputs only at cycle boundaries.
As shown in Fig. 2 .1, input combination (vector) v i is applied at the beginning of the i th cycle. Also, in Fig. 2 .1, the first system becomes faulty (f 1 ) during cycle i and the second system becomes faulty (f 2 ) during cycle j. Let p be the probability that a particular module is affected by a fault at any cycle. For simplicity, we assume that this probability p is the same for all the modules in the system at all times. The probability p can be looked upon as the failure rate per cycle. In Fig. 2 .2, faults f 1 and f 2 affect modules 1 and 2, simultaneously at cycle i. It may be argued that if a random fault appears in a particular module, then chances are high that the second fault will also appear in that same module. However, we do not assume any such correlation in this paper. At time 0, everything is fault-free. So, before cycle i the system will produce correct results. However, starting from time i, in each cycle, the system will produce correct results with the probability equal to d 1,2 . The probability s 1 (f 1 , f 2 , t) that the system is fault-secure up to time t, even in the presence of the two faults f 1 and f 2 , is given by:
The derivation of the above expression is shown in the appendix. Next, we consider the case where f 1 and f 2 appear at different cycles.
As discussed earlier, in Fig. 2 .1, Module 1 becomes faulty during cycle i and Module 2 becomes faulty during cycle j. It is clear that up to time j, a duplex system will be fault-secure. Hence, starting from time j, the system will be fault-secure with probability d 1,2 . Thus, the probability s 2 (f 1 , f 2 , t) that the system is fault-secure up to time t, in the presence of the two faults f 1 and f 2 is given by the following equation.
The derivation of the second case is also shown in the appendix. This case is more complicated than the first case and is useful when we consider random independent faults in multiple modules. We have:
Here s(f 1 , f 2 , t) is the probability that a duplex system is fault-secure up to time t, when Module 1 is affected by fault f 1 and Module 2 by fault f 2 .
We can characterize a duplex system using our diversity metric. In the following calculations, we assume that once a module becomes faulty, no other fault appears in that module. This assumption is simplistic and allows us to obtain closed-form reliability expressions. We calculate the probability that, up to time t, a duplex system is faultsecure. It is given by the following expression:
The above expression follows from the fact that, in a duplex system, when none of the modules fails the system produces correct outputs. When only one of the modules fails (due to single or multiple faults), the system is fault-secure. When both modules are faulty, then we have to consider the d 1,2 value for the fault pair (f 1 , f 2 ) in the two modules. P(f 1 , f 2 ) is the probability that faults f 1 and f 2 appear in modules 1 and 2, respectively.
In the corresponding plot, the mission time is shown along the X-axis; along the Y-axis, we show the probability that the duplex system is fault-secure. The classical analysis of duplex systems is pessimistic since it assumes that the system ceases to be fault-secure when two modules are faulty.
The above expressions can be modified for common-mode failures (CMF). The probability that a duplex system is fault-secure against common-mode failures up to time t, is given by the following expression:
Here, p is the probability that a CMF affects the two modules. In the above expression, z(f 1 , f 2 , t) is given by the following formula:
The above expression is maximized when d 1,2 is of the order of (1-p). This suggests that, for a common-mode failure that can be modeled as fault pair (f 1 , f 2 ), we can obtain appreciable reliability improvement over classical systems when the value of d 1,2 is of the order of (1-p). The following observations can be derived from this relationship.
1. When the failure rate is high, even a small diversity can help enhance the system reliability over traditional replication.
2. If the failure rate is low, then $d_{1,2}$ must be extremely high for appreciable reliability improvement over classical systems.

As a limiting case, consider the situation when the CMF failure rate is 0. In that case, diversity will not buy us any extra reliability against CMFs. In Fig. 2.5, we show how the reliability improvement obtained from diversity depends on mission time. On the Y-axis of the graph in Fig. 2.5, we plot the ratio of the following two quantities.

1. The probability that a duplex system is not fault-secure at time i, for a fault pair $(f_1, f_2)$ with $d_{1,2} = 1 - 10^{-11}$.
2. The probability that a duplex system is not fault-secure at time i, for a fault pair $(f_1, f_2)$ with $d_{1,2} = 1 - 10^{-12}$.

The failure rate per cycle is $10^{-13}$.
We call this ratio the gain. On the X-axis, we plot the mission time. As Fig. 2.5 shows, the gain diminishes with longer mission times. This analysis allows us to derive relationships between the reliability of a redundant system, the diversity incorporated to protect the system against common-mode failures and the mission time. The relationship between diversity and mission time can also be used to determine checkpoint intervals in a redundant system. For example, referring to Fig. 2 .5, we can checkpoint the state of the system when the gain is close to 1. Thus, our design diversity metric is a very fundamental property and can be used to understand different trade-offs associated with the design of dependable systems using redundancy.
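The qualitative trend described above can be illustrated numerically. The sketch below is not the paper's closed-form expression; it assumes a simple model in which a common-mode failure arrives in cycle i with probability $(1-p)^{i-1}p$ and, from that cycle onward, every cycle is independently fault-secure with probability d. The function names and the parameter values (p = 0.001, d = 0.99 versus d = 0.999) are hypothetical and were chosen so that plain floating-point summation is adequate; the much smaller values quoted for Fig. 2.5 would need a more careful numerical treatment.

```python
def fault_secure_prob(p, d, t):
    """P(duplex system is fault-secure through cycle t) for one CMF fault pair.

    Assumed model: no CMF through cycle t with probability (1-p)**t;
    otherwise the CMF arrives in cycle i and each of the cycles
    i..t is fault-secure independently with probability d.
    """
    no_fault = (1 - p) ** t
    with_fault = sum((1 - p) ** (i - 1) * p * d ** (t - i + 1)
                     for i in range(1, t + 1))
    return no_fault + with_fault

def gain(p, d_low, d_high, t):
    """Ratio of 'not fault-secure' probabilities for two diversity values."""
    return ((1 - fault_secure_prob(p, d_low, t))
            / (1 - fault_secure_prob(p, d_high, t)))

# Illustrative sweep over mission times (values are not from the paper).
for t in (10, 100, 1000, 10000):
    print(t, round(gain(1e-3, 0.99, 0.999, t), 3))
```

Under these assumptions the ratio starts close to the ratio of the two $(1 - d)$ values for short missions and approaches 1 for long missions, in line with the diminishing gain discussed above.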
Next, we estimate the error latency using our design diversity metric. Consider a duplex system with two implementations $N_1$ and $N_2$ of the same logic function. Let us suppose that the faults $f_1$ and $f_2$ affect the two implementations at cycle c. The error latency is defined to be the number of cycles from c after which both the implementations produce the same error pattern at the output. For more discussion of error latency, the reader is referred to [Shedletsky 76]. The probability that the error latency is t (t > 0) is given by $d_{1,2}^{\,t-1}(1 - d_{1,2})$. Here, the assumption is that the value of $d_{1,2}$ is strictly less than 1. If $d_{1,2}$ is equal to 1, then the error latency is always equal to T, the mission time.
The expected error latency is given by the following formula:
From this expression, it is clear that for long mission times (i.e., large values of t), the probability value approaches 0 when the $d_{1,2}$ value for the fault pair is less than 1. Thus, the fault pairs which have their $d_{i,j}$ values equal to 1 (i.e., the compensating fault pairs) play a dominant role in determining the error latency for long mission times.
Hence, the value of the expected error latency is determined by the percentage of compensating fault pairs. Simplification of the above expression produces the following expression for the expected latency of a duplex system in terms of the diversity metrics with respect to the different fault pairs.
Expected error latency
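As a hedged illustration of the kind of simplification involved, consider a single fault pair with $d_{1,2} < 1$ and an unbounded mission time (an assumption made here only to obtain a closed form). The latency distribution $d_{1,2}^{\,t-1}(1 - d_{1,2})$ stated earlier is geometric, so its mean follows from the standard identity $\sum_{t \ge 1} t\,x^{t-1} = 1/(1-x)^2$:

```latex
\begin{align*}
E[\text{latency} \mid f_1, f_2]
  &= \sum_{t=1}^{\infty} t \, d_{1,2}^{\,t-1} \, (1 - d_{1,2}) \\
  &= (1 - d_{1,2}) \cdot \frac{1}{(1 - d_{1,2})^{2}}
   = \frac{1}{1 - d_{1,2}}, \qquad d_{1,2} < 1 .
\end{align*}
```

For example, $d_{1,2} = 0.99$ gives an expected latency of 100 cycles under this assumption. Averaging over fault pairs would weight each term by $P(f_1, f_2)$, with compensating pairs ($d_{1,2} = 1$) contributing the mission time T; that weighting is a sketch of the structure rather than a restatement of the paper's exact expression.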
Consider the case of design mistakes that are special cases of common-mode failures. For these cases, the fault is always present. Simple analysis reveals that the probability that a duplex system is fault-secure up to time t, in the presence of design mistakes, is:
Thus, for design mistakes, for a given fault pair (f 1 , f 2 ), the more the value of d 1,2 , the more is the system reliability. This implies that, for design mistakes, diversity among the two implementations in a duplex system helps to increase the probability that the system is fault-secure.
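One reading that is consistent with the statement above can be sketched as follows, under the assumption that the design mistake is present from the first cycle and that each cycle is independently fault-secure with probability $d_{1,2}$. The symbol $R_{\text{dm}}(t)$ is introduced here only for illustration:

```latex
\begin{align*}
R_{\text{dm}}(t) &= \sum_{(f_1, f_2)} P(f_1, f_2) \, d_{1,2}^{\,t}, \\
\frac{\partial}{\partial d_{1,2}} \, d_{1,2}^{\,t} &= t \, d_{1,2}^{\,t-1} \;\ge\; 0 .
\end{align*}
```

The non-negative derivative is what makes the fault-secure probability non-decreasing in $d_{1,2}$ for every mission time t in this sketch.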
While diversity in hardware designs is the main focus of this paper, the above 
Availability Analysis
In this section, we perform availability analysis of duplex systems with repair capabilities using our diversity metric. For the purpose of our analysis, we assume that p is the probability that a (common-mode) failure affects the system during a particular cycle. The failure can manifest as fault f 1 and f 2 affecting Module 1 and Module 2, respectively. In our analysis, we use the following quantities, as described below. The metric d 1,2 is the probability that the two modules do not produce the same error pattern (at their outputs) in response to a given input sequence, when they are affected by the faults f 1 and f 2 .
We define another quantity, t 1,2 , which is the probability that the two modules do not produce any error at their outputs in response to a given input sequence, when they are affected by the faults f 1 and f 2 .
The quantity d 1,2 -t 1,2 is the probability that the two modules will produce non-identical error patterns (at their outputs) in response to a given input sequence, when they are affected by the faults f 1 and f 2 .
The Markov chain used for our analysis is shown in Fig. 2.6. In the Markov chain, the system starts in the Good state. As long as a fault does not appear, the system remains in the Good state. However, as soon as a fault appears, the system goes to the Faulty Correct state. The probability that both the modules produce correct outputs, in spite of the presence of the fault, is $t_{1,2}$. The probability that the modules produce identical errors at their outputs is $1 - d_{1,2}$. Thus, with probability $d_{1,2} - t_{1,2}$, the modules produce non-identical erroneous outputs; this means that the presence of the fault is detected.
Once the fault is detected, the system enters the Repair state. We have assumed that the expected number of cycles required to repair the system is 1/m. For modeling the repair operation, we could as well use a repair rate. However, in the context of re-configurable systems, we can have bounds on repair time, which we can use during the above Markov analysis. The availability is given by the probability that the system is in the Good or the Faulty Correct state. In Fig. 2.7, we show the dependence of availability on the values of $d_{1,2}$ and $t_{1,2}$. This analysis has implications for the usefulness of diversity in enhancing the self-testing property and, hence, the availability of duplex systems. The analysis can be extended to other redundant systems (e.g., NMR systems).
Of the two systems shown in Fig. 2.7, one has $t_{1,2}$ equal to $d_{1,2}$ and the other has $t_{1,2}$ equal to half of $d_{1,2}$. As can be seen in Fig. 2.7, initially the system having $t_{1,2} = d_{1,2}$ has a higher availability (since the probability that it stays in the Faulty Correct state is high). However, as time increases, the availability of the system with $t_{1,2} = 0.5\,d_{1,2}$ decreases at a much smaller rate than that of the system with $t_{1,2} = d_{1,2}$. This is because the system with $d_{1,2}$ equal to $t_{1,2}$ never detects the fault and therefore has no repair capability, in contrast to the other system.
We validate our observations using simulation data in Sec. 4. For simulation purposes, we used the stuck-at fault model. In the next section, we introduce the preliminaries related to the stuck-at fault model and illustrate the calculation of our diversity metric using an example.
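The behavior of the two systems can be explored with a small discrete-time sketch of the chain. The fragment below makes several assumptions that go beyond the text: it adds an explicit absorbing Failed state for undetected identical errors, models repair as a per-cycle completion probability m (so that the expected repair time is 1/m cycles), and uses hypothetical parameter values.

```python
def availability(p, d, t, m, cycles):
    """Availability after `cycles` steps of an assumed 4-state chain.

    States: Good, Faulty Correct, Repair, Failed.
      Good           -> Faulty Correct with probability p
      Faulty Correct -> stays with t, Repair with d - t, Failed with 1 - d
      Repair         -> Good with probability m (assumption)
      Failed         -> absorbing (assumption)
    Availability = P(Good) + P(Faulty Correct).
    """
    good, faulty, repair, failed = 1.0, 0.0, 0.0, 0.0
    for _ in range(cycles):
        good, faulty, repair, failed = (
            good * (1 - p) + repair * m,
            good * p + faulty * t,
            faulty * (d - t) + repair * (1 - m),
            failed + faulty * (1 - d),
        )
    return good + faulty

# Hypothetical parameters: p = 0.01, d = 0.99, m = 0.1.
for n in (2, 200, 2000):
    no_detection = availability(0.01, 0.99, 0.99, 0.1, n)     # t = d
    with_detection = availability(0.01, 0.99, 0.495, 0.1, n)  # t = 0.5 * d
    print(n, round(no_detection, 4), round(with_detection, 4))
```

With these assumed values, the system with t = d is slightly more available after a few cycles, while the system with t = 0.5 d retains far higher availability after thousands of cycles, which matches the qualitative description of Fig. 2.7.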
Example
Research in the area of digital testing and diagnosis of combinational and sequential logic circuits has demonstrated the effectiveness of the logical stuck-at fault model. In this model, the failures in a logic circuit behave as if some lines in the circuit assume constant logical values, either 1 or 0, independent of the logic values on other lines of the circuit.
For the rest of this paper, we assume that all failures manifest as stuck-at faults in the circuit. We also assume that the failures are permanent; i.e., if a stuck-at fault shows up at some time instant t, then the fault remains at all time instants greater than t. For circuits made from SRAM-based FPGAs, unless we re-initialize the SRAMs (reload a given configuration), a transient fault in the configuration SRAM persists. Thus, the assumption of the permanent fault behavior is reasonable. 
A Simulation-Based Approach
As we noted earlier, it is difficult to model the entire complex system mathematically. Even with the stuck-at fault model, it is difficult to derive the exact reliability equation for the following reasons:
1. For a given pair of faults $(f_1, f_2)$, the calculation of $d_{1,2}$ is an NP-complete problem [Gary 79]. The problem is related to the NP-complete test generation problem.
2. If multiple stuck-at faults appear in the modules at different cycles, then the reliability expressions will become complicated. In fact, it may not be possible to obtain a closed form.
Hence, we developed a simulation environment to examine the reliability of a redundant system in the presence of multiple faulty modules. For generating different designs, we minimized the truth tables corresponding to some MCNC benchmark circuits (clip, inc, Z5xp1, apex4 and rd84) using espresso.
Then, we synthesized logic circuits after applying multi-level optimizations using the rugged script available in sis [Sentovich 92 ]. We subsequently mapped the multi-level logic circuits to the LSI Logic G-10p technology library [LSI 96]. Next, we complemented the outputs in the truth tables of the benchmark circuits to generate new truth tables. We used the same synthesis procedure for these new truth tables. Finally, we added inverters at the outputs of the new designs obtained. Table 4 .1 summarizes the characteristics of the different simulated designs.
In the fourth column of Table 4 .1, we report the number of candidate single stuck faults for the implementations of the circuits, obtained by synthesizing the given specification. The fifth column shows the number of candidate single stuck faults for the implementations of the circuits, obtained by synthesizing the given specifications with complemented outputs.
Simulation 1
For Simulation 1, for each benchmark circuit, we built duplex systems with identical and different implementations. For each of these systems, we performed 100,000 experiments. In each experiment, we randomly picked a single stuck-at fault pair $(f_1, f_2)$ such that the fault $f_1$ affects Module 1 and $f_2$ affects Module 2. We injected these faults into the modules, applied input patterns from a counter (with a random seed) and calculated the error latency (the number of cycles after which the system ceases to be fault-secure). The expected error latency for the injected fault pairs is shown in Table 4.2. We also calculated the percentage of fault pairs for which the two modules never produced the same erroneous outputs at the same time (compensating fault pairs). These are the fault pairs $(f_1, f_2)$ that have $d_{1,2}$ equal to 1. As shown in Table 4.2, a duplex system consisting of different implementations of the Z5xp1 circuit has a higher percentage of compensating fault pairs than the non-diverse version; however, that is not generally true. For example, for the clip benchmark, the non-diverse duplex system has a higher percentage of compensating fault pairs. For compensating fault pairs, the error latency is strictly infinite; we assumed the value to be 10,000 cycles for our experiments. This is because the number of inputs of the benchmark circuits under consideration lies between 7 and 9, so the total number of input patterns is between 128 and 512. Note that the expected error latency depends on the number of compensating fault pairs. This dependence of error latency on the number of compensating fault pairs has been explained earlier in Sec. 2.1.
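The measurement procedure described above can be sketched in a few lines of Python. This is only an illustration, not the simulation environment used for the experiments: the faulty modules are represented as hypothetical Python callables on an integer input vector, the counter simply increments modulo $2^n$ from the seed, and the toy function (the least-significant input bit) is an assumption. The cap of 10,000 cycles follows the text.

```python
CAP = 10_000  # latency assigned to compensating fault pairs, as in the text

def error_latency(good, faulty1, faulty2, n_inputs, seed, cap=CAP):
    """Cycles until both faulty modules first produce the same wrong output.

    Input vectors come from an n_inputs-bit counter started at `seed`.
    Returns `cap` if no identical error is seen within `cap` cycles.
    """
    size = 1 << n_inputs
    for cycle in range(1, cap + 1):
        v = (seed + cycle - 1) % size
        o1, o2 = faulty1(v), faulty2(v)
        if o1 == o2 and o1 != good(v):
            return cycle
    return cap

def mean_latency(good, faulty1, faulty2, n_inputs):
    """Average latency over every possible counter seed."""
    size = 1 << n_inputs
    total = sum(error_latency(good, faulty1, faulty2, n_inputs, s)
                for s in range(size))
    return total / size

# Toy example (hypothetical): the function is the least-significant input bit.
good = lambda v: v & 1
stuck0 = lambda v: 0   # output stuck-at-0
stuck1 = lambda v: 1   # output stuck-at-1

print(mean_latency(good, stuck0, stuck0, 3))      # same fault in both modules
print(error_latency(good, stuck0, stuck1, 3, 0))  # compensating pair hits the cap
```

In this toy case the pair (`stuck0`, `stuck1`) never produces identical outputs, so it behaves as a compensating fault pair and is assigned the capped latency.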
In [Sakov 87], for a given combinational logic function, the fault detectability profiles for different implementations have been reported. Further studies are needed to synthesize circuit structures with high values of $d_{i,j}$ for different fault pairs. It has been proved in [To 73] that, for fanout-free combinational logic networks, all internal single stuck-at faults are either equivalent to or dominate single stuck-at faults on the primary inputs of the network. Thus, if we want to implement two diverse fanout-free networks implementing the same function, the $d_{i,j}$ values of the different fault pairs will be strongly dependent on the input combinations detecting the single stuck-at faults on network inputs and outputs. For both the networks, the set of patterns that detect the input or output stuck-at faults is independent of the network structure and is directly determined by the function the networks are implementing. Thus, for fanout-free networks and stuck-at faults, chances are low that the diversity metric will achieve appreciably higher values for networks synthesized in different ways than for simple replication. It therefore appears to be important to focus on achieving diverse fanout structures of different networks to obtain high values of the diversity metric for fault pairs.
Simulation 2
Our previous simulation results mainly focused on independent faults in multiple modules of a duplex system. However, it has been observed in the literature [Avizienis 84][Lala 94] that design diversity is useful for handling correlated failures and common-mode failures. Since we did not find any data on common-mode failure mechanisms, we performed the following sets of experiments to estimate the effect of diversity in the presence of common-mode failures.

Circuit    Implementations    Error latency
           T, T               10
           T, C               1711
           C, C               14
clip       T, T               35
           T, C               372
           C, C               48
inc        T, T               16
           T, C               1645
           C, C               17
rd84       T, T               35
           T, C               301
           C, C               21
In a duplicated system with identical implementations, we can find a one-to-one correspondence between the leads of the two copies. Hence, for these duplicated systems, we injected fault pairs (f 1 , f 2 ) such that f 1 and f 2 affect lead i of Module 1 and Module 2, respectively. Note that, in the presence of f 1 and f 2 , the two modules behave exactly in the same way. Hence, they can be called common-mode faults. With this setup, we found the error latency for these common-mode faults. For duplex systems with different implementations, we cannot establish such a one-to-one correspondence between the leads of the two copies. Hence, for each fault f 1 in Module 1, we found the fault f 2 in Module 2 with the minimum value of d 1,2 using exhaustive simulation. Thus, for f 1 affecting Module 1, we have the least error latency when f 2 affects Module 2.
Hence, the fault pair (f 1 , f 2 ) is called the worst-case fault pair with the worst case latency.
Then we averaged the worst-case latencies over all the worst-case fault pairs -this number is reported in the fourth column of Table 4 .3.
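The worst-case pairing step can be sketched as follows. The fragment is an assumption-laden illustration: each faulty module is represented by a hypothetical truth table (a tuple of outputs, one per input vector, with all vectors equally likely), and the fault names are made up. For every fault in Module 1 it selects the fault in Module 2 with the minimum $d_{1,2}$.

```python
def d_value(good, faulty1, faulty2):
    """1 minus the fraction of inputs with identical erroneous outputs."""
    same_error = sum(1 for g, a, b in zip(good, faulty1, faulty2)
                     if a == b and a != g)
    return 1 - same_error / len(good)

def worst_case_pairs(good, faults1, faults2):
    """Map each fault of Module 1 to its minimum-d partner in Module 2."""
    result = {}
    for name1, table1 in faults1.items():
        name2 = min(faults2,
                    key=lambda n: d_value(good, table1, faults2[n]))
        result[name1] = (name2, d_value(good, table1, faults2[name2]))
    return result

# Hypothetical example: 2-input AND, inputs ordered 00, 01, 10, 11.
GOOD = (0, 0, 0, 1)
FAULTS_MODULE_1 = {            # illustrative fault names and faulty outputs
    "out/0": (0, 0, 0, 0),
    "out/1": (1, 1, 1, 1),
    "a/1":   (0, 1, 0, 1),
}
FAULTS_MODULE_2 = {
    "x/0": (0, 0, 0, 0),
    "y/1": (0, 0, 1, 1),
}

for f1, (f2, dv) in worst_case_pairs(GOOD, FAULTS_MODULE_1, FAULTS_MODULE_2).items():
    print(f1, "->", f2, "d =", dv)
```

Exhaustive enumeration of this kind is only practical for small input counts; for each selected pair, the worst-case latency would then be measured by simulation and averaged.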
The results in Table 4.3 show a distinct advantage of using different implementations over non-diverse designs for common-mode faults. This is because the worst-case error latency of a common-mode fault in a duplex system with different implementations is at least an order of magnitude larger than the error latency of a common-mode fault in a duplex system with identical implementations.
In order to bring into perspective the significance of this increased error latency, we consider the execution of an application that uses the Z5xp1 circuit of Table 4.3. If the mission time of the application is of the order of hundreds of cycles, then the system with two identical implementations will fail in the presence of CMFs. However, a system with two different implementations of Z5xp1 will, on average, be able to finish the task in the presence of CMFs. Finally, if the mission time is of the order of thousands of cycles, then in the presence of CMFs, neither of these systems will be able to finish the task successfully. This result can also be explained from the properties of the diversity metric discussed in Sec. 2.1. The relationship of this result with the CMF rate is explained in Sec. 2.2.
Suppose that we have a system for which the common-mode failures affect only the inputs. In such a scenario, the systems with different implementations that we considered are not diverse as far as the inputs are concerned. Thus, such systems provide no extra protection against the common-mode failures of interest (affecting only the inputs) compared to systems with identical implementations. This argument motivates research in developing common-mode fault models and designing redundant systems with sufficient diversity against the modeled common-mode faults.
Self-testing Property
In this section, we discuss the possible effects of having design diversity on the self-testing property of a duplicated system. A duplicated system is called self-testing with respect to a fault pair ( f 1 , f 2 ) (f 1 affecting Module 1 and f 2 affecting Module 2) if and only if, there exists an input combination for which the two modules produce different outputs in the presence of the faults.
For the purpose of the experiment, we assume that the failures show up as single stuck-at faults in each of the two modules under consideration. The self-testing property ensures that, in the presence of failures that affect the two modules under consideration, we can detect the presence of the failures. This detection is important for the system to take corrective action and directly affects the system availability as shown in Sec. 2.3.
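The definition of the self-testing property translates directly into a check over all input combinations. The sketch below assumes that each faulty module is available as a hypothetical truth table (a tuple of outputs per input vector) and counts the non-self-testable fault pairs; the tables shown are illustrative and are not the benchmark data.

```python
def is_self_testing(faulty1, faulty2):
    """True if at least one input makes the two faulty modules disagree."""
    return any(a != b for a, b in zip(faulty1, faulty2))

def count_non_self_testable(faults1, faults2):
    """Number of fault pairs for which no input exposes a mismatch."""
    return sum(1 for t1 in faults1 for t2 in faults2
               if not is_self_testing(t1, t2))

# Hypothetical faulty truth tables for a 2-input function.
MODULE_A_FAULTS = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1)]
MODULE_B_FAULTS = [(0, 0, 0, 0), (0, 0, 1, 1)]

print(count_non_self_testable(MODULE_A_FAULTS, MODULE_A_FAULTS))  # identical copies
print(count_non_self_testable(MODULE_A_FAULTS, MODULE_B_FAULTS))  # different copies
```

In the identical-copy case every fault paired with itself is non-self-testable, which is why a different second implementation can reduce the count.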
The fourth column of Table 5.1 shows the number of non-self-testable fault pairs in duplex systems with identical and different implementations. It is clear from Table 5.1 that with different implementations it is possible to achieve high self-testing properties of the designs under consideration. In fact, an interesting synthesis problem is to synthesize two implementations of a given logic function such that the number of self-testable fault pairs is maximized.
Figure 6.1 shows the test-bed. The designs were mapped onto the test-bed. For each duplex system, we injected stuck-at faults in the lookup tables of the implementations and, for each fault pair, we calculated the error latency of that fault pair. We picked the worst-case fault pairs (as in Simulation 2) and plotted the cumulative distribution showing the percentage of worst-case fault pairs having an error latency less than or equal to a particular value.
In Fig. 6.2(a) we show the cumulative distribution of the worst-case error latencies for a duplex system with two identical implementations of the MCNC benchmark circuit cps.pla with 23 inputs. Note that, the X-axis is in the logarithmic scale. Figure 6 .2(b) shows a similar cumulative distribution for a duplex system with different implementations of the same logic function (cps.pla). The faults were injected by modifying the contents in the FPGA lookup tables. We also calculated the mean error latency and it can be seen that the mean error latency is at least an order of magnitude greater for diverse duplex systems.
The significance of the curves in Fig. 6.2 can be explained with the help of the following example. Consider an application with a mission time of $10^6$ cycles. For a system with identical implementations (Fig. 6.2(a)), the data-integrity of the system will be compromised before the mission time is reached in around 85% of the cases in the presence of CMFs. In other words, for only around 15% of the CMFs, the system is expected to successfully complete the task before data corruption occurs. In contrast, if we use a duplex system with different implementations, then in around 65% of the cases, the system is expected to successfully finish the task before data-integrity is affected (Fig. 6.2(b)).
Conclusions
In this paper, we addressed the problem of design diversity in redundant (software or hardware) systems in order to handle common-mode failures and failures in multiple modules. In order to protect fault-tolerant systems against common-mode failures, design diversity has been used commercially. In the past, design diversity was defined to be "independent" generation of "different" designs. This notion of diversity is qualitative and has limitations because it does not provide any quantitative basis to compare reliabilities of different diverse systems. Hence, the need for a metric to quantify diversity between different systems has been expressed in the past.
In this paper, for the first time, we have introduced a metric to quantify diversity among different designs under a particular fault-model, and explained how to calculate the overall system reliability in terms of this metric. In our example of the calculation of diversity for combinational logic circuits (Sec. 2.1), we have assumed that all the input combinations are equally likely. In the absence of any information about the relative frequency of the different input combinations, this is a reasonable assumption. However, for a particular application, if we have information about the relative frequencies (in the form of input traces, for example), then we can appropriately modify the above expression to incorporate this extra information (by changing the weights associated with different input combinations).
We have also presented simulation results for duplex systems. Our theoretical and simulation results indicate that, in the presence of independent multiple-module failures in redundant systems, the mere use of different implementations does not guarantee higher reliability compared to redundant systems with identical implementations. It is more important to evaluate the reliability of the systems using our metric. On the other hand, for common-mode failures and design faults, there is a significant gain with different implementations. However, the gain decreases with increasing mission time. Our analysis technique can be used to derive relationships between system reliability, diversity, mission time and system failure rate, and to compare the reliabilities of multiple diverse systems. These relationships can help in understanding the cost and reliability trade-offs involved in designing redundant systems with diversity.
For common-mode failures, diverse systems have no worse reliability compared to replicated systems. However, there is a further need to characterize common-mode failure mechanisms at the circuit level. With a good CMF fault model, (logical or layout-level) synthesis techniques can be used to incorporate sufficient diversity to protect systems against the modeled faults.
Our simulation results demonstrate that diversity plays an important role in enhancing the self-testing property of duplex systems. This can prove to be useful if we can apply specific patterns to the system during idle cycles.
Acknowledgments
