Abstract: An in-depth study of RESO (recomputing with shifted operands) theory leads to an extension of the theory that enables efficient concurrent error diagnosis (CED) designs for two-dimensional array structures and complex functions. Based on this enhanced version of RESO, a systematic design strategy has been developed that allows the designer to take advantage of knowledge of fault configurations.
Introduction
With the ever-increasing complexity of digital applications, reliability has become a very important issue in today's VLSI designs. Reliability can be improved by sophisticated testing schemes that weed out faulty circuits [1, 2]. However, such off-line or static tests can identify only permanent faults, not transient and intermittent ones. Mechanisms for concurrent error diagnosis (CED) must therefore be installed to detect and locate such faults before they cause undesirable results.
All CED schemes detect errors through conflicting results generated from operations on the same operands. CED can be achieved through space redundancy, time redundancy, or hybrid space/time redundancy [3-7]. A time redundancy CED technique, RESO (recomputing with shifted operands) [8-10], has been developed specifically for iterative logic arrays (ILAs). Among existing CED schemes, RESO is unique in combining transient-fault detection with an inherent fault location capability, and it requires only a moderate increase in hardware. However, the question of exactly how much redundancy is needed has not been addressed. In this paper, a systematic strategy for specifying the amount of redundancy, based on an analysis of the faults in the ILA, is presented.
Enhanced version of RESO
Time redundancy employs only a single set of hardware to carry out the repeated operations. Because the same hardware is used, the repeated operation, in the presence of faults, is liable to produce the same erroneous result as the first step. To avoid this problem, the operands must be encoded in the repeated cycle, and the result thus obtained must be decoded back to the appropriate form for meaningful comparison. Consider the time redundancy technique shown in Fig. 1 [8]. Let x be the input to the computation unit f, and let f(x) and f'(x) be the outputs without and with the encoding/decoding operations, respectively. Two fundamental requirements must be satisfied by these operations. First, the coding function c must not interfere with the original function f. In other words, for a selected coding function c, there must exist a decoding function $c^{-1}$ such that

$f'(x) = c^{-1}(f(c(x))) = f(x)$

in the absence of faults. This is the concept of 'mappable correct output'. Second, for the purpose of fault detection, the coding operation c must transform the input operand(s) x in such a way that, when subjected to the same faulty conditions, the output of the repeated step, though still erroneous, differs from that of the first step. This is the concept of 'disjoint error sets', formalised in the theorem below.
Theorem 1: Let $E_1$ and $E_2$ be the sets of all possible erroneous outputs of $f_F(x)$ and $f'_F(x) = c^{-1}(f_F(c(x)))$, respectively, due to a fault F. Errors of the computation unit f due to the fault condition F are detectable under the time redundancy technique if and only if $E_1 \cap E_2 = \emptyset$.
Proof: An error is detectable if and only if $f'_F(x) = c^{-1}(f_F(c(x))) \neq f_F(x)$. In other words, no possible output $f'_F(x)$ of the repeated step may be an element of $E_1$, and no possible output $f_F(x)$ may be an element of $E_2$, i.e. $E_1 \cap E_2 = \emptyset$. Q.E.D.
In RESO, left-shift and right-shift are chosen as the coding and decoding functions. Let $k_1, k_2, \ldots, k_m$ be the numbers of bits by which each of the m input operands of the host ILA is left-shifted, and let r be the number of bits by which the result is right-shifted. The values of the $k_j$'s and r depend on both the function and the architecture of the host ILA. Thus, the design parameters of a RESO scheme for a specific ILA structure are specified by RESO($\{k_1, k_2, \ldots, k_m\}$, r) instead of RESO-k.
An easy way to determine the appropriate number of bits to shift in RESO is to analyse the potential error sets $E_1$ and $E_2$ of the unshifted and shifted results, respectively. From the 'disjoint error sets' requirement in theorem 1 and the characteristics of most ILAs, the potential error set $E_1$ of the first, unshifted step can be formulated as (Fig. 2)

$E_1 = \{\pm 2^{i} q \mid q = 1, \ldots, u\}$

where i is the minimum bit-slice index among the faulty modules and u is the maximum error factor. The maximum error factor reflects the integer value of the output bits affected by the fault. For example, u = 3 for a faulty RCA (ripple carry adder) with a single fault [8]. More specifically, if the effects of a fault region in the ILA can be seen in two consecutive output bits, then $u = 2^2 - 1 = 3$.

Fig. 2: Error weights of faulty bit-slices in a single cluster in an ILA
In the recomputation step, the operands are left-shifted so that the result is shifted left by r bits with respect to the original, unshifted result. Thus, the same faulty ith bit-slice now generates an output bit that is equivalent to the (i - r)th bit of the original, unshifted result. In other words, the weight of the ith bit-slice in this step must be divided by $2^r$ (i.e. right-shifted by r bits) before comparison. Since the function performed by the ILA is the same in both steps, the corresponding potential error set $E_2$ has the same attributes as $E_1$, except that the weights of all entries are divided by $2^r$. Therefore the potential error set of the recomputation step is

$E_2 = \{\pm 2^{i-r} q \mid q = 1, \ldots, u\}$ (2)

Note that in the original RESO scheme, the error sets $E_1$ and $E_2$ contain the case q = 0 [9]. This definition is valid from a fault modelling point of view. From a functional error point of view, however, an output that contains zero error is by definition not an error, even though there may be faulty cells. Therefore, the case q = 0 is not included in our definition of the error sets. This definition allows the concept of 'error set disjointedness' to be developed: once the error sets are made disjoint, errors caused by any fault (permanent and intermittent alike) can be detected under the enhanced RESO scheme.
Theorem 2: Let r be the number of bits to be right-shifted in the recomputed result. Then r is determined by

$r > \lfloor \log_2 u \rfloor$ (3)
where u is the maximum error factor.
Proof: To ensure that the two error sets $E_1$ and $E_2$ are disjoint, the largest magnitude in $E_2$ must be smaller than the smallest magnitude in $E_1$, i.e. $2^{i-r} u < 2^{i}$, or $u < 2^{r}$, which for an integer r is equivalent to $r > \lfloor \log_2 u \rfloor$.
Q.E.D. Eqn. 3 shows that the lower bound on r can be derived from the worst error situation specified.
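Because the error sets are finite, the disjointness argument of theorems 1 and 2 can be checked by brute force. The Python sketch below (the function names are ours, not the paper's) builds $E_1$ and $E_2$ exactly with fractions and computes the smallest admissible r from eqn. 3; for a positive integer u, $\lfloor \log_2 u \rfloor + 1$ is simply the bit length of u.

```python
from fractions import Fraction


def error_set(i: int, u: int, r: int = 0) -> set:
    """E_1 (r = 0) or E_2 (r > 0): {±2^(i-r) * q | q = 1..u}, kept exact as Fractions."""
    weight = Fraction(2) ** (i - r)
    return {sign * weight * q for q in range(1, u + 1) for sign in (1, -1)}


def min_shift(u: int) -> int:
    """Smallest integer r with r > floor(log2 u), i.e. 2**r > u (eqn. 3)."""
    if u < 1:
        raise ValueError("the maximum error factor u must be a positive integer")
    return u.bit_length()  # equals floor(log2 u) + 1 for u >= 1


def disjoint(i: int, u: int, r: int) -> bool:
    """True if the unshifted and shifted potential error sets share no element."""
    return not (error_set(i, u) & error_set(i, u, r))


if __name__ == "__main__":
    # A single faulty FA in an RCA has u = 3, so RESO needs r = 2.
    print(min_shift(3), disjoint(4, 3, 1), disjoint(4, 3, 2))  # prints: 2 False True
```

The bound is also tight: whenever $2^r \le u$, both sets contain $2^{i}$ (q = 1 in $E_1$ and $q = 2^r$ in $E_2$), which is exactly what `disjoint(4, 3, 1)` reports.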
The $k_j$ values can subsequently be determined from r. The relationships among the $k_j$ values, and between each $k_j$ and r, are function-dependent. In general, they can be expressed by the following theorem.
Theorem 3: Let $x_i$, i = 1, 2, ..., m, be the m input operands of a computation unit f, and let $k_i$ be the number of bits by which operand $x_i$ is left-shifted. Suppose that r is the number of bits to be right-shifted in the recomputation result, i.e.

$2^{r} f(x_1, x_2, \ldots, x_m) = f(2^{k_1} x_1, 2^{k_2} x_2, \ldots, 2^{k_m} x_m)$

(a) If f is an addition unit, then there exists an equality relation among the parameters r and $k_i$ such that

$k_1 = k_2 = \cdots = k_m = r$ (5)

(b) If f is a multiplication unit, then there exists a summation relation among the parameters r and $k_i$ such that

$k_1 + k_2 + \cdots + k_m = r$
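Proof: If f is an addition unit, i.e. $f(x_1, \ldots, x_m) = x_1 + x_2 + \cdots + x_m$, then the defining relation requires $2^{r}(x_1 + \cdots + x_m) = 2^{k_1} x_1 + \cdots + 2^{k_m} x_m$ for all operands, which holds if and only if $k_1 = k_2 = \cdots = k_m = r$. Similarly, if f is a multiplication unit, i.e. $f(x_1, \ldots, x_m) = x_1 x_2 \cdots x_m$, then $f(2^{k_1} x_1, \ldots, 2^{k_m} x_m) = 2^{k_1 + \cdots + k_m} x_1 \cdots x_m$, so the relation holds if and only if $k_1 + \cdots + k_m = r$. Q.E.D.
In practice, $k_1 = k_2 = r$ is implemented for most bit-slice logical arrays, RCAs, and CLAs, and $k_1 + k_2 = r$ is implemented for multipliers. Although theorem 3 considers only addition and multiplication units, it can be extended to other functions [12].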
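The RCA case can be simulated at bit level to see theorems 1 to 3 working together: for an adder, theorem 3 gives $k_1 = k_2 = r$, and theorem 2 gives r = 2 for u = 3. The Python sketch below is illustrative only; the stuck-at model of a single faulty FA (its sum and/or carry-out forced to a constant) and all function names are our assumptions, not taken from the paper.

```python
from itertools import product


def rca_add(x, y, width, fault=None):
    """Ripple-carry adder built from `width` full adders (FAs).

    `fault` is a hypothetical single-FA fault model: (slice, sum_stuck, carry_stuck),
    where each stuck value is 0, 1, or None (that output is fault-free).
    """
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s = a ^ b ^ carry
        c = (a & b) | (carry & (a ^ b))
        if fault is not None and fault[0] == i:
            if fault[1] is not None:
                s = fault[1]
            if fault[2] is not None:
                c = fault[2]
        result |= s << i
        carry = c
    return result | (carry << width)  # keep the final carry-out


def reso_add(x, y, n, r, fault=None):
    """RESO({r, r}, r) on one RCA: compute, recompute with shifted operands, compare."""
    width = n + r + 1  # the same adder (and the same fault) serves both steps
    first = rca_add(x, y, width, fault)
    second = rca_add(x << r, y << r, width, fault) >> r  # encode, compute, decode
    return first, second, first != second


def undetected_faults(n, r):
    """All (x, y, fault) where at least one step is wrong but the two results agree."""
    misses = []
    stuck = [sv for sv in product((None, 0, 1), repeat=2) if sv != (None, None)]
    for x, y in product(range(1 << n), repeat=2):
        for i in range(n + r + 1):
            for sum_sa, carry_sa in stuck:
                fault = (i, sum_sa, carry_sa)
                first, second, detected = reso_add(x, y, n, r, fault)
                if (first != x + y or second != x + y) and not detected:
                    misses.append((x, y, fault))
    return misses


if __name__ == "__main__":
    print(len(undetected_faults(4, 2)))  # prints 0: r = 2 catches every error here
    print(reso_add(3, 2, 4, 1, (1, 1, 1)))  # prints (7, 7, False): r = 1 misses this one
```

In the last line, adding 3 and 2 with both outputs of the FA in bit-slice 1 stuck at 1 yields 7 in both steps (the correct sum is 5), so with r = 1 the error escapes; with r = 2 the exhaustive search over this 4-bit model finds no escaping error.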
Systematic design strategy
To meet the time redundancy requirement of theorem 1, a new key parameter r has been defined, and the relationship between r and the $k_j$'s has been delineated in theorem 3. To exploit the full potential of r, the following design strategy is proposed:
(a) based on the overhead tolerance and the reliability requirement, specify the extent of faulty stage-cells that can be tolerated in the target ILA (the fault-allowance);
(b) based on the fault-allowance specified in step a, characterise the fault configuration in terms of the maximum error factor u;
(c) from the maximum error factor u, determine the minimum value of r;
(d) determine the $k_j$ values for the m operands of the host ILA, based on (i) the architectural and functional constraints of the host ILA, and (ii) the goal of minimal overhead.
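Steps c and d are mechanical once u is known. A minimal Python sketch, assuming the operation is a plain m-operand addition or multiplication; the function names and the choice of putting the whole shift on the first multiplier operand (the {r, 0, ...} pattern) are ours:

```python
def min_shift(u: int) -> int:
    """Step c: the smallest r satisfying r > floor(log2 u) (theorem 2)."""
    if u < 1:
        raise ValueError("u must be a positive integer")
    return u.bit_length()


def shift_amounts(r: int, unit: str, m: int = 2) -> list:
    """Step d: one admissible choice of k_1..k_m for the given r (theorem 3).

    An addition unit needs k_1 = ... = k_m = r. A multiplication unit only
    needs the k_j to sum to r; this sketch shifts the first operand by r and
    leaves the others unshifted, which avoids their shift multiplexers.
    """
    if unit == "add":
        return [r] * m
    if unit == "mul":
        return [r] + [0] * (m - 1)
    raise ValueError("unit must be 'add' or 'mul'")


def design_reso(u: int, unit: str, m: int = 2):
    """Steps c and d; steps a and b (fault allowance -> u) are supplied by the designer."""
    r = min_shift(u)
    return shift_amounts(r, unit, m), r


if __name__ == "__main__":
    print(design_reso(3, "add"))  # prints ([2, 2], 2): single faulty FA in an RCA
    print(design_reso(3, "mul"))  # prints ([2, 0], 2)
```

Any other split with $k_1 + k_2 = r$ (for example {1, 1}) is equally valid for a multiplier; the unshifted-operand choice only trades symmetry for hardware.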
In this design strategy, step a is application-dependent and must be worked out jointly by the users and the designers. Once this design decision is made, the other three steps follow: step b uses theorem 4, and steps c and d are straightforward applications of theorems 2 and 3, respectively. In a real-time application environment, the faults targeted by CED are transient faults, which are characterised by their purely random nature in both location and duration. As knowledge about such faults accumulates, it becomes possible to develop more cost-effective CED schemes, specific to various functions and structures, by taking advantage of that knowledge. In this respect, RESO offers a considerable degree of flexibility. This, however, requires a deeper understanding of the interactions among the function, structure, and fault geometry of the system. An analysis technique that allows a more cost-effective implementation of RESO in 2D-ILAs is presented below and illustrated through the design example of a multiplier-accumulator (MAC).
Design example
The structure of a MAC is an adder (1D-ILA) placed directly in series with a multiplier (2D-ILA), with full adders (FAs) as the primitive cells of the ILAs. The multiplier in this example is based on the Baugh-Wooley algorithm (BWM) [11]. To design a MAC with CED capability using the enhanced RESO technique, fault analysis, fault coverage, and design implementation are discussed in turn.
Fault analysis
To simplify our discussion, the faulty cells are assumed to form a single cluster; the diagnostic condition for multiple clusters is considered in theorem 5. The possible configurations of faulty FAs in a MAC can be covered by a few regular geometries. In fact, for a BWM/RCA MAC, the six most common regular fault geometries are sufficient. As shown in Fig. 3, they are:
...
Type 6: multiple adjacent faulty FAs in adjacent bit-slices, in a rectangular configuration
The effects of these geometries are characterised by the maximum error factor u, which is used to derive the lower bound on r. The maximum error factor for each type is derived as follows.
Theorem 4: Let u be the maximum error factor for the potential error set. Then u is given by the geometry of the faulty modules as follows:
Type 1: u = 3
Type 2: u = 2s + 1
Type 3: $u = 3(2^{b} - 1)$
Type 4: $u = 2^{b+1} - 1$
Type 5: $u = 2^{b}(s + 1) - 1$
Type 6: $u = 2^{b}(s + 2) - 3$
where s is the number of faulty stages in a single bit-slice and b is the number of faulty bit-slices.
Proof: For the type 1 fault geometry, the maximum error occurs when both the sum (weight $2^i$) and the carry-out (weight $2^{i+1}$) are erroneous, so the maximum error is $2^i + 2^{i+1} = 2^i u$, or u = 3. For type 2, since an erroneous input to a faulty FA does not change the fact that such an FA can produce an erroneous output, the ill-effect of the sum of each faulty FA, except the lowest one, carries no real weight. In other words, there is only one final erroneous sum output for each faulty bit-slice. However, each faulty stage-cell in that bit-slice can contribute a carry-out error of weight $2^{i+1}$ to its left neighbouring cell. Therefore, the maximum error depends on all the carry-outs and only one sum, and is $2^i + 2^{i+1} s = 2^i u$, or u = 1 + 2s.
For type 3, since none of the outputs of a faulty FA is fed into another faulty FA, all the sums and carry-outs must be counted, giving a maximum error of

$(2^0 + 2^1)(2^0 + 2^1 + \cdots + 2^{b-1})\,2^i = 3(2^b - 1)\,2^i = 2^i u$, or $u = 3(2^b - 1)$,

where b is the number of faulty bit-slices involved.
Type 4 is the converse of type 2. In this case, all the erroneous carry-outs, except that of the lowest (leftmost) faulty FA, are absorbed by the intermediate faulty stage-cells and thus carry no real weight. Therefore, the maximum error is $2^i\{(2^0 + 2^1 + \cdots + 2^{b-1}) + 2^b\} = 2^i u$, or $u = 2^{b+1} - 1$.
The maximum error factor for type 5 follows from arguments similar to those in the previous cases. The only outputs with true weight are all the carry-outs from the leftmost faulty bit-slice and the sums from all the lowest faulty FAs, so the maximum error is $\{2^b s + (2^0 + 2^1 + \cdots + 2^{b-1})\}2^i = 2^i u$, or $u = 2^b(s + 1) - 1$.
Type 6 is the most severe, since all the carry-outs from the leftmost faulty bit-slice and both the carry-outs and sums from the lowest FAs must be counted; the maximum error is $\{2^b(s - 1) + (2^0 + 2^1)(2^0 + 2^1 + \cdots + 2^{b-1})\}2^i = 2^i u$, or $u = 2^b(s + 2) - 3$.
Note that the fault geometry discussed in [9] is of type 1, and thus its maximum error factor is 3. The results of theorem 4 can be further generalised by introducing a configuration factor g. A general equation for determining the maximum error factor of a MAC is given in the following corollary.
Corollary: $u = 2^{b}(s + g) - (2g - 1)$
where s and b are defined in theorem 4, and the configuration factor g is defined as g = 1 for a rhombic area and g = 2 for a rectangular area.
The corresponding b, s, and g values for the six fault geometries are summarised in Table 1. It can be seen that types 3 and 4 are special cases (s = 1) of types 6 and 5, respectively, while types 1 and 2 (b = 1) can be treated as either type 5 or type 6.
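The formulas of theorem 4 and the corollary can be cross-checked mechanically. The Python sketch below encodes both (the dictionary layout and names are ours; the corollary is written as $u = 2^b(s + g) - (2g - 1)$, the form that reproduces types 5 and 6). Evaluating it confirms that type 3 is the corollary with g = 2 and s = 1, type 4 is g = 1 and s = 1, and types 1 and 2 (b = 1) give the same u for either value of g.

```python
def corollary_u(b: int, s: int, g: int) -> int:
    """Maximum error factor u = 2^b (s + g) - (2g - 1); g = 1 rhombic, g = 2 rectangular."""
    if g not in (1, 2):
        raise ValueError("g must be 1 (rhombic) or 2 (rectangular)")
    return (1 << b) * (s + g) - (2 * g - 1)


# Theorem 4, type by type: b faulty bit-slices, s faulty stages per bit-slice.
THEOREM_4 = {
    1: lambda b, s: 3,
    2: lambda b, s: 2 * s + 1,
    3: lambda b, s: 3 * ((1 << b) - 1),
    4: lambda b, s: (1 << (b + 1)) - 1,
    5: lambda b, s: (1 << b) * (s + 1) - 1,
    6: lambda b, s: (1 << b) * (s + 2) - 3,
}


def check_corollary(max_b: int = 8, max_s: int = 16) -> bool:
    """True if the corollary reproduces every type of theorem 4 over the given range."""
    bs = [(b, s) for b in range(1, max_b + 1) for s in range(1, max_s + 1)]
    return (
        all(THEOREM_4[5](b, s) == corollary_u(b, s, 1) for b, s in bs)
        and all(THEOREM_4[6](b, s) == corollary_u(b, s, 2) for b, s in bs)
        and all(THEOREM_4[3](b, 1) == corollary_u(b, 1, 2) for b, _ in bs)
        and all(THEOREM_4[4](b, 1) == corollary_u(b, 1, 1) for b, _ in bs)
        and all(THEOREM_4[2](1, s) == corollary_u(1, s, g) for _, s in bs for g in (1, 2))
        and all(THEOREM_4[1](1, 1) == corollary_u(1, 1, g) for g in (1, 2))
    )


if __name__ == "__main__":
    print(check_corollary())  # prints True
```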
Fault coverage
With eqns. 3-5, the appropriate value of r can be determined for any specified fault area, including nonadjacent ones. In practice, it is desirable to know the maximum coverage of a given r value in terms of the basic faulty FA configurations. The coverage in terms of b and s for the first seven r values is tabulated in Table 2, and the results are plotted in Fig. 4. Note that the range of values considered here is by no means exhaustive; additional information can always be extracted from eqns. 3-5.

Table 2 (maximum number of faulty stages s per bit-slice covered for b faulty bit-slices; "-" means not covered)
r   b   s (rectangular, g = 2)   s (rhombic, g = 1)
1   1   -                        -
2   1   1                        1
3   1   3                        3
3   2   -                        1
4   1   7                        7
4   2   2                        3
4   3   -                        1
5   1   15                       15
5   2   6                        7
5   3   2                        3
5   4   -                        1
6   1   31                       31
6   2   14                       15
6   3   6                        7
6   4   2                        3
6   5   -                        1
7   1   63                       63
7   2   30                       31
7   3   14                       15
7   4   6                        7
7   5   2                        3
7   6   -                        1
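Table 2 can be regenerated from the corollary to theorem 4 and eqn. 3: RESO with a given r covers a configuration exactly when $u \le 2^r - 1$. The Python sketch below (names are ours, and it assumes the corollary in the form $u = 2^b(s + g) - (2g - 1)$) solves that inequality for the largest s; None marks a "-" entry, and r = 1 yields no row because it covers no FA cluster at all.

```python
def max_stages(r: int, b: int, g: int):
    """Largest s >= 1 with 2^b (s + g) - (2g - 1) <= 2^r - 1, or None if s = 1 already fails."""
    limit = (1 << r) - 1  # eqn. 3: u < 2^r
    s = (limit + 2 * g - 1) // (1 << b) - g
    return s if s >= 1 else None


def coverage_table(r_max: int = 7) -> list:
    """Rows (r, b, s_max rectangular, s_max rhombic), as laid out in Table 2."""
    rows = []
    for r in range(1, r_max + 1):
        for b in range(1, r + 1):
            rect, rhomb = max_stages(r, b, 2), max_stages(r, b, 1)
            if rect is None and rhomb is None:
                break  # s_max only shrinks as b grows, so wider clusters fail too
            rows.append((r, b, rect, rhomb))
    return rows


if __name__ == "__main__":
    for row in coverage_table():
        print(*("-" if v is None else v for v in row))
```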
Applying theorem 1, the choice of r must satisfy the error set disjointedness requirement for error detection, i.e. the effects of the faulty bit-slices on the two computation steps must be different. The discussion of fault configurations above assumed a single cluster of faulty modules. However, once the r value has been fixed, additional information about the covered fault configurations can be obtained. Specifically, the following criteria for diagnosability can be deduced.
Theorem 5: In RESO($\{k_1, k_2, \ldots, k_m\}$, r):
(a) the hazard zone, i.e. the number of bit-slices affected by the worst faulty FA configuration encountered, must be at most r bit-slices in extent;
(b) in the recomputation cycle, a safety zone, free from any faulty effects, must exist for the r bit-slices that were exposed to the hazard zone in the first cycle;
(c) the safety zone must be of the same extent as the hazard zone.
In other words, RESO($\{k_1, k_2, \ldots, k_m\}$, r) is vulnerable not only to faulty FA configurations beyond its design target, but also to other fault configurations appearing within another r bit-slices of the worst targeted configuration.
Proof: For any error to be detectable, there must be at least a safety zone between any two hazard zones. If all faulty FAs are confined to a single region, the errors are detectable. If, on the other hand, there are two or more hazard zones, the robust distance between them must be at least 2r - 1 (the width of the interposing safety zone plus the hazard zone, minus one). Q.E.D.
Theorem 5 describes the robustness of RESO($\{k_1, k_2, \ldots, k_m\}$, r) against its targeted fault configurations. Since the minimum values of r for logical AND arrays, RCAs, and BWMs with a single fault are 1, 2, and 2, respectively, theorem 5 gives a robust distance of 1 for logical AND arrays and 3 for both RCAs and BWMs.
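The arithmetic of this example is easily reproduced. In the Python sketch below, the u values are assumptions consistent with the text: u = 1 for a single faulty cell of a logical AND array (one affected output bit) and u = 3 for a single faulty FA (theorem 4, type 1); the names are ours.

```python
def min_shift(u: int) -> int:
    """Theorem 2: smallest r with 2**r > u."""
    return u.bit_length()


def robust_distance(r: int) -> int:
    """Theorem 5: required separation, in bit-slices, between two hazard zones."""
    return 2 * r - 1


SINGLE_FAULT_U = {"logical AND array": 1, "RCA": 3, "BWM": 3}  # assumed u values

if __name__ == "__main__":
    for array, u in SINGLE_FAULT_U.items():
        r = min_shift(u)
        print(f"{array}: r = {r}, robust distance = {robust_distance(r)}")
```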
Implementation
Consider an $n \times n \times (2n + 2)$-bit MAC that computes $A \times B + C$, and let $k_1$, $k_2$ and $k_3$ be the numbers of bits by which A, B and C are shifted, respectively. For a single faulty FA (type 1, u = 3), theorem 2 gives r = 2, and by theorem 3, RESO($\{k_1, k_2, k_3\}$, 2) can be achieved through RESO({1, 1, 2}, 2), RESO({2, 0, 2}, 2), or RESO({0, 2, 2}, 2). Note that with the {r, 0, r} configuration, one of the operands need not be shifted, which saves the area of one multiplexer. The implementation of the RESO scheme for the BWM is shown in Fig. 5. In the first step, the additional RESO bits of the shiftable operand are its most significant bits; because of the two's complement calculation, they must be sign-extended. In the recomputation step, however, they are the two least significant bits and are simply zero-filled (Fig. 5). Both sign extension and zero-filling can be combined with the shift operation and hardwired through a multiplexer with negligible performance penalty.
The design of an $n \times n \times (2n + 2)$-bit MAC using RESO({2, 0, 2}, 2) is illustrated in Fig. 6. With RESO({r, 0, r}, r), since one of the operands does not have to be shifted, the burden on the auxiliary registers is also reduced.
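The operand handling described for Fig. 5 can be illustrated at word level. The Python sketch below is a hypothetical register model, not the circuit of Fig. 5: each n-bit two's-complement operand sits in an (n + r)-bit register, left-shifting by k and masking sign-extends the upper RESO bits and zero-fills the lower ones, and the decoded recomputation must equal $A \times B + C$ for each admissible $\{k_1, k_2, k_3\}$ of theorem 3. No fault is injected, and all names are ours.

```python
def to_signed(pattern: int, bits: int) -> int:
    """Read a `bits`-wide bit pattern as a two's-complement integer."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def load_operand(a: int, n: int, r: int, k: int) -> int:
    """Bit pattern of n-bit signed operand `a` in an (n + r)-bit register after a k-bit left shift.

    With k = 0 the r extra RESO bits are the MSBs and come out sign-extended;
    with k = r they are the LSBs and come out zero-filled (0 < k < r mixes both).
    """
    return (a << k) & ((1 << (n + r)) - 1)


def reso_mac(a: int, b: int, c: int, n: int, k=(2, 0, 2), r: int = 2):
    """Both steps of word-level RESO({k1, k2, k3}, r) for A x B + C; returns (first, second)."""
    k1, k2, k3 = k
    if k1 + k2 != r or k3 != r:
        raise ValueError("theorem 3 requires k1 + k2 = r (product) and k3 = r (sum)")
    width = n + r
    a0 = to_signed(load_operand(a, n, r, 0), width)
    b0 = to_signed(load_operand(b, n, r, 0), width)
    a1 = to_signed(load_operand(a, n, r, k1), width)
    b1 = to_signed(load_operand(b, n, r, k2), width)
    first = a0 * b0 + c
    second = (a1 * b1 + (c << k3)) >> r  # arithmetic right shift decodes the result
    return first, second


if __name__ == "__main__":
    print(f"{load_operand(-3, 4, 2, 0):06b}")  # prints 111101: two sign-extended MSBs
    print(f"{load_operand(-3, 4, 2, 2):06b}")  # prints 110100: two zero-filled LSBs
    print(reso_mac(-3, 5, 7, 4))  # prints (-8, -8)
```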

Conclusions
In this paper, an efficient and optimal design of iterative logic arrays with concurrent error diagnosis (CED) capability has been presented. The CED capability is achieved by the time redundancy approach. The concepts of 'error set disjointedness' and 'correct output mappability' have been established, on a solid mathematical basis, as the fundamental properties of the coding and decoding functions in the time redundancy CED approach. Using these concepts, the time redundancy technique RESO has been significantly augmented. The augmentation includes the characterisation of the effects of faulty area geometries in terms of a maximum error factor, a parametrisation scheme for RESO, and a set of formulas that link these parameters to the physical fault configurations of elementary modules. Based on these results, a systematic design procedure for implementing RESO in an ILA has been developed; it transforms a reliability requirement, expressed in terms of assumed fault configurations, into a RESO design. As fabrication and testing technologies continue to improve, and as knowledge about faults in VLSI circuits accumulates, this strategy of implementing fault tolerance requirements in hardware at the PE/ILA level appears increasingly promising.
It should be noted that the need for CED is even more acute in computational systolic arrays. A systolic array is composed of repeated regular computational cells, or processing elements (PEs). These PEs are interconnected so as to facilitate pipelining and parallel processing, and to minimise the I/O channels through localised data exchange. In certain computational systolic arrays, the expediency of data processing results in under-utilisation of the hardware at any one time step. With an appropriate scheme, these idle PEs can be harnessed to achieve efficient CED.
As mentioned in Reference 13, the only drawback of RESO is the approximately 100% overhead in time, which is inherent to all time redundancy schemes. Judged by the VLSI performance measure $AT^2$ (where A is the chip area and T is the operation cycle time), this is a rather high price to pay. However, the performance penalty associated with time redundancy can be absorbed by the inherent idleness of the PEs in a systolic array. Because each PE performs the diagnosis independently, targeting CED at the PE/ILA level instead of the currently more prevalent system/PE level can provide a more robust diagnosis capability [12]. This property suggests a future research direction: designing concurrent error diagnosable systolic arrays based on the proposed design strategy.
