Introduction
Parity checking was the first error detection method used in digital computers [Bloc48], [GarnSE]. Recently, the use of parity checks in conjunction with other hardware schemes, such as checkpointing, and software techniques, such as retry, has gained new momentum [Paul02]. Unfortunately, however, nonpreservation of parity during arithmetic operations makes it necessary to strip the parity bit before, and to restore it after, such operations. This either leaves the arithmetic part unprotected or else necessitates using complex, self-checking code converters [Rao74], [PujiSS], [Parh00], [Lala01] (see Fig. 1a). An alternative is to use a parity prediction circuit (Fig. 1b). We thus consider a three-step methodology for checking of arithmetic operations on parity-encoded data (Fig. 1c):
1. Converting even-parity 2's-complement input numbers to intermediate even-parity redundant representations.
2. Performing arithmetic operations on redundant operands in such a way that the even parities are preserved.
3. Converting the even-parity redundant final results to even-parity standard 2's-complement output numbers.
To the extent possible, each step is carried out via local transformations [Thor97] . This is important not only for fault isolation, and hence greater fault tolerance, but also for high performance through parallel processing.
In this paper, we review theoretical results that enable us to perform the aforementioned steps, illustrate how each step might be implemented in hardware, and demonstrate some applications in the design of parity-checked adder/subtractors and multipliers. We then briefly discuss how our method can be extended to checking of more complex arithmetic operations, how checksums can be used in lieu of single parity bits, and how parity checking can be combined with other redundancy methods in hybrid arrangements for greater fault tolerance.
Even-Parity BSD Encoding
Low-overhead parity checking for arithmetic circuits is made possible by the observation that certain redundant representations possess enough redundancy to allow the generation of output results with specified parities. Thornton [Thor97] exploited this property to derive a binary signed-digit (BSD) adder design that always produces an even-parity output. This is made possible by a special encoding of the radix-2 BSD digit set {-1, 0, 1}, where 0 is assigned two codes with odd and even parities (Table 1). This encoding allows pairs of contiguous BSD digits, comprising radix-4 digits in [-3, 3], to be represented in 4 bits with even parity. In other words, the BSD adder can be designed to produce pairs of output digits by means of the encoding of Table 2.
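The claim that every radix-4 digit in [-3, 3] has an even-parity 4-bit code can be checked by enumeration. The sketch below assumes a particular 2-bit (y, z) code assignment, since Table 1 is not reproduced here: 1 as (0, 1), -1 as (1, 1), and 0 as either (0, 0) or (1, 0). This assignment is an assumption chosen to agree with the properties stated in the text (two zero codes of opposite parity); the names `DIGIT_OF` and `even_parity_pair_codes` are illustrative only.

```python
from itertools import product

# Assumed (y, z) codes for BSD digits; Table 1 itself is not available here.
DIGIT_OF = {(0, 0): 0, (1, 0): 0, (0, 1): 1, (1, 1): -1}

def even_parity_pair_codes():
    """Map each radix-4 value to its even-parity 4-bit codes (y1, z1, y0, z0)."""
    codes = {}
    for bits in product((0, 1), repeat=4):
        if sum(bits) % 2:                 # skip odd-parity codes
            continue
        hi = DIGIT_OF[bits[0:2]]          # more significant BSD digit
        lo = DIGIT_OF[bits[2:4]]          # less significant BSD digit
        codes.setdefault(2 * hi + lo, []).append(bits)
    return codes

codes = even_parity_pair_codes()
for value in sorted(codes):
    print(value, codes[value])
```

Under this assumed assignment, the eight even-parity codes cover all seven values in [-3, 3], with 0 having two codes.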
Input Conversion
We begin by showing how a 2's-complement input number $x = (x_{k-1} x_{k-2} \ldots x_1 x_0)_{\text{2's-compl}}$, of even width k, can be converted to BSD format with the same parity. That is, if an even/odd number of the input bits $x_i$ are 1s, then an even/odd number of the $y_i$ and $z_i$ bits, resulting from encoding the equivalent BSD number according to Table 1, will be 1s. Dividing the number into k/2 radix-4 digits, we can encode the resulting radix-4 digits separately, as shown in Fig. 2. For our input conversion process, we use Table 3 to encode the 2 MSBs of the 2's-complement number, one of which is negatively weighted, and Table 4 for all other positions. If the 2's-complement input x comes with the parity bit p, we can accommodate p in the encoding of the two MSBs (Fig. 3). Table 5 shows the encoding of $x_{k-1}$ and $x_{k-2}$, so that the parity of the 4 bits $y_{k-1}, z_{k-1}, y_{k-2}, z_{k-2}$ is the same as that of the 3 bits $p, x_{k-1}, x_{k-2}$. In this case, there are four coupled don't-care entries and 16 possibilities; [...] leads to the simplest circuit implementation, which is also acceptable in light of fault tolerance considerations.
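A minimal sketch of parity-preserving input conversion follows. It does not reproduce Tables 3 and 4, which are not available here; instead it assumes the (y, z) codes 1 = (0, 1), -1 = (1, 1), 0 = (0, 0) or (1, 0), and uses a hypothetical MSB-pair table (`MSB_PAIR`) constructed so that value and parity both carry over. All names are illustrative.

```python
# Assumed (y, z) codes for BSD digits; not taken from Table 1.
DIGIT_OF = {(0, 0): 0, (1, 0): 0, (0, 1): 1, (1, 1): -1}

# Hypothetical stand-in for Table 3: (x_{k-1}, x_{k-2}) -> (y, z, y, z).
MSB_PAIR = {
    (0, 0): (0, 0, 0, 0),   # value  0
    (0, 1): (0, 0, 0, 1),   # value +1
    (1, 0): (1, 1, 1, 0),   # value -2; zero coded as (1, 0) keeps parity odd
    (1, 1): (0, 0, 1, 1),   # value -1; rewritten as digits (0, -1)
}

def to_bsd(bits):
    """bits: 2's-complement bits, MSB first, even length.
    Returns the flat list of (y, z) bits, MSB first."""
    out = list(MSB_PAIR[(bits[0], bits[1])])
    for b in bits[2:]:
        out += [0, b]       # positively weighted bit b becomes digit b = (0, b)
    return out

def bsd_value(code):
    """Value of a flat (y, z, y, z, ...) code, MSB first."""
    value = 0
    for i in range(0, len(code), 2):
        value = 2 * value + DIGIT_OF[(code[i], code[i + 1])]
    return value

def twos_value(bits):
    """Value of a 2's-complement bit sequence, MSB first."""
    value = int("".join(map(str, bits)), 2)
    return value - (1 << len(bits)) if bits[0] else value

x = (1, 0, 1, 1)            # -8 + 3 = -5
code = to_bsd(x)
print(code, bsd_value(code), sum(code) % 2 == sum(x) % 2)
```

Only the two MSBs need special treatment in this sketch, because the negatively weighted bit maps to the digit -1, whose assumed code has a different parity from the bit it replaces.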
Output Conversion
Consider a k-digit BSD number (k even), with its digits in positions 2i and 2i + 1 encoded per Table 2; i.e., with even parity for each 4-bit group in the 2k-bit encoding. A simple way of converting a BSD number to 2's-complement is through separating the positive and negative components of the operand and performing subtraction.
So, conversion from BSD to 2's-complement can be performed via parity-checked addition [Fuji81], [Nico93], provided that both the positive and negative components are generated with associated even-parity bits. These parity bits, $p_e(x^+)$ and $p_e(x^-)$, are derivable from the 2k-bit BSD representation of x by noting that $x^+$ has a 1 in positions where $y_i' z_i = 1$, while $(x^-)'$ has a 1 wherever $y_i' \lor z_i' = 1$.
Because k is assumed to be even, the parity of the number of 0s in $x^+$ and $x^-$ can be determined instead. This strategy makes the inputs supplied to the parity generation trees complements of those used in forming $x^+$ and $x^-$, thus improving fault tolerance by avoiding single-point failures and making it less likely that unidirectional multiple errors produce compensating bit inversions.
Parity prediction for an adder's sum output is based on the fact that each carry $c_i = 1$ inverts the parity of the sum bit $s_i$, which, in the absence of a carry, would have the same parity as $(x^+)_i \oplus (x^-)'_i$. With $p_e$ denoting the even-parity function, the parity of the sum is therefore the XOR of the two operand parities and the parity of the carries.
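The conversion and the parity prediction can be illustrated together. The sketch below assumes the (y, z) codes 1 = (0, 1), -1 = (1, 1), 0 = (0, 0) or (1, 0), forms the positive component and the complemented negative component from the bit conditions given above, adds them with a carry-in of 1, and predicts the sum parity from the operand parities and the carries. A ripple adder stands in for whatever adder is actually used; function names are illustrative.

```python
# Assumed (y, z) codes for BSD digits; not taken from Table 1.
DIGIT_OF = {(0, 0): 0, (1, 0): 0, (0, 1): 1, (1, 1): -1}

def split_components(code):
    """code: list of (y, z) pairs, MSB first.
    Returns x_plus and the bitwise complement of x_minus."""
    x_plus = [(1 - y) & z for y, z in code]               # 1 where y'z = 1
    x_minus_compl = [(1 - y) | (1 - z) for y, z in code]  # 1 where y' or z' = 1
    return x_plus, x_minus_compl

def add_with_predicted_parity(a, b, carry_in=1):
    """Ripple-add two bit lists (MSB first), discarding the carry-out.
    Returns the sum bits and the parity predicted without looking at them."""
    carry, carry_parity = carry_in, 0
    s = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        carry_parity ^= carry          # carry entering position i
        total = a[i] + b[i] + carry
        s[i], carry = total & 1, total >> 1
    predicted = ((sum(a) + sum(b)) & 1) ^ carry_parity
    return s, predicted

def bsd_value(code):
    value = 0
    for y, z in code:
        value = 2 * value + DIGIT_OF[(y, z)]
    return value

code = [(0, 1), (1, 1), (0, 0), (1, 0)]   # digits 1, -1, 0, 0 -> value 4
a, b = split_components(code)
s, predicted = add_with_predicted_parity(a, b)
print(s, predicted == sum(s) % 2)
```

The result is the BSD value modulo 2^k, so it is meaningful as a k-bit 2's-complement number only when the value fits in that range.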
Two-Operand Addition
Even though Thornton's adder design, discussed earlier, was offered as an even-parity output adder, it is clear that the adder is in fact a parity-preserving arithmetic block if supplied with even-parity inputs. Both Thornton's original encoding and the complemented one used here also allow for checking of subtraction, given that negation of a BSD number through inverting bit y in the encoding of Table 1 preserves an operand's parity.
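The parity-preserving negation can be checked exhaustively for short operands. The sketch assumes the (y, z) codes 1 = (0, 1), -1 = (1, 1), 0 = (0, 0) or (1, 0), which is an assumption consistent with negation by inverting y rather than a copy of Table 1. Because two y bits are inverted in every 4-bit digit-pair group, the parity of each group is unchanged.

```python
# Assumed (y, z) codes for BSD digits; not taken from Table 1.
DIGIT_OF = {(0, 0): 0, (1, 0): 0, (0, 1): 1, (1, 1): -1}

def negate(code):
    """Invert the y bit of every (y, z) digit code."""
    return [(1 - y, z) for y, z in code]

def bsd_value(code):
    value = 0
    for y, z in code:                      # MSB first
        value = 2 * value + DIGIT_OF[(y, z)]
    return value

def pair_parities(code):
    """Parity of each 4-bit group formed by two adjacent digit codes."""
    return [(sum(code[i]) + sum(code[i + 1])) % 2
            for i in range(0, len(code), 2)]

code = [(0, 1), (1, 0), (1, 1), (0, 0)]    # digits 1, 0, -1, 0 -> value 6
neg = negate(code)
print(bsd_value(code), bsd_value(neg))
print(pair_parities(code), pair_parities(neg))
```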
For completeness, we present an overview of the parity-preserving BSD adder. In general, restricting transfers to adjacent digit positions leads to a three-stage hardware circuit for BSD addition [Parh00], the result being that the sum digit $s_i$ is formed as a function of three pairs of input operand digits $u_{i-2}, v_{i-2}, u_{i-1}, v_{i-1}, u_i, v_i$ (Fig. 5). The number of signal lines between blocks and the internal block designs vary according to the addition algorithm chosen, leading to tradeoffs between interconnection and circuit complexities.
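To make the three-digit-pair dependence concrete, here is a sketch of the conventional limited-carry BSD addition rule at the digit level. This is the generic textbook scheme, not Thornton's parity-preserving design and not the specific algorithm of Fig. 5; it ignores the bit-level encoding and parity entirely. The interim sum at position i depends on the position sums at i and i-1, and the incoming transfer depends on those at i-1 and i-2.

```python
def bsd_add(u, v):
    """Carry-free addition of two BSD numbers.
    u, v: equal-length lists of digits in {-1, 0, 1}, least-significant first.
    Returns k + 1 sum digits in {-1, 0, 1}, least-significant first."""
    k = len(u)
    p = [a + b for a, b in zip(u, v)]   # position sums in [-2, 2]
    w = [0] * k                          # interim sum digits
    t = [0] * (k + 1)                    # t[i] is the transfer into position i
    for i in range(k):                   # each iteration uses only p[i], p[i-1]
        prev_nonneg = i == 0 or p[i - 1] >= 0
        if p[i] == 2:
            w[i], t[i + 1] = 0, 1
        elif p[i] == 1:
            # incoming transfer is in {0, 1} if p[i-1] >= 0, else in {-1, 0}
            w[i], t[i + 1] = (-1, 1) if prev_nonneg else (1, 0)
        elif p[i] == 0:
            w[i], t[i + 1] = 0, 0
        elif p[i] == -1:
            w[i], t[i + 1] = (-1, 0) if prev_nonneg else (1, -1)
        else:                            # p[i] == -2
            w[i], t[i + 1] = 0, -1
    return [w[i] + t[i] for i in range(k)] + [t[k]]

def value(digits):
    return sum(d * 2 ** i for i, d in enumerate(digits))

u, v = [1, 1, -1, 0], [1, 0, -1, 1]
s = bsd_add(u, v)
print(s, value(s) == value(u) + value(v))
```

The loop is written sequentially for readability, but no iteration uses a result from an earlier one, so all positions can be evaluated in parallel.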
To preserve parity, pairs of adjacent blocks in stage 3 must be merged in view of the coupled generation of output digit pairs with desired parity. Thus, excessive complexity and delay are avoided if we minimize the number of signal lines between stages 2 and 3. The particular algorithm chosen by Thornton satisfies this requirement.
Conceptually, the addition algorithm outlined above might be described in terms of two stages of radix-4 operations proceeding from a position sum in [-6, 6]
Multioperand Addition
Multioperand addition can be performed by repeated use of the two-operand adder discussed in Section 5, provided that the parities of output digit pairs are checked after each addition step, or a small number of steps, to ensure that fault effects remain localized. The parallel (combinational) version of the scheme above consists of a binary tree of two-operand adders, reducing the number of operands by a factor of 2 at each level (Fig. 6). It is also possible to design parallel compressors [Parh00], similar to those for standard binary numbers, to reduce the number of operands to two before adding them in a parity-preserving adder. It is easily seen that three BSD digits cannot be compressed to two BSD digits in a parity-preserving manner. So, compression-based approaches must involve more than three inputs.
In binary multioperand addition, a 2-bit slice of five numbers can be reduced to a 4-bit number via a circuit known as a (5, 5; 4)-counter. Figure 7a shows the function of such a unit in dot notation [Parh00]. Figure 7b depicts the BSD counterpart to Fig. 7a. Here, each black-and-white "dot" represents a BSD digit with range of values [-1, 1].
The sum of the five 2-digit BSD numbers, each of which has even parity by assumption, can be represented by two pairs of BSD digits, with each pair having even parity. Thus, the even parity of input data is preserved. One way to realize the (5, 5; 4)-counter of Fig. 7b is to use two copies of the counter in Fig. 7a, one for the positive and another for the negative digit components. The outputs of these two counters are then supplied to a special encoding circuit that ensures even parities for output digit pairs.
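The range argument behind the BSD (5, 5; 4)-counter can be verified numerically. The sketch below treats each even-parity digit pair as a radix-4 digit in [-3, 3], as stated for the pair encoding earlier, and confirms that any sum of five such values splits into a high and a low radix-4 digit in the same range. It checks only the arithmetic, not the circuit or its encoding; `split_radix4` is an illustrative name.

```python
from itertools import product

def split_radix4(total):
    """Write total as 4*hi + lo with hi and lo both in [-3, 3]."""
    for hi, lo in product(range(-3, 4), repeat=2):
        if 4 * hi + lo == total:
            return hi, lo
    raise ValueError(f"{total} needs more than two radix-4 digits")

# All possible sums of five 2-digit BSD numbers, each in [-3, 3].
sums = {sum(ops) for ops in product(range(-3, 4), repeat=5)}
print(min(sums), max(sums))
print({total: split_radix4(total) for total in (-15, -7, 0, 10, 15)})
```

Since each output radix-4 digit lies in [-3, 3], each can be given an even-parity 4-bit code, which is what allows the counter to preserve parity.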
It is natural to ask whether larger counters might be applicable here. Unfortunately, the next larger counters of the type shown in Fig. 7b, that is, with the parity-preservation feature, are impractically large and complex: (85, 85; 8) and (17, 17, 17, 17; 8). These are also hard to design in such a way that single faults always lead to isolated output errors.
Thus, we advocate the use of (5, 5; 4) parity-preserving counters in multiple levels for reducing larger dot matrices: two levels for 11, three levels for 26, and four levels for 65 inputs. It is also possible to reduce 7 BSD digits in the same column to a 3-digit BSD number, as shown in Fig. Sc, while preserving the input parity. However, this method is inferior to that based on Fig. 7b [Parh02].
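The input counts quoted for multilevel reduction can be reproduced with a simple recurrence. The recurrence is an assumption made here for illustration, not something stated in the text: each (5, 5; 4)-counter is taken to replace five operands with two, and an operand left over at a level is passed through unchanged.

```python
def max_operands(levels):
    """Largest operand count reducible to two in the given number of levels,
    assuming each counter turns five operands into two."""
    n = 2
    for _ in range(levels):
        n = 5 * (n // 2) + n % 2   # every output pair comes from five inputs
    return n

print([max_operands(h) for h in range(1, 5)])
```

Under this assumption the sequence for one through four levels is 5, 11, 26, and 65, matching the figures above.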
Multiplication
A BSD multiplier can be designed based on a sequential shift-add algorithm or on parallel tree reduction. In the case of the shift-add scheme, parity preservation is simpler if we perform the multiplication in radix 4, using the following standard recurrence for partial products:
$p^{(j+1)} = (p^{(j)} + y_j\, x\, 4^{k/2})\, 4^{-1}$ with $p^{(0)} = 0$ and $p^{(k/2)} = \text{product}$
Radix-4 shifting of BSD numbers encoded as in Table 2 preserves the even parity of digit pairs. The only additional complication over a standard radix-4 multiplier architecture is the need for a parity-preserving doubling circuit to accommodate 2x and 3x multiples; as usual, 3x is formed as 2x + x at the outset or else avoided via digit recoding. The doubling circuit is a highly simplified carry-free BSD adder. Referring to Fig. 5, we note that possible position sum values are in {-2, 0, 2}, thus reducing the adder to its stage 3, with both bit-inputs at digit position i coming from position i - 1. Note that doubling is a form of recoding to preserve the even parity of 4-bit groups after a one-position left shift. With radix-2 multiplication, a doubling or halving scheme must be used in lieu of simple left- or right-shifting of the cumulative partial product.
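A plain-integer sketch of the radix-4 right-shift recurrence may help fix ideas. It assumes an unsigned k-bit multiplier y with m = k/2 radix-4 digits in {0, 1, 2, 3} and scales each added multiple by 4^m so that every right shift is exact. It models only the arithmetic of the recurrence, not the BSD encoding, the doubling circuit, or parity checking; `radix4_multiply` is an illustrative name.

```python
def radix4_multiply(x, y, k):
    """Shift-add multiplication in radix 4.
    x: multiplicand (any integer); y: unsigned multiplier below 2**k; k even."""
    m = k // 2                            # number of radix-4 multiplier digits
    p = 0                                 # cumulative partial product
    for j in range(m):
        y_j = (y >> (2 * j)) & 3          # j-th radix-4 digit of y
        p = (p + y_j * x * 4 ** m) // 4   # add the multiple, shift right one digit
    return p

print(radix4_multiply(-7, 13, 4), -7 * 13)
```

The digit value 3 is where a 3x multiple would be needed in hardware, which is why the text mentions precomputing 2x + x or recoding the digits.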
An alternative to the standard radix-4 multiplier architecture outlined above is to use the compression scheme of Fig. 7b to combine the cumulative partial product and four other values, two of which are from the set {0, x} and the other two from {0, -x}. This will accommodate the five multiples 0, ±x, and ±2x, thus implying the need for a recoding circuit to avoid ±3x. For tree or partial-tree multiplication, a binary tree of limited-carry adders [Parh90] or circuits based on the compression scheme depicted in Fig. 7b can be used.
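One conventional way to avoid the 3x multiple is to recode the multiplier's radix-4 digits so that none has magnitude 3. The sketch below is a generic sequential recoding with a carry, assuming an unsigned multiplier given as radix-4 digits in {0, 1, 2, 3}; it is not the paper's recoding circuit, and `recode_radix4` is an illustrative name.

```python
def recode_radix4(digits):
    """digits: radix-4 digits in {0, 1, 2, 3}, least-significant first.
    Returns one more digit than given, each of magnitude at most 2."""
    out, carry = [], 0
    for d in digits:
        d += carry
        if d >= 3:              # 3 becomes -1 and 4 becomes 0, with a carry out
            out.append(d - 4)
            carry = 1
        else:
            out.append(d)
            carry = 0
    out.append(carry)
    return out

def radix4_value(digits):
    return sum(d * 4 ** i for i, d in enumerate(digits))

digits = [3, 3, 1]              # 3 + 12 + 16 = 31
recoded = recode_radix4(digits)
print(recoded, radix4_value(recoded))
```

Each recoded digit selects one of the multiples 0, ±x, or ±2x, so no 3x term is ever required.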
Array multipliers are not as attractive for BSD operands as they are in binary arithmetic, given that the advantages of structural regularity and ease of pipelining are already provided by the much faster tree-based reduction scheme with limited-carry BSD adders.
Conclusion
Through redundant BSD representation with inherently even parity, arithmetic operations can be checked against fault-induced errors with low circuit redundancy and virtually no added latency, except in the final conversion to nonredundant format. In case redundant representation is used for performance reasons anyway, even the latter overhead becomes insignificant. A similar parity scheme is applicable to carry-save numbers, thus providing an alternative to the borrow-save (BSD) representation used in this paper. Whether this alternative leads to any speed-up or simplification remains to be established. The following extensions to this work are currently under investigation:
(1) Possible advantages of carry-save over BSD encoding;
(2) Use of checksums, also known as generalized parity;
(3) Combination of parity, residue, and other check types;
(4) Parity preservation with nonredundant representation.
Other Arithmetic Operations
More complex operations can be synthesized from adder and multiplier blocks or else handled directly by producing a number of component terms and combining them using multioperand addition. However, in our case, the relative benefit of merged implementation for complex operations is much smaller than with binary arithmetic, given the use of redundant, limited-carry arithmetic.
Arithmetic in a wide array of signal processing applications is dominated by addition and multiplication. Hence, techniques discussed in the preceding sections are adequate for designing parity-checked circuits for many applications. Radix-4 division, square-rooting, and CORDIC algorithms can be added to our list of operations with moderate effort.
Modifications to conventional radix-2 or radix-4 division, square-rooting, and other function-evaluation architectures parallel those of multiplication discussed in Section 7.
