Abstract-In this paper, a methodology for the development of fault-tolerant adders based on the Radix 2 Signed Digit (SD) representation is presented. The use of a number representation in which carry propagation is confined to neighboring digits offers advantages in terms of error detection, fault localization, and repair. Errors caused by faults belonging to a considered stuck-at fault set can be detected by a parity-based technique. In fact, a carry-free adder preserving the parity of the augends can be implemented, allowing fault detection by means of a parity checker. Regarding fault localization, the "carry-free" property of the adder confines the error due to a permanent fault to only a few digits. The faulty digit is detected by using a recomputation with shifted operands method. Finally, after the fault localization, graceful degradation of the system, intended as a reduction of performance in exchange for a correct output computation, can be obtained by using two different procedures. The first one obtains the correct output by recomputing the result with two different shift operations and using the intersection of the obtained results, while the second one is based on a reduced dynamic range approach, which allows us to obtain the result in only one step, but with fewer output digits.
INTRODUCTION
THE increasing scaling rate of microelectronic technologies observed in recent years pushes for the use of fault-tolerant techniques, traditionally used in high-reliability applications such as aerospace and avionics, in commercial applications as well. For example, the effects of neutrons at sea level when subnanometric microelectronics technologies are used seem to be non-negligible. In fact, as semiconductor technology advances, the amount of charge that is stored in specific nodes in the circuitry continues to decline [1]. Many vendors, such as Xilinx, are starting to furnish special tools and architectures to face these problems in the FPGA case [2]. For these reasons, and since adders are essential building blocks in all data processing systems, the design of arithmetic structures with online error detection and correction capabilities represents an important research topic. In the literature, a number of self-checking adder implementations have been proposed, such as those based on residue codes [3], [4], parity codes [5], [6], [7], or Berger codes [8]. Other error detection techniques are based on recomputing with shifted [9] and/or rotated operands, as shown in [10], [11].
Other solutions based on carry-free self-checking adders have been proposed in the literature, such as in [12], [13]. In particular, in [12], an inherent parity coding scheme to code the digits is proposed, while, in [13], a one-out-of-three scheme is investigated. Moreover, few works propose adders which provide combined error detection and correction capabilities. The most widely applied techniques to obtain error correction in adder circuits are based on time redundancy [11] or on the residue number system representation [14], [15]. The objective of this paper is the design of a self-checking adder architecture by combining parity checking techniques and SD number representation. The parity coding scheme proposed in this paper [16] allows implementing a self-checking adder with fault localization and graceful degradation capabilities. The main idea is that, in SD representation, the carry propagation is limited to the neighboring digits, allowing us to set up a procedure to locate the faulty digits by means of ad hoc algorithms. Moreover, the locality of the carry propagation allows us to obtain graceful degradation. The paper is organized as follows: In Section 2, a brief overview of SD arithmetic is reported, while, in Section 3, the chosen digit coding is reported and discussed with respect to fault detection. In Section 4, the self-checking adder architecture is proposed, together with an analysis of its implementation overhead. Section 5 reports the procedure to obtain fault localization and shows the possible graceful degradation approaches. Finally, in Section 6, the conclusions are drawn.
BACKGROUND
The general theory and application of the SD representation are reported in [17], [18]. This section briefly illustrates its basic theory. In a radix-$r$ SD representation, an $N$-digit number $x$ can be represented as

$$x = \sum_{i=0}^{N-1} x_i r^i, \tag{1}$$

where the digit set is $x_i \in \{-a, \ldots, -1, 0, 1, \ldots, a\}$, with $\lceil (r-1)/2 \rceil \le a \le r-1$.
The original motivation for introducing SD representation was to eliminate the carry propagation chains in addition and subtraction [19]. In fact, given two operands $x$ and $y$, the addition operation can be split into the two operations

$$x_i + y_i = r c_i + w_i, \tag{2}$$

$$z_i = w_i + c_{i-1}, \tag{3}$$

where $c_i$ is the carry digit transferred to position $i+1$, with $w_i$ being an auxiliary variable. For $r = 2$, there is only one possible digit set: $\{-1, 0, 1\}$, i.e., $a$ must be equal to 1. Even if the condition $a \ge \lceil (r-1)/2 \rceil$ cannot be satisfied, the sum can still be performed without carry propagation by using the modified rules for radix 2 SD addition proposed in [20] and reported in Table 1.
This representation allows using an architecture such as the one described in [17], [16] to implement a carry-free adder. The main elements of the adder are the so-called blocks ADD1 and ADD2, where ADD1 implements (2), providing the intermediate outputs $c_i$ and $w_i$, while ADD2 implements (3).
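The two-step structure can be illustrated with a small behavioral model. The following is a minimal sketch, not the authors' implementation: Table 1 is not reproduced here, so the selection rule inside `add1` is the commonly used radix 2 SD rule (choose $w_i$ so that it can never collide with the incoming carry, based on the signs of the previous operand digits) and is an assumption. The names `add1`, `sd_add`, `sd_value`, and `check_all` are hypothetical, and digits are stored least significant digit first.

```python
from itertools import product

DIGITS = (-1, 0, 1)

def add1(a_i, b_i, a_prev, b_prev):
    """ADD1 step: return (c_i, w_i) such that a_i + b_i == 2*c_i + w_i.

    Assumed rule: when both previous digits are non-negative the incoming
    carry is 0 or +1, so w_i is kept <= 0; otherwise w_i is kept >= 0.
    """
    s = a_i + b_i
    if s in (2, -2):
        return s // 2, 0
    if s == 0:
        return 0, 0
    prev_nonneg = a_prev >= 0 and b_prev >= 0
    if s == 1:
        return (1, -1) if prev_nonneg else (0, 1)
    return (0, -1) if prev_nonneg else (-1, 1)

def sd_add(a, b):
    """Add two N-digit SD numbers (LSD first); the result has N+1 digits."""
    n = len(a)
    c, w = [], []
    for i in range(n):
        a_prev = a[i - 1] if i > 0 else 0
        b_prev = b[i - 1] if i > 0 else 0
        c_i, w_i = add1(a[i], b[i], a_prev, b_prev)
        c.append(c_i)
        w.append(w_i)
    # ADD2 step: z_i = w_i + c_{i-1}; the top digit is the last carry.
    z = [w[i] + (c[i - 1] if i > 0 else 0) for i in range(n)]
    z.append(c[n - 1])
    return z

def sd_value(digits):
    """Integer value of an SD digit sequence (LSD first)."""
    return sum(d * 2**i for i, d in enumerate(digits))

def check_all(n):
    """Exhaustively confirm the sum value and the digit range for n digits."""
    for a in product(DIGITS, repeat=n):
        for b in product(DIGITS, repeat=n):
            z = sd_add(a, b)
            if sd_value(z) != sd_value(a) + sd_value(b):
                return False
            if any(d not in DIGITS for d in z):
                return False
    return True
```

Under this rule each output digit $z_i$ depends only on the operand digits in positions $i$, $i-1$, and $i-2$, which is the locality that the fault localization procedure relies on.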
PARITY IN THE CARRY-FREE SUM
The radix-2 SD digit x i can assume three values, hence its binary representation requires two bits. A convenient coding choice is considering digit 0 represented by either bits 00 or 11 and digit 1 and -1 represented by bits 01 and 10, respectively. This coding has two advantages:
1. Conversion from binary to SD representation is straightforward, as the LSB has the same value in both representations while the MSB is set to 0 in the conversion.
2. With the proposed coding, only the MSBs of $a_{i-1}$, $b_{i-1}$ are necessary and the ADD1 circuit can be implemented with 6 bits of input instead of 8. In fact, the function performed by ADD1 on $a_i$, $b_i$ needs to evaluate only the sign of the operands $a_{i-1}$, $b_{i-1}$ in order to determine the outputs of ADD1.

As in a conventional number representation [7], with the SD representation, the parity properties of the arithmetic operations can also be used to check the correctness of arithmetic results. To define the parity of a digit, we refer to its binary coding. Thus, we define the parity $P(a_i)$ as the XOR of the bits representing the digit $a_i$ and the parity of an SD number $P(A)$ as the XOR of the parities $P(a_i)$ of all digits $a_i$. Starting from these definitions for the parity of an SD number and assuming that $P(C)$ and $P(W)$ are the parities of carry and partial sum and $P(Z)$ is the parity of the result, the following property holds:

Property 1.
$$P(Z) = P(W) \oplus P(C). \tag{4}$$

In fact, evaluating the parity of $w_i$ and $c_{i-1}$ with the chosen coding, we have that $P(z_i) = P(w_i) \oplus P(c_{i-1})$ for any possible combination of $w_i$ and $c_{i-1}$ [16]. Since the property is true for every $z_i$, it is true also for $Z$. In fact, due to the associative property of the XOR operator, the following statement holds:

$$P(Z) = \bigoplus_i P(z_i) = \bigoplus_i \left[ P(w_i) \oplus P(c_{i-1}) \right] = P(W) \oplus P(C). \tag{5}$$
This demonstrates Property 1. With a similar procedure, the following Property 2 can be demonstrated [16]:

Property 2.
$$P(W) = P(A) \oplus P(B). \tag{6}$$

In fact, for any possible combination of $a_i$ and $b_i$, we have

$$P(w_i) = P(a_i) \oplus P(b_i). \tag{7}$$
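The coding and the digit-level parity relations can be checked exhaustively. The sketch below assumes the coding stated in this section (0 as 00 or 11, 1 as 01, -1 as 10, written as MSB then LSB) and the radix 2 digit relations $z_i = w_i + c_{i-1}$ and $a_i + b_i = 2c_i + w_i$. The names `CODES`, `binary_to_sd`, `decode`, and `P` are hypothetical.

```python
from itertools import product

# Assumed codewords (MSB, LSB) for each digit value; 0 has two codewords.
CODES = {0: [(0, 0), (1, 1)], 1: [(0, 1)], -1: [(1, 0)]}
DIGITS = (-1, 0, 1)

def binary_to_sd(bit):
    """Binary digit to SD codeword: keep the LSB and set the MSB to 0."""
    return (0, bit)

def decode(bits):
    """Return the digit value represented by a two-bit codeword."""
    for value, codewords in CODES.items():
        if bits in codewords:
            return value
    raise ValueError("not a codeword")

def P(value):
    """Parity of a digit: XOR of its two bits, equal for all its codewords."""
    parities = {msb ^ lsb for msb, lsb in CODES[value]}
    assert len(parities) == 1
    return parities.pop()

# Digit-level form of Property 1: z = w + c_prev, whenever z is a valid digit.
prop1_holds = all(
    P(w + c_prev) == P(w) ^ P(c_prev)
    for w, c_prev in product(DIGITS, repeat=2)
    if w + c_prev in DIGITS
)

# Digit-level form of Property 2: any (c, w) that satisfies a + b = 2c + w.
prop2_holds = all(
    P(w) == P(a) ^ P(b)
    for a, b, c, w in product(DIGITS, repeat=4)
    if a + b == 2 * c + w
)
```

With this coding, the parity of a digit is 1 exactly when the digit is nonzero, so both relations reduce to arithmetic modulo 2 on the digit equations; they do not depend on which of the admissible $(c_i, w_i)$ pairs ADD1 selects.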
SELF-CHECKING IMPLEMENTATION OF THE SD ADDER
In this section, the self-checking implementation of the SD adder is presented. The implementation results have been obtained by using Synopsys Design Compiler [21] and the standard cell library provided by Mississippi State University [22] . The architecture of the self-checking adder is shown in Fig. 1 and it is composed of the following blocks:
1. Parity Prediction, generates the value of $P(C)$;
2. Error Indicator 1, checks (6) and issues an error signal in case of a mismatch;
3. Error Indicator 2, checks (4) and issues an error signal in case of a mismatch;

in addition to the standard ADD1 and ADD2 blocks used to implement an SD adder.
The Parity Prediction block is implemented by performing the XOR operation on all $P(c_i)$. The computation of $c_i$ (and then of its parity) depends on the value of six variables. The Boolean function $P(c_i) = f[a_i(1), a_i(0), b_i(1), b_i(0), a_{i-1}(1), b_{i-1}(1)]$ for computing $P(c_i)$ has been obtained from Table 1.

To detect a fault using a parity checker, we must avoid having an erroneous result $\bar{x}_i$ that has the same parity as the correct one $x_i$. With the chosen coding scheme, this event occurs only if the result $x_i$ changes from -1 (10) to 1 (01) or vice versa. It can be noticed that a change of the binary representation from 00 to 11 or vice versa does not introduce an error in the output value, as 00 and 11 represent the same value 0. Moreover, with the chosen coding, a stuck-at in the input/output of a block can only change the value of the digit from 0 to ±1 or vice versa. The assumption that at least one of the parities of $W$ and $C$ changes when a fault occurs allows us to detect the fault by implementing (4) and (6). This assumption can be guaranteed with a suitable implementation of the blocks ADD1 and ADD2.

In fact, the effect of faults inside the blocks ADD1 and ADD2 is strongly technology-dependent and, therefore, the standard cell implementation must be carefully analyzed to predict the behavior of these blocks in case of faults. First of all, it can be noticed that a fault inside an ADD1 or ADD2 block can modify more than one bit of the same digit and, in particular, can modify the value of a digit from -1 to 1 or vice versa. This case must be avoided because the parity of 1 and -1 is the same and, consequently, this kind of error cannot be detected by using a parity checker. This case occurs when a fault affects a logic cell with a fan-out greater than one.
In the literature [23], [24], different results on obtaining parity checking capability for circuits with logic resource sharing have been presented. Starting from these results, some simple considerations can be made. For the blocks ADD1 and ADD2, resource sharing between the different bits of the same digit has been avoided. For the ADD1 block, this choice implies that the block must be split into two blocks, one providing the MSB of $c_i$ and $w_i$, the other providing the LSB of $c_i$ and $w_i$. Resource sharing is allowed inside each block, while it is not allowed between different blocks. With this choice, a fault inside one of the two blocks composing ADD1 can change the parity of $c_i$, of $w_i$, or of both digits and therefore can always be detected by the parity checker. The ADD2 block provides only the digit $z_i$; therefore, it is synthesized as two independent blocks providing the MSB and the LSB of $z_i$.
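The reason for keeping the two bits of a digit in separate logic cones can be seen by enumerating bit errors on a single digit. The following is a small illustrative enumeration under the coding assumed above (0 as 00 or 11, 1 as 01, -1 as 10); the helper names `parity` and `flip` are hypothetical, and a fault is modeled simply as the inversion of one or both bits of a codeword.

```python
# Assumed codewords (MSB, LSB) mapped to their digit values.
CODE_TO_DIGIT = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): -1}

def parity(bits):
    """Digit parity: XOR of the two bits of the codeword."""
    return bits[0] ^ bits[1]

def flip(bits, positions):
    """Invert the bits at the given positions (0 = MSB, 1 = LSB)."""
    return tuple(b ^ 1 if k in positions else b for k, b in enumerate(bits))

# Errors confined to one bit of the digit (no sharing between MSB and LSB).
single = [(code, flip(code, {k})) for code in CODE_TO_DIGIT for k in (0, 1)]
# Errors hitting both bits at once (possible if one cell feeds both bits).
double = [(code, flip(code, {0, 1})) for code in CODE_TO_DIGIT]

# Every single-bit error changes the digit parity, so it is detectable.
single_always_detected = all(parity(g) != parity(f) for g, f in single)

# Double-bit errors never change the parity; list those that change the value.
masked_errors = sorted(
    (CODE_TO_DIGIT[g], CODE_TO_DIGIT[f])
    for g, f in double
    if CODE_TO_DIGIT[g] != CODE_TO_DIGIT[f]
)
```

In this enumeration the only value-changing errors that escape the parity check are the exchanges between 1 and -1, while the 00/11 exchange is harmless because both codewords represent 0.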
To evaluate the introduced area overhead, Synopsys Design Compiler has been used for the synthesis and the results are reported in Table 2.
To avoid the occurrence of undetectable faults, the ADD1 and ADD2 blocks have been synthesized by using partial resource sharing. A comparison with the parity-based self-checking adder of [7], assuming a Carry Lookahead structure as proposed in [25], is reported in Table 3.
In this table, the first row reports the area occupation of the considered adders without self-checking capabilities as a function of the number of digits $N$, while the second row reports the required computation time. The area occupation of the self-checking SD adder increases linearly, while, in the carry lookahead case, it increases as $N \log_2 N$; therefore, for a high number of digits ($N > 16$), the area occupation of the two solutions becomes comparable, while the timing performance of the SD adder is always better.
FAULT LOCALIZATION PROCEDURE
In this section, the fault localization procedure for the self-checking SD adder is illustrated. It is based on the recomputation with shifted operands method presented in [9]. To improve the clarity of the exposition without loss of generality, we take as an example an 8-digit adder. To correctly localize the faults inside the SD adder, we must divide the faults into three types, depending on the number of outputs affected by the fault. We remark that this classification depends on the implementation described in the previous section. In fact, each fault can affect a different number of outputs, depending on the location of the fault and on the paths from the failure point to the output ports.
1. Stuck-at fault on the ADD2 output or in one ADD1 output: The error produced by the fault is limited only to the bits of the binary representation of $z_i$.
2. Stuck-at in the LSB of an input digit: The fault is limited to only one ADD1 block and leads to the modification of the parity of both $w_i$ and $c_i$ with respect to the correct ones.
3. Stuck-at in the MSB of an input digit: A fault on an input of ADD1 can modify the value of four output digits. A stuck-at on a line of $a_i$ can change the value of both $w_i$, $c_i$ and $w_{i+1}$, $c_{i+1}$. Regarding the bits of weight $i+1$, it can be seen that the only changes that can be introduced by an error on bits of the lower level are the modification of the outputs $(w_{i+1}, c_{i+1})$ from (-1, 0) to (1, -1) and vice versa or from (1, 0) to (-1, 1) and vice versa. In all these cases, the parity value of $w_{i+1}$ does not change, as $P(1) = P(-1)$. Therefore, the parity of the erroneous value $P(\bar{W})$ depends only on the parity $P(\bar{w}_i)$ of the faulty digit $\bar{w}_i$.

We define $Z$ as the correct output and $\bar{Z}$ as the faulty output (i.e., the output when an error indicator signal is active) and define $Z_{LS}$, $Z_{RS}$ as the correct outputs obtained by using the Left and Right Shifted Inputs (LSI and RSI), respectively. Finally, $\bar{Z}_{LS}$, $\bar{Z}_{RS}$ are the outputs obtained with the shifted operands when an error indicator signal is active. It must be noticed that the shifted inputs may or may not activate the error detection again, depending on the fault that occurred. The digits composing the operands are referred to with the corresponding lowercase letter.

The equality relation $z(i) = z_{LS}(i+1)$ is valid for $0 \le i \le 6$, while a similar relation $z(i) = z_{RS}(i-1)$ is valid for $3 \le i \le 8$. For the left-shifted output, the equality is not valid for $z(7)$ and $z(8)$ because the most significant digits ($a(7)$ and $b(7)$) are lost with the shift operation.
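The two shift relations can be checked numerically on a behavioral model of a fault-free 8-digit adder. The sketch below is only illustrative: the addition rule inside `sd_add` is the commonly used radix 2 SD rule and is assumed to match Table 1, the digit below position 0 is assumed to be 0, vacated positions are filled with 0, and the names `sd_add`, `shift_left`, `shift_right`, and `relations_hold` are hypothetical. Digits are stored least significant digit first.

```python
import random

def sd_add(a, b):
    """Carry-free radix 2 SD addition (assumed rule); returns N+1 digits."""
    n = len(a)
    w, c = [0] * n, [0] * n
    for i in range(n):
        s = a[i] + b[i]
        prev_nonneg = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)
        if abs(s) == 2:
            c[i] = s // 2
        elif s != 0:
            # Keep w_i on the side that cannot collide with the incoming carry.
            w[i] = -1 if prev_nonneg else 1
            c[i] = (s - w[i]) // 2
    return [w[i] + (c[i - 1] if i else 0) for i in range(n)] + [c[-1]]

def shift_left(x):
    """LSI: digit i moves to position i+1; the most significant digit is lost."""
    return [0] + list(x[:-1])

def shift_right(x):
    """RSI: digit i moves to position i-1; the least significant digit is lost."""
    return list(x[1:]) + [0]

def relations_hold(a, b):
    """Check z(i) = z_LS(i+1) for 0 <= i <= 6 and z(i) = z_RS(i-1) for 3 <= i <= 8."""
    z = sd_add(a, b)
    z_ls = sd_add(shift_left(a), shift_left(b))
    z_rs = sd_add(shift_right(a), shift_right(b))
    left_ok = all(z[i] == z_ls[i + 1] for i in range(0, 7))
    right_ok = all(z[i] == z_rs[i - 1] for i in range(3, 9))
    return left_ok and right_ok

rng = random.Random(0)
samples = [
    ([rng.choice((-1, 0, 1)) for _ in range(8)],
     [rng.choice((-1, 0, 1)) for _ in range(8)])
    for _ in range(2000)
]
all_hold = all(relations_hold(a, b) for a, b in samples)
```

In this model the index ranges follow from the fact that $z_i$ depends only on the operand digits in positions $i$, $i-1$, and $i-2$: a shifted digit stays equal to the original one as long as none of those three positions has been lost or replaced.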
For the right-shifted output, the equality is not valid for $z(0)$, $z(1)$, and $z(2)$ because the least significant digits ($a(0)$ and $b(0)$) are lost and their carries can also be lost. As stated before, the reported procedures are possible because of the carry-free features of the SD adders. Once a parity error is detected, the operation is performed again with the LSI and two different cases can be considered:

1. The parity is correct, i.e., the output is $Z_{LS}$ (CASE A).
2. The parity is wrong, i.e., the output is $\bar{Z}_{LS}$ (CASE B).

CASE A: If the error indicator does not indicate the occurrence of the fault, the newly computed value $Z_{LS}$ and the old wrong value $\bar{Z}$ satisfy the following relation: For $0 \le j \le 6$, $z_{LS}(j+1) = z(j) = \bar{z}(j)$ for any digit that is not affected by the fault. Instead, for all the faulty digits $i$, we have $z_{LS}(i+1) = z(i) \ne \bar{z}(i)$. This relation allows both locating the faulty digits and correcting the output values. The number of inequalities depends on the location of the fault which caused the error. In particular, a type 1 fault corresponds to one inequality, a type 2 to two inequalities, and a type 3 to three inequalities. If no difference is found between $Z_{LS}$ and $\bar{Z}$ for $0 \le j \le 6$, then the fault can be localized in the two most significant digits ($z(8)$, $z(7)$). To localize and correct these faults, the operation is performed again with RSI and the digits of $Z_{RS}$ and the digits of $\bar{Z}$ will satisfy the following relation: For $3 \le j \le 8$, $z_{RS}(j-1) = z(j) = \bar{z}(j)$ for any digit that is not affected by the fault. Instead, for all the faulty digits $i$, we have $z_{RS}(i-1) = z(i) \ne \bar{z}(i)$. Thus, due to the procedure followed, we can assume that an error must be detected in the second shift operation (RSI). If, again, no difference is detected, the checker is assumed to be faulty.
CASE B: If the error indicator indicates the occurrence of the fault also in $\bar{Z}_{LS}$, then the localization procedure is as follows: For $0 \le j \le 6$, $z_{LS}(j+1) = z(j) = \bar{z}(j) = \bar{z}_{LS}(j+1)$ holds for any digit that is not affected by the fault. Instead, the digits affected by the fault produce some inequalities between $\bar{z}$ and $\bar{z}_{LS}$. These inequalities can be generated by both an error in the computation of the original result and an error in the recomputed result. Therefore, a minimum of two and a maximum of four inequalities show up, depending on the considered type of fault of the above defined set. In particular, a type 1 fault corresponds to two inequalities, a type 2 to three inequalities, and a type 3 to four inequalities. The fault can be localized as the first different digit found by performing the comparison between $\bar{Z}$ and $\bar{Z}_{LS}$, and the type of fault can be detected by counting the number of inequalities. As in case A, if the faulty digit is the seventh or eighth or the fault is localized in the checker, no difference between $\bar{Z}$ and $\bar{Z}_{LS}$ is detected. Recomputing with RSI allows us to detect if the fault affects the remaining digits or the checker.

The procedure which has been developed can also detect the faults occurring on the error indicator blocks. In fact, if an error is detected by the parity checkers, but no inequality has been observed between the original computed value and the values obtained with both RSI and LSI, then the fault affects the checker itself. Thus, the outputs are correct, but the self-checking capability has been lost. The algorithm of fault detection, localization, and correction is summarized in the graph reported in Fig. 2.
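As an illustration of CASE A only, the following sketch simulates a type 1 fault and recovers the correct output from a single recomputation with LSI. It is not the algorithm of Fig. 2: the fault model (the LSB of output digit `k` stuck at 0, with 0 coded as 00, so that only $z_k = 1$ is corrupted), the assumed addition rule, and the names `sd_add`, `parity`, `faulty_add`, and `recover_case_a` are all hypothetical. CASE B and the RSI step are deliberately left out.

```python
def sd_add(a, b):
    """Carry-free radix 2 SD addition (assumed rule); returns (z, w, c), LSD first."""
    n = len(a)
    w, c = [0] * n, [0] * n
    for i in range(n):
        s = a[i] + b[i]
        prev_nonneg = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)
        if abs(s) == 2:
            c[i] = s // 2
        elif s != 0:
            w[i] = -1 if prev_nonneg else 1
            c[i] = (s - w[i]) // 2
    z = [w[i] + (c[i - 1] if i else 0) for i in range(n)] + [c[-1]]
    return z, w, c

def parity(digits):
    """Parity of an SD number: each nonzero digit contributes 1."""
    return sum(abs(d) for d in digits) % 2

def faulty_add(a, b, k):
    """Adder with the LSB of output digit k stuck at 0 (hypothetical type 1 fault).

    Returns the possibly wrong output and the error flag P(Z) != P(W) xor P(C).
    """
    z, w, c = sd_add(a, b)
    if z[k] == 1:
        z[k] = 0  # codeword 01 is read as 00
    error = parity(z) != (parity(w) ^ parity(c))
    return z, error

def recover_case_a(a, b, k):
    """Detect, localize, and correct using LSI; returns (output, faulty digits)."""
    z_bad, error = faulty_add(a, b, k)
    if not error:
        return z_bad, []
    z_ls, error_ls = faulty_add([0] + list(a[:-1]), [0] + list(b[:-1]), k)
    if error_ls:
        raise NotImplementedError("CASE B is outside this sketch")
    # Compare z_LS(j+1) with the stored output digit j for 0 <= j <= N-2.
    faulty = [j for j in range(len(a) - 1) if z_ls[j + 1] != z_bad[j]]
    fixed = list(z_bad)
    for j in faulty:
        fixed[j] = z_ls[j + 1]
    return fixed, faulty

a = [0, 0, 0, 1, 0, 0, 0, 0]
b = [0] * 8
fixed, faulty = recover_case_a(a, b, k=4)
```

For these operands the fault is activated in the first computation and masked in the shifted one, so a single inequality appears at the faulty digit and the shifted result supplies the correct value for that position.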
For clarity of exposition, the algorithm does not report the border cases in which the inequality of the results affects the digit in position 6. In this case, the wrong digits could be only 6 (type 1 fault), or 6, 7 (type 2), or 6, 7, 8 (type 3). This case requires a further diagnosis step which uses the RSIs.

The graceful degradation capabilities of the adder are related to two main aspects of this algorithm: First of all, the algorithm always allows us to detect and localize the fault; in many cases, the correct output can be obtained by accepting a performance degradation due to the time needed to repeat the operation with LSI and (if needed) RSI, as shown in Fig. 2. Moreover, even in those cases when it is not possible to obtain the correct output from $Z_{RS}$ and $Z_{LS}$, the fault localization allows us to use the adder with a reduced dynamic range. In fact, assuming that $n_{fd} \in \{1, 2, 3\}$ faulty digits are detected in an eight-digit adder, it can still be used as an $(8 - n_{fd})$-digit adder by applying suitable modifications to the input vectors $A$ and $B$. For instance, if the digits $\{3, 4, 5\}$ of the adder are faulty, the five-digit input vectors mapped onto the eight-digit inputs should be set up as

$$A = \{a(4), a(3), a(2), 0, 0, a(2), a(1), a(0)\}$$

and

$$B = \{b(4), b(3), b(2), 0, 0, b(2), b(1), b(0)\},$$

while the output with reduced dynamic range is

$$Z = \{z(5), z(4), z(3), -, -, -, z(2), z(1), z(0)\}.$$

The reported example is related to a type 3 fault in the $i = 3$ digit. The digits $\{0, \ldots, 2\}$ and $\{6, \ldots, 8\}$ are not affected by the fault, while, in order to correctly compute the $z(3)$ output, the digits $a(2)$ and $b(2)$ are repeated in position 5 to provide the correct carry to the ADD1 block in position 6.
CONCLUSIONS
This paper proposes a methodology for developing fault-tolerant adders by using the SD number representation. A self-checking implementation of the SD adder is illustrated and the algorithms to achieve error correction, fault localization, and graceful degradation are proposed. The main idea is to take advantage of the confined carry propagation in SD adders. This characteristic has been used to perform an error propagation analysis and to set up localization and correction procedures. The area overhead introduced for the detection of faults in the SD adder (checker) is about 60 percent. Regarding the error correction and graceful degradation capabilities, the proposed algorithm localizes the faulty digit(s) by means of the recomputation with shifted operands method. After fault localization, two procedures have been proposed to provide graceful degradation of the system. The first one performs two different shift operations and uses the intersection of the obtained results to recover the correct output, while the second one is based on a reduced dynamic range approach, which allows us to obtain the result in only one step, but with fewer output digits.
Marco Re received the Laurea degree in electronic engineering from the University of Rome "La Sapienza" in 1991 and the PhD degree in microelectronics and telecommunications engineering from the University of Rome "Tor Vergata" in 1996. In 1998, he joined the Department of Electronic Engineering of the University of Rome "Tor Vergata" as a researcher. He was awarded two one-year NATO fellowships with the University of California at Berkeley in 1997 and 1998. His main interests and activities are in the area of DSP algorithms, fast DSP architectures, fuzzy logic hardware architectures, hardware-software codesign, number theory with particular emphasis on residue number system, computer arithmetic, and CAD tools for DSP, fault-tolerant, and self-checking circuits. He has authored or coauthored more than 80 papers. He is a member of the IEEE.
Adelio Salsano is currently a full professor of microelectronics at the University of Rome, "Tor Vergata," where he teaches courses on microelectronics and electronic programmable systems. His present research work focuses on techniques for the design of VLSI circuits, considering both CAD problems and the architectures for ASIC design. In particular, of relevant interest are the research activities on fault-tolerant/fail-safe systems for critical environments such as space, automotive, etc., on low-power systems considering the circuit and architectural points of view, and on fuzzy and neural systems for pattern recognition. An international patent and more than 90 papers in international journals or presented at international meetings are the results of his research activity. At present, he is the president of a national consortium named U.L.I.S.S.E., among 10 universities, three polytechnics, and several of the biggest national industries, such as STMicroelectronics, ESAOTE, FINMECCANICA. He is responsible for contracts with the ASI, Italian Space Agency, for the evaluation and use in the space environment of COTS circuits and for the definition of new suitable architectures for space applications. He is also involved in professional activities in the field of information technology and is also a consultant for many public authorities for specific problems. In particular, he is a consultant for the Departments of Research and of Industry, of IMI, and of other authorities for the evaluation of industrial public and private research projects. Professor Salsano was a member of the consulting Committee for Engineering Sciences of the CNR (National Research Council) from 1981 to 1994 and participated in the design of public research programs in the fields of "Telematics," "Telemedicine," "Office Automation," "Telecommunication," and, recently, "Microelectronics and Bioelectronics." He is a member of the IEEE.
