Abstract-This paper proposes a low-cost fault-tolerant Carry Look-Ahead (CLA) adder with much lower power and area overheads than other fault-tolerant CLA adders. Analytical and experimental results show that this adder corrects all single-bit and multiple-bit transient faults. The Power-Delay Product (PDP) and area overheads of this technique are at least 82% and 71% lower, respectively, than those of adders which use traditional TMR, parity prediction, and duplication techniques.
I. INTRODUCTION
Embedded processors are widely used in various safety-critical systems [1], such as X-by-wire applications, in which failures could endanger human life or property [2]. These processors and VLSI circuits are very susceptible to transient faults [3], which are mostly caused by alpha particles and neutrons [4]. Particle strikes in a sensitive region of combinational circuits may lead to Single-Event Transient (SET) [5] or Multi-bit Event Transient (MET) [6, 9, 19] errors. The occurrence probabilities of these errors are very high in modern processors. On-line error detection techniques are widely used to detect transient faults in VLSI circuits [7].
Adders are an essential part of data processing systems in safety-critical applications [11, 12, 13, 15]. They are commonly found in the critical path of arithmetic and logical units (ALUs) and address generation units. Therefore, the design of adder structures with on-line error detection and correction capabilities is an important research topic [3, 9].
The Carry Look-Ahead (CLA) adder is one of the high-speed adders, alongside the Carry Select Adder (CSA), the carry-skip adder and some implementations of parallel-prefix adders [12, 14, 15, 16].
So far, few error detection techniques have been proposed specifically for the CLA adder. Existing techniques usually use parity prediction in combination with a two-rail code [17, 18]. In [8], the two-rail code is used for the carries and parity prediction is used for the outputs. Both the parity prediction and the two-rail code checkers decrease the performance drastically, because both need XOR-tree structures. In [17], the parity code is used to detect errors in the input operands and the two-rail code is used for output error detection. In addition to these techniques, duplication with comparison [17] can also be used. This technique has about 100% area and power overheads. All of the mentioned techniques only detect errors and simply use re-execution, the most popular and simplest correction technique. Re-execution cannot correct permanent faults, and it roughly doubles the delay. Furthermore, many single-bit errors cannot be detected in CLA adders that are checked by arithmetic codes [16].
An alternative to re-execution [10] is the TMR technique, a traditional technique for coping with errors in all circuits, including adders. This technique has high power consumption and area overheads; thus, a fault-tolerant adder is achieved at the cost of a large increase in the PDP and the area.
In this paper, a low-cost fault tolerance technique for CLA adders is proposed. The proposed technique detects or corrects both permanent and transient single-bit errors. The CLA adder protected by the proposed technique is compared with the conventional Triple Modular Redundancy (TMR) and time redundancy techniques as correction techniques, and with parity prediction and duplication as detection techniques. The Power-Delay Product (PDP) and area overheads of the proposed technique are compared with those of efficient error detection or correction techniques. The proposed technique achieves 82% and 71% improvements in PDP and area overheads, respectively.
The organization of this paper is as follows. Section II introduces the basic CLA adder. Section III proposes the new fault-tolerant technique. Section IV presents the analysis and experimental results. Section V concludes the paper.
II. THE BASIC CLA ADDER
The main idea behind the CLA adder is to generate all incoming carries in parallel, which increases the performance. The CLA adder consists of three components: the P/G generator, the CLAU, and the Sum generator (Fig. 1).
The first unit, the P/G generator, generates $P_i$ and $G_i$ simultaneously using Relations (1) and (2).
The second unit, the Carry Look-Ahead Unit (CLAU), generates the carries in parallel using Relation (3). The third unit, the Sum generator, calculates the sum outputs of the addition using Relation (4). Considering Fig. 1, the delay of the CLA adder is estimated by (5). In (5), $\Delta$ denotes the total delay of the CLA adder, $\Delta_{P/G}$ denotes the delay of the P/G generator unit, $\Delta_{C}$ is the delay of the CLAU, and $\Delta_{sum}$ is the delay of the Sum generator unit.
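For reference, the relations cited above can be written in their textbook form. This is a sketch which assumes that Relations (1)-(5) follow the standard CLA formulation; the operand bits $A_i$, $B_i$ and the carry-in $C_0$ are symbol names assumed here, $\Delta_{P/G}$, $\Delta_{C}$ and $\Delta_{sum}$ are the delays of the P/G generator, the CLAU and the Sum generator, and the additive form of (5) is inferred from the three units being traversed in sequence.

```latex
\begin{align}
P_i &= A_i \oplus B_i \tag{1} \\
G_i &= A_i \cdot B_i \tag{2} \\
C_{i+1} &= G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \dots + P_i P_{i-1} \cdots P_0 C_0 \tag{3} \\
S_i &= P_i \oplus C_i \tag{4} \\
\Delta &= \Delta_{P/G} + \Delta_{C} + \Delta_{sum} \tag{5}
\end{align}
```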
The values $\Delta_{P/G} = \max\{\Delta_{xor}, \Delta_{and}\}$ and $\Delta_{sum} = \Delta_{xor}$ are identical in any n-bit CLA adder. Relation (3) shows that increasing the bit-width of the input operands does not affect the delay of the CLAU, because this delay depends entirely on its implementation. Relation (3) also shows that each carry can be implemented by two-level logic. This means that the delay of every carry is identical and equal to $\Delta_{C} = \Delta_{and} + \Delta_{or}$. High fan-in is the major drawback of this type of implementation. As a result, designers use a hierarchical implementation of the CLAU to cope with the high fan-in of the one-level implementation. The hierarchical implementation increases the delay and area of the CLA adder. A CLAU designed using a two-level hierarchical implementation is shown in Fig. 2. Both the L1 and L2 CLAUs generate carries based on (3). Two extra signals, called Block-Generate (BG) and Block-Propagate (BP), are generated in the L1 CLAUs. The $BP_i$ and $BG_i$ signals are derived from (6) and (7). In (6) and (7), Mn and Mx denote the LSB and MSB of the Ps and Gs entering the ith unit, respectively.
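The difference between the one-level and the two-level hierarchical CLAU can be illustrated with a small behavioural model. The sketch below is not the authors' implementation: it assumes the textbook forms $P_i = A_i \oplus B_i$, $G_i = A_i \cdot B_i$, $C_{i+1} = G_i + P_i C_i$ and $S_i = P_i \oplus C_i$, takes BP as the AND of the block's P signals and BG as the block's carry-out with a zero carry-in, and uses a block size of 4. All function names are hypothetical.

```python
from typing import List, Tuple


def pg_generate(a: int, b: int, n: int) -> Tuple[List[int], List[int]]:
    """P/G generator: P_i = A_i XOR B_i, G_i = A_i AND B_i (assumed forms)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    return p, g


def flat_carries(p: List[int], g: List[int], c0: int) -> List[int]:
    """One-level CLAU: every carry is a two-level sum of products."""
    carries = [c0]
    for i in range(len(p)):
        terms = []
        for j in range(i, -1, -1):
            prod = g[j]                      # term G_j * P_{j+1} * ... * P_i
            for k in range(j + 1, i + 1):
                prod &= p[k]
            terms.append(prod)
        prod = c0                            # term C_0 * P_0 * ... * P_i
        for k in range(i + 1):
            prod &= p[k]
        terms.append(prod)
        carries.append(int(any(terms)))      # OR of all AND terms
    return carries


def hierarchical_carries(p: List[int], g: List[int], c0: int,
                         block: int = 4) -> List[int]:
    """Two-level CLAU: L1 units produce BP/BG, the L2 unit produces block carry-ins."""
    n = len(p)
    blocks = [(lo, min(lo + block, n)) for lo in range(0, n, block)]
    bp = [int(all(p[lo:hi])) for lo, hi in blocks]
    # BG is the block carry-out when the block carry-in is 0.
    bg = [flat_carries(p[lo:hi], g[lo:hi], 0)[-1] for lo, hi in blocks]
    block_cin = flat_carries(bp, bg, c0)     # L2 applies the same carry relation
    carries = [c0]
    for (lo, hi), cin in zip(blocks, block_cin):
        carries.extend(flat_carries(p[lo:hi], g[lo:hi], cin)[1:])
    return carries


def cla_add(a: int, b: int, n: int, c0: int = 0,
            hierarchical: bool = False) -> Tuple[int, int]:
    """Return (n-bit sum, carry-out) of a + b + c0."""
    p, g = pg_generate(a, b, n)
    c = hierarchical_carries(p, g, c0) if hierarchical else flat_carries(p, g, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(n))   # S_i = P_i XOR C_i
    return s, c[n]
```

Both variants produce the same carries; the model only captures function, not the fan-in or gate-delay differences discussed above.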
III. THE LOW-COST TECHNIQUE FOR ERROR DETECTION OR CORRECTION IN THE CLA
The major area of the basic CLA adder is occupied by the CLAU. Therefore, applying fault-tolerant techniques without considering the importance of the CLAU results in high overheads. This paper proposes a new fault-tolerant technique which concentrates on the CLAU (in either its one-level or hierarchical implementation).
The proposed method modifies Relation (3) to generate more than one adjacent carry signal simultaneously. For example, using Relation (8), each carry generator logic also generates one preceding carry signal. In (8), $C_{i-1}$ is constructed based on (3).
Fig. 3 shows the structure of the modified carry generator logic, based on (8), which generates one preceding carry signal. This modification adds two gate levels to the carry generator logic. This technique is called Low-Cost Error Detection (LCED). One redundant carry signal provides only error detection, whereas more than one redundant carry provides error correction, which is called Low-Cost Error Correction (LCEC). For example, in (10) two redundant carries are generated; the modified carry generator logic is shown in Fig. 4.
Generation of two redundant carries results in single-bit error correction. If three or more redundant carries are extracted, multi-bit errors can be corrected.
Fig. 4. LCEC carry generator logic to obtain two extra carries simultaneously
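The detection idea can be illustrated with a behavioural model. The sketch below rests on several assumptions that are not stated in the text: the modified relation is taken to be $C_i = G_{i-1} + P_{i-1} C_{i-1}$, the redundant copy of $C_{i-1}$ produced by generator $i$ is compared with the output of generator $i-1$, and a fault is modelled as the inversion of one generator output. The function names and the fault model are hypothetical.

```python
from typing import List, Optional, Tuple


def carry(p: List[int], g: List[int], c0: int, i: int) -> int:
    """Return C_i by unrolling C_{k+1} = G_k OR (P_k AND C_k).

    Functionally equivalent to the expanded look-ahead form; a hardware CLAU
    would evaluate it as two-level logic rather than as a loop.
    """
    c = c0
    for k in range(i):
        c = g[k] | (p[k] & c)
    return c


def lced_generator(p: List[int], g: List[int], c0: int, i: int,
                   flip: Optional[int] = None) -> Tuple[int, int]:
    """Carry generator logic i (1 <= i <= n): returns (copy of C_{i-1}, C_i).

    flip=0 or flip=1 inverts that output to model a single-bit fault
    (hypothetical fault model, not taken from the paper).
    """
    prev = carry(p, g, c0, i - 1)
    cur = g[i - 1] | (p[i - 1] & prev)   # assumed shape of the modified relation
    out = [prev, cur]
    if flip is not None:
        out[flip] ^= 1
    return out[0], out[1]


def lced_check(p: List[int], g: List[int], c0: int,
               fault: Optional[Tuple[int, int]] = None) -> Tuple[List[int], bool]:
    """Return ([C_1..C_n], error_flag); fault = (generator index, output index)."""
    n = len(p)
    outs = {}
    for i in range(1, n + 1):
        flip = fault[1] if fault is not None and fault[0] == i else None
        outs[i] = lced_generator(p, g, c0, i, flip)
    # Generator 1 holds a copy of the primary carry-in C_0.
    error = outs[1][0] != c0
    # Every other redundant copy is compared with the neighbour's own output.
    for i in range(2, n + 1):
        if outs[i][0] != outs[i - 1][1]:
            error = True
    return [outs[i][1] for i in range(1, n + 1)], error
```

In this simplified model the last carry $C_n$ has no second copy, so an inversion of that single output goes unnoticed; how the final carry is covered in the actual design is not modelled here.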
Relation (11) shows the gate-level increase in the CLAU; in this relation, D is the number of added gate levels, $N_{ec}$ denotes the number of extra carries which are extracted, and L is the hierarchical depth of the CLAU.
Using (11), one redundant carry results in an increase of eight gate levels in a two-level hierarchical implementation of the CLAU. Increasing the number of gate levels increases the delay of the CLA adder according to Relation (12). Here, $\Delta_{HC}$ denotes the delay of the hierarchical CLAU, L is the number of hierarchy levels of the CLAU, and $\Delta_{or}$ and $\Delta_{and}$ are the delays of the OR gate and the AND gate, respectively.
So far, the main concepts of the LCED and LCEC techniques have been described. Although the CLAU needs more attention than the other units of a CLA adder, the effects of the other units on the reliability of the CLA adder cannot be ignored. Therefore, all units must be considered in the design. Fig. 5 shows the structure of the CLA adder with the Low-Cost Error Detection technique (CLA-LCED). It uses the LCED technique in its CLAU and duplication in the Sum and P/G generator units, which is inevitable and is discussed later in Section IV.
Fig. 6 shows both hierarchical and non-hierarchical implementations of the CLAU with the LCED technique. Fig. 6.a depicts a two-level hierarchical LCED-CLAU. The internal structure of the level-1 LCED-CLAU is depicted in Fig. 6.a. The structure of the level-2 LCED-CLAU is analogous to the structure shown in Fig. 6.b. Because fault detection alone is not sufficient for a reliable design, it should be followed by a correction mechanism. In general, re-execution is one of the best choices for coping with transient errors. Although re-execution may be useful for transient errors, it is not effective against permanent errors. Another design method for fault-tolerant circuits is to apply an error correction technique, e.g. TMR, in order to recover them from errors.
TMR is commonly used to mask errors which occur in circuits. TMR can mask all single-bit transient and permanent faults. The LCEC technique explained above is a low-cost technique which can be used in the CLAU. As in the CLA-LCED, redundant Sum and P/G generator units have to be used in order to obtain a fault-secure CLA adder with error correction capability. The proposed structure is called the Carry Look-Ahead adder with Low-cost Error Correction (CLA-LCEC).
The error correction ability of the LCEC technique is achieved by setting $N_{ec}$ greater than or equal to two. Single-bit error correction is achieved with $N_{ec} = 2$, which adds four gate levels to the carry generator logics. The complete structure of the CLA-LCEC with $N_{ec} = 2$ is depicted in Fig. 7.
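With two redundant carries, every carry is available three times, so a single erroneous copy can be outvoted. The sketch below is a minimal behavioural illustration under stated assumptions: the three copies of $C_k$ are taken to come from the generator logics of $C_k$, $C_{k+1}$ and $C_{k+2}$, the copies are resolved by a bitwise majority vote (the voting mechanism is an assumption, not a detail given in the text), and a fault inverts exactly one copy. Function names are hypothetical, and the handling of the topmost carries is not modelled.

```python
from typing import List, Optional, Tuple


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def carry(p: List[int], g: List[int], c0: int, i: int) -> int:
    """Return C_i by unrolling C_{k+1} = G_k OR (P_k AND C_k)."""
    c = c0
    for k in range(i):
        c = g[k] | (p[k] & c)
    return c


def lcec_carries(p: List[int], g: List[int], c0: int,
                 fault: Optional[Tuple[int, int]] = None) -> List[int]:
    """Return the voted carries [C_1..C_n].

    fault = (k, copy) inverts copy number `copy` (0..2) of carry C_k,
    modelling a single-bit error (hypothetical fault model).
    """
    voted = []
    for k in range(1, len(p) + 1):
        # Three copies of C_k; in the LCEC picture they would be produced by
        # three neighbouring carry generator logics (assumption).
        copies = [carry(p, g, c0, k) for _ in range(3)]
        if fault is not None and fault[0] == k:
            copies[fault[1]] ^= 1
        voted.append(majority(copies[0], copies[1], copies[2]))
    return voted
```

Because at most one of the three copies is wrong under this fault model, the voted carries equal the fault-free carries for every single injected error.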
As shown in Fig. 7 and discussed in Section IV, the P/G and Sum generator units are triplicated. In addition, in Fig. 7, the CLAU is modified by the LCEC technique. The CLA-LCEC may seem similar to the CLA-LCED, but they differ slightly in their structures; for example, the CLA-LCED has a lower delay than the CLA-LCEC. As with the CLA-LCED, the implementation of the CLAU in the CLA-LCEC does not affect its reliability. Both hierarchical and non-hierarchical implementations of the CLAU in the CLA-LCEC are depicted in Fig. 8. Fig. 8.a shows the hierarchical implementation of the LCEC-CLAU. It can be seen that the architecture proposed in Fig. 8.a is similar to the architecture proposed in Fig. 6.a.
IV. ANALYSIS AND RESULTS
This section describes the effects of using the LCED and LCEC techniques on the CLA adder. The parameters discussed are fault tolerance, performance, power consumption, and area. Considering the architecture of the CLA-LCED, which is proposed and described in Section III, the error detection coverage of the proposed technique is 100%. This can be proven by observing the behavior of the CLA-LCED adder in the presence of single-bit errors, according to the following theorems:
Theorem 1: Single-bit errors occurring in the P/G generator unit of the CLA-LCED adder can be detected.
Proof: A single-bit error in the P/G generator unit of the CLA-LCED adder results in one erroneous bit in the $P_0$, $G_0$, $P_1$, or $G_1$ signals. Considering Figures 10 and 11, it can be seen that none of these pairs are used continuously. This means that single-bit errors in $P_0$ or $G_0$ affect the LCED carry generator logics. Therefore, this error can only have an effect on $C_1$.
Theorems 1, 2 and 3 show that the CLA-LCED adder is completely fault secure and can detect any single-bit error which occurs in the circuit. These theorems cover both permanent and transient errors. The behavior of this adder in the case of multi-bit errors depends entirely on where the errors occur.
Using the LCEC technique in the CLA adder results in 100% error correction, as described in Section V. A high error detection or correction capability has indisputable effects on the other parameters of the system.
The effects of using the LCED technique for error detection in the CLA adder are compared with those of parity prediction and duplication. These techniques are chosen for two reasons: first, both have 100% error detection, like the LCED technique; second, they are the more popular error detection techniques. Fig. 9 shows the effects of applying the error detection techniques to the CLA adder. As can be seen, the power consumption and area overheads of the LCED technique are much lower than those of the other error detection techniques. The results also show that the average power consumption of the LCED technique is about 94.39% and 53.20% lower than that of the duplication and parity techniques, respectively. A notable decrease can also be seen in the area overheads of the LCED technique in comparison with duplication (91.10%) and parity (91.95%).
Fig. 9 also depicts the effects of the error detection techniques on the performance of the CLA adder. As can be seen, both the duplication and LCED techniques have much lower delay overheads than the parity prediction technique. Duplication has a lower delay overhead than the LCED technique. Table 1 shows the delay overhead differences between the LCED and duplication techniques. It shows that the maximum difference in delay overhead is 3.547%, which belongs to the 4-bit CLA adder. Moreover, an increase in bit-width results in a noticeable decrease in delay overhead. Consequently, for 32-bit and 64-bit CLA adders, which are the ones mostly used in processors and DSPs, the delay overhead drops to as low as 0.15%, which is negligible.
Although the delay overhead of LCED is higher than that of duplication (by up to 3.54%), the Power-Delay Product (PDP) of LCED is much better than that of duplication and of parity prediction. Fig. 10 shows the PDP reduction of LCED in comparison with parity prediction and duplication.
This figure shows that the PDP reduction of the LCED error detection technique is at least 43.48%, while that of the parity prediction is 76.68%. The LCED technique is therefore beneficial in comparison with parity prediction and duplication. In Section III, it was stated that the proposed method can be configured as an error correction technique (LCEC). This error correction technique, which uses two extra redundant carries within the architecture and is discussed in Section III, has been implemented. The LCEC technique is compared with Triple Modular Redundancy (TMR), which is almost the only error correction technique that can be applied to the CLA adder. Table 2 shows the effects of applying the LCEC and conventional TMR techniques to the CLA adder. As Table 2 shows, LCEC behaves like the LCED technique: it is more beneficial in terms of power consumption, area, and PDP. Concerning the delay overheads, LCEC increases the delay more than conventional TMR, but this increase is negligible for the following reasons:
1- The maximum difference in delay overhead between the conventional TMR and LCEC error correction techniques is 5.09%, and the delay overhead decreases as the bit-width increases.
2- A decrease of more than 175.32% in PDP may compensate for the negligible decrease in performance.
These results lead to the conclusion that the LCED and LCEC techniques are beneficial for building a fault-tolerant carry look-ahead adder with minimum cost in terms of PDP and area overhead.
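For reference, the metrics compared in this section can be written as follows. The PDP is the product of average power and delay; the overhead expression is the usual relative definition and is only assumed to match the way the reported percentages were computed. The symbols $P_{avg}$, $X_T$ and $X_{base}$ are introduced here for illustration, where $X$ stands for power, area, delay or PDP, $T$ is the protected adder and $base$ is the unprotected CLA adder.

```latex
\begin{align*}
\mathrm{PDP} &= P_{avg} \cdot \Delta \\
\mathrm{Overhead}_X(T) &= \frac{X_T - X_{base}}{X_{base}} \times 100\%
\end{align*}
```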
V. CONCLUSION
Conventional adders suffer from the carry propagation problem, which has a strong effect on performance and also on fault tolerance. The CLA adder produces carries in parallel and is among the high-speed adders. In this paper, a new low-cost fault-tolerant technique is proposed which can be applied either as an error correction technique (LCEC) or as an error detection technique (LCED). Both LCED and LCEC focus on the CLAU to reduce the power consumption and area overheads. The LCED-CLA decreases the PDP by up to 180% in comparison with parity prediction and duplication. The LCEC-CLA reduces the PDP by up to 160% in comparison with traditional TMR.
