Online repair through reconfiguration is a particularly advantageous approach in the nanoelectronic environment since reconfigurability is naturally supported by the devices. However, precise identification of faulty locations is of critical importance for fine-grain repairs.
INTRODUCTION
As CMOS is facing its physical limits in shrinking below today's 90nm scale, nano devices have been actively researched and proposed as the basis for the next generation electronic systems [1, 2]. However, due to their small scale, nanoelectronics are projected to be highly unreliable [2, 3]. Consequently, aggressive fault tolerance schemes are necessary.
One of the significant advantages of nanoelectronic systems is the abundant hardware resources, which support not only massive parallelism and high performance, but also the use of hardware redundancy for fault tolerance purposes. Typical hardware redundancy based fault tolerance schemes include fault masking and online repair schemes.
Fault masking approaches require multiple redundant copies of a computation and tolerate online faults through a majority vote. It has been shown that simple fault-masking schemes, such as NMR and N-MUX, need immense hardware redundancy to tolerate the large fault rates in a nanoelectronic system [4, 5].
Online repair schemes, on the other hand, depend on the online detection and identification of a fault, leading to a subsequent repair process to ensure correctness. In comparison to the fault masking schemes, online repair schemes require significantly less hardware overhead [4], especially for regular systems where replacement of faulty components can be performed with common backup units [6, 7]. In fact, as bottom-up fabrication, the expected mode of future nano circuit assembly, can only produce regular structures [8, 9], a reconfiguration capability is necessary in any nanoelectronic system for the purpose of functionality formation in a post-fabrication stage, thus supporting online repair as well [10].
The work of the first two authors is supported in part by NSF Grant 0082325.
The repair process can be performed either after online detection of a fault or after a field test carried out once in a while in the system. Such online repair schemes, however, hinge on the fault identification granularity. If a fault can be precisely localized within a small component, then the repair procedure can replace the small faulty component. A coarse-grain fault identification results in the replacement of a big chunk of hardware. Therefore, fine-grain fault identification is crucial to the effectiveness of online repair based fault tolerance.
It is well known that an adder constitutes the fundamental arithmetic building block. Since fault occurrence in the current CMOS systems is extremely rare, existing fault tolerance schemes for adders can typically afford to utilize coding based approaches or fault masking approaches [11, 12]. Related previous work also includes online detection of faults in adders [13, 14, 15, 16, 17]; yet online identification of the faulty locations has not been tackled in CMOS adders.
Since the fault rate in the nanoelectronic environment is dramatically higher, conventional fault tolerance schemes either fail to provide sufficient fault tolerance, or consume prohibitively expensive resources for doing so. Based on the reconfigurability of nanoelectronics, online repair based schemes can provide powerful fault tolerance capabilities with less hardware overhead, thus positioning them as a promising choice in fault tolerant adder design.
In adder design, the various candidates provide multiple tradeoff points between the main optimization criteria of area and delay. The nanoelectronic environment introduces new characteristics, such as abundant hardware and pervasive unreliability, so the existing adder design candidates need to be re-evaluated under this new set of challenges and benefits.
Because of the abundant hardware and massive parallelism supported by nano devices, the area constraint diminishes in relative importance in the nanoelectronic environment. Therefore, parallel adders, which provide optimal latency, are promising in that the carries are computed in parallel so as to avoid the performance loss due to carry rippling. Among the parallel adders, carry lookahead adders (CLA) show particular advantages for the nanoelectronic environment owing to their regular structure, which not only fits the regular structure based nanofabric, but also supports efficient online repair based fault tolerance. Since a CLA is constructed from regular blocks with inputs disjoint across the blocks, precise component-level fault identification within the adder can be approached with low hardware overhead. An aliasing analysis is provided at the end of this paper for the proposed fault identification scheme. Through the analysis of the repair hardware overhead and the fault coverage loss ratio caused by fault identification aliasing in the proposed scheme, it is shown that high fault identification resolution can be achieved at a significantly reduced amount of repair hardware with small fault coverage loss.
Figure 1: Hierarchical implementation of a 64-bit CLA
THE CARRY LOOKAHEAD ADDER
In order to avoid the delay of the rippling carries, a carry-lookahead adder (CLA) [18] uses the carry-in bit and the generate (g) and propagate (p) signals to predict the most significant carry bits. Since the prediction hardware of carries at the most significant bits becomes exceedingly expensive when the width of a CLA is large, the CLA is typically organized hierarchically with multiple levels. Figure 1 shows a typical 64-bit hierarchical CLA composed of 3 levels. The g,p signals of various blocks are generated hierarchically: initially, at the lowest level, the g,p signals of each bit are calculated; then, at the second level, the g,p signals of the 4-bit blocks are generated using the one-bit g,p signals; finally, the g,p signals of the 16-bit block are generated using the g,p signals of the four 4-bit blocks.
The carry bits are also calculated in 3 stages from the lookahead carry generators (LCG). Initially, the highest level LCG uses the lowest carry and the 4 highest level block g,p signals to generate the carry-in bits, $c_{16}$, $c_{32}$ and $c_{48}$, for the lower level LCGs. Then these carry-in bits and $c_0$ are used by the lower level LCGs to further generate 12 carry-in bits for the lowest level LCGs. In the third stage, the 16 LCGs at the lowest level produce the remaining 48 carry-in signals, thus completing the carry generation for all the bits.
As is shown in Figure 2, a 4-bit LCG takes as inputs the carry-in of the least significant bit and the g,p signals generated from 4 lower-level LCG blocks.
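The three-stage carry generation described above can be sketched in software as follows. This is a minimal behavioural model, not the circuit itself: the function names (`group_gp`, `lcg_carries`, `cla64`) are hypothetical, the propagate signal is assumed to be the XOR of the operand bits, and the carries inside one LCG are written as a short recurrence for brevity, whereas the hardware would evaluate the equivalent flattened lookahead expressions.

```python
def group_gp(g, p):
    """Combine four g,p pairs (least significant first) into one block-level pair."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def lcg_carries(c_in, g, p):
    """Carry-ins of the four sub-blocks served by one LCG (the first is c_in)."""
    c = [c_in]
    for i in range(3):
        c.append(g[i] | (p[i] & c[i]))
    return c


def reduce_level(g, p):
    """Build the g,p signals of the next level up from groups of four."""
    pairs = [group_gp(g[i:i + 4], p[i:i + 4]) for i in range(0, len(g), 4)]
    return [G for G, _ in pairs], [P for _, P in pairs]


def expand_level(carries_in, g, p):
    """Let each LCG of one level expand its carry-in into four carries."""
    out = []
    for k, c in enumerate(carries_in):
        out.extend(lcg_carries(c, g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]))
    return out


def cla64(a, b, c0=0):
    """Add two 64-bit integers with a 3-level hierarchical CLA model."""
    g1 = [(a >> i) & (b >> i) & 1 for i in range(64)]    # per-bit generate
    p1 = [((a >> i) ^ (b >> i)) & 1 for i in range(64)]  # per-bit propagate
    g4, p4 = reduce_level(g1, p1)      # 16 blocks of 4 bits
    g16, p16 = reduce_level(g4, p4)    # 4 blocks of 16 bits
    c_top = lcg_carries(c0, g16, p16)  # c0, c16, c32, c48
    c_mid = expand_level(c_top, g4, p4)   # carry-ins of the 16 lowest LCGs
    c_bit = expand_level(c_mid, g1, p1)   # carry into every bit position
    total = 0
    for i in range(64):
        total |= (p1[i] ^ c_bit[i]) << i
    G, P = group_gp(g16, p16)
    return total, G | (P & c0)
```

For example, `cla64(2**64 - 1, 1)` yields a zero sum with a carry-out of 1, matching ordinary 64-bit addition.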
FAULT IDENTIFICATION IN CLA
Fault identification relies on the differentiability among the manifestations of various faults, known as the fault syndromes. When distinct faulty components produce identical fault syndromes, fault aliasing occurs. The result is an ambiguity set, and all the possibly faulty components in that set must be replaced. Therefore, fault aliasing generally leads to extra hardware overhead in the repair process. Under a multiple fault occurrence scenario, aliasing may even lead to incorrect identification of faulty components, resulting in loss of fault coverage.
A CLA contains a number of LCGs as its constituent components. All the LCGs are identical in their internal structure. Owing to their regularity, reconfiguration can be easily supported by replacing faulty LCGs with spare ones. Their regularity also helps construct low-cost, high-resolution fault identification schemes.
As is shown in figure 2, an LCG is composed of a g,p signal generation block and a carry-bit generation block. For the carry generation blocks, the existence of inherent redundancy enables us to perform precise fault identification. For the g,p signal generation blocks, we take advantage of the characteristic that different blocks use disjoint input signals and utilize a REcomputing with Rotated Operands (RERO) scheme to identify faulty components.
Carry generation block fault identification
In a hierarchical CLA, the carry-in to an LCG block is generated by the higher level LCG for performance purposes. However, the same carry-in signal can also be generated through another path at the same level with a constant delay.
Consider the 4-bit LCG block shown in figure 3 . The three carry generation blocks, and , independently generate the carry-in bits and , for the lower level LCG numbered 4, 3 and 2, respectively. In fact, another copy of (indicated as in figure 3 ), can be generated with a small latency, from the carry-in of and the g,p signals of the lower LCG block 3. Specifically, is generated in the same way that is generated in the higher level LCG by using . Since two copies of the same signal are generated using disjoint hardware blocks, faults can be precisely identified by comparing these carry-bit pairs. Figure 4 shows the fault identification process in the carry generation blocks. For instance, carry bits and are generated using block . If block is faulty, both the carry bit pairs of and conflict. The conflicting results on both sets of computation implicate as the faulty component. The syndromes of all the faulty components as well as the combinations of multiple faulty components are shown in the syndrome in figure 4 .
It is worth noticing that all single-faults, double-faults and triple-faults can be precisely identified without aliasing since the corresponding syndromes are all distinct. Basically, each hardware block participates in a combination of comparisons, thus forming a participation signature vector. In the carry generation process, all the signature vectors of the hardware blocks are linearly independent. Specifically, one block participates in 1-out-of-3 comparisons and can be represented by a participation vector of (1,0,0). Similarly, the participation vectors of the other two blocks are (1,1,0) and (0,1,1). A fault syndrome observed in this case is a vector formed by the results of the three comparisons, which is mathematically the linear combination of the participation vectors of the faulty blocks. Since the participation vectors constitute a basis of the vector space, any fault syndrome vector can be uniquely represented by a linear combination of the participation vectors. Therefore, all the fault syndromes are distinct and aliasing is prevented inherently.
Proceedings of the Eleventh IEEE European Test Symposium (ETS'06)
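The distinctness of the syndromes can be checked by brute force. The sketch below uses the three participation vectors quoted in the text; the block labels `X`, `Y`, `Z` are placeholders, and the syndrome of several faulty blocks is assumed to be the bitwise sum modulo 2 of their vectors, which is one reading of "linear combination" for binary comparison outcomes.

```python
from itertools import combinations

# Participation vectors from the text; the block labels are placeholders.
PARTICIPATION = {"X": (1, 0, 0), "Y": (1, 1, 0), "Z": (0, 1, 1)}


def syndrome(faulty_blocks):
    """Mod-2 sum of the participation vectors of the faulty blocks."""
    s = (0, 0, 0)
    for blk in faulty_blocks:
        s = tuple(a ^ b for a, b in zip(s, PARTICIPATION[blk]))
    return s


def syndrome_table():
    """Syndromes of every single, double and triple fault combination."""
    table = {}
    for k in range(1, len(PARTICIPATION) + 1):
        for subset in combinations(PARTICIPATION, k):
            table[subset] = syndrome(subset)
    return table


table = syndrome_table()
for blocks, syn in table.items():
    print("+".join(blocks), syn)
```

Under this assumption the seven fault combinations map to the seven distinct non-zero 3-bit vectors, so a lookup table indexed by the syndrome identifies the faulty set.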
Overall, the above analysis shows how the inherent redundancy in CLA carry generation can be exploited to provide precise fault identification resolution. Since the redundant computations reuse the existing data paths in a CLA, it entails very little hardware overhead. The additional hardware for each LCG for fault identification in the carry generation part is basically a few exclusive-or gates plus the logic implementing the syndrome table. The hardware overhead can be traded-off for fault identification precision by implementing a reduced subset of the syndrome table for the cases when it is not necessary to consider multiple faults.
Fault identification in g,p blocks with RERO
Unlike the carry generation block, the g,p generation block in a CLA has no inherent redundancy. Therefore, redundancy needs to be explicitly added for fault identification. Similar to the well-known recompute with shifted operands (RESO) approach, recompute with rotated operands (RERO) is a time redundancy based technique to detect faults online [19, 13, 20].
Typically, RERO for CLA requires significant hardware even for online fault detection since it is constrained by the dependencies among the input bits due to the carry propagation [13] . In this paper we achieve the precise identification of the faulty blocks among the g,p generation components by exploiting the absence of dependencies among inputs of the g,p generation blocks.
When components are performing the same type of computation with independent inputs, the faulty component can be identified by recomputing using a different permutation of the inputs and comparing the results. A simple form of permutation is rotation. In a CLA, the g,p generation blocks at the same level use disjoint input signals; therefore, a rotated recomputation can be used to perform fault identification. In figure 5, we show two possible RERO scenarios. Numbers 1 to 4 correspond to computations while letters A to D denote the components. Each computation is carried out twice by distinct components at different time slots and the results are compared. The outcomes of the comparisons are the fault syndromes.
RERO using a 2-bit rotate is shown in figure 5a. Computation 2 is executed by component B in time slot t1 and is re-computed by component D in time slot t2. Similarly, computation 4 is executed by D in t1 and by B in t2. When component B is faulty, a conflict in both computations 2 and 4 can be seen to be the fault syndrome, which is identical to the syndrome of a faulty D. This results in a fault aliasing effect between B and D. Similar aliasing exists between the faults in A and C as well. In order to precisely identify faulty components under a single-fault assumption, in an n-bit computation, the rotation amount should be chosen such that no two components are assigned the same pair of computations; for four components this rules out the 2-bit rotate. As is shown in figure 5b, for RERO with 1-bit rotate, each faulty component exhibits a distinct syndrome, thus avoiding aliasing.
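The effect of the rotation amount on single-fault aliasing can be illustrated with a few lines of code. This is a sketch under stated assumptions: components and computations are numbered from 0, component k is assumed to run computation k in slot t1 and computation (k + r) mod n in slot t2, and the fault is assumed to manifest in both slots. The function names are hypothetical.

```python
def single_fault_syndromes(n, r):
    """Set of conflicting computations for each single faulty component."""
    return {k: frozenset({k, (k + r) % n}) for k in range(n)}


def aliased_pairs(n, r):
    """Pairs of components whose single-fault syndromes coincide."""
    syn = single_fault_syndromes(n, r)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if syn[i] == syn[j]]


print(aliased_pairs(4, 2))  # 2-position rotate among 4 components
print(aliased_pairs(4, 1))  # 1-position rotate among 4 components
```

With four components, the 2-position rotate pairs component 0 with 2 and component 1 with 3, matching the aliasing described for figure 5a, while the 1-position rotate produces no aliased pair.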
Generalizing RERO to hierarchical CLA's
If only a single level is considered, the selection of the rotation amount in RERO only needs to consider the aliasing criterion. However, in a hierarchical CLA, the rotation amount for RERO needs to consider the g,p blocks at multiple levels. During RERO the original and the duplicate computations need to use the same set of computations with different permutations over the components. A higher level g,p block takes inputs from 4 consecutive blocks at the lower level. Therefore, in order to guarantee that the same set of computations is applied at this higher level during RERO, the rotation amount should be chosen such that the same subset of consecutive inputs is not split across the two computations.
In a 3 level 64-bit CLA, the first level consists of 16 g,p generator blocks, each processing 4 bits from the input operands. The second level consists of 4 g,p generator blocks, each taking the output of 4 consecutive first level blocks. The third level has only 1 block, taking the output from the 4 blocks in the second level. RERO can identify a faulty component in the second level using a one-bit rotate. This in turn corresponds to RERO with a 4-bit rotate in the first level.
In general, in a hierarchical CLA, faulty blocks at all the levels except for the single block at the highest level can be precisely identified by setting the rotation amount top down starting from the second-to-highest level. Since a higher level g,p block uses 4 lower level g,p blocks, a 1-bit rotate at the second-to-highest level translates into an $n/16$-bit rotate at the lowest level, where $n$ is the number of bits in the operands for the CLA.
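The constraint that a rotation must not split a group of 4 consecutive lower-level blocks can be checked mechanically. The following sketch, with hypothetical function names, rotates the assignment of 16 first-level blocks and tests whether every higher-level block still sees one of the original groups of four.

```python
def groups_after_rotation(num_blocks, rotate, group_size=4):
    """Groups of lower-level computations seen by each higher-level block."""
    rotated = [(i + rotate) % num_blocks for i in range(num_blocks)]
    return [frozenset(rotated[i:i + group_size])
            for i in range(0, num_blocks, group_size)]


def preserves_groups(num_blocks, rotate, group_size=4):
    """True if the rotation maps every original group onto an original group."""
    original = set(groups_after_rotation(num_blocks, 0, group_size))
    return set(groups_after_rotation(num_blocks, rotate, group_size)) == original


valid = [r for r in range(16) if preserves_groups(16, r)]
print(valid)
```

For the 16 first-level blocks of the 64-bit CLA, only rotation amounts that are multiples of 4 keep the groups intact, which is consistent with pairing a one-position rotate at the second level with a 4-position rotate at the first level.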
Half fault manifestation & field test vector selection
Faulty units can be precisely identified by RERO when a faulty component manifests its fault in both the computation and the recomputation. However, since a component executes the two different computations in two different time slots, it is very likely that the fault manifests in only one of the two computations. We denote such a partial fault manifestation as Half Fault Manifestation (HFM). Typically, the manifestation of a faulty component depends on the input to the computation. At run time, transient faults and single event upsets can also result in HFM's. HFM precludes precise fault identification due to the aliasing of fault syndromes. Figure 6 shows aliasing due to half fault manifestation.
During online fault identification, the HFM caused by different inputs of the two computations leads to an indistinguishable fault syndrome between the two possibly faulty components. Therefore, the subsequent repair process needs to use double the amount of backup hardware to replace both faulty component candidates. During field test, however, test vectors are pre-generated and stored to be applied once in a while during system execution, to identify faulty components and perform repair. Therefore, when RERO is used in field test, test vectors can be selected in such a way that HFM's are avoided.
Since the highest level of a CLA is composed of a single block only, RERO is neither applicable nor necessary for the fault identification of this specific block. We recommend the use of simple fault masking schemes for the fault tolerance of this block.
For an -bit RERO on an -bit computation, the input bits should be partitioned into equal-sized groups of size . Field test should guarantee the application of identical inputs for every group for each test vector. Within each group, test vectors can be generated deterministically or pseudo-randomly using BIST. Applying RERO during field test avoids storing the test responses. This also significantly reduces the amount of fault identification hardware. Figure 7 shows the fault identification hardware for 1-bit RERO on a 4-bit computation. An exclusive-or gate is required for each bit pair to be compared. The fault syndrome is generated by the corresponding pair of comparison results; thus the XOR output pair is connected to an AND gate to indicate the faulty bit. In addition, we require two registers, one for the results of the basic computation and the other for the recomputation result. In a 3 level 64-bit hierarchical CLA, the proposed scheme can check the g,p signal generation of the 20 (= 16+4) blocks in the two low levels. This translates into two 20-bit registers, 20 exclusive-or gates and 20 AND gates for the syndrome detection circuitry shown in Figure 7. 
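The XOR and AND stages of the syndrome detection circuitry can be modelled in a few lines. This is a behavioural sketch only: it assumes one result bit per computation, and it assumes that component k executes computation k in the first time slot and computation (k + 1) mod n in the second, which is one possible reading of the 1-bit rotate. The function name is hypothetical.

```python
def faulty_flags(first_pass, second_pass):
    """Flag components whose two comparisons both fail.

    first_pass[i] and second_pass[i] hold the result bit of computation i
    as produced in time slots t1 and t2 (the contents of the two registers).
    """
    n = len(first_pass)
    # One exclusive-or gate per compared bit pair.
    mismatch = [x ^ y for x, y in zip(first_pass, second_pass)]
    # One AND gate per component, fed by the two comparisons it takes part in.
    return [mismatch[k] & mismatch[(k + 1) % n] for k in range(n)]


# Component 1 faulty in both slots: it corrupts computation 1 in t1
# and computation 2 in t2 (fault-free results would be 1, 0, 1, 1).
print(faulty_flags([1, 1, 1, 1], [1, 0, 0, 1]))

# Same component, but the fault shows only in t1 (a half fault manifestation).
print(faulty_flags([1, 1, 1, 1], [1, 0, 1, 1]))
```

In the first call only component 1 is flagged. In the second call a mismatch exists but no AND gate fires, which illustrates why field test vectors are chosen so that half fault manifestations are avoided.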
Fault identification hardware

FAULT IDENTIFICATION ANALYSIS
Let us consider RERO among four components without loss of generality, since aliasing introduced by single-faulty-block and two-faulty-blocks scenarios exists only within the four components that form a rotation. Figure 8 shows RERO among four components and will serve as a template for analysis. Table 1 shows the fault syndrome analysis under the single-faulty-block scenario. The computation indices as well as the components refer to the ones in figure 8. In the table, full fault manifestation is represented in upper-case letters, to distinguish it from HFM. For instance, the second column shows that, when computations 1 and 2 conflict under the single-faulty-block scenario, the component that executes both of them is the only faulty candidate and its fault is manifested fully in both computations. Therefore, the faulty component is precisely identified. On the other hand, the last column shows that, when a conflict exists only in computation 4, either the component that executes computation 4 in the first time slot shows HFM, or the component that executes it in the second time slot shows HFM. Therefore, the ambiguity set includes both of these components. In this case, the repair costs double the amount of hardware since both components are replaced due to the ambiguity.
Single-faulty-block scenario
Essentially, HFM results in an enlarged ambiguity set, thus requiring extra hardware to perform repair. In this case, double the amount of hardware is required due to HFM. There is however no loss of fault coverage in the aliasing cases, since a faulty component is always included in the ambiguity set.
Suppose the probability of a fault manifesting on any input is ; then the probability of full fault manifestation is and HFM is . The expected amount of hardware to perform the repair is thus the amount of repair hardware with full resolution. Tables 2 and 3 show the fault syndrome analysis for double-fault scenario among the 4 components. These tables only show the new aliasing introduced by the double-fault manifestation in the 4 components. Otherwise, the effect of double faults can be simply analyzed by treating them as two single faults. Table 2 shows how aliasing due to the ambiguity in the faulty component candidate set can result in additional hardware to repair. For instance, as shown in the second column, the full manifestation of faults in and results in conflicting values in computations 1 and 3 only, since the common manifestation of faults of and in computation 2 cancels the fault effect. However, when has an HFM in time slot t1 and B has an HFM in time slot t2, it results in the same syndrome. In fact, all the combinations listed in a column result in the identical faulty syndrome. In all the cases shown in table 2, all 4 components need to be replaced. Table 3 shows that some of the fault syndromes in the presence of two faults are identical to the fault syndromes in the presence of a single fault. These syndromes are in turn interpreted as single faults resulting in a consequent loss in fault coverage.
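The single-fault expectation stated at the start of the paragraph above can be made concrete with a small numerical sketch. Everything here is an assumption for illustration and may differ from the expression in the original paper: the symbol `p` for the manifestation probability, the independence of the two time slots (full manifestation with probability p squared, HFM with probability 2p(1 - p)), and the normalisation over the cases in which the fault is observed at all.

```python
def expected_repair_units(p):
    """Expected repair hardware for one faulty block, in units of the
    hardware needed with full identification resolution.

    Assumes the fault manifests independently with probability p in each
    of the two time slots; undetected cases are excluded.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    full = p * p            # fault visible in both slots: 1 unit replaced
    half = 2 * p * (1 - p)  # fault visible in one slot: 2 units replaced
    return (1 * full + 2 * half) / (full + half)


for p in (0.1, 0.5, 0.9, 1.0):
    print(p, round(expected_repair_units(p), 3))
```

Under these assumptions the cost stays between one and two units and reaches one unit when the fault always manifests, in line with the lower and upper bounds discussed in the analysis.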
Two-faulty-blocks scenario
Based on the type of fault manifestation, namely full and half, there are three scenarios: (i) both faults fully manifested, (ii) both faults half manifested and (iii) one fully while the other half manifested. Following is a detailed analysis for the required hardware and loss in fault coverage for these three cases.
Both faults fully manifested
As can be seen in table 2, there are altogether 6 possible cases of two fully manifested faults: AB, CD, BC, AD, AC and BD, which cost double the amount of hardware to repair. Considering the case of an -bit RERO, there are altogether ( ( -1))/2 possible combinations of two fully manifested faults. Among these combinations cases require double the amount of hardware to repair. Therefore, the expected amount of hardware required for repair is . Table 3 also shows that no fault coverage loss is incurred when two fully manifested faults are considered.
One fully and one half manifested fault
There are altogether 16 cases of one fault fully manifested with the other half manifested, as is shown in columns 4-7 of table 2, which result in double the amount of repair hardware. From table 3 another 8 cases are shown in the rightmost 4 columns, which result in the loss of half the fault coverage, i.e., one of the two faults is not identifiable. These disjoint 16 cases and 8 cases constitute all the 24 possible combinations of one fully and one half manifested fault among components A, B, C and D. An extra 50% amount of repair hardware is required in this case since the HFM demands a replacement of two components and the full fault manifestation uses only one.
For an -bit RERO, there are in total possible combinations for this fault manifestation case, among which of them result in double the repair hardware, of them result in half fault coverage loss, and the remaining ones require 50% extra amount of repair hardware with no loss in fault coverage. Therefore, the expected hardware requirement in this case is the amount of required repair hardware with full resolution. The expected rate of 
Both faults half manifested
Similar to the analysis in the last subsection, we can observe from tables 2 and 3 that 8 cases of both faults being half manifested result in doubling the amount of repair hardware and 16 cases result in loss of fault coverage. Specifically, among the 16 cases, as is shown in columns 2-6 of table 3, 8 cases result in complete loss of fault coverage and the other 8 cases result in half fault coverage loss. The average fault coverage loss ratio in these 16 cases is 0.75.
When two faults are both half manifested, ambiguity is always incurred regardless of the positioning of the faults. The expected amount of repair hardware under this specific fault manifestation is always double the amount of required repair hardware under full resolution and the expected fault coverage loss ratio is .
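The case counts quoted in the three subsections above can be reproduced by exhaustive enumeration over four components. The sketch below rests on assumptions that are not spelled out in the text: component k runs computation k in t1 and computation (k + 1) mod 4 in t2, two faults affecting the same computation cancel in the comparison, and a syndrome that a single fault could also produce is interpreted as that single fault. All names are hypothetical.

```python
from itertools import combinations, product

N = 4  # components 0..3 stand for A..D; computations are numbered 0..3


def contribution(comp, mode):
    """Computations corrupted by a faulty component.
    mode is 'full', 't1' (HFM in slot 1) or 't2' (HFM in slot 2)."""
    first, second = comp, (comp + 1) % N
    return {"full": {first, second}, "t1": {first}, "t2": {second}}[mode]


# Syndrome -> components that could cause it on their own.
SINGLE = {}
for c in range(N):
    for m in ("full", "t1", "t2"):
        SINGLE.setdefault(frozenset(contribution(c, m)), set()).add(c)


def coverage_loss(faults):
    """Fraction of the actual faulty components that escape identification."""
    syn = set()
    for c, m in faults:
        syn ^= contribution(c, m)  # identical wrong results cancel
    actual = {c for c, _ in faults}
    if not syn:
        return 1.0                 # nothing observed at all
    candidates = SINGLE.get(frozenset(syn))
    if candidates is None:
        return 0.0                 # recognised as a multiple-fault syndrome
    return len(actual - candidates) / len(actual)


def tally(modes_a, modes_b):
    """Coverage loss of every two-fault case with the given manifestations."""
    return [coverage_loss([(a, ma), (b, mb)])
            for a, b in combinations(range(N), 2)
            for ma, mb in product(modes_a, modes_b)]


HALF = ("t1", "t2")
both_full = tally(("full",), ("full",))
mixed = tally(("full",), HALF) + tally(HALF, ("full",))
both_half = tally(HALF, HALF)
lossy = [x for x in both_half if x > 0]
print(len(both_full), len(mixed), len(both_half), sum(lossy) / len(lossy))
```

Under these assumptions the enumeration gives 6 fully manifested pairs with no coverage loss, 24 mixed cases of which 8 lose half the coverage, and 24 half manifested cases of which 8 lose all coverage and 8 lose half, for an average loss ratio of 0.75 over the 16 lossy cases.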
Fault identification analysis for a 64-bit CLA
In the case of a 3-level 64-bit CLA, the fault identification for the lower two levels of g,p generation blocks consists of two RERO schemes, one over the 4 second-level blocks and one over the 16 first-level blocks. Table 4 shows the required repair hardware amount as well as the fault coverage loss in the two RERO cases of n=4 and n=16. From table 4 we can deduce the following expressions under the two-faulty-block scenario: n=4: HW amount = 2 n=4: fault coverage loss ratio = n=16: HW amount = n=16: fault coverage loss ratio = Figures 9(a) and 9(b) respectively plot the repair hardware and fault coverage loss ratios for n=4 and n=16 as a function of the fault manifestation probability. It can be seen from figure 9(a) that when n=4, under the scenario of two-faulty-blocks, double the amount of hardware is always required for repair, because there are 4 components in total and any fault position combination leads to aliasing. When n=16, aliasing decreases with increasing fault manifestation probability, i.e., as faults tend to fully manifest. Under the single-fault assumption, the required hardware reaches the lower bound of 1 when the fault manifestation probability is 1. In other words, aliasing is entirely caused by HFM. For the double-fault case, however, even under full fault manifestation, aliasing still exists depending on the positions of the two faults. This translates into 20% extra repair hardware.
The loss in fault coverage is due to a combination of HFM and the relative positions of the two faults. It can be seen from figure 9(b) that under the single-fault assumption, there is no loss in fault coverage. Under the double-fault assumption, there is no loss in fault coverage when there are no HFM's, i.e., when every fault manifests fully.
Overall, when faults are fully manifested, precise identification can be achieved and the faults can be repaired with almost zero hardware overhead. Consequently, for the proposed RERO in field test, fine-grain online fault identification and precise repair can be achieved. For the online identification of the faults that manifest rarely, the resolution is less precise due to possible aliasing, but still within acceptable upper bounds for various . In the worst case, double the amount of hardware is required to perform a repair, which is still quite efficient since the components are of fine-grain size in nanoelectronic CLA design.
When n is small, the fault coverage loss ratio is higher and relatively more hardware is needed for repair. Effectiveness of fault identification, both in terms of repair hardware and loss in the fault coverage, is better for larger n. In fact, when n is small, only a few components are involved; thus the likelihood of having multiple faults occurring among the components is also small. A large n naturally demands more powerful and precise fault identification since the likelihood of faults occurring is much higher. Particularly, in the nanoelectronic environment where massive parallelism is supported and fault rates are dramatically high, the proposed scheme is promising due to its capability of providing powerful fault identification with low repair hardware overhead for large CLA's.
CONCLUSIONS
We propose in this paper a fault identification approach for nanoelectronic carry lookahead adders (CLA). The proposed approach is shown to be a crucial part of any reconfiguration-based component-level online repair scheme for the fundamental arithmetic elements. Such fault tolerance schemes are advantageous in nanoelectronics because: 1) reconfigurability is a required characteristic in nanofabric, and 2) they are very efficient since they use the least amount of hardware to perform the repair.
We deal with the two main parts of a CLA by proposing two distinct approaches. By utilizing time redundancy for recomputation and exploiting the inherent redundancy, the hardware overhead to implement these approaches is kept quite small. The fault identification capability of the carry generation blocks is proven to be perfect with zero aliasing. For the g,p signal generation blocks, a thorough analysis is provided to show that very high resolution can be achieved with low aliasing and small loss of fault coverage. The proposed approach sets up the basis upon which efficient online reconfiguration repair can be applied for the basic arithmetic blocks to provide powerful fault tolerance capability for nanoelectronic systems.
