Abstract
Introduction
Built-In Self-Repair (BISR) is a fault tolerance technique in which spare modules are provided to supplement the core operational modules. Using BISR techniques, yield can be improved by replacing defective modules with spares before packaging. Alternatively, reliability can be improved by automatically replacing failed modules with spares, so that the overall system continues to function correctly. This is especially important for military systems or space exploration missions, where it is critical that there are no system failures.
BISR techniques are regularly used in memories and bit-sliced designs [1]. So far they have not received much attention in ASIC design, but with higher levels of integration they will soon be important for improving yield and/or reliability in this area as well.
High level synthesis techniques have previously been used to address a variety of design goals [2] , but little work has been done on techniques for fault tolerant design. While previous high level synthesis methods have addressed intermittent and transient faults [3, 4] , this work concentrates on permanent faults.
The main concepts introduced are based on exploiting the flexibility provided by high level synthesis during design space exploration. Although this paper focuses on BISR design, the techniques introduced have a high potential to facilitate the synthesis of Application Specific Programmable Processor (ASPP) datapaths. Intelligent strategies for using the flexibility of solutions are a crucial component in achieving small-overhead designs of reconfigurable datapaths in ASPP design. Most often, hardware overhead is minimized by identifying a set of configurations which are similar in terms of the required hardware resources. Consider, for example, the design of an ASPP to implement two different computations, A and B. Let $A_i$ and $B_j$ represent particular implementation solutions for the computations A and B. The ASPP implementation is the union of the hardware, $A_i \cup B_j$, for any $i$ and $j$. The goal is not to find the $\min(A_i) \cup \min(B_j)$ implementation, but to find the $\min(A_i \cup B_j)$ solution, which is often one for which $A_i$ and $B_j$ have similar hardware requirements.
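The difference between the two objectives can be illustrated with a minimal Python sketch. The unit areas, the candidate implementations, and the function names below are hypothetical values chosen for illustration; they do not come from the paper. The union of two implementations is assumed to take, for each hardware class, the larger of the two unit counts.

```python
from itertools import product

# Hypothetical unit areas and candidate implementations (illustration only).
AREA = {"adder": 1.0, "multiplier": 4.0}
A_IMPLS = [{"adder": 1, "multiplier": 2}, {"adder": 3, "multiplier": 1}]
B_IMPLS = [{"adder": 1, "multiplier": 2}, {"adder": 6, "multiplier": 1}]


def area(impl):
    """Total area of an implementation given as {class: unit count}."""
    return sum(AREA[cls] * n for cls, n in impl.items())


def union(a, b):
    """Hardware able to run either implementation: per-class maximum."""
    return {cls: max(a.get(cls, 0), b.get(cls, 0)) for cls in set(a) | set(b)}


def union_of_minima(a_impls, b_impls):
    """Min(A_i) U Min(B_j): pick each cheapest solution, then merge."""
    return union(min(a_impls, key=area), min(b_impls, key=area))


def minimum_union(a_impls, b_impls):
    """Min(A_i U B_j): search all pairs for the cheapest merged hardware."""
    return min((union(a, b) for a, b in product(a_impls, b_impls)), key=area)


naive = union_of_minima(A_IMPLS, B_IMPLS)
joint = minimum_union(A_IMPLS, B_IMPLS)
```

With these assumed numbers, the cheapest solution for A alone uses 3 adders and 1 multiplier, but pairing it with the cheapest solution for B forces 3 adders and 2 multipliers. Choosing the slightly larger A solution that resembles B's requirements yields a smaller merged datapath.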
Before proceeding, a number of assumptions are presented. In this paper the hardware model [5] shown in Figure 1 will be used. To stress the importance of interconnect minimization, the model clusters all registers in register files, connected only to the inputs of the corresponding execution units. We assume that there is no bus merging. If these techniques are used for fault tolerance against permanent faults, it is assumed that an error checking mechanism exists. If the methods are instead used for yield enhancement, it is assumed that manufacturing testing will detect the faulty units. In either case, the controller is reconfigured upon detection of a fault. The controller is assumed to be either reprogrammable or to lie on a separate chip.
BISR Algorithms for ASIC Design
The most straightforward approach to BISR is to provide a spare for each hardware instance, resulting in full duplication of the hardware. Fortunately, the BISR overhead need not be so high. For example, if the number of faulty units, K, is 1, the flexibility of the assignment step means that only 1 spare per hardware class is necessary: the operations of the failed unit are reassigned to the spare of the same type.
Considering the additional flexibility brought by scheduling, however, even fewer spares can often be used. When a failed unit is detected, instead of reassigning only those operations implemented by the failed unit, a complete reassignment and rescheduling of all operations of the computation is performed.
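The effect of complete rescheduling can be sketched in Python under some loudly stated assumptions: the dataflow graph `OPS` is hypothetical (it is not the graph of Figure 2), every operation takes 1 control cycle, and the scheduler is a plain exhaustive search that is only practical for very small graphs. The names `feasible`, `child_allocations`, and `is_bisr` are illustrative, not taken from the paper. An allocation is accepted if a schedule exists for every way of losing K units.

```python
from collections import Counter
from itertools import combinations

# Hypothetical dataflow graph (NOT the one in Figure 2):
# op -> (resource class, predecessors), listed in topological order.
OPS = {
    "m1": ("M", []),
    "m2": ("M", []),
    "a1": ("A", ["m1", "m2"]),
    "a2": ("A", ["m1", "m2"]),
}


def feasible(ops, alloc, steps):
    """Exhaustively search for a schedule of unit-latency ops in `steps` cycles."""
    names = list(ops)  # assumed to be in topological order
    start, usage = {}, Counter()

    def place(i):
        if i == len(names):
            return True
        cls, preds = ops[names[i]]
        earliest = max((start[p] + 1 for p in preds), default=1)
        for s in range(earliest, steps + 1):
            if usage[(s, cls)] < alloc.get(cls, 0):
                usage[(s, cls)] += 1
                start[names[i]] = s
                if place(i + 1):
                    return True
                usage[(s, cls)] -= 1  # backtrack
                del start[names[i]]
        return False

    return place(0)


def child_allocations(alloc, k):
    """Yield every distinct allocation left after k units have failed."""
    units = [cls for cls, n in alloc.items() for _ in range(n)]
    seen = set()
    for failed in combinations(units, k):
        key = tuple(sorted(failed))
        if key not in seen:
            seen.add(key)
            child = Counter(alloc)
            child.subtract(failed)
            yield dict(child)


def is_bisr(ops, alloc, steps, k=1):
    """True if the computation can be rescheduled after any k unit failures."""
    return all(feasible(ops, c, steps) for c in child_allocations(alloc, k))
```

For this assumed graph and 3 available control cycles, 1 adder with 2 multipliers is feasible, as is 2 adders with 1 multiplier, while 1 adder with 1 multiplier is not. Consequently `is_bisr(OPS, {"A": 2, "M": 2}, 3)` holds with 4 units, whereas sparing each minimum solution class by class would need 5.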
The example of Figure 2 illustrates how BISR overhead can be greatly reduced by exploiting the scheduling flexibility brought by one type of spare unit to alleviate the need for another type of redundant unit. For this and all subsequent examples, assume that each operation takes 1 control cycle, and K=1. The minimum required hardware consists of 1 adder and 2 multipliers, or 2 adders and 1 multiplier. If scheduling flexibility is not exploited, the minimum BISR hardware will be 2 adders and 3 multipliers, or 3 adders and 2 multipliers. However, if we allocate only 2 adders and 2 multipliers, a complete BISR implementation is still possible.
The basic idea of the allocation mechanism is to start with an initial allocation, add hardware until a feasible allocation is found, then remove all unnecessary redundant hardware. The pseudo-code for the global flow of the algorithm is shown below.
RedundancyRemovalWithLookAheadPruning();
A sharp minimum bound, $M_j$, on the necessary amount of hardware of each class $j$ is used as the initial allocation.
$M_j$ is defined as $M_j = m_j + K$, where $m_j$ is a minimum bound on the amount of hardware of class $j$ necessary for any non-BISR implementation and $K$ is the number of faults. For each hardware class $j$, relaxation-based scheduling techniques are used to obtain $m_j$. The bound $M_j$ can be understood by observing that any implementation requires at least $m_j$ units, and since up to $K$ units of type $j$ can be bad, at least $(m_j + K)$ units are needed.
If the initial allocation fails, the expansion phase is entered. In this phase, new hardware units are added one by one until the allocation succeeds. Good selection methods have a crucial impact on the speed of the algorithm and the quality of the solution. The primary criterion for deciding which hardware type to add next is a measure called the global stress of a hardware resource class. The global stress is composed of three measures: Minimum Bounds Stress, ε-Critical Network Stress, and Scheduling Stress [7].
At the completion of the expansion phase, there is no guarantee that the feasible allocation is minimal. It is possible that a subset of the allocation, $A' \subset A$, is also a solution. To ensure that a local minimum has been reached, it is necessary to verify that if any unit is removed from the current solution, the solution is no longer valid. In general, the units with minimum stress are tried first. It is also imperative, however, to incorporate a remember-and-lookahead technique, so that time is not wasted attempting allocations that will definitely fail. We remember all allocations and child allocations that failed, and use this information whenever considering an allocation A'. Before attempting A', a look-ahead to its child allocations determines whether there is any overlap between the children of A' and any known allocations that have failed.
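The remember-and-lookahead idea can be sketched as a small Python class. This is an illustrative reading of the paragraph above, not the authors' implementation: the class name `FailureMemory`, its methods, and the tuple encoding of allocations are all assumptions. A child allocation is taken to be an allocation with K units removed.

```python
from collections import Counter
from itertools import combinations


def alloc_key(alloc):
    """Hashable, order-independent encoding of an allocation {class: count}."""
    return tuple(sorted(alloc.items()))


def child_keys(alloc, k):
    """Encodings of all allocations obtained by removing k units."""
    units = [cls for cls, n in alloc.items() for _ in range(n)]
    keys = set()
    for failed in combinations(units, k):
        child = Counter(alloc)
        child.subtract(failed)
        keys.add(alloc_key(dict(child)))
    return keys


class FailureMemory:
    """Records failed (child) allocations and prunes doomed candidates."""

    def __init__(self):
        self.failed = set()

    def remember(self, alloc):
        """Record an allocation for which no feasible schedule was found."""
        self.failed.add(alloc_key(alloc))

    def will_fail(self, candidate, k=1):
        """Look ahead: does any child of `candidate` overlap a known failure?"""
        return not self.failed.isdisjoint(child_keys(candidate, k))
```

For example, once the allocation of 1 adder and 1 multiplier has been remembered as a failure, a candidate with 2 adders and 1 multiplier can be rejected without scheduling, because losing one adder leads to the known failure.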
For a successful allocation, a feasible schedule for each child allocation must be found. The schedules are ordered in decreasing order of difficulty, so that we can exit as fast as possible in the event that there is an insufficient allocation. This ordering is based on the global stress function described earlier.
Transformation-Based BISR
Sometimes, it is not possible to reduce BISR overhead using the allocation-based techniques of Section 2. In that case, transformations can be used to reduce the overhead. Transformations are alterations in the computational structure such that the behavior is maintained [2]. The key new idea is to transform the computation in different ways according to the needs imposed by the available hardware, for each possible scenario of failed units. The example in Figure 3 illustrates this idea, where computation 3a is transformed into 3b by means of an algebraic identity. All operations lie on the critical path, so it is not possible to reduce BISR overhead using the techniques of Section 2. If only computation 3a is considered, then 3 adders and 2 subtractors are needed for the BISR implementation. However, if both implementations are considered, only 2 subtractors and 2 adders are needed. If a subtractor fails, we can use computation 3a, which needs 2 adders and 1 subtractor, and if an adder fails we can use computation 3b, which needs 2 subtractors and 1 adder.
In general, transformations to reduce BISR overhead can be classified into two groups:
1. Transformations to increase the resource utilization (and therefore reduce the need) of the units of the same type as the failed EXU,
2. Transformations to reduce the number of operations of the same type as the failed resources.
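The scenario-dependent choice between computations 3a and 3b can be sketched in Python. The hardware requirements of the two variants are the ones quoted in the Figure 3 discussion; the names `VARIANTS`, `fits`, and `repair_plan` are hypothetical, and K=1 is assumed.

```python
# Hardware requirements of the two equivalent computations of Figure 3.
VARIANTS = {
    "3a": {"adder": 2, "subtractor": 1},
    "3b": {"adder": 1, "subtractor": 2},
}


def fits(need, have):
    """True if the surviving hardware covers a variant's requirements."""
    return all(have.get(cls, 0) >= n for cls, n in need.items())


def repair_plan(variants, alloc):
    """For each class that may lose one unit, pick a variant that still fits.

    Returns {failed class: variant name, or None if no variant fits}.
    """
    plan = {}
    for failed, count in alloc.items():
        if count == 0:
            continue
        surviving = {**alloc, failed: count - 1}
        plan[failed] = next(
            (name for name, need in variants.items() if fits(need, surviving)),
            None,
        )
    return plan


plan = repair_plan(VARIANTS, {"adder": 2, "subtractor": 2})
only_3a = repair_plan({"3a": VARIANTS["3a"]}, {"adder": 2, "subtractor": 2})
```

With 2 adders and 2 subtractors, the plan selects 3b after an adder failure and 3a after a subtractor failure. If only 3a is available, the same allocation has no valid configuration after an adder failure, which is why 3 adders and 2 subtractors would then be required.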
[Figure annotation: available time = 2 control cycles]
A detailed explanation of how various transformations are used for BISR design is presented in [7]. Associativity, inverse element law, commutativity, and retiming are used in the transformation-based BISR design. Note that HYPER simulation tools can be used to verify that relevant numerical properties (e.g. numerical stability and overflow control) are maintained in all transformed designs. A probabilistic sampling algorithm [5] is used as the core transformation engine. Weighting in the cost function is affected by the number of resources available. The algorithm is thus driven to apply transformations that alleviate the need for resources that are in short supply. A key novelty is that transformations are tried for the various scenarios in decreasing order of estimated difficulty, as determined by the stress function of the previous section.
Experimental Results
The BISR techniques were tested on a set of examples. Table 3 shows several examples designed using the transformation-based methods of BISR design. The average area increase is only 10.4%, and an average of 1.5 out of 3.75 additional hardware units were needed.
The consequences of the BISR techniques on yield and chip productivity are calculated using the procedure presented by Stapper [8]. An initial yield of 10% was assumed. The BISR designs had average yield and productivity improvements of 87.4% and 61.0% respectively [7].
Table 3: Results for transformation-based designs
Conclusions
New techniques to compose a reconfigurable heterogeneous BISR implementation have been presented. The approach is based on the flexibility of high level synthesis during design space exploration. The experimental results and high yield and productivity improvements demonstrate the potential of this approach.
