Abstract: This paper addresses the problem of synthesizing fault-secure controller/data path circuits from behavioral specifications. These circuits are guaranteed to either produce the correct output or flag an error. We use an iterative improvement-based behavioral synthesis framework that performs functional unit selection, clock selection, scheduling, and resource sharing with the aim of minimizing the area of the synthesized circuit, while allowing multicycling, chaining, and functional unit pipelining. We present a dynamic comparison selection algorithm that can be used during behavioral synthesis to determine which intermediate results in the computation need to be secured in order to enable maximal resource sharing. Previous work on synthesizing fault-secure data paths has focused on ensuring that aliasing (a condition in which the circuit produces an incorrect output and does not flag an error) cannot occur in any part of the design. We demonstrate that such an approach can lead to unnecessarily large overheads. To alleviate the overheads incurred for fault security, our behavioral synthesis framework uses ALiasing Probability analysiS (ALPS) to identify resource sharing configurations that reduce area while introducing a very low probability of aliasing (of the order of $10^{-10}$ for a bit-width of 32) in the resultant data path. Experimental results for several behavioral descriptions demonstrate that our techniques synthesize more compact circuits than techniques available in the literature, e.g., double modular redundancy or zero-aliasing techniques.
INTRODUCTION
With the increasing use of VLSI circuits in critical applications such as automotive electronics, process control, implantable medical devices, and avionics, the need for fault-tolerant circuits is increasing. Fault tolerance can be considered a design metric at any level of the design hierarchy ranging from the circuit level to the system level. However, considering fault tolerance late in the design cycle often involves large overheads in area, delay, and power consumption. This motivates the application of fault tolerance at higher levels. Fault tolerance covers detection, diagnosis, and recovery from faults.
A well-known method to achieve fault detection is duplication with comparison [1], [2], where the problem at hand is solved by two identical circuits and the outputs of the two circuits are compared to check for errors. A disadvantage of this technique is that it involves twice the amount of resources of the simplex system, in addition to the cost of a comparison element. Previous work has explored the integration of fault tolerance during the behavioral synthesis process. In [3], algebraic transformations were used to alleviate the hardware costs of n-modular redundancy. In [4], checkpointing and rollback were integrated into the scheduling task of behavioral synthesis. A constrained assignment for self-recovering data paths was investigated in [5].
A formal characterization of fault security for multiprocessor schedules was presented in [6]. The concept of fault security was first integrated into behavioral synthesis of application-specific integrated circuits in [7], where a technique for securing intermediate results in a computation was presented in order to ease scheduling constraints and introduce additional opportunities for resource sharing. Allocation and assignment for ensuring fault security were addressed in [8]. Schemes for error recovery through checkpointing and rollback were presented in [9], [10], and [11]. Techniques for behavioral synthesis of reconfigurable or self-repairing data path structures were presented in [12].
In this work, we focus on behavioral synthesis of fault-secure controller/data paths for data-dominated circuits. The data path is made fault-secure by having two separate threads of computation and comparing the results using equality checkers. In order to reduce the overheads required to provide fault security, we allow resource sharing between the two threads and secure intermediate results by inserting comparison operations appropriately. We present an algorithm to determine which intermediate results to secure in order to maximize the benefits of resource sharing. Unlike previous work, where the decision of which intermediate results to secure was performed statically before behavioral synthesis, our method performs this task dynamically (during behavioral synthesis). Thus, our algorithm also naturally explores cost trade-offs between comparators and the other data path components, such as functional units, registers, and multiplexers. We demonstrate that it is possible to significantly reduce the area overhead required for fault security if we are willing to tolerate a very small nonzero probability of aliasing. In this direction, we present an aliasing analysis procedure that guides behavioral synthesis by identifying configurations that are very unlikely to lead to aliasing. Allowing architectures with a very low probability of aliasing leads to a substantial increase in the feasible solution space and results in better solutions. Our dynamic securing and aliasing probability analysis techniques have been integrated into an iterative improvement-based behavioral synthesis framework that performs functional unit selection, clock selection, scheduling, allocation, and assignment, while exploring the trade-offs that result from the interaction of these tasks. To the best of our knowledge, no other work targets functional unit or clock selection in behavioral synthesis for fault tolerance.
We also allow for loops in the behavioral description, unlike many previous methods in this area. The aliasing analysis techniques that we have developed differ in flavor from those published in the past in the built-in self-test area [13]. Unlike past methods, our aliasing analysis procedure does not assume any particular error distribution at the output of a faulty module; this makes our analysis procedure much more robust than past methods and applicable to a much wider class of faults.
The controller finite-state machine (FSM) is made fault-secure by constraining the state encoding procedure to assign codes having the same parity to all the states and, in addition, constraining the primary outputs and next-state outputs of the controller to be implemented by disjoint cones of logic [14]. The parity of the primary outputs of the controller is also generated as a controller output. This ensures that a failure in any one of the logic cones will produce a single error either in a next-state line or in a primary output. This is detected using a totally self-checking (TSC) parity checker which computes the parity of all the controller outputs (all the next-state lines, all the primary outputs, and the generated parity of the primary outputs).
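The effect of the same-parity encoding can be illustrated with a small Python sketch. It only models the parity argument: the helper names (`parity`, `even_parity_codes`, `checker_flags_error`) and the choice of even parity are assumptions made for this illustration, and the checker below is an ordinary parity test, not a TSC parity checker.

```python
def parity(word: int) -> int:
    """Return 0 if `word` has an even number of 1 bits, 1 otherwise."""
    return bin(word).count("1") % 2


def even_parity_codes(num_states: int) -> list[int]:
    """Assign each state the next unused code word with even parity."""
    codes = []
    candidate = 0
    while len(codes) < num_states:
        if parity(candidate) == 0:
            codes.append(candidate)
        candidate += 1
    return codes


def checker_flags_error(word: int, expected_parity: int = 0) -> bool:
    """Plain parity check: flag any word whose parity differs from the expected one."""
    return parity(word) != expected_parity


codes = even_parity_codes(4)
width = max(codes).bit_length()

# Flip every single bit of every valid code and confirm the check fires.
single_bit_errors_detected = all(
    checker_flags_error(code ^ (1 << bit))
    for code in codes
    for bit in range(width)
)
```

Because every valid code has the same parity, any single-bit error changes the parity and is flagged, while the valid codes themselves pass the check.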
The rest of this paper is organized as follows: In Section 2, we present some background material. In Section 3, we describe the behavioral synthesis framework. In Section 4, we discuss aliasing probability analysis and show how this analysis can be utilized in the behavioral synthesis framework. In Section 5, we discuss our dynamic securing approach. In Section 6, we present a complete illustrative example. In Section 7, we present experimental results and conclude in Section 8.
BACKGROUND
In this section, we introduce some basic concepts that we use in our work.
Behavioral Synthesis: Behavioral synthesis is the process of transforming a behavioral description of the system into a register-transfer level (RTL) implementation. Behavioral synthesis can be divided into several tasks, such as scheduling, allocation, assignment, functional unit selection, and clock selection. We assume that the behavioral specification, which is usually provided in a hardware description language, has been compiled into a control-data flow graph (CDFG). Vertices in the graph represent operations and edges in the graph denote data or control dependencies. The process of scheduling assigns a cycle (or set of cycles) of execution to each operation in the CDFG. Clock selection refers to the procedure of choosing a value for the system clock period. Functional unit selection involves choosing a functional unit library template or functional unit type for each operation in the CDFG. The processes of functional unit allocation and assignment decide how many instances of each functional unit template (e.g., ripple_carry_adder, carry_lookahead_adder, array_multiplier, wallace_tree_multiplier, etc.) to use and map the operations of the CDFG to the allocated instances.
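As an illustration of the CDFG and of the scheduling task, the following Python sketch represents a CDFG as a list of operations with data dependencies and computes an as-soon-as-possible (ASAP) schedule. The `Operation` class, the `asap_schedule` function, and the cycle counts are assumptions made for this illustration. ASAP scheduling is a standard textbook technique that stands in for, and is much simpler than, the iterative improvement-based scheduling used in this work.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    name: str
    kind: str                                        # e.g. "add" or "mul"
    preds: list[str] = field(default_factory=list)   # producers of the operands


def asap_schedule(ops: list[Operation], cycles: dict[str, int]) -> dict[str, int]:
    """Return the earliest start cycle (1-based) of every operation.

    `ops` must be listed in topological order; `cycles` gives the number
    of clock cycles taken by each kind of operation (multicycling).
    """
    by_name = {op.name: op for op in ops}
    start = {}
    for op in ops:
        # An operation can start once all of its producers have finished.
        ready = [start[p] + cycles[by_name[p].kind] for p in op.preds]
        start[op.name] = max(ready, default=1)
    return start


cdfg = [
    Operation("n1", "mul"),
    Operation("n2", "mul"),
    Operation("n3", "add", ["n1", "n2"]),
]
schedule = asap_schedule(cdfg, {"add": 1, "mul": 2})
```

With two-cycle multipliers, both multiplications start in cycle 1 and the addition that consumes them starts in cycle 3.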
Fault Security: A circuit is said to be fault-secure with respect to a specified class of faults if, on the occurrence of any fault from the class, the circuit produces either the correct output or an output that will be detected as erroneous (i.e., an erroneous output never goes undetected).
The fault model that we use in this work assumes the fault to be confined to a single unit in the circuit (e.g., adder, multiplier, register, multiplexer, etc.). The fault can last for any duration (i.e., it may be permanent or transient) and can cause an arbitrary error at the unit's output.
Duplication with comparison [1] is the traditional method of providing fault security to a system. The two circuits that execute the same computation will, henceforth, be referred to as the original and the copy. This method guarantees fault security against any fault that affects either the original or the copy, but not both. However, duplication and comparison after synthesis can lead to excessive overhead in circuit area. Moreover, having identical implementations for the original and copy leaves the circuit highly susceptible to common mode failures [15]. We use a variant of the duplication and comparison technique where we duplicate the CDFG. We compare the outputs of the original and copy CDFGs for error detection by inserting a sufficient number of equal-to comparison operations in the CDFG. We assume that these comparison operations are performed by totally self-checking (TSC) equality checkers and, hence, faults in the units implementing these comparisons need not be explicitly considered during behavioral synthesis. Depending on how the duplicated CDFG is scheduled, many equal-to comparison operations can share the same equality checker. The outputs of all the TSC equality checkers can be reduced to just two outputs using a TSC two-rail checker.
Securing of Operations and Edges: Securing of operations refers to a technique in which the outputs of some operations in the original part of the duplicated CDFG are compared with the results of the corresponding operations in the copy. The operations whose results are compared are referred to as secured operations. Securing of operations has been used to ease scheduling constraints and create additional opportunities for resource sharing [7]. In the sections that follow, we use the term "secured edge" to mean an edge whose source node is secured.
Example 1. In Fig. 1, we have a CDFG that computes the sum of three primary inputs, 1, 2, and 3. Two copies of the CDFG are executed and the results are compared. The operation in the copy, corresponding to operation Op in the original, is labeled as Op_c. In this example, nodes n1 and n2 are secured. The remaining nodes denote comparison operations. The outputs produced are the primary output Out and the comparison outputs check1 and check2. If the adder executing operation n1 fails, an error is flagged at check1 and check2. If the adder which executes n2 fails, an error is flagged at check2.
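The behavior described in Example 1 can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: the two threads are evaluated independently, the function name `run_duplicated_sum` and its parameters are hypothetical, and a fault is modeled simply as adding a nonzero `error` to the result of the one faulty operation.

```python
def run_duplicated_sum(x1, x2, x3, faulty=None, error=1):
    """Evaluate the original and copy threads of out = x1 + x2 + x3.

    `faulty` names the single operation whose adder is faulty
    ("n1", "n2", "n1_c" or "n2_c"); None means a fault-free run.
    """
    def add(op_name, a, b):
        result = a + b
        return result + error if op_name == faulty else result

    n1 = add("n1", x1, x2)
    n2 = add("n2", n1, x3)
    n1_c = add("n1_c", x1, x2)
    n2_c = add("n2_c", n1_c, x3)
    return {
        "out": n2,
        "check1_error": n1 != n1_c,   # comparison securing n1
        "check2_error": n2 != n2_c,   # comparison securing n2
    }


fault_free = run_duplicated_sum(1, 2, 3)
n1_fails = run_duplicated_sum(1, 2, 3, faulty="n1")
n2_fails = run_duplicated_sum(1, 2, 3, faulty="n2")
```

A fault in the adder executing n1 raises both comparison outputs, whereas a fault in the adder executing n2 raises only the second one, which matches the description in the example.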
Aliasing: If a fault causes the circuit to fail in such a manner that the outputs of the original and the copy are equal, but erroneous, aliasing is said to have occurred.
Example 2. Consider the circuit of Fig. 1 again. Let us suppose that we had only one comparison operation, which compares the outputs of nodes n2 and n2_c. Also, assume that n1 and n2_c are mapped to the same functional unit. If this functional unit fails, then aliasing occurs if the inputs of the comparison operation are equal but erroneous.
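The three possible outcomes of a duplicated computation (correct, detected, aliased) can be made concrete with the following sketch. It assumes the situation of Example 2 as we read it: the original operation n1 and the copy operation n2_c share one faulty adder, and only the final results are compared. The names `classify` and `shared_adder_outputs`, and the use of an additive error function `f`, are assumptions made for this illustration.

```python
def classify(original_out: int, copy_out: int, correct_out: int) -> str:
    """Classify one run of a duplicated computation with a single comparison."""
    if original_out != copy_out:
        return "detected"      # the comparison flags an error
    if original_out == correct_out:
        return "correct"
    return "aliased"           # equal but erroneous outputs


def shared_adder_outputs(x1, x2, x3, f):
    """Outputs of both threads when n1 and n2_c share a faulty adder.

    The faulty adder returns a + b + f(a, b); all other additions are exact.
    """
    def faulty_add(a, b):
        return a + b + f(a, b)

    n2 = faulty_add(x1, x2) + x3        # n1 is faulty, n2 is fault-free
    n2_c = faulty_add(x1 + x2, x3)      # n1_c is fault-free, n2_c is faulty
    return n2, n2_c


constant_error = classify(*shared_adder_outputs(1, 2, 3, lambda a, b: 1), 6)
data_dependent = classify(*shared_adder_outputs(1, 2, 3, lambda a, b: b % 2), 6)
no_fault = classify(*shared_adder_outputs(1, 2, 3, lambda a, b: 0), 6)
```

A constant error corrupts both threads identically and goes unnoticed, while the data-dependent error used here corrupts only one thread for this input and is caught by the comparison.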
Given a duplicated CDFG and an RTL circuit that implements it, Observation 1 below gives the necessary condition for aliasing to occur due to a fault in any functional unit.
Observation 1. Aliasing can occur only if there exists a pair of operations, n1 in the original and n2_c in the copy, that are performed by the same functional unit such that there exist unsecured paths from both n1 and n2 (the node that corresponds to n2_c in the original) to the same primary output, O, or loopout,¹ L, in the duplicated CDFG.
Definition 1. A set E of secured operations is said to separate two sets of operations A and B in the duplicated CDFG which are mapped to the same functional unit if, upon securing each of the operations in E, the necessary condition for aliasing presented in Observation 1 is not satisfied for any pair of elements $a \in A$, $b \in B$.
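The necessary condition of Observation 1 and the notion of separation amount to a reachability test on the CDFG, which can be sketched as follows. The sketch rests on several assumptions that are ours: the CDFG is given as a successor map over operations in the original, each copy operation is represented by its counterpart in the original, and a path counts as unsecured if no operation on it before the output-producing operation is secured. The function names are hypothetical.

```python
def unsecured_outputs(succ, outputs, secured, start):
    """Output operations reachable from `start` along unsecured paths."""
    reached, seen, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in outputs:
            reached.add(node)          # the path ends at an output or loopout
        elif node not in secured:
            stack.extend(succ.get(node, []))
        # a secured intermediate operation cuts every path through it
    return reached


def may_alias(succ, outputs, secured, orig_op, copy_counterpart):
    """Necessary condition for aliasing for one pair sharing a unit."""
    a = unsecured_outputs(succ, outputs, secured, orig_op)
    b = unsecured_outputs(succ, outputs, secured, copy_counterpart)
    return bool(a & b)


def separates(succ, outputs, secured, ops_a, ops_b):
    """True if `secured` separates the operation sets `ops_a` and `ops_b`."""
    return not any(
        may_alias(succ, outputs, secured, a, b) for a in ops_a for b in ops_b
    )


# Three-input sum: n1 feeds n2, and n2 produces the primary output.
succ = {"n1": ["n2"], "n2": []}
outputs = {"n2"}
shared_unsecured = may_alias(succ, outputs, set(), "n1", "n2")
shared_secured = may_alias(succ, outputs, {"n1"}, "n1", "n2")
```

Without any secured intermediate operation, n1 and the counterpart of n2_c both reach the output along unsecured paths, so aliasing is possible. Securing n1 cuts its path and removes the necessary condition.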
Aliasing is undesirable for fault security. One way to avoid aliasing is to impose the following constraint during resource sharing: Operations in the original and the copy cannot be assigned to the same functional unit. This defaults to the approach of duplication and comparison after behavioral synthesis. To illustrate this approach, consider the duplicated scheduled CDFG shown in Fig. 2. The operations in the CDFG are annotated with the names of the functional units that perform them. From the figure, we can see that four multipliers, two adders, and one equality checker are required to implement the duplicated CDFG.
A further refinement of the above idea presented in [7] is the following: Nodes in the original and the copy which are not separated by secured operations cannot share a functional unit. The above constraint was imposed in [7] by first scheduling the original CDFG, then "delineating" the duplicated CDFG into several "regions," and, finally, scheduling the regions in the copy. Operations from the original and copy that belonged to different regions were allowed to be assigned to the same functional unit. These constraints were incorporated into an assignment procedure in [8].
Example 3. Consider the duplicated CDFG in Fig. 3. This CDFG was synthesized while allowing sharing between the original and the copy. We can see that a fault in functional unit M2 could lead to an error in the original and the copy. However, if the extra equality checker eq2 is added, then the unsecured path between n1 and output Out is secured so that the necessary condition for aliasing is not satisfied. This ensures that any fault in functional unit M2 that leads to an error at the final output will be caught by eq1 or by eq2. A secured operation "cuts" a path from the nodes it separates to primary outputs and loopouts, thus preventing aliasing (see Observation 1). The insertion of the extra TSC equality checker saves a multiplier, which is a trade-off worth making. Securing of operations can also be utilized to increase the scheduling freedom available to various operations in the CDFG and, hence, the opportunities for resource sharing [7], as shown next.
Example 4. Consider the transformed CDFG on the right in Fig. 4. Here, we make use of the same instance of operation n3 in both the original and the copy. The corresponding operation in the copy, n3_c, is now required to feed only the comparison operation with output check2 and, hence, the scheduling freedom available to operations in the transitive fanin of this operation is significantly enhanced. For instance, operation n3_c, which was initially required to complete in the second control step, can now be scheduled in control step 2 or 3. Scheduling it in control step 3 reduces the number of multipliers required to implement the duplicated CDFG from two to one and the number of adders from four to two, at the expense of an extra TSC equality checker. The decision of which operations to secure can significantly affect opportunities for scheduling and assignment in several ways. Moreover, the TSC equality checkers that implement the securing function themselves come with an associated cost and, hence, excessive use of operation securing can result in a suboptimal design. In effect, the securing of intermediate operations interacts with the various behavioral synthesis tasks extensively. Previous work has ignored this interaction by performing the tasks separately. We present a procedure for dynamically securing intermediate operations that can be used during behavioral synthesis to explore the above trade-offs.
1. A loopout of a CDFG is an output which serves as an input to its next iteration.
Previous work has focused on ensuring that aliasing does not occur in the synthesized circuit. As we shall illustrate later, enforcing this (pessimistic) condition may lead to excessive overheads in several cases. We demonstrate that it is possible to significantly lower the overhead required by performing an analysis of the probability of aliasing in the circuit. Our analysis bounds the probability of aliasing for various resource sharing configurations. The information provided by this analysis is used to perform behavioral synthesis such that the area of the resultant RTL circuit is as low as possible while still maintaining a very low probability of aliasing.
Example 5. In the example of Fig. 5, we allow sharing between nodes in the original and copy even when there is no secured operation separating them. This is because, as illustrated in Section 4, the probability of aliasing, while not zero, can be shown to be extremely low. This saves one TSC equality checker compared to Fig. 3.
While the use of comparisons to secure intermediate results can lead to area reduction, securing is not always desirable. This is owing to the fact that sharing of a functional unit by multiple operations might require more than one operation to be secured and thus, potentially, require more than one TSC equality checker. Hence, the gain in area obtained by sharing may be nullified by the extra equality checkers. Moreover, secured operations lead to extra edges in the CDFG, i.e., the number of data values that need to be transferred is increased. This often translates to an increase in the interconnect. As feature sizes shrink, interconnect area becomes an increasingly important component of circuit area. Finally, using securing of operations to ease scheduling constraints prolongs the lifetimes of the secured variables. This can lead to an increase in the number of registers that could offset the gains due to fewer functional units.
The above arguments suggest that the decision of which variables to secure should be considered as an integral part of behavioral synthesis and its interaction with the other behavioral synthesis tasks cannot be neglected. The complexity of the problem is such that an optimum solution cannot be guaranteed. Therefore, we use an iterative improvement-based behavioral synthesis framework that interleaves scheduling with functional unit selection and resource sharing, considering the interaction among these tasks. The algorithm also explores several clock periods and selects the one that yields an RTL circuit with the lowest area that meets the sampling period constraints. While our algorithms do not guarantee optimality, they are capable of escaping local minima through hill-climbing and, hence, yield high quality solutions in practice. We have augmented the behavioral synthesis framework by developing a dynamic procedure that selects the best operations to secure during synthesis. In order to better explore the design space, we allow multicycling, chaining, and functional unit pipelining. The use of aliasing probability analysis helps us to perform hardware sharing without comparators to secure intermediate results when the probability of aliasing in the resulting RTL circuit is extremely low and, thus, enables us to avoid all the associated overheads. Our method makes the RTL circuit secure against faults in registers and multiplexers in addition to functional units. We also make the controller secure against single faults.
BEHAVIORAL SYNTHESIS FRAMEWORK
We focus on data-dominated behavioral descriptions, as are common in the digital signal and image processing domains. Two important characteristics of such descriptions are: 1) They consist mainly of arithmetic operations like addition, subtraction, multiplication, and delay operators, and 2) the rate at which the design should process input samples is typically fixed, i.e., it is crucial to meet the input sample rate requirement, but it does not pay to be able to process input samples any faster. Thus, we attempt to provide fault security while minimizing area under hard real-time performance constraints.
We have incorporated our techniques into the framework of an existing behavioral synthesis system which is based on iterative improvement [16] , which has been used with great success in the area of high-level synthesis [17] . A related methodology, based on simulated annealing, has also been published [18] . The synthesis methodology is outlined in this paper and the parts that deal with incorporating fault-security are detailed. An overview of the behavioral synthesis framework for fault-secure data paths, called ALPS, is given in Fig. 6 . ALPS accepts as input a CDFG and a constraint on the input sample period. The CDFG is first duplicated and the operations that compute primary outputs and some intermediate operations are secured with a view of minimizing the number of TSC equality checkers in the RTL circuit. The algorithm explores the clock period space by considering a subset of those candidate clock periods that divide the given sample period constraint evenly. The fastest clock period cannot be any less than the register-to-register delay of the fastest functional unit in the functional unit library. Further pruning in the clock period space is performed based on the delays of various components in the library. Details and the rationale behind the clock period pruning method we use can be found in [16] . For each clock period, ALPS first derives an initial solution that meets the given sample period constraint. It then searches for a sequence of moves (as opposed to a single move) that maximizes the cumulative improvement in the quality of the solution. This improvement is measured by a quantity called Gain. In our case, since we want to optimize the area, the gain is the reduction in area. At any point, we choose the move that gives the maximum improvement in the gain. Note that, at some points, there might exist no moves which can improve the quality of the solution. In this case, we pick the move which causes the least degradation in solution quality. 
A move which temporarily degrades the solution quality might present opportunities for the application of moves which improve the overall solution quality. The various moves that we explore are described in Section 3.1. While exploring moves that affect resource sharing, we use our aliasing probability analysis method to ensure that the aliasing probability as a result of the move is very low. The working of this algorithm is illustrated through an example in Section 6.
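The first step of the clock period exploration described above, before any library-based pruning, can be sketched as follows. The function name `candidate_clock_periods` and the use of integer nanosecond values are assumptions made for this illustration; the additional pruning rules of [16] are not modeled.

```python
def candidate_clock_periods(sample_period_ns: int, fu_delays_ns: list[int]) -> list[int]:
    """Clock periods that divide the sample period evenly and are no
    shorter than the register-to-register delay of the fastest unit."""
    fastest = min(fu_delays_ns)
    return [
        period
        for period in range(fastest, sample_period_ns + 1)
        if sample_period_ns % period == 0
    ]


# Hypothetical library: delays of 20 ns, 45 ns and 80 ns; sample period 120 ns.
periods = candidate_clock_periods(120, [20, 45, 80])
```

Each candidate corresponds to a whole number of control steps per sample period (6, 5, 4, 3, 2, and 1 steps in this case).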
Moves in the Iterative Improvement Procedure
In order to explore the functional unit selection, scheduling, and resource sharing search spaces, we define three types of moves for our algorithm: class A, B, and C. At every stage of iterative improvement, the best A, B, and C moves that can be applied to the solution are determined. The best move is picked at each stage, irrespective of type. Note that the best move can have a negative gain, i.e., it could degrade overall solution quality. The ability to select and apply such moves is a key feature of our iterative improvement algorithm and enables it to escape local minima.
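The idea of applying the best available move even when its gain is negative, and then keeping the best solution seen along the resulting sequence, can be sketched generically as follows. The function `improve_once`, the abstract `neighbors` and `area` callbacks, the `visited` set used to avoid revisiting solutions, and the toy area landscape are all assumptions made for this illustration; they do not reproduce the move classes or the bookkeeping of the actual framework.

```python
def improve_once(state, area, neighbors, max_moves=10):
    """One pass of move-sequence search.

    At each step the move with the largest gain (smallest resulting area)
    is applied, even if the gain is negative. The best state seen anywhere
    along the sequence is returned.
    """
    best_state, best_area = state, area(state)
    current = state
    visited = {state}
    for _ in range(max_moves):
        candidates = [s for s in neighbors(current) if s not in visited]
        if not candidates:
            break
        current = min(candidates, key=area)
        visited.add(current)
        if area(current) < best_area:
            best_state, best_area = current, area(current)
    return best_state


# Toy landscape: solution 1 is a local minimum, solution 4 is the best one.
AREAS = [5, 3, 4, 6, 2, 7]


def toy_area(s):
    return AREAS[s]


def toy_neighbors(s):
    return [n for n in (s - 1, s + 1) if 0 <= n < len(AREAS)]


result = improve_once(1, toy_area, toy_neighbors)
```

Starting from the local minimum at solution 1, the search accepts two degrading moves (areas 4 and 6) before reaching solution 4 with area 2, which a purely greedy search would never find.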
Moves of Class A transform the data path by replacing an instance of a functional unit t1 by a functional unit t2. For example, the replacement of a ripple_carry_adder by a carry_lookahead_adder is a move of class A.
Moves of Class B either replace two functional units fu1 and fu2 by a single functional unit fu, or split a single functional unit into two separate functional units. For merging of functional units, the following three conditions should be satisfied:
C1. There exists a functional unit which can execute the operations executed by both fu1 and fu2.
C2. There exists a schedule which makes the sharing feasible. In general, sharing between two functional units might involve a reschedule of the operations mapped to fu1 and fu2.
C3. The probability of aliasing due to a fault in the merged functional unit is extremely low.
The manner in which condition C3 is checked is the topic of Section 4. If this condition is violated, then some intermediate results are appropriately secured to reduce the probability of aliasing to acceptable levels. Operations are secured so that the nodes in the original that are assigned to a functional unit are separated, by secured operations, from the nodes in the original whose counterparts in the copy are assigned to the same functional unit. The procedure for selecting which intermediate operations to secure is presented in Section 5.1. Figs. 2 and 3 illustrate the application of a move of class B. Here, two operations, one from the original and the other from the copy, which were initially assigned to different functional units, M2 and M4, are now assigned to the same functional unit, M2. It can be easily seen that the operations satisfy conditions C1 and C2. Our aliasing probability analysis techniques presented in Section 4 reveal that condition C3 is also satisfied in this case.
Moves of Class C secure a specific operation Op with the aim of easing the scheduling constraints on operations in the duplicated CDFG. Moreover, all the fanouts of Op, as well as of the corresponding operation in the copy, Op_c, are now fed by Op. While a move of class C in itself may not lead to a decrease in area, it can enable other subsequent moves of classes A and B due to the extra scheduling freedom of the operations in the transitive fanin of Op_c (including Op_c). Fig. 4 is an example application of a move of class C followed by a reschedule. In the figure, securing of operation n3 enables rescheduling of operations n1_c, n2_c, and n3_c in the copy.
Fault Security for the Controller
Fig. 7 shows the architecture of the fault-secure controller that we synthesize. An FSM specification for the controller is generated based on the schedule and assignment information after data path synthesis. Logic synthesis tools are then used to generate an implementation. The following constraints are imposed on the logic synthesis process in order to obtain a fault-secure implementation. During state encoding, all valid controller states are constrained to have the same parity [14]. The outputs are also constrained to be encoded with the same parity with the help of an extra output. The combinational logic of the controller is constrained such that no logic sharing is performed among its various output cones. The present-state lines and the controller outputs are fed to a TSC parity checker. A fault within any one of the logic cones, a single fault at any state flip-flop, or a single fault in the TSC parity checker itself is detected by the TSC parity checker. The outputs of this checker can be combined with the outputs of other equality checkers in the data path through a TSC two-rail checker. Thus, only two extra pins are required for fault tolerance purposes.
ALIASING PROBABILITY ANALYSIS
In this section, we describe our aliasing probability analysis procedure which is used to identify additional resource sharing opportunities that do not significantly compromise fault security. The basic features of our procedure are as follows:
• Aliasing analysis tries to determine whether a given resource sharing configuration can cause aliasing. Its inputs are a duplicated CDFG, a resource sharing configuration, and a faulty module. The output of the procedure is yes if the configuration can result in aliasing with a high probability and no if the probability of aliasing is extremely low.
• The primary inputs to the circuit are assumed to be independently distributed. Note that the inputs to individual functional units and multiplexers might be correlated.
• Our fault model is general. A fault can affect the functionality of a module in any arbitrary manner. If the aliasing analysis procedure returns a no, then the probability of aliasing is extremely low, independent of the manner in which the fault affects the functionality of the module.
Error Model: Consider a functional unit fu that is affected by a fault F. Let one of the operations performed by fu be $I_1\ \mathit{Op}\ I_2$, where $I_1$ and $I_2$ are variables and $\mathit{Op}$ is an operation in the duplicated CDFG. Then, the output produced by fu under the influence of the fault is modeled by the equation:
$$\mathit{Output}_{fu} = I_1\ \mathit{Op}\ I_2 + f(I_1, I_2). \qquad (1)$$
Note that the above additive error model does not forsake any generality since no assumption is made about the nature of the error function $f(I_1, I_2)$. No assumption is made about the nature of F either (i.e., we do not assume any specific fault model). We, however, do assume that the fault is combinational in nature, i.e., the effect of the fault does not change over time. If a functional unit performs several operations, the outputs of any arbitrary subset of these operations may be erroneous.
The Aliasing Equation
Our aliasing analysis procedure is based on the derivation of the aliasing equation that captures the conditions which need to be met for aliasing to occur. In this section, we explain how to derive the aliasing equation for a duplicated CDFG, given the candidate set of operations that we desire to assign to the same functional unit. A separate aliasing equation is derived for each primary output and loopout of the CDFG. The aliasing equation is derived by expressing the value of each primary output and loopout of the original and the copy in terms of the primary inputs and the error function. The error function captures the effect of a fault in a functional unit on the results of the operation(s) it performs. The effect of the faulty nodes in the original and copy CDFGs appears at their respective primary outputs or loopouts.
Effect of Single Faulty Node
In this section, only one node in the CDFG is assumed to be faulty. We evaluate the error due to the fault in that node at a primary output or loopout. The procedure is illustrated by the following example:
Example 6. Consider the duplicated CDFG in Fig. 8. Suppose the functional unit that executes addition operation n1 is faulty and n1 is the only operation that is performed by it. Assuming that all other operations are executed by fault-free functional units, we can find the value at the primary output as shown next. In the equations that follow, $n_i$ stands for the output value of operation ni and $x_k$ stands for the value of primary input k. The values at the nodes are:
$$\begin{aligned}
n_1 &= x_1 + x_2 + f(x_1, x_2) \\
n_2 &= x_1 + x_2 + x_3 + f(x_1, x_2) \\
n_3 &= x_1 + x_2 + x_6 + f(x_1, x_2) \\
n_4 &= x_5\,(x_1 + x_2 + x_6 + f(x_1, x_2)) \\
n_5 &= (x_1 + x_2 + f(x_1, x_2))(x_1 + x_2 + x_3 + f(x_1, x_2)) \\
n_6 &= (x_1 + x_2 + f(x_1, x_2))(x_1 + x_2 + x_3 + f(x_1, x_2)) + x_4 \\
n_7 &= (x_1 + x_2 + f(x_1, x_2))(x_1 + x_2 + x_3 + f(x_1, x_2)) + x_4 + x_5\,(x_1 + x_2 + x_6 + f(x_1, x_2)).
\end{aligned}$$
The error at the primary output can be calculated by subtracting the expression for the error-free output from that for the erroneous output. In this case, the error turns out to be $f^2(x_1, x_2) + f(x_1, x_2)(2x_1 + 2x_2 + x_3 + x_5)$.
The error at the primary output due to a faulty node ni may not depend on the edges that feed paths from the erroneous operation to the output at add or subtract operations. This is because these terms may get cancelled out when subtracting the expression for the error-free output from that for the erroneous output. For example, consider the error at n7 in Example 6. The error does not depend on primary input 6, which feeds the add operation, n3, on the path from erroneous operation n1 to the output of operation n7.
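The error expression of Example 6 can be checked numerically. The sketch below assumes the CDFG structure implied by the node values of the example, with $x_k$ denoting the value of primary input k: n2 = n1 + x3, n3 = n1 + x6, n4 = x5 * n3, n5 = n1 * n2, n6 = n5 + x4, and n7 = n6 + n4. The function names and the particular error function are hypothetical, and unbounded integer arithmetic is used instead of a fixed bit-width.

```python
import random


def example6_output(x, f):
    """Value at n7 when the adder performing n1 adds the error f(x1, x2)."""
    x1, x2, x3, x4, x5, x6 = x
    n1 = x1 + x2 + f(x1, x2)   # the only operation on the faulty adder
    n2 = n1 + x3
    n3 = n1 + x6
    n4 = x5 * n3
    n5 = n1 * n2
    n6 = n5 + x4
    return n6 + n4             # n7


def predicted_error(x, f):
    """Closed-form error at n7: f^2 + f * (2*x1 + 2*x2 + x3 + x5)."""
    x1, x2, x3, x4, x5, x6 = x
    e = f(x1, x2)
    return e * e + e * (2 * x1 + 2 * x2 + x3 + x5)


def no_error(a, b):
    return 0


def some_error(a, b):
    return (a ^ b) + 1         # an arbitrary, data-dependent error


rng = random.Random(0)
samples = [tuple(rng.randrange(256) for _ in range(6)) for _ in range(100)]
all_match = all(
    example6_output(x, some_error) - example6_output(x, no_error)
    == predicted_error(x, some_error)
    for x in samples
)
```

The check also makes the cancellation visible: x4 and x6 enter only through additions on the path to n7, so they drop out of the difference between the erroneous and error-free outputs.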
Effect of Multiple Faulty Nodes
In this section, we derive an expression for the error at a primary output or loopout when multiple operations are affected by a fault in the functional unit that they are assigned to. The calculation is illustrated by the following example:
Example 7. Consider the example CDFG shown in Fig. 9. In the CDFG, the faulty addition operations are shaded. We evaluate the error at the primary outputs, $n_9$ and $n_9'$. The outputs of the faulty nodes are:
The expressions for the errors at the primary outputs are:
\[ \text{error at } n_9 = f(I_4, I_5)\,(1 + I_6 + I_7) + f(I_6, I_7)\,(I_4 + I_5 + 1) + f(I_4, I_5)\, f(I_6, I_7) \]
and
\[ \text{error at } n_9' = I_9\, f(I_1, I_2). \]
Aliasing would occur if these two errors are equal.
Observation 2. The error at a primary output or loopout can be written as a polynomial in the error functions of the faulty operations and the primary input variables. Mathematically, the expression can be written as follows:
\[ \text{Error} = \sum_{i=1}^{M} T_i \]
where $M$ is the number of terms and $T_i$ is the $i$th product term, that is, a product of some primary inputs and some error functions.
For the last example, for the error at $n_9$, $T_1 = 1 \cdot f(I_4, I_5)$, $T_2 = I_6\, f(I_4, I_5)$, and so on.
Summary
Now that we can find the error at a primary output or loopout in terms of the errors at the faulty nodes, we can equate the error at each such output/loopout in the original with the error at the corresponding output/loopout in the copy to obtain the aliasing equation as follows:
\[ \text{Error due to nodes in } S' \text{ at the output} \;-\; \text{Error due to nodes in } S'' \text{ at the output} = 0, \tag{3} \]
where $S'$ is a set of nodes in the original and $S''$ is the set of nodes in the copy that are assigned to the faulty functional unit.
Aliasing Conditions
Our aliasing analysis procedure consists of deriving the aliasing equation and then analyzing it to draw conclusions about the probability of it being satisfied. Exactly evaluating the probability of aliasing requires information about the nature of the error functions (which in turn depends on the fault model) and the joint probability density function (PDF) of the input variables. In general, such information may not be available for several applications and, even when available, the computational requirements for performing an exact analysis may be prohibitive (e.g., if a multiple stuck-at fault model is assumed). Hence, we have developed methods to compute an upper bound for the aliasing probability. We present two conditions (Condition 1 and Condition 2) that can be used as tests to check whether the probability of aliasing is low. If either test succeeds, we can share a functional unit between the candidate operations. Initially, we assume that all primary input variables of the CDFG are uniformly distributed over their entire range and uncorrelated. Later, we show how arbitrary distributions can be considered. A generalization of the aliasing conditions concludes this section. We discuss the two conditions next.
Condition 1
In this subsection, we describe a condition, called Condition 1, under which sharing can be performed with a very low aliasing probability. The test for this condition hinges on the following result:
Result 
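Consider again the duplicated CDFG of Fig. 9. The aliasing equation for this CDFG when the shaded operations are faulty is as follows:
\[ I_9 f(I_1, I_2) - \bigl[ f(I_4, I_5) + I_6 f(I_4, I_5) + I_7 f(I_4, I_5) + f(I_6, I_7) + I_4 f(I_6, I_7) + I_5 f(I_6, I_7) + f(I_4, I_5)\, f(I_6, I_7) \bigr] = 0. \]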
In this case, we can see that the number of product terms, $M$, is equal to 8. Consider a primary input $I_i$ that is a factor of a proper subset of the product terms and does not appear in the argument of any error function. We rewrite the aliasing equation, abstracting all the terms other than $I_i$ into coefficients $C_j$. In other words, we reduce the aliasing equation to a polynomial in $I_i$ as given below.
\[ C_k I_i^{k} + C_{k-1} I_i^{k-1} + \cdots + C_1 I_i + C_0 = 0. \tag{7} \]
If at least one coefficient is nonzero, then the probability that a randomly chosen value of $I_i$ will solve the equation is upper bounded by $k/2^x$, where $x$ is the bit-width of the data path and $k$ is the degree with respect to $I_i$ of the aliasing equation. (This is because (7), being a polynomial in $I_i$, has at most $k$ roots. The probability that a randomly chosen integer in the range $[0, 2^x - 1]$ will satisfy the equation is upper bounded by $k/2^x$. The actual probability could be much less because (7) may not have any integer roots, in which case the probability of the equation being satisfied by $I_i$ is 0.) For a data path which has a bit-width of 32, for example, this is negligible. The above analysis has shown that the aliasing probability is very small if at least one of the coefficients $C_j$ is nonzero. We next analyze the aliasing probability when the assumption (at least one of the coefficients is nonzero) does not hold. To see what this means, let us apply the condition to our running example. In the following analysis, we refer to the probability of aliasing as $P_{\text{aliasing}}$.
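When all coefficients vanish in the running example, with $I_9$ as the chosen primary input, the aliasing equation splits into two subequations: the product term containing $I_9$ must be 0, and the remaining seven product terms must sum to 0. Both of these subequations must be satisfied for aliasing to occur. The aliasing analysis procedure is recursively called on these two subequations. Since both must hold simultaneously, the $P_{\text{aliasing}}$ of each subequation bounds the $P_{\text{aliasing}}$ of the whole equation.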
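The root-counting argument can be illustrated with a small brute-force experiment. This is a sketch under our own assumptions: the polynomial is evaluated over the integers (wrap-around in a fixed-width data path is ignored), the bit-width is kept small so that exhaustive enumeration is feasible, and the helper names `count_roots` and `aliasing_bound` are hypothetical.

```python
import random

def count_roots(coeffs, x):
    """Count v in [0, 2**x - 1] with sum(coeffs[j] * v**j) == 0.

    coeffs[j] is the coefficient of v**j (integer arithmetic, no overflow).
    """
    return sum(
        1
        for v in range(2**x)
        if sum(c * v**j for j, c in enumerate(coeffs)) == 0
    )

def aliasing_bound(k, x):
    """Upper bound k / 2**x on the probability that a random x-bit value is a root."""
    return k / 2**x

# (v - 3)(v - 5) = v^2 - 8v + 15 has exactly two roots in an 8-bit range.
roots = count_roots([15, -8, 1], 8)
assert roots / 2**8 == aliasing_bound(2, 8)

# Random polynomials with a nonzero leading coefficient never exceed the bound.
rng = random.Random(1)
for _ in range(200):
    k = rng.randint(1, 4)
    coeffs = [rng.randint(-20, 20) for _ in range(k)] + [rng.choice([-2, -1, 1, 2])]
    assert count_roots(coeffs, 6) / 2**6 <= aliasing_bound(k, 6)
```

For a 32-bit data path the same bound gives `aliasing_bound(k, 32)`, which for small `k` is on the order of 1e-10 to 1e-9.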
In general, the primary input $I_i$ can be looked upon as inducing a partition (see footnote 2) on the product terms of the aliasing equation. One block of the partition consists of product terms which depend on $I_i$ and in the other block are product terms which do not depend on $I_i$. The product of all these partitions gives a set of aliasing subequations. If each subequation consists of a single product term, then, for the aliasing equation to have a solution, each of the product terms in the aliasing equation must be zero. Note that each product term in the aliasing equation represents an additive error at a primary output or loopout. Therefore, if all the product terms are zero, it implies that the error at the primary output or loopout is zero and aliasing does not occur.
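The partition product used here can be sketched in a few lines. The representation (a partition as a list of blocks, each block a list of elements) and the function names are our own choices for illustration; blocks of the product are simply the nonempty pairwise intersections of blocks.

```python
def partition_product(p1, p2):
    """Product of two partitions of the same universal set.

    Two elements share a block of the product exactly when they share a
    block in both p1 and p2.
    """
    blocks = [set(a) & set(b) for a in p1 for b in p2]
    return sorted((sorted(b) for b in blocks if b), key=lambda b: b[0])

def induced_partition(term_inputs, i):
    """Partition of product-term indices induced by primary input i.

    term_inputs[t] is the set of primary inputs that are factors of term t.
    Empty blocks are dropped.
    """
    dependent = [t for t, inputs in enumerate(term_inputs) if i in inputs]
    independent = [t for t, inputs in enumerate(term_inputs) if i not in inputs]
    return [block for block in (dependent, independent) if block]

# The example from the footnote: the product is the zero partition.
print(partition_product([[1, 2], [3, 4]], [[1, 3], [2, 4]]))
```

Taking the product of the partitions induced by every eligible primary input yields the blocks that become the aliasing subequations.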
We illustrate recursive testing of the aliasing condition using the following example:
Example 10. Consider the aliasing equation given below.
\[ I_1 I_2 I_3\, f(I_1, I_2) + I_1\, f(I_2, I_1) + I_4\, f(I_2, I_1 + I_2) = 0. \]
2. A partition, $\pi$, is a structure which is defined on a universal set $U = \{e_1, e_2, \ldots, e_r\}$ and consists of several disjoint subsets $B_1, B_2, \ldots, B_m$ such that $\bigcup_{i=1}^{m} B_i = U$. The product of two partitions, $\pi_1$ and $\pi_2$, which have the same universal set is another partition $\pi_3$ which has the following property: Disjoint subsets $A_1, \ldots, A_n$ which comprise $\pi_3$ are such that if any two elements $e_i, e_j \in U$ are in the same subset $A_k$ of $\pi_3$, then the two elements are in the same subset in both $\pi_1$ and $\pi_2$. By definition, Zero Partition $= \{\{e_1\}, \{e_2\}, \ldots, \{e_r\}\}$ and Identity Partition $= \{\{e_1, e_2, \ldots, e_r\}\}$. For example, if $U = \{1, 2, 3, 4\}$, $\pi_1 = \{\{1, 2\}, \{3, 4\}\}$, and $\pi_2 = \{\{1, 3\}, \{2, 4\}\}$, then the product of $\pi_1$ and $\pi_2$ is $\{\{1\}, \{2\}, \{3\}, \{4\}\}$, which is the Zero Partition.
Here, $T_1 = I_1 I_2 I_3\, f(I_1, I_2)$, $T_2 = I_1\, f(I_2, I_1)$, and $T_3 = I_4\, f(I_2, I_1 + I_2)$, where $I_1$, $I_2$, $I_3$, and $I_4$ are primary inputs.
$T_1$ is the only product term that depends on $I_3$. Moreover, the error functions do not depend on $I_3$. Hence, the aliasing equation splits into the following two subequations: $I_1 I_2\, f(I_1, I_2) = 0$ and $I_1\, f(I_2, I_1) + I_4\, f(I_2, I_2 + I_1) = 0$.
It turns out that we can apply Condition 1 to the second subequation. Here, we see that the second term, $T_3$, has the primary input $I_4$ unique to it. So, this induces a partition on the subequation which leaves us with three distinct subequations: $I_1 I_2\, f(I_1, I_2) = 0$, $I_1\, f(I_2, I_1) = 0$, and $f(I_2, I_2 + I_1) = 0$. If a product term is 0, then one of its constituent factors must be zero. We already know that a product term is a product of some primary inputs and some error functions. The probability of a primary input being zero is $1/2^x$. This implies that, given that a product term is 0, with a high probability one of the error functions in it must be zero. (If there are no primary inputs in the product term, then one of the error functions which constitute the product term is guaranteed to be zero.) From the above, it follows that all three error functions in our example are zero with a very high probability. However, this implies that all primary outputs assume their error-free values, in which case, by definition, aliasing cannot occur. Thus, we can conclude that $P_{\text{aliasing}}$ is of the order of $1/2^x$.
The pseudocode for our procedure that tests for Condition 1 is given in Fig. 10 . In this code, the support set of an error function is defined as the set of all primary inputs that it depends on.
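As a rough illustration of the recursive test, the sketch below splits an aliasing equation on any primary input that multiplies a proper subset of the product terms and lies outside every error function's support set. It is not the procedure of Fig. 10; the data representation (each product term as a pair of an input-factor set and a list of support sets) and the name `condition1_holds` are assumptions made for this example, and powers of inputs are ignored.

```python
def condition1_holds(terms):
    """Return True if recursive splitting leaves only single-term subequations.

    terms: list of (input_factors, supports) pairs, one per product term.
      input_factors: set of primary-input indices that multiply the term.
      supports: list of sets; the support set of each error function in the term.
    """
    if len(terms) < 2:
        return True
    in_support = set()
    factors = set()
    for input_factors, supports in terms:
        factors |= set(input_factors)
        for support in supports:
            in_support |= set(support)
    # Candidate inputs: appear as factors but in no error-function argument.
    for i in sorted(factors - in_support):
        dependent = [t for t in terms if i in t[0]]
        independent = [t for t in terms if i not in t[0]]
        if dependent and independent:  # i is a factor of a proper subset
            return condition1_holds(dependent) and condition1_holds(independent)
    return False

# Example 10: T1 = I1*I2*I3*f(I1,I2), T2 = I1*f(I2,I1), T3 = I4*f(I2,I1+I2).
example10 = [({1, 2, 3}, [{1, 2}]), ({1}, [{1, 2}]), ({4}, [{1, 2}])]
print(condition1_holds(example10))
```

Returning after the first usable split is sufficient in this sketch because an input that is eligible for the whole equation remains eligible in every subequation, so the order of splits does not change the final set of subequations.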
To derive the above result, we assumed that all the primary inputs are uniformly distributed. We now show how to evaluate $P_{\text{aliasing}}$ for nonuniform distributions. Let primary input $I_i$ satisfy Condition 1, i.e., it does not appear in the support of any error function and influences the coefficients of a subset of the error functions. Suppose $I_i$ is nonuniformly distributed and the most probable value of $I_i$ has a probability of $\mathit{Peak}_i$. Aliasing occurs when a randomly chosen value of $I_i$ satisfies the aliasing equation.
The number of solutions to the aliasing equation is bounded by $k$, where $k$ is the degree of $I_i$ in the aliasing equation. Hence, $P_{\text{aliasing}}$ is bounded by $k \times \mathit{Peak}_i$, which corresponds to the case when all roots of the equation have a probability of $\mathit{Peak}_i$.
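The nonuniform bound is straightforward to evaluate once the peak probability is known. The sketch below estimates the peak from a list of observed input values, which is our own assumption about how such a distribution might be obtained; the function names are hypothetical.

```python
from collections import Counter

def peak_probability(samples):
    """Empirical probability of the most frequent value among the samples."""
    counts = Counter(samples)
    return max(counts.values()) / len(samples)

def aliasing_bound_nonuniform(k, peak):
    """Bound k * Peak on the aliasing probability, capped at 1."""
    return min(1.0, k * peak)

# A skewed input where the value 0 occurs half of the time.
peak = peak_probability([0, 0, 1, 2])
print(aliasing_bound_nonuniform(1, peak))

# A uniform 32-bit input recovers the k / 2**x bound.
print(aliasing_bound_nonuniform(3, 1 / 2**32))
```

A strongly peaked input therefore weakens the guarantee: with a peak of 0.5, a degree-2 equation already gives a vacuous bound of 1.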
Graph-theoretically, Condition 1 can be stated as follows: If a primary input, which is not in the transitive fanin of any of the faulty nodes, feeds a path from any faulty node to a primary output at a multiply operation, then the aliasing equation can be split into subequations. For instance, in the CDFG in Fig. 9, $I_9$ does not appear in the transitive fanin of any error function, but feeds the path from $n_1$ to Out at operation $n_7$. Whatever values the error functions take, satisfaction of the aliasing equation requires, in some sense, the knowledge of $I_9$, which the faulty nodes do not possess.
Condition 2
Our second test for aliasing is based on performing a set of variable transformations to recast the aliasing equation into a form where it is more amenable to an analysis similar to that for Condition 1. Condition 2 uses the following result:
Result 2: If there exists any pair of primary inputs, $(I_m, I_n)$, in the aliasing equation which satisfies the following conditions: 1) If the argument of any error function in the aliasing equation depends on either of them, the dependence is by means of a common subexpression $g$, involving $I_m$ and $I_n$, and 2) $I_m$ influences the coefficients of some of the error functions in the aliasing equation, but $I_n$ does not, then the following holds: Satisfaction of the aliasing equation implies, with a very high probability, that the terms in the aliasing equation in which coefficients of error functions depend on $I_m$ must sum to 0 and the rest of the terms in the aliasing equation must also sum to 0.
If Result 2 can be recursively utilized to prove that each term of the aliasing equation is 0, Condition 2 is said to be satisfied. The rest of this section is devoted to a proof of Result 2. The algorithm to check for Condition 2 is shown in Fig. 11.
Each of the product terms in the aliasing equation given in (6) can be written in the following form:
\[ T_i = \Bigl[ \prod_{j} I_j^{\,p_j} \prod_{k} f_k^{\,p_k} \Bigr]_i \]
The product in $j$ is taken over all the primary input factors in a product term, while the product in $k$ is taken over all error function factors. The variables $p_j$ and $p_k$ are the powers of the primary input $I_j$ and error function $f_k$, respectively. (The subscript $i$ alongside the brackets indicates that the corresponding product terms are derived from $T_i$.) $\mathit{SupportSet}_i$ is defined to be the set of primary inputs that occur in $\bigl[\prod_j I_j^{\,p_j}\bigr]_i$. Consider two distinct primary inputs, $I_m$ and $I_n$, such that, whenever $I_m$ and $I_n$ appear in the argument of any error function, they are part of a common subexpression, $g(I_m, I_n)$. Moreover, suppose that $I_m$ is part of $\mathit{SupportSet}_i$ for some $i$. In addition, we also require that $I_n$ does not belong to $\mathit{SupportSet}_i$ for any $i$. Under the above assumptions, we prove that variable $I_m$ splits the aliasing equation into two aliasing subequations, both of which must be satisfied for aliasing to take place.
As an illustration, consider the aliasing equation
\[ I_1 I_2\, f(I_2 I_3, I_4) - f(I_2 I_3, I_1) = 0. \]
In this equation, $M = 2$, $T_1 = I_1 I_2\, f(I_2 I_3, I_4)$, $T_2 = -f(I_2 I_3, I_1)$, and $\mathit{SupportSet}_1 = \{I_1, I_2\}$. We can see that $(I_2, I_3)$ is a possible candidate for $(I_m, I_n)$ and $g(I_2, I_3) = I_2 I_3$.
Suppose that we are given arbitrary values for: 1) all primary inputs except $I_m$ and $I_n$, and 2) $g$.
Under the above constraints, let $v$ be a permissible value for $g$. The aliasing equation reduces to a polynomial in $I_m$ and can be solved to obtain possible values of $I_m$ for which aliasing occurs. Assuming that at least one of the coefficients in this polynomial is nonzero, the number of solutions is bounded by $p_m$, the degree of the polynomial in $I_m$ (we later address the case when all the coefficients of the polynomial are zero). For each of the $p_m$ values that $I_m$ can take, $g = v$ becomes a polynomial equation in $I_n$, which can have at most $p_n$ solutions, where $p_n$ is the degree of the polynomial in $I_n$. Thus, overall, there are at most $p_m \times p_n$ solutions to the aliasing equation for a given value $v$ for $g$. In the above example, $I_m = I_2$ and the aliasing equation reduces to the linear equation $I_2 \times C_1 - C_0 = 0$. In this case, given a value of $g = I_2 I_3$, there is at most one pair of values for $(I_2, I_3)$ that satisfies the aliasing equation.
Note that the above arguments were made under the condition that $g$ was fixed to a value $v$. It is possible that there might be several assignments to $I_m$ and $I_n$ that result in $g$ assuming the value $v$. At most $p_m \times p_n$ of these result in aliasing. Therefore, the probability of aliasing, given that $g = v$, is at most $p_m \times p_n / (\text{number of solutions to } g = v)$.
Since in the above analysis we assumed a single value $v$ for $g$, summing the conditional probability over all possible values of $g$ will result in the probability of aliasing. Let us suppose that $N_v$ represents the number of solutions to the equation $g(I_m, I_n) = v$. The probability of aliasing can then be derived as follows:
(12)
Note that $\mathit{Peak}_m$ and $\mathit{Peak}_n$ evaluate to $1/2^x$ for the case of a uniform primary input distribution.
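The chain of bounds described in the preceding paragraphs can be written out as follows. This is a sketch under stated assumptions rather than a reproduction of (12): $N_v$ is the number of pairs $(I_m, I_n)$ with $g(I_m, I_n) = v$, $V_g$ (our notation) is the number of distinct values that $g$ can take, and the two inputs are assumed independent, so that $P(g = v) \le N_v \, \mathit{Peak}_m \, \mathit{Peak}_n$.

```latex
\begin{align*}
P_{\text{aliasing}}
  &= \sum_{v} P(\text{aliasing} \mid g = v)\, P(g = v) \\
  &\le \sum_{v} \frac{p_m \, p_n}{N_v} \cdot N_v \, \mathit{Peak}_m \, \mathit{Peak}_n \\
  &= V_g \; p_m \, p_n \, \mathit{Peak}_m \, \mathit{Peak}_n .
\end{align*}
```

For uniformly distributed $x$-bit inputs, $\mathit{Peak}_m = \mathit{Peak}_n = 1/2^x$; if $g$ is itself an $x$-bit value, so that $V_g \le 2^x$, the bound becomes $p_m \, p_n / 2^x$.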
Condition 2 is more difficult to test than Condition 1, but is common in the case of CDFGs with heavy reconvergence. Condition 2 can be stated as follows: If there exists a node $n_g$ in the CDFG with two primary inputs, $I_m$ and $I_n$, in its transitive fanin such that all paths from $I_m$ and $I_n$ to any faulty node pass through $n_g$, and if exactly one of the two primary inputs feeds a path from a faulty node to a primary output at a multiply operation, then the aliasing equation can be split into aliasing subequations. In the CDFG in Fig. 12, node $n_1$ is $n_g$. Condition 2 can be easily understood as a variable transformation wherein the fanout edge of $n_g$ becomes a primary input, taking the place of $I_n$. Now, Condition 1 can be applied to the CDFG obtained because $I_m$ does not appear in the transitive fanin of any faulty node.
Generalization of Aliasing Conditions
The aliasing conditions were proven in the following manner: We tried to see if the outputs of the faulty operations could provide us with some information about the inputs to the operations and, hence, about the primary inputs. With this information, we tried to see how hard it was to "guess" correct values for primary inputs so as to make aliasing occur. The extent of our difficulty provided us with an upper bound on the probability of aliasing. The similarity of the reasoning in the two cases suggests a more general aliasing condition of which Conditions 1 and 2 are special cases. In this section, we explore this generalization. The generalized aliasing condition is stated as follows:
Result 3: If there exists any set of primary inputs, $\mathcal{P} = \{I_1, I_2, \ldots, I_p\}$, in the aliasing equation, which satisfies the following conditions: 1) If the argument of any error function in the aliasing equation depends on any member of the set, the dependence is by means of a common subexpression $g_{\mathcal{P}}$, involving all elements of $\mathcal{P}$, and 2) the coefficients of the error functions in the aliasing equation are influenced only by a proper subset, $\mathcal{P}' = \{I_1, I_2, \ldots, I_j\}$, of $\mathcal{P}$, then the following holds: Satisfaction of the aliasing equation implies, with a very high probability, that the terms in the aliasing equation in which coefficients of error functions depend on members of $\mathcal{P}'$ must sum to 0 and the rest of the terms in the aliasing equation must also sum to 0.
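Proof. Let $A$ denote the event that aliasing occurs. The probability of aliasing is given by
\[ P(A) = \sum_{i} P(A \mid g_{\mathcal{P}} = i)\, P(g_{\mathcal{P}} = i). \tag{13} \]
Let $\mathcal{P}'' = \mathcal{P} - \mathcal{P}'$, i.e., the set of elements in $\mathcal{P}$ which do not belong to $\mathcal{P}'$. Let $m_i$ denote the number of $p$-tuples $\{v_1, v_2, \ldots, v_p\}$ such that $g_{\mathcal{P}}(v_1, v_2, \ldots, v_p) = i$. Then, the probability that $g_{\mathcal{P}} = i$ is given by
\[ P(g_{\mathcal{P}} = i) = m_i / 2^{xp}. \tag{14} \]
The probability of aliasing can be upper bounded in the following manner: There exist at most $2^{x(j-1)} \times \mathit{Max\_deg}(\mathcal{P}') \times 2^{x(p-j-1)} \times \mathit{Max\_deg}(\mathcal{P}'')$ solutions to the aliasing equation, given that $g_{\mathcal{P}} = i$, where $\mathit{Max\_deg}(\mathcal{P}')$ is the maximum degree of any member of $\mathcal{P}'$ in the aliasing equation and $\mathit{Max\_deg}(\mathcal{P}'')$ is the maximum degree of any member of $\mathcal{P}''$ in $g_{\mathcal{P}}$. This is because, assuming arbitrary values for all primary inputs other than $I_1, I_2, \ldots,$ and $I_p$, and fixing values for the outputs of faulty operations, the aliasing equation depends only upon $I_1, I_2, \ldots,$ and $I_j$. Therefore, there can exist at most $2^{x(j-1)} \times \mathit{Max\_deg}(\mathcal{P}')$ solutions to the aliasing equation which are $j$-tuples of the form $\{I_1, I_2, \ldots, I_j\}$. For each unique specification of these $j$ variables, there are $2^{x(p-j-1)} \times \mathit{Max\_deg}(\mathcal{P}'')$ ways of specifying $\{I_{j+1}, I_{j+2}, \ldots, I_p\}$ so that both the aliasing equation and the condition on $g_{\mathcal{P}}$ are satisfied. Since there are $m_i$ solutions to the equation $g_{\mathcal{P}} = i$, we have:
\[ P(A \mid g_{\mathcal{P}} = i) \le \frac{2^{x(j-1)} \times \mathit{Max\_deg}(\mathcal{P}') \times 2^{x(p-j-1)} \times \mathit{Max\_deg}(\mathcal{P}'')}{m_i}. \tag{15} \]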
Then from (13), (14) , and (15), we obtain:
The above analysis assumed the primary inputs of the CDFG to be uniformly distributed. However, this restriction can be easily relaxed, as in the case of Conditions 1 and 2. Let us assume that the primary inputs are independent, but nonuniformly distributed, and the probability of the most probable value of primary input $I_i$ is $\mathit{Peak}_i$. As in the uniformly distributed case, our starting point is the equation for the probability of aliasing. It follows that
Therefore,
(17)
Note that (17) reduces to (12) if primary inputs are uniformly distributed. $\square$
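For the uniformly distributed case, the way the total-probability decomposition, the probability of $g_{\mathcal{P}} = i$, and the conditional solution count combine can be sketched as follows. This is our own summary of the argument and not a reproduction of the original equations; $D'$ and $D''$ abbreviate $\mathit{Max\_deg}(\mathcal{P}')$ and $\mathit{Max\_deg}(\mathcal{P}'')$, and the last step additionally assumes that $g_{\mathcal{P}}$ takes at most $2^x$ distinct values.

```latex
\begin{align*}
P(A) &= \sum_{i} P(A \mid g_{\mathcal{P}} = i)\, P(g_{\mathcal{P}} = i) \\
     &\le \sum_{i} \frac{2^{x(j-1)} D' \cdot 2^{x(p-j-1)} D''}{m_i} \cdot \frac{m_i}{2^{xp}} \\
     &= \sum_{i} \frac{D' D''}{2^{2x}}
      \;\le\; \frac{D' D''}{2^{x}} .
\end{align*}
```

The factor $m_i$ cancels, so the bound does not depend on how the values of $g_{\mathcal{P}}$ are distributed, only on the two degrees and the bit-width.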
Summary
The generalized aliasing condition is more comprehensive than Conditions 1 and 2, but is too expensive to implement in the inner loop of a synthesis algorithm. However, it could be used by a method that does static analysis of aliasing probabilities for selected configurations. The two tests which have been developed are not necessary conditions for aliasing to be extremely improbable. However, they are easy to implement and detect a significant fraction of cases where aliasing is extremely unlikely. In many of the cases in which the two tests failed, we could find error functions f which corresponded to single stuck-at faults which would result in aliasing with a high probability. Conditions 1 and 2 test for the possibility of aliasing at a specific primary output or loopout. A fault in a single functional unit may result in aliasing at multiple outputs and a very low probability of aliasing at one output does not imply a very low probability of aliasing at other outputs. This is because a faulty operation need not cause an error to occur at the output of every operation it feeds. Hence, we have to apply the two tests to all outputs that are fed by operations mapped to the functional unit under consideration. Only if all primary outputs and loopouts pass the test can the candidate operations be mapped to the same functional unit.
Faults in Registers and Multiplexers
In this section, we discuss how register and multiplexer faults are taken care of.
Register faults: The previous section described aliasing analysis procedures for faults in functional units. However, it can be straightforwardly extended to cover register faults. Here, we would like to determine whether a set of variables in the original CDFG and another set of variables in the copy can share a register. As in the case of functional units, we express the erroneous value of a variable stored in the register as $v + f(v)$, where $v$ is the fault-free value and $f$ is an arbitrary error function. Any subset of variables assigned to a faulty register can get corrupted due to the fault. Therefore, some of the edges in the CDFG are faulty (as opposed to nodes in the case of functional unit faults). The procedure for deriving and analyzing the aliasing equation, however, remains similar to the case of faulty functional units.
A register which holds a variable $v$ in the original cannot store the corresponding variable $v'$ in the copy. If it does, then a fault in the register could cause identical errors in the value of $v$ in both the original and the copy. This error cannot be detected by a comparator. This is entirely analogous to the condition that nodes that perform the same operation in the original and the copy cannot map to the same functional unit.
Multiplexer faults: Faults in multiplexers can be classified into those that affect data selection and those that do not affect data selection. Formally, we can characterize them as follows: Consider an $n$-to-1 multiplexer with data inputs $I_1, I_2, \ldots, I_n$ and output $O$. In the first case, we can write the erroneous value at the multiplexer output as:
In the second case, the error is: $O_{err} = f(\text{selected data input})$. Our analysis techniques handle the second class of faults and a subset of the first class. We assume that faults that affect data selection could make the multiplexer select a wrong input, but the output must be a value present at one of the multiplexer inputs. The multiplexer is not permitted to select an incorrect input and alter it.
We use a point-to-point interconnect model. A fault that does not affect data selection is equivalent (from the point of view of aliasing analysis) to a fault in the register or the functional unit that feeds the multiplexer. We make the multiplexers secure against a restricted subclass of the other class of faults, namely, those faults that cause the multiplexer to select a data input that is different from the one that it is supposed to select. It can be shown that a multiplexer routing error causes the controller/data path circuit to behave in a manner that can be modeled by deleting some nodes and edges from the CDFG and adding different nodes and edges in their place. The aliasing equation is derived by equating the expressions for primary outputs and loopouts in the original and modified CDFGs. Analysis of the aliasing equation is then performed as explained for the case of functional units. Note that, in the new CDFGs that we obtain, the original and the copy potentially compute different functions of the primary inputs.
To see this, we observe that if the multiplexer feeds a register, at some time, the multiplexer chooses the output of the wrong functional unit to feed the register. Likewise, it chooses the input for a functional unit from the wrong variable. To evaluate the probability that this will result in a detectable error for an arbitrary input, we need information about functional unit and register assignment. Since it is expensive to do the analysis at every stage of the iterative improvement process, we perform this analysis after synthesis. If the multiplexer has a routing error, the RTL circuit computes an output for a CDFG which is different from the output of the actual CDFG. Once the register and functional unit assignments are known, we can determine the actual function which is computed by the RTL circuit with a multiplexer selection error. Since we assumed a data-dominated CDFG, the output is still an arithmetic sum-of-products expression in the inputs. We would like to see if the outputs of the original and the copy can give identical wrong results. To do this, we note that the aliasing equation would take the following form:
where $T_i$ is a product of some primary inputs. Also suppose that $p$ is the maximum degree of any of the variables in the equation (i.e., $p$ is the highest exponent of any variable in any term of the equation). To evaluate the probability of this equation being satisfied by an arbitrary choice of primary inputs, we make the following observation:
Observation 3. Any equation of form (18) in $n$ variables, with each having a range of $\{0, 1, \ldots, 2^x - 1\}$, which is not an identity, will have at most $p \cdot 2^{x(n-1)}$ solutions. To see this, we can simply fix the values of all but one of the variables and solve for the remaining one variable. Since this equation has a degree of at most $p$, the number of solutions is less than or equal to $p$. Since there are $2^{x(n-1)}$ ways of fixing $n - 1$ variables, the number of solutions is at most $p \cdot 2^{x(n-1)}$.
Observation 4. The probability that a randomly chosen primary input pattern will satisfy this equation is at most $p/2^x$. This can easily be seen to be true because the total sample space has a cardinality of $2^{xn}$. Therefore, the probability in question is at most $p \cdot 2^{x(n-1)} / 2^{xn}$, which is $p/2^x$.
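The solution count can be explored by exhaustive enumeration for small bit-widths. The sketch below is ours and makes one deliberate substitution: it uses the total degree $d$ of the equation in the bound $d \cdot 2^{x(n-1)}$, which is the form guaranteed by the Schwartz-Zippel lemma. With the per-variable degree, an equation such as $ab = 0$ has more solutions than $2^{x(n-1)}$ because fixing $a = 0$ leaves an identity in $b$. The function names are hypothetical.

```python
from itertools import product

def count_solutions(poly, n, x):
    """Count points in [0, 2**x - 1]^n at which poly evaluates to 0."""
    return sum(1 for point in product(range(2**x), repeat=n) if poly(*point) == 0)

def solution_bound(total_degree, n, x):
    """Schwartz-Zippel style bound: total_degree * 2**(x * (n - 1))."""
    return total_degree * 2 ** (x * (n - 1))

x, n = 3, 2
# a*b - 6 = 0 has few solutions in a 3-bit range.
few = count_solutions(lambda a, b: a * b - 6, n, x)
# a*b = 0 is satisfied whenever either variable is 0.
many = count_solutions(lambda a, b: a * b, n, x)

assert few <= solution_bound(2, n, x)
assert many <= solution_bound(2, n, x)
print(few, many, solution_bound(2, n, x))
```

Dividing the bound by the size of the sample space, `2 ** (x * n)`, gives `total_degree / 2 ** x`, which stays negligible for a 32-bit data path.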
Therefore, to check whether a specific data selection error creates a high probability of aliasing, we merely have to check in the RTL circuit if the original and the copy CDFGs compute the same function. We can easily check this by simulating the faulty CDFGs for a small number of input patterns. If the value of the output differs for even one input pattern, we can use the earlier observation to guarantee that the probability of aliasing is extremely low. As we mentioned earlier, we do this after the register and functional unit assignment to save CPU time. If the probability of aliasing turns out to be high, we split the functional unit into two to take care of the problem. However, in practice, it was found that multiplexer data selection errors did not pose a problem and we did not have to add hardware after synthesis.
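The post-synthesis check amounts to evaluating the two faulty CDFGs on a handful of input patterns and stopping at the first disagreement. A minimal sketch follows, assuming each faulty CDFG is available as a Python callable over integer inputs; the name `outputs_differ`, the default number of trials, and the fixed seed are our own choices.

```python
import random

def outputs_differ(original, copy, num_inputs, x, trials=16, seed=0):
    """Return True if the two functions disagree on any sampled x-bit pattern."""
    rng = random.Random(seed)
    for _ in range(trials):
        pattern = [rng.randrange(2**x) for _ in range(num_inputs)]
        if original(*pattern) != copy(*pattern):
            return True   # not the same function: aliasing probability is low
    return False          # no difference found: treat as a potential aliasing risk

# Hypothetical faulty CDFGs after a multiplexer selection error.
faulty_original = lambda a, b, c: a * b + c
faulty_copy = lambda a, b, c: a * b + c + 1

print(outputs_differ(faulty_original, faulty_copy, 3, 32))
print(outputs_differ(faulty_original, faulty_original, 3, 32))
```

A `False` result does not prove the two functions are identical; it only indicates that the sampled patterns did not separate them, in which case the functional unit would be a candidate for splitting.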
DYNAMIC SECURING
There are two situations where we need to select a set of intermediate results in the CDFG to secure. The first situation occurs in moves of class B, where securing is performed to allow resource sharing between the original and the copy in order to minimize area overhead. The second situation is in moves of class C, where comparison operations are added in order to improve the scheduling freedom of various operations in the CDFG. In this section, we present efficient heuristics to selectively secure operations in the duplicated CDFG for both of the above situations.
Dynamic Securing for Moves of Class B
The dynamic securing problem can be transformed to the following problem: Given two sets of nodes in a CDFG, $A$ and $B$, we want to find a minimum set of nodes $C$ such that if all edges fed by the nodes in $C$ are removed from the CDFG, then $A$ and $B$ have no common successor. Here, $A$ consists of nodes in the original that map to a functional unit and $B$ contains nodes in the original which correspond to those in the copy that map to the same functional unit. Since the nature of the problem forbids an exact solution, we try to find the minimum number of nodes whose edges should be secured such that $A$ and $B$ have no common successor. Note that we do not directly attempt to minimize the number of comparators. This is because the schedule and assignment change as iterative improvement progresses and sharing configurations that give the best area overheads in one step may not hold in the next. Therefore, we choose to minimize the number of comparison operations. We found that, in practice, this simplification yielded good results and the total area overhead was small.
We first find the nodes in the CDFG which are not successors of either $A$ or $B$. Since these nodes need not be secured, we remove these nodes from the CDFG. Then, we model the CDFG as an undirected hypergraph in which all the fanout edges of a node are replaced by a single edge. This is motivated by the fact that all outgoing edges of an operation in the CDFG are assigned to the same register. Securing the source node of this hyperedge secures all edges it corresponds to in the CDFG. Therefore, we are interested in finding a minimum set of edges in the hypergraph upon whose removal nodes in the hypergraph corresponding to $A$ and $B$ have no common successor. This minimum edge cut problem was solved in [19].
To solve this problem, the hypergraph is transformed into a normal graph using the procedure described below. Each edge $e_i$ of the hypergraph is replaced by two vertices, $e_{is}$ and $e_{it}$, joined by an edge of capacity 1; the nodes that $e_i$ connected are attached to these two vertices by edges, also of capacity 1. To this graph, we add a source vertex $s$ and a sink vertex $t$. Vertex $s$ is connected by edges of capacity 1 to every vertex in the transformed graph which corresponds to a vertex in $A$. Vertex $t$ is connected by edges of capacity 1 to every vertex corresponding to a vertex in $B$. Solving the maximum network flow problem [20] from $s$ to $t$ yields the minimum edge cut of the transformed graph. From the minimum edge cut in the transformed graph, we can infer the minimum edge cut of the hypergraph using the inverse of the mapping described above. This corresponds to a minimum cardinality set of intermediate results in the CDFG to be secured.
The transformation is illustrated in Fig. 13. Here, the original CDFG is shown at the top. We would like to separate operation $n_1$ in the original from operation $n_2$ in the copy. To do this, we need to secure intermediate edges so that $n_1$ and $n_2$ do not have unsecured paths to the same successor, as mentioned in Definition 1. The hypergraph, which corresponds to the original CDFG, is shown in the figure, with unnecessary nodes and edges deleted. In the transformed graph shown in the figure, nodes $e_{1s}$ and $e_{1t}$ correspond to edge $e_1$ of the hypergraph. Similarly, nodes $e_{2s}$ and $e_{2t}$ correspond to edge $e_2$ of the hypergraph. Nodes $n_1$, $n_2$, and $n_3$ in the transformed graph correspond to nodes $n_1$, $n_2$, and $n_3$ in the hypergraph. The edges in the transformed graph are annotated with their capacities.
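A minimum hyperedge cut of this kind can be computed with any max-flow routine. The sketch below uses the standard hyperedge-splitting construction with breadth-first augmenting paths (Edmonds-Karp). It is an illustration under our own assumptions, not the implementation of [19]: only the arc between the two vertices of a split hyperedge has capacity 1, all attachment arcs are given infinite capacity so that the cut can only consist of hyperedges, the sets `A` and `B` are assumed disjoint, and only the size of the cut is returned.

```python
from collections import defaultdict, deque

INF = float("inf")

def min_hyperedge_cut(hyperedges, A, B):
    """Minimum number of hyperedges whose removal disconnects A from B.

    hyperedges: dict mapping a hyperedge name to the nodes it connects.
    """
    cap = defaultdict(lambda: defaultdict(int))

    def add_arc(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0  # ensure the residual (reverse) arc exists

    for e, nodes in hyperedges.items():
        add_arc(("edge", e, "s"), ("edge", e, "t"), 1)   # cutting this arc cuts e
        for v in nodes:
            add_arc(("node", v), ("edge", e, "s"), INF)
            add_arc(("edge", e, "t"), ("node", v), INF)
    for v in A:
        add_arc("SRC", ("node", v), INF)
    for v in B:
        add_arc(("node", v), "SNK", INF)

    flow = 0
    while True:
        parent = {"SRC": None}
        queue = deque(["SRC"])
        while queue and "SNK" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "SNK" not in parent:
            return flow          # no augmenting path left: flow equals the min cut
        path, v = [], "SNK"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck
        flow += bottleneck

# Two hyperedges sharing successor n3: securing one of them separates n1 and n2.
print(min_hyperedge_cut({"e1": ["n1", "n3"], "e2": ["n2", "n3"]}, {"n1"}, {"n2"}))
```

Recovering which hyperedges form the cut would require one more reachability pass over the residual graph from `SRC`; that step is omitted here.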
Heuristic for Moves of Class C
We now present a method to identify nodes in the CDFG which cause scheduling bottlenecks and secure them with a view to increasing the scheduling freedom of various operations in the CDFG. We would like to find a node which, when secured, "frees" itself and its predecessor nodes to be rescheduled to control steps where the hardware utilization is low, potentially increasing functional unit utilization. Example 4 illustrates an application of a class C move.
To find out which nodes of the CDFG are good candidates to secure, we use the following heuristic: For each node we compute a figure (called the weight) which is a measure of the gain in area that could be obtained by securing the node and rescheduling the operations in the transitive fanin of that node. We compute the weight of a node $n$ as follows: For each node in the transitive fanin of $n$, we compute the area of the functional unit that it maps to. We sum the areas of all nodes in the transitive fanin of $n$, including $n$ itself. We then compute the difference between the earliest and latest control steps (span) in which any node from the transitive fanin of $n$ is scheduled. The ratio of the cumulative area to the span of control steps is then computed. From this ratio, we subtract the ratio of the area of a register to the lifetime in cycles of the variable that is generated at node $n$ to get the weight of node $n$ (the subtracted term accounts for the impact of the prolonged lifetime of the variable on register area). The node with the largest positive weight is chosen for securing during a move of class C.
The rationale behind this heuristic is as follows: A node, $n$, with a large weight implies that it "locks away" a large number of resources. For example, if a node were scheduled in cycle 3 and the nodes in the transitive fanin consume an area of 100 units, then it can be assumed that 100 area units have to be scheduled in the first two cycles. The securing of $n$ frees it and allows it to be scheduled any time before the deadline and in doing so frees the operations in its transitive fanin by increasing their mobility, thus increasing the chance of obtaining a compact architecture. In this example, the 100 units can now be scheduled anytime before the cycle in which $n$ is scheduled. If the freeing of $n$ allowed it to migrate to cycle 10, the nodes in its transitive fanin would have to be scheduled in the first nine cycles, which is considerably more relaxed and more likely to result in a compact architecture. A node which locks away more resources in fewer cycles would clearly be a better candidate for securing than one which locks away fewer resources over more cycles.
Example 12. Consider once again the example CDFG shown in Fig. 4. Assume that the areas of the multiplier, adder, equality checker, and register are 4, 1, 1, and 0.5 units, respectively. We now compute the weight of each node in the CDFG. For node n1:
$$
\begin{aligned}
\text{Fanin area} &= \sum_{\text{transitive fanin}} (\text{area of the functional unit the node maps to}) = 1 \\
\text{Span} &= 1 \text{ cycle} \\
\text{Register overhead} &= \frac{1}{\text{lifetime of } v_1} \times \text{register area} = 1 \times 0.5 = 0.5 \\
\text{Weight} &= 1 - 0.5 = 0.5
\end{aligned}
$$
Similarly, the weights of nodes n2 through n4 can be determined to be 3.5, 4.5, and 1.8, respectively. Node n3, which has the maximum weight, is chosen for securing.
Handling Logical Operations
In this section, we describe how our analysis procedure handles logical operations in the behavioral description. Data-dominated descriptions, such as those that are usually found in DSP algorithms, do not typically contain many logical operations like AND, OR, and NOT. Hence, we do not need to extend aliasing analysis to cover logical operations. Instead, we secure the inputs and outputs of each logical block embedded in the CDFG. This is required because, even if the inputs to a logical block in the original and the copy are different from each other and from the correct value, the outputs of the block may be equal, but incorrect, allowing aliasing to occur. Securing the inputs ensures that any errors in the inputs are not suppressed by the logical block. Also, we do not allow sharing of functional units for logical operations in the original and the copy (within the original or within the copy, such sharing is allowed). This is because, unlike in the case of arithmetic unit faults, faults in logical operations in the original and the copy often do cause aliasing to occur with unacceptably high probabilities. In our synthesis system, though we enforce that logical operations do not share hardware across the original and the copy, the quality of our final solution is not affected because most CDFGs in the target domain contain very few, if any, logical operations.
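The masking effect that motivates securing the inputs of a logical block can be seen in a small bitwise example. The sketch below is illustrative only: the 4-bit values and variable names are hypothetical, and a single AND stands in for an arbitrary logical block. The original and the copy each receive a different erroneous value of the same operand, yet both produce the same incorrect output, so an equality check placed only at the output would not flag the error.

```python
def and_block(a, b):
    """A stand-in for a logical block in the CDFG: a single bitwise AND."""
    return a & b


# Hypothetical 4-bit operands.
correct_a = 0b1100
b = 0b1010

# Erroneous values of operand a: they differ from each other and from the correct value.
faulty_a_original = 0b0100
faulty_a_copy = 0b0101

expected = and_block(correct_a, b)              # fault-free result
out_original = and_block(faulty_a_original, b)  # result in the original
out_copy = and_block(faulty_a_copy, b)          # result in the copy

# Aliasing: the two outputs agree with each other but not with the correct value.
aliasing = (out_original == out_copy) and (out_original != expected)
print(f"expected={expected:04b} original={out_original:04b} "
      f"copy={out_copy:04b} aliasing={aliasing}")
```

Comparing the inputs of the block (here, the two values of a) would expose the discrepancy before the AND suppresses it.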
ILLUSTRATIVE SYNTHESIS EXAMPLE
In this section, we trace the synthesis procedure for a small CDFG to illustrate the integration of aliasing analysis and securing into the synthesis procedure. In addition, we also trace the flow of the algorithm for a case when aliasing analysis is not applied because we require the circuit to have zero aliasing (Aliasing = 0). In the example shown in Fig. 14, the library is assumed to have exactly one adder, one multiplier, and one TSC equality checker, all of which compute in one cycle. The deadline for all primary outputs is four cycles.
In the initial solution that is fed to the algorithm, each operation is bound to a separate functional unit and each variable to a separate register. In the intermediate solution for the ALPS and zero-aliasing cases, sharing between different operations takes place without the use of rescheduling. If we are synthesizing a double modular redundant (DMR) architecture, no further sharing can take place because operations in the original and the copy cannot share functional units. In the case of ALPS, nodes n2 and n1 can share a functional unit if n2 is scheduled in control step 2. Application of aliasing analysis to this configuration results in the following condition for aliasing at primary output Out: $f_{4,5} = 3 f_{1,2}$. By applying Condition 1 to this equation, we can see that the aliasing probability is $1/2^x$, where x is the bit-width.
The aliasing threshold is chosen as $2 \times 2^{-32} = 4.66 \times 10^{-10}$ and we synthesize a 32-bit datapath ($x = 32$). Therefore, the aliasing probability, $1/2^{32}$, is less than the aliasing threshold. Hence, we conclude that sharing of n2 and n1 requires no comparison operations. This gives the final solution for ALPS, as no sequence of moves can result in any improvement in the current solution. If we want a solution with no aliasing, we would have to add a comparison operation at this stage to secure either operation n1 or n2. In this example, we chose the latter option, which gave us the final solution for the zero-aliasing case. Further details regarding the working of the area optimizer and the use of iterative improvement for area optimization can be found in [16].
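The threshold test in this example is simple arithmetic and can be checked directly. The following sketch uses hypothetical function names and is not part of ALPS; it takes the threshold as 2 × 2^-32 and the aliasing probability of the shared configuration as 1/2^x with x = 32, as stated in the text.

```python
def aliasing_threshold(factor, bit_width):
    """Threshold expressed as factor * 2^(-bit_width)."""
    return factor * 2.0 ** -bit_width


def sharing_needs_comparison(aliasing_probability, threshold):
    """A comparison (securing) is needed only if the probability exceeds the threshold."""
    return aliasing_probability > threshold


x = 32                                # bit-width of the data path
threshold = aliasing_threshold(2, x)  # 2 * 2^-32
p_alias = 2.0 ** -x                   # 1 / 2^x for the shared configuration

print(f"threshold = {threshold:.3e}")
print(f"p_alias   = {p_alias:.3e}")
print("comparison needed:", sharing_needs_comparison(p_alias, threshold))
```

Since 1/2^32 is half the threshold, the sharing is accepted without inserting a comparison operation.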
EXPERIMENTAL RESULTS
We have implemented the behavioral synthesis framework presented in the previous sections, including the aliasing probability analysis techniques and dynamic securing procedures as the program ALPS. ALPS is written in C++. We have performed experiments to evaluate our techniques using several behavioral descriptions of digital signal and image processing applications. ALPS reads in a textual description of the CDFG and performs functional unit selection, scheduling, allocation, and assignment to result in a highly fault-secure RTL circuit that consists of a data path netlist and an FSM description of the controller. The controller is then subject to constrained logic synthesis, as described earlier, to result in a fault-secure implementation. The controller and data path netlists are merged and mapped to the MSU standard cell library (SCMOS 2.2) using the SIS logic synthesis system and then placed and laid out using tools from the Octtools suite.
Past literature in the area of behavioral synthesis for fault security has traditionally tabulated the numbers of registers, functional units, and multiplexers consumed by a behavior and a reduction in these numbers is taken to represent a reduction in area. We, however, present only the final area of the synthesized circuit. This break with tradition is intentional. Our algorithm dynamically trades off different circuit elements, depending upon their effect on the total area of the circuit, e.g., the savings in area obtained by merging two functional units into one, as suggested by a class B move, might well be offset by the overhead in terms of registers and multiplexers. We therefore believe that it is more meaningful to present the final area of the circuit, rather than to present a breakup in terms of the numbers of individual circuit elements.
We conducted experiments on nine CDFGs which perform different DSP algorithms. Among them, Paulin is a widely known benchmark. Chemical, Dist, and IIR77 are IIR filters used in the industry. Dct_dif, Dct_lee, Dct_pr2, and Dct_wang perform the discrete cosine transform and are named after the inventors of their algorithms. FIR is an FIR filter. Of these examples, Paulin, Chemical, and Dist have loops in their behavioral descriptions. The number of operations in the duplicated CDFG ranges from 23 (Paulin) to 111 (Dist). Table 1 presents the results of our experiments. The example name is given in the column Circuit. Each circuit was synthesized at four different sampling periods at a bit-width of 32. The laxity factor (L.F.) is the ratio of the given sampling period to the delay of the RTL circuit corresponding to the fastest possible implementation of the CDFG (with the given functional unit library). Major columns 2, 3, 4, and 5 are for circuits synthesized with laxity factors 1.2, 1.7, 2.2, and 2.7, respectively. For each laxity factor, we synthesized the following architectures: 1) an architecture with a duplicated RTL circuit and a TSC equality checker, 2) a fault-secure area-optimized architecture where we enforced the constraint that the probability of aliasing in functional units should be zero, and 3) a fault-secure area-optimized architecture synthesized using ALPS, where the probability of aliasing was restricted to be less than or equal to $4.66 \times 10^{-10}$. Since we synthesize a 32-bit data path and $1/2^x = 2.33 \times 10^{-10}$, the value of k in Condition 1 and the value of $p_m p_n$ in Condition 2 must be less than or equal to 2. The clock cycle time was constrained to 25 ns. The area in grid count/10,000 (see footnote 4) is reported for these three architectures under the subcolumns DMR, 0-alias, and ALPS, respectively.
The results show that architectures synthesized using ALPS show average area improvements over DMR of 12.8 percent, 22.3 percent, 19.0 percent, and 24.9 percent at laxity factors of 1.2, 1.7, 2.2, and 2.7, respectively, while architectures synthesized with the zero-aliasing requirement show improvements of 6.2 percent, 9.7 percent, 10.0 percent, and 12.1 percent for the corresponding laxity factors. On average, the control logic accounted for approximately 4 percent of the total area of the circuit. The checking hardware accounted for approximately 15 percent, the "original" elements in the circuits about 47 percent, and the redundant elements about 34 percent. Note that all architectures were required to meet the same sampling period constraint. The results indicate that architectures synthesized using ALPS are more area-efficient than both architectures synthesized to guarantee that aliasing cannot occur and duplicate-and-compare architectures. The CPU times taken up for synthesis were not significant enough to be an issue. Our largest example consumed under 15 minutes of CPU time on a SPARC-20 workstation running at 62 MHz.
CONCLUSIONS
In this paper, we described a behavioral synthesis framework, called ALPS, for the construction of highly fault-secure, area-efficient RTL circuits. We introduced the concept of aliasing probability analysis to enhance resource sharing among operations in the duplicated CDFG. We showed that our approach results in a significant reduction in the overhead associated with the provision of fault security. Experimental results indicate that fault security can be provided by ALPS with area overheads as low as 25.5 percent (IIR77 at a laxity factor of 2.7) over a circuit synthesized without a requirement of fault security (i.e., over a simplex system).
ACKNOWLEDGMENTS
This work was supported by the US National Science Foundation under Grant No. MIP-9423574.
4. The grid count represents the number of λ × λ squares in the circuit's layout. We used the SCMOS 2.2 library for our experiments.
