In this paper, we present a new approach to the problem of inverter elimination in domino logic synthesis. A small piece of static CMOS logic is introduced into the circuit to avoid the significant area penalty that results from duplication. To maximize the domino logic part and minimize the static CMOS logic part, a generalized ATPG-based logic transformation is proposed to eliminate or relocate a target inverter. Based on the new concept of a dominating set of mandatory assignments (DSMA) and the corresponding implication graph, we propose algorithms to identify a minimum candidate set for a target inverter. Experimental results show that logic transformation based on the implication graph reduces transistor count by 25% and power-delay product by 25% on average.
Introduction
Dynamic logic families such as domino logic [8] can substitute for static CMOS logic in high-performance control logic as well as in data path design. Such substitutions can reduce both delay and silicon area. Despite these benefits, the monotonic property of domino logic imposes substantial limitations on the synthesis of inverting logic. The monotonic property originates from the intrinsic operation of domino logic, which allows at most one low-to-high transition at the output buffer during the evaluation phase.
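The monotonic property can be illustrated with a brute-force truth-table check. The sketch below is a minimal illustration, not part of the authors' flow: it treats a Boolean function as a Python callable (a hypothetical representation) and tests whether raising any single input ever lowers the output. An AND/OR function passes, whereas XOR or NAND, which require an inversion, fail. Exhaustive enumeration is only practical for a handful of inputs.

```python
from itertools import product


def is_monotone(f, n):
    """Return True if f, a function of n Boolean inputs, never falls when an input rises.

    A gate network built only from AND/OR gates (what single-rail domino
    realizes) always passes this test with respect to its driving signals.
    """
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(*x) > f(*y):  # raising input i lowered the output
                    return False
    return True


def and_or(a, b, c):
    return (a & b) | c


def xor2(a, b):
    return a ^ b


def nand2(a, b):
    return 1 - (a & b)


print(is_monotone(and_or, 3))  # True
print(is_monotone(xor2, 2))    # False
print(is_monotone(nand2, 2))   # False
```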
To overcome such limitations, attempts have been made to minimize inverters by output phase assignment [5]. However, some inverters cannot be swept away this way because they are trapped in a reconvergent loop, as illustrated in Figure 1. Most conventional techniques for eliminating trapped inverters are based on duplication of subcircuits. There have been efforts to minimize the area of the duplicated logic [12] or the power consumption [9] by way of output phase assignment. Satisfiability don't-care optimization was used in [11]. There have also been approaches that implement inverting logic at the circuit level, such as dual-rail logic [4], clock-delayed domino logic [14], and clock-and-data precharged dynamic circuits [10].
In essence, duplication-based inverter elimination approaches are extensions of the bubble-pushing algorithm. Every transitive fanin gate of a trapped inverter is duplicated in the network, and the inverters are then pushed toward the primary inputs by applying DeMorgan's theorem. The area penalty of duplication-based inverter elimination is therefore significant. Besides the duplicated transistors, a considerable amount of interconnecting wire is required. Placement and routing thus become more complex, which in turn degrades performance.
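A minimal sketch of bubble pushing on an expression tree, assuming a hypothetical nested-tuple representation in which the complement of a primary input is written with a trailing apostrophe. The example circuit is hypothetical (it is not Figure 1): an AND gate g = ab feeds the output OR both directly and through an inverter, so the inverter is trapped in a reconvergent path. Pushing the inversion to the inputs removes the inverter but forces the inverter's fanin gate to be duplicated in complemented form.

```python
def push(expr, negate=False):
    """Push inversions toward the primary inputs with DeMorgan's theorem.

    Hypothetical format: ('and', l, r), ('or', l, r), ('not', x), or a
    primary-input name; the complement of input "a" is written "a'".
    """
    if isinstance(expr, str):
        return expr + "'" if negate else expr
    if expr[0] == 'not':
        return push(expr[1], not negate)
    op = expr[0]
    if negate:  # NOT(l AND r) = NOT l OR NOT r, and dually for OR
        op = 'or' if op == 'and' else 'and'
    return (op, push(expr[1], negate), push(expr[2], negate))


def gates(expr, seen=None):
    """Count distinct AND/OR gates in an inverter-free expression
    (structurally identical subtrees are counted once, i.e. shared)."""
    if seen is None:
        seen = set()
    if isinstance(expr, str) or expr in seen:
        return 0
    seen.add(expr)
    return 1 + gates(expr[1], seen) + gates(expr[2], seen)


g = ('and', 'a', 'b')                       # fanin gate of the trapped inverter
out = ('or', g, ('and', ('not', g), 'c'))   # g reconverges at the output OR
pushed = push(out)
print(pushed)
print(gates(pushed))  # 4
```

The original network has three AND/OR gates plus one inverter; after pushing, it has four AND/OR gates and no inverter. The extra gate, OR(a', b'), is the complemented duplicate of g, which is exactly the overhead that grows with the size of the trapped inverter's transitive fanin.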
To overcome the significant penalties caused by duplication, we allow some inverters to remain in a part of the logic network. The monotonic property of domino logic prohibits implementing the transitive fanout subcircuit of an inverter in traditional domino logic. Because static CMOS logic allows both low-to-high and high-to-low transitions, the transitive fanout subcircuit of an inverter can be synthesized with static CMOS logic. In our approach, the circuit is composed of a dominant domino logic portion followed by a small static CMOS logic portion. This small piece of static CMOS logic avoids the significant area penalty due to duplication. Figure 2 shows the resynthesized network when our proposed transformation technique is applied to the circuit in Figure 1. All the gates are implemented with domino logic except gate P, which is implemented with static CMOS logic. The number of transistors is significantly reduced compared to duplication-based inverter removal.
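The partition described above can be computed with a forward traversal: every inverter and everything in its transitive fanout is assigned to static CMOS, and the remaining gates stay in domino logic. This is a minimal sketch; the dictionary netlist format, the function name and the example circuit (not the one in Figure 1 or 2) are hypothetical.

```python
from collections import deque


def partition(network):
    """Split gates into (domino, static_cmos) sets.

    network maps each gate name to (op, [fanin names]); names that are not
    keys are primary inputs. An inverter output falls during evaluation, so
    every inverter and its whole transitive fanout must be static CMOS.
    """
    fanouts = {g: [] for g in network}
    for g, (_, ins) in network.items():
        for i in ins:
            if i in fanouts:
                fanouts[i].append(g)
    static = set()
    queue = deque(g for g, (op, _) in network.items() if op == 'not')
    while queue:
        g = queue.popleft()
        if g not in static:
            static.add(g)
            queue.extend(fanouts[g])
    return set(network) - static, static


net = {'g': ('and', ['a', 'b']), 'n': ('not', ['g']),
       'h': ('and', ['n', 'c']), 'p': ('or', ['g', 'h'])}
domino, static = partition(net)
print(sorted(domino), sorted(static))  # ['g'] ['h', 'n', 'p']
```

The closer an inverter sits to the primary outputs, the smaller its transitive fanout and hence the static CMOS part, which is why relocating inverters toward the outputs pays off.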
Our ultimate goal is to maximize the advantages of domino logic while minimizing the size of the compensating static CMOS logic. Logic transformation enables us either to eliminate the inverters or to relocate them as close to the primary outputs as possible, so that the CMOS subcircuit is kept small.
0-7803-5832-X/99/$10.00 ©1999 IEEE.
For the elimination or relocation of inverters, we propose a generalized automatic test pattern generation (ATPG) based logic resynthesis technique, presented in Section 2. Experimental results and conclusions follow in Sections 3 and 4.
Inverter Elimination using ATPG-based Logic Transformation
Problem Formulation
Without loss of generality, we assume that a logic network is a combinational circuit consisting of 2-input AND gates, 2-input OR gates, and inverters. The complementary signal of each primary input is assumed to be available. Logic transformation enables us to restructure a circuit such that an inverter output signal can be replaced by an alternative signal. ATPG techniques can be used to identify redundant signals as candidates for substitution by performing fault excitation and propagation [1, 2].
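For a toy network, whether an inverter output can be replaced by an alternative signal can be checked by exhaustive simulation. This sketch swaps brute-force simulation in for the ATPG-based redundancy identification of [1, 2], so it is only practical for a few primary inputs. The netlist format, the function names and the example circuit are hypothetical, and the alternative signal must be computed before the target's fanouts in `order`.

```python
from itertools import product


def simulate(network, order, pis, bits):
    """Evaluate every signal of a 2-input AND/OR/NOT network for one input vector.

    `order` lists gate names topologically; the complement "x'" of each
    primary input x is provided as a free signal, as assumed in the text.
    """
    val = {}
    for p, b in zip(pis, bits):
        val[p], val[p + "'"] = b, 1 - b
    for g in order:
        op, ins = network[g]
        x = [val[i] for i in ins]
        if op == 'not':
            val[g] = 1 - x[0]
        elif op == 'and':
            val[g] = x[0] & x[1]
        else:  # 'or'
            val[g] = x[0] | x[1]
    return val


def substitution_ok(network, order, pis, outputs, target, alternative):
    """True if every fanout of `target` can read `alternative` instead,
    i.e. all primary outputs are unchanged for every input vector."""
    rewired = {g: (op, [alternative if i == target else i for i in ins])
               for g, (op, ins) in network.items()}
    for bits in product((0, 1), repeat=len(pis)):
        before = simulate(network, order, pis, bits)
        after = simulate(rewired, order, pis, bits)
        if any(before[o] != after[o] for o in outputs):
            return False
    return True


net = {'g': ('and', ['a', 'b']), 'n': ('not', ['g']),
       'h': ('and', ['n', 'c']), 'out': ('or', ['g', 'h'])}
order = ['g', 'n', 'h', 'out']
print(substitution_ok(net, order, ['a', 'b', 'c'], ['out'], 'n', 'c'))   # True
print(substitution_ok(net, order, ['a', 'b', 'c'], ['out'], 'n', "a'"))  # False
```

Here out = g + g'c = ab + c, so the inverter output n can be replaced by c (leaving the inverter without fanout, hence removable), whereas replacing it by a' changes the function at a = 1, b = 0, c = 1.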
The transformation is carried out as follows. … can also replace … without modifying the functionality of the circuit in Figure 1. Most conventional ATPG-based logic transformation approaches have focused on single-line rewiring. In [1], multiple signals are identified as a candidate set.
We introduce the notion of a minimum candidate set, and propose systematic approaches to identifying minimum candidate sets in the following sections. Our new approaches generalize traditional ATPG-based logic transformation.
Definitions and Notations
Throughout this paper, denotes the output signal of the target inverter to be removed. 
Minimum Candidate Set
Now we are ready to describe the properties of a minimum candidate set. The proofs of the following lemma and theorems are given in [7]. … In other words, the goal is now to choose a minimum set of variables and associated values which ensure that the output takes the controlling value of P. We propose two algorithms based on the implication graph defined in the following section.
Minimum Candidate Set Algorithms based on SMA Implication Graph
To determine a minimum DSMA, we use (1) the number of transistors needed to generate an alternative signal and (2) signal propagation delay as figures of merit in choosing variables from the SMA. In other words, we choose the candidate set with minimum cardinality first; if there is a tie, an earlier signal is preferred. By definition, a DSMA is represented in an implication graph as follows. Consider the set of nodes corresponding to a DSMA. Then, between each source node and the sink node, there exists at least one path that includes a subset of these nodes.
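The selection rule above (fewest signals first, earlier signals on a tie) can be written as a lexicographic sort key. In this sketch, candidate sets are Python sets of hypothetical signal names and `arrival` is an assumed map from signal to arrival time; we further assume that a set's delay is determined by its latest-arriving member, since the text only states that early signals are preferred.

```python
def best_candidate_set(candidates, arrival):
    """Pick the candidate set with the fewest signals (fewest transistors to
    build the alternative signal); break ties by the latest arrival time in
    the set, preferring sets whose signals are all available early."""
    return min(candidates, key=lambda s: (len(s), max(arrival[v] for v in s)))


candidates = [{'x', 'y'}, {'z'}, {'w'}]
arrival = {'x': 1, 'y': 2, 'z': 5, 'w': 3}
print(best_candidate_set(candidates, arrival))  # {'w'}
```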
To reduce computational complexity, we simplify the implication graph by node collapsing and redundant edge elimination.
Node collapsing helps to improve the signal propagation delay while keeping the transistor counts unchanged. 
Redundant Edge Elimination
Optimal Algorithm for Minimum Candidate Set
In general, the minimum DSMA problem is transformed into the problem of identifying a minimum set of nodes that covers all source nodes. The set covering problem is known to be NP-hard [3].
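To make the covering formulation concrete, the brute-force sketch below searches for the smallest node set covering all source nodes. It assumes a hypothetical dictionary representation of the implication graph (each node mapped to its successor list, every node present as a key) and assumes that a node covers every source node from which it is reachable; on a tree this coincides with separating all sources from the sink. The running time is exponential, in line with the NP-hardness noted above, so it only suits toy graphs.

```python
from itertools import combinations


def covered(graph, node):
    """Return the source nodes (nodes without predecessors) that reach `node`."""
    preds = {}
    for u, vs in graph.items():
        for v in vs:
            preds.setdefault(v, []).append(u)
    seen, stack = {node}, [node]
    while stack:
        for u in preds.get(stack.pop(), []):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return {u for u in seen if not preds.get(u)}


def min_cover(graph, sink):
    """Exhaustively find a smallest non-sink node set covering all sources."""
    nodes = [v for v in graph if v != sink]
    sources = {v for v in graph if all(v not in vs for vs in graph.values())}
    cover = {v: covered(graph, v) for v in nodes}
    for k in range(1, len(nodes) + 1):
        for combo in combinations(nodes, k):
            if set().union(*(cover[v] for v in combo)) == sources:
                return set(combo)
    return None


tree = {'s1': ['m'], 's2': ['m'], 's3': ['t'], 'm': ['t'], 't': []}
print(min_cover(tree, 't'))  # {'s3', 'm'} (set order may vary)
```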
In the special case where the implication graph is a tree with node set $V$ and edge set $E$, the optimal solution can be obtained by the min-cut algorithm, which computes a cut of minimum weight in polynomial time [13]. For this purpose, we use a node-splitting technique, which finds a min-cut over nodes rather than edges. Each node is split into two nodes connected by an edge with a finite weight, and all other edges are assigned infinite weight. The min-cut in the transformed graph then gives an optimal solution.
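The node-splitting construction can be sketched with a textbook max-flow routine. Edmonds-Karp is used here purely for brevity and is not necessarily the algorithm of [13]; the graph representation, the `weight` map and the function name are hypothetical. Each non-sink node v becomes an edge from (v, 'in') to (v, 'out') whose capacity is v's weight, every original edge gets infinite capacity, and a super-source feeds every source node. The cut nodes are those whose 'in' half is reachable from the super-source in the final residual graph while the 'out' half is not.

```python
from collections import deque

INF = float('inf')


def min_node_cut(graph, weight, sink):
    """Return a minimum-weight node set separating every source node from `sink`.

    graph maps each node to its successors (every node must be a key);
    weight maps each non-sink node to a finite cost.
    """
    cap = {'S': {}}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    target = (sink, 'in')
    has_pred = {w for succs in graph.values() for w in succs}
    for v, succs in graph.items():
        if v == sink:
            continue
        add((v, 'in'), (v, 'out'), weight[v])  # split edge: the only finite ones
        for w in succs:
            add((v, 'out'), (w, 'in'), INF)
        if v not in has_pred:  # source node of the implication graph
            add('S', (v, 'in'), INF)

    # Edmonds-Karp: repeatedly augment along a shortest residual path.
    while True:
        parent, queue = {'S': None}, deque(['S'])
        while queue and target not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            break
        path, v = [], target
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # The last search left `parent` holding every node reachable from 'S'.
    return {v for v in graph if v != sink
            and (v, 'in') in parent and (v, 'out') not in parent}


tree = {'s1': ['m'], 's2': ['m'], 's3': ['t'], 'm': ['t'], 't': []}
unit = {v: 1 for v in tree if v != 't'}
print(min_node_cut(tree, unit, 't'))                 # {'m', 's3'} (order may vary)
print(min_node_cut(tree, {**unit, 'm': 5}, 't'))     # {'s1', 's2', 's3'}
```

With unit weights the example tree yields {m, s3}; raising the weight of m to 5 makes cutting the three source nodes (total weight 3) cheaper than {m, s3} (total weight 6), which is how a non-uniform weighting function steers the chosen candidate set.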
Heuristic Algorithm for Minimum Candidate Set
The general problem of finding a minimum candidate set becomes more complex when the graph contains nodes with multiple outgoing edges, which represent multiple fanouts in the circuit, as well as nodes that have no path to the sink node (called bridge nodes).
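Bridge nodes can be found with one backward traversal from the sink: any node the traversal does not reach has no path to the sink. A minimal sketch, assuming a hypothetical successor-list dictionary with every node present as a key.

```python
def bridge_nodes(graph, sink):
    """Return the nodes of the implication graph that have no path to `sink`."""
    preds = {v: [] for v in graph}
    for u, succs in graph.items():
        for v in succs:
            preds[v].append(u)
    reach, stack = {sink}, [sink]
    while stack:  # walk edges backwards from the sink
        for u in preds[stack.pop()]:
            if u not in reach:
                reach.add(u)
                stack.append(u)
    return set(graph) - reach


g = {'a': ['m', 'x'], 'b': ['m'], 'm': ['t'], 'x': ['y'], 'y': [], 't': []}
print(sorted(bridge_nodes(g, 't')))  # ['x', 'y']
```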
To apply the min-cut algorithm, we use the following heuristics to change the graph into a tree.
Multiple Outgoing Edges
In some cases, a node with multiple outgoing edges precludes a better candidate set. To remove this limitation, we use the following heuristics. First, we remove all outgoing edges except the one incident with the node at the lowest level; this puts a preference on the earliest fanout node. Second, if the head nodes of all outgoing edges are located at the same level, we keep the edge incident with the node that has the largest number of incident edges. (This corresponds to the greedy approximation for the set covering problem.)
Bridge Nodes
Bridge nodes provide opportunities for obtaining better candidate sets. To take advantage of bridge nodes, we use the following heuristic. First, all connected bridge nodes are collapsed into one supernode … Figure 4(a) and applying the heuristics for bridge nodes lead to the implication graph in Figure 4(b). To apply the min-cut algorithm, the implication graph in Figure 4(b) is transformed into that in Figure 4(c) by node-splitting. The weighting function can be specified according to the objective of logic transformation, such as switching activity for low power. For a unit weight function, the min-cut algorithm produces the minimum candidate set … for the implication graph in Figure 4(c).
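The two multiple-outgoing-edge heuristics can be combined into one lexicographic choice per node: lowest successor level first, then the successor with the most incoming edges. This sketch assumes that `level` is a precomputed map from node to level (hypothetical), that "incident edges" means incoming edges counted in the original graph, and that nodes with a single outgoing edge are left unchanged.

```python
def prune_fanouts(graph, level):
    """Keep one outgoing edge per node so the implication graph becomes a tree.

    Preference: the successor at the lowest level; among successors at the
    same level, the one with the most incoming edges (greedy set-cover choice).
    """
    indeg = {v: 0 for v in graph}
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
    pruned = {}
    for u, succs in graph.items():
        if len(succs) <= 1:
            pruned[u] = list(succs)
        else:
            pruned[u] = [min(succs, key=lambda v: (level[v], -indeg[v]))]
    return pruned


graph = {'a': ['m', 'n'], 'b': ['n'], 'c': ['n'], 'd': ['m', 't'],
         'm': ['t'], 'n': ['t'], 't': []}
level = {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'm': 1, 'n': 1, 't': 2}
tree = prune_fanouts(graph, level)
print(tree['a'], tree['d'])  # ['n'] ['m']
```

Node a keeps its edge to n because n, at the same level as m, has more incoming edges; node d keeps its edge to m because m sits at a lower level than t.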
Experimental Results
The described methods have been implemented. First, each circuit in the benchmark set was optimized using SIS, and inverters were minimized using the technique proposed in [5]. Then trapped inverters were eliminated both by duplication and by our algorithms. The SMA of the target signal is obtained using the technique in [15]. Area and delay are measured using SIS after technology mapping. Power consumption is measured using the FIT power estimation tool [6], whose estimates are within 5% of HSPICE. Table 1 compares the results of inverter elimination by duplication, by our implication graph (IG) based optimal algorithm, and by the IG heuristic algorithm for the ISCAS-85 circuits. Compared with the duplication approach, the IG optimal algorithm reduces area by 25% while increasing delay by about 3%, on average. The power reduction is significant: the circuits consume 27% less power. Compared with the IG optimal algorithm, the IG heuristic algorithm completes in reasonable time without significant degradation of area, delay, or power.
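As a consistency check (not a figure taken from Table 1), multiplying the average power and delay ratios reported above approximately reproduces the 25% power-delay product (PDP) reduction quoted in the abstract. Because these are averages over circuits, the product of the average ratios only approximates the average PDP ratio.

```latex
\frac{\mathrm{PDP}_{\mathrm{IG}}}{\mathrm{PDP}_{\mathrm{dup}}}
  = \frac{P_{\mathrm{IG}}}{P_{\mathrm{dup}}}\cdot\frac{D_{\mathrm{IG}}}{D_{\mathrm{dup}}}
  \approx (1 - 0.27)(1 + 0.03) = 0.73 \times 1.03 \approx 0.75
```

That is, roughly a 25% reduction in power-delay product relative to duplication.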
The column labeled "CMOS" gives the number of transistors in the static CMOS logic divided by the number of transistors in the whole circuit. On average, 10% of the transistors are in the static CMOS logic. A larger number of trapped inverters increases the size of the static CMOS logic, as in circuit C6288.
Conclusions
This work was motivated by the observation that duplication-based inverter elimination incurs large area and power penalties. To overcome these penalties, we proposed a hybrid circuit containing both domino logic and compensating static CMOS logic.
To minimize the CMOS logic and maximize the benefits of domino logic, we proposed a generalized ATPG-based approach to the elimination of inverters. Our new algorithms map the set of mandatory assignments into the implication graph and identify minimum candidate sets for inverter replacement from that graph.
Our experimental results show that area is reduced by 25% on average. The transformed circuits also show a 25% reduction in power-delay product compared with the duplicated circuits.
Note that our approach to identifying minimum candidate sets is also useful for logic resynthesis in general. Equipped with a proper weighting function, such as one based on switching activity, this technique can determine an optimal solution for low power in a reasonable amount of computation time.
