A novel approach to the Boolean mapping problem is presented. It relies on a Boolean matching algorithm based on testing techniques. A match is detected by checking the controllability and observability of signals in the cell structure against the subject functions of the network. The method was implemented inside the SIS [Tou90] synthesis environment. A comparison with the SIS structural mapping shows that the Boolean mapping achieves better results in similar or smaller computing time.
Introduction
Technology mapping is a well-known problem in logic synthesis. For multilevel circuits, the starting point for the mapping is a logic-optimized network. The network is modeled as a directed acyclic graph (DAG), where each node is associated with a logic function and a logic variable that represents that function in the network. The DAG may be seen as a system of logic equations describing the logic behavior of the circuit. The mapping task is to translate this logic system into a network of logic gates that preserves its behavior while optimizing some cost criterion such as silicon area, circuit delay or power dissipation. A mapped network is referred to here as a netlist.
The mapping may be roughly defined by two interrelated processes: network covering and gate matching. The goal of gate matching is to discover whether there is a gate in the library that can implement a given node function, while the aim of the covering is to implement all node functions at minimum cost. A network is covered if each node function is implemented in the netlist either by a cell, by part of a cell, or by a set of cells.
To simplify the covering step, mapping tools such as the one used in SIS decompose the network into a set of simpler graphs (referred to as subject graphs) that are covered independently.

Supported by the ECC project KIT #144 LLSynth.
Supported by CAPES-Brazil.
The covering of a network may be modeled as a generic graph covering problem, which is known to be difficult. An exact algorithm for solving it must handle the binate covering problem, which is too costly. Several heuristic approaches have been proposed in the past. They fall into three main categories: rule-based approaches [DBG+84], algorithmic methods [Keu87, KDN92, Sav92, MM93a], and stochastic techniques. Rule-based systems like [DBG+84] try to cover a network by applying local transformations that reduce the circuit complexity. Their drawback is that the method is too library dependent and allows only local optimizations. The algorithmic approaches [Keu87, KDN92, Sav92, MM93a] usually produce better results because they rely on global optimizations. Stochastic methods also provide global techniques, but they are based on search space navigation, such as genetic algorithms.
The covering adopted in this work is based on the algorithmic approach. A common heuristic adopted in this case is to partition the network into trees and to compute the optimal tree covering using dynamic programming techniques, which may be done in linear time. This method works well, especially because most cells in standard cell libraries have tree-like structures. However, more complex cells have a non-tree structure and are not handled by this approach. Systems like SIS [Tou90] that are based on tree mapping adopt ad hoc solutions to deal with such cells. Boolean matching techniques overcome this limitation, providing better results.
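The tree covering heuristic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the subject tree is a nested tuple of NAND2 and INV base gates, and the four-cell library, its areas and the names `LIBRARY`, `matches` and `best_cover` are hypothetical, not part of SIS.

```python
from functools import lru_cache

# Hypothetical toy library: cell name -> (area, pattern tree).
# "*" marks a pattern input; patterns use only NAND2 and INV base gates.
LIBRARY = {
    "INV": (1, ("inv", "*")),
    "NAND2": (2, ("nand", "*", "*")),
    "AND2": (3, ("inv", ("nand", "*", "*"))),
    "NAND3": (3, ("nand", ("inv", ("nand", "*", "*")), "*")),
}


def matches(pattern, node):
    """Yield every tuple of subject subtrees bound to the pattern inputs."""
    if pattern == "*":
        yield (node,)
        return
    if isinstance(node, str) or pattern[0] != node[0]:
        return  # leaf reached too early, or different gate type
    if pattern[0] == "inv":
        yield from matches(pattern[1], node[1])
        return
    # NAND2 is commutative, so both input orders are tried.
    for a, b in ((node[1], node[2]), (node[2], node[1])):
        for left in matches(pattern[1], a):
            for right in matches(pattern[2], b):
                yield left + right


@lru_cache(maxsize=None)
def best_cover(node):
    """Return (area, cells) of a minimum-area cover of the tree rooted at node."""
    if isinstance(node, str):  # primary input: nothing to cover
        return 0, ()
    best = None
    for name, (area, pattern) in LIBRARY.items():
        for leaves in matches(pattern, node):
            total, cells = area, (name,)
            for leaf in leaves:
                sub_area, sub_cells = best_cover(leaf)
                total += sub_area
                cells += sub_cells
            if best is None or total < best[0]:
                best = (total, cells)
    return best


subject = ("nand", ("inv", ("nand", "a", "b")), "c")  # NAND3(a, b, c)
area, cells = best_cover(subject)
```

Memoizing `best_cover` means each subtree is solved once, which is what keeps the dynamic programming cost proportional to the tree size for a fixed library. Cells with internal fanout cannot be written as such pattern trees, which is the limitation the text attributes to tree mapping.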
Structural matching and Boolean matching are the two major approaches to solving the cell matching problem. Structural matching relies on verifying the equivalence between the cell structure (called pattern graphs) and the node function structure. In order to be comparable, both cell and node function must be cast to a common representation. The usual approach is to decompose them into simple 2-input gates such as NOR2 or AND2. The cell matching in this case reduces to checking for graph isomorphism. One drawback of structural matching is that one library cell may have several equivalent decompositions. Thus, the matching for such a cell must consider all possible alternatives, which may be costly. Since graph isomorphism detection is a complex task, library cells are represented by trees. This leads to another handicap of this method, which is the inability to represent cells with internal fanout. Finally, another problem with structural representations is that they cannot work with incompletely specified functions. All these problems may be treated uniformly by using Boolean matching.
Boolean matching relies on the detection of logic equivalence between Boolean functions instead of graph isomorphism. Let $f(x_1, \ldots, x_n)$ be the function of a node in the Boolean network, and $g(y_1, \ldots, y_n)$ the function of a cell in the library. Boolean matching may be checked by answering the following question:
Does there exist an input variable permutation and an input phase assignment such that:
$$f(x) = g(P\,N_i\,x) \quad \text{or} \quad f(x) = \overline{g(P\,N_i\,x)}$$
where $P$ is a permutation operator and $N_i$ an input complementation operator?
In other words, is there an input variable permutation and an input phase assignment of $x$ for which cell $g$ produces function $f$ or its complement? The function $f$ may be incompletely specified, but the function $g$ (the library cell) is always completely specified. Since there are $n!$ permutations of $x$, $2^n$ phase assignments and 2 possible functions (direct and complemented), the total number of variants to be considered is $n! \cdot 2^n \cdot 2$.
This number quickly becomes intractable if we consider exhaustive checking. However, Boolean matching can use practical filters [BM95, MMM95, Sav92, MM93a, CMS93, MM93b, TZ96] that drastically reduce the number of verifications to be considered.
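To make the size of the exhaustive search concrete, the sketch below enumerates every permutation, input phase assignment and output phase for completely specified functions given as Python callables. It is a brute-force baseline for illustration, not the filtered matching used in this work; the names `exhaustive_match`, `variant_count`, `node_f` and `aoi21` are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def truth_table(func, n):
    """Outputs of func over all 2**n input vectors, in lexicographic order."""
    return tuple(func(*bits) for bits in product((0, 1), repeat=n))


def variant_count(n):
    """Number of (permutation, input phase, output phase) variants."""
    return factorial(n) * 2 ** n * 2


def exhaustive_match(f, g, n):
    """Return (perm, phases, out_phase) making g equal to f, or None."""
    target = truth_table(f, n)
    for perm in permutations(range(n)):
        for phases in product((0, 1), repeat=n):
            def variant(*x):
                # Cell input j is driven by x[perm[j]], inverted if phases[j] is 1.
                return g(*(x[perm[j]] ^ phases[j] for j in range(n)))
            table = truth_table(variant, n)
            for out_phase in (0, 1):
                if all(t == (v ^ out_phase) for t, v in zip(target, table)):
                    return perm, phases, out_phase
    return None


def aoi21(y1, y2, y3):
    return ((y1 & y2) | y3) ^ 1          # library cell: NOT(y1.y2 + y3)


def node_f(x1, x2, x3):
    return (x2 | (x3 ^ 1)) & x1          # node function to be matched


def xor3(x1, x2, x3):
    return x1 ^ x2 ^ x3


found = exhaustive_match(node_f, aoi21, 3)
not_found = exhaustive_match(xor3, aoi21, 3)
```

For 3 inputs the search space holds 96 variants, but for 6 inputs it already holds 92160, each requiring a full equivalence check, which is why filters are needed in practice.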
In this work we present a Boolean mapping algorithm based on a new Boolean matching approach proposed in [TZ96]. It is based on testing techniques that analyze an arbitrary structural decomposition of the cell in terms of observability and controllability functions and compare it to a binary decision diagram (BDD) representation of the node functions. The idea is to verify whether the observability functions of the cell inputs at the cell output are equivalent to the observability functions of the node function inputs derived from the BDD.
The mapping was implemented in the SIS environment, using the efficient C++ library LEDA. We describe the SIS approach and compare the matching techniques, showing that Boolean matching can deal efficiently with tree representations as well as non-tree cells such as MUX and XOR.
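The Boolean difference of a function $f$ with respect to an input variable $x_i$ is defined as
$$\frac{\partial f}{\partial x_i} = f_{x_i} \oplus f_{\overline{x_i}}$$
where $f_{x_i}$ and $f_{\overline{x_i}}$ are the cofactors of $f$ for $x_i = 1$ and $x_i = 0$, and $\oplus$ is the exclusive-or operator. This function is also called the observation function of the input variable $x_i$. The support of a function $f$ is the set $sup(f)$ of variables that $f$ explicitly depends on. An input $p$ of a logic gate is said to have a controlling value ($Ctr(p)$) if its value prevents other inputs from affecting the output of the gate. The controlling value for AND-type gates (ANDs and NANDs) is 0, for OR-type gates (ORs and NORs) it is 1, and no controlling value exists for inverters and EXOR gates.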
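The definitions above can be illustrated on small functions by brute-force enumeration. The sketch assumes functions are plain Python callables over 0/1 values; the helper names `observation`, `support` and `controlling_value` are hypothetical, and a real implementation would operate on BDDs rather than truth tables.

```python
from itertools import product


def observation(f, i):
    """Return the observation function of input i: f(x_i=1) XOR f(x_i=0)."""
    def df(*x):
        return f(*x[:i], 1, *x[i + 1:]) ^ f(*x[:i], 0, *x[i + 1:])
    return df


def support(f, n):
    """Indices of the inputs that f explicitly depends on."""
    return {i for i in range(n)
            if any(observation(f, i)(*x) for x in product((0, 1), repeat=n))}


def controlling_value(gate, n=2, p=0):
    """Value of input p that fixes the gate output, or None if there is none."""
    if n < 2:
        return None  # a single-input gate has no other input to mask
    for c in (0, 1):
        outputs = set()
        for rest in product((0, 1), repeat=n - 1):
            x = rest[:p] + (c,) + rest[p:]
            outputs.add(gate(*x))
        if len(outputs) == 1:
            return c
    return None


def and2(a, b):
    return a & b


def nor2(a, b):
    return (a | b) ^ 1


def xor2(a, b):
    return a ^ b


def inv(a):
    return a ^ 1
```

For example, the observation function of input `a` of `and2` is simply `b`: a change on `a` is visible at the output only when `b` is 1.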
3 Technology Mapping in SIS
SIS uses a pattern-covering algorithm to map arbitrarily complex logic functions into cells specified in a technology library (the technology library is given in genlib format). This is done by decomposing the logic circuit to be mapped into a network of 2-input NAND (or NOR) gates and inverters. The covering algorithm splits the network into a forest of trees. These trees are then covered by patterns that represent the cells in the library. The covering phase is guided by an optimization criterion that may be the circuit area, delay or power.
In the dynamic programming approach used in SIS, the trees are covered from the inputs to the outputs. Each node of the subject graph is the root of an acyclic subgraph that contains all nodes from the root up to the primary inputs of the network. All the ... where node is the subject graph root, primitive is a primitive class and options defines the fanout criteria. A primitive class contains some information about equivalent pattern graphs. The fanout criteria allow the user to change the fanout handling by the covering and matching algorithms. The cost function is used to compute and to record the match cost. SIS handles the input phase assignments by introducing inverter pairs at each arc of the subject graphs and pattern graphs. This increases the size of the subject graph from k to 3k nodes and penalizes the matching performance.
Improving SIS with Boolean matching
We can improve the mapping in SIS using a Boolean matching algorithm. Boolean matching can find matches that are not detected by structural methods and may exploit the
Boolean Covering
In the tree covering adopted by SIS, the network is traversed from the inputs to the outputs and, for each node, all the patterns in the library are checked for matching. In this case, the time complexity of the mapping is closely related to the size of the library. The Boolean mapping proceeds in a different way. The nodes are grouped in clusters from the outputs to the inputs. A cluster is a connected sub-graph of the subject graph having only one vertex with zero out-degree (a single output). It is characterized by its depth (longest path from the root to a leaf) and number of inputs. Each cluster is associated with a cluster function, obtained by collapsing the nodes it contains. Then the cluster function is matched against a subset of the gates that are filtered out of the library through signature and symmetry analysis. In this case, the time complexity is affected by the number of clusters generated at each node. The cluster generation is illustrated by the following example.
Example 2 Consider the subject graph shown in figure 4. The base functions for the decomposition of the subject graph are 2-input AND and OR functions. We consider the clusters rooted at vertex o of the subject graph, shown by different dotted lines in the picture. It is the same subject graph as in example 1, reduced in size by the handling of inverters.

Footnote 2: If there is a permutation operator $P$ and complementation operators $N_i$, $N_o$ such that $f(x) = N_o\, g(P\, N_i\, x)$ is a tautology, then $f$ and $g$ belong to the same NPN class.

One advantage of Boolean covering is the possibility of extending the matching possibilities by using the network don't cares. Two main approaches to the Boolean covering problem with don't cares have been developed in the literature. In Ceres, Mailhot [MM93a] generates all clusters up to 6 inputs, limiting the logic depth to 5. For each cluster a BDD is generated and a BDD matching algorithm is applied. A balanced tree with depth 5 may have up to 32 inputs. Generating all clusters of the network with that depth may be too costly for the mapping process. Ceres reduces this complexity by considering only the longest depth in the cluster. Don't care handling in Ceres is restricted to 4 input variables. In the benchmarks presented in [MM93a] using a standard cell library without don't cares, Ceres reported a 4% gain in area while running 2 times slower than SIS. An alternative method was proposed by Savoj [Sav92]. It is based on tautology checking by BDD computation but does not restrict the cluster size of incompletely defined functions. However, the use of the full don't care set for matching may slow down the mapping by two orders of magnitude. Benchmark comparisons between SIS mapping and Savoj's Boolean covering without don't cares reported an 8% gain for Savoj, but with a 5 times performance degradation.
The covering algorithm implemented in our package is based on a clustering process enhanced to deal with reconvergent fanout. Reconvergent fanout is a particular situation where a signal fans out to two cells whose fanouts merge later in the network. In some cases, this cluster may be matched against a single complex cell. Tree covering fails to recognize this situation.
The algorithm assumes that the network has been partitioned into subject graphs and decomposed into base functions beforehand. Its pseudo-code is shown in figure 5.
The parameter top is the root node for the covering process. The recursive algorithm computes all clusters for each node in the network rooted at top and tries to find the best set of matches that reduces the cost function. The number of inputs of a cluster is used as a filter for both the clustering and the matching processes; a limit on this number is used for this purpose. The cluster generation process stops when the number of cluster inputs reaches this value. Thus, this allows a larger matching space exploration.
The parameter nodes is the set of nodes contained in the current cluster, and inputs are the cluster's inputs. The algorithm uses the LEDA data type set to store this information. Care should be taken during the clustering not to generate duplicated clusters. This is controlled here by marking the already collapsed nodes.
The covering algorithm attempts to match each cluster function to a library element. The cost of a match is given by the cost of the selected cell plus the cost of matching the cluster's inputs. The problem of the selection of the input and output phases for each match is also considered at this stage. For each mapped vertex v, two library elements are stored: $F_{on}$ matches the ON set of the cluster, while $F_{off}$ matches its OFF set. If variable v is already used as an input to another mapped cluster, then $F_{on}$ or $F_{off}$ is selected according to the required polarity of v. If the same variable v is needed at a later stage with the opposite polarity, then an inverter is automatically inserted at its output. There are other parameters that heuristically control the performance of the mapping. The covering may be set not to collapse reconvergent fanouts. This is useful when the library contains only tree-like cells because it provides faster results. Another parameter, the match threshold, controls the number of successful matches accepted. The matching process starts after all clusters are generated and takes the larger ones first. Restricting the number of matches discards the smaller clusters, which improves the covering speed at the price of a small area increase.
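The cluster generation described above can be sketched as follows. This is a simplified illustration, not the pseudo-code of figure 5: the subject graph is assumed to be a dictionary from each internal node to its fanins, Python frozensets stand in for the LEDA set type, and the names `enumerate_clusters`, `fanins` and `max_inputs` are hypothetical. Fanout to nodes outside the cluster is ignored here.

```python
def enumerate_clusters(fanins, top, max_inputs):
    """Return all clusters rooted at top having at most max_inputs inputs.

    fanins maps each internal node to its fanin nodes; nodes absent from
    the mapping are primary inputs. Each cluster is a (nodes, inputs) pair
    of frozensets.
    """
    seen = set()
    stack = [(frozenset([top]), frozenset(fanins[top]))]
    while stack:
        nodes, inputs = stack.pop()
        # Skip duplicated clusters and clusters exceeding the input limit.
        if (nodes, inputs) in seen or len(inputs) > max_inputs:
            continue
        seen.add((nodes, inputs))
        for v in inputs:
            if v in fanins:  # internal node: collapse it into the cluster
                new_nodes = nodes | {v}
                new_inputs = (inputs - {v}) | (frozenset(fanins[v]) - new_nodes)
                stack.append((new_nodes, new_inputs))
    return seen


# Hypothetical subject graph with reconvergent fanout on input b.
graph = {"o": ("n1", "n2"), "n1": ("a", "b"), "n2": ("b", "c")}
clusters = enumerate_clusters(graph, "o", max_inputs=3)
```

In this example four clusters are found, and the largest one has only three inputs because `b` reconverges inside it. Pruning as soon as the input limit is exceeded is a design choice of this sketch: it keeps the enumeration small but can miss clusters that would shrink again after further collapsing.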
Boolean Matching
In recent years the Boolean matching problem has received a lot of attention. A recent and comprehensive survey of Boolean matching methods was presented in [BM95]. For incompletely specified functions, the first algorithm for detecting a match under input variable permutation and/or input and output phase assignment was proposed by Mailhot [MM93a]. This algorithm uses compatible graphs to solve this problem. The drawback is that the size of these graphs is exponential in the number of input variables, and their application is thus limited to functions with a maximum of four inputs. Savoj [Sav92] presented a method based on the tautology check. The main problem in this approach is to find the variable assignment. Savoj introduced a class of filters that are valid even for incompletely specified functions. Lastly, a new approach using multi-valued functions was presented by Wang [WH95], in which functions f and g are represented by sums of products. The computational complexity of this method depends on the cardinalities of the sums of products to be handled.
The method presented here is an extension of the controlling value matching presented in [TZ96]. For ease of explanation, we start with tree netlists without don't cares, then extend the matching to don't care conditions and, lastly, to arbitrary DAG structures. The properties of this method make it possible to consider only a subset of the input variables and to prune the permutation tree as soon as these variables are rejected. Moreover, when a correct input variable permutation (even a partial one) is found, the corresponding input phase assignment can be directly deduced: the enumeration of $2^n$ possible input phase assignments is saved.
Controlling Value Boolean Matching
The method is based on the equivalence of observation functions for equivalent functions (observability equivalence). These observation functions are applied to the proposed cell, and its structure is scanned from the external inputs to the output. The cell g is represented by an equivalent netlist composed only of AND-, OR- or inverter-type gates (the structure of the cell). Inside the structure, every elementary gate is checked using the controlling value paradigm (controlling value check). To continue the scanning along the structure, we have to determine the observation functions of the internal signal lines in the netlist equivalent of the cell. This step is called Observation Function Deduction. If there are multiple fanouts in the circuit, no exact deduction is possible, and only heuristics [TZ96, TF96] can be applied.
Observability equivalence
The library cell function g, represented by the circuit C, is proposed as an implementation of the function f. To locate the design errors within the circuit C, its primary inputs are initialized with the observation functions derived from f. The only acceptable errors here are missing inverters (wrong phase assignments) or wire exchanges at primary inputs (input variable permutations). For any other detected error, the library cell g is not a match for the given function f. In other words, f and g must have the same observation functions. For any input variable permutation, we must find the phase of each input of the cell g which satisfies these initial observation functions. If no such input phase assignment exists, the input variable permutation is not correct, or this cell is not a match.
The structure of the cell g is analyzed from the primary inputs to the primary output, following a "scan line" (figure 6a). The analysis consists of two basic phases: controlling value analysis and observation function deduction.
Controlling Value Analysis
We start by restricting the analysis to a library cell g whose equivalent netlist has a tree structure. The controlling value analysis is used to check the correctness of each logic gate in this netlist. Let us consider a logic gate $g_i$ in the cell g with two inputs p, q and one output r (figure 6a). We assume that the gate $g_i$ is an AND-type gate or an OR-type gate, and that the observation functions of the gate inputs p and q with respect to the required f are known. If the gate input p is a primary input, its observation function is initialized with the one computed directly from f; otherwise it has to be deduced from the observation functions of its fan-ins (see Section 4.3.3).
The observation function of a signal line indicates the condition under which any change at the signal line can be observed. If the computed values for a proposed permutation are not coherent with the nature of the gate analyzed, this permutation has to be rejected.
Knowing the correct gate inputs, we have to check whether inverters are needed at some inputs. If an input p of an AND gate has the controlling value 1 ($Ctr(p) = 1$), we need to insert an inverter at this input. If it is a primary input, the input phase is determined. If it is not, and if this inverter cannot be equivalently transferred as missing inverters at primary inputs or at the primary output, the current input variable permutation is not correct.
Example 3 Let us first check the partial permutation $x_2 \to y_2$ and $x_3 \to y_1$ (figure 6c). At gate 1, $\frac{\partial f}{\partial x_3} \cdot x_2 \neq 0$ and $\frac{\partial f}{\partial x_3} \cdot \overline{x_2} \neq 0$: the input $x_3$ can be observed for any logic value of the input $x_2$. This is not possible because gate 1 is an AND gate, and this permutation must be rejected. Note that the input $y_3$ of cell g has not yet been assigned: we consider only a subset of the input variables. Now, we try the partial input variable permutation $x_1 \to y_1$ and $x_2 \to y_2$ (figure 6d), which gives $\frac{\partial f}{\partial x_2} \cdot x_1 = 0$. The gate input $x_1$ has the controlling value 1 (AND gate): an inverter must be added at input $x_1$. For $x_2$, $\frac{\partial f}{\partial x_1} \cdot \overline{x_2} = 0$: the controlling value is 0 (AND gate) and the phase is correct. The phase assignment (figure 6e) is thus directly determined from this analysis: $(y_1, y_2) = (1, 0)$.
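The checks of Example 3 can be reproduced by brute force on a small function. Since the function of figure 6 is not given in the text, the sketch assumes a hypothetical node function, $f = \overline{x_1} x_2 + x_3$, chosen only because it yields the same outcomes as the example; the helpers `blocking_value` and `check_gate` are likewise hypothetical names.

```python
from itertools import product


def observation(f, i):
    """Return the observation function of input i: f(x_i=1) XOR f(x_i=0)."""
    def df(*x):
        return f(*x[:i], 1, *x[i + 1:]) ^ f(*x[:i], 0, *x[i + 1:])
    return df


def blocking_value(f, n, a, b):
    """Value of x_b for which x_a is never observable, or None if there is none."""
    df = observation(f, a)
    blocked = [v for v in (0, 1)
               if not any(df(*x) for x in product((0, 1), repeat=n) if x[b] == v)]
    return blocked[0] if len(blocked) == 1 else None


def check_gate(f, n, p, q, gate_ctr=0):
    """Controlling value check for node inputs p and q assigned to one gate.

    gate_ctr is the controlling value of the gate type (0 for AND, 1 for OR).
    Returns (phase_p, phase_q), where 1 means an inverter is needed, or None
    when the partial permutation must be rejected.
    """
    ctr_p = blocking_value(f, n, q, p)  # value of p that hides q
    ctr_q = blocking_value(f, n, p, q)  # value of q that hides p
    if ctr_p is None or ctr_q is None:
        return None
    return (int(ctr_p != gate_ctr), int(ctr_q != gate_ctr))


def f(x1, x2, x3):
    # Hypothetical node function, not taken from the paper's figure.
    return ((x1 ^ 1) & x2) | x3


rejected = check_gate(f, 3, p=2, q=1)  # x3 and x2 proposed on the AND gate
accepted = check_gate(f, 3, p=0, q=1)  # x1 and x2 proposed on the AND gate
```

With this assumed function, placing $x_3$ and $x_2$ on the AND gate is rejected because no value of $x_2$ hides $x_3$, while placing $x_1$ and $x_2$ on it yields the phases (1, 0): an inverter on $x_1$ and none on $x_2$. Only the two variables under test are involved, so the remaining cell inputs need not be assigned yet.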
Observation Function Deduction
To continue the scanning along the structure, we have to determine the observation functions of the internal signal lines in the netlist equivalent of the cell. These will be deduced from the initial observation functions set at the primary inputs. We will show that the deduction approach is an approximation but gives an exact controlling value check for tree circuits.
Let us consider a tree circuit, and some particular gates $g_i$ and $g_j$ (figure 6a). If p and q are correct inputs of gate $g_i$, we want to deduce the observation function of its output r. We propose to approximate the observation function by:
$$\frac{\partial f}{\partial r} \approx \frac{\partial f}{\partial p} + \frac{\partial f}{\partial q}$$
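The behavior of this approximation can be checked numerically on a small tree circuit. The circuit below (an AND gate $g_i$ feeding an OR gate $g_j$ whose other input is s) is a hypothetical example, and `hiding_values` is an illustrative helper; the sketch compares the exact observation function of r with the one deduced from p and q.

```python
from itertools import product


def gate_i(p, q):
    return p & q  # g_i: r = p AND q


def gate_j(r, s):
    return r | s  # g_j: output = r OR s


def f(p, q, s):
    return gate_j(gate_i(p, q), s)


vectors = list(product((0, 1), repeat=3))  # all values of (p, q, s)

# Exact observation function of the internal line r.
exact = [gate_j(1, s) ^ gate_j(0, s) for p, q, s in vectors]
# Observation functions of the gate inputs p and q, computed from f.
obs_p = [f(1, q, s) ^ f(0, q, s) for p, q, s in vectors]
obs_q = [f(p, 1, s) ^ f(p, 0, s) for p, q, s in vectors]
# Deduced (approximate) observation function of r.
approx = [a | b for a, b in zip(obs_p, obs_q)]


def hiding_values(obs):
    """Values of s for which the observed line is never observable."""
    return [v for v in (0, 1)
            if not any(o for o, (p, q, s) in zip(obs, vectors) if s == v)]
```

In this example the deduced function is strictly smaller than the exact one (it misses the vectors where p and q are both 0), yet both give the same answer to the controlling value question at gate $g_j$: only s = 1 hides r.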
What we want to validate is that the check of the signal line r is preserved with this function. Suppose that r is one input of the gate $g_j$, whose other input is s. We could ...

Consider now a function f which is incompletely specified. We have to define the incompletely specified observation functions. All that matters is the on-sets of these observation functions. We start by restricting the matching to tree structures.
Definition 1 Let $\frac{\partial f}{\partial x_i}$ and $\left(\frac{\partial f}{\partial x_i}\right)_{DC}$ be respectively the on-set and the don't care set of the observation function with respect to an input variable $x_i$ of an incompletely specified function $f$ with don't care set $f_{DC}$:
What we want to establish is that, if the controlling value check is false, there cannot be a match. So we will never reject acceptable solutions.

If there are multiple internal fanouts in the library cell, the observation function deduction is more complex. If we know the observation functions for all the fanout branches, we can deduce the observation function of a fanout stem [Mic94]. The deduction is computed by a simple network traversal, from the output to the inputs. But in the matching process, we want to deduce the observation function of the fanout branches from that of the fanout stem, i.e. from the inputs to the outputs. Normally, no exact deduction is possible, and only heuristics can be applied. We propose a new one, which can be proven to be an approximation of the exact expression. The deduction of the observation function from output to inputs (reverse deduction) was developed by Damiani [Mic94], based on the concept of the perturbed network. In the network of figures 7a and 7b, the signal line h has two fanout branches: $h_1$ and $h_2$. The observation function can be computed as:
Results
We ran a set of benchmarks from the MCNC suite to evaluate our Boolean mapping approach compared to the SIS structural one, on an Ultra Sparc I workstation. The version of SIS modified to include our Boolean mapping is referred to here as Land. All benchmarks, except c432, cm150a, x2 and ttt2, were optimized with the rugged script followed by a call to full_simplify. The Boolean matching was processed without don't cares.

Table 1 presents a comparison between Land and SIS for the library example.genlib, which is distributed with the SIS package. It has gates with up to 4 inputs. The cell area is estimated by the literal count of its sum-of-products representation. Column 2 (nodes) shows the number of 2-input base gates of the circuit after technology decomposition, just before mapping. Column 4 (7) presents the total area of the mapped circuit and column 3 (6) the cell count used by SIS (Land). Column 5 (8) shows the CPU time in seconds. The Depth parameter limits the logic depth of the clusters generated in the Boolean covering. In summary, Land presents a 4% gain in area while being 2.7 times faster than SIS.

To this simple library we added an XOR3 gate, just to illustrate the dependence of structural matching on the type of the library cells. The mapping time with this new library is shown in columns 9 and 10. Average1 is the relative gain in time including benchmark ttt2, while Average2 is the relative gain excluding ttt2, which is a pathological case for SIS. Land performance remained stable in both cases, with 1% to 2% of performance degradation due to the inclusion of XOR3. SIS performance was reduced by 30% (without ttt2) to 351% (with ttt2).

Tables 2 and 3 show the results of mapping the benchmarks with synch.genlib, which is also distributed with SIS. It is a standard cell library which includes XOR2, 2-1 MUX and tree-like cells with up to 8 inputs. The table organization is similar to that of Table 1. For a depth of 2 (table 2), Land generates all clusters with that logic depth.
This means that we may have gates with up to 4 inputs. Land reported an average area gain of 7% and was on average 3 times faster than SIS (columns 6, 7 and 8). For a depth of 3, the maximum number of inputs is 8 (tree-like cluster). In this case, Land reported an average gain of 9% while being 20% faster than SIS (columns 9, 10 and 11). Table 3 presents the same benchmarks but using a different heuristic to restrict the cluster generation. Columns 6, 7 and 8 refer to Boolean covering with clusters restricted to 5 inputs, while in columns 9, 10 and 11 this limit is set to 6 inputs. In both cases Land is about 9% better in area, while the computing time gets closer to that of SIS. The time difference with respect to the depth limited clustering is explained by the data structures used in the clustering process. The depth limited clustering uses a hash table to keep track of the clusters generated, while the input limited clustering uses a set data type, which turned out to be less effective.

Table 2: Library synch.genlib on depth cluster (average row)
         SIS (columns 3-5)    Land, depth 2 (columns 6-8)    Land, depth 3 (columns 9-11)
Average  100%  100%  100%     94%  93%  29%                  91%  91%  79%
Conclusion
This paper presents an efficient algorithm for Boolean mapping based on a fast Boolean matching approach. The Boolean matching is based on testing techniques which prune the search space of the matching problem by early detection of unsuccessful matches. Applied to the area minimization problem, the benchmarks have shown both an area and a performance gain with respect to the structural mapping of SIS. Several improvements can still be made and are the subject of future work. The main cost in the Boolean mapping is related to the relatively large number of clusters that may be generated and evaluated. While libraries contain gates with a large number of inputs, those gates are not frequently used and account for a performance degradation. In those cases, better filtering techniques will increase the mapping speed. Because this new technique can handle don't cares, we will consider don't care processing in the future, in order to achieve low power mapping.
