We then generalize the method to larger subsets of unate gates. Because simultaneous optimization of local functions can take place, our methods are more powerful and general than Boolean optimization methods using don't cares, where only single-gate opti.
Introduction
Logic synthesis has been traditionally divided into two-level and multiple-level synthesis. Two-level synthesis has been intensely researched from theoretical and engineering perspectives, and efficient algorithms for exact and approximate solutions are available [1]. Exact optimization algorithms for multiple-level logic networks have also been considered [2]. They are, however, generally impractical even for medium-sized networks. For this reason, many efficient approximation algorithms have been developed over the past decade. Such algorithms can be classified according to the algebraic/Boolean type of operations they perform. Algebraic techniques, such as factoring and kerneling, are described in [3].
As algebraic methods do not take full advantage of the properties of Boolean algebra, a spectrum of Boolean optimization techniques has been developed in parallel. Such techniques consist mainly of iteratively refining an initial network by identifying subnetworks to be optimized, deriving their associated degrees of freedom (expressed by so-called don't care conditions), and replacing such subnetworks by simpler, optimized ones.
The independent optimization of a single local function of a network, called single-gate optimization, lies at one end of the spectrum. It has been shown [4, 5] that the degrees of freedom associated with a single gate can be represented by a don't care set. Once this set is obtained, two-level synthesis algorithms can be used to optimize the subnetwork [5].
The concurrent optimization of several local functions, called multiple-gate optimization, lies at the other end of the spectrum. Such methods have been shown to offer potentially better quality networks as compared to single-gate optimization because the degrees of freedom of multiple gates are used simultaneously.
Heuristic approximations to multiple-gate optimization include the use of compatible don't cares [4], which allows us to extend don't care-based optimization to multiple functions by restricting the don't care sets themselves.
Although such methods are applicable to large networks, the restriction placed on don't care sets reduces the degrees of freedom and hence possibly the quality of the results. Exact methods for multiple-gate optimization, first analyzed in [6], have been shown to best exploit the degrees of freedom. Unfortunately, these methods suffer from two major disadvantages. First, even for small subnetworks, the number of primes that have to be derived can be remarkably large; second, given the set of primes, the method entails the solution of an often complex binate covering problem, for which efficient algorithms are still the subject of investigation. As a result, the overall efficiency of the method is limited, and only relatively small networks can currently be handled.
The binate nature of the covering problem arises essentially from the arbitrariness of the subnetwork selected for optimization.
In this paper, we develop alternative techniques for the optimization of multiple-output subnetworks. These techniques are based upon a temporary transformation of a network into an internally unate one, and on an accurate choice of the subnetworks to be optimized. The difficult binate covering step is avoided, and yet an optimization quality superior to don't care-based methods is achieved with comparable efficiency, because multiple local functions can be optimized simultaneously.
To this end, we first introduce the notion of a compatible set of gates as a subset of gates whose optimization can be solved exactly by classical two-level synthesis algorithms. We show that the simultaneous optimization of compatible gates allows us to reach optimal solutions not achievable by conventional don't care methods. We then build upon these results and present an algorithm for the optimization of more general subnetworks in an internally unate network.
The algorithms have been implemented and tested on several benchmark circuits, and the results in terms of literal savings as well as CPU time are very promising.
Terminology
Let $B$ denote the Boolean set $\{0, 1\}$. A $k$-dimensional Boolean vector $\mathbf{x} = [x_1, \ldots, x_k]^T$ is an element of the set $B^k$ (boldfacing is hereafter used to denote vector quantities. In particular, the symbol $\mathbf{1}$ denotes a vector whose components are all 1).
An $n_i$-input, $n_o$-output Boolean function $\mathbf{F}$ is a mapping $\mathbf{F}: B^{n_i} \to B^{n_o}$.
A literal is the function expressed by a variable or its complement.
A cube $c$ is the product of some literals. A gate refers to the local function that the output of the gate represents. A Boolean function can also be represented as a set, using the minterms of its ON-set. For the rest of the paper, we interchangeably use set ($\cup$, $\cap$) and function ($+$, $\cdot$) notations to describe Boolean functions for notational convenience.
The cofactors (or residues) of a function $F$ with respect to a variable $x_i$ are the functions $F_{x_i} = F(x_1, \ldots, x_i = 1, \ldots, x_{n_i})$ and $F_{x_i'} = F(x_1, \ldots, x_i = 0, \ldots, x_{n_i})$. A function $F$ positive unate in a variable $x_i$ can always be expressed without using the literal $x_i'$ [7].
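The two notions above can be checked mechanically on small functions. The following is a minimal sketch, assuming the usual characterization that $F$ is positive unate in $x_i$ when $F_{x_i'} \le F_{x_i}$ at every input point; the helper names and the example function are hypothetical and serve only as an illustration.

```python
from itertools import product

def cofactor(f, i, value):
    """Return the cofactor of f obtained by fixing input i to value (0 or 1)."""
    def g(*x):
        x = list(x)
        x[i] = value
        return f(*x)
    return g

def is_positive_unate(f, i, n):
    """Assumed test: f is positive unate in x_i if F_{x_i'} <= F_{x_i} everywhere."""
    pos, neg = cofactor(f, i, 1), cofactor(f, i, 0)
    return all(neg(*x) <= pos(*x) for x in product((0, 1), repeat=n))

# Hypothetical example function: F = x0*x1 + x2'
def F(x0, x1, x2):
    return (x0 & x1) | (1 - x2)

unate_in_x0 = is_positive_unate(F, 0, 3)  # x0 appears only uncomplemented
unate_in_x2 = is_positive_unate(F, 2, 3)  # x2 appears only complemented
```

Here `F` passes the test for `x0` and fails it for `x2`, in which it is negative unate instead.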
The desired terminal behavior of a combinational network is specified by two functions, ON(x) and DC(x), the latter in particular representing the input combinations that either do not occur or such that the value of some of the network outputs is regarded as irrelevant [5].
The functions ON and DC identify the set of possible terminal behaviors for the network: specifications are met by an implementation realizing a function $\mathbf{F}(\mathbf{x})$ if and only if $\mathbf{F}(\mathbf{x}) = ON(\mathbf{x})$ for every input $\mathbf{x}$ not in DC.
Another 
Any such cube is termed an implicant. An implicant is termed prime if no literal can be removed from it without violating the inequality (3). For the purpose of logic design, only prime implicants need be considered [7, 1]. Each implicant $c_k$ has an associated cost, which depends on the technology under consideration. For example, in PLA minimization all implicants take the same area, and therefore have identical cost; in a multiple-level context, the number of literals can be taken as cost measure [3]. The cost of a sum of implicants is usually taken as the sum of the individual costs.
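Once the list of primes has been built, a minimum-cost cover of $F_{min}$ is determined by solving

$$\min \sum_{k=1}^{N} \alpha_k w_k \quad \text{subject to} \quad F_{min} \le \sum_{k=1}^{N} \alpha_k c_k \qquad (4)$$

where $N$ is the number of primes, $w_k$ is the cost of the implicant $c_k$, and the Boolean parameters $\alpha_k$ are used in this context to parametrize the search space: they are set to 1 if $c_k$ appears in the cover, and to 0 otherwise. The approach is extended easily to the synthesis of multiple-output circuits by defining multiple-output primes [7, 1]. A multiple-output prime is a prime of the product of some components of $\mathbf{F}_{max}$.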
These components are termed the influence set of the prime.
Branch-and-bound methods can be used to solve the problem exactly.
Engineering solutions have been thoroughly analyzed, for example, in [1] , and have made two-level synthesis feasible for very large circuits.
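The covering step described above (choose one Boolean parameter per prime so that the selected primes cover the lower bound at minimum cost) can be illustrated by exhaustive search. This is only a sketch under stated assumptions: the function name `min_cost_cover`, the set-of-minterms encoding, and the example data are hypothetical, and practical solvers use branch-and-bound rather than enumeration.

```python
from itertools import product

def min_cost_cover(on_set, primes, costs):
    """Try every 0/1 assignment of the selection parameters and keep the cheapest cover.

    on_set: set of minterms that must be covered (the lower bound).
    primes: list of sets of minterms, one per prime implicant.
    costs:  list of costs, one per prime.
    Returns (cost, alpha) for a minimum-cost cover, or None if no cover exists.
    """
    best = None
    for alpha in product((0, 1), repeat=len(primes)):
        covered = set()
        for selected, prime in zip(alpha, primes):
            if selected:
                covered |= prime
        if on_set <= covered:
            cost = sum(a * w for a, w in zip(alpha, costs))
            if best is None or cost < best[0]:
                best = (cost, alpha)
    return best

# Hypothetical cyclic example: f(a, b, c) with ON-set {0, 1, 2, 5, 6, 7}.
on_set = {0, 1, 2, 5, 6, 7}
primes = [{0, 1}, {0, 2}, {1, 5}, {2, 6}, {5, 7}, {6, 7}]  # a'b', a'c', b'c, bc', ac, ab
costs = [2] * 6                                            # two literals each
cost, alpha = min_cost_cover(on_set, primes, costs)
```

Because selecting an additional prime can only enlarge the covered set, the constraint is monotone in each parameter, which is the unateness property that makes this covering problem comparatively easy.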
Eq. (4) can be rewritten as
The left-hand side of Eq. (5) represents a Boolean function $F_\alpha$ of the parameters $\alpha_i$ only; the constraint equation (4) is therefore equivalent to $F_\alpha = 1$.
The conversion of Eq. (4) into Eq. (6) is known in the literature as Petrick's method [7]. Two properties of two-level synthesis are worth remarking in the context of this paper. First, once the list of primes has been built, we are guaranteed that no solution will violate the upper bound in Eq. (1), so that only the lower bound needs to be considered (as made explicit by Eq. (4)). Similarly, only the upper bound needs to be considered during the extraction of primes. Second, the effect of adding/removing a cube from a partial cover of $F_{min}$ is always predictable: that partial cover is increased/decreased.
This property eases the problem of sifting the primes during the covering step, and it is reflected by the unateness of $F_\alpha$: intuitively, by switching any parameter $\alpha_i$ from 0 to 1, we cannot decrease the chances of satisfying Eq. (6). These are important attributes of the problem that need to be preserved in its generalizations.
Don't care-based Multiple-level Optimization
Two-level optimization is the basic engine in don't care-based multiple-level logic optimization, where it is used to iteratively optimize single-output gates in the network. Consider a single-output subnetwork, with local output $y$, to be re-synthesized.
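The primary output $\mathbf{F}$ of the overall network can be expressed in terms of the signal $y$:

$$\mathbf{F} = y\,\mathbf{F}_y + y'\,\mathbf{F}_{y'} \qquad (7)$$

By replacing Eq. (7) in Eq. (1), it follows that $y$ must satisfy

$$\mathbf{F}_{min} \le y\,\mathbf{F}_y + y'\,\mathbf{F}_{y'} \le \mathbf{F}_{max} \qquad (8)$$

The upper bound holds if and only if

$$y\,\mathbf{F}_y \le \mathbf{F}_{max} \quad \text{and} \quad y'\,\mathbf{F}_{y'} \le \mathbf{F}_{max} \qquad (9)$$

Eq. (9) can be rewritten as

$$\mathbf{F}_{max}'\,\mathbf{F}_{y'} \le y\mathbf{1} \le \mathbf{F}_{max} + \mathbf{F}_y' \qquad (10)$$

Similarly, the lower bound holds if and only if $\mathbf{F}_{y'} + y\mathbf{1} \ge \mathbf{F}_{min}$ and $\mathbf{F}_y + y'\mathbf{1} \ge \mathbf{F}_{min}$, i.e.

$$\mathbf{F}_{min}\,\mathbf{F}_{y'}' \le y\mathbf{1} \le \mathbf{F}_{min}' + \mathbf{F}_y \qquad (11)$$

Eq. (10) and (11) can be merged together, to obtain:

$$\mathbf{F}_{max}'\,\mathbf{F}_{y'} + \mathbf{F}_{min}\,\mathbf{F}_{y'}' \le y\mathbf{1} \le (\mathbf{F}_{max} + \mathbf{F}_y')(\mathbf{F}_{min}' + \mathbf{F}_y) \qquad (12)$$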
Eq. (12) represents the exact degrees of freedom available in the synthesis of the signal $y$, and is formally identical to Eq. (1): the value of $y$ is undetermined at those points for which the lower bound differs from the upper bound.
Such points are the local don't cares for $y$, and are denoted by $DC_y(\mathbf{x})$.
Once the bounds (or, equivalently, the don't cares) for $y$ are computed, ordinary two-level synthesis algorithms can be applied.¹
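As an illustration of how such bounds yield local don't cares, the sketch below evaluates a lower and an upper bound for $y$ pointwise on a tiny single-output network. It assumes the expansion $F = y F_y + y' F_{y'}$ and the bounds $F_{max}' F_{y'} + F_{min} F_{y'}' \le y \le (F_{max} + F_y')(F_{min}' + F_y)$ that follow from it; the network, the gate, and all function names are hypothetical.

```python
from itertools import product

def F(x, y):
    """Hypothetical network output in terms of inputs x = (a, b, c) and gate output y."""
    a, b, c = x
    return y | (a & c)

def gate(x):
    """Hypothetical current implementation of the gate: y = a*b."""
    a, b, c = x
    return a & b

def bounds(F, f_min, f_max, x):
    """Lower and upper bound on y at input x, given the values of F_min and F_max at x."""
    fy, fy0 = F(x, 1), F(x, 0)  # cofactors of F with respect to y and y'
    lower = ((1 - f_max) & fy0) | (f_min & (1 - fy0))
    upper = (f_max | (1 - fy)) & ((1 - f_min) | fy)
    return lower, upper

def local_dont_cares(F, gate, n):
    """Inputs where the two bounds differ, assuming no external don't cares."""
    result = []
    for x in product((0, 1), repeat=n):
        spec = F(x, gate(x))  # F_min = F_max = current behavior
        lower, upper = bounds(F, spec, spec, x)
        if lower < upper:
            result.append(x)
    return result

dont_cares = local_dont_cares(F, gate, 3)
```

In this example the gate output is irrelevant whenever `a = c = 1`, because the term `a & c` already forces the network output to 1.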
Boolean Relations-based Multiple-level Optimization
Don't care-based methods allow the optimization of only one single-output subnetwork at a time. It has been shown in [6] that this strategy may potentially produce lower-quality results with respect to a more general approach attempting the simultaneous optimization of multiple-output subnetworks.
Let $\mathbf{y} = [y_1, y_2, \ldots, y_m]$ denote the outputs of a subnetwork to be re-synthesized, and let $\mathbf{F}(\mathbf{x}, \mathbf{y})$ denote the network outputs, expressed in terms of the variables $y_i$. From Eq. (1), the functional constraints on $\mathbf{y}$ are expressed by

$$\mathbf{F}_{min} \le \mathbf{F}(\mathbf{x}, \mathbf{y}) \le \mathbf{F}_{max} \qquad (13)$$

The optimization problem is to find a minimum-cost implementation of $\mathbf{y}$ such that Eq. (13) holds. An exact solution algorithm, targeting two-level realizations, is presented in [6].
The difficulty with solutions to Boolean relations is twofold. First, when trying to express Eq. (13) in a form similar to Eq. (12), the isolation of a particular $y_i$ results in a dependence on the other variables $y_j$ in the upper and lower bounds of the expression. This makes simultaneous optimization of $y_1, \ldots, y_m$ difficult.
Second, Eq. (13) requires a solution to the binate covering problem in the covering step. Fast binate covering solvers are the subject of ongoing research [8]. Nevertheless, the binate nature of the covering step reflects an intrinsic complexity which is not found in the unate case. In particular, since $\mathbf{F}$ is in general binate with respect to $\mathbf{y}$, the effect of adding/removing a prime to a partial solution is no longer trivially predictable, and both bounds in Eq. (13) may be violated by the addition of a single cube. As a consequence, branch-and-bound solvers may (and usually do) undergo many more backtracks than with a unate problem of comparable size, resulting in a substantially increased CPU time.
¹ In practice, $y$ is re-synthesized by taking advantage also of the other internal signals available in the network. Implicants and primes are in this context expressed in terms of primary inputs and other network variables.
Compatible Gates
The analysis of Boolean relations points out that binate problems arise because of the generally binate dependence of $\mathbf{F}$ on the variables $y_i$. We introduce the notion of compatible gates in order to perform multiple-gate optimization while avoiding the binate covering problem.
In the rest of the paper, given a network output expression $\mathbf{F}(\mathbf{x}, \mathbf{y})$, $\mathbf{x}$ is the set of input variables and $\mathbf{y}$ is the set of gate outputs to be optimized. Compatible gates allow optimization of multiple gates without having to solve the binate covering problem.
Intuitively, compatible gates are selected such that their optimization can only affect the outputs in a monotonic or unate way, thereby forcing the covering problem to be unate. The compatibility of a set $S$ of gates is a Boolean property. In order to ascertain it, one would have to verify that all network outputs can indeed be expressed as in Definition (3). This task is potentially very CPU-intensive.
In Appendix A, we present algorithms for constructing subsets of compatible gates from the network topology only.
Optimizing Compatible Gates
The functional constraints for a set of compatible gates can be obtained by substituting Eq. (14) into Eq. (13). From Eq. (14) we obtain

$$\mathbf{F}_{min} \le \sum_{j=1}^{m} y_j \mathbf{p}_j + \mathbf{q} \le \mathbf{F}_{max} \qquad (15)$$

Eq. (15) can be solved using steps similar to those of two-level optimization. In particular, the optimization consists of an implicant extraction step and a covering step.
Implicant Extraction
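Assuming that $\mathbf{q} \le \mathbf{F}_{max}$, the upper bound of Eq. (15) holds if and only if $y_j \mathbf{p}_j \le \mathbf{F}_{max}$ for each $j$, that is, if and only if

$$y_j \le F_{max,j}, \qquad j = 1, \ldots, m$$

where $F_{max,j}$ is the product of all the components of $\mathbf{F}_{max} + \mathbf{p}_j'$. A cube $c$ can thus appear in a two-level expression of $y_j$ if and only if $c \le F_{max,j}$. As this constraint is identical to Eq. (3), the prime-extraction strategies [7, 1] of ordinary two-level synthesis can be used.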
Example 2. Consider the optimization problem for gates $g_1$ and $g_2$ in Fig. (1).
From Example (l), pl = (21 +X3 + X4)'; p2 = (xl +X4 +x3)'.
We assume no external don't care set. Consequently, $F_{min} = F_{max}$ = z1x2xj + z2Z3Z4 + xi z;(x3 + xi). Karnaugh maps of $F_{min}$ and $F_{max}$ are shown in Fig. (2a), along with those of $p_1$ and $p_2$. Fig. (2b) shows the maps of $F_{max,1} = F_{max} + p_1'$ and $F_{max,2} = F_{max} + p_2'$, used for the extraction of the primes of $y_1$ and $y_2$, respectively. The list of all multiple-output primes is given in Table (1). Note that primes 1 through 5 can be used by both $y_1$ and $y_2$.
Covering Step
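Let $N$ indicate the number of primes. For example, in the problem of Example (2), $N = 9$. We then impose a sum-of-products representation associated with each variable $y_j$:

$$y_j = \sum_{k=1}^{N} \alpha_{j,k} c_k$$

with $\alpha_{j,k}$ fixed to 0 for the primes that cannot be used by $y_j$. Using the set of primes found in Example (2), $y_1$ and $y_2$ are expressed by

$$y_1 = \alpha_{1,1}c_1 + \alpha_{1,2}c_2 + \alpha_{1,3}c_3 + \alpha_{1,4}c_4 + \alpha_{1,5}c_5 + \alpha_{1,8}c_8 + \alpha_{1,9}c_9$$

$$y_2 = \alpha_{2,1}c_1 + \alpha_{2,2}c_2 + \alpha_{2,3}c_3 + \alpha_{2,4}c_4 + \alpha_{2,5}c_5 + \alpha_{2,6}c_6 + \alpha_{2,7}c_7$$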
The optimum solution has cost 6 and is given by y 1 = z; z; + zzxd; yz = $Zx$, corresponding to the assignments $\alpha_{1,1} = \alpha_{1,2} = \alpha_{1,3} = \alpha_{1,5} = \alpha_{1,9} = 0$; $\alpha_{1,4} = \alpha_{1,8} = 1$; $\alpha_{2,1} = \alpha_{2,2} = \alpha_{2,3} = \alpha_{2,4} = \alpha_{2,5} = \alpha_{2,7} = 0$; $\alpha_{2,6} = 1$.
The initial cost, in terms of literals, was 12. The solution corresponds to the cover shown in Fig. (3), and results in the circuit of Fig. (4).
It is worth contrasting, in the above example, the roles of $y_1$ and $y_2$ in covering $F_{min}$. Before optimization, $p_1 y_1$ covered the minterms zlzzZ$zj, Z1ZZZ4Z4, zIz2r3z4 of $F_{min}$, while $p_2 y_2$ covered z;zjzjzj, Z\XjZ& ${~z~s~q, Z;Z;ZWA. After optimization, $y_1$ and $y_2$ essentially "switched roles" in the cover: $p_2 y_2$ is now used for covering z 1z2x~zj, xl x2x~x4, while $p_1 y_1$ covers all other minterms.
In the general case, the possibility for any of $y_1, \ldots, y_m$ to cover a minterm of $F_{min}$ is evident from Eq. (15). Standard single-gate optimization methods based on don't cares [5] regard the optimization of each gate $g_1, \ldots, g_m$ as separate problems, and therefore this degree of freedom is not used. For example, in the circuit of Fig. (1), the optimization of $g_1$ is distinct from that of $g_2$. The don't care conditions associated with (say) $y_1$ are those minterms for which either $p_1 = 0$ or $p_2 y_2 = 1$, and are shown in the map of Fig. (5), along with the initial cover. It can immediately be verified that $y_1$ can only be optimized into z 1ZZZ4 + z2z4, saving only one literal.
The don't cares for $y_2$ are also shown in Fig. (5). No optimization is possible in this case. Note also that the optimization result is (in this particular example) independent of the order in which $g_1$ and $g_2$ are optimized.
Unlike the compatible gates case, it is impossible for the covers of $y_1$ and $y_2$ to "switch" roles in covering $F_{min}$.

Unate Optimization
In the previous section we showed that in the case of compatible gates, the functional constraints expressed by Eq. (13) can be reduced to Eqs. (17) and (19), which can be solved by a two-step procedure similar to that of two-level optimization.
We now generalize the compatible gates results to the optimization of arbitrary subsets $S$ of unate gates.
Optimizing Unate Subsets
Assume, for the sake of simplicity, that $\mathbf{F}$ is positive unate with respect to $y_1, \ldots, y_m$.
We can perform optimization on the subset of unate gates in a style that is totally analogous to compatible gates by dividing it into implicant extraction and covering steps.
Implicant Extraction
In this step, for each $y_i$ to be optimized, a set of maximal functions is extracted. In particular, the maximal functions of each $y_i$ can be expressed as in Eq. (20), which is similar to Eq. (17).
The remaining task is to find a minimum-cost covering solution for Eq. (22). The following theorem, whose proof is in [9], provides a means for finding a set of maximal functions.
It also shows that computing such functions has complexity comparable with computing ordinary don't care sets. In Eq. (23), $DC_j$ represents the don't care set associated with $y_j$, assuming that $y_k = F_{max,k}$, $k = 1, \ldots, j-1$ and $y_k = g_k$, $k = j+1, \ldots, m$.
This theorem states that the maximal function for vertex $i$ depends on the maximal functions already calculated ($j < i$). This means that, unlike the case of compatible gates, the maximal function for a given vertex is not unique: it depends on the order in which the vertices are processed.
Example 4. For the network of Fig. (6), assuming no external don't care conditions, we find the maximal functions for $y_1$, $y_2$, and $y_3$. The $DC_{y_j}$ terms correspond to the observability don't care at $y_j$, computed using the $F_{max}$ of the previous gates. yl = X1X; X4; yz =~4 (
Eq. (20) allows us to find a set of multiple-output primes for $y_1, \ldots, y_m$. The covering step then consists of finding a minimum-cost sum such that Eq. (22) holds.
We now present a reduction for transforming the covering step to the one presented for compatible gates. We first illustrate the reduction by means of an example.
Example 5 In Fig. (6 In this particular example, we get
More generally, corresponding to each combination $\mathbf{x}$ such that $F_{min}(\mathbf{x}) = 1$, the constraint $F(\mathbf{x}, \mathbf{y}) = 1$ can be re-expressed as
The transformation shown in Example (5) is formalized in Theorem (4.2). Eq. (25) has the same format as Eq. (15), with $\mathbf{q}$ and $\mathbf{p}_j$ being replaced by $\mathbf{F}(\mathbf{x}, \mathbf{z})$ and $z_j'\mathbf{1}$, respectively. Theorem (4.2) thus allows us to reduce the covering step to the one used for compatible gates. Theorems (4.1) and (4.2) show that the algorithms presented in Section 3 can be used to optimize arbitrary sets of gates with the same parity, without being restricted to sets of compatible gates only.
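The reduction can be sanity-checked by brute force on a small function. The sketch below assumes that the reduced constraint has the form "for every assignment $\mathbf{z}$, $F(\mathbf{x}, \mathbf{z}) + \sum_j y_j z_j' = 1$", which is equivalent to $F(\mathbf{x}, \mathbf{y}) = 1$ whenever $F$ is positive unate in $\mathbf{y}$. This form is an assumption about Eq. (25), and the example function and names are hypothetical.

```python
from itertools import product

def F(x, y):
    """Hypothetical output, positive unate in y but not of the compatible-gates form."""
    a, b = x
    return (y[0] & y[1]) | (y[2] & a) | b

def reduced_constraint(F, x, y, m):
    """Return 1 iff, for every z, F(x, z) + sum_j y_j * z_j' evaluates to 1."""
    for z in product((0, 1), repeat=m):
        value = F(x, z)
        for j in range(m):
            value |= y[j] & (1 - z[j])
        if not value:
            return 0
    return 1

# Compare the original constraint F(x, y) = 1 with the reduced one at every point.
agree = all(
    F(x, y) == reduced_constraint(F, x, y, 3)
    for x in product((0, 1), repeat=2)
    for y in product((0, 1), repeat=3)
)
```

Each fixed `z` contributes a constraint in which the `y[j]` appear only as a sum of uncomplemented terms, which is what keeps the covering problem unate. For a function that is binate in `y` the two constraints no longer coincide.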
Implementation and Results
The implementation of the algorithms presented in Sections 3 and 4 is as follows.
The original networks are first transformed into a unate, NOR-only description. All internal functions are represented using BDDs [10].
For each unoptimized gate $g_i$, the following heuristic is used. First, we try to find a set of compatible gates for $g_i$, called $S_c$. In the case where not enough compatible gates can be found, we find a set of gates that are unate with respect to $g_i$, called $S_u$.
In the case where $S_c$ is optimized, we use Eq. (14) to extract the functions $p_j$ and $q$. In particular, $q$ is extracted by simulating the network with all outputs $y_j$ stuck-at 0. The functions $p_j$ are then extracted by simulating the network with $y_j$ stuck-at 1, and with $y_i$, $i \ne j$, stuck-at 0.
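The stuck-at extraction can be sketched on truth tables as follows. This is a minimal illustration, assuming that Eq. (14) has the form $F = \sum_j y_j p_j + q$ and using explicit enumeration instead of BDDs; the example networks and the function names are hypothetical. Note that the simulation with $y_j$ stuck-at 1 returns $p_j + q$, which is an equally valid choice for $p_j$ in that expression.

```python
from itertools import product

def network(x, y):
    """Hypothetical single-output network with gate outputs y = (y0, y1)."""
    a, b, c = x
    return (y[0] & a) | (y[1] & b) | (a & b & c)

def extract(network, n_inputs, m):
    """Obtain q and the p_j by m + 1 stuck-at simulations over all input vectors."""
    xs = list(product((0, 1), repeat=n_inputs))
    q = {x: network(x, (0,) * m) for x in xs}          # all y_j stuck-at 0
    p = []
    for j in range(m):
        stuck = tuple(1 if i == j else 0 for i in range(m))
        p.append({x: network(x, stuck) for x in xs})   # y_j stuck-at 1, others at 0
    return p, q

def matches_eq14(network, p, q, n_inputs, m):
    """Check that the network equals sum_j y_j p_j + q at every point."""
    for x in product((0, 1), repeat=n_inputs):
        for y in product((0, 1), repeat=m):
            rebuilt = q[x]
            for j in range(m):
                rebuilt |= y[j] & p[j][x]
            if rebuilt != network(x, y):
                return False
    return True

p, q = extract(network, 3, 2)
compatible_form = matches_eq14(network, p, q, 3, 2)
```

The final check is only for illustration: it succeeds for this example and fails for a network in which the gate outputs interact, such as one computing `y[0] & y[1]`.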
In the case of optimizing an arbitrary unate set $S_u$, Theorem (4.1) is used to determine the maximal functions for each $y_j$. Note that optimizing $S_c$ is preferable because, for a set of $m$ compatible gates, $m + 1$ simulations are needed to obtain all the required don't cares. For $S_u$, two simulations (with $y_j$ stuck-at 0 and stuck-at 1) are required for the extraction of the don't care set of each variable $y_j$, resulting in a total of $2m$ simulations.
A set of primes for the gate outputs is then constructed. Because of the large possible set of primes, we limit our prime selection to single-literal primes only. The BDD of $F(\mathbf{x}, \mathbf{z})$ is then built, and the covering problem solved. Networks are then iteratively optimized until no improvement occurs, and eventually folded back to a binate form.
The algorithms presented in this paper were implemented in a C program called achilles, and tested against a set of MCNC synthesis benchmarks. Note that script.rugged was chosen because it is the most robust script of the SIS script suite, and it closely matches our type of optimization.
Our objective was to compare optimization results based only on Boolean operations, namely compatible gates versus don't cares. The script.rugged calls full_simplify [11], which computes observability don't cares to optimize the network. The table shows that the achilles runtimes are competitive with those of SIS. In this first implementation, we are more interested in the quality of the optimization than in the efficiency of the algorithms; therefore an exact covering solver is used. We can improve the runtime in the future by substituting faster heuristic or approximate solvers (such as those used in ESPRESSO [1]).
In this section, we describe an algorithm for finding compatible gates based on network topology. In this analysis, we make the assumption that the network is transformed into its equivalent NOR-only form. In this case, the parity of a path is simply the parity of the path length.
Networks are made internally unate by duplicating those gates whose fanouts contain reconvergent paths with different inversion parity. The resulting network is therefore at most twice the size of the original one. In practice, the increase is smaller.
Definition 3 A network is termed unate with respect to a gate $g$ if all reconvergent paths from $g$ have the same parity of inversions.
A network is internally unate if it is unate with respect to each of its gates. All paths from $g$ to a primary output $z_i$ in an internally unate network have the same parity $\pi_i$, which is defined to be the parity of $g$ with respect to $z_i$. Theorem (A.1) below provides sufficient conditions for a set $S$ of gates to be compatible.
Without loss of generality, the theorem is stated in terms of networks with one primary output. The following auxiliary definitions are required:
Definition 4 The fanout gate set and fanout edge set of a gate $g$, indicated by FO(g) and FOE(g), respectively, are the sets of gates and interconnections contained in at least one path from $g$ to the primary output.
Theorem A.1
1) If each gate in FO(S) with parity $\pi$ has at most one input interconnection in FOE(S), then the primary outputs can be expressed as in Eq. (14) for some suitable functions $p_j$ and $q$;
2) if each gate in FO(S) with parity $\pi'$ has at most one input in FOE(S), then it can be shown that the output can be expressed as in Eq. (14).
The proof can be found in [9]. Theorem (A.1) also provides a technique for constructing a set of compatible gates directly from the network topology, starting from a "seed" gate $g$ and a parameter (rule) that specifies the desired criterion of Theorem (A.1) (either 1 or 2) to be checked during the construction. The algorithm is as follows. COMPATIBLES starts by labeling the fanout cone of $g$, as no gates in that cone can belong to a compatible set containing $g$. Labeled gates represent elements of the set FO(S).
All gates $g_i$ that are not yet labeled and have the correct parity are then examined for insertion in $S$. To this purpose, the fanout of $g_i$ that is not already in FO(S) is temporarily labeled "TMP", and then visited by dfs_check in order to check the satisfaction of rule. If $g_i$ is compatible, it becomes part of $S$ and its fanout is merged with FO(S).
The depth-first traversal of dfs_check is stopped whenever the primary outputs or gates already in FO(S) are reached, or a violation of rule is detected.
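The topological criterion can be illustrated with a simplified, non-incremental check. The sketch below is not the labeling procedure described above: it recomputes fanout cones from scratch and tests only criterion 1, taking $\pi$ to be the common parity of the gates in $S$. It assumes a single-output, NOR-only network in which every gate reaches the output; the data layout (a dictionary from each gate to its fanin list) and all names are hypothetical.

```python
def parities(fanins, output):
    """Parity of the path length from each gate to the output (0 = even)."""
    par, stack = {output: 0}, [output]
    while stack:
        v = stack.pop()
        for u in fanins[v]:
            if u not in fanins:          # primary input, not a gate
                continue
            p = 1 - par[v]
            if u not in par:
                par[u] = p
                stack.append(u)
            elif par[u] != p:
                raise ValueError("network is not internally unate at " + u)
    return par

def fanout_cone(fanins, start):
    """Gates reachable from start when moving toward the output (start excluded)."""
    fanouts = {}
    for gate, inputs in fanins.items():
        for u in inputs:
            fanouts.setdefault(u, []).append(gate)
    seen, stack = set(), [start]
    while stack:
        for v in fanouts.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def satisfies_rule1(fanins, output, S):
    """Each gate in FO(S) with the parity of S has at most one input edge in FOE(S)."""
    par = parities(fanins, output)
    fo = set()
    for g in S:
        fo |= fanout_cone(fanins, g)
    if fo & set(S):                      # a member lies in another member's fanout cone
        return False
    pi = par[S[0]]
    if any(par[g] != pi for g in S):
        return False
    sources = fo | set(S)                # tails of the edges in FOE(S)
    return all(sum(u in sources for u in fanins[v]) <= 1
               for v in fo if par[v] == pi)

def grow_compatible_set(fanins, output, seed):
    """Greedily add gates of the seed's parity while the criterion still holds."""
    par = parities(fanins, output)
    S = [seed]
    for g in fanins:
        if (g != seed and par.get(g) == par[seed]
                and satisfies_rule1(fanins, output, S + [g])):
            S.append(g)
    return S

# Hypothetical NOR-only network: out = g1*a' + g2*b'
net = {"out": ["m"], "m": ["t1", "t2"], "t1": ["u1", "a"], "t2": ["u2", "b"],
       "u1": ["g1"], "u2": ["g2"], "g1": ["a", "c"], "g2": ["b", "c"]}
grown = grow_compatible_set(net, "out", "g1")
```

On this example both `["g1", "g2"]` and the greedily grown set pass the check. The result of the greedy pass depends on the order in which candidate gates are visited.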
