During the last decade, many different approaches have been proposed to solve the multiple-level synthesis problem with different minimum functionally complete systems of primitive logic blocks. The most popular of them is the division-based approach. However, modern microelectronic technology provides a large variety of building blocks which differ considerably from those typically considered. The traditional methods are therefore not suitable for synthesis with many modern building blocks. Furthermore, they often fail to find global optima for complex designs and leave some important design aspects unconsidered. Some of their weaknesses can be eliminated without leaving the paradigm they are based on; others are more fundamental. A paradigm which enables efficient exploitation of the opportunities created by microelectronic technology is the general decomposition paradigm. The aim of this paper is to analyze and compare the general decomposition approach and the division-based approach. The most important advantages of the general decomposition approach are its generality (any network of any building blocks can be considered) and totality (all important design aspects can be considered), as well as its natural handling of incompletely specified functions. In many cases, the general decomposition approach gives much better results than the traditional approaches.
INTRODUCTION
The term logic synthesis refers to all transformations in the design of digital hardware in which binary data are involved. In this paper, we will only consider a subset of logic synthesis methods, namely methods for the transformation of a multiple-output binary function into a (near-)optimal multiple-level network of primitive logic blocks. The term primitive logic block refers to any binary function that can be mapped one-to-one onto a primitive hardware building block in a certain technology. A primitive hardware building block is the smallest hardware unit considered that is used to implement binary functions in a certain technology. Examples of primitive hardware building blocks are the gates in the library of standard-cell implementations or configurable logic blocks (clbs) for Xilinx FPGA [54] implementations.
Up to 1980, very special cases of multiple-level logic synthesis received the most attention, namely the transformation of a binary function into an optimal two-level network (e.g. AND-OR-NOT, OR-AND-NOT, NAND-NAND, NOR-NOR and AND-EXOR implementations) and the transformation into multiple-level EXOR, AND-EXOR, AND-OR-NOT or MUX networks.
This interest resulted from the fact that these networks could be easily modelled, minimized and mapped one-to-one onto networks of typical primitive hardware building blocks provided by the electronic technologies of that time, for example those of TTL and ECL technologies, and onto PLAs or PALs.
Multiple-level networks often allow a more compact implementation of combinational logic in comparison with two-level networks. They enable the separate implementation of common sub-expressions and their sharing among multiple functions or sub-functions. However, many functions do not result in compact AND-EXOR, AND-OR-NOT or MUX networks. On the other hand, modern microelectronic technology provides us with a huge number of various primitive hardware building blocks which can be used for obtaining more compact networks. Exploitation of these abilities requires new appropriate multiple-level synthesis methods. The introduction of a new generation of FPGAs [49] has recently generated a very strong stimulus for research in multiple-level logic synthesis. The internal structure of the FPGAs is in fact a programmable multiple-level network and therefore these devices require the use of multiple-level logic synthesis techniques in order to exploit their abilities.

268 E VOLF, L. J0WIAK AND M. STEVENS

Unfortunately, the synthesis of general multiple-level networks is much more complicated than the synthesis of two-level logic. The main reason for this is the difficulty in defining the nature of the "optimal solution" in the multiple-level synthesis problem. For example, in the case of two-level AND-OR-NOT logic, the "optimal solution" is the solution with the minimal number of product terms, which is a relatively good measure for the complexity of the implemented network.
In multiple-level logic, the structure of the logic is less uniform and can be considerably more complex than in two-level logic. Also, the design decisions have a much more substantial impact on the many factors that decide the total quality of a multiple-level logic network: area, speed, power dissipation, testability etc. Furthermore, these factors are no longer simple functions of the implementation structure, as was the case for two-level logic.
During the last decade many different approaches have been proposed to solve the multiple-level logic synthesis problem ([17] contains an overview), among them methods based on the iteration of gate transforms and gate reductions, the so-called transduction methods [40].
In recent years a number of papers have been published which implicitly or explicitly use a new concept of general structural decomposition as a synthesis paradigm (e.g. [15] [26] [27] [30] [32]). The distinctive feature of these methods is that they are all special cases of a general full-decomposition as presented in Section 3. In the general full-decomposition approach, an incompletely specified multiple-output binary function is decomposed into a network of communicating subfunctions (logic blocks) in such a way that this network realizes the specified behaviour, satisfies specified constraints and optimizes given objectives. Decomposition decisions are based on analysis of the structure of the information streams in the function and the relations between this structure and the specified constraints and objectives. The constraints imposed by hardware building blocks and their possible interconnections are innately taken into account.
This approach has a number of interesting properties, including the following:
In many multiple-level synthesis approaches Boolean expressions are used to describe functions. Very often these approaches use only a limited set (minimum functionally complete set) of Boolean operators (e.g. AND-OR-NOT) and not the full set of operators implemented by a certain library of hardware building blocks. To implement the minimised expression, a transformation step called technology mapping must be performed in order to transform the expression into a network of hardware building blocks. If the repertoire of primitive logic blocks offered by a certain technology library differs substantially from the set of Boolean operators used during synthesis, the work completed during synthesis is almost futile, because the real synthesis must be performed during the technology mapping. The synthesis methods based on general decomposition integrate the technology mapping phase into the synthesis: a network of logic blocks, that can be mapped one-to-one onto a network of primitive hardware building blocks, is constructed. The internal structure of Xilinx FPGAs and similar fine granularity FPGAs is in fact a programmable multiple level logic block structure which can be innately modelled using the theory in Section 3. Therefore methods for the synthesis of these types of FPGAs can be relatively easily constructed using this theory. This paper aims to present a comparative analysis of the general decomposition-based and division-based multiple-level logic synthesis approaches. We will investigate the properties of the decomposition-based logic synthesis methods by introducing the general full-decomposition concept, presenting and discussing the existing decomposition methods and comparing the decomposition methods with the classical multiple-level synthesis methods based on the division of Boolean expressions. 
We have chosen the division-based algorithms as our reference, because they are by far the most popular ones (as measured by the number of publications on this subject).
The remainder of this paper is organised as follows: Sections 2 and 3 contain introductions to the theory and the most important results obtained in division-based and decomposition-based logic synthesis, respectively. In Section 4, a comparison between these two classes is presented. Some concluding remarks can be found in Section 5.
DIVISION-BASED LOGIC SYNTHESIS
The fundamentals for division-based multiple-level logic synthesis were introduced by Robert K. Brayton in 1982 [7] [8] [10]. In this section, we will review the most important aspects of Brayton's original method and

Because (a + b) and (a + c) have intersecting input supports, the product is a Boolean product.
The disadvantage of the Boolean division approach is that the set of Boolean divisors is usually very large and therefore good divisors cannot be easily found. Consequently, an alternative approach has been considered. It is based on the fact that the sum-of-product term representation for two-level functions is almost canonical and efficient common algebraic factors can be identified as being common factors of the product terms. The idea is motivated by the fact that manipulations of sum-of-product terms are in most cases quickly performed (many algebraic operations have linear time complexity). The disadvantage of this idea is that it does not guarantee optimal solutions. Brayton uses an alternative approach based on this idea. In this approach, the incompletely specified function is minimised to obtain a two-level minimal representation of its on-set (using the two-level minimiser Espresso [9]) [7].
This theorem states that two functions only have a multiple cube divisor if an intersection of a kernel of f and a kernel of g has more than one cube. It is the fundamental theorem used in the factorisation algorithms presented in the next sections.
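The kernel-intersection criterion can be illustrated with a small sketch. It is not Brayton's kernel algorithm: it enumerates co-kernels by brute force over subsets of cubes, which is only practical for tiny expressions. The helper names (`sop`, `kernels`, `common_multi_cube_divisors`) and the example functions are assumptions made for illustration; literals are single characters, complemented literals are not modelled, and the input is assumed to contain no cube that is contained in another cube.

```python
from itertools import combinations

def sop(text):
    """Parse 'ad + bd + c' into a set of cubes; each cube is a frozenset of
    single-character literals (hypothetical toy representation)."""
    return frozenset(frozenset(term.strip()) for term in text.split("+"))

def kernels(f):
    """Map each co-kernel cube to its kernel, by brute force over cube subsets."""
    found = {}
    cubes = list(f)
    for size in range(2, len(cubes) + 1):
        for subset in combinations(cubes, size):
            co_kernel = frozenset.intersection(*subset)
            # Algebraic quotient f / co_kernel: keep the cubes that contain the
            # co-kernel and strip it from them. The quotient is cube-free,
            # because the chosen subset shares no literal beyond co_kernel.
            found[co_kernel] = frozenset(
                c - co_kernel for c in f if co_kernel <= c
            )
    return found

def common_multi_cube_divisors(f, g):
    """Intersections of a kernel of f and a kernel of g with more than one cube."""
    return {
        kf & kg
        for kf in kernels(f).values()
        for kg in kernels(g).values()
        if len(kf & kg) > 1
    }

f = sop("ad + bd + c")   # kernels: a + b (co-kernel d) and f itself (co-kernel 1)
g = sop("ae + be")       # kernel:  a + b (co-kernel e)
shared = common_multi_cube_divisors(f, g)
```

Here `shared` contains the single divisor a + b, so both functions can be rewritten around one shared sub-expression; two functions whose kernels intersect in at most one cube yield an empty set.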
Standard Factorisation Algorithm
The algorithm discussed below was introduced by R.K. Brayton in 1982 ( [7] [8] [10] ). We will call it the standard factorisation algorithm, because all other factorisation algorithms are based on it. Each method presented in this paper will be characterised by sketching an outline of the major steps and by discussing the impact of these steps on the quality of the result. The standard factorisation method can be characterised as follows:
The fundamentals of the method are based on the notion of kernel/co-kernel pairs. This notion is easy to comprehend, which makes it easy to reason about the synthesis process. Although the set of kernels can become very large, it is considered to be reasonably small for most practical synthesis problems. Kernels form a very small subset of all algebraic divisors of a function. Algebraic divisors are a special subset of another group of divisors: the Boolean divisors. Therefore, the set of kernels of a Boolean expression is a very small subset of all possible divisors and can be too restrictive to find near-optimal solutions. The input to the algorithm is the two-level, locally minimised on-set f' of an incompletely specified function (f,d,r), obtained using the two-level minimiser Espresso [9]. This makes incompletely specified multiple-output functions much easier to handle, but the loss of the don't cares before the algorithm actually starts will almost certainly lead to a less satisfactory implementation.
In the type of factorisation problems presented here, two optimisation criteria are generally considered: firstly, the area occupied by the implementation of the circuit and, secondly, the maximal speed of the implementation of the circuit. The area of the implementation is usually split into two components: the active area (the area used for active elements, i.e. transistors) and the routing area (the area used for wiring). The maximum speed of the implementation is usually determined by its critical path; this is defined as the worst-case response time of any output to a change in one or more of its inputs. In the standard factorisation algorithm, the number of literals is used as the optimisation criterion.

The lexicographical factorisation method can be characterised as follows:
The lexicographical algorithm, like standard factorisation, takes locally two-level minimised Boolean functions as its inputs. The input order is created in the first step. Part of the input order can be imposed externally to account for external factors (e.g. late arrival times of some inputs). If this enforced input ordering is not complete, then the following input ordering algorithm is used to complete the order. A list of all kernel/co-kernel pairs is first constructed. The pairs are sorted with respect to their global gain in the number of literals. The pair which has the largest gain in the number of literals and does not violate the input order is selected, and the input order is updated with regard to the precedence relation of the selected kernel/co-kernel pair. These steps are repeated until no more compatible kernel/co-kernel pairs can be found. The lexicographical factorisation algorithm uses a greedy selection algorithm to find a good input order. The constructed input order may not be the best one (in addition to the fact that any input order is a large restriction on the set of kernels). Therefore, a more sophisticated algorithm may be necessary to find a near-optimal input order. Related to this problem is the fact that lexicographical factorisation uses the number of literals to estimate the quality of an implementation. One of the main advantages of lexicographical factorisation over standard factorisation is the following theorem, proven in [45]:

Theorem 2 Lexicographically compatible kernel/co-kernel pairs are algebraically compatible.

Lexicographical factorisation therefore does not require the recalculation of the set of kernel/co-kernel pairs after one pair is selected from the set. The result is that lexicographical factorisation is a much faster algorithm when compared with standard factorisation.
The factorisation is then performed and respects the variable ordering just created. Since the variable ordering is known, factorisation is very simple. Negated variables are factored out immediately after the non-negated variable. In the final step, common sub-expressions are identified and implemented as sub-functions. Because of the input order, the search for common sub-expressions is very efficient. It is performed as the last step, because high priority is given to internal simplifications of the expressions, and this results in a low number of wires and short wires. The lexicographical factorisation results in implementations which have a much smaller routing area compared to the standard factorisation algorithm of Brayton. Respecting the variable ordering can, however, result in an increase of the active area. Many good kernels cannot be used because the input order is very restrictive. Therefore, the method produces good results only for circuits with a high routing factor (the amount of active area used is larger than the active area obtained using Brayton

A feature of concurrent factorisation is that it takes not only elements of D(f) and S(f) as divisors, but also uses the complements of these divisors. In [42]

Figure 1b). The function also requires 10 literals. As can be seen, a subfunction Y has been introduced. Because the lexicographical factorisation algorithm uses NANDs as an internal representation, it was able to identify ab and a + b as common subexpressions. If this realisation is compared with standard factorisation, then it shows the properties of lexicographical factorisation: each input is connected to a gate only once and therefore the routing complexity for the inputs is reduced. It can also be seen that the method does not try to minimise wires for subfunctions, as the subfunction Y is routed globally. It is however possible to specify a threshold on the gain of the number of literals for sub-functions.
If the gain of a certain sub-function is larger than the threshold, the subfunction is created and applied globally (expecting a large active area gain, but small extra area for wiring), otherwise it is implemented locally (costing extra active area, but negligible routing area). Furthermore, the lexicographical factorisation algorithm needs more gates than standard factorisation (increase of active area). The real power of the lexicographical factorisation algorithm can only be shown on large examples, where the gain in active area and the extra routing area for sub-functions is compensated by a large reduction in the routing area for the inputs. We refer to the benchmark results in [45] to illustrate the effectiveness of the lexicographical factorisation for large circuits.

Using lexicographical factorisation, a partial input order can also be enforced. Suppose we want the inputs a, b and c to be extracted first (because these inputs have late arrival times due to external circumstances). The input ordering is first completed as {a,b,c,d,e,f,g} and the following factorisation is then found: z = ab(ce + d) + (az1 + bz1 + gz1) and z1 = ef (see Figure 1c), which requires 13 literals. It should be noted that the second part of z cannot be written as z1(a + b + g) without violating the precedence relation. Also, it should be noted that the critical path of a and b has been reduced from 6 gates to 2 gates at the expense of a somewhat more complex routing and an increase in active area (more inputs per gate).

The concurrent factorisation algorithm searches explicitly for the complement of the kernel a+b, whereas in lexicographical factorisation this equivalence was implicitly found. Concurrent factorisation finds the same factorisation as lexicographical factorisation (function y) (see Figure b). As can be seen, only double cube divisors and cubes with two literals are used.
GENERAL DECOMPOSITION-BASED MULTIPLE-LEVEL LOGIC SYNTHESIS
In this section, we will present a theory of general full-decomposition for combinational machines and give an overview of the decomposition-based methods for multiple-level combinational logic synthesis. Basic definitions are presented in sub-section 3.1, the theory of general full-decomposition can be found in sub-section 3.2 and some special decomposition cases are the topics of sub-section 3.3.
Basic Definitions
A (completely specified) combinational machine M is an algebraic system defined by:

M = (I, O, λ)

where:
I - a finite non-empty set of inputs,
O - a finite non-empty set of outputs,
λ - the output function, λ: I → O.
The design requirements do not always completely specify a machine. For example, certain input values may never occur due to external constraints, or because the machine is realized in such a way that some of the input values of the realization are not used for implementing the inputs of the originally specified machine. From the behavioural viewpoint, the designer does not care what the output value will be for such an input value. In all such situations one talks about so-called "don't care" conditions. "Don't cares" are commonly denoted by "-".

about elements of S and an identity partition gives no information. The partition product can be interpreted as a product of the appropriate equivalence relations introduced by these partitions; it represents the combined information about the elements of S provided by both relations together. The partition sum can be interpreted as a sum of the appropriate equivalence relations introduced by these partitions and it represents the combined abstraction of both relations.
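The partition product and sum can be made concrete with a short sketch. The representation (a partition as a set of frozensets) and the helper names `part`, `product`, `partition_sum` and `refines` are assumptions chosen for illustration; the two partitions are the ones used in the example of this section.

```python
def part(text):
    """Parse 'a,b; c,d; e,f,g,h' into a partition (a set of blocks)."""
    return frozenset(
        frozenset(block.replace(" ", "").split(",")) for block in text.split(";")
    )

def product(p, q):
    """Partition product: non-empty pairwise intersections of blocks."""
    return frozenset(b & c for b in p for c in q if b & c)

def partition_sum(p, q):
    """Partition sum: merge blocks of p that are linked through blocks of q."""
    blocks = [set(b) for b in p]
    for c in q:
        merged, rest = set(c), []
        for b in blocks:
            if b & merged:
                merged |= b      # b overlaps block c of q, so it is absorbed
            else:
                rest.append(b)
        blocks = rest + [merged]
    return frozenset(frozenset(b) for b in blocks)

def refines(p, q):
    """p <= q: every block of p is included in some block of q."""
    return all(any(b <= c for c in q) for b in p)

pi1 = part("a,b; c,d; e,f,g,h")
pi2 = part("a,b; c,e; d,f,g,h")
prod = product(pi1, pi2)          # {a,b; c; d; e; f,g,h}
total = partition_sum(pi1, pi2)   # {a,b; c,d,e,f,g,h}
```

The product separates c from d because one of the two partitions does, while the sum keeps only the distinctions that both partitions agree on, which is why c, d, e, f, g and h collapse into a single block.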
Example
In Table I, the function table of an incompletely specified Boolean function is presented. The function has three input bits (x1, x2 and x3) and two output bits (y1 and y2).
Each input combination has been labelled with a unique name (a, b, c, d, e, f, g and h) and hence we can use these symbolic names in the following considerations. Likewise, the output combinations have been labelled w, x, y and z. These functions can be written as the following completely specified machine: M = (I, O, λ) such that I = {a,b,c,d,e,f,g,h}, O = {w,x,y,z} and λ: I → O as specified in Table I.

For example, take x = f: λ(f) = z (see Table I) and Θ(λ'(Ψ(f))) = Θ(λ'(6)) = Θ(8) = z. Verification of this relation for other combinations can be performed by the reader as an exercise.

For machine M = (I, O, λ) as defined above, πI(0) = {a; b; c; d; e; f; g; h} is the zero input partition and πI(I) = {a,b,c,d,e,f,g,h} is the input identity partition. Let π1 = {a,b; c,d; e,f,g,h} and π2 = {a,b; c,e; d,f,g,h} be partitions on set I. The product π1·π2 = {a,b; c; d; e; f,g,h} denotes the combined information of π1 and π2. For example, in π1 the symbols c and d are equivalent and hence partition π1 cannot distinguish between these two symbols, whereas π2 can make the distinction between c and d (because they are in different blocks of π2). The product π1·π2 represents the partition that makes the union of the distinctions of π1 and π2 and combines in one block only those elements which are

only if each block of πI unambiguously determines the block of πO in which the output is contained. If (πI, πO) is a partition pair, then πI is called the first partition of the pair and πO is called the second partition of that pair. Let πI be a partition on I. The minimal second partition which forms an I-O partition pair with πI as a first partition will be denoted mI-O(πI). The maximal first partition which forms an I-O partition pair with πO as a second partition will be denoted MI-O(πO). It can be proved [19]

For a given πI, m(πI) describes the largest amount of information which can be computed about the output of M knowing the block of πI which contains the input.
M(πO) describes the least amount of information which must be known about the input of M in order to be able to compute the information about the output with precision to πO.

For the purpose of a bit decomposition (in which the input/output bits are appropriately distributed instead of the input/output symbols), the concept of bit partitions has been introduced [25]. Let B = {b1, b2, ..., b|B|} be a set of (input or output) bits. Let T = {t1, t2, ..., t|T|} be a set of (input or output) symbols. Each input/output bit bk ∈ B introduces a two-block partition πT(bk) on the set of symbols (bit value patterns) T (in the case of incompletely specified machines, on the subset of T for which the value of this bit is specified). One block of πT(bk) contains the symbols for which bit bk has the value 0 and the other block contains the symbols for which bk has the value 1. The product of the partitions πT(bk) for all the bits bk ∈ B will unambiguously define the set of all input/output symbols, i.e. it will be a zero partition. A partition πB on the set of bits B, πB = {b1, b2, ..., bk, (

Table I as a binary function instead of as a symbolic function will now be considered. Bit-partitions on the input bits can be made. The set of input bits is called X, i.e. X = {x1, x2, x3}. In fact, an input bit represents a symbolic partition which contains in the first block all symbols for which this bit is 0 and in the second block all symbols for which this bit is 1: πX(x1) = {a,b,c,d; e,f,g,h}, πX(x2)
General Full-Decomposition
A theory of general decomposition of sequential machines is presented in [29]. In this paper we are concerned with the synthesis of combinational logic; however, a combinational machine is merely a special case of a sequential machine with one state and a trivial next-state function. Therefore, the general decomposition theory can also be applied to our problem. A special case of the general full-decomposition theory [29]

output information j, 1 ≤ j ≤ n, is separately transmitted to the input of a certain machine i, 1 ≤ i ≤ n, i.e. without combining it with (partial) output information from other partial machines k, 1 ≤ k ≤ n, k
Proof
There follows only an outline of the proof in order to show how to construct a decompositional realization structure that is based on partition doubles (πI, π*I) and (τI, τ*I). Let M1 ('rri "q, i, and M2 ('q "rri, "*I, k2) be the two machines for which the following

In a similar way, formal definitions for parallel and serial compositions and decompositions can be introduced. In [29] the following theorem has been proved. Let Ψ: I → πI × τI be a function, Θ: π*I × τ*I → O be a surjective partial function, and

the solution with 4 look-up tables as presented in Figure 4b. Since the input symbols are already binary encoded, we have chosen to use an input coder which achieves a direct distribution of a subset of the input bits to each of the machines C and E (see Figure 4c), i.e. πC = ind({x1; x4; (x2,x3)}) = {0,2,4,6; 1,3,5,7; 8,10,12,14; 9,11,13,15}. Similarly: πE = ind({x2; x3; x4; (x1)}) = {0,8; 1,9; 2,10; 3,11; 4,12; 5,13; 6,14; 7,15}. Since the decomposition in Figure 4c is a parallel decomposition, conditions (1) and (2) in theorem 6 are satisfied trivially (there is no information flow from one machine to the other) and hence π and τ are identity partitions.
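The symbol partitions induced by a selection of input bits can be reproduced with a few lines of code. This is a sketch under two assumptions that are consistent with the numbers quoted above but are not stated explicitly here: the input symbols are the integers 0 to 15, and x1 is the most significant bit. The function name `ind` mirrors the notation of the text; its signature is hypothetical.

```python
def ind(bits, n_bits=4):
    """Partition of the symbols 0 .. 2**n_bits - 1 induced by a set of bits.

    `bits` holds 1-based bit indices, with x1 taken as the most significant
    bit (an assumption). Two symbols share a block exactly when they agree
    on every selected bit.
    """
    blocks = {}
    for symbol in range(2 ** n_bits):
        key = tuple((symbol >> (n_bits - b)) & 1 for b in bits)
        blocks.setdefault(key, set()).add(symbol)
    return {frozenset(block) for block in blocks.values()}

pi_C = ind([1, 4])       # machine C is fed with x1 and x4
pi_E = ind([2, 3, 4])    # machine E is fed with x2, x3 and x4
zero = ind([1, 2, 3, 4]) # all bits together separate every symbol
```

With these assumptions `pi_C` has the four blocks {0,2,4,6}, {1,3,5,7}, {8,10,12,14} and {9,11,13,15}, and `pi_E` has the eight two-element blocks {0,8} to {7,15}; selecting all four bits gives the zero partition.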
We then need to show that the partition doubles satisfy condition (3) of theorem 6. This is relatively easy: π*C · π*E = {0,5,6,7,8,14; 1,2,3,4,10,12; 13,15; 9,11}, inspection of Table II

Table II, the specification of a completely specified function f is given. Our goal is to implement the function using a minimum number of look-up tables with two inputs and one output. All three factorisation algorithms described in Section 2 find the following implementation f' of f: fl x2x3x4-I-X2(X3X 4 + X1X4). These functions must then be mapped on two-input one-output gates. The only possible technology mapping that can be performed without repeating the synthesis, i.e. without destroying the structure obtained from the factorisation process, is presented in Figure 4a. It results in a circuit with 7 look-up tables. The prototype of an algorithm that is

It should be noted that condition (3) is slightly modified.
Since we are building a subfunction for implementing π*E, the partitions π*A, π*B and π*E are all partitions on I and the partition pair property is reduced to the ≤ property. It must be shown that the partition doubles satisfy all these conditions. As a result of the fact that each block of πA is included in a block of π*A, πA ≤ π*A. Since all output information of machine A is used as input information for machine B, it can be assumed that axe, "rr* g and condition (1) is then satisfied whenever condition (2) is also satisfied. For condition (2) we first need to calculate the product πA · πB = {0,8; 1,9; 2,10; 3,11; 4,12; 5,13; 6,14; 7,15}. Using these results, it is easy to see that condition (2) is then satisfied. Finally, for condition (3) we need to calculate π*B · π*A = {0,8; 4,12; 1,2,3,9,10,11; 5,6,7,13,14,15}. With this product partition, it is obvious that condition (3) is satisfied and hence the decomposition is correct. In Table III,

EXOR and the gate ab. All these gates are innately and directly obtained from decomposition without the use of (non-trivial) technology mappings.
Special Cases of General Full-Decomposition
To date, none of the published methods has been able to produce near-optimal solutions for a multiple general full-decomposition. All the published results relate to special cases of the presented model. In this section, a number of special cases will be discussed.
Input-bit parallel full-decomposition
In parallel decomposition, no information flows between the partial machines, and therefore the partitions π and τ in theorem 5 are reduced to πO(I). In input-bit decomposition, the input decoder is reduced to the appropriate distribution of the input bit lines and this results in the replacement of the input partitions π and τ by the bit-partitions πIB and τIB. In this way, the following theorem was obtained from theorem 5.
Theorem 7 A combinational machine M has a non-trivial input-bit parallel full-decomposition with two component machines (see Figure 5) if and only if two partition doubles (πIB, π*I) and (τIB, τ*I) exist that satisfy the conditions:

(1) πI ≤ π*I and τI ≤ τ*I, where πI = ind(πIB) and τI = ind(τIB).

(2) (π*I · τ*I, πO(0)) is an I-O partition pair.

A well-known and extensively studied special case of the input-bit parallel full-decomposition is the input-encoder problem. In this case, the output decoder Θ is implemented as a PLA and the input encoders M have multiple exclusive sets of input bits. The problem is often modelled using multiple-valued logic [43]

In the first step, a set of inputs is partitioned into a number of disjoint subsets. Two heuristic approaches are presented for this: the first is based on integer programming whereas the second is based on a modified min-cut algorithm. Benchmark results from a large and varied set of machines show that the integer programming approach gives better results, but the graph-partitioning approach is faster.

A classical multiple-valued minimization is then used to find the best implementation of the PLA Θ.
The results that have been presented (in the form of benchmarks) are very promising; the only drawback of this approach is that the input sets may not intersect.
Another special case of the input-bit parallel decomposition was considered by W. Wan et al. [51]. Given a certain incompletely specified multiple-output function f(x1, ..., xn), the method presented in [51]

Figure 6) [31], where one of the component machines (M2) is replaced by an identity function. The first step of this algorithm is to find the inputs IB2 which have to be fed directly to output decoder Θ. The search algorithm tries to find the best set of inputs, so that the number of inputs of output decoder Θ does not exceed a user-specified bound (for Xilinx CLBs this bound is set to 4). The algorithm then tries to find an implementation for machine M1 using a minimum number of possible inputs (i.e. IB1 ∪ IB3 contains a minimum number of elements). First a disjoint decomposition is used (i.e. IB3 has no elements). If this fails, inputs from IB2 are added to IB3 until machine M1 can be constructed. Unfortunately, no heuristics are described and no results on large benchmark sets are presented; therefore it is impossible to estimate the efficiency of this method for large circuits. However, the results that are presented are very promising.
In recent years, a number of methods for the more general input-bit parallel decomposition problem have been presented. In [22]

In the bit-parallel full-decomposition, both the input decoder and the output decoder are reduced to the appropriate distribution of the input/output bit lines (see Figure 7). The theorem for this type of decomposition can be obtained from theorem 7 by replacing the output partitions πO and τO with bit-partitions πOB and τOB.

Theorem 8 A combinational machine has a non-trivial bit parallel full-decomposition with two component machines (see Figure 7), if two partition doubles (πIB, πOB) and (τIB, τOB) exist that satisfy the conditions:

Solutions to this decomposition problem have been presented independently in [27], [30] and [20]. In [27]

First, the information processing structure of the original combinational machine and its relation to the characteristics of building blocks are analyzed. From information about the correlations between the input, term and output variables as well as information about the constraints, the expected minimum number of building blocks and the expected number of input bits, output bits and terms per building block are computed. The expected values show how difficult it is to satisfy each of the constraints with a given number of blocks and indicate the amount of attention that must be paid to each of the constraints during the partitioning process. The active input bits and terms for each single-output function are also computed. Based on this information, affinities (from the viewpoint of a certain partitioning problem) between each two (single or multiple-output) functions can be computed.
With the above information, a limited number of near-optimal solutions are constructed in parallel by performing a multi-dimensional packing with a beam-search algorithm. Since the decision making during the search is based on uncertain information, the search is guided by heuristic elaborations of the rule of minimizing the uncertainty of choices. At each step, those decisions are taken which ensure the highest certainty of achieving the optimal solutions and, under this condition, those which minimize the uncertainty of information for future choices. The information that is used directly for decision making consists of relations between the characteristics of the single-output functions and the imposed constraints.

A similar method was published a few months later in [20]. It uses less information about the original multiple-output function than the method presented in [27] and elaborates that information less precisely. An interesting concept not present in the method published in [27] is that of relaxing the term constraints and dynamically processing the terms.
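The packing idea described above can be illustrated with a small sketch. It is not the algorithm of [27] or [20]: the function name `pack_functions`, the state representation and the scoring rule (fewest blocks first, then fewest block inputs in total) are assumptions made for illustration, and the term constraints as well as the uncertainty-minimizing decision rules are left out. Each single-output function is described only by its set of active input bits, and a block may hold at most `max_outputs` functions whose combined active inputs do not exceed `max_inputs`.

```python
def pack_functions(supports, max_inputs, max_outputs, beam_width=3):
    """Beam-search packing of single-output functions into blocks.

    supports maps a function name to its set of active input bits
    (hypothetical input format). Returns the best packing found as a
    list of blocks, each block being a sorted list of function names.
    """
    for name, bits in supports.items():
        if len(bits) > max_inputs:
            raise ValueError(f"{name} does not fit into a single block")

    # Place functions with many active inputs first (a simple heuristic order).
    order = sorted(supports, key=lambda f: (-len(supports[f]), f))

    # A state is a tuple of blocks; a block is (input bits, function names).
    beam = [()]
    for f in order:
        candidates = []
        for state in beam:
            # Option 1: add f to an existing block if the constraints allow it.
            for i, (ins, funcs) in enumerate(state):
                new_ins = ins | supports[f]
                if len(new_ins) <= max_inputs and len(funcs) < max_outputs:
                    block = (new_ins, funcs + (f,))
                    candidates.append(state[:i] + (block,) + state[i + 1:])
            # Option 2: open a new block for f.
            candidates.append(state + ((frozenset(supports[f]), (f,)),))
        # Keep only the beam_width most promising partial solutions.
        candidates.sort(key=lambda s: (len(s), sum(len(ins) for ins, _ in s)))
        beam = candidates[:beam_width]

    return [sorted(funcs) for _, funcs in beam[0]]


if __name__ == "__main__":
    example = {"f1": {0, 1, 2}, "f2": {1, 2, 3}, "f3": {4, 5}, "f4": {4, 5, 6}}
    print(pack_functions(example, max_inputs=4, max_outputs=2))
```

Because several partial solutions are kept at every step, an early choice that looks good locally does not have to be final; a wider beam trades run time for a better chance of finding a packing with fewer blocks.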
Another approach to bit-parallel full-decomposition is presented in [30], where the problem is referred to as parallel decomposition.
The actual parallel decomposition is preceded by argument reduction. Argument reduction is a technique that minimizes the number of input variables of a Boolean function, in contrast to classical minimisation, which aims at finding a minimum number of product terms. It yields function representations that use a minimum number of input variables. The process is analogous to term reduction, except that input variables rather than product terms are minimized.
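Argument reduction can be illustrated with a brute-force sketch. It is not the procedure of [30], for which no algorithm is given; the names `is_consistent` and `reduce_arguments` and the input format (lists of fully specified input vectors for the ON-set and the OFF-set, everything else being don't care) are assumptions. A subset of inputs is sufficient if no ON-vector and OFF-vector become identical once all other inputs are dropped.

```python
from itertools import combinations


def is_consistent(on_set, off_set, kept):
    """True if no ON vector and OFF vector agree on all kept input positions."""
    def project(vector):
        return tuple(vector[i] for i in kept)

    on_projected = {project(v) for v in on_set}
    return all(project(v) not in on_projected for v in off_set)


def reduce_arguments(on_set, off_set, n_inputs):
    """Return a smallest tuple of input indices that still separates ON from OFF.

    Exhaustive search over subsets of increasing size, so the run time is
    exponential in n_inputs; this is meant for small examples only.
    """
    for size in range(n_inputs + 1):
        for kept in combinations(range(n_inputs), size):
            if is_consistent(on_set, off_set, kept):
                return kept
    raise ValueError("ON-set and OFF-set overlap; the function is inconsistent")


if __name__ == "__main__":
    # Incompletely specified function of four inputs (x0, x1, x2, x3).
    on_set = [(1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
    off_set = [(0, 0, 0, 0), (1, 0, 1, 1), (0, 1, 0, 1)]
    print(reduce_arguments(on_set, off_set, 4))
```

In this example the specified vectors can be distinguished using inputs x0 and x2 alone, so the function can be represented with two arguments instead of four.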
A parallel decomposition algorithm uses the results of the argument reduction and constructs two-block parallel decompositions. Unfortunately, only the idea of parallel decomposition is presented in [30], without algorithms or heuristics. Only a few experimental results are shown; these, however, are very promising.
In a later paper [32], this decomposition method is combined with the input-bit parallel decomposition method mentioned in the previous section. This already allows for the construction of complex networks of blocks, but it is not yet a general full-decomposition in its most complete form. In this paper, Luba [35] where division-based synthesis is used to perform a special input-bit parallel decomposition.

For special cases of logic implementations in the form of exclusively AND-OR-NOT, NAND or NOR networks, the solutions obtained with division-based synthesis may be appropriate, on the condition that they take into account all the important objectives and constraints and involve effective and efficient algorithms.
The general decomposition approach has a number of advantages over the division-based approach. The main advantage of general full-decomposition is its general character. The general decomposition model and theorem enable the modelling and construction of all possible combinational network structures, while the traditional logic synthesis methods, including the division-based methods, model circuits in terms of special minimum or almost minimum functionally complete systems. Such a functionally complete system is able to express each function, but it models functions as networks composed exclusively of the special sub-functions included in that system (e.g. AND-OR-NOT, NAND, NOR, EXOR-AND, MUX), while the general decomposition approach models them in terms of all possible sub-functions.

If a certain element library includes more types of primitive gates than those included in a certain minimum functionally complete system, or includes look-up tables, technology mapping is necessary: the network synthesized using exclusively the gates from the minimum functionally complete system must be mapped onto a network composed of elements from the library. If the repertoire of sub-functions offered by a certain implementation technology differs substantially from the set of gates provided by the given minimum functionally complete system, the work done by a traditional synthesis method is almost futile. Since the initial network is constructed without any regard to the future implementation, the technology mapping must, in order to guarantee an implementable or optimal solution, perform the synthesis again, using the previously synthesized network only as a functional specification. With decomposition-based synthesis this problem does not exist: the synthesis process constructs a network of functional blocks that are in one-to-one correspondence with physical hardware blocks.
A further advantage of decomposition-based synthesis is its total character. During the decomposition, attention is paid not only to the active elements (operators) but to all the elements and aspects that can influence the quality of the results (i.e. inputs, outputs, interconnections and functionality) and to their interrelations. In the division-based approach all these aspects, except for the active elements, are completely ignored. The only exception is the lexicographical factorisation [45], which takes the interconnections into account by accepting a predefined input ordering during synthesis. Of course, it is possible to improve or further develop division-based methods by taking into account all these elements and their relations to the actual objectives and constraints, but this will not extend their range of application to circuits substantially different from AND-OR-NOT circuits.
Another important aspect is the use of don't cares in incompletely specified functions. In division-based synthesis, incompletely specified functions are first minimized using a two-level minimizer, and only then is the actual synthesis performed. In this way all don't cares are removed, and hence the design freedom is drastically reduced. The synthesis problem is transformed into that of finding an implementation of a completely specified function (without considering the influence of the don't cares on the realisation of the actual design). These don't cares are lost forever. Also, the multiple-level structure itself can introduce don't cares [5], [11]. It has been shown that very complex and time-consuming techniques are necessary to use don't cares effectively in division-based synthesis [4]. The decomposition approach does not require prior two-level minimisation and innately uses the freedom given by don't cares in order to optimize the resulting network structure (see for example [31]).
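The effect of fixing don't cares too early can be illustrated with a small sketch based on the classical decomposition-chart view. This view, the names `compatible`, `merge` and `greedy_column_classes`, and the greedy merging strategy are assumptions of the sketch and are not claimed to be the formulation used in [31]. Each column holds the function values for one assignment of a chosen group of inputs, with `None` marking a don't care. Columns that never disagree on a specified entry can share one code, and fewer codes mean fewer signals between the blocks.

```python
def compatible(a, b):
    """Two columns are compatible if they never disagree on a specified entry."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def merge(a, b):
    """Combine two compatible columns, keeping every specified entry."""
    return tuple(y if x is None else x for x, y in zip(a, b))


def greedy_column_classes(columns):
    """Greedily merge compatible columns and return the class representatives.

    The greedy order gives an upper bound on the minimum number of classes,
    not necessarily the minimum itself.
    """
    classes = []
    for col in columns:
        for i, representative in enumerate(classes):
            if compatible(representative, col):
                classes[i] = merge(representative, col)
                break
        else:
            classes.append(col)
    return classes


if __name__ == "__main__":
    # Four columns of a small, incompletely specified function.
    columns = [(0, None, 1, None), (None, 1, 1, 0), (0, 1, None, 0), (1, None, 0, None)]
    # The same columns after every don't care has been fixed to 0 beforehand.
    fixed = [tuple(0 if v is None else v for v in col) for col in columns]
    print(len(greedy_column_classes(columns)), len(greedy_column_classes(fixed)))
```

With the don't cares kept, the four columns collapse into two classes, so one intermediate signal suffices; once the don't cares have been fixed to 0, all four columns differ and two signals are needed.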
In Table IV, some synthesis results are presented that compare division-based synthesis with general decomposition-based synthesis. The results are taken from [34]. The goal of the experiments was an implementation with the minimum number of primitive logic blocks, each being a 5-input, 2-output look-up table. Table IV compares the number of clbs needed to implement a number of benchmark circuits from the MCNC logic synthesis benchmark set [53]. In the second column (marked Luba), the results of the general full-decomposition method described in [34] are presented. The remaining columns present the results of different division-based algorithms. These algorithms use (modified versions of) one of the factorisation algorithms presented in section 2 to minimise the two-level representation, followed by a technology mapping phase. The goal of the technology mapping phase is to transform the AND-OR-NOT network obtained by division into a feasible network with a minimum number of clbs. A feasible network is a network in which blocks are only used if they do not violate the constraints imposed on them (in our case: a block can implement any Boolean function that uses no more than 5 different input bits and 2 different output bits). In the third column (labelled MIS-PGA), the results are presented for the method described in [38], [39]. This method depends heavily on the classical kernel/co-kernel minimisation and tries to map the minimum network onto a feasible structure. In column 4, the results for the same set of benchmarks are presented for the ASYL system [48]. The ASYL system implements, among others, the lexicographical factorisation algorithm.

As the table shows, the general decomposition approach gives results that are in all cases better than those of the division-based approaches. These dramatic improvements are obtained for two major reasons:
In the general decomposition approach, we deal with partitions representing any function with a certain number of inputs and outputs and not with 
