This paper describes a timing-driven optimization technique for the synthesis of multi-level logic circuits. Motivated by the parallel prefix problem, the proposed timing-driven optimization produces logic circuits with "lookahead" properties due to the inherent parallelism among the synthesized sub-circuits. Lookahead logic circuits are synthesized using global critical path sensitization information to decompose and reduce the Boolean functions of the nodes in the technology-independent representation of the logic circuit. Unlike prior timing-driven optimization techniques, where synthesis of the decomposition functions is potentially expensive, the proposed technique has the advantage that the decomposition functions are discovered in the synthesized form. On average, the proposed technique reduces the number of logic levels (mapped delay) of 15 benchmark circuits by 40%, 56%, and 22% (21%, 56%, and 10%) over the best results of SIS, ABC, and an industry-standard synthesizer, respectively.
Introduction
Timing-driven optimization during multi-level logic synthesis is a well-researched area, and several solutions have been proposed in the literature [1] [2] [3] [4] [5] [6] [7] [8] [9]. These techniques either restructure the critical paths or perform decomposition-based resynthesis of the circuit.
Restructuring techniques, such as [1] [2] [3] [4], are computationally efficient, but their improvements are limited because restructuring is restricted to cutsets of nodes on the critical path. On the other hand, decomposition-based resynthesis techniques, such as [5] [6] [7], have immense scope for optimization because the space of possible transformations is vast. However, the algorithms proposed in the literature are computationally intensive, and the improvements achieved are limited by the available computational resources.
Motivated by the parallel prefix problem [10, 11], this paper describes a timing-driven optimization technique for the synthesis of multi-level logic circuits. The prefix problem is one of the fundamental approaches to build parallel algorithms and has been extensively studied in the literature, with successful applications to problems including sorting, parallelizing compilers, and task scheduling. The classic example of the application of the prefix problem to logic circuits is in the design of the tree-structured carry lookahead adder (CLA), where it is used to reduce the delay of carry propagation in an n-bit ripple carry adder from O(n) to O(log(n)).
The authors would like to acknowledge Prof. Peter Varman at Rice University and Prof. Adnan Aziz at the University of Texas, Austin for helpful discussions and suggestions. This research was supported in part by NSF CAREER Award CCF-0746850 and in part by a gift from the Fujitsu Laboratories of America.
The basic property that allows the application of prefix theory to these problems is the identification of intermediate computation that can be performed in parallel. For instance, in an n-bit binary adder, the generate bit ($g_i$) and propagate bit ($p_i$) for each bit-slice can be computed in parallel, and the carry bit ($c_i$) is a prefix computation defined over the pairs $(p_j, g_j)$, $1 \le j \le i$. The regular modular structure of the adder makes it easy to identify parallel computation of the pairs $(p_j, g_j)$ for each bit-slice in the adder. Once the pairs are computed, parallel prefix theory is used to synthesize a fast implementation for the carry, such as the CLA. For an adder, the parallel intermediate computations are simple functions with disjoint support, i.e., $(g_i, p_i)$ and $(g_j, p_j)$, $i \ne j$, do not have any common inputs. However, in general multi-level logic circuits, such as control logic in microprocessors, intermediate computations are complex functions with non-disjoint support due to logic sharing. Hence, identifying parallel computation to apply the principles of the prefix problem to general multi-level logic circuits is significantly more challenging.
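As an illustration of the prefix view of carry computation, the following minimal Python sketch compares a serial ripple computation with a log-depth scan over (generate, propagate) pairs. The function names and the Kogge-Stone style scan are choices made for this example and are not taken from the paper; the propagate bit is taken as $a_i \oplus b_i$.

```python
def combine(hi, lo):
    """Associative carry operator on (generate, propagate) pairs."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def ripple_carries(a_bits, b_bits, c_in):
    """Carries c_1..c_n computed serially, least significant bit first."""
    carries, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
        carries.append(c)
    return carries

def prefix_carries(a_bits, b_bits, c_in):
    """Carries c_1..c_n from a log-depth prefix scan (assumed Kogge-Stone style)."""
    # Position 0 models the carry-in as a pure "generate" value.
    pairs = [(c_in, 0)] + [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    dist = 1
    while dist < len(pairs):
        # Each round doubles the span of bit positions summarized at index i.
        pairs = [combine(pairs[i], pairs[i - dist]) if i >= dist else pairs[i]
                 for i in range(len(pairs))]
        dist *= 2
    return [g for g, _ in pairs[1:]]
```

The scan performs about $\log_2(n+1)$ rounds, and each round corresponds to one layer of operators in a tree-structured adder; the serial loop needs n dependent steps.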
In this paper, we propose a timing-driven optimization technique to identify intermediate computation that can be parallelized in general multi-level logic circuits. The optimized logic circuits exhibit "lookahead" properties due to the inherent parallelism among the synthesized sub-circuits, and are hence called lookahead logic circuits in this paper. When applied to a ripple carry adder, our technique can systematically derive different realizations of high-performance adders including the carry lookahead, carry skip, and carry select adders. The main advantage of our technique is that it synthesizes decomposed circuits with smaller delays in the form of lookahead logic circuits, instead of searching for decomposition functions. Lookahead logic circuits are synthesized by decomposing and reducing the Boolean functions corresponding to the internal nodes in the technology-independent representation of the original logic circuit. The lookahead logic circuits are then combined using Shannon's decomposition and its implication-based simplifications to reconstruct the original logic circuit. Unlike prior timing-driven optimization techniques, where the synthesis of the decomposition functions is potentially expensive, our technique has the advantage that the decomposition function is discovered in the synthesized form of the lookahead logic circuit.
The performance of lookahead logic circuits is compared to the state-of-the-art academic tools SIS and ABC, and an industry-standard synthesizer. First, a case study of the n-bit ripple carry adder is used to compare our technique with these tools. Results indicate that our technique discovers several interesting decompositions with fewer levels of logic. Next, 15 circuits from the MCNC and ISCAS benchmark suites and the OpenSPARC T1 processor are used to compare our technique to the best results obtained using these tools. On average, our technique reduces the number of logic levels in the final circuit by 40%, 56% and 22% over the best results of SIS, ABC, and the industry-standard synthesizer, respectively. When mapped delays are evaluated, our technique achieves an average reduction of 21%, 56% and 10% over the best results of SIS, ABC, and the industry-standard synthesizer, respectively. Our approach is computationally efficient, with a runtime of 100 seconds on the largest circuit considered in this paper.
This paper is organized as follows. Sec. 2 provides a background on existing timing-driven optimization techniques. Sec. 3 introduces lookahead logic circuits and describes the proposed synthesis algorithm. Sec. 4 presents a case study for n-bit adders. Sec. 5 presents results. Sec. 6 presents conclusions.
Timing-driven decomposition
The timing-driven optimizations proposed in literature can be broadly divided into two classes: (i) structure-based and (ii) decomposition-based. The earliest techniques for timing-driven optimization were based on restructuring critical paths to reduce circuit delay [1] [2] [3] [4] . Most structure-based techniques have used the transformation of a ripple carry adder into a fast implementation like the CLA, carry select adder or carry bypass adder as motivation for their techniques. The technique proposed in [1] , called tree height reduction, uses a CLA as motivation to reduce the delay of the circuit by rescheduling computation along critical paths. The technique presented in [2] , called the generalized select transform, uses a carry select adder as a motivating example and proposes a technique that identifies late arriving signals, performs computation using both 0 and 1 as the value for the signal, and then uses that signal to select the correct output through a multiplexer. In [3] , the carry bypass adder is used as motivation to propose the generalized bypass transform that reduces the critical path delay by adding redundant bypass paths and turning the critical paths into false paths. The false paths can then be eliminated without increasing the delay of the circuit using a technique presented in [4] .
Decomposition-based techniques fundamentally differ from structure-based techniques in that they do not directly restructure the circuit. Instead, the circuit structure is changed as a result of changing the functionality of the internal nodes, while maintaining functional equivalence at the primary outputs. Decomposition-based techniques are capable of exploring a much richer design synthesis space, at higher computational cost, as compared to structure-based techniques. A decomposition-based technique using partial collapsing and simplification of nodes to reduce the delay is proposed in [5]. The technique proposed in [6] uses permissible functions to resynthesize sets of nodes that lie on the critical path to reduce the delay. In [7], additional redundant circuitry is added to compute the output on input patterns that sensitize the critical paths. This approach includes features of structure-based techniques, but suffers from the following drawbacks. Since redundant logic is added in the form of bypass paths to the original circuit, the technique leads to a circuit with a high area and/or power footprint. The improvements in delay are limited because the additional redundant logic is restricted to only implications (0-approximation or 1-approximation) of the original function. The scalability of this approach is also limited due to a bottom-up synthesis approach for the additional redundant logic starting from an incompletely specified Boolean function with a large don't care space. Finally, although not directly related to the present work, BDD-based decomposition techniques for timing optimization have also been proposed [9, 12, 13, 14, 15] and remain an active area of research.
In this paper, we propose a decomposition-based timing-driven optimization technique using lookahead logic circuits. Unlike prior techniques, where the synthesis of the decomposition functions is potentially expensive, our technique has the advantage that the decomposition functions are discovered in the synthesized form. It can explain conversion of a ripple carry adder into several fast implementations including the carry lookahead, carry select, and carry bypass adders. Like most other timing-driven optimization techniques, it also complements existing logic optimization algorithms. The next section develops the theory of lookahead logic circuits and describes the synthesis algorithm for lookahead logic circuits.
Lookahead logic circuits
With the background on timing-driven optimization, we use binary addition to introduce the basic principles of prefix computation and then develop the theory of lookahead logic circuits. The most common approach to speed up carry computation in adders with large operand sizes is to exploit the observation that carry propagation in binary addition is a prefix problem [16] .
Prefix problem:
Given n values $z_1, z_2, \ldots, z_n$ and an associative binary operator $\otimes$, the prefix computation problem, or simply the prefix problem, is to compute the n values $z_1 \otimes z_2 \otimes \cdots \otimes z_i$, $1 \le i \le n$.
In the context of binary addition of two n-bit numbers a and b, the carry for the ith bit can be expressed as
$$c_i = g_i + p_i c_{i-1} \qquad (1)$$
where $g_i = a_i b_i$ and $p_i = a_i \oplus b_i$ represent the generate and propagate bits. Since the prefixes $g_i$ and $p_i$ can be computed in parallel, the prefix problem reduces to efficient prefix computation, and several tree structures, with size and depth trade-offs, have been proposed in the literature to realize parallel-prefix adders [11]. We make the important observation that the parallel-prefix CLA can be thought of as an optimal timing-driven decomposition for carry computation, and we generalize this as follows. Consider a Boolean function $f(x_1, x_2, \ldots, x_n)$ of n inputs $x_1, x_2, \ldots, x_n$. Consider the decomposition for the Boolean function f given by the identity
$$f = \Sigma_l f_l + \bar{\Sigma}_l \Sigma_{l-1} f_{l-1} + \cdots + \bar{\Sigma}_l \cdots \bar{\Sigma}_2 \Sigma_1 f_1 + \bar{\Sigma}_l \cdots \bar{\Sigma}_1 f_0 \qquad (2)$$
where $\Sigma_i$ (called the window function) and $f_i$ are all functions of $x_1, x_2, \ldots, x_n$. By drawing an analogy to the CLA representation from equation 1, we can interpret equation 2 as a lookahead decomposition for the Boolean function f. Here, $\Sigma_i f_i$ corresponds to the generate bit $g_i$ and $\bar{\Sigma}_i$ corresponds to the propagate function $p_i$, $1 \le i \le l$. The interesting connection between the CLA representation and the timing-driven decomposition lies in the expressions for $\Sigma_i$ and $f_i$. Let us look at the timing-critical computation for the carry bit, $c_i$, of each stage of the n-bit adder. Note that $c_i$ can be computed without the carry, $c_{i-1}$, of the previous stage when $a_i = b_i = 0$ ($c_i = 0$) and when $a_i = b_i = 1$ ($c_i = 1$). Thus, the case $a_i = b_i$ is not a timing-critical computation at the ith bit-slice. However, when $a_i \ne b_i$ ($a_i \oplus b_i = 1$), the carry of the previous stage is necessary to compute $c_i$. Hence, $a_i \ne b_i$ is a timing-critical computation at the ith bit-slice. When $\Sigma_i$ is set to $\overline{a_i \oplus b_i}$ and $f_i$ is set to the value of $c_i$ for $\Sigma_i = 1$, i.e., $f_i = a_i$ or $f_i = b_i$, the timing-driven decomposition for $c_{out}$ for an n-bit adder is given by
$$c_{out} = \Sigma_n a_n + \bar{\Sigma}_n \Sigma_{n-1} a_{n-1} + \cdots + \bar{\Sigma}_n \cdots \bar{\Sigma}_2 \Sigma_1 a_1 + \bar{\Sigma}_n \cdots \bar{\Sigma}_1 c_{in} \qquad (3)$$
which is equivalent to the expression for $c_{out}$ obtained using the prefix problem in equation 1. Thus, the key contribution of this paper for timing-driven optimization is the use of information about timing-critical computation to identify window functions $\Sigma_i$ that produce lookahead logic circuits $f_i$ with fewer levels of logic. The regular modular structure of a binary adder makes it easy to identify a good timing-driven decomposition, $\Sigma_i$ and $f_i$. However, applying this technique to the synthesis of multi-level control logic circuits is challenging for the following reasons:
1. Control logic is irregular with multiple critical paths. Due to logic sharing, control logic defies the easy modularity that makes it possible to write a CLA-like representation for the Boolean expression of the critical paths.
2. The Boolean expression for the critical path in control logic is significantly more complex, i.e., it cannot be expressed as a simple expression such as that for the carry in adders.
3. Both $\Sigma_i$ and $f_i$ for an adder have disjoint support sets for $1 \le i \le n$, i.e., $\Sigma_i$ and $\Sigma_j$ as well as $f_i$ and $f_j$ ($i \ne j$) do not have common inputs in their support. However, for multi-level logic circuits, $\Sigma_i$ and $f_i$ may not have disjoint support sets. Hence, realizing them independently using separate logic circuits can be very expensive, and a good trade-off would be to share logic between these functions.
4. Unlike an adder, where the delay of each $p_i$ and $g_i$ term is equivalent to a single level of logic, the functions $\Sigma_i$ and $f_i$ may have different levels of logic and delays, and hence combining them optimally is a challenge.
In the rest of this section, we will describe a synthesis technique for lookahead logic circuits (circuits for implementing $\Sigma_i$ and $f_i$) that addresses these challenges.
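The adder decomposition discussed earlier in this section can be checked exhaustively for small n. The sketch below is only an illustration under stated assumptions: the window of slice i is taken to be 1 exactly when $a_i = b_i$ (the reading under which $f_i = a_i$), and the function names are hypothetical.

```python
from itertools import product

def ripple_carry_out(a, b, c_in):
    """Reference carry-out; bit lists are least significant bit first."""
    c = c_in
    for ai, bi in zip(a, b):
        c = (ai & bi) | ((ai ^ bi) & c)
    return c

def lookahead_carry_out(a, b, c_in):
    """Scan from the MSB: the first slice with a_i == b_i decides the carry."""
    for ai, bi in zip(reversed(a), reversed(b)):
        window = 1 - (ai ^ bi)     # assumed window: 1 when a_i == b_i
        if window:
            return ai              # f_i = a_i (= b_i) inside the window
    return c_in                    # every slice propagates the carry-in

def check(n):
    """True if both formulations agree on all inputs of an n-bit adder."""
    return all(
        ripple_carry_out(bits[:n], bits[n:2 * n], bits[2 * n]) ==
        lookahead_carry_out(bits[:n], bits[n:2 * n], bits[2 * n])
        for bits in product((0, 1), repeat=2 * n + 1))
```

The priority scan in `lookahead_carry_out` is a software stand-in for the sum of products of windows and complemented windows; in hardware, all windows are evaluated in parallel.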
Definitions
Decomposed logic circuit: A decomposed logic circuit is a directed acyclic graph (DAG) with nodes representing AND gates. The edge connecting a node i to another node j can be of two types: (i) complemented, when there is an inverter between the output of node i and input of node j and (ii) uncomplemented, when there is no inverter. Thus, a decomposed circuit uses AND and NOT gates as building blocks, and is referred to as an and-invert-graph (AIG).
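A minimal sketch of such a graph in Python is given below. The class layout, the edge encoding as `(node, complemented)` pairs, and the method names are assumptions made for this example; they do not reflect the data structures of ABC or of the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AIG:
    """Nodes 0..num_inputs-1 are primary inputs; AND nodes follow in topological order."""
    num_inputs: int
    ands: list = field(default_factory=list)   # entries: ((node, compl), (node, compl))

    def add_and(self, lhs, rhs):
        """Append a 2-input AND node and return its index."""
        self.ands.append((lhs, rhs))
        return self.num_inputs + len(self.ands) - 1

    def levels(self):
        """Logic level of every node; inverters on edges are not counted."""
        lvl = [0] * self.num_inputs
        for (u, _), (v, _) in self.ands:
            lvl.append(1 + max(lvl[u], lvl[v]))
        return lvl

    def evaluate(self, inputs):
        """Value of every node for one assignment of the primary inputs."""
        val = list(inputs)
        for (u, cu), (v, cv) in self.ands:
            val.append((val[u] ^ cu) & (val[v] ^ cv))
        return val
```

For example, an OR gate is one AND node with both fanin edges complemented, read through a complemented output edge: `n = AIG(2).add_and((0, 1), (1, 1))` creates the node, and the OR value is the complement of the value at `n`.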
Technology-independent network: A technology-independent network is an intermediate DAG representation of a circuit in which the internal nodes are arbitrary Boolean functions. An AIG can be converted into a technology-independent representation using clustering algorithms ("renode" command in the tool ABC [17] ).
Synthesis of lookahead logic circuits
Given a decomposed circuit C with n inputs, $x_1, x_2, \ldots, x_n$, and m outputs, let $l_C$ denote the number of levels of logic in C. Although our implementation considers all outputs simultaneously, for ease of notation and without loss of generality, we refer for the rest of this discussion to a primary output y containing at least one critical path, i.e., at least one path with $l_C$ levels of logic. Consider the problem of obtaining a single level of timing-driven decomposition for the Boolean function, y, given by
$$y = \Sigma_1 y_0 + \bar{\Sigma}_1 y_1 \qquad (4)$$
as proposed in equation 2 to improve the performance of the circuit by reducing the number of logic levels. Attempting a function-based decomposition of the Boolean function y as shown in equation 4 has two major disadvantages. First, there is no knowledge of the circuit implementation of $\Sigma_1$, $y_0$, and $y_1$. Hence, a function-based decomposition may result in a poor choice of $\Sigma_1$, $y_0$, or $y_1$ that may lead to a higher number of logic levels than the original circuit. Since the space of decompositions is vast, finding a good function-based decomposition based on equation 4 with fewer levels of logic than the original circuit is challenging. Second, even with the knowledge of the functions $\Sigma_1$, $y_0$, and $y_1$ that can potentially produce a good decomposition, directly synthesizing AIGs for these Boolean functions is a challenge and does not scale as the complexity of the function increases.
In this paper, we propose a novel approach to address the issues of finding the functions $\Sigma_1$, $y_0$, and $y_1$ and synthesizing their AIGs to have fewer logic levels than the original circuit. Our technique is based on two key ideas. First, we use transformations on the technology-independent network, T, of the original decomposed circuit, C, to synthesize the technology-independent networks for $\Sigma_1$, $y_0$, and $y_1$. The transformations are made by simplifying the Boolean functions of the internal nodes in the technology-independent network to reduce the logic levels of the circuit. In this process, the functions $\Sigma_1$, $y_0$, and $y_1$ are derived dynamically during simplification. Second, we use global path sensitization information, extracted from the given decomposed circuit, C, as a metric to guide the simplification. This ensures that the simplifications transform the functionality of the internal nodes significantly to reduce delay while preserving the functionality at the primary outputs. Our technique has two stages: (i) extracting global critical path sensitization information from C and (ii) simplifying the technology-independent network T to obtain the technology-independent representations of $\Sigma_1$, $y_0$, and $y_1$. We will now describe each step in greater detail.
Path sensitization information:
The aim of obtaining path sensitization information for output y is to identify minterms in the input space of y that are responsible for exercising all the speed-paths (critical or near-critical paths) in the decomposed circuit. These minterms are referred to as the timing-critical minterms or speedpath minterms in the input space of y. We shall refer to this set of minterms as the speed-path characteristic function (SPCF) for y. Thus, for a given delay Δ, the SPCF for y contains all minterms that sensitize paths of length greater than or equal to Δ. To compute the SPCF for a decomposed circuit, in which the delay is given by the number of levels of logic, Δ may be set to an integer value greater than 0. In that case, the SPCF will contain all minterms that sensitize paths with greater than or equal to Δ levels of logic.
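For very small circuits, a set of this kind can be enumerated by brute force. The sketch below assumes a floating-mode sensitization criterion on an AND/inverter network with unit gate delay: a gate output settles one level after its earliest controlling (0-valued) input, or one level after its latest input if no input is controlling. This criterion and all names are assumptions for illustration; the exact algorithms cited in the text may use a different sensitization condition.

```python
from itertools import product

def arrival_times(num_inputs, ands, vector):
    """Floating-mode arrival time (in levels) of every node for one input vector."""
    val, arr = list(vector), [0] * num_inputs
    for (u, cu), (v, cv) in ands:
        ins = [(val[u] ^ cu, arr[u]), (val[v] ^ cv, arr[v])]
        controlling = [t for x, t in ins if x == 0]   # a 0 decides an AND gate
        if controlling:
            t_out = min(controlling) + 1   # earliest controlling input settles the output
        else:
            t_out = max(t for _, t in ins) + 1
        val.append(ins[0][0] & ins[1][0])
        arr.append(t_out)
    return arr

def spcf(num_inputs, ands, output, delta):
    """Input vectors whose arrival time at `output` is at least `delta` levels."""
    return {vec for vec in product((0, 1), repeat=num_inputs)
            if arrival_times(num_inputs, ands, vec)[output] >= delta}

# A 4-input AND built as a chain ((x0 & x1) & x2) & x3; nodes 4, 5, 6 are the gates.
CHAIN = [((0, 0), (1, 0)), ((4, 0), (2, 0)), ((5, 0), (3, 0))]
```

For the chain above with delta = 3, only vectors with x2 = x3 = 1 remain, because a 0 on either late-stage input settles the output before the signal from the first gate arrives.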
Several algorithms have been proposed for the exact computation of the SPCF [7, 18] . These algorithms compute the exact set of minterms that sensitize paths with a delay greater than or equal to a desired value. These algorithms are path-based and require traversal of each critical path. Other algorithms that compute an approximation of the SPCF have also been proposed [19, 20] . These algorithms compute an over-approximation of the SPCF, i.e., minterms that do not sensitize critical paths may be included in the SPCF. The over-approximation algorithms are computationally more efficient than path-based algorithms because they are node-based and require computation only at nodes that lie on the critical path. Note that the SPCF is used only as a metric to guide the synthesis of the lookahead logic circuit. Although our implementation computes the SPCF exactly, it is possible to use the over-approximation techniques to compute the SPCF for computational efficiency.
After the SPCF for output y is computed, simplifications are made to the technology-independent network T. The simplification of T is performed in two stages. The first simplification, referred to as the primary simplification, is used to synthesize the technology-independent networks for $\Sigma_1$ and $y_0$, and the second simplification, referred to as the secondary simplification, is used to synthesize the technology-independent network for $y_1$. Both primary and secondary simplifications involve simplifying the Boolean expressions of the internal nodes in T. As a result of the simplifications, the Boolean function for output y is transformed to $y_0$ in the primary simplification and to $y_1$ in the secondary simplification. In the primary simplification, additional logic for the technology-independent network of $\Sigma_1$ is also added to T.
Primary simplification of T :
The pseudo-code for the primary simplification algorithm is shown in algorithm 2. The main goal of the primary simplification of T is to reduce the number of logic levels by simplifying the Boolean functions of the internal nodes in T. When an internal node in T is simplified, the original Boolean function at y is changed to $y_0$. By adding additional logic to T, the algorithm ensures that the window function, $\Sigma_1$, is altered suitably (as described in algorithm 1) so that $y_0 = y$ when $\Sigma_1 = 1$. The algorithm ensures that the additional logic does not cancel the reduction in logic levels obtained as a result of the simplification of the internal node. Another goal of the primary simplification is to obtain a good window function $\Sigma_1$. As we have seen in the carry lookahead adder example in equation 3, functions containing timing-critical minterms or speed-path minterms form good window functions. Thus, the SPCF for the output y is used as a metric to guide the simplification of the internal nodes as explained below.
Using the SPCF:
Consider an internal node j in the fanin cone of output y in T. Let $b_j$ denote the Boolean function of this node. Typically, $b_j$ is a Boolean function with 10-15 inputs. The SPCF contains the global critical-path sensitization minterms for output y. Let $\tilde{b}_j$ denote the Boolean function obtained after simplification of $b_j$. In order to use the SPCF information for simplification of $b_j$, we assign a weight w(c) to each prime implicant cube c in the off-set and on-set of $b_j$. The weight w(c) is the fraction of minterms in the SPCF that will be covered in the window function $\Sigma_1$ if $\tilde{b}_j(c) = b_j(c)$. Thus, w(c) is the metric based on which the Boolean function of the internal node is simplified. The cube weights can be easily computed for each node using the global Boolean functions of each node and the SPCF. Note that the cube weights for a node are computed only if the node is chosen for simplification. The function reduce in algorithm 2 describes the procedure for choosing nodes for simplification. The function simplify in algorithm 1 describes the procedure for simplifying the Boolean function of a node using the SPCF. At the end of the primary simplification, a technology-independent network for $\Sigma_1$ and $y_0$ is obtained for every output y with $l_C$ levels of logic.
Secondary simplification of T: The primary simplification determines the window function $\Sigma_1$. In the secondary simplification, T is reduced to generate the technology-independent network for $y_1$. Thus, in the secondary simplification, the complement of the window function, $\bar{\Sigma}_1$, is used to assign cube weights for the internal nodes. However, unlike the primary simplification, where the nodes had to be carefully chosen for simplification in order to obtain a good window function $\Sigma_1$, the only objective of the secondary simplification is to generate the technology-independent network for $y_1$. Hence, the objective is to reduce the levels of logic in T as much as possible. This is done by replacing all cubes with zero weight by don't cares to simplify the Boolean function of every node. After the secondary simplification, the technology-independent network for $y_1$ is obtained for every output y with $l_C$ levels of logic.
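The relationship between a simplified function and its window can be illustrated on truth tables. The sketch below is not the authors' algorithm: it only shows the functional identity that the two simplifications rely on, using a full-adder carry as the output and hand-picked simplified functions. Truth tables are packed into Python integers, and all helper names are hypothetical.

```python
def truth_table(fn, n):
    """Truth table of fn over n inputs, packed into an int (bit m = value on minterm m)."""
    return sum(fn(*[(m >> i) & 1 for i in range(n)]) << m for m in range(2 ** n))

def widest_window(y, y0, mask):
    """Largest window on which the simplified function y0 still equals y."""
    return ~(y ^ y0) & mask

def reconstruct(sigma, y0, y1, mask):
    """y0 inside the window, y1 outside it."""
    return (sigma & y0) | (~sigma & mask & y1)

N = 3
MASK = (1 << 2 ** N) - 1
y = truth_table(lambda a, b, c: (a & b) | ((a ^ b) & c), N)   # full-adder carry
y0 = truth_table(lambda a, b, c: c, N)    # simplified: "the carry just propagates"
y1 = truth_table(lambda a, b, c: a, N)    # only has to match y outside the window
sigma = widest_window(y, y0, MASK)
```

Here `reconstruct(sigma, y0, y1, MASK)` equals `y`, and the window contains every minterm with a different from b, i.e., the timing-critical minterms of the carry. Because `y1` is only constrained where `sigma` is 0, the remaining minterms act as don't cares during its simplification.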
Reconstructing y: In general, equation 4 can be used to reconstruct y from $\Sigma_1$, $y_1$, and $y_0$. However, there are several simplifications that can be applied when $\Sigma_1$, $y_1$, and $y_0$ satisfy implication properties with y. For example, consider $\bar{y}_0 \Rightarrow \bar{y}$ and $y_1 \Rightarrow y$. This means that $y_1$ is a 1-approximation for y and $y_0$ is a 0-approximation for y. This can be used to reduce y to $\Sigma_1 y_0 + y_1$. In this manner, 28 unique implication-based rules can be identified for the simplification of the Shannon decomposition in equation 4. We do not list them in the paper for brevity. In our optimization runs, we have observed that the implication-based rules are frequently used to reduce the number of levels of logic while reconstructing y. Finally, the technology-independent network for the reconstructed y is converted into a decomposed circuit by converting each node in the technology-independent network into an AIG. Area recovery is then performed using standard redundancy elimination algorithms.
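The example rule above can be confirmed by exhaustive enumeration over small functions. The sketch below checks only this one rule (dropping the complemented window in front of $y_1$ when $y_1 \Rightarrow y$), not the full set of 28 rules, which the text does not list. The function name and the choice of two variables are assumptions for the example.

```python
from itertools import product

def rule_holds(num_vars=2):
    """Check y == sigma*y0 + y1 whenever y0 matches y inside the window,
    y1 matches y outside it, and y1 implies y everywhere."""
    size = 2 ** num_vars
    mask = (1 << size) - 1
    tables = range(1 << size)              # every truth table on num_vars inputs
    checked = 0
    for y, sigma, y0, y1 in product(tables, repeat=4):
        inside_ok = (sigma & (y ^ y0)) == 0             # y0 = y where sigma = 1
        outside_ok = (~sigma & mask & (y ^ y1)) == 0    # y1 = y where sigma = 0
        implies = (y1 & ~y & mask) == 0                 # y1 => y
        if inside_ok and outside_ok and implies:
            checked += 1
            if ((sigma & y0) | y1) != y:
                return False, checked
    return True, checked
```

The reduced form removes one inverter and one AND level from the reconstruction, which is why rules of this kind can shorten the path from the window function to the output.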
Algorithm 1: simplify(j)
input: j is a node in T with Boolean function $b_j$ and logic level $l_j$
output: $\tilde{b}_j$, the simplified Boolean function for node j
$S_0$ ($S_1$) is the minimum 0(1)-SOP of $b_j$
w(c) is the weight of cube c, $c \in S_0$ or $c \in S_1$
window(j) = $\tilde{b}_j$ ($b_j$)
else
/* Both 0-SOP and 1-SOP for j have non-zero weights */
Initialize $\tilde{b}_j$ = x /* don't care */
L = cubes of $S_0$ and $S_1$ in decreasing order of weight
Quantifying logic levels in T: The logic levels of the nodes in a technology-independent network are used during the simplification of the network in the proposed algorithm and also to keep track of the progress in the reduction of the logic levels. The logic level for a node j, level(j), is computed using the minimum sum-of-products (SOP) representation of the off-set and on-set for the Boolean function of node j. The minimum logic level is computed for the Huffman AND tree of each prime-implicant cube in the off-set and on-set. The minimum logic level for the Huffman OR tree is then computed using the minimum logic level of each cube. The smaller logic level value, between the off-set and the on-set, is defined as the logic level for node j.
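The level computation described above can be sketched as follows, assuming 2-input gates and that each cube is given as the list of logic levels of its literals. The function names are hypothetical, and every list is assumed to be non-empty.

```python
import heapq

def tree_level(input_levels):
    """Minimum level of a tree of 2-input gates over inputs with the given levels.

    Huffman-style construction: always combine the two earliest signals.
    """
    heap = list(input_levels)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]

def sop_level(cubes):
    """Level of an SOP: an AND tree per cube, then an OR tree over the cubes."""
    return tree_level([tree_level(cube) for cube in cubes])

def node_level(on_set_cubes, off_set_cubes):
    """Smaller of the on-set and off-set SOP levels, as in the text."""
    return min(sop_level(on_set_cubes), sop_level(off_set_cubes))
```

For instance, `tree_level([0, 0, 0, 3])` returns 4: the three early inputs are combined first, so the late input at level 3 passes through a single gate.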
In addition to computing the level of each node, the critical inputs can also be identified for each node. An input to a node is critical if the reduction of its level is a necessary condition for reducing the level of the node. The critical inputs to a node are also used in the function reduce to explore candidate nodes for the function simplify.
Case study: n-bit adder
Historically, the adder has been an excellent example for evaluating various timing-driven optimization techniques, primarily because of its regular prefix structure. Fast implementations of an n-bit adder include the (i) carry lookahead adder (CLA), (ii) carry select or conditional carry adder, and (iii) carry bypass or carry skip adder. In Sec. 2, we have described how existing timing-driven optimization techniques have used one of these adders as a motivating example to develop timing-driven optimizations for general multi-level logic circuits. In contrast, our timing-driven optimization technique can be used to derive all these fast adders from a ripple carry adder. Let a and b be two 2-bit binary numbers and $c_{in}$ be the carry-in bit. Let y denote the 2-bit sum and $c_{out}$ denote the carry. Let $g_i = a_i b_i$ denote the generate bit and $p_i = a_i + b_i$ denote the propagate bit.
The simplest implementation of an n-bit adder is a ripple carry adder that can be realized by linearly cascading n full adders. Although the ripple carry adder has a small area, its critical path delay is O(n). The carry-propagation logic is the most delay-intensive operation in a ripple carry adder. In a 2-bit ripple carry adder, $c_{out} = g_2 + p_2(g_1 + p_1 c_{in})$ with 5 levels of logic. We will now explain how our timing-driven decomposition can transform a ripple carry adder into all these fast adders.
CLA (4 levels, disjoint):
Based on the discussion in Sec. 3, two levels of timing-driven decomposition, i.e., $(\Sigma_2, y_2)$ and $(\Sigma_1, y_1)$, can be used to convert a ripple carry adder into a CLA. The window functions at the two levels are disjoint.
$$\Sigma_1 = \overline{a_1 \oplus b_1}, \quad \Sigma_2 = \overline{a_2 \oplus b_2}$$
$$y_0 = c_{in}, \quad y_1 = a_1, \quad y_2 = a_2$$
$$c_{out} = \Sigma_2 y_2 + \bar{\Sigma}_2 \Sigma_1 y_1 + \bar{\Sigma}_2 \bar{\Sigma}_1 y_0$$
Carry select and carry bypass adders (4 levels, overlapping): For the carry select and carry bypass adders, a single level of decomposition is sufficient to realize the final implementation. However, it is important to note that 2-bit carry select and carry bypass adders have 4 levels of logic if a multiplexer is considered as a single level of logic. Both decompositions are overlapping because $y_1$ and $y_0$ have common inputs in their support. For the carry select adder, we have:
$$\Sigma_1 = c_{in}, \quad y_0 = g_2 + p_2 p_1, \quad y_1 = g_2 + p_2 g_1$$
$$c_{out} = \bar{\Sigma}_1 y_1 + \Sigma_1 y_0$$
For the carry bypass adder, we have:
$$\Sigma_1 = p_2 p_1, \quad y_0 = c_{in}, \quad y_1 = g_2 + p_2 g_1$$
$$c_{out} = \bar{\Sigma}_1 y_1 + \Sigma_1 y_0$$
New decomposition (4 levels, overlapping): The proposed technique also reveals another decomposition of the 2-bit adder with 4 logic levels. This decomposition also falls under the category of a single-level overlapping decomposition. Σ1 = cin + g2 + p2g1, y0 = g2 + p2p1, and y1 = 0
From these examples, it is clear that even a simple circuit like a 2-bit adder has four different decompositions with the optimal number of logic levels. This illustrates the expressive power of overlapping timing-driven decomposition techniques to extract equivalent descriptions with area-delay tradeoffs.
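The 2-bit decompositions can be checked against the ripple carry expression by enumerating all 32 input combinations. The sketch below makes several assumptions that should be read as such: the CLA windows are taken as 1 when $a_i = b_i$; the carry select uses $y_1 = g_2 + p_2 g_1$ (the $c_{in} = 0$ cofactor); and the bypass case is checked in the implication-reduced form $\Sigma_1 y_0 + y_1$, which is the form that holds with the OR-type propagate $p_i = a_i + b_i$ used in this section.

```python
from itertools import product

def mux(sigma, y0, y1):
    """Single-level decomposition: y0 inside the window, y1 outside."""
    return (sigma & y0) | ((1 - sigma) & y1)

def check_two_bit_decompositions():
    for a1, b1, a2, b2, cin in product((0, 1), repeat=5):
        g1, g2 = a1 & b1, a2 & b2
        p1, p2 = a1 | b1, a2 | b2            # OR-type propagate, as in the text
        ripple = g2 | (p2 & (g1 | (p1 & cin)))

        # CLA: two levels, windows are the slices where a_i == b_i (assumed reading)
        s1, s2 = 1 - (a1 ^ b1), 1 - (a2 ^ b2)
        cla = mux(s2, a2, mux(s1, a1, cin))
        # Carry select: window is the carry-in itself
        select = mux(cin, g2 | (p2 & p1), g2 | (p2 & g1))
        # New decomposition: y1 = 0, so the result is window AND y0
        new = mux(cin | g2 | (p2 & g1), g2 | (p2 & p1), 0)
        # Carry bypass, in the implication-reduced form sigma*y0 + y1
        bypass = ((p2 & p1) & cin) | (g2 | (p2 & g1))

        if not (ripple == cla == select == new == bypass):
            return False
    return True
```

A check of this kind only establishes functional equivalence; the level counts quoted in the text depend on how each expression is mapped to gates.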
For a 2-bit adder, it is easy to identify many different fast implementations. In general, for an n-bit adder (n ≥ 4), identifying the adder implementation with the optimal number of logic levels is non-trivial. To illustrate this, we present the best results from SIS, ABC, an industry-standard synthesizer, and our technique to optimize an n-bit (n = 2, 4, 8, 16, 32) ripple carry adder (details of the scripts used are given in the next section). We compare the results of synthesis to the theoretical number of logic levels required to generate the carry in a tree-structured CLA for each value of n in table 1. Note that in the optimum tree-structured CLA, the critical path terminates in the output computing the most significant bit (MSB) of the sum. Hence, the optimum number of logic levels for a 2-bit tree-structured CLA is 5, even though cout has 4 logic levels. The number of logic levels obtained using existing techniques is higher than the theoretical optimum for the tree-structured CLA. In contrast, our technique provides the optimum solution for n = 2 and returns a circuit with one level of logic less than the optimum for n ≥ 4. This is because our approach is able to identify a Boolean factoring for the MSB of the sum and c out simultaneously. 
Results
Our timing-driven optimization technique for synthesis of lookahead logic circuits is implemented within ABC [17] . All experiments were run on a 64-bit 2.4 GHz Opteron-based system with 6 GB memory. The performance of lookahead logic circuits is compared to state-of-the-art academic tools SIS and ABC, and an industry-standard synthesizer. Fifteen circuits from the MCNC and ISCAS benchmark suites and the OpenSPARC T1 processor are used to compare our technique to the best results obtained using these tools. Each benchmark circuit is optimized with each tool and mapped to a library of gates for the 65nm CMOS technology. For each circuit, an equivalence check is performed after optimization to ensure that the original and optimized circuits are equivalent. Our approach is computationally efficient, with a runtime of 100 seconds on the largest circuit considered in this paper.
The first two columns in table 2 give the circuit information. Subsequent columns report the number of gates in the AIG, logic levels in the AIG, technology-mapped delay, and the power consumption at 1 GHz for the best results obtained with each optimization tool. Within SIS, the scripts delay, rugged, algebraic, and speed_up were used. For each benchmark circuit, the best results with the lowest technology-mapped delay are reported in the table. Within ABC, script resyn2rs was used. Within the industry-standard synthesizer, each design was compiled with the options -map-effort high and -area-effort high. The last row in the table compares the tools, on average and normalized to the industry-standard tool. On average, our technique shows a 40%, 56%, and 22% reduction in the number of logic levels in the optimized circuit over SIS, ABC, and the industry-standard synthesizer, respectively. Note that, on average, the sizes of the decomposed circuits obtained using our technique and the industry-standard tool are comparable. When mapped delays are evaluated, our technique achieves an average reduction of 21%, 56%, and 10% over the best results of SIS, ABC, and the industry-standard synthesizer, respectively. For our technique, the trade-off for a 10% improvement in mapped delay over the industry-standard synthesizer is a 10% increase in the total power consumption.
Conclusions
This paper described a timing-driven optimization technique based on lookahead logic circuits. Lookahead logic circuits are synthesized by simplifying the technology-independent network of the original circuit using path sensitization information. The original logic circuit is then reconstructed from the lookahead logic circuits using Shannon's decomposition and its implication-based simplifications. The use of a technology-independent network for simplifications provides a computationally efficient means for searching a rich space of circuit decompositions to enhance the performance of the original circuit.
