Rewriting is a common approach to logic optimization based on local transformations. Most commercially available logic synthesis tools include a rewriting engine that may be used multiple times on the same netlist during optimization. This paper presents an And-Inverter Graph (AIG) based rewriting algorithm using 5-input cuts. The best circuits are pre-computed for a subset of NPN classes of 5-variable functions. Cut enumeration and Boolean matching are used to identify replacement candidates. The presented approach is expected to complement existing rewriting approaches, which are usually based on 4-input cuts. The experimental results show that, by adding the new rewriting algorithm to the ABC synthesis tool, we can further reduce the area of heavily optimized large circuits by 5.57% on average.
I. INTRODUCTION
Logic optimization approaches can be divided into algorithmic methods, which are based on global transformations, and rule-based methods, which are based on local transformations [1]. Rule-based methods, also called rewriting, use a set of rules that are applied when certain patterns are found. A rule transforms a pattern for a local sub-expression, or a sub-circuit, into another equivalent one. Since the rules need to be described, and hence the types of available operations/gates must be known, the rule-based approach usually requires the logic to be described with a limited number of operation/gate types such as AND, OR, XOR, and NOT. In addition, the transformations have limited optimization capability since they are local in nature. Examples of rule-based systems include LSS and SOCRATES.
Algorithmic methods use global transformations such as decomposition or factorization, and are therefore much more powerful than rule-based methods. However, general Boolean methods, including don't-care optimization, do not scale well for large functions. Algebraic methods are fast and robust, but they are not complete and thus often give lower-quality results. For these reasons, industrial logic synthesis systems normally use algebraic restructuring methods in combination with rule-based methods.
In this paper, we propose a new rewriting algorithm based on 5-input cuts. In the algorithm, the best circuits are pre-computed for a subset of NPN classes of 5-variable functions. A cut enumeration technique [2] is used to find the 5-input cuts of all nodes, and some of these cuts are replaced with pre-computed best circuits. A Boolean matcher [3] is used to map a 5-input function to its canonical form. The presented approach is expected to complement existing rewriting approaches, which are usually based on 4-input cuts. In [4], the authors proposed an optimization flow composed of balance, rewrite and refactor steps, and implemented it in the tool ABC [5] as the script resyn2. Compared to [4], rewriting using 5-input cuts exploits larger cuts and more replacement options, and thus has the potential to move the resyn2 flow out of local minima, providing better rewriting opportunities.
The presented algorithm is implemented using a structurally hashed AIG as the internal circuit representation and is integrated into the ABC synthesis tool as the command rewrite5.
II. ALGORITHM DESCRIPTION
The presented algorithm can be divided into two parts:
1) Best circuit generation
2) Cut enumeration and replacement
Part 1 of the algorithm tries to find optimal circuits for a subset of "practical" 5-variable NPN classes and stores these circuits. Part 2 enumerates all 5-input cuts in the target circuit and replaces cuts with suitable best circuits.
In the implementation of rewriting using 4-input cuts in [4], pre-computed tables of canonical forms and the corresponding transformations are kept for all $2^{16}$ 4-input functions [4], [5]. When rewriting is extended to 5-input cuts, these tables grow to $2^{32}$ entries, i.e., too large to be used in a program that runs on a regular computer. In our implementation, we use a Boolean matcher [3] to dynamically calculate the canonical form of a truth table and the corresponding transformation from the original truth table.
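To make concrete what the matcher has to return, the sketch below computes an NPN canonical form together with the transformation that produces it. It is a brute-force search over every input permutation, input negation and output negation, not the Boolean matcher of [3]; the names `apply_npn` and `npn_canonical`, the truth-table bit ordering, and the choice of the numerically smallest truth table as the canonical form are assumptions made for illustration.

```python
from itertools import permutations


def apply_npn(tt, n, perm, in_neg, out_neg):
    """Apply an NPN transformation to the truth table `tt` of f(x_0..x_{n-1}).

    Returns the truth table of g(y) = f(x) XOR out_neg, where
    y[i] = x[perm[i]] XOR (bit i of in_neg).  Bit m of a truth table holds
    the function value for the input assignment encoded by the bits of m.
    """
    g = 0
    for m in range(1 << n):                  # assignment to y
        x = 0
        for i in range(n):
            bit = ((m >> i) & 1) ^ ((in_neg >> i) & 1)
            x |= bit << perm[i]              # matching assignment to x
        g |= (((tt >> x) & 1) ^ out_neg) << m
    return g


def npn_canonical(tt, n):
    """Exhaustive NPN canonicalization (illustrative stand-in for a matcher).

    Returns (canonical_tt, (perm, in_neg, out_neg)) where canonical_tt is the
    smallest truth table reachable from `tt` by an NPN transformation.
    """
    best = None
    for perm in permutations(range(n)):
        for in_neg in range(1 << n):
            for out_neg in (0, 1):
                g = apply_npn(tt, n, perm, in_neg, out_neg)
                if best is None or g < best[0]:
                    best = (g, (perm, in_neg, out_neg))
    return best


# 2-input AND (0b1000) and OR (0b1110) are NPN-equivalent: OR(a, b) = NOT AND(NOT a, NOT b).
canon_and, _ = npn_canonical(0b1000, 2)
canon_or, _ = npn_canonical(0b1110, 2)
print(canon_and == canon_or)   # True
```

For a 5-input function this search visits $5! \cdot 2^5 \cdot 2 = 7680$ transformations, which illustrates why a dedicated matcher is preferable to exhaustive enumeration.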
A. Best circuit generation
Similar to [4], we pre-compute the candidate circuits for each NPN class so they can be used directly later. There are 616126 NPN equivalence classes of 5-input functions, of which only 2749 appear as 5-feasible cuts across all IWLS 2005 benchmarks. We picked the 1185 classes with more than 20 occurrences and generated best circuits for representative functions of these classes.
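Selecting the "practical" classes amounts to counting canonical forms over the 5-feasible cuts of the benchmarks. A minimal sketch, assuming the canonical truth table of every cut has already been computed; the function name `select_practical_classes` and the example counts are hypothetical.

```python
from collections import Counter


def select_practical_classes(cut_classes, min_occurrences=20):
    """Return NPN classes that occur more than `min_occurrences` times.

    `cut_classes` yields one canonical truth table per 5-feasible cut
    found in the benchmark circuits (computed elsewhere).
    """
    counts = Counter(cut_classes)
    # most_common() lists classes by decreasing frequency.
    return [cls for cls, n in counts.most_common() if n > min_occurrences]


# Hypothetical counts: class 0x1 seen 25 times, 0x3 seen 21 times, 0x7 seen 20 times.
print(select_practical_classes([0x1] * 25 + [0x3] * 21 + [0x7] * 20))  # [1, 3]
```

The comparison is strict, matching "more than 20 occurrences": a class seen exactly 20 times is not selected.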
Due to the increased complexity of the problem, we had to make some trade-offs between the quality of the circuits and the time and memory usage of our algorithm. Our implementation has the following differences compared to [4]:
• Use of Boolean matcher to calculate canonical form, instead of table look-up.
• Use of a hash map to store the candidate best circuits, instead of a full table.
• When deciding whether to store a node in the node list, a new node with the same cost as an existing node of the same function is discarded instead of being stored.
• Nodes implementing both the canonical functions and their complements are used as candidate circuits, while [4] does not use the complemented functions.
• When the number of nodes reaches an upper limit, a reduction procedure is performed before the generation continues, leaving only the nodes used in the circuit table.
We use two structures to store the best circuits: the forest, a list of all nodes, and the table, which stores only pointers to those nodes in the list that represent canonical functions or their complements. In the forest, a node can be either an AND node or an XOR node, and each of its two incoming edges has a complementation attribute. The cost of a node is the number of AND nodes plus twice the number of XOR nodes that are reachable from this node towards the inputs.
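Because the cost counts each reachable gate once, logic shared by the two fanins of a node is not charged twice. The sketch below models a forest node and its cost; the `Node` and `gate` names, the truth-table layout (bit $m$ holds the value at $m = a + 2b$), and the inclusion of the node itself in its own cost are assumptions rather than details of the actual implementation.

```python
from dataclasses import dataclass

# Cost weights from the text: AND counts once, XOR twice; inputs and the
# constant are free.
GATE_COST = {"AND": 1, "XOR": 2}


@dataclass(eq=False)          # eq=False keeps identity hashing for the visited set
class Node:
    kind: str                 # "CONST", "VAR", "AND" or "XOR" (hypothetical labels)
    tt: int                   # truth table of the node's function
    fanins: tuple = ()        # ((Node, complemented), (Node, complemented))


def gate(kind, a, ca, b, cb, mask):
    """Create an AND/XOR node whose two fanin edges may be complemented."""
    va = a.tt ^ (mask if ca else 0)
    vb = b.tt ^ (mask if cb else 0)
    tt = (va & vb) if kind == "AND" else (va ^ vb)
    return Node(kind, tt, ((a, ca), (b, cb)))


def cost(node):
    """AND count plus twice the XOR count over all nodes reachable from
    `node` towards the inputs, the node itself included; shared nodes are
    counted once."""
    seen, stack, total = set(), [node], 0
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        total += GATE_COST.get(n.kind, 0)
        stack.extend(f for f, _ in n.fanins)
    return total


MASK = 0b1111                                   # 2-input truth tables
a = Node("VAR", 0b1010)                         # bit m holds the value at m = a + 2*b
b = Node("VAR", 0b1100)
g1 = gate("AND", a, False, b, True, MASK)       # a AND NOT b
g2 = gate("XOR", g1, False, b, False, MASK)     # (a AND NOT b) XOR b = a OR b
g3 = gate("AND", g1, True, g2, False, MASK)     # NOT g1 AND g2 = b
print(cost(g3))                                 # 4: g1 is counted once
```

A tree-based count would charge g1 twice (cost 5); the visited set is what makes sharing pay off.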
First, the constant zero node and five nodes for single variables are added to the forest. The constant node and one of the variable nodes are added to the table, since all variable nodes are NPN-equivalent. Then, for each pair of nodes in the forest, five types of 2-input gates are created, using the pair as inputs:
• AND gate
• AND gate with first input complemented
• AND gate with second input complemented
• AND gate with both inputs complemented
• XOR gate
A newly created node is stored in the forest if the following conditions are met; otherwise it is discarded:
• The cost of the node is lower than that of any other node with the same functionality.
• The cost of the node is lower than or equal to that of any other node with NPN-equivalent functionality.
In addition, the pointer to this node is added to the table if the following condition is also met:
• The function of the node is the canonical form representative, or its complement, of the NPN-equivalence class it belongs to.
When the number of nodes in the forest reaches an upper limit, a node reduction procedure is performed, in which only the nodes reachable from the nodes in the table are kept in the forest.
The algorithm stops when the number of uncovered "practical" classes is smaller than a threshold value.
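The generation loop described above can be sketched as follows. To keep the example fast it grows a forest for 3-input functions; the 5-input case follows the same pattern but is far more expensive. Canonical forms come from exhaustive search (a stand-in for the matcher of [3]), the node-reduction step is replaced by a simple `max_nodes` cut-off, and all names (`generate`, `npn_canon`, `GATES`, `WEIGHT`) are hypothetical.

```python
from itertools import permutations

# The five gate types from the text: (kind, complement node i, complement node j).
GATES = (("AND", False, False), ("AND", True, False), ("AND", False, True),
         ("AND", True, True), ("XOR", False, False))
WEIGHT = {"AND": 1, "XOR": 2}


def npn_canon(tt, n):
    """Smallest truth table in the NPN class of `tt` (exhaustive search)."""
    mask = (1 << (1 << n)) - 1
    best = mask
    for perm in permutations(range(n)):
        for neg in range(1 << n):
            g = 0
            for m in range(1 << n):
                x = 0
                for i in range(n):
                    x |= (((m >> i) & 1) ^ ((neg >> i) & 1)) << perm[i]
                g |= ((tt >> x) & 1) << m
            best = min(best, g, g ^ mask)       # g ^ mask: output negation
    return best


def generate(n, practical, threshold=1, max_nodes=5000):
    """Grow the forest until fewer than `threshold` practical classes lack a table entry."""
    mask = (1 << (1 << n)) - 1
    cache = {}
    tts, weight, reach = [], [], []             # per forest node
    best_fn, best_cls, table = {}, {}, {}       # fn -> cost, class -> cost, class -> node

    def canon(t):
        if t not in cache:
            cache[t] = npn_canon(t, n)
        return cache[t]

    def add(t, kind, fanin_reach):
        idx = len(tts)
        tts.append(t)
        weight.append(WEIGHT.get(kind, 0))
        reach.append((fanin_reach | {idx}) if kind in WEIGHT else fanin_reach)
        cost = sum(weight[k] for k in reach[idx])
        c = canon(t)
        best_fn[t] = cost
        best_cls[c] = min(best_cls.get(c, cost), cost)
        if t in (c, c ^ mask):                  # canonical function or its complement
            table[c] = idx

    add(0, "CONST", frozenset())                # constant zero
    for v in range(n):                          # single variables
        add(sum(1 << m for m in range(1 << n) if (m >> v) & 1), "VAR", frozenset())

    i = 0
    while i < len(tts) and len(tts) < max_nodes:
        if sum(c not in table for c in practical) < threshold:
            break
        for j in range(i):                      # pair node i with every earlier node
            shared = reach[i] | reach[j]
            base = sum(weight[k] for k in shared)
            for kind, ci, cj in GATES:
                a = tts[i] ^ (mask if ci else 0)
                b = tts[j] ^ (mask if cj else 0)
                t = (a & b) if kind == "AND" else (a ^ b)
                cost = base + WEIGHT[kind]
                # Strictly cheaper than any node with the same function, and no
                # more expensive than the cheapest node of the same NPN class.
                if cost < best_fn.get(t, cost + 1) and cost <= best_cls.get(canon(t), cost):
                    add(t, kind, shared)
        i += 1
    return tts, weight, reach, table


# Demo on 3-input functions; bit m holds the value at m = x0 + 2*x1 + 4*x2.
AND3, XOR3, MAJ3 = 0b10000000, 0b10010110, 0b11101000
tts, weight, reach, table = generate(3, {npn_canon(f, 3) for f in (AND3, XOR3, MAJ3)})
for name, f in (("AND3", AND3), ("XOR3", XOR3), ("MAJ3", MAJ3)):
    node = table[npn_canon(f, 3)]
    print(name, "cost", sum(weight[k] for k in reach[node]))
```

Pairing each node only with earlier nodes visits every pair exactly once, even though the forest keeps growing while the loop runs.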
Finally, the generated best circuits are stored, so they can be used later when rewriting takes place.
B. Cut enumeration and replacement
We use a cut enumeration and replacement technique quite similar to that in [4]. The main difference is that we use a Boolean matcher to calculate the canonical form of the NPN representative, as well as the transformation from the original function to the canonical form, while [4] uses a faster table look-up.
The Boolean matcher proposed in [3] calculates only the canonical form representation. We modified it slightly so it can simultaneously generate the corresponding NPN transformation, which is needed when connecting the replacement graph to the surrounding network.
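Once the transformation to the canonical form is known, the stored circuit can be hooked up to the cut leaves by inverting it. A minimal sketch, assuming the transformation is returned as a permutation `perm`, an input-negation mask `in_neg` and an output-negation flag `out_neg` with the convention $g(y) = f(x) \oplus o$, $y_i = x_{\pi(i)} \oplus n_i$; the functions `connect` and `evaluate` and the leaf names are hypothetical.

```python
def connect(leaves, perm, in_neg, out_neg):
    """Return how the stored canonical circuit g is hooked up to a cut.

    Assumed convention: g(y) = f(x) XOR out_neg with
    y[i] = x[perm[i]] XOR (bit i of in_neg), where x are the cut leaves.
    So canonical input i is driven by leaf perm[i] (complemented if bit i of
    in_neg is set) and the output of g is complemented if out_neg is 1.
    """
    inputs = [(leaves[perm[i]], bool((in_neg >> i) & 1)) for i in range(len(perm))]
    return inputs, bool(out_neg)


def evaluate(canonical_fn, inputs, root_neg, values):
    """Evaluate the rewired replacement for one assignment of leaf values."""
    y = [values[leaf] ^ neg for leaf, neg in inputs]
    return canonical_fn(y) ^ root_neg


def canonical_and(y):
    """Stored circuit for the canonical function NOT y0 AND NOT y1."""
    return int(not y[0] and not y[1])


# Hypothetical cut with leaves n7 (x0) and n9 (x1) computing f = NOT x0 AND x1.
# Its canonical form NOT y0 AND NOT y1 is reached with perm = (1, 0), in_neg = 0b01.
inputs, root_neg = connect(["n7", "n9"], (1, 0), 0b01, 0)
print(inputs)        # [('n9', True), ('n7', False)]
ok = all(
    evaluate(canonical_and, inputs, root_neg, {"n7": v7, "n9": v9}) == ((1 - v7) & v9)
    for v7 in (0, 1) for v9 in (0, 1)
)
print(ok)            # True
```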
Nodes are traversed in topological order, from the PIs to the POs. For each node, all of its 5-input cuts are enumerated [2]. The canonical-form truth table and the corresponding NPN transformation of each cut are calculated using the Boolean matcher. Each cut is then checked for a suitable replacement that does not increase the area of the network. Finally, the cut with the greatest gain is replaced by a pre-computed best circuit. In the presented algorithm, zero-cost replacements are accepted, since they are useful for re-arranging the AIG structure to create more opportunities in subsequent rewriting [6].
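The per-node loop combines two standard pieces: k-feasible cut enumeration by merging the cuts of the two fanins, and selection of the replacement with the greatest non-negative gain. The sketch below assumes an AIG given as a dictionary from AND-node ids to fanin ids in topological order, and takes the node counts saved and added by each candidate replacement as already known; `enumerate_cuts`, `pick_replacement` and the example node names are hypothetical.

```python
def enumerate_cuts(aig, pis, k=5):
    """k-feasible cuts of every node, computed in topological order.

    `aig` maps each AND node to its two fanins (edge complements do not
    affect cut structure and are omitted); dict order is assumed topological.
    """
    cuts = {pi: [frozenset([pi])] for pi in pis}
    for node, (f0, f1) in aig.items():
        merged = {frozenset([node])}                 # trivial cut
        for c0 in cuts[f0]:
            for c1 in cuts[f1]:
                c = c0 | c1
                if len(c) <= k:
                    merged.add(c)
        # Discard dominated cuts (proper supersets of another cut of this node).
        cuts[node] = [c for c in merged if not any(o < c for o in merged)]
    return cuts


def pick_replacement(candidates):
    """Choose the cut with the greatest gain = saved - added, accepting gain 0.

    `candidates` holds (cut, nodes_saved, nodes_added) triples; how these
    counts are obtained is outside this sketch.
    """
    best = None
    for cut, saved, added in candidates:
        gain = saved - added
        if gain >= 0 and (best is None or gain > best[1]):
            best = (cut, gain)
    return best


# Hypothetical AIG: n1 = AND(a, b), n2 = AND(n1, c).
cuts = enumerate_cuts({"n1": ("a", "b"), "n2": ("n1", "c")}, ["a", "b", "c"])
print(sorted(sorted(c) for c in cuts["n2"]))
# [['a', 'b', 'c'], ['c', 'n1'], ['n2']]
print(pick_replacement([("cut1", 3, 4), ("cut2", 2, 2), ("cut3", 5, 3)]))
# ('cut3', 2)
```

Accepting `gain >= 0` rather than `gain > 0` is what lets zero-cost replacements reshape the AIG, as described above.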
III. EXPERIMENTAL RESULTS
To evaluate the effectiveness of the proposed approach, we performed a set of experiments using the IWLS 2005 benchmarks with more than 5000 AIG nodes after structural hashing. All experiments were carried out on a laptop with an Intel Core i7 1.6 GHz (2.8 GHz maximum frequency) quad-core processor, 6 MB cache, and 4 GB RAM. First, as a baseline, we ran the resyn2 script a varying number of times on each benchmark, recording the resulting area and the runtime of the optimization flow. Then, with the same numbers of resyn2 runs, we inserted a rewrite5 command between each pair of consecutive resyn2 runs, and recorded the further improvement in area and the additional runtime.
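For reference, the two experimental flows can be written as ABC command sequences. The helper below only builds the command string (which could then be handed to ABC, e.g. through its `-c` option); `strash` and `print_stats` are standard ABC commands, `resyn2` is the script from [4], `rewrite5` is the command introduced in this work, reading the benchmark file is left out, and the function name `build_flow` is hypothetical.

```python
def build_flow(runs, with_rewrite5):
    """Return an ABC command string with `runs` passes of resyn2.

    With with_rewrite5=True, rewrite5 is inserted between consecutive resyn2
    passes, mirroring the second experiment. Reading the benchmark is omitted.
    """
    steps = ["strash"]
    for r in range(runs):
        if with_rewrite5 and r > 0:
            steps.append("rewrite5")
        steps.append("resyn2")
    steps.append("print_stats")
    return "; ".join(steps)


print(build_flow(3, False))  # strash; resyn2; resyn2; resyn2; print_stats
print(build_flow(3, True))   # strash; resyn2; rewrite5; resyn2; rewrite5; resyn2; print_stats
```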
The comparison of average results is summarized in Table I. The area improvement converges after a certain number of resyn2-rewrite5 iterations; beyond four runs of resyn2, the additional improvement is insignificant. The experimental results show that our algorithm can further reduce the area of heavily optimized large circuits by 5.57% on average.
