Abstract-Logic synthesis is a crucial step in digital integrated circuit design. Exact methods for two-level synthesis can handle very large circuits, with hundreds of inputs, but they are of limited usefulness in VLSI circuit and system design. Exact multi-level synthesis, on the other hand, is a much more complex task, and the majority of algorithms are heuristic. Benchmarks are of great importance for evaluating and validating new methods. In particular, exact benchmarks make it possible to evaluate the effectiveness of synthesis algorithms with respect to the optimal solution. This work proposes a novel method to generate exact multi-level circuits based on reversible logic. The proposed approach builds, in a short time, exact benchmark circuits with around 40 million nodes that implement the identity function f(x) = x. Consequently, the most compact circuit consists only of wires, without any logic gate instantiation. The proposed work is complementary to other circuit generation approaches and can easily be combined with them to explore particular characteristics of related benchmarks.
I. INTRODUCTION
Logic optimization plays a crucial role in VLSI circuit design. To guarantee the quality of current synthesis tools, as well as to support the development of new ones, it is useful to have techniques for evaluating the efficiency of the related algorithms. Usually, this is done by using benchmark circuits as test vehicles in the quality-of-results (QoR) evaluation. Most of the available benchmark suites can be classified as non-exact, i.e., the optimum circuit implementation is unknown [1], [2], [3], [4]. Therefore, new methods are usually evaluated against the best known results for a given circuit synthesis [5]. Although this assessment strategy is widely adopted, it does not reveal how far the optimized circuit is from its optimal solution. Thus, exact benchmarks are of great interest for gaining a deep understanding of the effectiveness and QoR of CAD tool algorithms. Exact synthesis of two-level logic circuits is possible for quite large functions, with hundreds of inputs [5], [6]. Exact multi-level synthesis, on the other hand, is a much harder task and can be formulated as a dynamic search problem. Its complexity stems from the fact that the entire set of decisions is unknown until the search begins. Hence, the search space grows as the network grows during the synthesis process. Given the problem complexity, the majority of algorithms are heuristics. These algorithms must meet some constraint criteria while minimizing other cost functions as much as possible [6]. Exact benchmark circuits help to identify possible improvements in current heuristic methods.
Exact multi-level circuits can be built by using two different strategies: either by running an exact synthesis algorithm over a given function specification, which usually does not scale to large functions; or by constructing a circuit from no particular specification while ensuring its exactness. The method proposed in [5] generates synthetic exact multi-level circuits in a constructive way. This method relies on balanced binary trees and can derive circuits with a support size of up to 1024 and more than 600,000 AND-inverter graph (AIG) nodes. That work assesses state-of-the-art open-source synthesis algorithms and shows the gap between current methods and the exact solution.
In this work, an exact benchmark circuit generation method based on the reversible logic principle is proposed. We assume that the exact circuit implementation used as a benchmark has zero logic depth and zero AIG nodes. Accordingly, redundant logic that represents the identity function f(x) = x is built, and the synthesis tool may replace all original AIG nodes by wires alone, without any logic gate instantiation. An interesting characteristic of the proposed circuit is that the optimal solution in terms of node count is also the optimal solution in terms of logic depth (and vice versa). The proposed algorithm is able to generate AIGs with around 40 million nodes in a short time. The method can be used in a few different ways: (1) one can simply generate synthetic logic blocks; (2) it is also possible to embed other circuitry/functionality along with synthetic logic blocks; (3) one can combine the proposed method with the one described in [5] in order to create huge AIGs with a known optimal solution.
The rest of this paper is organized as follows. Section II briefly summarizes the available exact and non-exact benchmark suites. Section III presents the proposed method, where the generated benchmark circuit implements the identity function f(x) = x. The possibility of instantiating arbitrary logic functions in the benchmark to stress the synthesis algorithms is also discussed. Experimental results are shown and analyzed in Section IV. Finally, Section V outlines the conclusions and future work.
978-1-5386-7431-4/18$31.00 © 2018 IEEE
II. BENCHMARKS
This section presents previous exact and non-exact combinational benchmarks. Even though most real designs are sequential, combinational benchmarks are important because they are portable, i.e., many academic tools only deal with combinational circuits [4]. Also, logic synthesis algorithms are designed to deal with pure Boolean functions, which are implemented by combinational circuits. Finally, note that when synthesizing sequential circuits, synthesis tools usually treat flip-flops as primary inputs, which leads to a set of combinational clouds.
A. Non-exact Benchmarks

1) ISCAS'85 and ISCAS'89:
In 1985, the International Symposium on Circuits and Systems (ISCAS) community introduced its first benchmark suite. The ISCAS'85 suite [7], [2] is composed of ten purely combinational circuits. In 1989, ISCAS'85 was extended and sequential circuits were introduced, creating ISCAS'89 [8], which increased the size and complexity of the previous circuits. Both suites were first proposed to evaluate combinational and sequential automatic test-pattern generation (ATPG) tools. Despite their initial purpose, these circuits have been used as benchmarks for methods in additional areas, including logic synthesis.
2) MCNC Benchmarks:
In 1985, a group from U. Leuven published a small set of two-level synthetic circuits. In 1986, this set was extended by a Berkeley group with the addition of two-level industrial circuits and arithmetic operands. In 1989, the Microelectronics Center of North Carolina (MCNC) published at the International Workshop on Logic and Synthesis (IWLS) what would later become known as the LGSynth'89 benchmark suite [9]. LGSynth'89 was primarily proposed for logic synthesis and optimization, and it republished both the U. Leuven and the Berkeley two-level circuits, along with more industrial multi-level and finite-state machine circuits. At IWLS'91 and IWLS'93, LGSynth'89 was successively extended to LGSynth'91 and LGSynth'93, respectively [1], [10]. Following those efforts, different conferences and workshops adopted this trend of publishing new benchmark suites, including HLSynth92, PDWorkshop93, and Partitioning93, to name just a few.
3) ITC'99 Benchmarks:
Initially published at the International Test Conference (ITC) in 1999, and later extended in 2000, the ITC'99 benchmark suite was also primarily developed for design-for-testability (DFT) and ATPG [11], [12]. It can be divided into four subsets. Although it provides good-sized circuits (up to 98K+ gates and 6K+ flip-flops) and, in principle, complies with its initial purpose by publishing fault schemes along with the circuits, this benchmark suite lacks maintenance. Except for the I99T subset, all other circuits are hard to find for download. Even for the I99T subset, the original functionality may have been lost over time. Its release clearly states that, due to the development process, there is no guarantee that the VHDL descriptions are functionally meaningful.
4) IWLS'05 Benchmarks:
In 2005, the IWLS community made a new benchmarking effort [3]. The IWLS'05 benchmark suite brings together the previously published ISCAS'85, ISCAS'89 and ITC'99 (subset I99T) suites and three brand new sets of circuits: one is a selection from the OpenCores collection and two are industrial initiatives from the companies Gaisler and Faraday. The OpenCores selection comprises 26 digital modules (or cores), from a PCI decoder to a VGA/LCD controller. The Gaisler set presents 4 good-sized circuits, such as the 32-bit processor LEON3, which has around 900K cell instances. Finally, the Faraday set brings three functional blocks: (1) a 16-bit DSP with SRAM blocks; (2) a 32-bit RISC CPU; and (3) a Direct Memory Access Controller.
5) OpenCores Benchmarks:
In a joint work, Altera Corp. and UC Berkeley presented a new FPGA-oriented benchmark suite at IWLS'07 [13]. The initial release consisted of 8 large designs, each containing at least 10,000 4-input look-up tables. This benchmarking effort has since been updated, and an improved subset deserves attention: 12 medium-size OpenCores designs free of multi-entity hierarchies, memories or other hard blocks (which would create design flow restrictions), and also free of adders/multipliers (which would be synthesized to macros in ASIC flows or mapped to dedicated circuitry in FPGAs).
6) EPFL Benchmarks:
The EPFL benchmark suite [4] was published at IWLS'15. The set of circuits is purely combinational and includes ten arithmetic circuits, ten random/control circuits and three synthetic circuits with more than ten million (MtM) gates.
Although the EPFL benchmark suite represents a good initiative, it also falls short in some aspects: purely combinational arithmetic circuits are not realistic, in the sense that large arithmetic operators are usually sequential (pipelined); the proposed random/control circuits are reasonably small; and, regarding the MtM subset, despite its usefulness for runtime and scalability benchmarking, purely synthetic circuits (with no other embedded logic) can be less relevant for other sorts of evaluation, such as the QoR of optimization algorithms.
In summary, even though there are several public combinational and sequential benchmarks, all the aforementioned sets of circuits have similar weaknesses. First, most of them are small and no longer represent the current challenges of most applications. Also, some of them lack maintenance. Finally, the IWLS'05 benchmark suite, which is well maintained and has circuits of good size, suffers from the same issue as all non-exact benchmark circuits: the lack of information about the optimum possible implementation of their circuits.
B. Exact Benchmarks
1) LEKO Benchmarks:
Logic synthesis Examples with Known Optimal (LEKO) [14] proposes a method to generate circuits with a known optimal technology mapping solution. The method relies on generating a small circuit with a known size-optimal solution and then following a strategy to replicate this circuit in order to derive huge circuits. The small circuit is designed by hand to be as hard as possible for a synthesis tool to map. Size optimality is given in terms of 4-input look-up tables (LUTs). The same work also presents the Logic synthesis Examples with Known Upper bounds (LEKU) [14]. LEKU serves as a sub-optimal starting point for synthesis tools. These circuits are derived from LEKO by collapsing the circuits into a two-level network and decomposing the result into a network of simple gates.
2) Logic-Depth Exact Circuits:
The method proposed in [5] generates exact multi-level circuits from no specification in a constructive way, based on balanced binary trees. To guarantee a certain complexity in the generated circuits, the authors discuss methods to design binate circuits and to break possible disjoint support properties. Given the exact circuit, the authors propose an approach based on binary decision diagrams (BDDs) to obtain a sub-optimal circuit as a starting point to feed the logic synthesis tool. The main motivation of this method is to generate circuits that are exact in logic depth, since the LEKO suite generates circuits that are exact in size.
Beyond the existing exact benchmarks, this work proposes a novel strategy to generate a benchmark that is exact in both size and depth, where the optimal solution for both metrics is the same. Our method is flexible and can derive huge circuits. Also, the method can be combined with the aforementioned exact benchmarks, as presented in Section III.
III. PROPOSED METHOD FOR EXACT BENCHMARK GENERATION
A. Identity Logic Block
The proposed method to generate exact multi-level benchmarks is based on reversible logic. Reversible logic implements a bijective function, i.e., from the circuit primary outputs (POs) it is possible to determine its primary inputs (PIs). In other words, there is a one-to-one correspondence between PI patterns and PO patterns. Reversible functions are a subset of multiple-output Boolean functions [15]. For further discussion of reversible logic, please refer to [15], [16].
Our basic identity logic block (ILB) comprises two parts or stages: the first one corresponds to a reversible function F, whereas the second one implements the inverse function $F^{-1}$. As a result, the output value of the second stage is equal to the input value of the first stage. Thus, by connecting both stages, the method derives an ILB that implements the identity function f(x) = x, since $F^{-1}(F(x)) = x$. For a given number of variables (primary inputs), the proposed method obtains a bijective function to be used as the first stage, as well as its inverse (second stage). The resulting ILB block diagram is shown in Fig. 1.
There are different possibilities for generating the first stage of the ILB. It is possible to derive an exact benchmark based on any reversible circuit. One alternative is to take an irreversible function and embed it in a reversible function. This can be done by applying known embedding methods, such as the ones presented in [17] and in [18], and then arranging the obtained reversible function into two stages in order to build an ILB. Another possibility is to generate a synthetic circuit and guarantee its reversibility by construction. In this work, we generate synthetic circuits with random logic, which are reversible by construction, as follows.
ILB First Stage:
The first stage corresponds to a circuit defined by a randomly generated reversible multi-output Boolean function f, with n inputs and m outputs. In the proposed approach, we have considered n = m. The first step to generate f is to enumerate the unique integers from 0 to $2^m - 1$ and store them in a vector. Next, a random shuffle procedure from the C++ standard library is applied to the vector. Finally, the method constructs a truth table that represents f, where the output patterns of f are constructed row by row: the output bits of the i-th row of f are defined by the binary expansion of the i-th integer in the vector. The generated function is bijective, since the generated integers are unique and therefore do not lead to repeated patterns at the outputs of f. In Fig. 1, P(x), with 1 ≤ x ≤ m, denotes the outputs of f, which are intermediate signals of the ILB. The first stage is described in a PLA file, which is read into the ABC tool [19] and then converted to a BLIF file.
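The first-stage generation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: Python's `random.shuffle` stands in for the C++ shuffle routine, the function names `first_stage_permutation` and `to_pla` are hypothetical, and the PLA text assumes the common Espresso-style layout (`.i`, `.o`, `.p`, one row per minterm, `.e`).

```python
import random


def first_stage_permutation(n, seed=None):
    """Return a random permutation of 0 .. 2**n - 1.

    Row i of the truth table maps input pattern i to output pattern perm[i].
    """
    rng = random.Random(seed)          # seeded for reproducibility (assumption)
    perm = list(range(2 ** n))         # unique integers 0 .. 2^n - 1
    rng.shuffle(perm)                  # stand-in for the C++ shuffle
    return perm


def to_pla(perm, n):
    """Render the permutation as Espresso-style PLA text with n inputs/outputs."""
    lines = [f".i {n}", f".o {n}", f".p {len(perm)}"]
    for i, value in enumerate(perm):
        # Input bits are the binary expansion of i, output bits that of perm[i].
        lines.append(f"{i:0{n}b} {value:0{n}b}")
    lines.append(".e")
    return "\n".join(lines)


perm = first_stage_permutation(3, seed=0)
pla_text = to_pla(perm, 3)
print(pla_text)
```

Because every integer appears exactly once in the shuffled vector, no output pattern is repeated, which is what makes the function bijective.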
ILB Second Stage:
The intermediate nodes of the ILB, i.e., the outputs of the first stage, are the inputs of the second stage. Hence, the second stage is designed to rebuild the primary inputs of the first stage, so that the ILB outputs equal its inputs. The second stage is easily obtained by mirroring the PLA description of the first one, i.e., the input and output patterns of the truth table of the first stage are simply swapped to define the second one. The second-stage PLA is also converted to BLIF, and then both stages are appended into a single BLIF that describes the whole ILB. Finally, we derive an AIG from the resulting BLIF by running the ABC command strash. Table I, which describes the behavior of an arbitrary ILB, is an example of a possible function generated by our method for n = m = 3. In Table I, the ILB primary inputs (PIs) denote the inputs of the first stage, the intermediate nodes represent both the outputs of the first stage and the inputs of the second stage, and the ILB primary outputs (POs) are the outputs of the second stage.
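As a sanity check on the mirroring step, the following sketch models both stages as lookup tables and verifies that their composition is the identity. It is an illustrative model only: the helper names `random_bijection` and `mirror` are hypothetical, and the PLA/BLIF/ABC flow is not reproduced here.

```python
import random


def random_bijection(n, seed=None):
    """First stage: a random permutation of 0 .. 2**n - 1."""
    rng = random.Random(seed)
    perm = list(range(2 ** n))
    rng.shuffle(perm)
    return perm


def mirror(perm):
    """Second stage: swap the input and output columns of the truth table.

    Swapping columns of a bijective truth table yields the inverse permutation.
    """
    inverse = [0] * len(perm)
    for x, y in enumerate(perm):
        inverse[y] = x
    return inverse


n = 3
stage1 = random_bijection(n, seed=1)
stage2 = mirror(stage1)

# Connect the stages: PI -> stage1 -> intermediate nodes -> stage2 -> PO.
ilb = [stage2[stage1[x]] for x in range(2 ** n)]
is_identity = ilb == list(range(2 ** n))
print(is_identity)
```

The check holds for any seed, since the mirrored table is the inverse of a bijection by construction.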
The truth table presented in
Since the number of I/Os has a strong impact on both AIG size and depth, we present the characteristics of the generated AIGs as the number of I/Os increases. The trends in AIG size and depth are illustrated in Fig. 2, where the y-axis values are in units of $10^7$, and in Fig. 3, respectively. The number of nodes ranges from 6 up to 40 million as the number of I/Os increases from 2 up to 21. The charts show an exponential trend for both the number of nodes and the logic depth as the number of I/Os increases. As discussed in [20], future generations of ICs are expected to require trillions of logic gates. Therefore, a set of large exact circuits is important for investigating new algorithms targeting the next generation of logic synthesis tools.
Even though the proposed approach is able to generate large circuits described in PLA format almost instantaneously, the conversion from PLA to AIG tends to be a time-consuming task as the number of I/Os increases. In our experiments, for circuits beyond 21 I/Os, the conversion from PLA to AIG in the ABC tool [19] has taken more than one day. In these cases, we have considered a timeout. Finally, several ILBs can be generated independently and combined to further increase the number of AIG nodes, the logic depth and the number of I/Os, as discussed in the next sections.
B. ILB with Embedded Logic
This section presents an approach for placing any desired logic alongside ILBs. In this way, exact circuits generated by the method presented in [5] can be combined with ILBs to generate larger exact benchmarks. In fact, the following approach can be used to build benchmarks by connecting ILBs to any logic, exact or not. Logic functions that appear often in digital circuit design can be combined with ILBs in order to avoid purely synthetic circuits.
Starting from small examples found in the previous section, we now propose to embed a custom logic block in order to observe its impact. Custom blocks can be connected to ILBs in different ways: (1) Notice that by placing the custom block between the two stages of a given ILB, the identity function feature may be lost. In this experiment, we have placed a 2-input exclusive-OR (XOR2) gate individually between the PIs and the ILB, as shown in Fig. 4. The benchmarks with embedded logic were synthesized by running the same set of synthesis commands used during the experiments with the ILBs, and the results are presented in Section IV-A. Finally, Fig. 5 gives an insight into possible ways to embed logic in a structure with multiple ILBs. Let the colored blocks represent any logic, and the white blocks represent ILBs. The ILBs can be placed along both axes, increasing either the total AIG breadth or depth. By increasing the AIG breadth, it is possible to provide as many PIs as desired. If the target custom logic has more PIs than a single ILB, it is possible to instantiate as many ILBs as necessary to achieve the desired number of PIs. On the other hand, by cascading ILBs with custom logic blocks, the logic depth is increased, which makes it possible to analyze logic-depth-oriented algorithms. This flexible architecture allows our method to be combined with different exact circuits for benchmarking.
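The effect of placing a custom block in front of an ILB can be illustrated with a small functional model. This is a sketch under stated assumptions: the custom block `xor2_front` is hypothetical (it replaces bit 0 with the XOR of bits 0 and 1) and does not reproduce the exact wiring of Fig. 4; `random_ilb` is likewise an illustrative helper.

```python
import random


def random_ilb(n, seed=None):
    """Return (stage1, stage2): a random bijection on n bits and its inverse."""
    rng = random.Random(seed)
    stage1 = list(range(2 ** n))
    rng.shuffle(stage1)
    stage2 = [0] * len(stage1)
    for x, y in enumerate(stage1):
        stage2[y] = x
    return stage1, stage2


def xor2_front(x):
    """Hypothetical custom block: bit 0 becomes bit0 XOR bit1, others pass through."""
    b0 = x & 1
    b1 = (x >> 1) & 1
    return (x & ~1) | (b0 ^ b1)


n = 3
stage1, stage2 = random_ilb(n, seed=2)

# PI -> custom block -> ILB (stage1 then stage2) -> PO
composite = [stage2[stage1[xor2_front(x)]] for x in range(2 ** n)]
reference = [xor2_front(x) for x in range(2 ** n)]
print(composite == reference)
```

In this model the composite circuit computes exactly the custom block's function, so the optimum of the whole benchmark is the optimum of the custom block alone (a single XOR2 here). Inserting the custom block between the two stages instead would generally break this property.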
IV. RESULTS AND DISCUSSION
A. ILB Synthesis in open-source tools
We propose a progressive approach to validate our method. First, we have incrementally generated circuits starting from n = m = 2 up to n = m = 21. Then, we have executed a set of commands in ABC and CirKit [21] to optimize the target AIG size. The optimization commands are run over the generated AIG. Since the exact number of nodes is known in our benchmark, i.e., the exact solution comprises zero nodes, our objective is to reduce the AIG size to zero without being constrained by the logic depth. To do so, we have chosen the ABC commands that, by default, do not take level preservation into account. In the CirKit tool, we have chosen mig rewrite with area as the cost metric for optimization. We stop running each command when the AIG size has not been reduced for ten successive iterations.
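The stopping rule above can be sketched as a simple loop. This is an illustrative model only: `step` is a hypothetical stand-in for one run of an optimization command that returns the new AIG size, and no real ABC or CirKit call is made.

```python
def optimize_until_stalled(size, step, patience=10):
    """Apply `step` repeatedly; stop once the size reaches zero or has not
    decreased for `patience` successive iterations.

    Returns (final_size, number_of_iterations_run).
    """
    best = size
    stalled = 0
    iterations = 0
    while best > 0 and stalled < patience:
        candidate = step(best)        # one run of the (hypothetical) command
        iterations += 1
        if candidate < best:
            best = candidate          # progress: reset the stall counter
            stalled = 0
        else:
            stalled += 1              # no reduction in this iteration
    return best, iterations


# Toy command that halves the size but can never go below 5 nodes.
stuck_result = optimize_until_stalled(28, lambda s: max(5, s // 2))

# Toy command that removes one node per run and reaches the exact solution.
exact_result = optimize_until_stalled(3, lambda s: s - 1)

print(stuck_result, exact_result)
```

The first toy command gets stuck at a non-zero size, which corresponds to a case where the exact solution is not found; the second reaches zero nodes and stops immediately.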
The results of ILB synthesis are presented in Table II. The first three columns show the AIG I/O, size and depth, respectively. In the command columns, y indicates that the command has found the exact solution. Otherwise, the final number of AIG nodes after the command stopped running is presented. We show results up to n = m = 15 because most of the commands already fail to find the optimum number of nodes for n = m = 5. Moreover, the command dsd has taken more than one day without finishing its execution for ILB circuits with more than 14 inputs. Therefore, in these cases we have considered a timeout (TO). Similarly to the analysis presented in [5], the main goal of our experiments is not to compare the available synthesis methods. Instead, we are interested in identifying the frontiers beyond which algorithms no longer find exact solutions. Determining such a frontier is interesting because it enables the selection of small case studies for which the exact solution is not found, contributing to the effort toward possible improvements in logic synthesis algorithms.
The results presented in Table II can be justified by the intrinsic characteristics of the applied synthesis algorithms. Most of these algorithms are based on tables of precomputed structures, which are used for replacing subgraphs defined by 4-input cuts [22] . Notice that the exact solutions are found often for the case studies with up to 4 inputs. In these cases, even by using 4-input cuts, the algorithms have a global view of the related AIG due to their reduced number of inputs and AND nodes. However, the local nature of such optimizations is strongly related to the problem of escaping from local minima for larger AIGs.
In order to verify whether increasing the number of AND nodes in the circuits with up to 4 inputs would change the algorithm behaviour, we have cascaded ILBs with the same number of I/O nodes. After doing so and re-running the commands, the results are quite similar. This can be explained by the fact that all blocks have the same number of I/Os. The command computes k-cuts to deal with the first ILB and simplifies it to wires (no logic). Hence, when dealing with the next ILB, the previous one has already been consumed, and the k-cuts are computed from the PIs of the next ILB. In the end, this becomes equivalent to synthesizing each ILB independently.
Regarding the command dsd, it is expected that this method is able to reach the exact solution, since its underlying data structure for Boolean decomposition is a BDD [23] . Therefore, when considering the proposed case studies, the exact solution with zero nodes can be found as soon as the BDD construction is finished. In this sense, there is room for improvements in the proposed method for exact benchmark generation, in order to build harder benchmarks for BDD-based approaches.
Finally, in order to evaluate whether the randomness in the generation of the proposed benchmarks introduces a bias into the experiments, we have generated and evaluated ten more circuits with 4 and 5 I/O nodes. This experiment has not shown any changes in the results, i.e., in the frontiers between exact and non-exact solutions. Therefore, these results are not explicitly presented herein. However, this experiment indicates that the randomness of our approach is not crucial when looking for the frontiers of exact solutions.
Even though the randomness does not seem to directly affect the experiments, it generates benchmarks with synthetic logic. The main issue with synthetic logic is that it may present structures quite different from those of many practical circuit designs. Thus, in some cases, such benchmarks may not be the best alternative for fine-tuning and improving synthesis algorithms with respect to both quality of results and runtime. There are different ways to overcome this drawback. One possibility is to embed irreversible functions, which appear often in real designs, in a reversible one, as previously explained. Another alternative is to insert a custom block, with a known exact and more realistic circuit, into the ILB, as discussed in Section III-B.
Results with the XOR2 gate in our custom block are presented in Table III. In this experiment, the XOR2 has been placed as depicted in Fig. 4. As can be seen in Table III, the addition of a simple gate as custom logic may cause a small perturbation in the obtained results. For instance, the drw command no longer finds the exact solution for the circuit with 3 I/Os and 28 AIG nodes. This experiment could also be extended to exact multi-level functions. For instance, a non-dsd read-polarity-once (RPO) function may be suitable for perturbing the dsd command.
B. ILB Synthesis in a commercial tool
The same set of circuits used with the open-source tools was also synthesized by a commercial tool. It is interesting to note that, for all the case studies, the generic synthesis was not able to find the exact solution, as shown in the column "Generic Synthesis" of Table IV. In general, the generic synthesis would be expected to have the potential to find the exact solution, since the minimum solution can be achieved by applying only technology-independent optimizations to our benchmarks. However, as the commercial tool is a black box, it is quite difficult to understand and predict its behavior.
In this experiment, we have proceeded in the logic synthesis flow by running the technology mapping after the generic synthesis. Technology mappers can be classified as structural and functional mappers. Structural mappers do not change the given logic representation during the mapping, whereas functional mappers can perform Boolean decomposition in order to change the underlying logic representation, targeting better results [24]. The experimental results show that the commercial tool has delivered the exact solution for most of the cases after the technology mapping, as presented in the last column of Table IV. Thus, the results suggest that more powerful optimizations are performed in the commercial tool during the technology mapping.
V. CONCLUSION
This work presented a novel method to generate multi-level circuits that are exact in both the number of AIG nodes and the circuit logic depth. Being able to generate exact benchmarks is fundamental to evaluating the effectiveness of logic synthesis algorithms. The proposed method is based on reversible logic and can generate circuits that are huge in AIG node count. Even though only results for circuits computing synthetic random logic have been presented, we have discussed the flexibility of the proposed approach in incorporating any custom function into the identity logic blocks (ILBs). In that sense, this method can be both customized in different ways and combined with different benchmarks, in order to evaluate different aspects of synthesis algorithm development. The proposed constructive approach is useful for identifying small circuits for which the exact solution is not found. This kind of case study is interesting for investigating improvements in current methods. Finally, the work confirms that state-of-the-art open-source tools often fail to find exact solutions. Furthermore, it was also shown that there is still room for algorithm improvements in commercial tools as well, which stopped finding the exact solution for circuits with more than 50,000 AIG nodes. All the generated circuits are publicly available online to serve as benchmarks.
