This paper describes a new procedure for generating very large realistic benchmark circuits which are especially suited for the performance evaluation of FPGA partitioning algorithms. These benchmark circuits can be generated quickly. The generation of a netlist of 100K CLBs (500K equivalent gates), for instance, takes only two minutes on a standard UNIX workstation. The analysis of a large number of netlists from real designs led us to identify the following five different kinds of sub-blocks: regular combinational logic, irregular combinational logic, combinational and sequential logic, memory blocks, and interconnections. Therefore, our generator integrates a sub-generator for each of these types of netlist. The comparison of the partitioning results of industrial netlists with those obtained from generated netlists of the same size shows that the generated netlists behave similarly to the originals in terms of average filling rate and average pin utilization.
INTRODUCTION
The evaluation of the performance of various EDA tools is generally based on the comparison between different sets of experimental results obtained by applying the tools to benchmark circuits. The increasing size of circuits and systems requires new partitioning algorithms capable of handling netlists with up to several million gates. Therefore, benchmark circuits of approximately the same size are needed.
Various benchmarks exist in the literature. For several reasons, however, they do not have the required characteristics:
• Existing benchmark suites, such as those collected by ACM/SIGDA or those maintained and distributed by the CBL¹ [11], have appeared in numerous papers in the past. They are, however, too small compared with the netlists to be handled by the new generation of partitioning algorithms. Indeed, the biggest netlist from the Partitioning93 benchmark suite contains 2904 CLBs² after its implementation on the XC3000 family of Xilinx FPGAs [13]. Only one benchmark from ACM/SIGDA has more than 26000 cells [17].
¹Collaborative Benchmarking Laboratory.
²CLBs (Configurable Logic Blocks) are the functional cells that implement the user's logic inside an FPGA.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISPD '99 Monterey CA USA Copyright ACM 1999 1-58113-089-9/99/04...$5.00
• The results obtained with these benchmarks for various partitioning tools are difficult to compare. As reported in [3], all of these benchmarks were originally designed for testing either placement or synthesis tools. Before they can be used to test partitioning tools, they have to be translated into partitioning formats. As a result, the same circuit has appeared with different properties in different publications.
• Real industrial netlists of the required size cannot be used for benchmarking due to patent rights. Recently, Alpert described the ISPD98 benchmark suite in [1]. It consists of 18 circuits ranging from 13K to 210K modules in a hypergraph description format, obtained by translation from internal IBM designs. Unfortunately, important information concerning circuit functionality, timing and technology was erased by the transformation process, which was obviously necessary to comply with patent rights. Because of the representation format of the circuits, the performance of the following partitioning algorithms cannot be determined correctly with the ISPD98 benchmark suite:
Algorithms, such as [14], which are based on the method of functional replication need information about the module functions.
Algorithms, such as [15][16], whose objective function aims at minimizing the length of the critical path need information about module delays and sequential modules.
Partitioning algorithms for FPGAs, such as [6][12], operate on netlists of CLBs. These CLBs are characterized by a fixed number of input and output pins. As the benchmark circuits do not comply with this constraint, they cannot be used in the hypergraph format but have to be transformed into netlists of CLBs. As the functionality of the various modules is not available, however, it is impossible to synthesize a netlist of CLBs from an ISPD98 benchmark circuit.
A certain number of external connectors in the benchmark circuits (up to 90% [1]) are bidirectional. In order to apply a partitioning algorithm that uses the orientation of the nets, such as cone algorithms [4] or algorithms based on the max-flow min-cut theorem such as [7][20], these connectors must be transformed into directional ones. How this can be done without changing the structure of the netlist remains an open research question.
Furthermore, due to the increasing size of netlists, which nowadays can reach several million gates, a benchmark suite of fixed size, with at most 210K cells, will rapidly become obsolete.
Only a generator of heterogeneous netlists covering a large spectrum of circuit types and sizes is able to meet all of these demands: it can quickly create a large number of netlists of various sizes with different structural characteristics, while complying with patent rights.
The best way to evaluate the performance of various partitioning tools is to apply them to a netlist and compare the results. For FPGA partitioning, these results are often measured in terms of the number of FPGAs needed to implement a given netlist. The closer the netlist is to a real netlist, the more significant the test. Therefore, the benchmarks have to be as realistic as possible. In this context, a generated netlist can be said to be realistic if applying several partitioning tools to it and to an industrial netlist with the same characteristics leads to similar results in terms of average filling rate and average pin utilization.
Their generator takes into account the constraints of digital circuits, but as a consequence of the random character of the interconnection process, the structural information of real circuits is not properly captured. Ghosh et al. [8] extracted a wiring signature from a reference circuit in order to obtain a wiring-signature equivalence class. A wiring perturbation induces a perturbed reference circuit. The resynthesis of this perturbed reference circuit finally leads to a number of mutants which can be used as benchmark circuits. The advantage of this method is that the generated mutants are not completely random. Hutton et al. [9][10] developed a generator called GEN that uses statistical information extracted from an existing design in order to generate a statistically equivalent random netlist based on LUTs³. The suitability of netlists generated by GEN for evaluating the performance of partitioning tools was tested with an adder and a multiplier. We partitioned a 32-bit CLA adder into units of 20 CLBs with 20 input and 20 output pins. Then, we cloned this adder with GEN and partitioned the clone into units of the same size. This procedure was repeated with a 32-bit multiplier; the constraints in this case were fixed to 60 CLBs with 60 input and 60 output pins.
The results are displayed in They show that the generator GEN is not able to generate netlists which behave like realistic netlists when partitioned. In fact, the number of partitions needed differs significantly from that of the original netlist, while the partitioning process takes more CPU time. A possible explanation of these results is that GEN does not consider the structural information which is always present in real designs (in particular the hierarchical structure). Figure 1 shows the original netlist of the 32-bit adder, which is very regular. The partitioning tool cuts it horizontally with a small cutset and therefore needs a small number of partitions to contain the entire netlist.
³A LUT (look-up table) with n input pins is a memory that can implement any Boolean function of n variables.
Figure 1. Netlist of a 32-bit adder
Figure 2. Netlist of the clone of a 32-bit adder
Figure 2 shows the clone of the 32-bit adder generated with GEN. It is rather irregular in comparison to the original netlist, and therefore difficult to partition. The number of interconnections between partitions is higher, which leads to lower filling rates. These results suggest that more relevant information, in particular structural information, has to be taken into account to generate more realistic netlists. Note that both drawings in figures 1 and 2 were generated using the same graphic display tool.
We have shown that existing benchmark suites, as well as benchmarks generated by existing netlist generators, do not provide satisfactory solutions to the identified benchmarking problem in the domain of partitioning. It will be shown that our netlist generator PartGen is able to overcome this problem: it quickly generates a large number of netlists of arbitrary size. These netlists are representative of a wide variety of real designs and behave similarly to original netlists when partitioned. We can generate netlists that are easy to partition and netlists that are difficult to partition, so it is possible to examine the sensitivity, the robustness, and the performance of partitioning algorithms. Our generator also makes it possible to analyse the CPU time and the memory requirements of a given algorithm as the netlist size increases while its characteristics are left unchanged.
The rest of this paper is organized as follows:
In the next section, we identify the various types of netlists encountered in large real designs and present different generators, one for each type of netlist. The third section describes the integration of the different elementary generators into one supergenerator, which is able to generate netlists of arbitrary size and contents. In the fourth section, we present the results of our experiments with the generated netlists, followed in the last section by a conclusion.
2. THE PARTGEN GENERATOR
2.1. Introduction: The different types of netlists
The partitioning of a large number of netlists from real designs led us to identify five different types of designs, which may be classified as follows:
• Regular combinational logic: Netlists of this type are very regular and generally easy to partition. Typical examples are arithmetic operators such as multipliers or adders.
• Irregular combinational logic: These netlists are the "glue" in a design. They are often located around large functional blocks and are characterized by great irregularity.
• Memory blocks: These netlists contain memory blocks such as data caches or RAM.
• Combinational and sequential logic: Netlists of this type, such as cache controllers, consist of both combinational and sequential logic. We will therefore call them controller netlists in the following sections. They are characterized by a high average number of pins per net. They are often irregular and therefore difficult to partition.
• Interconnections: Different modules of a design communicate through netlists of this type. These interconnections are characterized by the number of nets, the number of pins on the modules and the small number of physical cells.
Every circuit is composed of one or more of these types of netlists. Because these types of netlists behave quite differently with respect to partitioning, our generator PartGen is built by combining five sub-generators, one for each basic netlist type. In this way, the generated netlists contain instantiations of one or more netlists of one or more different types. Our approach was to develop a generator for each type mentioned above and then to integrate them into a single generator, PartGen. The generators and their integration into PartGen are presented below.
2.2. Regular combinational logic generator
Netlists of this type are very regularly structured. Commonly used blocks of this type are adders, counters, multipliers, or pipeline structures such as data paths. The size of such a netlist depends on its interface. Using an adder generator or a counter generator results in a large number of external pins of the netlist⁴. The partitioning of such a netlist leads to a relatively large number of partitions⁵, which is caused only by the number of external connections and not by the structure of the netlist. For the same reasons, the partitioning of a data path leads to too small a number of partitions. Therefore, we used a multiplier generator to create the generator for regular combinational logic.
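The text does not say which multiplier generator was used. As a hypothetical illustration of why a multiplier gives a very regular netlist, the sketch below builds a textbook n x n unsigned ripple-carry array multiplier as a gate list (the names `array_multiplier` and `evaluate` are ours): n² AND gates for the partial products, then one row of half and full adders per partial-product row, which yields n half adders and n(n-2) full adders. A small simulator checks that the netlist really multiplies.

```python
from itertools import count

def array_multiplier(n):
    """Gate-level netlist of an n x n unsigned ripple-carry array multiplier.

    Returns (gates, a, b, product): each gate is (kind, input_wires, output_wires)
    with kind "AND", "HA" (half adder) or "FA" (full adder); product lists the
    2n output wires from least- to most-significant bit.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    wire = count()  # source of fresh wire identifiers
    gates = []

    def add(kind, *inputs):
        outs = (next(wire),) if kind == "AND" else (next(wire), next(wire))
        gates.append((kind, inputs, outs))
        return outs[0] if kind == "AND" else outs  # adders return (sum, carry)

    a = [next(wire) for _ in range(n)]
    b = [next(wire) for _ in range(n)]
    acc = [add("AND", a[j], b[0]) for j in range(n)]   # row 0: weights 0..n-1
    product = [acc.pop(0)]                              # bit 0 is already final
    for i in range(1, n):
        row = [add("AND", a[j], b[i]) for j in range(n)]  # weights i..i+n-1
        sums, carry = [], None
        for j in range(n):
            if j < len(acc):
                if carry is None:
                    s, carry = add("HA", row[j], acc[j])
                else:
                    s, carry = add("FA", row[j], acc[j], carry)
            else:  # only in row 1: the top partial product has no partner bit
                s, carry = add("HA", row[j], carry)
            sums.append(s)
        sums.append(carry)            # carry-out at weight i+n
        product.append(sums.pop(0))   # weight i is now final
        acc = sums
    product.extend(acc)               # remaining weights n..2n-1
    return gates, a, b, product

def evaluate(gates, a, b, x, y, product):
    """Simulate the netlist; gates are stored in creation (topological) order."""
    value = {w: (x >> k) & 1 for k, w in enumerate(a)}
    value.update({w: (y >> k) & 1 for k, w in enumerate(b)})
    for kind, ins, outs in gates:
        bits = [value[w] for w in ins]
        if kind == "AND":
            value[outs[0]] = bits[0] & bits[1]
        else:  # half/full adder: sum and carry of the input bits
            total = sum(bits)
            value[outs[0]], value[outs[1]] = total & 1, total >> 1
    return sum(value[w] << k for k, w in enumerate(product))

gates, a, b, product = array_multiplier(4)
```

Every row repeats the same adder pattern, which is the kind of regular structure that a partitioner can cut cleanly.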
2.3. Irregular combinational logic generator
In order to generate netlists of this type, we use the generator GEN (see [9]) mentioned in the introduction. We use the simplest possible mode of GEN, where only the number of LUT input pins and the size of the netlist to generate are given. The generated netlists contain only LUTs with 4 input pins.
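To make footnote 3 concrete, here is a minimal sketch of a LUT modeled as a 2ⁿ-entry memory addressed by its input bits. A 4-input LUT therefore stores 16 bits and can realize any of the 2¹⁶ = 65536 Boolean functions of 4 variables. The class name `LUT` and the bit-ordering convention (input 0 is the least-significant address bit) are our own assumptions, not GEN's data model.

```python
class LUT:
    """An n-input look-up table: a 2**n-entry memory addressed by the input bits."""

    def __init__(self, n, func):
        self.n = n
        # Entry k stores func(bits of k); input 0 is the least-significant address bit.
        self.table = [func(*[(k >> i) & 1 for i in range(n)]) & 1 for k in range(2 ** n)]

    def __call__(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs, got {len(inputs)}")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.table[address]

# Any Boolean function of 4 variables fits one LUT4, for example:
xor4 = LUT(4, lambda a, b, c, d: a ^ b ^ c ^ d)        # 4-input parity
mux = LUT(4, lambda s, a, b, _unused: b if s else a)   # 2:1 multiplexer, one input unused
```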
2.4. Memory generator
Our memory generator creates memory blocks which are 32 bits wide. In a way similar to the generation of regular combinational logic netlists, we can specify the number of memory block instances to be generated, as well as their size in terms of number of words.
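Section 3.4 treats memory blocks separately by mapping them onto SRAM banks of 512K words x 8 bits. A memory specification (instance count and word count) therefore translates directly into a bank count. The sketch below assumes that "512K" means 512 x 1024 words, that banks are tiled side by side to reach the 32-bit width and stacked to reach the depth, and that blocks never share banks. All three are our assumptions, and the function names are hypothetical.

```python
import math

BLOCK_WIDTH = 32           # memory blocks from the generator are 32 bits wide
BANK_WORDS = 512 * 1024    # assumption: "512K words" read as 512 x 1024
BANK_WIDTH = 8             # bits per word of one SRAM bank

def banks_per_block(words, width=BLOCK_WIDTH):
    """SRAM banks for one block: banks side by side for width, stacked for depth."""
    return math.ceil(words / BANK_WORDS) * math.ceil(width / BANK_WIDTH)

def banks_for_memory_spec(instances, words):
    """Banks for `instances` identical blocks of `words` 32-bit words (no bank sharing)."""
    return instances * banks_per_block(words)
```

For instance, three blocks of 1M words each need 3 x 2 x 4 = 24 banks under these assumptions.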
2.5. Combinational and sequential logic generator
The generators mentioned above use existing generators. Our contribution is the development of a generator which creates random netlists whose properties are as similar as possible to those of real industrial controller netlists. The development is based on the analysis of an industrial netlist⁶, using not only statistical but also topological information.
• Generation process constraints
We analyzed a typical industrial netlist with regard to the number of cells, flip-flops, nets and external pins, the number of input pins on the CLBs, the average fanout and the fanout distribution over the nets, as well as the length of the critical path. These characteristics are used by the generator as constraints for the generation process. The following requirements have to be respected during the generation process:
The analysis of the generator GEN [9] shows that the above conditions alone are not sufficient for the correct construction of a clone whose partitioning properties are similar to those of the industrial netlist. In fact, structural information (in particular the hierarchical structure) must be taken into account better. The hierarchy of the industrial design was analyzed by applying a bipartitioning algorithm to the original netlist [18]. The recursive application of this algorithm to the industrial netlist leads to the tree structure shown in figure 3. This hierarchical structure is reproduced during the construction process. The process starts with the generation of the macro-controllers, i.e. the macro-cells of the controller netlist, which are the leaves of the tree, and recursively climbs the branches until it reaches the root. We arbitrarily chose to respect the leaf sizes and the depth of the tree shown in figure 3, because the analyzed netlist is representative of this circuit family.
• Macro-controller generation constraints
Nets with a fanout greater than three have a high probability of being cut by partitioning.
On the other hand, nearly 90% of the nets
⁶The analysis of other industrial controller netlists shows that the chosen industrial controller is representative of this type of netlist. The results presented in table 4 confirm this hypothesis.
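The bottom-up construction over the bipartitioning tree of section 2.5 can be sketched as a post-order traversal: both children are built first, starting at the leaves, and each internal node then joins its two sub-blocks with a few nets that cross its cut. This is only an illustration of the traversal order. The leaf generator `make_leaf`, the number of cut nets per node (`links`) and the example tree are hypothetical placeholders, not PartGen's macro-controller generator or the tree of figure 3.

```python
import random

def make_leaf(size, rng, first_id):
    """Hypothetical stand-in for a macro-controller: `size` cells (size >= 2), each
    driving one net to 1-3 other cells of the same leaf. Not the paper's procedure."""
    cells = list(range(first_id, first_id + size))
    nets = []
    for c in cells:
        others = [x for x in cells if x != c]
        nets.append([c] + rng.sample(others, min(len(others), rng.randint(1, 3))))
    return cells, nets

def build(tree, rng, first_id=0, links=2):
    """Build a netlist bottom-up over a bipartitioning tree.

    `tree` is either an int (leaf size in cells) or a pair (left, right).
    Children are generated first, so construction climbs from the leaves to the
    root; each internal node then adds `links` nets crossing its own cut.
    Returns (cells, nets, next_free_id); each net lists its driver first.
    """
    if isinstance(tree, int):
        cells, nets = make_leaf(tree, rng, first_id)
        return cells, nets, first_id + tree
    left, right = tree
    lcells, lnets, first_id = build(left, rng, first_id, links)
    rcells, rnets, first_id = build(right, rng, first_id, links)
    cut_nets = [[rng.choice(lcells), rng.choice(rcells)] for _ in range(links)]
    return lcells + rcells, lnets + rnets + cut_nets, first_id

# Hypothetical tree: four internal nodes, leaf sizes 40, 60, 50, 30 and 20 cells.
cells, nets, _ = build(((40, 60), (50, (30, 20))), random.Random(7))
```

Keeping the cut nets few relative to the nets inside each leaf is what lets a recursive bipartitioner rediscover the tree.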
⁷This requirement helps to facilitate the generation process described below. It does not influence the partitioning results of the generated netlist.
After the determination of the netlist interface, each module must be connected to other modules and/or to primary input and output pins. The following constraints, which are the same as the routing constraints of the macro-controller, have to be taken into account during this interconnection process:
• The number of nets is equal to the sum of the primary output pins of the modules and the primary input pins of the netlist.
• Each net must have exactly one driver to avoid short circuits or non-functional connectors.
• Each input connector of the modules and each primary output pin of the netlist must be driven.
• Loops on the same module have to be avoided.
• The process has to be as fast as possible.
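The interconnection constraints above can be sketched as a two-phase random assignment. The drivers are the module output pins plus the netlist's primary inputs, so there is one net per driver. The sinks are the module input pins plus the netlist's primary outputs, and each sink is attached to exactly one net. This is our own illustrative procedure, not necessarily PartGen's. The `Pin` type, the owner label `"TOP"` for the netlist's own pins and the two-phase scheme are assumptions. The same-owner rule also forbids direct feed-throughs from a primary input to a primary output. The greedy first phase is fast but can fail on contrived inputs, where it raises an error.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Pin:
    owner: str  # module instance name, or "TOP" for the netlist's own primary pins
    name: str

def interconnect(drivers, sinks, seed=0):
    """Map every driver pin to the sink pins of its net (one net per driver)."""
    rng = random.Random(seed)
    if len(sinks) < len(drivers):
        raise ValueError("every net needs at least one sink")
    nets = {d: [] for d in drivers}  # exactly one driver per net by construction
    pool = list(sinks)
    rng.shuffle(pool)
    # Phase 1: give each net one sink owned by someone else, so no net dangles.
    for d in drivers:
        idx = next((k for k, s in enumerate(pool) if s.owner != d.owner), None)
        if idx is None:
            raise ValueError(f"no legal sink left for {d}")
        nets[d].append(pool.pop(idx))
    # Phase 2: attach every remaining sink to a random driver of another owner,
    # so each sink is driven exactly once and no module feeds its own inputs.
    for s in pool:
        legal = [d for d in drivers if d.owner != s.owner]
        if not legal:
            raise ValueError(f"no legal driver for {s}")
        nets[rng.choice(legal)].append(s)
    return nets

# Example: three modules with two inputs and one output each, plus the netlist's pins.
modules = {"A": 2, "B": 2, "C": 2}
drivers = [Pin(m, "out") for m in modules] + [Pin("TOP", "pi0"), Pin("TOP", "pi1")]
sinks = [Pin(m, f"in{k}") for m, n in modules.items() for k in range(n)] + [Pin("TOP", "po0")]
nets = interconnect(drivers, sinks, seed=1)
```

Each phase touches every pin a bounded number of times per candidate scan, which is in the spirit of the "as fast as possible" requirement. A production version would index sinks by owner instead of scanning.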
2.6. Integration of the sub-generators in PartGen
The different sub-generators have been integrated in a common environment in order to facilitate their use. A netlist can be generated by specifying its size, the percentage of each netlist type that has to be generated and the number of modules of each type.
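The three parameters above (total size, percentage per netlist type, number of modules per type) determine a size for every module instance. The sketch below shows one plausible way to derive them. It assumes sizes are counted in CLBs and that each type's budget is split as evenly as possible among its modules. `TypeSpec`, `plan` and the type labels are hypothetical names, not PartGen's interface. Rounding each type's budget separately can make the grand total differ from the request by a few CLBs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TypeSpec:
    percent: float  # share of the total netlist size, in percent
    modules: int    # number of module instances of this type

def plan(total_clbs, specs):
    """Split a total CLB budget among netlist types, then among their module instances."""
    if abs(sum(s.percent for s in specs.values()) - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    sizes = {}
    for kind, s in specs.items():
        budget = round(total_clbs * s.percent / 100)
        base, extra = divmod(budget, s.modules)
        # Spread the remainder so module sizes differ by at most one CLB.
        sizes[kind] = [base + (1 if k < extra else 0) for k in range(s.modules)]
    return sizes

# Hypothetical request: 100K CLBs, three netlist types with 2, 3 and 1 modules.
sizes = plan(100_000, {
    "regular": TypeSpec(25, 2),
    "irregular": TypeSpec(25, 3),
    "controller": TypeSpec(50, 1),
})
```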
3. EXPERIMENTAL RESULTS
The validation of our generator was executed on a SUN Enterprise 5000 server with 8 UltraSPARC II CPUs (167 MHz) and 4 GB RAM. The quality criteria for a partitioning algorithm are the average filling rate of the FPGAs needed to implement the whole netlist and the average pin utilization of the FPGAs. We partitioned into two different FPGAs of the Xilinx XC3000 family: the XC3064 with 224 CLBs and 120 I/O pins and the XC3090 with 320 CLBs and 144 I/O pins [19]. We used a partitioning tool which recursively applies a bipartitioning algorithm to the netlist. It stops when the size of a partition and its number of pins fit the FPGA constraints. This partitioner uses a ratio-cut algorithm similar to the one presented in [18]. The generator for regular combinational circuits and the generator for memory blocks produce real, functional netlists. Therefore, they do not have to be validated. The validation of the other generators is presented in the following sections.
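The recursive partitioning loop and the two quality measures can be sketched as follows, using the XC3064 and XC3090 capacities given above. The stopping test is the one described in the text: a partition is accepted once both its CLB count and its pin count fit the device. The bipartitioner is a trivial placeholder (`split_in_half`, hypothetical), not the ratio-cut algorithm of [18]. A partition's pin count is approximated as the number of nets it cuts, and the netlist's primary I/O is ignored. Nets are represented as Python sets of cell identifiers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    name: str
    clbs: int   # CLB capacity
    pins: int   # I/O pin capacity

XC3064 = Device("XC3064", 224, 120)
XC3090 = Device("XC3090", 320, 144)

def pin_count(part, nets):
    """I/O pins needed by a partition: one per net with cells inside and outside it."""
    inside = set(part)
    return sum(1 for net in nets if inside & net and not net <= inside)

def split_in_half(part, nets):
    """Hypothetical placeholder bipartitioner; the paper's tool uses a ratio-cut algorithm."""
    mid = len(part) // 2
    return part[:mid], part[mid:]

def recursive_partition(part, nets, dev, bipartition=split_in_half):
    """Bipartition recursively until every partition fits the device's CLBs and pins."""
    if len(part) <= dev.clbs and pin_count(part, nets) <= dev.pins:
        return [part]
    if len(part) <= 1:
        raise ValueError("a single cell does not fit the device")
    left, right = bipartition(part, nets)
    return (recursive_partition(left, nets, dev, bipartition)
            + recursive_partition(right, nets, dev, bipartition))

def average_filling_rate(parts, dev):
    """Mean over all FPGAs of (CLBs used / CLB capacity)."""
    return sum(len(p) for p in parts) / (len(parts) * dev.clbs)

def average_pin_utilization(parts, nets, dev):
    """Mean over all FPGAs of (I/O pins used / I/O pin capacity)."""
    return sum(pin_count(p, nets) for p in parts) / (len(parts) * dev.pins)

# Toy example: a chain of 1000 single-CLB cells, each net joining two neighbours.
cells = list(range(1000))
chain = [{i, i + 1} for i in range(999)]
parts = recursive_partition(cells, chain, XC3064)
```

On this toy chain the loop stops at 8 partitions of 125 cells for the XC3064, giving an average filling rate of 1000/(8 x 224), about 0.56. A better bipartitioner would produce fuller FPGAs, which is exactly the kind of difference these measures are meant to expose.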
3.1. Irregular combinational logic generator
Irregular combinational logic is known to be difficult to partition. The validation of the use of the generator GEN for creating this type of netlist consists in showing that the output of GEN is much more difficult to partition than a regular combinational logic netlist. The experimental results are presented in table 3. The experiments show that the ratio F_i / F_r between the average filling rates F_i (for the irregular combinational logic netlist) and F_r (for the regular combinational logic netlist) lies in the range [0.26; 0.39] for XC3064 FPGAs and in the range [0.28; 0.33] for XC3090 FPGAs. This means that the partitioning tool needs two to four times more FPGAs to implement an irregular combinational logic netlist than a regular one. This factor is realistic.
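The step from the filling-rate ratio to the FPGA count can be made explicit. This assumes, as the comparison implies but the text does not state, that both netlists contain the same number C of CLBs. With N FPGAs of K CLBs each, the average filling rate is F = C/(NK), so the ratio of FPGA counts is the inverse of the ratio of filling rates:

$$
F = \frac{C}{N K} \quad\Longrightarrow\quad \frac{N_i}{N_r} = \frac{C/(F_i K)}{C/(F_r K)} = \frac{F_r}{F_i}
$$

$$
\text{XC3064:}\quad \frac{F_i}{F_r} \in [0.26,\,0.39] \;\Longrightarrow\; \frac{N_i}{N_r} \in \left[\tfrac{1}{0.39},\,\tfrac{1}{0.26}\right] \approx [2.56,\,3.85]
$$

$$
\text{XC3090:}\quad \frac{F_i}{F_r} \in [0.28,\,0.33] \;\Longrightarrow\; \frac{N_i}{N_r} \in \left[\tfrac{1}{0.33},\,\tfrac{1}{0.28}\right] \approx [3.03,\,3.57]
$$

Both intervals lie within the "two to four times more FPGAs" stated above.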
3.2. Combinational and sequential logic generator
To validate the generator of combinational and sequential logic, we compared the partitioning results, in terms of average filling rate (F) and average pin utilization (P), of netlists generated with PartGen with the results for industrial netlists of this type. We also compared our results with those obtained for clones of the industrial netlists generated using GEN⁸ [9], and with the results obtained from the partitioning of randomly generated netlists of the same size. The experimental results are shown in table 4. The sizes of the real netlists are 16893 CLBs (indust-1), 20718 CLBs (indust-2), and 22578 CLBs (indust-3). For each industrial netlist, five different netlists of the same size were generated with PartGen and with GEN and then partitioned, in order to examine the sensitivity of both generators to the random generation process. We display only the best and the worst results obtained. Besides the recursive bipartitioning algorithm (RBA) [18], we used two other algorithms. The first one is a recursive application of the FBB algorithm described in [20] with a balance criterion of 25% of the cells in each partition. We slightly modified the algorithm by performing a breadth-first search to obtain source and sink nodes as far from one another as possible. The second algorithm is the DPRP algorithm applied to an ordering obtained with the scaled cost criterion [2]. The results presented in table 4 show that the partitioning of the netlists generated with PartGen achieves an average filling rate and an average pin utilization comparable to those of the industrial netlists for three different partitioning algorithms. However, the average filling rates obtained for the randomly generated netlists and for the netlists generated with GEN differ significantly. Indeed, for XC3064 FPGAs, the ratio F_g / F_o between the average filling rate F_g of the generated netlists and F_o of the original netlists lies close to 1 for the netlists generated with PartGen.
The best results were obtained with algorithm RBA, where the average filling rates of the generated netlists deviate by less than 17% from the original ones for all three industrial netlists. FBB and DPRP achieve good results for netlist indust-1 and satisfactory results for the two other netlists. The worst results in terms of average filling rate are 0.58 (FBB) and 1.38 (DPRP) times the average filling rate of the corresponding industrial netlist. Note that even in the best case, the average filling rates obtained for the netlists generated randomly without any structural or topological information and for the netlists generated with GEN do not exceed 0.44 (random) and 0.52 (GEN) times the average filling rates of the industrial netlists.
Incidentally, the ratio T_g / T_o between the CPU time T_g for partitioning the netlists generated with PartGen and the CPU time T_o for the original netlists is close to 1 for RBA and DPRP, and exceeds 3.2 only once for FBB. In contrast, the ratio T_r / T_o between the CPU time T_r for partitioning the randomly generated netlists and that for the original netlists is below 2.5 only once.
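The modified FBB algorithm described in section 3.2 uses a breadth-first search to pick source and sink nodes far apart. The text does not give the exact procedure. The sketch below uses the common "double sweep" heuristic as an assumed stand-in: run a BFS from an arbitrary cell and take the farthest cell as the source, then run a BFS from the source and take the farthest cell as the sink. Turning the netlist's nets into a graph by clique expansion, where every pair of cells on a net becomes adjacent, is also our assumption, and all function names are hypothetical.

```python
from collections import deque

def clique_adjacency(nets):
    """Cell adjacency: two cells are neighbours if they share at least one net."""
    adj = {}
    for net in nets:
        for u in net:
            adj.setdefault(u, set()).update(x for x in net if x != u)
    return adj

def bfs_farthest(adj, start):
    """Breadth-first search from `start`; return (a farthest cell, its distance)."""
    dist = {start: 0}
    queue = deque([start])
    far = start
    while queue:
        u = queue.popleft()
        if dist[u] > dist[far]:
            far = u
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return far, dist[far]

def pick_source_sink(nets, seed_cell):
    """Double sweep: farthest cell from `seed_cell` is the source; farthest from it is the sink."""
    adj = clique_adjacency(nets)
    source, _ = bfs_farthest(adj, seed_cell)
    sink, distance = bfs_farthest(adj, source)
    return source, sink, distance

# Toy netlist: a chain of ten cells 0-1-2-...-9.
chain = [[i, i + 1] for i in range(9)]
```

Starting from cell 4 on this chain, the first sweep reaches cell 9, and the second sweep then finds cell 0 at distance 9, the two ends of the chain.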
⁸A software tool named CIRC permits the extraction of statistical information about a netlist. The generator GEN takes this information into account during the generation process. The result is a clone netlist that is supposed to reproduce the same statistical information as the original one.
We conclude, therefore, that the netlists generated with PartGen have nearly the same hierarchical structure as the industrial netlists. The partitioning results for the various generated netlists also prove that our generator is stable enough to reproduce nearly the same properties with respect to the partitioning algorithm. Our generator is thus able to create netlists with partitioning properties similar to those of industrial netlists, which validates our approach.
3.3. Generator for interconnections
The generator of interconnections links the module instances created with the other sub-generators in order to obtain a single netlist. It does not introduce physical cells into the netlist, and it cannot create netlists without the other sub-generators. The interconnections of a real netlist depend on the modules it contains; there are no "typical" interconnections between modules. For all these reasons, the generator of interconnections cannot be validated separately. The influence of the generated interconnections on the partitioning results remains an open area of research.
3.4. Generating benchmark circuits with PartGen
We used our generator to create 30 benchmark circuits. Their size varies from 10K to 1M CLBs, and their composition is varied: every combination is covered, from a netlist of a single type to a netlist that contains at least one instance of each netlist type. The resulting netlists are flattened netlists consisting of several modules. These netlists were partitioned using RBA. The memory blocks were treated separately by partitioning them into SRAM banks with a capacity of 512K words x 8 bits. The rest was partitioned into XC3064 FPGAs. The partitioning results of some of these netlists are presented in
CONCLUSION AND PERSPECTIVES
We have described PartGen, a generator of very large FPGA netlists for building partitioning benchmarks. Our approach has been validated experimentally by comparing the behaviour of the generated netlists with that of real netlists with respect to partitioning. For all examples treated, the results have appeared to be very similar. Possible ways of improving PartGen include further refinement of the netlist types by identification of sub-types, and variation of the interconnection structure to realize bus-like interconnections between two modules. We also intend to use PartGen to create a set of large benchmark circuits, which are available at ftp://f5p-asimlip6~/ppuWm~~paren.
