Abstract - In this paper, a technology mapping algorithm is proposed for heterogeneous FPGAs. The technology mapping problem is first formulated as a flow network problem. Then, an algorithm based on the min-cost max-flow algorithm is presented to select a proper set of feasible LUTs for various objectives. One objective, the total area composed of the LUT area and the routing area, is discussed in the paper. The algorithm has been tested on the MCNC benchmark circuits. Compared with other existing LUT-based FPGA mapping algorithms, the proposed algorithm produces better results in terms of area.
I Introduction
In a traditional lookup table (LUT)-based FPGA device, the configurable logic blocks (CLBs) are composed of k-input LUTs whose number of inputs is constant. To maximize device utilization, heterogeneous FPGAs provide an array of homogeneous LUTs of different sizes or an array of physically heterogeneous LUTs. For example, the XC4000 [1] and ORCA2C [2] series FPGAs can be configured to have heterogeneous LUTs. The technology mapping problem for LUT-based FPGAs involves producing an equivalent circuit for a given circuit using only gates that can be implemented with LUTs. This paper addresses the technology mapping problem for heterogeneous FPGA designs. There are several homogeneous FPGA technology mapping algorithms for minimum layout size. However, most of these algorithms are unable to deal with heterogeneous FPGAs. A recent work [9] has shown that the area minimization mapping problem for a tree network can be solved optimally in $O(n^3)$ time. However, this algorithm is significantly limited because the optimality holds only for trees. The HeteroMap algorithm [10] was presented to reduce the number of LUTs for heterogeneous LUT-based FPGAs. The algorithm proposed in this paper can be configured for various objectives, such as the minimum number of LUTs or the total area. To minimize the total area of an FPGA, two parts are considered simultaneously: 1) the LUT area and 2) the routing area.
The technology mapping problem is first formulated as a flow network problem. An algorithm based on the min-cost max-flow algorithm is then presented to select a proper set of LUTs from the set of feasible LUTs. An enumeration algorithm to generate all feasible LUTs is also presented. We implemented this algorithm and compared the empirical results with those of other LUT-based FPGA mapping algorithms. The results demonstrate the efficiency of this algorithm.
The remainder of this paper is organized as follows. The terminology and a graph-based formulation of the problem are described in Section 2. An algorithm for solving the problem is shown in Section 3. Section 4 gives an algorithm to generate the set of feasible cones. The objective, the total area, is discussed in Section 5. Experimental results are shown in Section 6. Our concluding remarks are presented in Section 7.
II Formulation of the Mapping Problem
An FPGA technology mapping problem can be formulated as a graph-based problem. A combinational logic circuit can be represented by a directed acyclic graph (DAG). We assume that a general LUT-based heterogeneous FPGA consists of LUTs of n types. Each LUT of one type has k inputs, $k \in \{k_1, k_2, \ldots, k_n\}$. Homogeneous FPGAs can be viewed as a special kind of heterogeneous FPGA with one and only one type of LUT. The technology mapping problem can be described as follows: given a 2-bounded Boolean network, find, according to the objectives, a set of feasible cones such that the union of all feasible cones includes all vertices. A flow network can be used to solve this problem. Given a directed acyclic graph $G = (V_g \cup V_{io}, E)$, let $V_c$ be the set of feasible cones in G.
According to G, a flow network is constructed in which:
1) $V_g$ is the set of vertices representing gates,
2) a vertex in $V_c$ represents a feasible cone,
3) s and t represent the source and the sink, respectively,
4) there is an edge $e_{ij} \in E_{gc}$ directed from a vertex $i \in V_g$ to a vertex $j \in V_c$ if i is a vertex in the k-feasible cone associated with j, and the capacity $\mathrm{cap}(e_{ij}) = 1$,
5) there is an edge $e_{si} \in E_{sg}$ directed from s to every vertex $i \in V_g$, and the capacity $\mathrm{cap}(e_{si}) = 1$, and
6) there is an edge $e_{jt} \in E_{ct}$ directed from each vertex $j \in V_c$ to t, and its capacity, $\mathrm{cap}(e_{jt})$, is equal to the number of vertices in the cone $C_j = (V_j, E_j)$ associated with j.
It is assumed that a LUT in the heterogeneous FPGA has k inputs, $2 \le k \le 4$. In the flow network, an edge is said to be saturated if the flow through it is equal to its capacity. Since a $V_c$ vertex has only one out-edge, for convenience, a $V_c$ vertex is also said to be saturated if its out-edge is saturated. On the other hand, if the flow through a $V_c$ vertex is zero, the $V_c$ vertex is said to be empty.
Proof: Since there is one and only one edge directed from s to each $V_g$ vertex and its capacity is one, the upper bound of the total flow is the total number of vertices in $V_g$. If the maximum flow is equal to the total number of vertices in $V_g$, the mapping includes all of the vertices in $V_g$. If we select the set of feasible cones corresponding to the saturated $V_c$ vertices, then, by the construction rules, every gate is included in one and only one selected cone.
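The construction rules can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `build_flow_network`, the reserved node names `"s"` and `"t"`, and the cone names `c1` and `c2` are assumptions made for the example, and the network is stored simply as a dictionary from edges to capacities.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]

def build_flow_network(gates: Set[str], cones: Dict[str, Set[str]]) -> Dict[Edge, int]:
    """Return the edge capacities of the flow network for the given gates and cones.

    Assumes no gate or cone is named "s" or "t" (reserved for source and sink).
    """
    cap: Dict[Edge, int] = {}
    for g in gates:
        cap[("s", g)] = 1                # rule 5: source to every gate vertex
    for cone, members in cones.items():
        for g in members:
            cap[(g, cone)] = 1           # rule 4: gate to each cone that contains it
        cap[(cone, "t")] = len(members)  # rule 6: capacity equals the cone size
    return cap

# Hypothetical example: seven gates and two feasible cones (names are illustrative).
gates = {"u", "v", "w", "x", "y", "z", "m"}
cones = {"c1": {"u", "v", "w"}, "c2": {"x", "y", "z", "m"}}
cap = build_flow_network(gates, cones)
```

Because the sink-edge capacities sum to the number of gates in this example, a flow of value 7 saturates both cone vertices, which is the situation the proof describes.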
Consider the flow shown in Figure 1(b). The bold lines from a $V_c$ vertex to t are saturated edges; the others carry zero flow.
Since the total flow is equal to $|V_g|$, by Theorem 1, the set of cones induced by {u, v, w} and {x, y, z, m} can be selected to be an optimal mapping solution. There are several ways to generate a set of feasible cones [4]. If the set of feasible cones is not rich enough, it may not be possible to obtain a feasible solution. An algorithm to generate the set of feasible cones is presented in Section 4.
III An Algorithm for Finding the Min-cost Max-flow as Mapping Solution
The mapping solution is not unique. Given an objective, we can define a cost function that assigns a weight to each edge in the flow network, and find the min-cost max-flow as the solution. There are several optimization objectives in the technology mapping process. Two objectives, the minimum number of LUTs and the total area, are discussed in Sections 5A and 5B. In this section, it is assumed that the cost of every edge is given.
There is an algorithm [13-14] that can find the maximum flow with minimum cost. However, a solution found by this algorithm does not ensure that the flow through each sink edge is either saturated or zero. The bold lines in Figure 2 illustrate the min-cost max-flow in the network. It is seen that the flow through the vertex a is neither saturated nor zero. The flow paths found by the min-cost max-flow algorithm must therefore be changed such that every $V_c$ vertex is either saturated or empty. Changing the flow paths raises the total cost above the minimum-cost bound, so an algorithm that keeps this increase small is needed. It has been proven that this problem is NP-complete [13]. We therefore construct a max-flow solution with a greedy algorithm. The strategy of this algorithm is as follows:
1) Find the minimum-cost maximum-flow in the flow network.
2) The cones associated with the saturated $V_c$ vertices are selected to be in the mapping solution.
3) For every candidate vertex j, calculate $\Delta C(j)$. A $V_c$ vertex j is called a candidate vertex if j is non-saturated and the cone corresponding to j includes no $V_g$ vertices covered by selected cones. The total cost increases if we change the path of the flow through other non-saturated $V_c$ vertices such that j becomes saturated. The increase of the total cost needed to force j to be saturated is denoted $\Delta C(j)$.
4) According to the calculation in step 3, find the candidate vertex j whose $\Delta C(j)$ is minimum and change the path of the flow to make j saturated.
5) Select the cone corresponding to j to be in the mapping solution. If there exist candidate vertices, go to step 3.
6) If every $V_g$ vertex is covered, a mapping solution is obtained. Otherwise, no mapping solution exists.
The min-cost max-flow algorithm [13-14] first finds a maximum flow in the flow network. Then it constructs an auxiliary graph and iteratively reduces the total cost by finding directed cycles with negative costs in the auxiliary graph.
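For readers who want to experiment, a min-cost max-flow can be computed with a short Python routine. Note that the sketch below is not the cycle-cancelling scheme of [13-14]: it uses successive shortest augmenting paths found with Bellman-Ford, which also yields a maximum flow of minimum cost. The node numbering, the function name `min_cost_max_flow`, and the example costs are assumptions chosen for illustration.

```python
from typing import List, Tuple

def min_cost_max_flow(n: int, edges: List[Tuple[int, int, int, float]], s: int, t: int):
    """edges holds (u, v, capacity, cost). Returns (flow value, total cost, flow per edge)."""
    # Residual graph: each entry is [to, residual capacity, cost, index of reverse edge].
    graph: List[List[list]] = [[] for _ in range(n)]
    handles = []
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
        handles.append((u, len(graph[u]) - 1))
    total_flow, total_cost = 0, 0.0
    while True:
        dist = [float("inf")] * n
        dist[s] = 0.0
        prev = [None] * n  # (previous node, edge index) on the cheapest path
        for _ in range(n - 1):  # Bellman-Ford copes with negative residual costs
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, (u, i), True
            if not changed:
                break
        if prev[t] is None:  # no augmenting path left: the flow is maximum
            break
        push, v = float("inf"), t
        while v != s:  # bottleneck capacity along the path
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = t
        while v != s:  # augment along the path
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total_flow += push
        total_cost += push * dist[t]
    flows = [cap - graph[u][i][1] for (u, i), (_, _, cap, _) in zip(handles, edges)]
    return total_flow, total_cost, flows

# Hypothetical network: s=0, gates 1..3, cone A=4 covering {1,2,3}, cone B=5 covering {1,2}, t=6.
edges = [(0, 1, 1, 0), (0, 2, 1, 0), (0, 3, 1, 0),
         (1, 4, 1, 2), (2, 4, 1, 2), (3, 4, 1, 2),
         (1, 5, 1, 1), (2, 5, 1, 1),
         (4, 6, 3, 0), (5, 6, 2, 0)]
flow, cost, flows = min_cost_max_flow(7, edges, 0, 6)
```

In this example cone B ends up saturated, while cone A carries one unit on a sink edge of capacity three, so it is neither saturated nor empty. This is the kind of solution that the greedy strategy has to repair.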
We need an approach to calculating $\Delta C(j)$ of a candidate vertex j. Let $C_j = (V_j, E_j)$ be the cone corresponding to j and $V_f$ be a subset of $V_j$ such that, in the flow network, the flow coming from a $V_f$ vertex goes into j. Let $\bar{V}_f = V_j - V_f$ and $x \in \bar{V}_f$. Assume that, in the flow network, $e_{xj}$ is the edge directed from x to j and the flow from x to a vertex y passes along the edge $e_{xy}$. If the flow passing along $e_{xy}$ is changed to pass along $e_{xj}$, the increase of the total cost is equal to $\mathrm{cost}(e_{xj}) - \mathrm{cost}(e_{xy})$. Accordingly,
$$\Delta C(j) = \sum_{x \in \bar{V}_f} \bigl(\mathrm{cost}(e_{xj}) - \mathrm{cost}(e_{xy})\bigr)$$
where $e_{xj}$ is the edge directed from x to j and $e_{xy}$ is the edge passed by the flow from x to y.
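The selection steps of the greedy strategy and the cost increase $\Delta C(j)$ can be sketched in Python. This is only an illustration under stated assumptions: the min-cost max-flow is taken to have value $|V_g|$, so each gate sends its unit of flow to exactly one cone, and that result is passed in as the dictionary `assign`. The names `delta_cost`, `greedy_mapping`, and `assign`, and the example costs, are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

def delta_cost(j: str, cones: Dict[str, Set[str]],
               cost: Dict[Tuple[str, str], float], assign: Dict[str, str]) -> float:
    """Cost increase of rerouting into j every gate of cone j that flows elsewhere."""
    return sum(cost[(x, j)] - cost[(x, assign[x])] for x in cones[j] if assign[x] != j)

def greedy_mapping(cones: Dict[str, Set[str]], cost: Dict[Tuple[str, str], float],
                   assign: Dict[str, str]) -> Optional[Set[str]]:
    """Return the selected cones, or None if some gate stays uncovered."""
    assign = dict(assign)  # work on a copy of the flow assignment
    # Select the cones whose sink edge is saturated: all member gates flow into them.
    selected = {j for j, members in cones.items() if all(assign[x] == j for x in members)}
    covered = set().union(*(cones[j] for j in selected))
    while True:
        # Candidates: unselected cones that contain no already covered gate.
        candidates = [j for j in sorted(cones)
                      if j not in selected and not (cones[j] & covered)]
        if not candidates:
            break
        # Pick the candidate that is cheapest to saturate and reroute its gates.
        j = min(candidates, key=lambda c: delta_cost(c, cones, cost, assign))
        for x in cones[j]:
            assign[x] = j
        selected.add(j)
        covered |= cones[j]
    return selected if covered == set(assign) else None

# Hypothetical data: cone B is saturated, cone A holds one stray unit of flow from g3.
cones = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g2"}, "C": {"g3"}}
cost = {("g1", "A"): 2, ("g2", "A"): 2, ("g3", "A"): 2,
        ("g1", "B"): 1, ("g2", "B"): 1, ("g3", "C"): 3}
assign = {"g1": "B", "g2": "B", "g3": "A"}
mapping = greedy_mapping(cones, cost, assign)
```

Here cone A is not a candidate because it shares gates with the selected cone B, so the only candidate is C, whose cost increase is the difference between the two edge costs of g3.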
Consider the example in Figure 2. It is seen that the vertex c is saturated. Hence we select the cone induced by {x, y, z, m}, which is associated with c. The vertex b is a candidate vertex. On the other hand, d is not a candidate vertex because the cone associated with d includes {x, y, z}, which are covered $V_g$ vertices. To make b saturated, the flow along the three edges, <u, a>, <v, a>, and <w, a>, must be 
V The Optimization Objective
The total area is the most important objective in the technology mapping process.
A large LUT requires a large area. A mapping with the minimum number of cones may not lead to the minimum total area because the size of a LUT grows exponentially with the number of cone inputs. It is therefore better to optimize the total area directly.
Assume that $C_j$ is a cone associated with a $V_c$ vertex j and $A_j$ is the total area needed for using a logic block to implement $C_j$. Similar to the cost function defined to minimize the total number of LUTs, if the cost of each edge between a $V_g$ vertex and the $V_c$ vertex j is set to $\mathrm{cost}(e_{ij}) = A_j / n_j$, where $n_j$ is the number of vertices included in $C_j$, and the cost of the other edges is zero, the total area can be minimized by finding the min-cost max-flow in the flow network. A saturated $V_c$ vertex j then carries $n_j$ units of flow and contributes exactly $A_j$ to the total cost.
Recall that the FPGA area includes the area for the LUTs and the area for routing. Let $A_b$ be the area for a logic block, and $A_r$ be the area needed for interconnection if a cone $C_j$ is selected. Then $A_j = A_b + A_r$. An approach for calculating $A_b$ and a method for estimating $A_r$ are needed.
To calculate $A_b$, the area for a LUT-based logic block, we used the logic block model shown in the literature [3]. Let BA be the bit area required to store a static RAM bit and FA be the fixed area required to implement the D flip-flop and all of the other associated circuitry. The area for a logic block with a k-input LUT, $A_b$, is then:
$$A_b = BA \times 2^k + FA. \qquad (1)$$
To estimate the routing area, we used the model proposed in the literature [3]. We can consider the needed routing area to be the space taken by the routing tracks on two of the four sides of the logic block, as shown in Figure 3.
Let $N_p$ be the number of pins. Assume that the pitch of a routing track is approximated as the square root of the area required by a bit. The dimension of a channel is then $N_p \times \sqrt{BA}$. The area for interconnection is:
$$A_r = N_p^2 \times BA + 2\sqrt{BA \times 2^k + FA} \times N_p \times \sqrt{BA}. \qquad (2)$$
According to (1) and (2), the total area of a logic block is:
$$A_j = \{BA \times 2^k + FA\} + \{N_p^2 \times BA + 2\sqrt{BA \times 2^k + FA} \times N_p \times \sqrt{BA}\}.$$
According to the experimental results shown in [3], $N_p$ must be at least k+1 and is proportional to the total number of pins of the CLBs.
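The area model of this section can be checked numerically with a small Python sketch. The function names are assumptions made for the example. The routing term follows the form used in the paper's own 4-LUT calculation, namely $N_p^2 \times BA$ plus twice the product of $\sqrt{A_b}$ and the channel dimension $N_p \times \sqrt{BA}$. The sample values (BA = 400, FA = 5100, $N_p$ = 5 for k = 4) are the estimates quoted in the experimental results.

```python
import math

def logic_block_area(k: int, bit_area: float, fixed_area: float) -> float:
    """Equation (1): 2**k SRAM bits plus the fixed area of the block."""
    return bit_area * 2 ** k + fixed_area

def routing_area(k: int, n_pins: int, bit_area: float, fixed_area: float) -> float:
    """Routing tracks on two sides of the block; the track pitch is sqrt(bit_area)."""
    channel = n_pins * math.sqrt(bit_area)  # dimension of one routing channel
    side = math.sqrt(logic_block_area(k, bit_area, fixed_area))
    return n_pins ** 2 * bit_area + 2 * side * channel

def total_area(k: int, n_pins: int, bit_area: float, fixed_area: float) -> float:
    """Total area charged to a cone implemented by one k-input logic block."""
    return (logic_block_area(k, bit_area, fixed_area)
            + routing_area(k, n_pins, bit_area, fixed_area))

# 4-LUT with N_p = k + 1 = 5 pins; areas in square micrometres.
area_4lut = total_area(k=4, n_pins=5, bit_area=400, fixed_area=5100)
```

Rounded to the nearest integer, `area_4lut` agrees with the 4-LUT figure reported in the experimental results.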
VI Experimental Results
We implemented the proposed algorithm in C on a SUN ULTRA SPARC workstation and tested it on several circuits from the MCNC logic synthesis benchmark set. To produce a more accurate area analysis for heterogeneous FPGAs, testing was carried out using FlowMap [5], HeteroMap [10], and the proposed algorithm on XC4000 series FPGAs, which can implement circuits with 4-LUTs and 5-LUTs. Using 1.25 μm CMOS technology, BA was estimated at about 400 μm² and FA at 5100 μm². Therefore, the total area of a 4-LUT was estimated as $\{400 \times 2^4 + 5100\} + \{(5)^2 \times 400 + 2\sqrt{400 \times 2^4 + 5100} \times 5 \times \sqrt{400}\} \approx 42948\ \mu\mathrm{m}^2$. The results are shown in Table 1. They confirm the effectiveness of the proposed algorithm in terms of area.
