I. INTRODUCTION
A typical 2D-FPGA architecture is shown in Fig. 1. The functional blocks (or logic cells), marked by L, are separated by vertical and horizontal channels. Between each pair of adjacent L-cells, in both the vertical and the horizontal channels, there are W prefabricated parallel wire segments, where W is called the channel density. The wire segments in a vertical (or horizontal) channel are aligned into W vertical (or horizontal) tracks; each track within a channel is assigned an integer in {1, ..., W} as its track ID. There are C-boxes in the channel between adjacent L-cells. A switch box (S-box), located at each intersection of a vertical and a horizontal channel, contains programmable switches that connect the wire segments running from its surrounding C-boxes. The S-boxes in Fig. 1 are 4-way. A k-way S-box with W terminals on each side is denoted by (k, W)-SB.
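To make the (k, W)-SB notation concrete, the short Python sketch below models an S-box under a hypothetical representation that is not taken from the paper: a terminal is a pair (side, track) with side in 0..k-1 and track ID in 1..W, and a switch may join two terminals only if they lie on different sides. The function names `terminals` and `is_valid_switch` are illustrative only.

```python
# Hypothetical model of a (k, W)-SB, assumed for illustration only:
# a terminal is (side, track) with side in 0..k-1 and track ID in 1..W.


def terminals(k: int, W: int) -> list[tuple[int, int]]:
    """Return all k * W terminals of a (k, W)-SB."""
    return [(side, track) for side in range(k) for track in range(1, W + 1)]


def is_valid_switch(k: int, W: int, a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Assume a switch may join two terminals only if they lie on different sides."""
    valid = set(terminals(k, W))
    return a in valid and b in valid and a[0] != b[0]


# A 4-way S-box with channel density W = 3 has 4 * 3 = 12 terminals.
print(len(terminals(4, 3)))                    # 12
print(is_valid_switch(4, 3, (0, 1), (2, 3)))   # True: different sides
print(is_valid_switch(4, 3, (0, 1), (0, 2)))   # False: same side
```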
When an FPGA is used to realize a specified Boolean function, the pins used to realize the function are partitioned into groups called nets. The pins in each group are then connected together to form a physical net using the available wire segments and the switches in both C-boxes and S-boxes, while different nets remain disconnected. This process is referred to as routing. Conventionally, routing is divided into two consecutive steps, global routing and detailed routing. In this paper, global routing refers simply to specifying the connection topologies of all nets, and detailed routing decides the exact assignment of wire segments and switches used to materialize the complete routing. Since the connectivity within C-boxes is complete, the routability of the entire chip depends entirely on the structure and connectivity of the S-boxes [1, 2, 3, 4, 7, 8, 9, 10, 11, 12]. Justifying the routability of an FPGA chip is always difficult; it is usually done experimentally, and the results depend on the router and the benchmarks.
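The split between global and detailed routing can be illustrated at the level of a single S-box. The sketch below assumes a hypothetical representation (not the paper's notation): a global routing is a list of nets, each given as the set of S-box sides it joins, and a detailed routing assigns each net one track ID on every side it uses. `density` counts the largest number of nets sharing one side (the quantity bounded by W in the mappability definition), and `nets_disjoint` checks only that no two nets share a terminal; it does not check that the required switches exist.

```python
from collections import Counter

# Hypothetical representation: in a k-way S-box, a global routing is a list of
# nets, each net being the set of sides it joins. A detailed routing gives each
# net one track ID on every side it uses.


def density(global_routing: list[frozenset[int]], k: int) -> int:
    """Maximum number of nets that use any single side of the S-box."""
    for net in global_routing:
        if len(net) < 2 or not all(0 <= side < k for side in net):
            raise ValueError(f"invalid net: {sorted(net)}")
    usage = Counter(side for net in global_routing for side in net)
    return max((usage[side] for side in range(k)), default=0)


def nets_disjoint(detailed: list[dict[int, int]]) -> bool:
    """Check that no terminal (side, track) is shared by two different nets."""
    used = set()
    for net_tracks in detailed:
        for side, track in net_tracks.items():
            if (side, track) in used:
                return False
            used.add((side, track))
    return True


gr = [frozenset({0, 2}), frozenset({0, 1, 3}), frozenset({1, 2})]
print(density(gr, k=4))  # 2: sides 0, 1 and 2 are each used by two nets
routing = [{0: 1, 2: 1}, {0: 2, 1: 1, 3: 1}, {1: 2, 2: 2}]
print(nets_disjoint(routing))  # True
```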
To complete a detailed routing for the entire chip, the notion of a so-called greedy routing architecture has been studied [9, 10, 12]. The approach starts the detailed routing from a pre-specified S-box. The routings of the adjacent C-boxes are then determined and serve as the predetermined side(s) of other neighboring (and not yet mapped) S-boxes. This process is repeated (propagated) until the routing of the entire chip is done. Depending on the propagation order (e.g., spiral or snake-like [9, 10, 12]), the process can be decomposed into a sequence of h-side predetermined k-way S-box design problems for k = 3, 4 and 0 ≤ h ≤ k. An h-side predetermined (k, W)-SB B is said to be mappable with h-side track fixed if, for each k-way global routing GR with density at most W, all nets in GR can be realized in B simultaneously such that, whenever a net N in GR joins terminals on some predetermined sides, those terminals are reserved for N only. When h = 0, we simply say that B is mappable.
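How the propagation order produces h-side predetermined subproblems can be sketched under an assumed model: S-boxes sit on a rows x cols grid, and a side of an S-box counts as predetermined once the S-box across that side has already been routed (since routing it fixes the shared C-box). Chip boundaries and 3-way boundary S-boxes are ignored, and the functions `snake_order` and `predetermined_sides` are hypothetical names, not taken from [9, 10, 12].

```python
# A sketch under an assumed model: S-boxes sit on a rows x cols grid, and a
# side of an S-box counts as predetermined once the S-box across that side
# (its grid neighbor) has already been routed. Chip boundaries are ignored.


def snake_order(rows: int, cols: int) -> list[tuple[int, int]]:
    """Visit rows top to bottom, alternating left-to-right and right-to-left."""
    order = []
    for r in range(rows):
        cols_in_row = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cols_in_row)
    return order


def predetermined_sides(order: list[tuple[int, int]]) -> dict[tuple[int, int], int]:
    """For each S-box, count the neighbors routed earlier in the order (its h)."""
    routed = set()
    h = {}
    for r, c in order:
        neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        h[(r, c)] = sum(1 for nb in neighbors if nb in routed)
        routed.add((r, c))
    return h


h = predetermined_sides(snake_order(3, 3))
print(h[(0, 0)], h[(0, 1)], h[(1, 1)])  # 0 1 2
print(max(h.values()))                   # 2
```

Under this model a snake-like order never needs more than two predetermined sides: the S-box below has not been routed yet, and at most one of the left and right neighbors precedes the current S-box in its row.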
With this design scheme, we need to study the following two problems.
Problem (…) … fixed S-boxes, the FPGA chip is perfectly mappable and contains the minimum number of switches.
In this paper, we focus on Problem (1) only. To evaluate an S-box, [2] introduced the flexibility of an S-box, defined as the maximum number of switches incident to any one terminal of the S-box. Thus, it is desirable to design S-boxes that are mappable with h-side track fixed and have both the minimum number of switches and the minimum flexibility.
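Flexibility, as defined above, is simply the largest degree of a terminal when the S-box is viewed as a graph whose vertices are terminals and whose edges are switches. The sketch below assumes switches are stored as pairs of (side, track) terminals; the switch list in the example is a made-up (4, 1)-SB, not one of the S-boxes discussed in this paper.

```python
from collections import Counter

# Switches are modeled (as an assumption) as pairs of terminals (side, track).
Switch = tuple[tuple[int, int], tuple[int, int]]


def flexibility(switches: list[Switch]) -> int:
    """Maximum number of switches incident to one terminal of the S-box."""
    degree: Counter = Counter()
    for a, b in switches:
        degree[a] += 1
        degree[b] += 1
    return max(degree.values(), default=0)


# Hypothetical (4, 1)-SB with four switches; terminal (2, 1) touches three.
example = [((0, 1), (1, 1)), ((0, 1), (2, 1)), ((1, 1), (2, 1)), ((2, 1), (3, 1))]
print(len(example), flexibility(example))  # 4 3
```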
In [12], the authors designed an optimum (3, W)-SB that is mappable with 1-side track fixed and has (…) + 2W switches. In [8], the authors proposed h-side predetermined (4, W)-SBs (which are mappable with h-side track fixed) with (…)W^2 + 3W, (…)W^2 + 2W, and 5W^2 + W switches and flexibilities W + 3, 3W, and 3W for h = 1, 2, 3, respectively. A 4-side predetermined (4, W)-SB is the switch box with all possible switches; hence it has 6W^2 switches and flexibility 3W.
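The counts for the 4-side predetermined (4, W)-SB follow from a short argument: there are C(4, 2) = 6 pairs of sides, each pair admits W x W switches, and each terminal reaches all W tracks on each of the other 3 sides. The sketch below checks this by enumeration, again assuming the hypothetical (side, track) terminal model; in general a full (k, W)-SB has C(k, 2)W^2 switches and flexibility (k - 1)W.

```python
from collections import Counter
from itertools import combinations, product


def full_switch_box(k: int, W: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """All switches of a full (k, W)-SB: every terminal pair on different sides."""
    switches = []
    for s1, s2 in combinations(range(k), 2):  # each unordered pair of sides
        for t1, t2 in product(range(1, W + 1), repeat=2):  # each pair of tracks
            switches.append(((s1, t1), (s2, t2)))
    return switches


def flexibility(switches) -> int:
    """Maximum number of switches incident to a single terminal."""
    degree: Counter = Counter()
    for a, b in switches:
        degree[a] += 1
        degree[b] += 1
    return max(degree.values(), default=0)


# 6 side pairs times W * W track pairs gives 6W^2 switches, and each terminal
# reaches all W tracks on each of the other 3 sides: flexibility 3W.
for W in range(1, 6):
    sb = full_switch_box(4, W)
    assert len(sb) == 6 * W * W and flexibility(sb) == 3 * W
print(len(full_switch_box(4, 4)), flexibility(full_switch_box(4, 4)))  # 96 12
```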
The S-box family constructed in [8] can be shown to be optimum and mappable. However, in this paper we show that it is possible to improve further upon those "optimum" S-boxes if a different design style is allowed. The goal of this paper is to design S-boxes that replace the h-side predetermined k-way S-boxes proposed in [8] with (1) fewer switches, (2) smaller flexibility, and (3) easier detailed routing. Our idea is to design an S-box with two levels. In Section 2, we present the architecture of the h-side extended k-way S-box for any 0 ≤ h ≤ k. Based on this architecture, we provide workable h-side predetermined 4-way S-boxes (0 ≤ h ≤ 4) in Section 3 and compare them with the h-side predetermined 4-way S-boxes presented in [8]. Section 4 concludes the paper.
II. THE EXTENDED …
We usually use a full extension to ensure mappability. By Theorem 1, the problem of designing extended S-boxes reduces to the problem of designing mappable kernels. We note that an h-side extended (k, W)-SB can replace an h-side predetermined (k, W)-SB in FPGAs.
III. THE EXTENDED 4-WAY SWITCH BOXES
We describe two families of (4, W)-SBs which will be used as the kernels in our extended 4-way S-boxes. 
, where all the unions are disjoint unions.
In [6], we designed the following optimum or near-optimum mappable (4, W)-SB H(W) with at most 19W/3 switches.
where H_i (see Fig. 3) is an optimum mappable (4, i)-SB for i = 1, 2, 3, 4, 5 and a mappable (4, i)-SB for i = 6, 7. We now propose to use H(W) as the (4, W)-kernel in our h-side extended (4, W)-SBs. Let ES(W, h) be the h-side full extension of H(W). Fig. 4 shows the general structure of … Four-way S-boxes are widely used in industry. We have provided a family of S-boxes that is better than the h-side predetermined S-boxes proposed in [8] in terms of the number of switches. Moreover, since H(W) consists of copies of H_6 and at most one H_i for i = 2, 3, 4, 5, 7, we can design an efficient detailed routing algorithm for H(W) and hence for ES(W, h). This is a useful observation, since the mapping of global routings to detailed routings has been shown to have no efficient algorithm in general.
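The block structure of H(W) can be illustrated with a hypothetical decomposition rule. The equation defining H(W) is not reproduced in this section, so the rule below is only an assumption consistent with the description above: write W = 6q + r, use q copies of H_6 plus H_r when r is 2 to 5, fold a leftover single track into one H_7 when r = 1, and (also an assumption) take H(1) to be H_1. Each block is given its own contiguous range of track IDs on every side, so the blocks share no terminals.

```python
# Hypothetical decomposition rule, inferred from the description of H(W) as
# copies of H_6 plus at most one H_i (i in {2, 3, 4, 5, 7}); not the paper's
# exact equation. Each block H_i gets its own contiguous range of i track IDs
# on every side, so the blocks share no terminals (a disjoint union).


def kernel_blocks(W: int) -> list[int]:
    """Sizes i of the blocks H_i assumed to make up H(W)."""
    if W < 1:
        raise ValueError("W must be positive")
    if W == 1:
        return [1]  # assumption: H(1) is simply H_1
    q, r = divmod(W, 6)
    if r == 1:
        return [6] * (q - 1) + [7]  # absorb the leftover track into one H_7
    return [6] * q + ([r] if r else [])


def track_ranges(W: int) -> list[tuple[int, range]]:
    """Pair each block size with the (assumed) track IDs it occupies in 1..W."""
    ranges, start = [], 1
    for size in kernel_blocks(W):
        ranges.append((size, range(start, start + size)))
        start += size
    return ranges


print(kernel_blocks(13))  # [6, 7]
print(kernel_blocks(20))  # [6, 6, 6, 2]
print(track_ranges(8))    # [(6, range(1, 7)), (2, range(7, 9))]
```

Because the blocks are disjoint, the switch count of H(W) under this rule is the sum of the switch counts of its blocks; the per-block counts are not stated in this section, so the sketch does not compute them.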
If all nets in a global routing are 2-pin nets, the authors of [4] designed the following (4, W)-SB U(W), which contains 6W switches and is mappable for 2-pin-net global routings.
Fig. 5. ES(3, h) for h = 0, 1, 2, 3, 4.
TABLE I. COMPARISONS OF ES(W, h) AND G(W, h)
For FPGAs in which only 2-pin-net global routings are concerned, we propose the h-side extended S-box ES_U(W, h), defined as the h-side full extension of U(W).
IV. CONCLUSIONS
This is probably the first work to give a theoretical justification for designing multi-level (or multi-staged) S-boxes to improve FPGA mappability. In this paper, we have investigated the problem of designing optimum S-boxes that achieve a perfect mapping over the entire chip. Such a mapping can be achieved using the greedy routing structure, which constructs an optimum routing on the entire chip from locally optimum routings around individual S-boxes. To achieve this goal, we have proposed a novel extended switch box architecture consisting of a kernel and an extension level. Applying the reduction design technique for S-boxes, we can design all types of mappable S-boxes with a significantly reduced number of switches and reduced flexibility. In particular, we have obtained a family of 4-way S-boxes that is better in every respect than other known 1-level S-boxes. These S-boxes can be used in the conventional FPGA chip (as shown in Fig. 1) to achieve perfect mappability on the entire chip by applying greedy routing schemes.
It is quite interesting to see that, by adopting the proposed extended S-box design style, the switch density of an S-box can be significantly reduced while achieving the same routability as conventional single-level S-box designs. The results seem to point to a new avenue for designing FPGA routing structures that break the conventional boundary between S-boxes and connection boxes (e.g., by merging the extension level of the proposed extended S-box with the neighboring connection boxes). Meanwhile, the trade-off between the number of manufactured switches and the signal path delays attributed to the multi-level structure is also an important issue. This will be the focus of our future investigation.
We also note that the two-level construction of S-boxes gives designers more freedom. If perfect mappability is not required, a full extension may not be needed; in this way, we can obtain various S-boxes for various purposes. We also point out that it is important to investigate Problem (2).
