This paper presents the application of geometric programming to the combined exploration of high-level and low-level architecture parameters. It builds a geometric programming framework for reconfigurable architectures and presents a full delay and area model of an FPGA. The optimization allows high-level architectural parameter selection and transistor sizing to be done concurrently. The transistor values are derived using the 45 nm predictive technology model. The CVX framework for MATLAB is used to run the geometric programming framework. The area and critical path delay are determined for a given cost function using single-stage and multi-stage approaches.
INTRODUCTION
In recent years, FPGA architectures have evolved considerably, and evaluating each enhancement requires expensive and time-consuming experiments. Every generation of commercial FPGAs contains refined or new routing, memory, logic and embedded block structures. FPGA architects use new or existing CAD tools to map benchmark circuits to the architectures [3], [5]. Recent work has suggested that analytical techniques can supplement the experimental approach by describing the FPGA architecture with a model of relatively simple equations. Using this technique, the FPGA architect can investigate a much wider range of architectures.
The advantage of the analytical approach is that the values of many architectural parameters can be optimized concurrently, unlike the experimental approach, in which typically one parameter is swept at a time. In [1], several parameters of the routing structures of an FPGA are optimized simultaneously by creating analytical equations that capture the impact of these parameters on the area of the FPGA, and a Geometric Programming (GP) framework is used to determine the values of these parameters.
In this paper, transistor sizing and architecture are optimized simultaneously, which allows both the area and the critical path delay (speed) of the FPGA to be optimized. The work can be summarized as follows:

A framework allowing concurrent optimization of both low-level (transistor sizing) and high-level (architectural) parameters.
An area-delay model of an FPGA fabric, formulated as a geometric program.
Evidence that concurrent optimization of low-level and high-level parameters leads to significantly different architectural conclusions compared to a traditional flow. In particular, as delay becomes more important than area, cluster size should be increased rather than decreased to obtain delay improvements.
In [7], a general formulation for the optimization of an electrical design is presented, based on an iterative process involving successive placement and routing of circuits onto FPGA architectures. In this work, the iterative refinement is removed: an FPGA modeling technique is built, and a geometric program performs transistor sizing concurrently with the selection of a number of architectural parameters.
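A geometric program is a constrained optimization problem of the following form:

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \dots, m \\ & g_j(x) = 1, \quad j = 1, \dots, p \end{aligned}$$

where the variables $x$ are positive, each $g_j$ is a monomial of the form $c\,x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$, each $f_i$ is a posynomial (a sum of monomials), and the coefficient $c$ must be positive. As an example, in this paper a monomial cost function $(T_{total})^{z} (A_{total})^{1-z}$ is used, where $T_{total}$ and $A_{total}$ represent the delay and area variables respectively and $z$ is a constant. Geometric programming is used extensively for circuit design problems, including wire sizing, transistor sizing and robust design. See [6] for an extensive review of geometric programming in the context of circuit design.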
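The reason a monomial cost is convenient can be illustrated with a short Python sketch. It assumes the area exponent is $1 - z$ (consistent with $z = 1$ targeting delay only) and uses arbitrary, hypothetical values for delay and area. It shows that the cost becomes an affine function of $\log T_{total}$ and $\log A_{total}$, which is the change of variables that makes a geometric program convex.

```python
import math


def monomial_cost(t_total: float, a_total: float, z: float) -> float:
    """Monomial cost T_total^z * A_total^(1 - z); the 1 - z exponent is an assumption."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must lie in [0, 1]")
    if t_total <= 0 or a_total <= 0:
        raise ValueError("GP variables must be positive")
    return (t_total ** z) * (a_total ** (1.0 - z))


def log_cost(log_t: float, log_a: float, z: float) -> float:
    """The same cost in log variables: affine in (log T, log A)."""
    return z * log_t + (1.0 - z) * log_a


# Hypothetical operating point in arbitrary units (not taken from the paper).
T, A, z = 2.0, 8.0, 0.5
direct = monomial_cost(T, A, z)
via_logs = math.exp(log_cost(math.log(T), math.log(A), z))
print(direct, via_logs)
```

Both evaluations agree up to floating-point rounding, so minimizing the monomial is equivalent to minimizing a linear function of the log variables.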
ARCHITECTURE FRAMEWORK
An island-style FPGA is assumed, in which an array of logic blocks is connected by tracks organized in vertical and horizontal channels with single-driver routing, as represented in VPR 5.0 [3]. Figure 1(a) shows the structure of a logic block. The architecture consists of configurable logic blocks (CLBs), each tightly packing $N$ $K$-input lookup tables (LUTs) and having $I$ external inputs. A $K$-level pass transistor multiplexer tree is used to implement a $K$-input LUT. The routing is described by three parameters: $F_{c,in}$, the number of tracks that connect to each logic block input; $F_{c,out}$, the number of tracks to which each logic block output can connect; and $F_s$, the number of track end points that connect to each channel driver.
Except for the LUT multiplexer, all multiplexers in the FPGA are implemented as two-stage pass transistor multiplexers. The multiplexers configure the signal routing paths around the device and are therefore connected to the SRAM configuration memory.
DELAY MODEL
It has been shown previously that geometric programming is capable of optimizing transistor sizing for delay [15]. Here, this type of delay optimization is employed to model the combination of pass transistor structures and CMOS gates present in FPGA devices. The delay of a critical signal path through a circuit implemented on the FPGA is considered. The critical path passes through CLBs, CLB feedback paths, LUTs, connection boxes and switch boxes. The formula in (1) is used, where each term is described below. Each transistor in the circuit is represented as an RC network, as shown in Figure 2. The capacitance and resistance values can be derived from the SPICE models of MOSFET devices; here they are derived using the 45 nm predictive technology model [14]. Each resistance value for a transistor in the architecture takes the form (2), and each capacitance the form (3) or (4). $R_C$, $C_D$ and $C_G$ represent the channel resistance, diffusion capacitance and gate capacitance of a transistor, respectively. $S_i$ represents the width of transistor $i$, assuming all transistors have minimum length. The nominal values $R_{nom}$ and $C_{nom,x}$ depend on the process technology, the type of transistor and, in the case of capacitance, whether it is the nominal diffusion or gate capacitance.
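As a rough illustration of how transistor width enters the RC model, the sketch below assumes the usual width scaling: channel resistance inversely proportional to width, and diffusion and gate capacitance proportional to width. This scaling is an assumption standing in for expressions (2) to (4), which are not reproduced here, and the nominal values are hypothetical placeholders, not numbers from the 45 nm predictive technology model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NominalValues:
    """Hypothetical unit-width values; NOT taken from the 45 nm PTM."""
    r_nom: float    # channel resistance of a unit-width device (ohm)
    c_nom_d: float  # diffusion capacitance per unit width (F)
    c_nom_g: float  # gate capacitance per unit width (F)


def _check_width(s: float) -> None:
    if s <= 0:
        raise ValueError("transistor width must be positive")


def channel_resistance(nom: NominalValues, s: float) -> float:
    """Assumed scaling R_C = R_nom / S (a monomial in S)."""
    _check_width(s)
    return nom.r_nom / s


def diffusion_capacitance(nom: NominalValues, s: float) -> float:
    """Assumed scaling C_D = C_nom,D * S (a monomial in S)."""
    _check_width(s)
    return nom.c_nom_d * s


def gate_capacitance(nom: NominalValues, s: float) -> float:
    """Assumed scaling C_G = C_nom,G * S (a monomial in S)."""
    _check_width(s)
    return nom.c_nom_g * s


# Placeholder numbers chosen only to exercise the functions.
nom = NominalValues(r_nom=10e3, c_nom_d=0.1e-15, c_nom_g=0.2e-15)
for width in (1.0, 2.0, 4.0):
    print(width, channel_resistance(nom, width), gate_capacitance(nom, width))
```

Under this assumed scaling every resistance and capacitance is a monomial in the width, so products of the two remain compatible with a geometric program.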
Fig. 2. RC delay model for a MOSFET
International Journal of Computer Applications (0975 -8887) Volume 82 -No 18, November 2013

Table (circuit parameters): Rent parameter of a given circuit; number of 2-LUTs in a given circuit; depth of the circuit netlist in number of 2-LUTs.

The delays through each of the paths in (1) are evaluated using the Elmore delay model [13]. The Elmore delay model represents the delay in RC tree networks and has previously been shown to model the delay in FPGA routing pass transistor networks [8]. The Elmore delay is calculated by summing the segment delays from the signal source to its sink, as shown in (5), where the delay of each segment is the sum of the resistance along the path from the source to that segment multiplied by the capacitance of that segment.

$$T_{elmore} = \sum_{i \,\in\, \text{path(source to sink)}} C_i \sum_{k \,\in\, \text{path(source to } i)} R_k \qquad (5)$$
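The Elmore sum can be sketched in a few lines of Python for the simplest case of a single unbranched RC ladder from source to sink. The function name and the example values are illustrative assumptions; a general RC tree with side branches would need the shared-path resistance for each node rather than a running sum.

```python
from typing import Sequence


def elmore_delay(resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Elmore delay of an unbranched RC ladder.

    resistances[i] is the series resistance of segment i, and capacitances[i]
    is the capacitance at the node that segment i drives.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r          # total resistance from source to this node
        delay += upstream_resistance * c  # this node's contribution to the sum
    return delay


# Illustrative two-segment ladder: R = [1, 2], C = [3, 4] (arbitrary units).
# Contribution of node 1: 1 * 3; node 2: (1 + 2) * 4.
print(elmore_delay([1.0, 2.0], [3.0, 4.0]))
```

Because each term is a product of resistances and capacitances, substituting width-dependent monomials for them yields a posynomial delay expression.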
The expression for the total delay is derived by breaking down the delay terms $T_x$ in (1) into their constituent paths. Each path begins at VDD or GND and ends at a transistor gate input, which leads to the paths shown in Figure 3(a)-(j).
The delay $T_{\text{reg to OMUX}}$ corresponds to the delay from the register producing the critical signal, through the MUX that selects whether the LUT output is registered, and through the two-level buffer, as shown in Figure 3(c).
The delay $T_{\text{LUT F/B path}}$ corresponds to the delay from the BLE output buffer, through the pass transistor based MUX on the LUT input, to its buffer, as shown in Figure 3(b).
The delay $T_{\text{LUT delay}}$ corresponds to the delay from the LUT driver, through all levels of the multiplexer implementing the LUT and the 2:1 MUX, to the LUT output driver. This delay is the sum of the paths shown in Figure 3(h) and 3(e).
The delay $T_{\text{O/P CB delay}}$ corresponds to the delay from the BLE output buffer, through the switch box multiplexer, to its first inverting buffer, as shown in Figure 3(b). In this case the Elmore delay is taken along the path to the switch box driver.
The delay $T_{\text{SB delay}}$ corresponds to the routing path of a signal between switch boxes. It is the sum of the driver delay and the delay through the two-level switch box multiplexer, as shown in Figure 3(a) and 3(i).
The delay $T_{\text{input MUX delay}}$ corresponds to the delay from the connection box output driver, through the LUT input select multiplexer, to the LUT input driver.
The delay $T_{\text{I/P CB delay}}$ corresponds to the path from the switch box to the connection box, where the routed signal is consumed. It is the sum of the driver delay, the delay through the connection box, and the delay through the two inverting drivers in the connection box. These are shown in Figure 3(a), 3(i) and 3(j) respectively.
Finally, the delay $T_{\text{LUT to reg delay}}$ corresponds to the delay through the multiplexer implementing the LUT to the register input where the critical path terminates, as shown in Figure 3(f).
Each driving gate has two paths to consider, the charge path and the discharge path, so the delay terms are expressed as inequalities. As an example, the inequality for the charge path through the nMOS transistor, which corresponds to the delay $T_{\text{O/P CB delay}}$, is given in (6), as in Figure 3
AREA MODEL
An area model of the FPGA routing fabric alone was first presented in [1] and is summarized in section 4.2. In this work, the model is extended to handle variable buffer sizing and to include both the routing and the logic architecture. The area model is based on the minimum-width transistor area model employed in [5].
The logic area consumed depends on $n_c$, the number of CLBs, which is estimated using the formula in [4]. The total area of the FPGA, $A_{total}$, is the sum of the logic area $A_l$ and the routing area $A_r$, as in (7).

$$A_{total} = A_l + A_r \qquad (7)$$
Logic Block Area
The area of the logic block is the sum of the areas dedicated to the following: the LUT input select multiplexer; the combination of LUT, 2:1 multiplexer, register and output buffer; the clock buffer; and the set/reset logic. The clock buffer sizing and the set/reset logic are assumed to be constant irrespective of the logic block architecture, with values taken from [5]. Similarly, the size of the register on the LUT output is assumed to be constant.
The LUT area is composed of pass transistor multiplexer cells, SRAM cells and internal drivers. All pass transistors in the multiplexer are assumed to have the same size ($S_{n,LM}$). Similarly, the transistors of the input buffer scheme are assumed to have the same size, and $B_{li}$ represents the sum of these buffer areas. This leads to (8) as an expression for the area consumed by a $K$-input LUT, where $B_{li}$ is the size of the buffer driving the LUT input and $S_{SR}$ is the size of an SRAM cell.
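To make the structure of such a LUT area term concrete, the following Python sketch counts the components of a $K$-input LUT built as a $K$-level binary pass transistor tree. A tree of this kind has $2^K$ SRAM cells and $2^{K+1} - 2$ pass transistors. The way these counts are combined into an area, and the lumping of all buffer area into a single `b_li` term, are assumptions made for illustration and are not a reproduction of expression (8).

```python
from typing import Tuple


def lut_component_counts(k: int) -> Tuple[int, int]:
    """Return (pass_transistors, sram_cells) for a K-level binary mux tree."""
    if k < 1:
        raise ValueError("K must be at least 1")
    sram_cells = 2 ** k
    # Levels hold 2^K, 2^(K-1), ..., 2 transistors, which sums to 2^(K+1) - 2.
    pass_transistors = 2 ** (k + 1) - 2
    return pass_transistors, sram_cells


def lut_area(k: int, s_n_lm: float, s_sr: float, b_li: float) -> float:
    """Assumed area sketch: tree transistors + SRAM cells + lumped buffer area."""
    pass_transistors, sram_cells = lut_component_counts(k)
    return pass_transistors * s_n_lm + sram_cells * s_sr + b_li


# Hypothetical sizes in minimum-width transistor areas.
print(lut_component_counts(4))
print(lut_area(4, s_n_lm=1.0, s_sr=6.0, b_li=10.0))
```

For fixed $K$ the counts are constants, so the area is a sum of positive multiples of the size variables, which is a posynomial.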
The 2:1 multiplexer consists of a one-level pass transistor multiplexer. Its area, $A_{21mux}$, is given by (9), where $S_{n,21mux}$
The output buffer combination is formed by the two inverters implementing the driver, so its area is the sum of their transistor areas. The area consumed by this combination of inverters is given by (10),
where $S_{n,LOdrv*}$ and $S_{p,LOdrv*}$ represent the sizes of the nMOS and pMOS transistors, respectively, in each CMOS inverter.
An approximate expression for the multiplexer area is given by (11) and (12), where $S_n$ is the size of a pass transistor and $E$ is the number of inputs.
Each input select multiplexer is fully connected: every output feedback path and every input from the connection box can reach any LUT input. This leads to the expression in (13), which gives the area devoted to each of these multiplexers. $S_{n,ISmux}$ is the size of the pass transistors implementing the input select multiplexer. Since there are $I+N$ inputs to the multiplexer, in (14) $E_{IS,tree}$ corresponds to the number of pass transistors in the multiplexer tree and in (15) $E_{IS,tree}$ corresponds to the number of SRAM bits.
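The dependence of multiplexer area on the number of inputs $E$ can be illustrated with a Python sketch of a common two-level pass transistor multiplexer organisation. It assumes first-level groups of about $\sqrt{E}$ inputs with one-hot SRAM select bits shared across groups. This organisation, the function names and the SRAM cell size are assumptions for illustration; they are not taken from expressions (11) and (12).

```python
import math
from typing import Tuple


def two_level_mux_counts(num_inputs: int) -> Tuple[int, int]:
    """Return (pass_transistors, sram_bits) for an assumed two-level mux."""
    if num_inputs < 1:
        raise ValueError("multiplexer needs at least one input")
    group_size = math.isqrt(num_inputs - 1) + 1  # ceil(sqrt(E))
    num_groups = -(-num_inputs // group_size)    # ceil(E / group_size)
    # One first-level transistor per input, one second-level transistor per group.
    pass_transistors = num_inputs + num_groups
    # One-hot select bits: group_size for level one, num_groups for level two.
    sram_bits = group_size + num_groups
    return pass_transistors, sram_bits


def mux_area(num_inputs: int, s_n: float, s_sr: float) -> float:
    """Assumed area sketch: pass transistors of size s_n plus SRAM cells of size s_sr."""
    pass_transistors, sram_bits = two_level_mux_counts(num_inputs)
    return pass_transistors * s_n + sram_bits * s_sr


print(two_level_mux_counts(16))
print(two_level_mux_counts(10))
```

With this organisation the transistor count grows roughly as $E + \sqrt{E}$ and the configuration memory as $2\sqrt{E}$, which is the kind of smooth dependence on $E$ that a monomial or posynomial approximation can capture.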
Routing Area
The silicon area devoted to the routing fabric is assumed to consist of all the connection box and switch box multiplexers, together with their configuration memories and output buffers. The routing area therefore depends on the size of the multiplexers used to connect signals to and from the logic blocks and I/O pins, the transistor sizing, the channel width and the size of the grid of logic cells.
The estimation of multiplexer sizes in the connection and switch boxes is based on the observation that the expression for the area of a two-level multiplexer in (11) can be approximated as given in (12). The size of these multiplexers depends on the channel width of the device.
The model developed in [2] is used to estimate the channel width. The model for architectures with wires that span one logic block is shown in (18), where the minimum channel width $W_{min}$ is described by (19), and $\beta$, $\alpha_{in}$, $\alpha_{out}$ and $p_f$ are empirical constants. In (19), $\lambda$ is the average number of inputs used on each logic block and $R$ is the average point-to-point wire length. The methods given in [9] are used to calculate the point-to-point wire length for different logic parameters.
The routing fabric contains two types of multiplexers: connection box multiplexers and switch box multiplexers. Using the approximation of (12), the multiplexer area for the connection box is given in (20), and the approximation of the switch box area is given in (21).
GEOMETRIC PROGRAMMING FORMULATION
This section shows that the model can be expressed in a form compatible with GP. The model must be expressed as posynomial terms less than or equal to one, or as monomial terms equal to one. The cost function is considered first, which takes the form of a monomial (23). Area can be traded off against delay by varying the exponent weight $z$, for example targeting only delay by setting $z = 1$, or weighting delay and area equally by setting $z = 0.5$. The exponent weight must be constant for each run of the GP. The model presented in the previous section is not in a form that is compatible with GP. The GP representation of the routing architecture model was given in [1]; here the focus is on presenting the logic area constraints in the correct form.
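The effect of the exponent weight can be seen on a toy problem with a single transistor size $S$. The sketch below assumes a hypothetical model, delay $T = 1/S + t_0$ and area $A = S$, with the area exponent taken as $1 - z$; none of these expressions or constants come from the paper. In the variable $x = \log S$ the log of the cost is convex, so a plain ternary search finds the optimum without a GP solver.

```python
import math
from typing import Callable


def log_objective(x: float, z: float, t0: float = 0.5) -> float:
    """log(T^z * A^(1-z)) for the toy model T = 1/S + t0, A = S, with S = exp(x)."""
    s = math.exp(x)
    return z * math.log(1.0 / s + t0) + (1.0 - z) * math.log(s)


def minimise_convex(f: Callable[[float], float], lo: float, hi: float,
                    iters: int = 200) -> float:
    """Ternary search; valid because f is convex on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def best_size(z: float, s_min: float = 1.0, s_max: float = 64.0) -> float:
    """Optimal size for weight z, subject to s_min <= S <= s_max (toy bounds)."""
    x_opt = minimise_convex(lambda x: log_objective(x, z),
                            math.log(s_min), math.log(s_max))
    return math.exp(x_opt)


for weight in (0.5, 0.75, 1.0):
    print(weight, round(best_size(weight), 3))
```

In this toy model an equal weighting keeps the transistor at its minimum size, a delay-leaning weight of 0.75 settles at an interior size, and a pure delay objective pushes the size to its upper bound, which mirrors the trade-off the exponent is meant to control.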
The area of each buffer in the FPGA is the sum of the nMOS and pMOS transistor sizes of each of its inverting stages, for example $B_{lo}$ in (10). Expressions (24) and (25) give the transformation of the buffer area into posynomial form.
Expressions (26)-(30) give the area constraints in standard-form GP representation. The sum of logic area and routing area in (7) maps directly to the inequality constraint in (31), which illustrates how the model maps to the constraints.
The expression in (41) ensures that no transistor size falls below the smallest feature size possible in the process technology, where $S_{TECH}$ is the constant representing the minimum feature size. This final constraint must be applied to all transistors in the GP.
The geometric program takes approximately 43 seconds to run on an Intel dual-core i5-2450M at 2.5 GHz running Windows 7. This figure applies to each K and N setting in the logic parameter sweep.
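The two kinds of constraint discussed here can be written as left-hand sides that must not exceed one. The Python sketch below is a minimal illustration under assumed names: an area sum relaxed to $A_l + A_r \le A_{total}$ and divided through by $A_{total}$, and a minimum-size bound $S_i \ge S_{TECH}$ rewritten as $S_{TECH}/S_i \le 1$. The numeric values are hypothetical.

```python
from typing import Iterable


def area_sum_lhs(a_logic: float, a_routing: float, a_total: float) -> float:
    """Posynomial left-hand side of (A_l + A_r) / A_total <= 1."""
    return a_logic / a_total + a_routing / a_total


def min_size_lhs(s_tech: float, s_i: float) -> float:
    """Monomial left-hand side of S_TECH / S_i <= 1, i.e. S_i >= S_TECH."""
    return s_tech / s_i


def is_feasible(lhs_values: Iterable[float], tol: float = 1e-9) -> bool:
    """A point is feasible when every standard-form left-hand side is at most one."""
    return all(value <= 1.0 + tol for value in lhs_values)


# Hypothetical point: areas in arbitrary units, sizes relative to S_TECH = 1.
constraints = [
    area_sum_lhs(a_logic=3.0, a_routing=5.0, a_total=8.0),
    min_size_lhs(s_tech=1.0, s_i=2.0),
]
print(constraints, is_feasible(constraints))
```

Relaxing the area equality to an inequality is harmless when the objective decreases with smaller $A_{total}$, because the optimizer then drives the constraint to equality.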
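Since each GP run fixes K and N, the runs can be organised as an outer sweep that keeps the setting with the best objective. The Python sketch below shows that loop under stated assumptions: `toy_solve` is a hypothetical stand-in for one GP run, its area and delay formulas are invented purely to exercise the loop, and the area exponent is assumed to be $1 - z$.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Result = Tuple[float, int, int, float, float]  # (objective, K, N, area, delay)


def sweep(ks: Iterable[int], ns: Iterable[int],
          solve: Callable[[int, int], Tuple[float, float]], z: float) -> Optional[Result]:
    """Run `solve` for every (K, N) pair and keep the lowest objective."""
    best: Optional[Result] = None
    for k, n in product(list(ks), list(ns)):
        area, delay = solve(k, n)
        objective = (delay ** z) * (area ** (1.0 - z))
        if best is None or objective < best[0]:
            best = (objective, k, n, area, delay)
    return best


def toy_solve(k: int, n: int) -> Tuple[float, float]:
    """Hypothetical placeholder for one GP run; numbers are illustrative only."""
    area = 2.0 ** k + 10.0 * n
    delay = 100.0 / k + 50.0 / n
    return area, delay


ks, ns = range(3, 8), range(2, 11)
print(sweep(ks, ns, toy_solve, z=0.0))  # area-only objective
print(sweep(ks, ns, toy_solve, z=1.0))  # delay-only objective
```

With the illustrative grid above there are 45 settings, so at roughly 43 seconds per run a sequential sweep would take about half an hour.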
RESULTS
To demonstrate the power of the Geometric Programming framework, it is run using the CVX framework in MATLAB [11]. Two different flows using the Geometric Programming framework are modeled to demonstrate the impact of concurrently optimizing the low-level and high-level parameters.
In the first experimental approach, the K and N logic parameters are fixed for each run of the optimization tool. Finding the optimal set of parameters therefore requires sweeping across the values of interest. Each run of the tool reports the total area, the critical path delay and the value of the objective function. The architecture whose values give the best value of the objective function is selected. Figure 4
CONCLUSION
This paper presents the use of Geometric Programming for fast, early-stage exploration of configurable architectures. The approach allows concurrent optimization of high-level and low-level architecture parameters, and shows that it is possible to gain in performance. In this experiment, the transistor values are derived using the 45 nm predictive technology model (PTM). The graphs are plotted for the area and delay of the architectures for both the single-stage and multi-stage approaches by varying the exponent z in the objective function.
In future work, SPICE can be used to extract accurate delay information, and a wire delay model can be included, to improve the accuracy of the modeling approach.
