In existing synthesis systems, the influence of the controller's area and delay is either ignored or not sufficiently taken into account. But the controller can have a big influence, especially if a certain data-path realization requires a huge number of states and/or control signals. This paper presents a new approach to controller estimation during high-level synthesis for FPGA-based target architectures. The estimator presented in this paper can be invoked after or during every synthesis step, i.e., allocation, scheduling and binding. By considering the controller's influence on the overall area of a design, design space exploration can be made more accurate and less error-prone. We present an approach for estimating the area of the controller based on information that is easily accessible during each step of high-level synthesis, so no explicit description of the controller, which is usually generated after binding, is necessary. This is particularly valuable in the allocation phase, where intensive design space exploration has to be done based on fast and accurate estimates.
Categories and Subject Descriptors: [Computer-Aided Engineering]: Computer-aided design (CAD)
General Terms: Design.
Keywords: Area Estimation, Controller, FPGA, High-Level Synthesis.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISSS'02, October 2-4, 2002, Kyoto, Japan. Copyright 2002 ACM 1-58113-576-9/02/0010 ... $5.00.

INTRODUCTION
High-level synthesis consists of several steps, and during most of them parameters that influence various properties of the resulting hardware need to be adjusted. The important properties are the delay of the longest combinational (critical) path, the latency, and the area the design will occupy on the target architecture. High-level synthesis is very expensive in terms of the number of calculations that have to be made. Furthermore, area and delay constraints cannot be checked until all synthesis steps have been performed. It is therefore necessary to have estimation methods that predict those properties without actually performing the synthesis steps. Fast and accurate estimates are especially needed during the allocation phase, where a huge number of possible candidates has to be examined and evaluated. Our estimator predicts the expected data-path based on the available information about the (partially) synthesized design. Instead of building a real data-path, it suffices for estimation to have a model of the RT structure that preserves the relevant data. This model is created by a heuristic function that predicts the expected schedule and binding if one or both have not yet been performed. Additionally, the area of the expected controller is estimated, which is the subject of this paper. The RT model includes all kinds of multiplexors, i.e., those needed for resource sharing and those resulting from the control structure. Information about the number and location of registers in the data-path is also included. Many estimation techniques have been proposed so far, some of which consider the controller, but nearly all of them need at least a behavioral description of the controller.
Because accurate estimates are needed even before scheduling and binding, when no such description is available, we propose an approach that predicts the area of a controller based on information available before the controller is actually generated. After a brief overview of existing estimation approaches in Section 1.1 and of the target architecture in Section 2, we present our approach. We first show why a purely analytical approach cannot solve the problem sufficiently and why we therefore have to use experimental examination (Section 3). Afterwards we describe which parameters are available or can easily be estimated during high-level synthesis, and which of them are appropriate candidates for estimating the area of a controller (Section 4). In Section 5 we examine the dependencies between these parameters and the area, and show how to estimate the area of the controller. The results we have obtained are presented in Section 6. We finish this paper with a summary and an outlook in Section 7.
Related Work
Several estimation approaches have been proposed in the past. Some focus on ASICs, some on FPGAs as target architecture, but only a few of them take the influence of the controller on the overall area of a design into account. Nedjah and de Macedo Mourelle propose in [4] an approach to predict the number of CLBs required to implement both the data-path and the controller. Unfortunately, their data-path is restricted to exactly one general-purpose functional unit. The number of CLBs required to implement the controller equals the number of flip-flops needed for the states; they assume that a maximum of 10 flip-flops will be sufficient and therefore fix the size of the controller to a mean value of 6 flip-flops. Xu and Kurdahi likewise focus on FPGAs as target architecture in [6], which is based on the previous work of Ramachandran and Kurdahi [5]. Their approach requires an already scheduled behavioral specification. After binding has been finished, a gate-level netlist is constructed, which is used to estimate the controller. Quite impressive results on area estimation were presented by Mitra and Panda [3], but their work was restricted to standard-cell target architectures. A PLA-based implementation of the controller is assumed by Katkoori and Vemuri in [1]. To our knowledge, theirs is the only approach that does not require a description of the controller. Unfortunately, the presented results are not convincing because of large estimation errors. None of the published approaches has estimated the area of a controller implemented on an FPGA target architecture without having at least a behavioral description of the controller.
TARGET ARCHITECTURE: FPGA
FPGAs (Field Programmable Gate Arrays) are digital circuits that can be programmed by the user. They consist of a regular, symmetrical array of interconnected functional blocks.
In our work we used FPGAs of the XC4000 series, manufactured by Xilinx Inc., but we are convinced that our estimation is applicable to other types of FPGAs as well (see Section 7, where some examinations with the Virtex series are described). The functional blocks in the XC4000 series are called CLBs (Configurable Logic Blocks). Figure 2 depicts the general structure of one CLB. Each CLB contains two look-up tables (LUTs) with four inputs and one LUT with three inputs. Each k-input LUT is able to realize any boolean function of k variables. Additionally, each CLB has two flip-flops, which enable registered outputs. Although each CLB has nine inputs, it is not possible to realize every boolean function of nine variables, but only some of them. However, all boolean functions with at most five inputs can be realized.
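The claim that every five-input function fits into one CLB follows from a Shannon expansion, which the following Python sketch illustrates. It models a k-input LUT as a 2^k-entry truth table; the two 4-input LUTs hold the cofactors of an arbitrary 5-input function with respect to the fifth variable, and the 3-input LUT acts as a 2:1 multiplexer. This is a simplified functional model under our own assumptions (the names `LUT` and `clb_for_5_input_function` are hypothetical); it ignores carry logic, flip-flops and routing of a real XC4000 CLB.

```python
from itertools import product
from typing import Callable


class LUT:
    """Simplified k-input look-up table: a 2**k-entry truth table."""

    def __init__(self, k: int, func: Callable[..., int]) -> None:
        self.k = k
        # product() enumerates input patterns in binary order (first argument = MSB),
        # so table[i] holds the function value for input pattern i.
        self.table = [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

    def __call__(self, *bits: int) -> int:
        if len(bits) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(bits)}")
        index = 0
        for b in bits:
            index = (index << 1) | (b & 1)
        return self.table[index]


def clb_for_5_input_function(f: Callable[..., int]) -> Callable[..., int]:
    """Map any 5-input boolean function onto two 4-input LUTs and one 3-input LUT."""
    # Shannon expansion on x5: f = x5 ? f(x1..x4, 1) : f(x1..x4, 0)
    lut_f = LUT(4, lambda a, b, c, d: f(a, b, c, d, 0))    # cofactor for x5 = 0
    lut_g = LUT(4, lambda a, b, c, d: f(a, b, c, d, 1))    # cofactor for x5 = 1
    lut_h = LUT(3, lambda fo, go, sel: go if sel else fo)  # 2:1 multiplexer

    def clb(x1: int, x2: int, x3: int, x4: int, x5: int) -> int:
        return lut_h(lut_f(x1, x2, x3, x4), lut_g(x1, x2, x3, x4), x5)

    return clb


# Usage: a 5-input majority function realized in one (modeled) CLB.
majority5 = clb_for_5_input_function(lambda *x: int(sum(x) >= 3))
print(majority5(1, 1, 0, 1, 0))  # prints 1
```

For six or more variables no such generic construction exists within one CLB, which is why the cost of a function cannot be fixed without knowing the function itself.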
ANALYTICAL vs. EXPERIMENTAL APPROACH
A controller can be fully described by the following 6-tuple:

$$(I, O, S, \delta, \lambda, S_0),$$

where $I$ is a finite set of inputs, $O$ a finite set of outputs and $S$ a finite set of states. $\delta$ describes the state transition function, while $\lambda$ describes the output function. The dedicated initial state is $S_0$. Inputs, outputs and states, as well as the initial state, are known or can easily be estimated even during the allocation phase of high-level synthesis, whereas $\delta$ and $\lambda$ are only available after binding has been finished. However, we know that

$$N_f = \lceil \mathrm{ld}(|S|) \rceil + |O|$$

functions have to be implemented overall, each depending on at most $\lceil \mathrm{ld}(|S|) \rceil + |I|$ variables, where $\mathrm{ld}$ denotes the binary logarithm. The number of CLBs needed to implement all functions can then be calculated as

$$N_{CLB} = N_f \cdot \kappa,$$

where the cost factor $\kappa$ describes the cost of one function and depends on the number of dependent variables and the type of available LUTs. If the number of dependent variables exceeds the number of inputs of a LUT, $\kappa$ cannot be determined, because not all boolean functions with more than five variables can be realized in one CLB. To calculate $\kappa$ in those cases, exact knowledge of the boolean function to be realized would be necessary to determine whether one CLB suffices. Legl et al. proposed in [2] to solve this problem using the following formula:

$$N_{LUT} = \begin{cases} 1, & n \le k \\ \ldots, & n > k \end{cases}$$

where $n$ is the number of dependent variables, $k$ the number of inputs of one LUT and $N_{LUT}$ the resulting number of LUTs needed. We know the maximum number of dependent variables, but not the exact number required for each function. Moreover, the LUTs in the XC4000 series have different numbers of inputs, so this approach does not solve our problem. Another problem is that the complexity of the interconnect grows with the complexity of the controller. Therefore some CLBs cannot be used optimally, and in some cases additional CLBs are needed only to route signals. In summary, an analytical approach is not possible if we assume that no explicit description of the controller is available, so we decided to solve the problem through experimental examination.
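The counting argument of this section can be sketched in a few lines of Python. The sketch computes $N_f$ and the worst-case number of variables per function from $|S|$, $|I|$ and $|O|$, assuming binary-coded states, and flags whether the five-input bound of an XC4000 CLB is guaranteed to hold. When it does not, $\kappa$ stays undetermined, which is the gap that motivates the experimental approach. All names are hypothetical; $\lceil \mathrm{ld}(n) \rceil$ is computed exactly with integer bit lengths.

```python
from typing import NamedTuple


class ControllerFunctionCount(NamedTuple):
    state_bits: int              # ceil(ld |S|) flip-flops for binary-coded states
    n_functions: int             # N_f = ceil(ld |S|) + |O|
    max_support: int             # upper bound on variables per function: ceil(ld |S|) + |I|
    single_clb_guaranteed: bool  # True only if every function has <= 5 variables


def ceil_ld(n: int) -> int:
    """Exact ceil(log2(n)) for n >= 1 using integer arithmetic."""
    if n < 1:
        raise ValueError("need at least one state")
    return (n - 1).bit_length()


def count_controller_functions(num_states: int, num_inputs: int, num_outputs: int,
                               max_guaranteed_inputs: int = 5) -> ControllerFunctionCount:
    bits = ceil_ld(num_states)
    n_f = bits + num_outputs
    max_support = bits + num_inputs
    return ControllerFunctionCount(bits, n_f, max_support,
                                   max_support <= max_guaranteed_inputs)


print(count_controller_functions(num_states=10, num_inputs=3, num_outputs=12))
# ControllerFunctionCount(state_bits=4, n_functions=16, max_support=7, single_clb_guaranteed=False)
```

Even this small controller already exceeds the five-variable bound, so $N_{CLB} = N_f \cdot \kappa$ cannot be evaluated without the functions themselves.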
PARAMETERS
As stated before, we assume that no explicit controller description is available, as is the case in all phases of high-level synthesis except the controller generation phase. The first question to be answered is: which parameters supply sufficient information for accurate estimates of the controller area? We have examined several different parameters and evaluated their applicability. In the following sections we describe those that, on the one hand, are easy to determine or calculate and, on the other hand, lead to accurate estimates. Our goal is a single formula that can be evaluated very quickly and provides the desired accuracy. Depending on which synthesis steps have already been executed, our data-path estimator builds a model of the RT structure, i.e., the missing results are estimated.

Number of Flip-Flops(a)
The number of flip-flops plays a dominating role for the area of a [...]

Number of Outputs(b)
After the allocation of functional units, the number of control inputs and outputs is known for those units. Additional outputs are needed for the required multiplexors, induced either by the control structure (case, loop, etc.) or by resource sharing, and for registers. The set of outputs is defined as $O = \{O_1, O_2, \ldots\}$ [...]

Number of Transitions(c)
The number of transitions $N_T$ of a controller is at least $|S| - 1$. If the schedule requires a repetitive execution, then the minimum number of transitions is $|S|$. As soon as the controller has at least one input, the number of transitions will exceed the number of states of the controller. We are only interested in the number of additional transitions, so the parameter c is defined as: [...]
$N_T$ is calculated in advance by our data-path estimator.

Number of Unused States(d)
With parameter [...]

'0'/'1'-Ratio(e)
For the parameter e, we have to distinguish between Mealy- and Moore-type controllers, because the maximum number of bits in an output signal depends on the type of the controller. For a Mealy-type controller, we define for each output $O_i$ the parameter $e_{i,mealy}$ as follows: [...]

[...] in each output can be calculated by determining the probability that a component will be selected. In all other states, where the component is not selected, the output carries the inverted value. We explain this with an example. We assume an estimated schedule and one arithmetic component that is able to execute additions and subtractions. We are interested in the probability of a '1' on the "select" input of this component. The left side of Figure 3 depicts part of the data-flow graph. The marked area includes those operations that are estimated to be executed by the arithmetic component. The middle of Figure 3 shows part of the state diagram resulting from the estimated schedule. Note that the number of states is known, but not the concrete mapping of operations to these states. This is not a problem, because the ratio between '0's and '1's is independent of this mapping: according to the model used here, the number of operations executed on this component is invariant.
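The parameters above can be derived from quantities the data-path estimator already provides. The following Python sketch is illustrative only, because the exact definitions of c, d and $e_{i,mealy}$ are not recoverable from this text. It assumes that c counts transitions beyond the minimum stated for parameter c, that d counts unused binary state codes ($2^a - |S|$), and that the '1' share of a control output is the number of states in which it is active divided by $|S|$ (a Moore-style view with one bit per state). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerParameters:
    a: int  # flip-flops for binary-coded states: ceil(ld |S|)
    b: int  # number of control outputs |O|
    c: int  # ASSUMPTION: transitions beyond the minimum (|S| - 1, or |S| if repetitive)
    d: int  # ASSUMPTION: unused binary state codes, 2**a - |S|


def estimate_parameters(num_states: int, num_outputs: int, num_transitions: int,
                        repetitive: bool) -> ControllerParameters:
    if num_states < 1:
        raise ValueError("need at least one state")
    a = (num_states - 1).bit_length()  # exact ceil(ld |S|)
    min_transitions = num_states if repetitive else num_states - 1
    if num_transitions < min_transitions:
        raise ValueError("N_T is below the minimum number of transitions")
    return ControllerParameters(a=a, b=num_outputs,
                                c=num_transitions - min_transitions,
                                d=2 ** a - num_states)


def ones_fraction(active_states: int, num_states: int) -> float:
    """ASSUMPTION: share of states in which a control output is '1'."""
    if not 0 <= active_states <= num_states:
        raise ValueError("active_states must lie in [0, num_states]")
    return active_states / num_states


print(estimate_parameters(num_states=10, num_outputs=12, num_transitions=14, repetitive=True))
# ControllerParameters(a=4, b=12, c=4, d=6)
print(ones_fraction(active_states=3, num_states=10))  # 0.3
```

As in the Figure 3 example, `ones_fraction` only needs the number of operations bound to a component, not their mapping to concrete states.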
AREA ESTIMATION
To extract significant results from the experimental examinations, as much data as possible is needed. Therefore, we have developed a controller generator that is able to build arbitrary behavioral descriptions of controllers based on the parameters a, b, c, d and e. For each given set of these parameters, a huge number of possible controllers exists. We examined the common features among the controllers generated from the same parameters. Additionally, we concentrate on the dependencies between each parameter and the number of required CLBs. This is done by setting all parameters except the one under consideration to fixed values, while the value of the parameter under consideration is varied.
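The one-factor-at-a-time scheme described above can be sketched as a small Python generator of parameter sets. The baseline values are arbitrary placeholders, and the controller generator and CLB measurement that would consume each parameter set are not shown; `one_factor_sweep` is a hypothetical name.

```python
from typing import Dict, Iterator, Sequence

Params = Dict[str, float]


def one_factor_sweep(baseline: Params, varied: str, values: Sequence[float]) -> Iterator[Params]:
    """Yield parameter sets in which only `varied` changes; all others stay at baseline."""
    if varied not in baseline:
        raise KeyError(f"unknown parameter {varied!r}")
    for v in values:
        point = dict(baseline)  # copy so the baseline itself is never mutated
        point[varied] = v
        yield point


baseline = {"a": 4, "b": 10, "c": 5, "d": 2, "e": 0.5}  # placeholder values
for point in one_factor_sweep(baseline, "b", [5, 10, 15, 20]):
    print(point)  # each set would be passed to a controller generator, then synthesized
```

Copying the baseline per point keeps successive sweeps over a, b, c, d and e independent of each other.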
In the following we show some of our examination results regarding these dependencies. Because the differences are marginal, and for the sake of clarity, each curve in the following figures stands as a representative for a multitude of generated controllers with the same parameter set. The dependency between the '0'/'1'-ratio (parameter e) and the resulting area is shown in Figure 8. In this case the dependency can be approximated by a parabola. Because the curves describing the dependency on each individual parameter have different shapes, it is not possible to derive a formula like the following directly: [...]
The $k_{ij}$ are the results of the linear regression and are not given here due to space limitations.
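A parabolic approximation such as the one used for parameter e is an ordinary least-squares fit of a second-degree polynomial. The dependency-free Python sketch below solves the 3×3 normal equations with Cramer's rule. The sample points are synthetic, generated from a known parabola to check the fit, and are not measurement data from this work; `fit_parabola` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y ~ p2*x^2 + p1*x + p0; returns (p2, p1, p0)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    s = [sum(x ** k for x in xs) for k in range(5)]  # power sums; s[0] == len(xs)
    normal = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [sum(x * x * y for x, y in zip(xs, ys)),
           sum(x * y for x, y in zip(xs, ys)),
           sum(ys)]
    det = _det3(normal)
    if abs(det) < 1e-12:
        raise ValueError("normal equations are (near-)singular; need 3 distinct x values")
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column with the right-hand side
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return coeffs[0], coeffs[1], coeffs[2]


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2 * x * x - 3 * x + 1 for x in xs]  # synthetic points on a known parabola
print(fit_parabola(xs, ys))  # approximately (2.0, -3.0, 1.0)
```

For the multi-parameter regression that yields the $k_{ij}$, a numerical library would be the more practical choice; the sketch only shows the single-parameter case.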
In the next step, the influence of the parameters d and e has to be integrated. We do this by applying two correction functions $K_d(d)$ and $K_e(e)$ to $N_{CLB}$, which are defined as:
[...]
$$K_e(e) = 3.9426\,e^3 - 6.3106\,e^2 + 3.8367\,e + 0.1720$$
The number of CLBs required to implement a controller with given parameters a to e can now be calculated as: [...]
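Evaluating $K_e$ is straightforward; the Python sketch below uses Horner's scheme with the coefficients stated above for the XC4000 series (Section 7 notes they have to be recalculated for other FPGA families). How $K_d$ and $K_e$ enter the final CLB formula is given by the equation not reproduced here, so the sketch deliberately stops at evaluating $K_e$. The sample values 0, 0.5 and 1 are for illustration only; the admissible range of e is not stated in this text.

```python
def k_e(e: float) -> float:
    """K_e(e) = 3.9426 e^3 - 6.3106 e^2 + 3.8367 e + 0.1720 (XC4000 coefficients)."""
    # Horner's scheme: ((3.9426 e - 6.3106) e + 3.8367) e + 0.1720
    return ((3.9426 * e - 6.3106) * e + 3.8367) * e + 0.1720


for e in (0.0, 0.5, 1.0):
    print(f"K_e({e}) = {k_e(e):.4f}")
# K_e(0.0) = 0.1720
# K_e(0.5) = 1.0055
# K_e(1.0) = 1.6407
```

Horner's form needs only three multiplications, which matches the goal of an estimate that can be evaluated very quickly inside a design space exploration loop.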
RESULTS
We tested the quality of our controller estimation on seven algorithmic descriptions of different designs: diffeq is a differential equation solver, and maha, ar and ellip are different kinds of filters. In addition to these typical benchmark designs, we used jpeg, the compression algorithm of JPEG coding, as well as subband and hybrid, which are both processes of the MP3 coding algorithm. To compare our results, we synthesized the designs with two different HLS systems. RT synthesis was done by a widely used commercial synthesis tool, while the mapping to the FPGA was done by XACT from Xilinx Inc.
SUMMARY AND OUTLOOK
We presented a new approach to estimating the area of a controller during high-level synthesis when FPGAs are used as target architecture. This estimation can be done even during the allocation phase, because it depends only on easily accessible data. It can be calculated very fast and is independent of the complexity of the controller. We have made additional efforts to estimate the delay of a controller, based on the same parameters as used for area estimation. So far it is possible to estimate the number of CLBs on the different paths through the controller. These estimates are quite accurate, but knowing the number of CLBs on a path is not sufficient on its own to infer its delay. To get more accurate predictions, additional information about the number of switch matrices, the type of wire (single-, double- or long-line) and the distance between the CLBs has to be taken into account. One obvious issue to be worked on in the future is the examination of different types of FPGAs. We are currently starting such examinations for the Virtex series from Xilinx Inc. The results obtained so far show that the dependencies on the parameters a to e follow the same trends, i.e., the curves have the same shapes. Only the constants have to be adjusted, i.e., the $k_{ij}$ and the correction functions $K_d(d)$ and $K_e(e)$ have to be recalculated.
Another issue for future work is removing the restriction to binary-coded states. One-hot coding is of particular interest, because it drastically reduces the complexity of the next-state logic.
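Dropping the binary-coding restriction would change parameter a directly. The following Python sketch, built around a hypothetical helper, contrasts the flip-flop count of binary coding ($\lceil \mathrm{ld}(|S|) \rceil$) with that of one-hot coding (one flip-flop per state). It only counts flip-flops and does not model the next-state logic.

```python
def state_flip_flops(num_states: int, encoding: str = "binary") -> int:
    """Flip-flops needed to encode |S| states under the given state encoding."""
    if num_states < 1:
        raise ValueError("need at least one state")
    if encoding == "binary":
        return (num_states - 1).bit_length()  # exact ceil(ld |S|)
    if encoding == "one-hot":
        return num_states  # one flip-flop per state
    raise ValueError(f"unknown encoding {encoding!r}")


for s in (4, 10, 32):
    print(s, state_flip_flops(s, "binary"), state_flip_flops(s, "one-hot"))
# 4 2 4
# 10 4 10
# 32 5 32
```

Since each XC4000 CLB provides two flip-flops, the larger register count of one-hot coding has to be weighed against the simpler next-state logic it enables.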
