I. INTRODUCTION
The International Technology Roadmap for Semiconductors for 2001 (http://www.itrs.net/) has emphasized that early performance estimation is an essential step in any System-on-Chip (SoC) design methodology. Performance parameters such as execution time, size, and power consumption in particular can be as important as meeting the functional requirements [1].
Within the refinement process of an algorithm, from a high-level description down to an implementation, different kinds of software properties, usually called metrics, can be identified (Figure 1). The accuracy of these metrics increases as more hardware details become known. Yet early design decisions have a drastic impact on the final system performance [2]. About 90% of the overall costs are determined in the first stages of a design, so early design decisions have a much larger cost span than design decisions taken at the end of the development time, when the implementation is already known. With the introduction of high-level synthesis tools [3], and the resulting exploitation of potential parallelism, re-timing, and other techniques, predicting the execution time has become more challenging.
Scheduling in real-time systems, in particular, relies on timing estimates in order to guarantee the required response time. Here, possible interference from other programs must also be taken into account, which results in a worst case response time (WCRT). For hw/sw partitioning [4], the execution time contributes to the cost function that is to be minimized. For this task, where heuristics may be used, estimates that correctly preserve the relation (greater or smaller) between two alternatives are already sufficient to optimize a system.
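How well an estimator preserves the relation between alternatives can be quantified with the fidelity measure [16]. The sketch below assumes the common pairwise formulation attributed to Gajski et al.: the percentage of all pairs of design points whose estimated ordering (greater, smaller, or equal) matches the ordering of the measured values. The function names and the cycle counts in the usage example are hypothetical.

```python
from itertools import combinations


def sign(x: float) -> int:
    """Return -1, 0 or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def fidelity(estimated: list[float], measured: list[float]) -> float:
    """Percentage of design-point pairs whose estimated ordering matches the measured one."""
    if len(estimated) != len(measured) or len(estimated) < 2:
        raise ValueError("need two equally long lists with at least two entries")
    pairs = list(combinations(range(len(estimated)), 2))
    # A pair "agrees" if estimate and measurement order the two design points the same way.
    agree = sum(
        sign(estimated[i] - estimated[j]) == sign(measured[i] - measured[j])
        for i, j in pairs
    )
    return 100.0 * agree / len(pairs)


# Made-up cycle counts for four design alternatives (not results from this paper).
estimated = [120, 80, 200, 150]
print(fidelity(estimated, [100, 90, 260, 140]))   # every ordering preserved
print(fidelity(estimated, [100, 130, 260, 140]))  # first two points swapped
```

In the first call the estimates deviate by up to 60 cycles, yet every pairwise ordering is preserved, so the fidelity is 100%; swapping the order of the first two measured values leaves five of six pairs intact (about 83%). This is why, for heuristic partitioning, a high fidelity can matter more than small absolute errors.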
Timing analysis usually focuses on predicting the worst case execution time (WCET). However, the so-called longest executable path [5] is not necessarily equal to the longest structural path. In addition to the WCET, this paper presents the analysis of the best case execution time (BCET) and of all other feasible paths of a function, in order to identify the run time interval of a process.
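A toy example makes the difference between the two kinds of paths concrete. The sketch assumes a hypothetical function with two if-statements whose conditions are complementary (`x > 0` and `x <= 0`) and assigns made-up cycle costs to the four branch bodies A to D. Treating the branches as independent yields the longest structural path A, C, although no input ever executes both A and C.

```python
from itertools import product

# Hypothetical function shape:
#   if x > 0:  A  else: B
#   if x <= 0: C  else: D
# with made-up cycle costs for each branch body.
COST = {"A": 10, "B": 2, "C": 10, "D": 3}


def structural_paths():
    """All branch combinations, treating the two conditions as independent."""
    yield from product(("A", "B"), ("C", "D"))


def executable_paths():
    """Only the combinations some input x can reach: the second test negates the first."""
    for x_positive in (True, False):
        first = "A" if x_positive else "B"
        second = "D" if x_positive else "C"
        yield (first, second)


def cost(path):
    return sum(COST[bb] for bb in path)


longest_structural = max(cost(p) for p in structural_paths())
longest_executable = max(cost(p) for p in executable_paths())
shortest_executable = min(cost(p) for p in executable_paths())
print(longest_structural, longest_executable, shortest_executable)  # 20 13 12
```

The structural shortest path (B, D, 5 cycles) is infeasible as well, so ignoring infeasibility distorts the BCET in the same way it inflates the WCET.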
The rest of the paper is organized as follows. Section II gives an overview of related work in the field of execution time estimation. The design flow for developing complex SoCs using hardware accelerators (HAs) is presented in Section III. Section IV describes the static analysis for timing estimation. In Section V, the execution time estimation is applied to several examples and compared with the synthesis results of a high-level synthesis tool. Finally, conclusions are drawn in the last section.
II. RELATED WORK
Execution time estimation can be achieved by simulation or by static analysis. Simulation-based approaches usually enrich the code with logging statements in order to obtain simulation traces. Hence, this procedure is suited to investigating standard working conditions, but it only covers the inputs that are actually simulated. Estimating the run time of a hardware implementation on an FPGA is achieved in [6] by simulating the algorithm and using a performance model of the FPGA. Performance estimation for FPGAs is also shown in [7]. In [8], reliable estimation of the execution time of an algorithm implemented in software running on a processor is presented. An analysis of the execution time achieved by synthesis of behavioral descriptions is given in [9].
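A minimal sketch of the simulation-based idea, assuming a hypothetical toy kernel `clip_and_sum` instrumented with logging statements that record the IDs of the basic blocks it passes through, and an assumed per-block latency table `BB_CYCLES` (not taken from any of the cited works). The execution time of one run is the sum of the latencies along its trace.

```python
# Assumed per-basic-block latencies in clock cycles (illustrative values only).
BB_CYCLES = {"entry": 2, "loop": 4, "then": 7, "else": 3, "exit": 1}


def clip_and_sum(samples: list[int], trace: list[str]) -> int:
    """Toy kernel; each trace.append(...) is a logging statement marking a basic block."""
    trace.append("entry")
    total = 0
    for s in samples:
        trace.append("loop")  # logged once per loop iteration
        if s > 100:
            trace.append("then")
            total += 100
        else:
            trace.append("else")
            total += s
    trace.append("exit")
    return total


def simulated_cycles(samples: list[int]) -> int:
    """Run the instrumented kernel and sum the latencies of the logged basic blocks."""
    trace: list[str] = []
    clip_and_sum(samples, trace)
    return sum(BB_CYCLES[bb] for bb in trace)


test_inputs = [[1, 2, 3], [50, 150, 20]]  # "standard working conditions"
observed = [simulated_cycles(x) for x in test_inputs]
print(min(observed), max(observed))  # 24 28
```

The two test vectors yield 24 and 28 cycles, whereas an input in which every sample exceeds 100 takes 36 cycles: the simulated range reflects only the exercised inputs, not the bounds that static analysis aims for.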
0-7803-9294-9/05/$20.00 ©2005 IEEE
Static analysis allows the bounds of the execution time to be identified and usually yields a pessimistic estimate. Nevertheless, for real-time systems such guaranteed bounds are essential. Static analysis is usually applied in the software domain. High-level estimation without compiling for a target architecture requires the development of a virtual processor, as shown in [10]. Usually, such a processor model is not available and is rather time consuming to develop. A commercial WCET estimation tool for software running on an ARM processor is available from AbsInt (http://www.absint.com/).
III. HARDWARE ACCELERATOR DESIGN FLOW
The algorithm, which is described in a high-level language (Matlab, Simulink, CoCentric System Studio), has to be mapped onto a hardware structure by performing hardware/software partitioning. A general SoC architecture consists of one or more CPUs (DSPs and/or µPs) and HAs connected by a bus system. The HAs execute the time-critical tasks of the algorithm. They are either available as previously designed IP or have to be developed from scratch. Usually, a group of functions from the algorithmic description is selected to be implemented together inside one HA. For example, in Figure 2 the functions A, B, and C from the algorithmic description are grouped together to form the hardware accelerator HA1 of the SoC. A bus interface has to be added as well to allow communication with the other units on the bus. The function E will be implemented in software running on the DSP. Whether partitioning is done manually or with tool support, it needs estimates of the final execution time of the selected functions of the algorithmic description. Further refinement of the HA can be obtained by high-level synthesis (e.g. Synopsys Behavioral Compiler, SPARK [11]). Especially when foreign IP is integrated into a design, timing characterization is an essential step for assessing its usability in a project. An exchange method for such designs is described, for example, by the SPIRIT consortium (http://www.spiritconsortium.com/). Nevertheless, the full potential of capturing design parameters like timing can only be exploited if these design properties are available to the designer and to all other tools in the design flow. For this reason, it is necessary to store these properties persistently and to define interfaces that allow seamless access to the computed parameters. An integration of these design properties into a design database and its usage in a design flow is shown in the Open Tool Integration Environment (OTIE) [12], [13].
IV. EXECUTION TIME ESTIMATION
Static analysis of a function is performed on its graph representation. A function can be decomposed into its control flow graph (CFG), which is built up of interconnected basic blocks (BBs). Each BB contains a sequence of data operations terminated by a control flow statement (if, for, while) as its last instruction. These data operations can be represented as expression trees. A full characterization of the execution time of a hardware accelerator includes not only the worst case estimate but also the best case estimate, as well as estimates for all other execution paths (Figure 3). This is accomplished by extracting all possible paths from the CFG, starting at the root and ending at the exit node of the CFG. Computing all possible paths appears tractable for functions of limited complexity, which is typically the case for CFGs derived from algorithms in an industrial context. A process run time interval $T_{\mathrm{int}}$ can then be identified by

$$T_{\mathrm{int}} = \mathrm{WCET} - \mathrm{BCET} \tag{1}$$

Not all possible paths of the CFG contribute to the number of paths, but only those paths that are feasible. A path is feasible if the boolean product of its conditions is not false.
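The path extraction, the feasibility test, and Eq. (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the CFG is assumed acyclic (loops already bounded and unrolled), each BB carries an assumed latency in clock cycles, and every branch condition is a plain boolean variable, so the boolean product of a path's conditions is false exactly when one variable is required to be both true and false. Arithmetic conditions would need a constraint solver instead. The names `CFG`, `all_paths`, `feasible`, and `timing_profile` are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class CFG:
    """Acyclic CFG: per-BB latency in cycles plus successor edges guarded by condition literals."""
    latency: dict[str, int]
    # (successor, literal); a literal is (condition name, required value), None = unconditional
    succ: dict[str, list[tuple[str, tuple[str, bool] | None]]]
    root: str
    exit: str


def all_paths(cfg: CFG) -> Iterator[tuple[list[str], list[tuple[str, bool]]]]:
    """Depth-first enumeration of every root-to-exit path with the literals along it."""
    stack = [(cfg.root, [cfg.root], [])]
    while stack:
        node, path, literals = stack.pop()
        if node == cfg.exit:
            yield path, literals
            continue
        for target, lit in cfg.succ.get(node, []):
            extra = [lit] if lit is not None else []
            stack.append((target, path + [target], literals + extra))


def feasible(literals: list[tuple[str, bool]]) -> bool:
    """The conjunction is false only if some condition must be both True and False."""
    required: dict[str, bool] = {}
    for name, value in literals:
        if required.setdefault(name, value) != value:
            return False
    return True


def timing_profile(cfg: CFG) -> tuple[list[int], int, int, int]:
    """Sorted latencies of all feasible paths, plus BCET, WCET and T_int = WCET - BCET."""
    costs = sorted(
        sum(cfg.latency[bb] for bb in path)
        for path, literals in all_paths(cfg)
        if feasible(literals)
    )
    if not costs:
        raise ValueError("CFG has no feasible root-to-exit path")
    bcet, wcet = costs[0], costs[-1]
    return costs, bcet, wcet, wcet - bcet


# Hypothetical CFG with four structural paths; b0 and b3 both branch on condition c.
cfg = CFG(
    latency={"b0": 1, "b1": 8, "b2": 2, "b3": 1, "b4": 8, "b5": 3, "b6": 1},
    succ={
        "b0": [("b1", ("c", True)), ("b2", ("c", False))],
        "b1": [("b3", None)],
        "b2": [("b3", None)],
        "b3": [("b4", ("c", False)), ("b5", ("c", True))],
        "b4": [("b6", None)],
        "b5": [("b6", None)],
    },
    root="b0",
    exit="b6",
)
costs, bcet, wcet, t_int = timing_profile(cfg)
print(costs, bcet, wcet, t_int)  # [13, 14] 13 14 1
```

Of the four structural paths, only two are feasible; dropping the infeasible ones tightens the WCET from 19 to 14 cycles and raises the BCET from 8 to 13 cycles. Because the number of paths grows exponentially with the number of sequential branches, exhaustive enumeration like this is only practical for functions of limited complexity.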
In Figure 4, a CFG with four paths is depicted: [...]

[...] that are unchanged by the iteration of one or more surrounding loops out of the loops, thus eliminating redundant computation. Common subexpression elimination and constant propagation are also applied. In Table I [...]

[...] (2)

When the cost estimation heuristic is applied within transformational design space exploration, the ability to quantify relative dependencies between design characteristics is much more important than the ability to capture absolute values. For comparing values, it is sufficient to achieve a high fidelity [16], as proposed by Gajski. Figure 6 depicts the execution time profile of the function predcasel. In Table II [...]

Early and reliable estimation of performance parameters is a necessary step for reducing time to market in an SoC design methodology. In particular, design decisions for complex SoCs such as hw/sw partitioning, as well as the scheduling of real-time systems, need reliable information on the timing of HAs. The presented static analysis approach has been applied to several functions from the embedded systems area. It has been shown that, for relative values, a high fidelity can be achieved. Removing the infeasible paths has also demonstrated that the estimates, especially for the WCET, can be tightened, which allows for better utilization of the system resources.
