Abstract: Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. Area and delay estimation of a processor core plays an important role in this process, since partitioning must determine which parts of a processor core should be realized by hardware units and which parts by sequences of instructions, based on the execution time of an input application program and the area of the synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that the total area of a processor core can be derived as the sum of the area of a processor kernel and the area of additional hardware units. The area of a processor kernel is obtained mainly from the minimum area of a processor kernel and the overheads for adding hardware units and registers. The area of a hardware unit is obtained mainly from its type and operation bit width. For delay estimation, we show that the critical path delay of a processor core can be derived from the delay of the hardware unit that lies on the critical path in the processor core. Experimental results demonstrate that the errors of area estimation are less than 2% and the errors of delay estimation are less than 2 ns when the estimated area and delay are compared with the logic-synthesized area and delay.
I. INTRODUCTION
A digital signal processor core is generally composed of a microprocessor core and several hardware units for digital signal processing, such as multiple memory buses, addressing units, and hardware loop units [7], [8]. However, when a particular application program runs on a general digital signal processor, some hardware units may be used frequently while others are never used. We therefore consider that the configuration of a digital signal processor core should be chosen according to the requirements of a given application program as well as the hardware cost of the processor core. Hardware/software codesign is a powerful design methodology for obtaining such a configuration. Several hardware/software codesign systems for processor core design have been reported, such as in [1]-[3], [6].
We have been developing a hardware/software cosynthesis system for digital signal processor cores [9], [10]. Given an application program written in C and a set of application data, the system synthesizes a hardware description of a processor core. It also generates an object code and a software environment (a compiler and a simulator) for the processor core. In the system, one of the most important processes is hardware/software partitioning. In hardware/software partitioning, the system determines which parts of a processor core should be realized by hardware units and which parts should be realized by sequences of instructions, based on the execution time of an input application program and the area of the synthesized processor core. The execution time of an application program can be derived from the critical path delay of a processor core and the number of clock cycles needed to run the application program. Thus we require area and delay estimation of a processor core in hardware/software partitioning.

Area estimation has been discussed in [4], [6]. These approaches first logic-synthesize each hardware unit and then obtain the estimated area by adding up the areas of the hardware units used in a processor core. However, they do not consider the area required to control the hardware units in a processor core. We consider that this control area must be taken into account in our system, since it has many types of hardware units. Delay estimation has not been discussed so far, so we must establish a delay estimation technique for our system.

Based on the above discussion, we propose in this paper area and delay estimation equations for digital signal processor cores, which are incorporated into our hardware/software cosynthesis system. For area estimation, we show that the total area of a processor core can be derived as the sum of the area of a processor kernel and the area of additional hardware units. The area of a processor kernel is obtained mainly from the minimum area of a processor kernel and the overheads for adding hardware units and registers. The area of a hardware unit is obtained mainly from its type and operation bit width. For delay estimation, we show that the critical path delay of a processor core can be derived from the delay of the hardware unit that lies on the critical path in the processor core.

This paper is organized as follows: Section II defines our processor model and its hardware parameters; Sections III and IV propose area and delay estimation equations for digital signal processor cores; Section V shows experimental results and evaluates the proposed area and delay estimation equations; Section VI gives concluding remarks.

* Presently, with Matsushita Communication Industrial Co., Ltd.
II. PROCESSOR MODEL AND AREA/DELAY ESTIMATION APPROACH
In this section, we first define an architecture model of our processor core with its hardware parameters. The processor architecture model in this section is used in our hardware/software cosynthesis system [9], [10]. Then we discuss our area and delay estimation approach. Fig. 1 shows an architecture model of our processor core. The architecture model in Fig. 1 is based on the digital signal processor in [7], [8]. A processor core is composed of one of the two processor kernels, one or two register files, and several hardware units. In the following, processor kernels, register files, and hardware units are defined.

Fig. 1. Processor core configuration. Our processor core is composed of one of the two processor kernels, register files, and hardware units.
A. Processor Kernel
We have two types of processor kernels: (i) a RISC-type kernel and (ii) a DSP-type kernel. Our hardware/software cosynthesis system selects one of them depending on the given application program. A RISC-type kernel has five pipeline stages (IF, ID, EXE, MEM, and WB) as in the microprocessor of [5]. A DSP-type kernel has three pipeline stages (IF, ID, and EXE) as in the DSP processor of [7], [8]. The number of pipeline stages and the processing in each pipeline stage are fixed and cannot be changed. A processor core is a general-purpose RISC core if a RISC-type kernel is selected, and a DSP core if a DSP-type kernel is selected.
Each processor kernel has a Harvard architecture and consists of (c-i) a bus for an instruction memory, (c-ii) a bus for an X data memory (X-bus), and (c-iii) an ALU (Arithmetic Logic Unit) and a barrel shifter. In addition to (c-i)-(c-iii), (c-iv) a bus for a Y data memory (Y-bus) can optionally be added to a kernel. The data bus widths of the instruction memory, the X data memory, and the Y data memory can be changed, but their address bus width is fixed to 16 bits. The data bus width of the X data memory must be the same as that of the Y data memory. The data bus width of the instruction memory can be determined based on the set of instructions included in a processor kernel.
Instructions in our processor kernel are grouped into basic instructions such as ADD and MUL and parallel instructions such as (ADD || ADD) and (ADD || MUL). The basic instructions correspond to the functions of our processor kernels and hardware units. A parallel instruction executes more than one basic instruction.¹

The hardware parameters for a processor kernel are a kernel type $t_{knl}$ (RISC or DSP), basic bit width $b_{knl}$ for a kernel, basic bit width $b_{knl,fu}$ for the ALU and shifter, the number $n_{bank}$ of data memory banks ($n_{bank} = 1$ or $n_{bank} = 2$), the number $n_p$ of instructions executed concurrently, data bus widths $b_{data}$ and $b_{inst}$ of the data memory and the instruction memory, and a set $I$ of instructions in a processor kernel. $t_{knl}$ determines the processor kernel type. $b_{knl}$ determines the bit width of pipeline registers. If $n_{bank} = 1$, only the X data memory is used. If $n_{bank} = 2$, both the X data memory and the Y data memory are used. $n_p$ determines the maximum number of instructions executed concurrently and is called the parallel factor.
B. Register Files
We have two types of register files: (i) a register file $RF_1$ and (ii) a register file $RF_2$. The register file $RF_1$ in Fig. 1 is used for all the instructions, including arithmetic operations, logical operations, and address operations. The register file $RF_2$ is optionally added by our hardware/software cosynthesis system depending on the given application program, and its registers are wider than those of $RF_1$. $RF_2$ is used to store intermediate results for multiplication.

The hardware parameters for register files are the number $n_{r1}$ and bit width $b_{r1}$ of registers in the register file $RF_1$ and the number $n_{r2}$ and bit width $b_{r2}$ of registers in the register file $RF_2$. If $n_{r2} = 0$, the register file $RF_2$ is not added to a processor kernel, and the processor kernel has a single register file $RF_1$.
C. Hardware Units
Our processor core can have hardware units of (1) functional units (shifters, ALUs, multipliers, and MAC (multiply and add) units), (2) addressing units, and (3) hardware loop units.² All of the hardware units (1)-(3) can be added to the DSP kernel. Only the hardware units (1), i.e., functional units, can be added to the RISC kernel.

The hardware parameter for hardware units is a set $HU$ of hardware units which are added to a processor kernel. If $hu \in HU$ is a functional unit, it has the parameter of basic bit width $b_{fu}$ for operations. For example, a 16-bit ALU has a basic bit width of 16 bits ($b_{fu} = 16$ for the 16-bit ALU). If $hu \in HU$ is an addressing unit, it has the parameters of an addressing unit type $t_{addr}$ ($t_{addr} = 2$ or $t_{addr} = 3$),³ the number $n_{dp}$ of address registers, and the number $n_{dn}$ of index registers. If $hu \in HU$ is a hardware loop unit, it has the parameter of the number $n_{loop}$ of loop registers.
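The hardware parameters of Section II can be collected in a small data structure. The following Python sketch is only an illustration under stated assumptions: the class and field names (`CoreConfig`, `FunctionalUnit`, and so on) are hypothetical and are not taken from the cosynthesis system. The checks encode only the constraints stated in the text: $n_{bank}$ is 1 or 2, $t_{addr}$ is 2 or 3, and only functional units can be added to the RISC kernel.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class FunctionalUnit:
    kind: str  # e.g. "shifter", "ALU", "multiplier", "MAC"
    b_fu: int  # basic bit width for operations


@dataclass
class AddressingUnit:
    t_addr: int  # addressing unit type (2 or 3 in the text)
    n_dp: int    # number of address registers
    n_dn: int    # number of index registers


@dataclass
class HardwareLoopUnit:
    n_loop: int  # number of loop registers


HardwareUnit = Union[FunctionalUnit, AddressingUnit, HardwareLoopUnit]


@dataclass
class CoreConfig:
    """Hypothetical container for one set of hardware parameters."""
    t_knl: str   # kernel type: "RISC" or "DSP"
    n_bank: int  # number of data memory banks: 1 or 2
    n_p: int     # parallel factor
    n_r1: int    # number of registers in RF1
    n_r2: int = 0  # number of registers in RF2 (0 means RF2 is absent)
    hu: List[HardwareUnit] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a constraint stated in the text is violated."""
        if self.t_knl not in ("RISC", "DSP"):
            raise ValueError("t_knl must be 'RISC' or 'DSP'")
        if self.n_bank not in (1, 2):
            raise ValueError("n_bank must be 1 or 2")
        for unit in self.hu:
            if isinstance(unit, AddressingUnit) and unit.t_addr not in (2, 3):
                raise ValueError("t_addr must be 2 or 3")
            # Only functional units can be added to the RISC kernel.
            if self.t_knl == "RISC" and not isinstance(unit, FunctionalUnit):
                raise ValueError("RISC kernel accepts functional units only")


dsp = CoreConfig("DSP", n_bank=2, n_p=2, n_r1=16, n_r2=4,
                 hu=[FunctionalUnit("MAC", 16), AddressingUnit(2, 4, 4),
                     HardwareLoopUnit(2)])
dsp.validate()  # passes: all three unit classes are allowed on a DSP kernel
```

A configuration such as a RISC kernel with a hardware loop unit would be rejected by `validate()`, mirroring the restriction described above.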
¹ [...] combination of basic instructions should be a parallel instruction.
D. Area and Delay Estimation Approach
Given an application program and a set of application data, our hardware/software cosynthesis system for digital signal processor cores [9], [10] tries various sets of the hardware parameters described in Section II and determines the hardware parameters of a processor core so as to optimize processor core area as well as execution time of the application program. In this optimization, the system requires area and delay estimation for a given set of hardware parameters. Thus we establish area and delay estimation equations of a processor core for a set of hardware parameters. Here we estimate logic-synthesized area and delay without actually logic-synthesizing a processor core.
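To illustrate how the two estimates feed the optimization, the following Python sketch computes execution time as critical path delay times the number of clock cycles, assuming the clock period equals the critical path delay. The selection rule (fastest candidate within an area budget), the function names, and all numbers are assumptions made for this example; the text does not specify the system's actual search strategy.

```python
def execution_time_ns(critical_path_delay_ns, clock_cycles):
    """Execution time, assuming the clock period equals the critical path delay."""
    return critical_path_delay_ns * clock_cycles


def pick_fastest_within_area(candidates, area_budget):
    """Return the candidate with the smallest execution time whose estimated
    area fits the budget, or None if no candidate fits.

    This selection rule is a hypothetical stand-in for the real partitioner.
    """
    feasible = [c for c in candidates if c["area"] <= area_budget]
    if not feasible:
        return None
    return min(feasible,
               key=lambda c: execution_time_ns(c["delay_ns"], c["cycles"]))


# Made-up candidate configurations (area in arbitrary units).
candidates = [
    {"name": "a", "area": 100, "delay_ns": 10.0, "cycles": 1000},
    {"name": "b", "area": 200, "delay_ns": 12.0, "cycles": 500},
    {"name": "c", "area": 400, "delay_ns": 9.0, "cycles": 400},
]
best = pick_fastest_within_area(candidates, area_budget=250)
print(best["name"])
```

In this toy data, adding hardware lengthens the critical path of candidate "b" but halves its cycle count, so it still runs faster than "a"; this is the trade-off that makes both estimates necessary.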
In order to establish area and delay estimation equations of a processor core, we first generate a variety of processor cores using our hardware/software cosynthesis system by varying the set of hardware parameters. Processor cores are written in VHDL. Then we logic-synthesize these processor cores. We use Synopsys Design Compiler⁴ as a logic synthesizer with the VDEC cell libraries (CMOS, 0.35 µm technology).⁵ We finally analyze the logic-synthesized area and delay of the processor cores and establish their estimation equations based on their hardware parameters.
Based on this approach, we propose area and delay estimation equations of a processor core in the subsequent sections.
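As an illustration of this approach, the sketch below fits a straight line to (parameter value, synthesized area) pairs by ordinary least squares. The text does not state which fitting method was used, so least squares is an assumption here, and the data points are made up for the example rather than taken from any table in this paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements: one hardware parameter swept over four values,
# with the logic-synthesized area recorded for each generated core.
param_values = [2, 3, 4, 5]
synth_areas = [1000.0, 1500.0, 2000.0, 2500.0]

slope, intercept = fit_line(param_values, synth_areas)


def estimate(x):
    """Estimated area for a parameter value, using the fitted coefficients."""
    return slope * x + intercept
```

Once the coefficients are known, evaluating `estimate` replaces a full logic-synthesis run for each candidate configuration.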
III. PROCESSOR CORE AREA ESTIMATION
Our processor architecture model indicates that area for a processor core can be computed by adding up area for a processor kernel, area for register files, and area for hardware units. Thus Sections A-C discuss area for a processor kernel, area for register files, and area for hardware units, respectively. Then Section D summarizes area estimation equations.
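The decomposition described above can be written as a simple sum. This Python sketch is a minimal illustration: the function name and the numeric values are hypothetical placeholders, not results from this paper.

```python
def core_area(kernel_area, register_file_area, hardware_unit_areas):
    """Total core area as kernel + register files + sum over hardware units."""
    return kernel_area + register_file_area + sum(hardware_unit_areas)


# Placeholder component areas, all in the same unit (for example, µm²).
total = core_area(kernel_area=2_000_000,
                  register_file_area=500_000,
                  hardware_unit_areas=[300_000, 150_000])
print(total)
```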
A. Area Estimation for a Processor Kernel
A processor kernel is determined by a kernel type $t_{knl}$, basic bit width $b_{knl}$ for a kernel, basic bit width $b_{knl,fu}$ for the ALU and shifter, the number $n_{bank}$ of data memories, a parallel factor $n_p$, data bus widths $b_{inst}$ and $b_{data}$ of the instruction memory and the data memory, and a set $I$ of instructions. Kernel area depends on all of them. It also depends on the register files and the set $HU$ of hardware units. For simplicity, we assume that $b_{knl} = 32$, $b_{knl,fu} = 16$, and $b_{data} = 16$. We also assume that the bit widths $b_{r1}$ and $b_{r2}$ of register files $RF_1$ and $RF_2$ are 16 bits and 32 bits, respectively. These assumptions hold for typical digital signal processing applications of our hardware/software cosynthesis system. Thus, for given $t_{knl}$ and $n_p$, we first analyze the relation between kernel area and the rest of the hardware parameters, including register file and hardware unit configuration. Then we establish area estimation equations for a processor kernel.

Since the DSP-type kernel and RISC-type kernel have the same hardware for the IF stage and ID stage, the above observations are independent of the kernel type. They are also independent of the parallel factor $n_p$. Thus we have the following observation:

Observation 1: The area of the IF stage in a processor kernel increases linearly with $b_{inst}$. The area of the ID stage in a processor kernel increases linearly with $|I|$.

Kernel area and data memory banks: If the Y-bus is added to a DSP-type kernel, the hardware for the ID stage and EXE stage is changed. If it is added to a RISC-type kernel, the hardware for the ID stage and MEM stage is changed. Kernel area is increased by the hardware for the Y-bus. From the experiments, we observed that the amount of the increased kernel area is independent of $n_p$. The hardware for the Y-bus is also independent of the other parameterized hardware. Thus we have the following observation:

Observation 2: Kernel area increases when the Y-bus is added, by an amount that depends on the kernel type.
Kernel area and register files: Let $b_{opr}$ be the bit width of an operand field of an instruction.⁶ $b_{opr}$ must be greater than or equal to $\lg(n_{r1} + n_{r2})$, where $n_{r1}$ and $n_{r2}$ are the numbers of registers in the register files $RF_1$ and $RF_2$, respectively. If the number of registers in either register file is increased, $b_{opr}$ may have to be increased, which in turn increases the area of the processor kernel.
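As a worked example of this bound, the following Python sketch computes the smallest integer operand field width that satisfies it, reading $\lg$ as the base-2 logarithm and rounding up to a whole number of bits. The function name is hypothetical, and the rounding is an interpretation of the inequality rather than something stated in the text.

```python
def min_operand_field_width(n_r1, n_r2=0):
    """Smallest integer b with 2**b >= n_r1 + n_r2."""
    total = n_r1 + n_r2
    if total < 1:
        raise ValueError("at least one register is required")
    # (total - 1).bit_length() equals ceil(log2(total)) for total >= 1,
    # computed with integers only to avoid floating-point rounding.
    return (total - 1).bit_length()


print(min_operand_field_width(4))      # 4 registers in RF1 only
print(min_operand_field_width(16, 4))  # 16 registers in RF1, 4 in RF2
```

With 16 registers in $RF_1$ and 4 in $RF_2$, 20 registers must be addressable, so the operand field needs 5 bits even though $RF_1$ alone would need only 4.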
Let us assume a processor kernel where $t_{knl} = \mathrm{DSP}$, $n_p = 1$, $n_{bank} = 1$, and $b_{opr} = 2$. Table II shows the amount by which kernel area increases, relative to this assumed processor core, as $b_{opr}$ is increased. The amount of increased kernel area grows linearly with $b_{opr}$. From further experiments, we observed that it is independent 
A.2. Area Estimation Equations for a Processor Kernel
First we define a minimum processor kernel $MP$ (see Fig. 2). For $MP$, $b_{knl} = 32$ for the kernel basic bit width and $b_{knl,fu} = 16$ for the ALU and shifter in a kernel. $t_{knl}$ can be DSP or RISC and $n_p$ can be 1, 2, or 4. $b_{opr}$ and $b_{inst}$ are defined as 2 and 0, respectively. We assume that $MP$ does not include any instructions. By logic-synthesizing a minimum processor kernel for various $t_{knl}$ and $n_p$, we have area $c^0_k(t_{knl}, n_p)$ for the minimum processor kernel as follows:
Based on Observation 1, we have the amount of increased kernel area, $c^1_k(b_{inst}, I)$, for a set of instructions as follows:
(2)
The first term shows the IF stage area and the second term shows the area increase for the number of instructions.

Table IV shows our hardware unit library. Hardware unit area $c_{hu}(hu)$ for each hardware unit $hu \in HU$ is then estimated as follows.
For area estimation of a functional unit, we directly use area in Table IV .
An addressing unit has the parameters of type $t_{addr}$, the number $n_{dp}$ of address registers, and the number $n_{dn}$ of index registers. If $n_{dp} = n_{dn} = 0$, Table IV gives the addressing unit area. Area $c_{r,addr}$ for address registers and area $c_{r,idx}$ for index registers can be estimated as:
Addressing unit area is estimated by adding up area for an addressing unit in Table IV and area for address registers and index registers.
A hardware loop unit has the parameter of the number $n_{loop}$ of loop registers. If $n_{loop} = 0$, Table IV gives the hardware loop unit area. Area $c_{r,loop}$ for loop registers can be estimated as:

$$c_{r,loop}(n_{loop}) = 32004 \times n_{loop} + 12801 \qquad (9)$$

Hardware loop unit area is estimated by adding up the area for a hardware loop unit in Table IV and the area for loop registers.
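Equation (9) can be evaluated directly. In the Python sketch below, the coefficients 32004 and 12801 come from Eq. (9); the function names and the `base_area` value are hypothetical, since the Table IV entries are not reproduced here. Returning the Table IV area unchanged when $n_{loop} = 0$ is one reading of the text, not a confirmed detail.

```python
def loop_register_area(n_loop):
    """Area for loop registers according to Eq. (9)."""
    return 32004 * n_loop + 12801


def hardware_loop_unit_area(base_area, n_loop):
    """Hardware loop unit area: Table IV base area plus loop register area.

    base_area stands in for the Table IV entry (placeholder value below).
    For n_loop == 0 the base area is returned as-is, which is an assumption.
    """
    if n_loop == 0:
        return base_area
    return base_area + loop_register_area(n_loop)


for n in (1, 2, 4):
    print(n, loop_register_area(n))

# Placeholder base area of 50000 for a unit with two loop registers.
example = hardware_loop_unit_area(50000, 2)
```

Each additional loop register adds 32004 to the estimate, so doubling $n_{loop}$ from 2 to 4 raises the register area from 76809 to 140817.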
D. Area Estimation Equation for a Processor Core
Based on the above discussions, processor core area $c$ [µm²] is estimated as:
IV. PROCESSOR CORE DELAY ESTIMATION
If a processor core has a RISC-type kernel, critical path delay $d$ can be estimated by

Register write delay: In typical digital signal processing applications, we can assume that $n_{r1} > n_{r2}$, i.e., the register file $RF_1$ has more registers than the register file $RF_2$. We therefore take the write delay of the register file $RF_1$ as the register write delay. Table V shows the write delay of $RF_1$ for each parallel factor $n_p$. For each $n_p$, the delay of $RF_1$ increases as the number of registers increases. Assume that our processor core has at most 32 registers in $RF_1$ 
V. EXPERIMENTAL RESULTS
In order to verify the established area and delay estimation equations for a processor core, we generated several processor cores using our hardware/software cosynthesis system and logic-synthesized them using Synopsys Design Compiler. The results show that the errors of area estimation are less than 2% and the errors of delay estimation are less than 2 ns when comparing estimated area and delay with logic-synthesized processor core area and delay.
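The two error measures can be computed as in the following Python sketch. The text does not give the exact error definitions, so this sketch assumes a relative error with respect to the logic-synthesized area and an absolute difference for delay; the function names and sample values are made up for the example.

```python
def area_error_percent(estimated_area, synthesized_area):
    """Relative area error in percent, taking the synthesized area as reference."""
    return abs(estimated_area - synthesized_area) / synthesized_area * 100.0


def delay_error_ns(estimated_delay_ns, synthesized_delay_ns):
    """Absolute delay error in nanoseconds."""
    return abs(estimated_delay_ns - synthesized_delay_ns)


# Made-up values for one core: estimated versus logic-synthesized results.
area_err = area_error_percent(estimated_area=1_010_000.0,
                              synthesized_area=1_000_000.0)
delay_err = delay_error_ns(estimated_delay_ns=21.5, synthesized_delay_ns=20.0)
print(area_err, delay_err)
```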
VI. CONCLUSIONS
In this paper, we proposed area and delay estimation equations for digital signal processor cores. In the future, we will establish power estimation equations for digital signal processor cores.
