ABSTRACT: Performance modeling is widely used to efficiently and rapidly assess the ability of multiprocessor architectures to effectively execute a given algorithm. In a typical design environment, VHDL performance models of hardware components are interconnected to form structural models of the target multiprocessor architectures. Algorithm features are described in application specific tools. Other automated tools partition the software among the various processors. Performance models evaluate the system performance. Several iterations may be needed before a suitable configuration is obtained. This paper describes a set of tools that directly interface the VHDL performance models to the algorithm partitioning tools, which will significantly reduce the time and effort needed to prepare the various models. A methodology that integrates several commercial tools is provided.
Introduction
Performance modeling is often used to characterize the system level of abstraction in a digital system. A performance model is applied in the early stages of system development, where it supports early trade-off studies when the actual detailed design is not yet available. "Performance modeling can aid evaluation of design alternatives, capture design decisions and assumptions, examine system behavior at boundary conditions and help determine bottlenecks and overdesign." [1]. A performance model expressed in VHDL serves as a simulatable specification and supports performance validation. It can be used for capturing and documenting architectural level designs, and as a testbed for architectural performance analysis studies [1].
Performance modeling simulates system performance without the unnecessary details. It is a means of speeding up simulations by ignoring the lower level details unnecessary to the system performance. There are three basic statistics collected as part of the system performance evaluation: latency, utilization and throughput. These are called performance metrics.
Latency - The time it takes for an uninterpreted token to travel from the source to the destination. This includes the time delays encountered by the token due to traffic, bus contentions etc.
Utilization - Time busy / total simulation time
Throughput - Data processed / time period
Performance and trade studies will be interpreted with respect to these metrics.
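The three metrics defined above can be illustrated with a minimal Python sketch. The TokenRecord class, its field names and the sample values are hypothetical and serve only to make the definitions concrete; they are not part of the PML library.

```python
from dataclasses import dataclass


@dataclass
class TokenRecord:
    """Hypothetical record of one uninterpreted token in a simulation."""
    sent_at: float      # time the token left the source
    received_at: float  # time the token reached the destination
    size: float         # amount of data carried by the token


def latency(token: TokenRecord) -> float:
    # Source-to-destination travel time, including any delays
    # the token encountered due to traffic or bus contention.
    return token.received_at - token.sent_at


def utilization(busy_time: float, total_simulation_time: float) -> float:
    # Time busy / total simulation time.
    return busy_time / total_simulation_time


def throughput(tokens, time_period: float) -> float:
    # Data processed / time period.
    return sum(t.size for t in tokens) / time_period


# Illustrative values only.
tokens = [TokenRecord(0.0, 2.5, 64.0), TokenRecord(1.0, 4.0, 64.0)]
```

In this sketch, latency is computed per token, while utilization and throughput are computed per resource over a chosen observation window.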
The VHDL component library used in this work is the Performance Model Library (PML), a library of hardware components developed by Honeywell, Inc. and marketed by Omniview, Inc. [2].
"The Algorithm Partitioning Tool (APT) developed by Research Triangle Institute is designed to facilitate the transition from functional algorithm description and HDL or schematic hardware architecture and component description to a dynamic performance model with the algorithm mapped onto the architecture." [3] In the prototype used in the current application, the algorithm characteristics are captured by the Signal Processing Worksystem (SPW) of the Cadence Alta Group; component characteristics are captured from VHDL characteristic files; and architecture connectivity is captured either from a VHDL structural model or from a Network II.5 (CACI Products Company) generated schematic. This prototype employs Access Technology's 2020 spreadsheet.
The spreadsheet is employed in two modes which are referred to as "sizing" and "mapping". Sizing mode is employed in early trade studies when architectures have not yet been defined. It is a static analysis, and it addresses issues such as how many processors of a given type and how much memory are required to execute a primitive, a section of the algorithm, or the entire algorithm within the allotted time [3] . Mapping mode is employed for static analysis of architectures already defined and to analyze alternate mappings of an algorithm onto the architecture [3] . The spreadsheet output that is produced in this mode is used to construct a dynamic simulation model that can be used to assess additional issues such as latency and resource contention.
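The kind of question that sizing mode answers can be sketched as follows. This is only an assumed static estimate (total cycle demand divided by the cycles one processor can supply in the allotted time); the actual spreadsheet calculations of the APT are not described here, and the function name and parameters are hypothetical.

```python
import math


def processors_needed(total_cycles: float, clock_mhz: float,
                      allotted_time_s: float) -> int:
    """Assumed static sizing estimate, not the APT's actual formula."""
    # Cycles a single processor can execute within the allotted time.
    cycles_per_processor = clock_mhz * 1e6 * allotted_time_s
    # Round up: a fraction of a processor still requires a whole one.
    return math.ceil(total_cycles / cycles_per_processor)
```

For instance, a primitive requiring 500 million cycles on 40 MHz processors with one second allotted would need 13 processors under these assumptions. The estimate ignores communication and contention, which is why dynamic simulation is still needed later.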
One of the tools used in the current application is VTIP, a VHDL parser that is used to identify the various VHDL components and classify them in order to facilitate easy extraction of information from VHDL code [4] .
The present work is a part of the Rapid Prototyping of Application Specific Signal Processing (RASSP) research project at Virginia Tech. The overall design flow consists of the following steps.
1. Extract the processor characterization data from the VHDL Processor Characterizations Package using the Processor Characterization Extraction Tool (PCET).
2. Determine the required number of processors to run a given application from a spreadsheet analysis.
3. Build a VHDL structural model of the architecture.
4. Automatically create the analysis spreadsheet using the Architecture Characteristic Extraction Tool (ACET).
5. Use the spreadsheet to obtain the partitioning of the software application algorithm.
6. Given the partitioned software application specifications, convert the software to PML code using the data output by the Connectivity Extraction Tool (CONET), and map it onto the specific processors in the hardware architecture model.
7. Evaluate the performance of the model.
8. If the results are not satisfactory, change the partitioning of the software or the hardware model and repeat the steps.
This paper concentrates on interfacing the VHDL structural model and the processor characterization packages to the algorithm partitioning tool using the PCET, ACET and CONET tools.
Methodology for Developing Interfaces between VHDL Code and Partitioning Tools
An approach for developing interfaces between VHDL code and partitioning tools follows.
1. Perform a thorough study of the various components available in the libraries. The purpose of this step is to become familiar with the available components and resources and to recognize and understand the function of each parameter in each component.
2. Perform a thorough study of the target system in order to gain an understanding of the interface that needs to be developed.
3. Determine the restrictions that need to be applied to the source code in order to facilitate the development of the interface tools. These restrictions may need to be defined in advance to make the interface possible, or they may be arrived at after the development as a result of the inherent properties of the tools being developed.
4. Determine the parametric information that the interface tools must extract. Depending on what the final format of the tool output needs to be, some component characteristics are obtained directly from the DLS Library after analysis by the VHDL VTIP analyzer. Other characteristics are obtained only after some post processing is done on one or more parameters extracted from the library. These parameters need to be identified, and the post processing specified, before the tools required to extract them can be designed.
5. Modify the VHDL components as necessary. The VHDL Tool Integration Platform's (VTIP) VHDL analyzer does not accept all VHDL syntax. Some changes may need to be made to the library components to make them analyzable by VTIP.
6. Develop the extraction tools to meet design and format specifications. This is done using the DLS (Design Library System) Browser and the SPI (Software Procedural Interface) function calls.
7. Test the output of the tools for accuracy. If the results are unsatisfactory due to inappropriate parameters extracted or formatting errors, modify the extraction tool as required. Several iterations may need to be performed before satisfactory results are achieved.
The Seventeen Processor Raceway Architecture
The RACE(TM) architecture was introduced in May 1993 by Mercury Computer Systems [6]. It has been deployed in systems scaling from four to more than 700 processors. As technology advances (for example in processor design, chip packaging and software tools), it becomes evident that an architecture must be able to evolve rapidly to maintain parity with the available technology. The RACE architecture for multiprocessors provides a high-bandwidth, low-latency system for solving real-time applications [6].
In the present study, a seventeen processor Mercury Raceway architecture has been developed to illustrate and test the tools. The schematic of the architecture model is shown in Figure 2.
As can be seen from Figure 2, the model contains an input device which models the radar. This is the stimulus to the system under study. This architecture has a provision for DMA (direct memory access), where the radar writes directly to the memory without the intervention of the CPU. (with "3" written inside the block).
There are two biu-four components in the seventeen processor architecture. In Figure 2, these components are represented as diamond blocks (with "4" written inside the block). The first biu-four is employed to connect the radar, B16-cpu16, B16-mem and crossbar5 to a common bus. The second biu-four is used at the output to connect the output device, B17-cpu17, B17-mem and crossbar6 to a common bus.
Figure 2. Schematic of the Seventeen Processor Raceway Architecture
Interface Tools
A set of extraction tools was developed to interface the VHDL processor models from the PML library, and the VHDL structural model developed by the user to the APT. These tools will be used to extract important characteristic information from the VHDL models which is required by the APT in order to determine candidate partitions of the algorithm for various multiprocessor architectures. Once an algorithm partition is developed by APT, the partitioning information is used to create a performance model. This model is executed to evaluate the effectiveness of the partition on that architecture.
Three interface tools were developed. The tools are as follows.
Processor Characteristic Extraction Tool (PCET)
This tool is used to extract processor characterization data from the Honeywell/Omniview Performance Model Library (PML) (see Chapter 2). The output is in the form of a processor characterization file suitable for use by the APT to determine system parameters such as number of processors needed and memory size requirements. This file is required by the APT to conduct the sizing analysis.
Architecture Characteristic Extraction Tool (ACET)
This tool is used to extract architecture characterization data from a VHDL structural model with PML components. The output is in the form of a script file suitable for use by APT for determining candidate software partitions using the mapping analysis.
Connectivity Extraction Tool (CONET)
This tool is used to extract connectivity information from a VHDL structural model built from PML components. This information is needed to construct the final performance model.

Processor Characteristic Extraction Tool (PCET)
In the PML library, each processor is characterized by its own instruction set and clock speed. Each instruction in the instruction set has an instruction count associated with it, which refers to the number of clock cycles needed to perform one such instruction. These processors are characterized in the file proccore-c.vhdl, which is the configuration file for the different processor cores.
The source of the PCET is the VHDL processor characterization packages in the PML library. These packages are carefully studied to gain a complete understanding of the various parameters of the components. The PCET is used to generate a processor characteristic file for each processor of interest. This target processor characteristic file has a filename equal to the processor entity name, with a c20 extension (e.g. pentium.c20). This file contains all of the parameters needed by the sizing spreadsheet. The output format for a PCET file follows.
! processor-name.c20
>[*,11#/ 'processor-name#l
where the words in italics represent the parameters extracted by the PCET. Each line in the above file is a spreadsheet macro, and hence needs to have a special format.
Since the source code in this case is the PML library, there are no restrictions to be applied to it. The parameters that are required to be extracted by the PCET follow.
1. Name of the processor entity (processor-name).
2. The Clock Rate in MHz.
3. Each Instruction (instruction-namei) along with its cycle count.
Some of these parameters are not extracted directly; instead, other parameters are extracted and then post-processed to yield the required ones. The parameter processor-name is not extracted directly. A parameter processor entity-name is extracted, for example proccore-960mx. The first 9 characters of this parameter are stripped off and ".c20" is appended to the result, giving the processor-name 960mx.c20, which is also the name of the characteristic file for the processor entity proccore-960mx. The parameter Clock Rate is derived from another extracted parameter called Cycle-cost, which gives the clock period in ns. This is converted to the clock frequency in MHz to give the parameter Clock Rate. The names of the instructions such as FLADD, FLDIV etc., along with their instruction costs given by the parameter instr-cost, are extracted directly. No post processing is required for them.
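The two post-processing steps described above can be sketched in Python as follows. The function names are hypothetical, and the PCET itself was built on the VTIP DLS Browser and SPI calls rather than on code like this; the sketch only restates the string and unit conversions.

```python
def processor_file_name(entity_name: str, prefix_len: int = 9) -> str:
    # Strip the first 9 characters (the "proccore-" prefix)
    # and append the c20 extension.
    return entity_name[prefix_len:] + ".c20"


def clock_rate_mhz(cycle_cost_ns: float) -> float:
    # A clock period of T ns corresponds to a frequency of 1000 / T MHz.
    return 1000.0 / cycle_cost_ns
```

For example, processor_file_name("proccore-960mx") yields "960mx.c20", and a Cycle-cost of 40 ns corresponds to a Clock Rate of 25 MHz (the 40 ns value is illustrative only).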
An example of a processor characteristic file generated by the PCET for the i960mx processor follows. In the above characteristic file, clock represents the clock rate in MHz. Each of the instruction numbers is the number of cycles necessary to execute one of the instructions. If the cycle time is not explicitly specified, the time for multadd is generated as equal to flmlt; the time for cmult as equal to 4 * flmlt; and the time for btfy as equal to 6 * flmlt. Only basic instructions needed in DSP applications are included.
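The default rule for multadd, cmult and btfy can be sketched as below. The dictionary representation and the function name are assumptions made for illustration, and the cycle counts in the usage note are not taken from any real processor characterization.

```python
def fill_derived_costs(costs: dict) -> dict:
    """Fill in multadd, cmult and btfy from flmlt when they are missing."""
    filled = dict(costs)  # leave the caller's dictionary untouched
    flmlt = filled["flmlt"]
    # setdefault keeps any value that was explicitly specified.
    filled.setdefault("multadd", flmlt)
    filled.setdefault("cmult", 4 * flmlt)
    filled.setdefault("btfy", 6 * flmlt)
    return filled
```

With a hypothetical flmlt cost of 3 cycles and nothing else specified, the sketch produces multadd = 3, cmult = 12 and btfy = 18.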
Architecture Characteristic Extraction Tool (ACET)
In the ACET output format, the words in italics represent the parameters extracted by the ACET. The output of the ACET is a shell script that is used to call another shell script, hw-build. Thus, the first line is
This is followed by a single (no carriage returns) line consisting of the call hw-build, followed by the parameters for the processors and memories comprising the architecture.
The extraction tool needs to extract the instantiation names of the components in the structural model developed by the user. These instantiation names need to follow some naming conventions in order for the extraction tools to work on them, and also to satisfy the spreadsheet requirements. The spreadsheet is designed so that columns and rows can be addressed by name (the character string in the first row of the column or the first column of the row, called a label). The component names are used for addressing in load calculations. Therefore all names need to be unique, need to start with an alpha character or an underscore (i.e. must be a string), and cannot contain a period or a blank (periods denote "ranges" in 2020 addresses). In addition names must not be able to be interpreted as cell addresses. For example, PE1 is not valid.
Hence, an underscore is prepended as the first character to all the instantiation names by the ACET. In order to allow differentiation of fields in the output files, component names must be limited to a maximum of 10 characters including the underscore. If the names are longer than 10 characters, they are truncated by the ACET. Hence, the first 9 characters of a component name (excluding the underscore) have to be unique. The parameters that are required to be extracted by the ACET follow.
1. The instantiation name of the component (Device-Name).
2. The type of the device (Device-Type).
3. For processors, the processor type, e.g. pentium, i960mx etc. (Processor-Type).
4. For Device-Type memory, the time in ms required to read or write one word or unit of data (Read access time/Write access time).
5. The wordsize or unit in bytes (Unit).
6. The capacity of the memory in bytes (Capacity).
The extracted parameter Device-Name needs to be post-processed to meet output requirements. An underscore is prepended to it in every case, and the length of the name is limited to 10 characters, with the name being truncated if it is longer. If the first 10 characters of the resulting names are not unique, the results will be erroneous, as the same name will occur twice in the ACET output. The Device-Type can be extracted directly from the structural model; no post processing is required. The parameter Processor-Type needs to be post-processed. If the type is "386" or "960mx", the name is changed to "i386" or "i960mx" respectively. If the type is "sharc", the name is changed to "21062". The parameter throughput-info is extracted, and the Read access time and Write access time are derived from it. The parameter throughput-info is a string variable that needs to be converted into a numeric. This is done by first stripping off the first 9 characters of a string such as 'constant 100000' and then converting the remaining string to a numeric. The throughput gives the number of words read or written per second. This is converted to yield the Read and Write access times in ms, which are the times required to read and write a word or unit. The parameter wordsize or Unit does not require any post processing. The parameter Capacity may need to have its unit converted from Mbytes, Kbytes or Bit-size to bytes. This is done by multiplying the capacity by a factor corresponding to its unit.
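The ACET post-processing rules above can be summarized in a short Python sketch. All function names are hypothetical. The capacity factors are an assumption (1 Kbyte taken as 1024 bytes, 1 Mbyte as 1024 * 1024 bytes, 8 bits per byte), since the text only states that a unit-dependent factor is applied, and the unit labels used as dictionary keys are likewise illustrative.

```python
PROCESSOR_TYPE_MAP = {"386": "i386", "960mx": "i960mx", "sharc": "21062"}

# Assumed conversion factors to bytes; the source does not list them.
CAPACITY_FACTORS = {"byte": 1.0, "Kbytes": 1024.0,
                    "Mbytes": 1024.0 * 1024.0, "Bit-size": 0.125}


def device_name(instantiation_name: str, max_len: int = 10) -> str:
    # Prepend an underscore, then truncate to 10 characters in total.
    return ("_" + instantiation_name)[:max_len]


def find_duplicates(names) -> list:
    # Names that collide after truncation would appear twice in the output.
    seen, duplicates = set(), set()
    for name in names:
        (duplicates if name in seen else seen).add(name)
    return sorted(duplicates)


def processor_type(extracted_type: str) -> str:
    # Rename the types the spreadsheet expects differently; keep the rest.
    return PROCESSOR_TYPE_MAP.get(extracted_type, extracted_type)


def access_time_ms(throughput_info: str) -> float:
    # Strip the 9-character "constant " prefix, convert to a number of
    # words per second, then invert to get milliseconds per word.
    words_per_second = float(throughput_info[9:])
    return 1000.0 / words_per_second


def capacity_bytes(value: float, unit: str) -> float:
    return value * CAPACITY_FACTORS[unit]
```

Under these assumptions, 'constant 100000' gives an access time of 0.01 ms, and a hypothetical capacity of 45 Kbytes converts to 46080 bytes. Running find_duplicates over the truncated names is one way to detect the uniqueness problem before the spreadsheet is built.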
An example of a spreadsheet building script file generated by the ACET for the seventeen processor Raceway architecture follows. In the above file, there are seventeen processors of type "sharc". These are converted to "21062" in the output file. There are seventeen memories in all, with one memory attached to each processor. These memories have a Read and Write access time of 0.025 ms, with a Wordsize of 4 bytes and a Capacity of 46080 bytes. The indevice is a Radar, and an outdevice is also present in the architecture. The indevice and outdevice are formatted as type 'process' in the output file.
Connectivity Extraction Tool (CONET)
The CONET tool extracts connectivity information and parametric values for transfer devices. The name of each of the transfer devices is extracted, along with its parametric values and a list of devices that are connected to its ports. This information is required to complete the mapping of partitioned software onto the candidate architecture.
The VHDL source code, and target format of the CONET need to be studied in order to understand the working of the parameters, and to decide the parameters required to be extracted. The source code for CONET is the VHDL structural model, which in turn derives its primitives from the PML library.
The output format for CONET is shown below.
Transfer Device Name
Parameter1 Parameter2
Value1 Value2
Signal name1 Connected component1
Signal name2 Connected component2
The restrictions or naming conventions that need to be applied to the source code and the parameters that need to be extracted by CONET are determined. Since the source code in this case is the VHDL structural model, the following restrictions need to be applied to it. The instantiation names of all the transfer devices, for which connectivity information is required, need to have the first three characters to be either "cba" or "biu".
The parameters that are required to be extracted by the CONET follow.
1. The name of the transfer device starting with "cba" or "biu" for which connectivity information is desired (Transfer Device Name).
This information is obtained by tracing each signal mapped to the port of the transfer device, to the port of another device. This signal connects the two devices. No post processing is required for the parameters extracted by the CONET.
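The signal tracing described above can be sketched in Python. The port_maps dictionary is a hypothetical stand-in for the port map information that CONET obtains from the analyzed VHDL structural model; the instance, port and signal names are illustrative, and the function names are not part of any tool.

```python
def is_transfer_device(instance_name: str) -> bool:
    # Naming convention: transfer devices start with "cba" or "biu".
    return instance_name[:3] in ("cba", "biu")


def extract_connectivity(port_maps: dict) -> dict:
    """Map each transfer device to a list of (signal, connected component)."""
    # First pass: record which instances are attached to each signal.
    users = {}
    for instance, ports in port_maps.items():
        for signal in ports.values():
            users.setdefault(signal, []).append(instance)

    # Second pass: for every transfer device, follow each of its signals
    # to the other instances whose ports are mapped to the same signal.
    connectivity = {}
    for instance, ports in port_maps.items():
        if not is_transfer_device(instance):
            continue
        links = []
        for signal in ports.values():
            for other in users[signal]:
                if other != instance:
                    links.append((signal, other))
        connectivity[instance] = links
    return connectivity


# Hypothetical fragment of a structural model: instance -> {port: signal}.
port_maps = {
    "biu_star1": {"p1": "sig_cpu1", "p2": "sig_mem1", "p3": "intport1"},
    "b1_cpu1": {"io": "sig_cpu1"},
    "b1_mem": {"io": "sig_mem1"},
    "cbar1": {"p1": "intport1"},
}
```

In this fragment, biu_star1 is reported as connected to b1_cpu1, b1_mem and cbar1, and the shared signal intport1 appears in the entries for both biu_star1 and cbar1, which is how a connection between two transfer devices can be recognized.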
An example of a part of an output file generated by the CONET for the seventeen processor Raceway architecture follows. Each biu-stari connects Bi-cpui, Bi-mem and a crossbar. Hence biu-star1 connects b1-cpu1, b1-mem and crossbar1. This can be seen from Figure 2. In the example shown above, the connectivity information provided for the component biu-star1 is consistent with Figure 2. Similarly, according to the connectivity information for the component cbar1 extracted in the example shown above, crossbar1 is connected to biu-star1, biu-star2, biu-star3, biu-star4, cbar5 and cbar6. This can also be verified from Figure 2. It should be noted that the signals corresponding to the components cbar1 and biu-star1 are both intport1, thereby implying that the two components are connected by that signal.
Validation
The tools developed using the suggested methodology need to be tested and verified for accuracy. There are two kinds of tests involved.
TEST 1: The parameters that are extracted from the model or the library are changed, and then the model is reanalyzed. The tool is run to produce the new output, and the output is checked to verify that the parameter has indeed changed in the output. If this is not the case, it might be due to two reasons.
1. The extraction of the parameter using the VTIP DLS browser is inaccurate. If so, the parameter needs to be re-extracted correctly.
2. The wrong parameter is being extracted. In this case, the correct parameter to be extracted is decided upon, and the extraction completed.
TEST 2: The output files from the extraction tools are used for spreadsheet generation and checked for satisfactory performance. Here too, failure could be due to two reasons.
1. Formatting errors. Since the output files are script files, every character needs to be formatted exactly right. Sometimes the output file has unwanted characters such as carriage returns, commas, special characters etc. These can cause a failure in the spreadsheet generation. If this is the case, the output needs to be reformatted to suit requirements.
2. The wrong parameters are being extracted. In this case, the correct parameters need to be carefully decided upon, and then extracted again.
In all of these cases, the iteration of testing and modifying the code to suit the specifications is repeated until satisfactory results are produced.
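A check for the formatting errors mentioned under TEST 2 could look like the following Python sketch. It is an assumption about how such a check might be automated, not part of the tools described here. By default it flags carriage returns and other non-printable characters; which additional characters count as unwanted depends on the output file in question, so they are passed in explicitly.

```python
def find_unwanted(text: str, forbidden=("\r",)) -> list:
    """Return (line number, column, character) for each suspect character."""
    problems = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            # Flag explicitly forbidden characters and any non-printable
            # character other than a tab.
            if ch in forbidden or (not ch.isprintable() and ch != "\t"):
                problems.append((line_no, col, ch))
    return problems
```

An empty result means no suspect characters were found; it does not by itself guarantee that the spreadsheet generation will succeed.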
Results
This paper has presented a methodology to interface VHDL performance models to algorithm partitioning tools which will help significantly reduce the time required to arrive at an optimal multiprocessor architectural configuration for a specific algorithm. Also, an efficient means of testing the interface tools has been suggested. This method helps to make validation faster and less difficult. Moreover, this paper has resulted in the development of a high level library of hardware models that have already been analyzed by VTIP. These models can be reused as primitives for the development of other new models with little or no modification.
