Abstract-Realistic traffic patterns for a multi-processor MPEG-4 architecture are used to evaluate the performance of network-on-chip (NoC) implementations. In particular, we study the characteristics of a design that is based on CDMA switching techniques and a star-network topology. The results are compared to those for a more conventional mesh-topology NoC. We study the performance, area overhead, and bandwidth.
I. INTRODUCTION
Multimedia applications are widespread and will become even more important in the future. Video telephony, digital television, video games, virtual reality simulators, etc. are growth areas, and the MPEG-4 standard has emerged as a key ingredient in many of these systems [1], [2], [3]. Therefore, efficient hardware platforms to perform the set of algorithms within the standard are of great interest. Since these computations are varied, it is necessary to include a range of hardware resources in the system, such as a DSP processor, RISC CPU, graphics engine, etc. [10]. As a result, there is a need for high-throughput communication links between these blocks, and these links can become a performance bottleneck. A bus-based interconnect scheme is a shared medium which does not scale well to large systems requiring very high aggregate bandwidth.

Networks-on-chip have been proposed as a way to overcome this limitation and provide a scalable interconnect environment [4]. Several types of network switches and topologies have been proposed, but most of the performance analysis is done using random traffic models, where the computational blocks are simply modeled as random number generators without respect to any particular application. These types of analyses are of limited utility, since they do not address the traffic requirements that one would find in an actual application. Some recent papers have used more realistic traffic models.

In Section IV, we give the specific mapping that results for the MPEG-4 application onto our NoC, as well as onto a baseline mesh-topology NoC. Section V presents our results for performance and area overhead, and our conclusions are given in Section VI.
II. MAPPING METHODOLOGY
Our overall procedure for characterizing the performance of the NoC for a particular application such as MPEG-4 is illustrated in Figure 1.
We start with the given communication characteristic parameters, such as bandwidth requirements, payload size, buffer size, and operating clock period. The traffic generator, which is actually a resource (IP) model written in a hardware description language, generates traffic with a given probability of a packet being sent to a given destination resource. The generated input traffic data is used to simulate the application on the NoC platform. Using the specified communication characteristics, a mapper groups the resources which communicate frequently onto the same switch in order to reduce latency. The transmitted and received packet traffic is traced in a log file. In a post-processing phase, we use the log file to analyze the performance. In particular, the latency of packet transmission and the FIFO buffer-full signal are monitored. These steps can be iterated until all the requirements are met. After finding the best-case structure and mapping for the NoC platform, we can then synthesize it and thereby obtain the estimated frequency and area overhead of the system.

(10), RISC CPU (11), scaling unit (12) and upsampling unit (13). The communication requirements between these blocks are specified in the data structure of Figure 2.
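The grouping step performed by the mapper can be illustrated with a small sketch. This is not the mapper used in this work; it is a simple greedy heuristic, assumed here for illustration only, that visits resource pairs in order of decreasing bandwidth and places each resource on its partner's switch when that switch still has a free port. The function name `group_resources`, the resource names, and the bandwidth figures in the usage note are hypothetical.

```python
def group_resources(traffic, num_switches, capacity):
    """Greedily group heavily communicating resources onto the same switch.

    traffic: dict mapping (resource_a, resource_b) -> bandwidth requirement.
    Returns a list of sets, one set of resources per switch.
    Hypothetical heuristic for illustration, not the paper's mapper.
    """
    groups = [set() for _ in range(num_switches)]
    placed = {}  # resource -> index of the switch it was assigned to

    # Visit the pairs with the highest bandwidth requirement first.
    for (a, b), _bandwidth in sorted(traffic.items(), key=lambda kv: -kv[1]):
        for node, partner in ((a, b), (b, a)):
            if node in placed:
                continue
            preferred = placed.get(partner)
            if preferred is not None and len(groups[preferred]) < capacity:
                target = preferred  # join the partner's switch
            else:
                # Otherwise fall back to the least loaded switch.
                target = min(range(num_switches), key=lambda i: len(groups[i]))
            if len(groups[target]) >= capacity:
                raise ValueError("not enough switch ports for all resources")
            groups[target].add(node)
            placed[node] = target
    return groups
```

For example, with two switches of two ports each and the made-up requirements `{("cpu", "mem"): 500, ("dsp", "audio"): 300, ("cpu", "dsp"): 10}`, the sketch places `cpu` with `mem` and `dsp` with `audio`, so the two heaviest flows stay local to one switch.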
III. CDMA STAR NoC ARCHITECTURE
In a wired CDMA communication network, each data bit is represented as either an L-bit Walsh codeword or its one's complement, depending on whether the bit is a 0 or a 1, respectively [11]. We refer to this process as modulation.
While this leads to an increase in the number of bits to be transmitted by each resource, it is offset by the fact that up to L - 1 resources can transmit concurrently through a switch. Each packet is composed of group identification, source address, destination address and payload fields. The transmitter module selects a codeword depending on the destination field of the packet and sends out the corresponding modulated codeword. The modulated codewords from the different sources attached to a switch are then summed together using a code adder block.
At the receiving side, the demodulation module recovers the original transmitted data using the same codeword that was used for transmission.
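The modulation, code addition and demodulation steps can be illustrated with a short behavioral sketch. It assumes a +1/-1 chip representation (so that the one's complement of a codeword becomes its negation) and a correlation-based receiver; these are modeling conveniences, not a description of the hardware modules. The all-ones codeword is left unused in the usage note, which is one way to arrive at L - 1 concurrent transmitters. All function names are illustrative.

```python
def walsh_codes(length):
    """Return the rows of a Sylvester-type Hadamard matrix (+1/-1 chips)."""
    if length < 1 or length & (length - 1):
        raise ValueError("length must be a power of two")
    rows = [[1]]
    while len(rows) < length:
        # H_2n = [[H_n, H_n], [H_n, -H_n]]
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


def modulate(bit, code):
    """Send the codeword for a 0 bit and its complement (negation) for a 1."""
    return [c if bit == 0 else -c for c in code]


def code_adder(signals):
    """Sum the modulated codewords chip by chip."""
    return [sum(chips) for chips in zip(*signals)]


def demodulate(summed, code):
    """Correlate with the sender's codeword; orthogonality cancels the rest."""
    correlation = sum(s * c for s, c in zip(summed, code))
    return 0 if correlation > 0 else 1
```

With L = 8, seven senders can each use one of `walsh_codes(8)[1:]`. Summing their modulated codewords with `code_adder` and calling `demodulate` with each sender's codeword returns that sender's original bit, because distinct Walsh codewords have zero correlation.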
Each transmitter module includes a FIFO buffer. This buffer stores packets when other transmitters also wish to send a packet to the same destination at the same time. In this case, the scheduler decides which packet to send according to a predefined scheduling algorithm. If the FIFO is full, the packet is not dropped. Rather, the transmitter sends a "buffer full" signal to the corresponding resources. Those resources stop sending packets until that signal is deasserted.
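The buffer-full behavior described above amounts to backpressure rather than packet loss. The following is a minimal behavioral sketch of that idea, assuming a software queue in place of the hardware FIFO; the class name `TxFifo` and its methods are hypothetical, and the default depth of 8 follows the buffer size used in the experiments.

```python
from collections import deque


class TxFifo:
    """Behavioral model of a transmitter FIFO with a buffer-full signal."""

    def __init__(self, depth=8):
        self.depth = depth
        self.queue = deque()

    @property
    def buffer_full(self):
        """Asserted while the FIFO cannot accept another packet."""
        return len(self.queue) >= self.depth

    def push(self, packet):
        """Store a packet; return False (nothing dropped) if the FIFO is full.

        A False result models the resource holding the packet and retrying
        once buffer_full is deasserted.
        """
        if self.buffer_full:
            return False
        self.queue.append(packet)
        return True

    def pop(self):
        """Hand the oldest packet to the scheduler, or None if empty."""
        return self.queue.popleft() if self.queue else None
```

A resource model would check `buffer_full` (or the return value of `push`) each cycle and stall while the signal is asserted, which is why a full FIFO shows up as added latency rather than as lost packets.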
IV. MAPPING OF MPEG-4
In this section we consider the mapping of the MPEG-4 application onto two types of network-on-chip structures, namely our proposed CDMA star network and a crossbar mesh network. From the given communication characteristic parameters, such as bandwidth requirements, payload size, buffer size, and operating clock period, the traffic generator creates packets with the specified probability. The generated input traffic data is used to simulate the MPEG-4 computations mapped onto the NoC platform.
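The probabilistic packet generation can be sketched in software as follows. The traffic generator in this work is a resource model written in a hardware description language; the Python stand-in below only illustrates the idea of emitting, in each clock cycle, a packet to a given destination with a given probability. The function name, the per-cycle model, and the destination names in the usage note are assumptions made for the example.

```python
import random


def generate_traffic(dest_probability, cycles, seed=0):
    """Emit (cycle, destination) pairs for one source resource.

    dest_probability maps each destination to the probability that a packet
    is sent to it in a given cycle; the probabilities must sum to at most 1,
    and the remainder is the probability of staying idle.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    dests = list(dest_probability)
    weights = [dest_probability[d] for d in dests]
    idle = 1.0 - sum(weights)
    if idle < -1e-9:
        raise ValueError("destination probabilities exceed 1")

    packets = []
    for cycle in range(cycles):
        # None stands for "no packet this cycle".
        choice = rng.choices(dests + [None], weights + [max(idle, 0.0)])[0]
        if choice is not None:
            packets.append((cycle, choice))
    return packets
```

For instance, `generate_traffic({"sdram": 0.5, "sram": 0.25}, cycles=1000)` yields a repeatable trace in which roughly half of the cycles carry a packet to the hypothetical `sdram` destination and a quarter to `sram`.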
A. Mapping onto Star NoC with CDMA Switch

B. Mapping onto Mesh NoC with Crossbar Switch
To compare our CDMA NoC platform with another implementation, we considered mapping onto a crossbar-based mesh topology NoC, as shown in Fig. 6 . The crossbar switch has the same input buffer scheme and a buffer size of 8. Because there are a total of 12 IP blocks, we used a 4-by-3 mesh topology to accommodate all of these resources. The shortest path routing scheme is used for this mesh topology.
V. SIMULATION RESULTS AND PERFORMANCE ANALYSIS
In order to generate the results of interest, we logged all input and output packet transactions into a file. In a post-processing phase, we use a script to analyze the average time to deliver packets and the buffer utilization that was obtained. We also compared the hop count for both platforms. For the area comparison, we synthesized both the CDMA star and the crossbar mesh switches using the Synplify ASIC tool with the Chip Express CX4001 0.25 µm structured library.
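A post-processing step of this kind can be sketched as below. The log format is an assumption made for the example (one `TX` or `RX` event per line, followed by a packet identifier and a cycle number); the actual log layout and script are not reproduced here. The average is taken over the N received packets.

```python
def average_latency(log_lines):
    """Average delivery time, in cycles, over all received packets.

    Each line is assumed to look like "TX <packet_id> <cycle>" or
    "RX <packet_id> <cycle>" (a hypothetical format for illustration).
    """
    sent = {}       # packet_id -> cycle at which it was transmitted
    latencies = []  # one entry per received packet

    for line in log_lines:
        event, packet_id, cycle = line.split()
        if event == "TX":
            sent[packet_id] = int(cycle)
        elif event == "RX" and packet_id in sent:
            latencies.append(int(cycle) - sent.pop(packet_id))

    if not latencies:
        raise ValueError("no received packets in log")
    return sum(latencies) / len(latencies)
```

Applied to the four-line log `["TX p0 0", "TX p1 2", "RX p0 5", "RX p1 12"]`, the per-packet latencies are 5 and 10 cycles, so the function returns 7.5.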
A. Hop Count
TABLE I HOP COUNT COMPARISON
The hop count for a packet is defined as the number of routers through which it has been forwarded. Table I shows the average number of hops for both platforms. The table indicates that the CDMA star topology has favorable (i.e., lower) hop count values compared to the crossbar mesh topology.
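The difference between the two topologies can be illustrated with a small calculation. The sketch assumes that resources in the 4-by-3 mesh are numbered row by row, that shortest-path routing traverses the Manhattan distance between routers, that the source and destination routers both count as hops, and that the two star switches are directly connected. These conventions are assumptions for the example and are not taken from Table I.

```python
def mesh_hops(src, dst, columns=4):
    """Routers traversed on a shortest path between two mesh nodes.

    Nodes are numbered row by row; the count includes both the source
    and the destination router (an assumed convention).
    """
    sx, sy = src % columns, src // columns
    dx, dy = dst % columns, dst // columns
    return abs(sx - dx) + abs(sy - dy) + 1


def star_hops(src_switch, dst_switch):
    """Switches traversed in a two-switch star (assumed directly linked)."""
    return 1 if src_switch == dst_switch else 2
```

Under these assumptions, a packet crossing the mesh from corner node 0 to corner node 11 passes through `mesh_hops(0, 11) == 6` routers, whereas any packet in the two-switch star passes through at most two.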
B. Area Overhead
The synthesized area includes the total cell area for either network. In other words, it compares the total area for the 2 required CDMA star switches vs. the total area for the 12 required crossbar mesh switches. We used a buffer size of 8 in both platforms. The estimated maximum frequency is 76 MHz.
Table II shows that our CDMA NoC platform is about two times larger than the crossbar mesh topology platform.
The reason is that our prototype CDMA switch uses a more complex algorithm in the TX and RX modules compared to the simple crossbar input and output buffers.

where N is the total number of received packets.
times faster than the general crossbar mesh topology. 
D. Bandwidth Constraints
The highest bandwidth requirement in our system, as given in Figure 2, is 455 MBytes/sec. We would like to determine whether our CDMA NoC can meet that constraint. The largest possible bandwidth is the maximum clock frequency, which was found to be 76 MHz, multiplied by the payload size in bytes. Of the cases considered, only the 64-bit payload size is sufficient to meet this constraint: 76 MHz × 8 bytes = 608 MBytes/sec. This is a best-case value that does not take into account possible effects due to contention. However, the number is sufficiently high to strongly suggest that the network is fast enough to meet the MPEG-4 throughput requirements.
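The bandwidth check is simple arithmetic and can be written out as a short sketch. The 76 MHz clock, the 455 MBytes/sec requirement and the 64-bit payload come from the text; the 16-bit and 32-bit payload sizes in the usage note are hypothetical comparison points, since the other cases considered are not listed here.

```python
def peak_bandwidth_mbytes(clock_mhz, payload_bits):
    """Best-case bandwidth in MBytes/sec: one payload per clock cycle."""
    return clock_mhz * payload_bits / 8


def meets_requirement(clock_mhz, payload_bits, required_mbytes=455):
    """True if the best-case bandwidth covers the stated requirement."""
    return peak_bandwidth_mbytes(clock_mhz, payload_bits) >= required_mbytes
```

At 76 MHz, a 64-bit payload gives `peak_bandwidth_mbytes(76, 64) == 608.0`, which exceeds 455 MBytes/sec, while a hypothetical 32-bit payload would give 304.0 and a 16-bit payload 152.0, both below the requirement. As in the text, these figures ignore contention.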
VI. CONCLUSIONS
We have obtained performance and area overhead results for a high-performance CDMA-based network-on-chip implementation of an MPEG-4 processor. Realistic traffic rates between the IP resources in the design were used to determine the average latency for packet transmission. In addition, we compared our proposed NoC implementation to one based on a traditional mesh topology and crossbar switches. It was determined that the latency for the CDMA design is about one-ninth of that for the crossbar mesh network. However, synthesis results show that the area overhead is about two times as large for the CDMA network. We also illustrated the basic mapping and traffic generation techniques that can be applied to any multimedia application.
VII. ACKNOWLEDGMENTS
We thank Sang Woo Rhim, Bumhak Lee and Euiseok Kim of Samsung Advanced Institute of Technology (SAIT) for their help with this manuscript. This research work is supported by a grant from SAIT.
