We present a system-level framework for modeling systems-on-chip (SoCs) consisting of heterogeneous multiprocessors and network-on-chip communication structures. The framework enables developers of today's SoC designs to take advantage of the flexibility and scalability of networks-on-chip and to rapidly explore high-level design alternatives that meet their system requirements. We present a modeling approach for developing high-level performance models of these SoC designs and outline how this system-level performance analysis capability can be integrated into an overall environment for efficient SoC design. We show how a hand-held multimedia terminal, consisting of JPEG, MP3, and GSM applications, can be modeled as a multiprocessor SoC in our framework.
Introduction
Networks-on-chip (NoCs) are receiving considerable attention as a solution to the interconnect problem in highly complex chips. The reason is twofold. First, NoCs help resolve the electrical problems of new deep-submicron technologies, because they structure and manage global wires. At the same time, they share wires, lowering their number and increasing their utilization. NoCs can also be energy-efficient and reliable, and they scale better than buses. Second, NoCs decouple computation from communication, which is essential in managing the design of billion-transistor chips. NoCs achieve this decoupling because they are traditionally designed using protocol stacks, which provide well-defined interfaces separating communication service usage from service implementation. Using networks for on-chip communication when designing systems-on-chip (SoCs), however, raises a number of new issues that must be taken into account. In existing on-chip interconnects (e.g., buses, switches, or point-to-point wires), the communicating modules are directly connected, whereas in a NoC the modules communicate remotely via network nodes. As a result, interconnect arbitration changes from centralized to distributed, and issues such as out-of-order transactions, higher latencies, and end-to-end flow control must be handled either by the intellectual property (IP) block or by the network.
Multimedia is an increasingly important application area. In this paper, we present a system-level NoC model, which is an extension of our previous multiprocessor SoC modeling framework [4]. The extended model is able to model heterogeneous multiprocessor architectures interconnected through an on-chip network architecture, such as a mesh or a torus. We show how a hand-held multimedia terminal, consisting of integrated JPEG encoding and decoding, MP3 decoding, and GSM encoding and decoding for wireless transmission, can be modeled at the system level in our modeling framework. To address the system-level design challenges described above, we need an extended system-on-chip design process that includes the effects of the network-on-chip and makes it possible to evaluate options and make critical architectural decisions based on a system-level representation in advance of a detailed design. A key prerequisite is a library of abstract component models that capture their respective performance, power, and physical characteristics.
0-7803-8558-6/04/$20.00 ©2004 IEEE
The primary goal of system-level modeling for embedded systems is to formulate a model within which a broad class of designs can be developed and explored. Moreover, the difficulty of verifying the design of complex systems can be reduced by decomposing a system into smaller subsystems, independently verifying an implementation of each subsystem, and then proving that the composition of the subsystem specifications satisfies the overall system specification. Doing so requires accurate modeling of the system and of all the interrelationships among the diverse processors, software processes, physical interfaces, and interconnections.
The scheduling problem, central to the analysis of the complexity of concurrent programs, depends on the way in which the scheduled tasks are mapped onto the processing elements, which, in turn, is linked to the physical architecture of the computing platform.
A real-time operating system (RTOS) is meant to provide some assurances about the timely performance of tasks. Unfortunately, most mechanisms used in the basic RTOS services are not compositional in nature. Even if a mechanism can provide assurances individually to each task, there is no systematic way to provide assurances for an aggregate of two tasks except in trivial cases.
To support the designers of single-chip embedded systems, which include multiprocessor platforms running dedicated RTOSs, we have developed a modeling environment based on SystemC [2, 4]. In our abstract RTOS modeling framework, we deal with generalized abstract tasks, processing elements, and communication infrastructures. For the purposes of modeling, three distinct but closely related RTOS services have been identified, namely task scheduling, execution synchronization, and resource allocation.
Model Implementation
We have implemented our system-level modeling framework in SystemC. SystemC belongs to a class of languages that target the modeling of hardware and software systems, and it has the desirable feature of being able to simulate models at a very high level of abstraction together with low-level ones. Figure 1 gives an overview of our system-level SoC model, including the processor model and the NoC model, which are described in this section.
In our model, an application is represented as a multithreaded application comprising a set of tasks, where each task, $\tau_i$, can be decomposed into a sequence of task segments, $\tau_{ij}$. Each task segment, $\tau_{ij}$, is required to precede a given set of other task segments. Moreover, each task segment also excludes a given set of other task segments for the use of shared resources. For each task, we are given a release time, $r_i$, a release-time offset, $o_i$, a start time, $s_i$, a best-case execution time, $bcet_i$, a worst-case execution time, $wcet_i$, a deadline, $d_i$, a period, $T_i$, and a context switch time, $csw_i$. A similar set of parameters can be computed for each task segment, $\tau_{ij}$, relative to the beginning of the task containing that task segment.
The multiprocessor platform is modeled as a collection of Processing Elements, $PE_k$, and Devices, $D_k$, interconnected by a set of Communication Channels, $C_k$. Each $PE_k$ is modeled in terms of the RTOS services provided to the tasks comprising the application. Based on the principle of composition, three basic RTOS services are modeled: a scheduler, a synchronizer, and a resource allocator.
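The task parameters listed above can be pictured as a small record type. The following is a minimal sketch in plain Python rather than SystemC; the class names, field names, and the two helper methods are hypothetical and only illustrate one way of holding these parameters. It assumes that the segments of a task execute back to back for their worst-case times, which the text does not state.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TaskSegment:
    """One segment of a task (hypothetical representation)."""
    name: str
    wcet: float                                      # worst-case execution time of the segment
    precedes: Set[str] = field(default_factory=set)  # segments that must run after this one
    excludes: Set[str] = field(default_factory=set)  # segments competing for a shared resource


@dataclass
class Task:
    """Per-task parameters named in the text (field names are assumptions)."""
    name: str
    release_time: float
    offset: float
    bcet: float
    wcet: float
    deadline: float
    period: float
    csw: float
    segments: List[TaskSegment] = field(default_factory=list)

    def utilization(self) -> float:
        # Fraction of the processor needed in the worst case (context switches ignored).
        return self.wcet / self.period

    def segment_offsets(self) -> List[float]:
        # Worst-case start of each segment relative to the start of the task,
        # assuming the segments execute sequentially without gaps.
        offsets, elapsed = [], 0.0
        for seg in self.segments:
            offsets.append(elapsed)
            elapsed += seg.wcet
        return offsets


# Illustrative values only; they are not taken from the paper.
demo = Task("demo", release_time=0.0, offset=0.0, bcet=3.0, wcet=5.0,
            deadline=20.0, period=20.0, csw=0.1,
            segments=[TaskSegment("s0", 2.0, precedes={"s1"}), TaskSegment("s1", 3.0)])
```

Here `demo.segment_offsets()` yields the segment start times relative to the beginning of the task, which is the kind of per-segment parameter the text refers to.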
The scheduler is modeled around the priority-based preemptive scheduling policy, which is one of the most preferred scheduling policies for the execution of tasks in real-time systems due to its higher schedulability. According to our scheduler model, whenever a task becomes ready or finishes execution, the scheduler is called and looks for a ready task with maximal priority to continue execution. In our synchronizer model, synchronization is regarded as a means to prevent undesirable task interleavings by the scheduler. Our synchronizer model is responsible for establishing the correctness of the results computed by the multiprocessor platform, and it implements the Direct Synchronization (DS) protocol [9].
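The scheduling rule described above can be illustrated with a small tick-based simulation. This is a sketch in plain Python, not the SystemC scheduler model itself: the `Job` record and the `simulate` function are hypothetical, larger numbers are assumed to mean higher priority, and the ready set is re-evaluated at every tick instead of only when a task becomes ready or finishes, which gives the same schedule for integer release and execution times.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    priority: int    # larger value = higher priority (assumed convention)
    release: int     # tick at which the job becomes ready
    remaining: int   # execution time still needed, in ticks


def simulate(jobs: List[Job], horizon: int) -> List[Optional[str]]:
    """Return, for each tick, the name of the running job (None when idle).

    Note: the function consumes the `remaining` field of the jobs it is given.
    """
    timeline: List[Optional[str]] = []
    for tick in range(horizon):
        ready = [j for j in jobs if j.release <= tick and j.remaining > 0]
        if not ready:
            timeline.append(None)
            continue
        # Preemption falls out of re-selecting the maximal-priority ready job each tick.
        running = max(ready, key=lambda j: j.priority)
        running.remaining -= 1
        timeline.append(running.name)
    return timeline


# Illustrative jobs only: "high" is released at tick 1 and preempts "low".
schedule = simulate([Job("low", 1, 0, 3), Job("high", 2, 1, 2)], horizon=6)
```

In this example the low-priority job starts first, is preempted as soon as the high-priority job is released, and resumes once that job has finished.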
Extension of the Abstract RTOS Model to Model NoCs
For the purpose of forming a system-level NoC simulation model, unlike a network simulator, we have abstracted away all the low-level network details except the most essential ones (e.g., topology, latency, etc.). We treat the on-
The MP3 decoder is the most critical multimedia application, and mapping its task graph onto a single processor, even a fast one, reveals that some tasks miss their deadlines. Therefore, the MP3 application task graph has been partitioned and mapped onto two fast processors which, as mentioned above, are interconnected through a NoC. The JPEG encoder and decoder applications are mapped onto the same two fast processors as the MP3 decoder, whereas the GSM encoder is mapped onto a third fast processor and the GSM decoder is mapped onto a slow processor. This mapping results in the exchange of communication messages between the two fast processors over the NoC.
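To make the idea of keeping only topology and latency concrete, the sketch below computes hop counts on a 2D mesh and a 2D torus and derives a first-order message latency from them. It is written in plain Python under stated assumptions: nodes are addressed by (column, row) coordinates, routing follows a shortest path, and latency is the hop count times a fixed per-hop delay plus a serialization term. None of the function names or parameters come from the paper's model.

```python
from typing import Tuple

Node = Tuple[int, int]  # (column, row) coordinate of a network node (assumed addressing)


def mesh_hops(src: Node, dst: Node) -> int:
    """Shortest-path hop count on a 2D mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src: Node, dst: Node, cols: int, rows: int) -> int:
    """Shortest-path hop count on a 2D torus, where each dimension wraps around."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, cols - dx) + min(dy, rows - dy)


def message_latency(hops: int, per_hop: int, flits: int, flit_time: int) -> int:
    """First-order latency estimate: routing delay plus payload serialization.

    This ignores contention and buffering, which a detailed network simulator would model.
    """
    return hops * per_hop + flits * flit_time


# On a 4x4 network, opposite edges of a row are 3 hops apart on a mesh
# but only 1 hop apart on a torus because of the wrap-around link.
mesh_distance = mesh_hops((0, 0), (3, 0))
torus_distance = torus_hops((0, 0), (3, 0), cols=4, rows=4)
```

Comparing the two distances shows why the choice of topology alone already changes the communication delay seen by tasks mapped onto different processors.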
To illustrate the capabilities of our modeling framework, we use two different schedulers. Rate-monotonic (RM) scheduling is used on the two fast processors to handle JPEG and MP3, whereas the two GSM applications are scheduled using earliest-deadline-first (EDF) scheduling. Table 1 summarizes the characteristics of the multimedia application.
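The difference between the two policies can be sketched in a few lines of plain Python. RM assigns fixed priorities by period (shorter period, higher priority), while EDF picks the ready job with the earliest absolute deadline at run time. The helper names and the task names and periods below are hypothetical and are not taken from Table 1; the utilization bound is the classical sufficient test for RM with independent periodic tasks whose deadlines equal their periods.

```python
from typing import Dict, List, Tuple


def rm_priorities(periods: Dict[str, float]) -> List[str]:
    """Task names ordered from highest to lowest RM priority (shortest period first)."""
    return sorted(periods, key=lambda name: periods[name])


def rm_utilization_bound(n: int) -> float:
    """Sufficient RM schedulability bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)


def edf_pick(ready: List[Tuple[str, float]]) -> str:
    """Among (name, absolute deadline) pairs, return the job EDF would run next."""
    return min(ready, key=lambda job: job[1])[0]


# Illustrative task set only.
order = rm_priorities({"a": 20.0, "b": 5.0, "c": 10.0})
next_job = edf_pick([("x", 12.0), ("y", 7.0)])
```

A task set whose total utilization stays below `rm_utilization_bound(n)` is guaranteed schedulable under RM under those assumptions; a set that exceeds the bound may still be schedulable and needs a more precise analysis or simulation.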
Conclusions
We have presented a system-level, system-on-chip modeling framework and discussed how our original SoC model has been extended to handle the effects of the on-chip interconnection infrastructure, i.e., the network-on-chip. We have demonstrated the capabilities of our modeling framework by modeling and simulating a hand-held multimedia terminal application mapped onto a heterogeneous 4-processor SoC architecture interconnected through a torus on-chip network topology. It is worth mentioning, however, that our system-level modeling framework supports more sophisticated scheduling policies and NoC topologies. Moreover, features such as the effects of the network interface and memory accesses, as well as dynamic load-balancing support, can be added by building further components on top of the existing framework components. We are currently extending our modeling framework to include radio and transducer components in order to be able to model wireless sensor networks, i.e., distributed systems of SoCs.
