reuse of existing modules. Flexibility is required in order to avoid early decisions. It allows the designer to decide quite late in the design process which technology will be used to implement each module. Combined with modularity, flexibility allows the implementation of a given module to be changed at any stage of the design process. For instance, a software module may be converted into a hardware module for performance reasons. Scalability allows the same architecture model to be adapted to applications of different complexity, for instance by increasing the number of processors or communication buses.
Related work
Most existing work targets monoprocessor architectures, and the most widely used model in this class is the single-CPU, single-ASIC target architecture. Even though this architecture is a special and limited example of a distributed system, it is relevant in the area of embedded systems [12]. In this class of work we can cite LYCOS [11], COSYMA [13], CoWare [16] and PMOSS [6]. Other design systems, such as Vulcan [8], TOSCA [1] and COBRA [10], can support more than one ASIC. Several research groups have tried to target multiprocessor architectures; we can cite POLIS [2], Chinook [4], SpecSyn [7] and the work led by Wolf and Ti-Yen [17]. In the POLIS [2] system, the target architecture consists of general-purpose processors combined with a few ASICs and possibly other components such as DSPs. The target architecture in the SpecSyn [7] system is a heterogeneous multiprocessor with any number of processors, coprocessors, ASIPs or FPGAs, communicating through multiple buses. Besides these academic research projects, there have also been several industrial trials of open standards and design methodologies [3] [15] [18] [19] [20] [21] that try to deal with increasingly complex system-on-chip designs. However, we believe that in all of the above work, target architectures still lack generic aspects and thus only tackle a restricted application field. In fact, most of the above-mentioned systems restrict the kind of components used and/or the communication network to a few proprietary and/or specific models designed to be plugged together.
Contribution
Our long-term objective is the definition of an efficient multiprocessor SoC design environment that is applicable to a wide range of application fields and able to generate multiprocessor architectures. The main contribution here is the definition of a modular, flexible, and scalable architecture model (MFSAM) that may be used for an efficient multiprocessor SoC design flow and handles a large class of applications. The main features of this model are:
1. As stated above, the model allows for modularity, flexibility and scalability.
2. The model allows for automatic generation of architectures. This point needs some harmonization with the input model and the targeting algorithms.
The model is made of a set of processors communicating through a communication network (Figure 1 ). 
Figure 1. Generic architectural model
A processor may be hardware, software or an IP. The communication network may be of any complexity; it may be a single bus or a network with complex protocols. Processors are linked to the common network through communication interfaces. The scalability of this architecture depends on the scalability of the chosen communication network. Modularity is ensured by the use of specific interfaces to link processors to the communication network. This makes it possible to design each part of the application separately; we can even include pre-designed modules (IPs). The generic assembling scheme of our model greatly increases its modularity. This separation between processor and communication network through specific interfaces also provides high flexibility. In fact, if we change the technology implementing a given module (processor), the only part of the architecture that needs to be changed is the interface of the corresponding module. This chapter focuses on the definition of the architecture model and its use within an architecture generation flow.
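The separation between modules, interfaces and network can be illustrated with a minimal Python sketch. All names here (`Module`, `Interface`, `Architecture`, `retarget`) are hypothetical and chosen only for illustration; they are not part of MFSAM. The sketch shows one property only: when the technology of one module changes, its interface is the only part of the architecture description that differs afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    """Hypothetical communication interface between one module and the network."""
    module_name: str
    technology: str  # technology of the module this interface adapts


@dataclass
class Module:
    """A processor in the model's sense: hardware, software, or an IP."""
    name: str
    technology: str  # "software", "hardware", or "ip"


@dataclass
class Architecture:
    network: str  # e.g. "single bus" or "point-to-point"
    modules: dict = field(default_factory=dict)
    interfaces: dict = field(default_factory=dict)

    def add(self, module: Module) -> None:
        """Attach a module to the network through its own interface."""
        self.modules[module.name] = module
        self.interfaces[module.name] = Interface(module.name, module.technology)

    def retarget(self, name: str, technology: str) -> None:
        """Change one module's technology and regenerate only its interface."""
        self.modules[name].technology = technology
        self.interfaces[name] = Interface(name, technology)


arch = Architecture(network="point-to-point")
arch.add(Module("filter", "software"))
arch.add(Module("controller", "software"))

before = dict(arch.interfaces)       # snapshot of the interfaces
arch.retarget("filter", "hardware")  # software module becomes hardware
changed = [n for n in arch.interfaces if arch.interfaces[n] != before[n]]
print(changed)  # ['filter']
```

The network and the `controller` interface are untouched by the change, which is the flexibility argument made above in miniature.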
Architecture in System Design
In this section, we give a brief analysis of system architectures encountered in electronic systems [5] [14] . Both embedded system architectures and general-purpose computer architectures are made of three elements: basic components, communication network, and organization scheme. 
Communication network
The communication network constitutes the hardware links that support the communication primitives between components. The direct way to connect the components of a system is to have a dedicated communication link between every two communicating components. Between this fully connected network and the bus, there is a wide range of interconnection networks. These networks are a major factor in differentiating modern multiprocessor architectures. Interconnection networks are built from switching elements and a topology. The topology is the pattern in which the individual switches are connected to other components, such as processors, memories, I/O devices, and other switches.
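To give a feel for the two extremes mentioned above, the short Python sketch below counts the links of a fully connected network, assuming every pair of components communicates, and compares it with the number of attachments to a single shared bus. The function names are illustrative assumptions; the pair count n(n-1)/2 is the standard formula for the number of pairs among n items.

```python
def dedicated_links(n: int) -> int:
    """Links needed when each pair of n components has its own dedicated link."""
    return n * (n - 1) // 2


def bus_attachments(n: int) -> int:
    """On a single shared bus, each of the n components needs one attachment."""
    return n


for n in (2, 4, 8, 16):
    print(n, dedicated_links(n), bus_attachments(n))
# 2 1 2
# 4 6 4
# 8 28 8
# 16 120 16
```

The quadratic growth of dedicated links is one reason intermediate interconnection networks exist between these two extremes.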
Organization scheme
The organization scheme is defined as the method of composing or assembling the basic components to construct the system architecture. A system architecture can vary from a simple controller to a massively parallel machine. What is of interest here is the role played by the different basic elements at the global control level of the system. Thus, we can classify system architectures into two categories: monoprocessor architectures and multiprocessor architectures. In addition, the communication network and the programming model play an essential role in the classification of system architectures. A monoprocessor architecture consists of one CPU and one or more ASICs. It follows a master-slave synchronization scheme in which the top controller acts as the main processor in charge of coordinating the activities of the other components, which act as co-processors. Although very useful in several application domains, the single-processor architecture can only provide restricted performance because of the lack of true parallelism.
A multiprocessor architecture allows more flexibility and improved performance thanks to the distribution of computation among processors. However, it is much more difficult to handle due to the parallelism.
MFS Architecture Model
The new architecture model that we propose here allows for modularity, flexibility and scalability. This model is based on the results of the analysis of existing architectures in system design (done in section II). From that huge design space we chose the elements that best fit our needs. In addition to modularity, flexibility and scalability, these needs are suitability for multiprocessor SoC design and the possibility of automatic generation of the final architecture. We are also mostly interested in architectures suitable for embedded systems on chip, which are relevant in many potential application fields. In this section, we develop the new architecture model we propose.
Components
The components of our architecture model belong to three essential categories: software, hardware, and communication components. The model consists of software processors, hardware blocks, memories, and communication interfaces.
3.1.1. Software processors.
In this category we can include off-the-shelf microprocessors, microcontrollers, DSPs, and application-specific processors (i.e. ASIPs). Most of these components satisfy the attributes that our architecture model requires for this category. The processor should be able to communicate through an external memory bus (synchronous or asynchronous). It should also be able to handle external interrupts. Additional I/O ports and internal peripherals may be useful but are not necessary. The processor-memory interface and the interrupts are essential for building the processor-network communication interface.
Hardware blocks.
To add a pre-designed hardware block (i.e. an IP) to our architecture model, the block must be provided with a communication interface adapted to the network, because there is no standardized bus interface for such components.
3.1.3. Memory. Memories are essential components in system design. They are integrated in our architecture model as pre-designed blocks. Thus, to use a memory block, only its access scheme and access timing are required. A memory controller with an address decoder must be added to adapt the processor bus or network bus to the memory (local and global memories).
Communication Interfaces. Communication interfaces provide bridges between the previous components and the communication network. For software processors, these interfaces can be generated according to both the processor attributes (e.g. memory bus, interrupts and I/O ports) and the communication network. For hardware blocks, communication interfaces are often built in or provided with the block.
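The requirements placed on software processors (an external memory bus and external interrupt handling, with I/O ports optional) can be expressed as a simple check that a tool might run before generating an interface. The Python sketch below is an assumption-laden illustration: `ProcessorAttributes`, `missing_attributes` and the processor names `cpu_a` and `cpu_b` are hypothetical and do not describe any real processor or any existing tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessorAttributes:
    """Hypothetical description of a software processor, as seen by the model."""
    name: str
    memory_bus: Optional[str]  # "synchronous", "asynchronous", or None
    external_interrupts: bool
    io_ports: int = 0          # useful but not required by the model


def missing_attributes(p: ProcessorAttributes) -> List[str]:
    """Return the required attributes that the processor does not provide."""
    missing = []
    if p.memory_bus not in ("synchronous", "asynchronous"):
        missing.append("external memory bus")
    if not p.external_interrupts:
        missing.append("external interrupts")
    return missing


# Hypothetical processors, used only to exercise the check.
cpu_a = ProcessorAttributes("cpu_a", "asynchronous", True, io_ports=2)
cpu_b = ProcessorAttributes("cpu_b", None, True)

print(missing_attributes(cpu_a))  # []
print(missing_attributes(cpu_b))  # ['external memory bus']
```

A processor with an empty result is a candidate for interface generation; the absence of I/O ports never disqualifies a processor, in line with the requirements stated above.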
Communication Network
Although a wide range of communication architectures exists, only a few are suitable for embedded system-on-chip design methods. At a high abstraction level, communication is established through abstract channels; at the physical level, however, these abstract channels must be transformed into physical wires. The most commonly supported communication network is the point-to-point network. Its great advantage is that it already allows automatic architecture generation from a system-level specification. Such a network achieves high performance but has a high cost. A more sophisticated network that can satisfy the same conditions is the hierarchical bus network. However, it is more difficult for system design methods to support. These two networks cope quite well with the increasing complexity of embedded systems on chip.
Organization scheme
As figure 1 shows, the proposed architecture model has a multiprocessor organization scheme. The model is made of a set of components communicating through a communication network. The control is distributed among the different components that constitute the architecture. Of course, the component and communication network specifications are those given above.
Architecture Generation Flow Based on MFSAM
In this section, we present the overall flow for architecture generation based on the MFSAM presented in section 3. As stated before, the final aim of this work is to build an efficient multiprocessor SoC design tool able to generate complex application-specific architectures for a large class of applications. The modularity, flexibility and scalability of the proposed model lead to a wide application domain, but require a large number of parameters to be fixed when designing each application. Thus, in order to make the proposed model practical as a template for a multiprocessor SoC design tool, we chose to derive from it a set of architecture platforms, each of which targets a set of applications (i.e. an application field); see figure 2. This specialization of the architecture model should also assist the designer in the implementation choices of the application. In addition, as all of these architecture platforms are derived from the same architecture model, the same architecture generation flow and generation tool can support all of them. The architecture generation tool takes as input an architecture platform (depending on the application field) and the application-specific parameters that configure the platform. These application-specific parameters may be the results of a system-level synthesis tool (e.g. a hardware/software codesign tool). The architecture generation flow is shown in figure 3.
The implementation and efficiency of this flow are illustrated further in the next section through a demonstration example.
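The flow described in this section, in which a platform is configured by application-specific parameters, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the architecture generation tool itself: the names `Platform`, `AppParameters` and `generate`, the processor types `cpu_a` and `cpu_b`, and the shape of the output dictionary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """Hypothetical architecture platform derived from the model."""
    processor_types: tuple  # processor types the platform offers
    network: str            # e.g. "point-to-point"


@dataclass
class AppParameters:
    """Hypothetical application-specific parameters that configure a platform."""
    processors: dict        # instance name -> processor type
    channels: tuple         # (source, destination) pairs between instances


def generate(platform: Platform, params: AppParameters) -> dict:
    """Configure the platform and return an architecture description."""
    for name, ptype in params.processors.items():
        if ptype not in platform.processor_types:
            raise ValueError(f"{name}: type {ptype} is not offered by the platform")
    for src, dst in params.channels:
        if src not in params.processors or dst not in params.processors:
            raise ValueError(f"channel {src}->{dst} uses an unknown processor")
    # One interface per processor, sized by the channels that touch it.
    interfaces = {
        name: {
            "processor": ptype,
            "channels": sum(1 for c in params.channels if name in c),
        }
        for name, ptype in params.processors.items()
    }
    return {
        "network": platform.network,
        "interfaces": interfaces,
        "links": list(params.channels),
    }


platform = Platform(processor_types=("cpu_a", "cpu_b"), network="point-to-point")
params = AppParameters(
    processors={"p0": "cpu_a", "p1": "cpu_b"},
    channels=(("p0", "p1"), ("p1", "p0")),
)
architecture = generate(platform, params)
print(architecture["interfaces"]["p0"])
```

Because every platform shares the same description format in this sketch, a single `generate` function serves all of them, mirroring the argument that one generation flow can support every platform derived from the model.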
Demonstration Example
In this example we show the feasibility and efficiency of the proposed architecture model. A multiprocessor architecture platform based on MFSAM is proposed. Then, several application-specific architectures are generated for two application examples. The generated architectures were validated by cosimulation. The analysis of the results shows the effectiveness of the proposed architecture model.
A multiprocessor architecture platform based on MFSAM
Since the development kits for the ARM7 and MC68000 software processors were already available to us, we considered an architecture platform based on these two processors. The architecture platform consists of the following components: two types of software processors (ARM7 and MC68000), local memories and communication interfaces. The communication network is a point-to-point network. The block diagram of this architecture platform is given in figure 4. The free parameters of this architecture platform are the number of software processors, the number of I/O channels for each software processor, the interconnections between them and the interconnections with external systems. These parameters show the scalability of the platform and therefore enable the design of application-specific architectures of different scales. As mentioned in section IV, the application-specific architecture is generated by an architecture generation tool. This tool is the subject of other work in our research group. Currently, communication interfaces are almost automatically generated as hardware blocks according to the processor attributes and the application parameters (communication channels). In order to validate the generated architecture, we need a cycle-accurate executable architecture that can run the application. To that end, we used a cosimulation approach [9]. In this approach, software processors are replaced by cycle-accurate ISSs (ISS + BFM). All other parts of the architecture are modeled in VHDL RTL and executed by a VHDL simulator (e.g. VSS). The cosimulation tool ensures the interconnection and synchronization of the running simulators for coherent execution of the overall system.
Application examples
Many applications can be mapped onto the architecture platform presented above. Of course, performance and cost aspects must be taken into account. We have used this platform to implement two applications: a Packet Routing Switch and an IS-95 CDMA mobile station. As mentioned in section IV, to map an application onto an architecture platform we have to fix the application-specific parameters. Then, the architecture generation tool takes charge of configuring the platform and generating the application-specific architecture.
5.2.1. Packet Routing Switch. It constitutes a powerful solution for large frame or cell switching systems. The version we present here is a simplified one; it consists of two input controllers and two output controllers. Each controller handles one communication channel, and the communication links between input and output controllers are configured by an external signal to be direct or switched. We chose two architectures to implement this switch, one with four processors (two ARM7 and two MC68000), and the other with only two processors (one ARM7 and one MC68000). Configuring the architecture platform with these parameters led to the two architectures shown in figure 5. The difference between this architecture and the one with four processors presented in figure 5 is the I/O for each processor, the interconnection between processors, and the external I/O.
As stated in section V.1, these architectures have been validated by cosimulation at RT level.
Analysis of the results
Many other applications of different scales can be mapped onto this single architecture platform. For this example, the architecture generation was done manually and it took about 1 day to generate one architecture. However, once the architecture generation tool is complete, this time will drop significantly and will be reduced to the time needed to capture the application-specific parameters, i.e. a few minutes. This example illustrates the feasibility and the efficiency of our architecture model. With this model, multiprocessor architectures become much easier to handle. We illustrated how the generation of an application-specific architecture can become simple and very quick.
Note that the architecture model we propose in this chapter is far more generic than the architecture platform we presented in this example, and therefore covers a much wider application field. Other kinds of software processors (and DSP cores) can be integrated and used in the same manner. This shows the great flexibility and modularity of the proposed architecture model. The modularity of our architecture model appears in the organization scheme, which consists of separate modules communicating through a communication network. It separates the behavior from the inter-subsystem communication. In addition, each module can be designed separately, and an assembling scheme is provided to connect the modules efficiently and to enable the reuse of existing modules. This assembling scheme is quite structured and makes it easy to reconfigure the architecture. Thus, the technology choice can be made late in the design process, which leads to great flexibility. The scalability of our architecture model is also achieved thanks to the assembling scheme. It depends on the scalability of the chosen communication network. This scalability allows the proposed architecture model to be adapted to applications of different complexity, for instance by increasing the number of processors or communication buses.
It is worth noting that we consider the software processor cores without their peripheral components. This makes our approach more generic and optimized, as various issues emerge when these peripheral components are considered (software targeting, communication interfaces, etc.).
Performance aspects of the architectures generated according to the proposed architecture model are the subject of our ongoing work.
Conclusions
In this chapter, we presented a generic architecture model for multiprocessor embedded system-on-chip design. The proposed model is modular, flexible and scalable. It permits efficient generation of multiprocessor architectures for embedded systems on chip. This work forms a promising step towards the definition of an efficient multiprocessor SoC design environment applicable to a large application domain. The key point when defining such an environment is to define the architecture model that will fix the class of architectures handled by the system. The model allows for automatic architecture generation. This chapter focused on the definition of the architecture model. The feasibility and effectiveness of this architecture model were illustrated by a significant demonstration example.
