On-chip communication architectures can have a great influence on the speed and area of System-on-Chip designs, and this influence is expected to be even more pronounced on reconfigurable System-on-Chip (rSoC) designs. To date, little research has been conducted on the performance implications of different on-chip communication architectures for rSoC designs.
Introduction
System-on-chip (SoC) technology has evolved as the predominant circuit design methodology for custom ASICs. SoC technology moves design from the circuit level to the system level, concentrating on the selection of appropriate pre-designed IP blocks and their interconnection into a complete system. However, modern ASIC design and fabrication are expensive. Design tools cost hundreds of thousands of dollars, while tooling and mask costs for large SoC designs now approach $1 million. For low-volume applications, and especially for research projects in universities, reconfigurable System-on-Chip (rSoC) technology is more cost-effective. Like conventional SoC design, rSoC involves the assembly of predefined IP blocks and their interconnection. However, here the fabrication technology uses mega-gate FPGAs rather than custom ASICs.
First generation commercial rSoC products are now offered by most FPGA vendors.
Nevertheless, simply mapping ASIC designs onto reprogrammable devices will not yield efficient and optimized results. Due to the underlying architectural differences between reconfigurable devices and ASICs [1], such as much more constrained wiring channels, conventional design methods may not always be appropriate for rSoC. Therefore, there is a need to revisit current SoC design techniques and analyse their suitability for application on rSoC platforms.
A key component in any SoC design is the interconnection fabric that is used for inter-module communication. Up until now, the most common interconnection strategy has been a parallel system bus. A number of different "standard" busses are already being used in custom SoC designs. Other interconnection strategies for SoC have also been proposed, such as crossbar switches and packet-switched networks. For rSoC, a number of different bus "standards" have also been proposed. It is not clear that these bus structures, which have been carried over more or less unchanged from custom SoC designs, are the most appropriate for rSoC. To date, little work has been done on alternate rSoC interconnection networks. This paper presents our early work on evaluating the applicability of existing and proposed interconnection structures for rSoC designs and outlines our future methodology for quantitatively evaluating the relative performance of these different approaches.
The bus architecture
The bus architecture is the most popular integration choice for SoC designs today. It is derived from traditional bus design and organized in a hierarchical fashion to allow frequently accessed, faster devices to operate at their peak performance while letting slower components be interconnected at lower cost.
The main advantages of buses are their flexibility and extensibility. Components from different vendors can be used to build a system as long as they are designed to the standard specification. Having a large inventory of IP blocks is also an attractive attribute for system integrators.
This has encouraged standards such as the CoreConnect and AMBA busses for SoC design as described in the following sections.
IBM CoreConnect Bus
The IBM CoreConnect [2] bus is a hierarchical, multi-bus architecture. It provides three different buses for connecting IP cores. A typical system, shown in figure 1, uses a high-speed bus for connecting processors and controllers, a peripheral bus for lower-bandwidth cores, and a device control register bus.
Communication-based Architecture
Researchers at Stanford University have presented an SoC interconnect architecture that models a tightly coupled multi-processor system [5]. The interconnect scheme is based on a 2-D mesh topology, where each IP core is designed to fit into a rectangular area as a single tile. The tiles connect to one another in a mesh topology, forming a structured network as shown in figure 2. Inter-tile communication follows a packet/message-based paradigm. Data packets are routed through the network by a simple router interface, with a virtual flow control mechanism, that resides on each tile.
WISHBONE
The WISHBONE [4] SoC Interconnect architecture was developed by Silicore Corporation and is supported by the OpenCores organization. The WISHBONE standard takes the simpler approach of using a single-level bus. The major difference and advantage of the WISHBONE architecture is its ability to provide more customized interconnection schemes. System integrators can choose among four types of interconnection between IP cores: point-to-point, shared bus, data flow and crossbar switch.
Other I/O buses for rSoC
Another category of buses often used on desktop workstations is the serial I/O bus. Unlike parallel buses, serial buses require several clock cycles to complete a single phase, resulting in a much smaller bandwidth. However, serial buses are inexpensive to implement and are therefore more appropriate for applications that communicate over longer distances. Recent advances have allowed serial protocols to provide high transfer rates, making them a viable option to consider on rSoCs.
Analysis of Existing SoC Architectures
The popularity and wide acceptance of the bus architecture is perhaps due to the fact that it is easy to adopt and well known in the computer industry.
Buses are also relatively inexpensive to implement.
However, wide data-path buses are known for their power inefficiency, with studies showing that wires account for 40-50% of the total on-chip energy consumption [6].
Another criticism is the lack of scalability: as the number of components increases, the effective bandwidth is shared among them. One solution employed by many high-performance buses is to increase bus width, but this inevitably leads to an increase in power consumption.
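As a rough illustration of this scalability argument, the sketch below assumes an idealised shared bus on which all masters contend equally, so each sees an equal fraction of the peak bandwidth. The function name, the 100 MHz clock and the bus widths are illustrative assumptions and are not figures from this work.

```python
def per_master_bandwidth(width_bits, clock_hz, n_masters):
    """Ideal share of peak bus bandwidth (bits/s) seen by each master.

    Assumes one transfer per clock and perfectly fair sharing, which
    ignores arbitration overhead and idle masters.
    """
    if n_masters < 1:
        raise ValueError("need at least one master")
    return width_bits * clock_hz / n_masters


# Hypothetical 100 MHz bus: doubling the masters halves each share,
# and doubling the width restores it at the cost of twice the wires.
share_4 = per_master_bandwidth(32, 100e6, 4)
share_8 = per_master_bandwidth(32, 100e6, 8)
share_8_wide = per_master_bandwidth(64, 100e6, 8)
```

Under these assumptions, widening the bus only buys back the bandwidth lost to sharing, which is why the approach scales poorly where wiring is constrained.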
Whilst wide buses are appropriate for the abundant wiring resources available on custom ASICs, they are much less appropriate to wiring-constrained rSoC systems [7].
Network-on-chip presents a structure that models multiprocessor systems. By using a mesh topology, total system bandwidth increases when a new IP core is added, as the number of paths for data transmission increases. This architecture is complemented by the use of data packets, which allow concurrent communication among existing IPs for better utilization of the bandwidth. The use of point-to-point connections between cores also eliminates the need for different interfaces for different bus widths, as required by hierarchical bus topologies. One major disadvantage of the on-chip network is silicon usage. In order to guarantee more predictable electrical and performance parameters, the system specifies fixed tile areas in which to allocate IP cores. This is easy to achieve for systems where multiprocessors or duplicated computational units are used, but inefficient for general systems. IP cores differ greatly in logic usage, and simple cores will lead to unused silicon area within a fixed-area tile. This is of particular concern in resource-limited rSoC systems.
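Both sides of this trade-off can be put in numbers with a small sketch. It assumes a plain rectangular mesh with one link between each pair of neighbouring tiles, and one core per fixed-size tile. The function names and the core sizes (in arbitrary logic units) are hypothetical.

```python
def mesh_links(rows, cols):
    """Number of neighbour-to-neighbour links in a rows x cols 2-D mesh."""
    return rows * (cols - 1) + cols * (rows - 1)


def unused_fraction(core_sizes, tile_size):
    """Fraction of total tile capacity left empty with one core per tile."""
    if any(size > tile_size for size in core_sizes):
        raise ValueError("a core does not fit in one tile")
    return 1 - sum(core_sizes) / (len(core_sizes) * tile_size)


# Growing the mesh adds links, and hence concurrent transfer paths.
links_2x2 = mesh_links(2, 2)
links_3x3 = mesh_links(3, 3)

# One large core sets the tile size; three small cores then waste most of theirs.
waste = unused_fraction([1000, 200, 150, 50], tile_size=1000)
```

In this made-up example the mesh grows from 4 to 12 links, but roughly two thirds of the tile capacity sits unused because the tile must be sized for the largest core.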
Router design is another cost concern for the network architecture. Unlike bus architectures, which employ a central arbiter, mesh networks need to implement distributed arbitration logic (also known as routers) in each tile space. The complexity of the router and the number of system components increase silicon usage in comparison to the bus architecture.
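To give a feel for the decision logic each tile's router must carry, the following sketch uses dimension-ordered (XY) routing, a common and simple choice for 2-D meshes. The architectures discussed above do not necessarily use this algorithm; it is assumed here purely for illustration, as are the function name and the coordinate convention.

```python
def xy_route(src, dst):
    """Return the tiles visited from src to dst, moving along X first, then Y.

    Tiles are (x, y) coordinates. Each step is the kind of local decision
    a per-tile router would make from the destination address alone.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
hops = len(route) - 1
```

Even this minimal scheme needs a comparator per dimension plus buffering and port arbitration in every tile, which is where the area cost relative to a single central bus arbiter comes from.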
Proposed Methodology

Architecture selection
We propose a selection of architectures from three main categories: serial schemes, parallel busses and communication-based networks. While parallel buses have been used extensively for rSoC, little work has been done for the others. Therefore, we plan to implement new schemes based on serial and communication architectures for comparison with existing bus-based schemes.
Interface and interconnect design
IP blocks should be able to be used with any particular interconnection network. Hence we separate IP-core design from interconnect architecture design.
The logical design of the communication architecture will be divided into two main parts: the communication interface layer and the physical interconnect (as shown in figure 3 below).
Figure 3: Proposed methodology overview
The Communication Interface Layer, CIL, should provide reliable data communication with other CILs. Typically, only one type of CIL will be required for a particular scheme. An Interface Adapter Logic (IAL) will automatically generate the required logic to match a generic IP block to a particular CIL.
Each CIL acts as an edge node of the network; the rest of the network (such as routers, arbiters, wires and switches) is lumped together as the "Physical Interconnect" box in figure 3.
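The separation between IP cores, CILs and the physical interconnect can be sketched in software terms as follows. This is only an analogy for the hardware partitioning, under the assumption that a CIL exposes send and receive operations. All class and method names are hypothetical, and the queue-based interconnect is a stand-in for whatever bus or network is actually used.

```python
from abc import ABC, abstractmethod
from collections import deque


class CIL(ABC):
    """Communication Interface Layer: the only view an IP block has of the network."""

    @abstractmethod
    def send(self, dest, payload): ...

    @abstractmethod
    def receive(self): ...


class QueueInterconnect:
    """Stand-in for the physical interconnect: one FIFO per edge node."""

    def __init__(self):
        self.queues = {}

    def attach(self, node_id):
        self.queues[node_id] = deque()

    def deliver(self, dest, message):
        self.queues[dest].append(message)


class SimpleCIL(CIL):
    """A CIL bound to one node of a particular interconnect."""

    def __init__(self, node_id, interconnect):
        self.node_id = node_id
        self.interconnect = interconnect
        interconnect.attach(node_id)

    def send(self, dest, payload):
        self.interconnect.deliver(dest, (self.node_id, payload))

    def receive(self):
        queue = self.interconnect.queues[self.node_id]
        return queue.popleft() if queue else None


net = QueueInterconnect()
cpu, uart = SimpleCIL("cpu", net), SimpleCIL("uart", net)
cpu.send("uart", b"\x41")
received = uart.receive()
```

Swapping `QueueInterconnect` for a different implementation would leave the code using `cpu` and `uart` unchanged, which is the property the CIL is intended to give IP blocks. The IAL, which would adapt a block's native signals to this interface, is omitted from the sketch.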
Design Implementation
Design platform
The main design platform will be based on the Xilinx Virtex-II system with the MicroBlaze soft-core processor. This platform is supported by the Xilinx ISE EDA tools, which will also allow us to benchmark the supplied CoreConnect bus IP cores against the other implemented architectures.
Architecture implementation: Serial Bus
It is proposed that a modified version of the Dallas 1-Wire bus [8] is implemented as an on-chip serial bus. The 1-Wire architecture presents a simple and reliable bus protocol that is able to transfer data, address and control information over a single wire. We have chosen this bus for our initial exploration because of its simplicity and its significant contrast to wide parallel buses. In the first phase of the project, a system supporting a single master and multiple slaves will be implemented, as shown in figure 4. The master interface will be connected to the MicroBlaze soft-core processor using a custom OPB interface. This will allow for more rapid development, as the MicroBlaze core is designed to the CoreConnect specification.
The master interface will be able to address and initialize transfers with slave devices connected on the one-wire bus using the basic protocols and ROM commands. The slave interfaces listen on the one-wire bus and send out the presence pulse when a reset pulse is issued by the master. When addressed, the particular slave will then respond to the commands issued and perform the required operations.
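The reset, presence and addressing sequence described above can be modelled at the transaction level as in the sketch below. It is a behavioural illustration only: bit-level timing is ignored, the class and method names are hypothetical, and the function command 0xA5 is made up. The value 0x55 is the standard 1-Wire Match ROM command code.

```python
MATCH_ROM = 0x55  # 1-Wire "Match ROM" command code


class Slave:
    """Behavioural model of one slave interface on the one-wire bus."""

    def __init__(self, rom_id):
        self.rom_id = rom_id
        self.selected = False
        self.received = []  # function commands this slave acted on

    def on_reset(self):
        """Deselect on a reset pulse and answer with a presence pulse."""
        self.selected = False
        return True

    def on_rom_command(self, code, rom_id):
        """Become selected only if a Match ROM command carries our address."""
        if code == MATCH_ROM:
            self.selected = (rom_id == self.rom_id)

    def on_function_command(self, command):
        """Act on a command only when this slave is the addressed one."""
        if self.selected:
            self.received.append(command)


class Master:
    """Single bus master that resets, addresses, and commands slaves."""

    def __init__(self, slaves):
        self.slaves = slaves

    def transact(self, rom_id, command):
        # Reset every slave; the shared line shows presence if any slave answers.
        presence = [slave.on_reset() for slave in self.slaves]
        if not any(presence):
            return False
        for slave in self.slaves:
            slave.on_rom_command(MATCH_ROM, rom_id)
        for slave in self.slaves:
            slave.on_function_command(command)
        return True


sensor, timer = Slave(rom_id=0x01), Slave(rom_id=0x02)
master = Master([sensor, timer])
ok = master.transact(rom_id=0x02, command=0xA5)  # 0xA5 is a made-up command
```

Every slave sees every bus event, but only the one whose address matched acts on the command, which mirrors the listen-then-respond behaviour of the slave interfaces.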
In the second phase of the one-wire protocol design, support for multiple slaves and masters will be implemented (as shown in figure 5). A simple arbiter will be implemented so that the effect of different arbitration schemes can be analyzed. Two modifications to the original specification will be made to improve performance. Firstly, the protocol will be modified to operate at a higher transfer rate, taking advantage of the faster clock frequencies and better signal integrity available on-chip. Secondly, the number of ROM address bits will be reduced to eliminate overhead. As the original intention of the 64-bit address is mainly unique identification, such a measure is not required for on-chip IPs.
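Two aspects of this second phase lend themselves to a quick sketch: the overhead saved by shortening the address, and the difference between arbitration schemes. The code below assumes a transaction made up of an 8-bit command, the address, and the payload, and compares fixed-priority with round-robin arbitration as two example schemes. The frame layout, the function names and the choice of schemes are illustrative assumptions, not a description of the final design.

```python
def address_bits(n_devices):
    """Minimum address width needed to distinguish n_devices on-chip IPs."""
    return max(1, (n_devices - 1).bit_length())


def payload_efficiency(payload_bits, addr_bits, command_bits=8):
    """Fraction of transferred bits that are payload, for one transaction."""
    return payload_bits / (command_bits + addr_bits + payload_bits)


def fixed_priority(requests):
    """Grant the lowest-numbered requesting master, or None if the bus is idle."""
    for master, wants_bus in enumerate(requests):
        if wants_bus:
            return master
    return None


def round_robin(requests, last_grant):
    """Grant the next requesting master after last_grant, wrapping around."""
    n = len(requests)
    for offset in range(1, n + 1):
        master = (last_grant + offset) % n
        if requests[master]:
            return master
    return None


# One byte of payload with the original 64-bit address versus a 16-device address.
eff_64 = payload_efficiency(8, 64)
eff_short = payload_efficiency(8, address_bits(16))

# Masters 0 and 2 both request; master 0 was granted last time.
requests = [True, False, True]
grant_fixed = fixed_priority(requests)
grant_rr = round_robin(requests, last_grant=0)
```

Under these assumptions a single-byte transfer goes from 10% to 40% payload efficiency with the shorter address. In the arbitration example, fixed priority grants master 0 again while round robin moves on to master 2, which is the kind of fairness difference the planned analysis would measure.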
Architecture implementation: Network
ATM networks have high-performance characteristics that include a connection-orientated protocol aimed at providing guaranteed bandwidth. However, complex mechanisms are required for achieving QoS. Therefore, we plan to implement a modified "ATM-like" packet-switched network for on-chip communication. The ATM-like interface should implement simple flow control mechanisms that will allow the coexistence of slow and fast system components. Switch/routing control logic should be used to connect the different components, implementing a packet-switched architecture. Finally, a protocol should implement the establishment of virtual circuits to provide guaranteed bandwidth and QoS.
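A minimal sketch of the two ideas named here, virtual circuits with reserved bandwidth and fixed-size packets, might look as follows. The admission rule (reject a circuit if the link would be oversubscribed), the 4-byte cell payload, and all names are assumptions made for illustration. The proposed network's actual cell format and QoS mechanism are not specified in this paper.

```python
class Link:
    """Admission control for one link: bandwidth is reserved per virtual circuit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = {}  # vc_id -> reserved bandwidth

    def open_circuit(self, vc_id, bandwidth):
        """Admit the circuit only if the link can still guarantee its bandwidth."""
        if vc_id in self.reserved:
            raise ValueError("virtual circuit already open")
        if sum(self.reserved.values()) + bandwidth > self.capacity:
            return False
        self.reserved[vc_id] = bandwidth
        return True

    def close_circuit(self, vc_id):
        """Release the reservation held by vc_id, if any."""
        self.reserved.pop(vc_id, None)


def segment(vc_id, data, cell_payload=4):
    """Split data into cells of at most cell_payload bytes, tagged with the circuit id."""
    return [(vc_id, data[i:i + cell_payload])
            for i in range(0, len(data), cell_payload)]


link = Link(capacity=100)
admitted_a = link.open_circuit(vc_id=1, bandwidth=60)
admitted_b = link.open_circuit(vc_id=2, bandwidth=50)  # would oversubscribe
admitted_c = link.open_circuit(vc_id=3, bandwidth=40)
cells = segment(1, b"abcdefghij")
```

The final cell is left short rather than padded in this sketch. A hardware implementation would more likely pad to a fixed cell length so that switches can be kept simple.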
System analysis
System analysis involves using the implemented architecture in a system with a processor and IP peripherals. Quantitative analysis will be conducted on the cost in logic utilization, power consumption and benchmark performance in different types of systems.
Conclusion
We have presented our initial motivation and methodology for investigating efficient on-chip architectures for rSoC. Although our work is still at an early stage, we note the following outcomes to date:
SoC interconnection fabrics are not necessarily the most appropriate for rSoC systems, because of the relatively higher cost of communication in rSoC systems. There has been little research into alternative rSoC interconnection architectures. Use of alternate interconnection strategies will depend on the availability of generic IP block designs. Serial busses provide a potentially attractive interconnection fabric for low-speed peripherals. Packet-based connection networks provide a more flexible and extensible communications fabric for complex multi-processor systems, but the cost of routers and switches may be problematic. More research, design and testing is planned to provide quantitative results about the relative communication network performance.
