In this study, we discuss communication aspects concerning a segmented bus platform. Placed somewhere midway between the classical system bus and the network on chip approaches, the segmented bus architecture provides certain performance improvements in comparison with the first, while employing a much simpler communication structure and algorithm than those devised for the second. Our implementation strategy targets an FPGA technology. The result comes as a parameterized communication scheme for system on chip designers.
Introduction
The growing diversity of devices within the boundaries of a modern system-on-a-chip (SOC) places an increasing stress on design goals such as performance, power consumption, and communication. Both the system design and performance are limited by the complexity of the interconnection between the different modules and blocks that are integrated into these chips. Single-clocked designs are no longer a solution. Different data transfer speeds are required, as well as parallel transmission. The traditional system bus may not be suitable for such a design, since only one module can transmit at a time. The bus is also slow due to the large capacitive load [5] caused by the interfaces of the modules that are attached to it and by its large physical length. Furthermore, the modern SOC designer assembles the system using ready-made components (IPs), which might not be easily adaptable to different clocking situations.
One solution to the above-mentioned problems is a segmented bus design combined with a globally asynchronous locally synchronous (GALS) [3] system architecture. In this approach, each distinct module of a SOC system works based on an optimized local clock, whereas interactions between those modules are asynchronous. Hence, the routing of the clock signal and the clock skew are no longer system-level design issues; they are limited to a local synchronous segment.
Broadly, this study describes the realization of a synchronous segmented bus system. Each segment can be identified with a different clock domain. Between segments, there are FIFO structures with additional control logic, which we call segment borders. The focus of the paper is on describing the communication flow, from a synchronous segmented bus implementation perspective. The most important aspect resides in the accommodation of two neighboring clock domains, upon which no frequency assumptions can be made. The underlying implementation technology is represented by the ALTERA APEX device family. We thus offer a parameterized segmented bus platform composed of VHDL, schematic and software elements.
Related work
Aspects regarding the design of a segmented bus system were originally analyzed in [11], and detailed in [10]. The intended architecture was, however, a fully asynchronous one. This relaxed several analysis assumptions, as the request-acknowledge handshake signals provided the self-timed synchronization required for data transfers.
Perhaps one of the most illustrative studies on the implementation of segmented bus systems is the work of Jone et al. in [7]. In comparison, we offer an FPGA platform, as opposed to the ASIC approach. Even more importantly, we also bring into consideration distinct clock domains.
A synchronizing buffer between two mutually asynchronous clock domains is presented by Kessels et al. in [8]. However, the presented structure is unidirectional. Using it in our approach, where bidirectional data transfers are necessary, would make the area penalty too expensive (double size).
One of the main elements on which we base our approach is the "glitch protection for unrelated clock sources" device (GPD), as described in [2]. It offers the possibility to choose between two clock signals and it is a tested element in the FPGA community.
The Segmented Bus Architecture
A bus partitioned into two or more segments is called a segmented bus. Each segment acts as a normal bus between the modules that are connected to it and operates in parallel with other segments. Segments can be dynamically connected to each other, in order to establish a connection between modules located in different segments. The concept of segmenting the bus (re)appeared in recent approaches [4, 9], in the context of single-chip devices. Due to the segmentation of this resource, parallel transactions can take place, thus increasing the performance. When parts of the segmented bus are not involved in transactions, or they only transfer intra-segment data, they become isolated from the rest of the bus.
We decide to soften the somewhat classical division of the participants in a bus-based system into masters and slaves. In our view, the processing elements traditionally viewed as slaves will also have the right to request bus access, but this will follow a previous request from a (traditional) master device. In brief, if we categorize data sent by the "master" to a "slave" as a write transaction, the reply activity, i.e. returning the processed data to the master, is a read transaction, initiated by the "slave" and targeting the original requesting "master". This view on the subsystems taking part in the communication along the bus lines simplifies the signaling mechanisms between components and therefore reduces the pre-transfer communication overhead. The module that requires services from another module must present not only the destination address, but also its own address; the latter will be used by the processing component when answering the request. Henceforward, we will refer to the master-slave communication as a module-to-module (m2m) communication. In a one-bus approach, the current m2m connection occupies the whole length of the bus, even though the communicating devices may be physically close to each other. The segmented bus approach allows this kind of connection to occupy only a reduced length of the bus, while other devices can use the remaining segments. A high-level view of a segmented bus architecture is illustrated in Figure 1.
Communication Analysis
Transferring data along the lines of the segmented bus platform is split into two levels: the local, intra-segment and the external, inter-segment communication procedures.
Arbitration. We think of the segmented bus as having a single central arbitration unit (CA) and several local arbitration units (SAs), one for each segment. We employed a simple round robin arbitration [13] at the level of the SAs, and a similar one at the CA level. These modules may be implemented as hardware as well as software units.
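As an illustration of the round robin policy mentioned above, the following is a minimal software sketch of such an arbiter. The class name, the method name and the representation of request lines as a list of booleans are assumptions made for this example; they are not taken from the platform's VHDL sources.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grants one requester at a time, rotating priority after each grant.

    Hypothetical model: the name and interface are illustrative only.
    """

    def __init__(self, num_requesters: int) -> None:
        if num_requesters < 1:
            raise ValueError("need at least one requester")
        self.num_requesters = num_requesters
        # Start so that requester 0 is examined first.
        self._last = num_requesters - 1

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted requester, or None if nobody asks."""
        if len(requests) != self.num_requesters:
            raise ValueError("expected one request line per requester")
        # Scan the requesters starting just after the last one granted.
        for offset in range(1, self.num_requesters + 1):
            candidate = (self._last + offset) % self.num_requesters
            if requests[candidate]:
                self._last = candidate
                return candidate
        return None
```

The same object could model either an SA (requesters are the modules of one segment) or the CA (requesters are the segments), since the text states that both use a similar scheme.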
A request for an inter-segment transfer must present to the CA the address of the target segment. Unlike in the asynchronous solution [10], the read/write mode is not necessary, as the connections will always open from the requesting segment towards the receiving one. The CA further delivers to the selected segments information regarding the imminent occurrence of an inter-segment transfer (the signal Op) and the direction in which data will travel (Dir). The initiating and target segments also receive an indication of their role (the IS and TS lines, respectively), so that we do not need to implement, at the level of each segment, an address decoder for identifying these characteristics. Requests for inter-segment transfers have a higher priority than the local ones. However, the appearance of an inter-segment transfer request (after Op is set) will not interrupt the current job on the requested segment; instead, the transfer will take place immediately after the current transaction finishes.
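The signals that the CA distributes can be sketched as a function of the initiating and target segments. This is only a behavioral model under stated assumptions: segments are numbered from 0 from left to right, Dir high means a left-to-right transfer (as in the rightward example described later in the text), and the names SegmentSignals and ca_signals are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentSignals:
    op: bool   # an inter-segment transfer involves this segment
    dir: bool  # True: data travels left-to-right; False: right-to-left
    is_: bool  # this segment initiates the transfer (IS line)
    ts: bool   # this segment is the target (TS line)


def ca_signals(num_segments: int, initiator: int, target: int) -> List[SegmentSignals]:
    """Compute the per-segment signals for one granted inter-segment transfer."""
    if not (0 <= initiator < num_segments and 0 <= target < num_segments):
        raise ValueError("segment index out of range")
    if initiator == target:
        raise ValueError("an inter-segment transfer needs two distinct segments")
    low, high = sorted((initiator, target))
    rightward = target > initiator
    signals = []
    for seg in range(num_segments):
        involved = low <= seg <= high  # every segment on the path is informed
        signals.append(
            SegmentSignals(
                op=involved,
                dir=involved and rightward,
                is_=(seg == initiator),
                ts=(seg == target),
            )
        )
    return signals
```

Because IS and TS are delivered explicitly, a segment in this model never has to decode the target address to find out its role, which mirrors the design choice described above.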
The SAs enforce a size limit on transfers, both intra- and inter-segment. This limit is common to the whole system.
Segment organization
In [11], the communication was viewed from an asynchronous design perspective: once a certain device received access to the bus for an inter-segment transfer, it had direct visibility of the target. From the synchronous standpoint, this is no longer a valid assumption. The intermediate bus buffers are not only elements used for improving the signal characteristics, but also clocked devices, acting on different clock sources.
We base the development of our synchronous system platform on a "store-and-forward" communication policy.
At each segment border, one has to make use of synchronizers in order to adapt to the different clock domains. Additional synchronizers are used in the communication with the CA. However, these synchronization elements are only used in connection with communication protocols, and do not interfere with the actual transfer of data. There are six signals that need synchronization. Four of them come from the CA: Op, Dir, IS and TS. The other two are the internal border flag FF, which signals that the FIFO is full, and the clock selection signal, selc; they are used for a right-to-left and a left-to-right data move, respectively.
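The text does not detail the internal structure of the synchronizers. A common realization is a chain of flip-flops clocked by the receiving domain, and the sketch below models that under this assumption. The class name and the range check (2 to 4 steps, matching the platform parameter for synchronization steps) are illustrative choices.

```python
class Synchronizer:
    """Behavioral model of a multi-stage synchronizer (assumed flip-flop chain)."""

    def __init__(self, steps: int = 2) -> None:
        if not 2 <= steps <= 4:
            raise ValueError("steps must be between 2 and 4")
        self._stages = [False] * steps

    def tick(self, async_in: bool) -> bool:
        """Advance one edge of the receiving clock and return the output stage."""
        # Shift the sampled input through the chain, one stage per tick.
        self._stages = [async_in] + self._stages[:-1]
        return self._stages[-1]
```

In this model a change on the input becomes visible at the output after as many receiving-clock ticks as there are stages, which is the per-border delay that the latency analysis attributes to synchronization.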
Each of the SAs controls one of the logic blocks (composed of one GPD and some additional logic) that deliver the operating clock signal to the segment border elements. The exception to this rule is one of the two extreme segments: we have chosen to assign each border to the adjacent left segment, for control purposes.
Inter-segment transactions
We describe the communication control and data flow by imagining a transfer towards the right side of the bus, from segment k to segment n (n > k). The reverse situation follows a slightly different procedure, which we will briefly explain at the end of this section. We also offer a generic, not detailed, illustration of the SA operation in Fig. 3.
Suppose that a module in segment k requires access to a module situated in segment n; for this, it starts by raising a request to SA_k, also specifying the address of the target module. Only one such request can be granted at a time in each segment. SA_k forwards the request to the CA. After a while, the CA answers favorably by informing all the segments from k to n that there is a transfer on the way: the Op and Dir signals go high. During this deliberation, the modules within all these segments, including segment k, proceed with their granted tasks. Upon receiving the grant from the CA and allowing the current operation to finalize, SA_k in turn grants the bus access to the requesting module. It also forces the clock delivery block to forward its own clock signal (clk_k) to the border FIFO.
The transfer may now begin, with the granted module filling up the border FIFO between segments k and k + 1.
At the end of the transfer, SA_k switches the delivered clock to clk_{k+1}. Simultaneously, it also raises the flag OF ("finished") to the CA. After this, the border FIFO is isolated from the segment bus lines, and local transfers may start, following an arbitration round; notice that a new inter-segment transfer request may also be forwarded to the CA, but only if the target lies to the left of the segment. The next segment, k + 1, notices the change in the clock signal delivered to the FIFO on its left side through one of the synchronized signals between the two segments (selc_k). The clock of its right border FIFO is clk_{k+1} and, after allowing the current job to finish, segment k + 1 connects the left border FIFO to the right one, in order to transfer the data packet. The procedure of informing the CA and the next segment (k + 2) about the end of the transaction, while also isolating the right-hand side border FIFO from the bus lines, is repeated; after this, segment k + 1 may start another arbitration round. Again, data moves towards the left may be requested.
Figure 3. Segment arbiter operation diagram.
The described control and data transfer procedures are repeated until the packet of data reaches the segment n, where the targeted module may read it, from the corresponding left border FIFO.
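The rightward store-and-forward procedure described above can be summarized by a small behavioral model. It ignores timing and arbitration and only follows the packet through the border FIFOs. The function name, the 0-based segment numbering and the log strings are assumptions made for this sketch; border i is taken to sit between segments i and i + 1 and to be controlled by the segment on its left, as stated in the text.

```python
from collections import deque
from typing import List, Sequence, Tuple


def rightward_transfer(
    packet: Sequence[int], k: int, n: int, num_segments: int
) -> Tuple[List[int], List[str]]:
    """Move a packet from segment k to segment n (n > k), hop by hop."""
    if not 0 <= k < n < num_segments:
        raise ValueError("need 0 <= k < n < num_segments")
    borders = [deque() for _ in range(num_segments - 1)]
    log = []

    # The granted module in segment k fills border FIFO k on its own clock.
    borders[k].extend(packet)
    log.append(f"segment {k}: fill border {k} on clk_{k}")

    for seg in range(k, n):
        # SA_seg hands the filled FIFO over to its right neighbor.
        log.append(f"segment {seg}: switch border {seg} to clk_{seg + 1}, raise OF")
        if seg + 1 < n:
            # The intermediate segment copies its left FIFO into its right FIFO.
            while borders[seg]:
                borders[seg + 1].append(borders[seg].popleft())
            log.append(
                f"segment {seg + 1}: copy border {seg} to border {seg + 1} on clk_{seg + 1}"
            )

    # The target module reads the packet from its left border FIFO.
    received = list(borders[n - 1])
    borders[n - 1].clear()
    log.append(f"segment {n}: target reads border {n - 1}")
    return received, log
```

For a transfer from segment 0 to segment 2 on a three-segment bus, the log lists five steps: one fill, two clock switches, one FIFO-to-FIFO copy and the final read.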
In the case of a leftward transfer of data (say, from segment n to segment k), once the CA raises the Op signal (Dir = '0'), the segments k to n - 1 will switch the clock of the respective border FIFOs to match the clock signal of the segment on the right. The filling of the FIFOs, from the right, is signaled by FF, triggering the change of the delivered clock signals, and then the transfer towards the left, once the current operation on the respective segment is finished.
Throughput
It is easy to derive, from the above description of the data transfers, the maximal latency that characterizes the segmented bus structure. The computation, considered here in clock ticks, has to take into account the number of data items that form a packet (PL), the transfer of the packet from one segment end to the other, the synchronization delay (SSteps) and the number of segments to be traversed (NRSegs). We arrive at the following formula:
The first term of (1) sums the number of ticks necessary for filling the border FIFOs on the way from the initiating master to the target; the second term represents the synchronization of signals between every two adjacent segments, on the way from segment k to segment k + NRSegs. The third term basically gives the variation of the latency formula. Relation (1) considers the worst-case scenario, established when every segment in the transmission chain starts a local job at the very same moment when the synchronized information from the previous segment in the chain has reached the local SA. Therefore, the data packet has to wait for a complete local transaction to finish, before continuing its trip towards the target segment.
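As a rough illustration of how the three terms described above combine, the sketch below computes best-case and worst-case latency bounds. The coefficients are assumptions made for this example and are not the exact form of relation (1): one data item is moved per clock tick, each traversed border costs one FIFO fill (PL ticks) and SSteps synchronization ticks, and the waiting time per segment ranges from 0 to the length of one complete local transaction (taken here as PL by default). The function and parameter names are hypothetical.

```python
from typing import Optional, Tuple


def latency_bounds(
    pl: int, ssteps: int, nr_segs: int, max_local_wait: Optional[int] = None
) -> Tuple[int, int]:
    """Return (best, worst) latency in clock ticks under the stated assumptions."""
    if pl < 1 or ssteps < 0 or nr_segs < 1:
        raise ValueError("invalid parameters")
    if max_local_wait is None:
        # Assumption: a local transaction is at most one packet long.
        max_local_wait = pl
    fill = pl * nr_segs                   # filling the border FIFOs on the way
    sync = ssteps * nr_segs               # synchronization between adjacent segments
    worst_wait = max_local_wait * nr_segs # every segment has just started a local job
    return fill + sync, fill + sync + worst_wait
```

The difference between the two bounds corresponds to the third term, which captures the variation caused by local transactions that happen to be in progress when the packet arrives.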
Platform Parameters
The segmented bus platform is presented as an IP module with several parameters meant for accommodating the structure to a variety of requirements: the number of segments (2 to 4), the bus width (2 to 32), the transfer limit (2 to 16), and the number of synchronization steps (2 to 4). The first three parameters are global characteristics of the bus system, while the last one may be specified for each segment, in order to efficiently trim the cooperation between adjacent segments.
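The parameter ranges listed above can be captured in a small configuration record with validation. This is a sketch in software, not the platform's actual VHDL generics; the class and field names are assumptions, and only the numeric ranges come from the text.

```python
from dataclasses import dataclass
from typing import Tuple


def _check(name: str, value: int, low: int, high: int) -> None:
    """Raise ValueError if value lies outside the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be in [{low}, {high}], got {value}")


@dataclass(frozen=True)
class BusParameters:
    num_segments: int             # global: 2 to 4
    bus_width: int                # global: 2 to 32
    transfer_limit: int           # global: 2 to 16
    sync_steps: Tuple[int, ...]   # one value per segment, each 2 to 4

    def __post_init__(self) -> None:
        _check("num_segments", self.num_segments, 2, 4)
        _check("bus_width", self.bus_width, 2, 32)
        _check("transfer_limit", self.transfer_limit, 2, 16)
        if len(self.sync_steps) != self.num_segments:
            raise ValueError("sync_steps needs one entry per segment")
        for steps in self.sync_steps:
            _check("sync_steps entry", steps, 2, 4)
```

For example, `BusParameters(3, 32, 10, (2, 2, 2))` describes a three-segment bus with two synchronization steps in every segment, while a configuration with five segments is rejected.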
Simulation Results
We have performed the simulation of our platform in Altera's Quartus environment. We have considered a three-segment bus, and masters having to transfer a total of 1000 data items over the bus, either within the same segment or to the other ones. The assumed load is presented in Fig. 4; the parameters of relation (1) are, respectively: PL = 10, SSteps = 2, NRSegs ∈ {1, 2}. The same setup applies to a single bus framework. The maximum frequency, established after the synthesis process, was 124 MHz. The total communication time for the single bus construction sums up to 3000 clock ticks, without considering arbitration time. Under the same assumption, we obtain a figure of 2880 ticks for the actual transfers on the segmented bus, in the most unfavorable situation, i.e., when the last term in relation (1) is maximal. These results are captured in Fig. 4. Clearly, they are not satisfactory, even though we obtain a slight improvement (about 4%). The explanation comes from the fact that the parallelism of the platform was not used to its full potential: there were many idle periods on one segment while the other two were communicating. The advantage of the approach is visible in the results corresponding to the 'Relative load' columns of Fig. 4: we replaced the mentioned idle periods with local activity. Within the same budget of ticks we obtain a 300% increase in the number of local transfers.
Even compared to the asynchronous realization of [11, 10], the above figures are outstanding. Considering also the reduced number of communication signals, these characteristics show extended improvements in the overall performance.
Conclusions
We have presented communication issues addressing the realization of a synchronous segmented bus platform. Performance-wise, the platform is placed halfway between the classical system bus and the network on chip approaches [6]. It provides certain performance improvements in comparison with the first, and employs a much simpler communication structure than those devised for the second.
The information extracted from the simulation runs obliges us to further improve our approach through a detailed mathematical investigation. Such an analysis is currently under study [12]. The success of a segmented bus implementation depends on the profile of the accesses between different hardware units, on the organization of the segments, and on the assignment of the units to segments.
