In this study, we discuss arbitration aspects concerning a segmented bus platform for SOC, and analyze a software implementation of the related procedures. Placed midway between the classical system bus and the network-on-chip approaches, the segmented bus architecture provides certain performance improvements over the former, while employing a much simpler communication structure and algorithm than those devised for the latter. Our implementation strategy targets an FPGA technology.
1. Introduction
The growing diversity of devices within the boundaries of a modern system-on-a-chip (SOC) places increasing stress on design goals such as performance, power consumption and communication. Both the system design and its performance are limited by the complexity of the interconnection between the different modules and blocks that are integrated into these chips. Single-clock designs are no longer a solution. Different data transfer speeds are required, as well as parallel transmission. The traditional system bus may not be suitable for such a design. Only one module can transmit at a time, and the bus is slow due to the large capacitive load caused by the interfaces of the modules attached to it and to its large physical length. Additionally, the modern SOC designer assembles the system from ready-made intellectual property components, which might not be easily adaptable to different clocking situations.
One solution to the above-mentioned problems is a segmented bus design combined with a globally asynchronous locally synchronous (GALS) [1] system architecture. In this approach, each distinct module of a SOC system works on an optimized local clock, whereas interactions between those modules are asynchronous. From another perspective, the hardware platform is usually meant to support, or interact with, specific and increasingly complex software applications. Appropriate interfaces, drivers and translation procedures eventually have to be built in order to efficiently and correctly accommodate the cooperation between the hardware and software modules.
This study describes the software implementation of the arbitration units for a synchronous segmented bus platform. Generally, each segment can be identified with a different clock domain. Between segments, there are FIFO-like structures with additional control logic, which we call segment borders. Most of the communication and processing tasks should concern devices placed within the same segment borders. Their access to the shared resources is analyzed by local arbitration elements. Whenever inter-segment cooperation is required, a central arbitration unit analyzes the request and delivers the expected answer. We view both the local and the central arbitration units as software modules cooperating with a much more relaxed hardware realization of communicating modules and FIFO logic.
The underlying implementation technology is represented by the ALTERA APEX device family.
Related work. Aspects regarding the design of a segmented bus system were originally analyzed in [6]. The intended architecture was, however, a fully asynchronous one, thus relaxing several analysis assumptions: the request-acknowledge handshake signals provided the self-timed synchronization required for data transfers.
Perhaps one of the most illustrative studies on the implementation of segmented bus systems is the work of Jone et al. in [4]. In comparison, we offer an FPGA platform, as opposed to the ASIC approach. Moreover, based on the offered facilities, it becomes possible to select which parts of the design are implemented as hardware and which as software elements. We selected the local and the central arbiters to run as software modules, and let the other participants be implemented in hardware.
A synchronizing buffer between two mutually asynchronous clock domains is presented by Kessels et al. in [5]. However, the presented structure is unidirectional. Using it in our approach, where bidirectional data transfers are necessary, would carry too high a penalty in area.
2. The Segmented Bus Architecture
A segmented bus is a bus partitioned into two or more segments. Each segment acts as a normal bus between the modules that are connected to it and operates in parallel with the other segments. Segments can be dynamically connected to each other, in order to establish a connection between modules located in different segments. Due to the segmentation of this resource, parallel transactions can take place, thus increasing the performance. When parts of the segmented bus are not involved in transactions, or they only transfer intra-segment data, they become isolated from the rest of the bus.
In our approach, we abandon the somewhat classical separation of bus-system participants into masters and slaves. Therefore, any of the modules placed on the bus has the responsibility of requesting access to the bus, although, internally, this decision could come as an answer to a previous solicitation from another module. Thus, we simplify the signaling mechanisms between components and reduce the pre-transfer communication overhead. On the other hand, the module that requires services from another module must present not only the destination address, but also its own address; the latter is used by the processing component when answering the request.
In a one-bus approach, the current module-to-module connection occupies the whole length of the bus, even when the communicating devices are physically close to each other. The segmented bus approach allows this kind of connection to occupy only a reduced length of the bus, while other devices can use the remaining segments. A high-level view of the segmented bus architecture is given in the accompanying figure.
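To make the benefit of segmentation concrete, the following C sketch computes the contiguous range of segments that a transfer occupies and checks whether two transfers could proceed at the same time. It is only an illustration under stated assumptions: the `transfer_t` type, its field names, the helper functions and the value of `NUM_SEGMENTS` are hypothetical and are not taken from the platform itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SEGMENTS 3  /* assumed number of segments, numbered left to right */

/* Hypothetical request descriptor: the requester presents both the
 * destination segment and its own (source) segment. */
typedef struct {
    unsigned src_seg;  /* segment of the requesting module */
    unsigned dst_seg;  /* segment of the target module     */
} transfer_t;

/* Leftmost segment occupied by the transfer. */
static unsigned first_seg(transfer_t t)
{
    return t.src_seg < t.dst_seg ? t.src_seg : t.dst_seg;
}

/* Rightmost segment occupied by the transfer. */
static unsigned last_seg(transfer_t t)
{
    return t.src_seg > t.dst_seg ? t.src_seg : t.dst_seg;
}

/* Two transfers can run in parallel only if the segment ranges
 * they occupy do not overlap. */
static bool can_run_in_parallel(transfer_t a, transfer_t b)
{
    assert(last_seg(a) < NUM_SEGMENTS && last_seg(b) < NUM_SEGMENTS);
    return last_seg(a) < first_seg(b) || last_seg(b) < first_seg(a);
}
```

Under these assumptions, an intra-segment transfer in segment 0 and a transfer between segments 1 and 2 do not overlap, whereas a transfer from segment 2 to segment 0 occupies the whole bus.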
3. Bus Communication and Arbitration
Transferring data along the lines of the segmented bus platform is split into two levels: the local, intra-segment and the external, inter-segment communication procedures.
3.1 Segment organization. We view the segmented bus as having a single central arbitration unit (CA) and several local arbitration units (SA), one for each segment. We employed a simple round robin arbitration policy [SI at the level of the SAs, and a similar one at the CA level.
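A round robin policy of the kind mentioned above can be sketched in C as follows. This is a minimal illustration, not the arbiter code used on the platform: the function name, the bit-vector encoding of the requests and the constant `NUM_DEVICES` are assumptions (the value four matches the per-segment device count of the setup analyzed in Section 4).

```c
#include <assert.h>

#define NUM_DEVICES 4   /* assumed: four devices per segment */

/* Returns the index of the next requesting device after `last`,
 * scanning in circular order, or -1 if no device is requesting.
 * Bit i of `requests` is set when device i asks for the bus. */
static int round_robin_next(unsigned requests, int last)
{
    assert(last >= 0 && last < NUM_DEVICES);
    for (int step = 1; step <= NUM_DEVICES; step++) {
        int candidate = (last + step) % NUM_DEVICES;
        if (requests & (1u << candidate))
            return candidate;
    }
    return -1;
}
```

Because the scan starts just after the previously served device, that device is considered last in the next round, so a continuously requesting device cannot starve the others.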
Further, the development of our synchronous system platform is based on a store-and-forward communication policy. At each segment border, one has to make use of synchronizers in order to adapt to the different clock domains. Additional synchronizers are used in the communication with the CA. There are six signals that need synchronization. Four of them come from the CA: OP, Dir, IS and TS. The signaling that the FIFO is full comes through the internal border flag, FF, or the clock selection signal, selc, for a right-to-left or a left-to-right data move, respectively. Each of the SAs controls one of the logic blocks that deliver the operating clock signal to the segment border elements. The exceptions to this rule are the extreme segments. We have chosen to assign each border to the adjacent left segment, for control purposes. An average-detail representation of the segment structure is given in Fig. 2.
Fig. 2. Segment components.
3.2 Arbitration. At the local level, the SA has the responsibility to give access to the bus to the modules asking for a transfer. The SAs also ensure that, for both intra- and inter-segment transfers, a specified limit on the transfer size is not exceeded. This limit is a common characteristic of the whole system. The modules present to the SA the request signal accompanied by the segment address of the target. The decision procedure starts by analyzing, based on the value of the requested segment address, whether the request concerns a destination within the same segment limits or an outside recipient. In the latter case, the request is forwarded to the CA. However, only one request from each segment can be forwarded to the CA. While the SA awaits the answer from the CA, the inter-segment requesting master is placed on hold, and any other master requesting access to outside-segment resources is ignored. Most of the characteristics of the SA operation are captured in Fig. 3.
[...] from the requesting segment towards the receiver one. Actually, the CA only informs the segments that compose a virtual continuous connection that some data will be present at their borders, by setting the OP signal (operational), and of the direction from which it will come: the Dir signal is high for a left-to-right connection and low for the reverse sense.
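The local decision procedure of Section 3.2 can be sketched in C as below. The sketch only covers the classification of a request as local, forwarded to the CA, or ignored; the type names, field names and the function `sa_classify` are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REQ_LOCAL, REQ_FORWARDED, REQ_IGNORED } sa_decision_t;

/* Hypothetical SA bookkeeping for inter-segment requests. */
typedef struct {
    unsigned own_seg;     /* address of the segment this SA serves     */
    bool     ca_pending;  /* true while one request waits for the CA   */
    int      held_device; /* device placed on hold, or -1 if none      */
} sa_state_t;

/* Classifies a request from `device` targeting segment `target_seg`. */
static sa_decision_t sa_classify(sa_state_t *sa, int device, unsigned target_seg)
{
    assert(sa && device >= 0);
    if (target_seg == sa->own_seg)
        return REQ_LOCAL;        /* arbitrated inside the segment        */
    if (sa->ca_pending)
        return REQ_IGNORED;      /* only one forwarded request at a time */
    sa->ca_pending  = true;      /* forward to the CA ...                */
    sa->held_device = device;    /* ... and place the requester on hold  */
    return REQ_FORWARDED;
}
```

In this sketch a second inter-segment request is simply ignored while one is pending, so the requesting device would have to keep its request asserted until the SA is free to forward it.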
The values of these signals help the logic that controls the FIFO-like [...] corresponding line OP. If all these lines are low, they are set high, signaling to the segments that there will be an incoming request in the specified direction. At the same moment, based on the values of the initiating and the target segments, the CA also establishes the value of the Dir signal. The inherent need for synchronization with the arbiter is handled at the hardware level. The initiating segment receives information on the acceptance of its request through the signal IS (initiating segment). After the job currently taking place within the segment completes, the respective SA gives the highest priority to the master kept on hold, which is granted the ownership of the bus. The terminal segment, from the point of view of the granted transfer, likewise identifies its position in the communication chain from the value of the TS line (terminal segment). Another option would have been to let each of the segments on the path decode the target address and, if it does not match their own, let the data pass through. We considered that the presence of the TS signal significantly reduces the complexity of the control logic at the segment level.
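The granting step described above can be summarized by the following C sketch. It assumes three segments numbered from left to right, so that a transfer from a lower to a higher segment number is a left-to-right one; the structure `ca_lines_t` and the function `ca_try_grant` are hypothetical names, and synchronization is left out because it is handled in hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SEGMENTS 3   /* assumed: three segments, numbered left to right */

/* Hypothetical image of the lines driven by the CA, one per segment. */
typedef struct {
    bool op[NUM_SEGMENTS];  /* OP: segment reserved for a transfer */
    bool dir[NUM_SEGMENTS]; /* Dir: true means left-to-right       */
    bool is[NUM_SEGMENTS];  /* IS: segment initiates the transfer  */
    bool ts[NUM_SEGMENTS];  /* TS: segment terminates the transfer */
} ca_lines_t;

/* Grants a transfer from segment `init` to segment `target` only if
 * every segment on the path has its OP line low. */
static bool ca_try_grant(ca_lines_t *ca, int init, int target)
{
    assert(init >= 0 && init < NUM_SEGMENTS);
    assert(target >= 0 && target < NUM_SEGMENTS && target != init);
    int lo = init < target ? init : target;
    int hi = init < target ? target : init;

    for (int s = lo; s <= hi; s++)
        if (ca->op[s])
            return false;          /* a segment on the path is busy */

    for (int s = lo; s <= hi; s++) {
        ca->op[s]  = true;         /* reserve the whole path        */
        ca->dir[s] = init < target;
    }
    ca->is[init]   = true;         /* inform the initiating segment */
    ca->ts[target] = true;         /* inform the terminal segment   */
    return true;
}
```

With the TS line set explicitly, the segments on the path do not need to decode the target address, which is the simplification argued for above.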
Communication between Segment and Central Arbiters. The fact that the segment and central arbitration units operate on different clock signals has an important impact on the communication performance, that is, on the speed at which requesting modules receive ownership of the bus. As we detail below, this influences the relative moment, with respect to the current activities in each segment, when an inter-segment transfer may actually start [7]. The internal granting activity of the CA is illustrated in Fig. 5. For every requesting segment j, the CA first checks if the respective segment finished an inter-segment transaction, signaled by OPF = '1' (through its synchronized version, SOPF). If this is the case, the corresponding OP line is released (OP <= '0').
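The release phase at the start of each CA round can be pictured with the short C sketch below. The array-based representation of the SOPF and OP lines and the function name `ca_release_finished` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SEGMENTS 3   /* assumed number of segments */

/* sopf[j]: synchronized OPF flag of segment j; op[j]: its OP line.
 * Releases OP for every segment that reports a finished transfer
 * and returns how many lines were released. */
static int ca_release_finished(const bool sopf[NUM_SEGMENTS], bool op[NUM_SEGMENTS])
{
    assert(sopf && op);
    int released = 0;
    for (int j = 0; j < NUM_SEGMENTS; j++) {
        if (sopf[j] && op[j]) {
            op[j] = false;   /* corresponds to OP <= '0' */
            released++;
        }
    }
    return released;
}
```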
After establishing in this way which segments may be reserved for a new inter-segment transfer, the CA checks the new requests, browsing the corresponding vector indexed by j. For each of these requests, the CA ensures that, on the trajectory from the initiator towards the target, all the segments are ready for an inter-segment communication, by determining the value of the 
4. Implementation Aspects
Our choice for implementing the described arbitration units is ALTERA's NIOS processor, a facility offered by the ALTERA EXCALIBUR package. The code is written in the C language. In the following, we analyze the operation of both local and central arbiters for a setup containing three segments. Each of these is composed of a local SA (implemented as a single NIOS processor) and four processing devices. The CA also runs on a dedicated processor. As the details of a local communication are similar to those of any other bus system, we focus on the analysis of a right-to-left data transfer. The initiating segment is segment 2, while the target one is segment 0.
Fig. 6. Initiation of inter-segment transfer.
4.1 Local operations and initiation of an inter-segment transfer. In Fig. 6, we depict the local operation on segment 2 (address "10"). Of interest to us is device 1, which receives from the segment SA the grant for operation on the line ack-r[1]. In a first phase, this unit operates locally, after which it changes the target address value to "00", thus requiring the services of another device, placed in segment 0. The corresponding request to the CA is forwarded by the local arbiter, while, visibly, segment units 2 and 3 obtain the right to operate within the segment borders.
4.3 Initiator segment operation. When the corresponding OP line is set, the initiating segment (in our analysis, segment 2) was in the middle of transferring intra-segment data, driven by device 0. After this unit completed its operation, the OP signal is detected and matched with the device asking for the inter-segment connection (device 1), which obtains the ownership of the bus. The grant-left signal goes high, informing the border logic that there is available data on the bus. After filling the border buffer, device 1 lowers the request signal, after which the SA sets the OPF line high. When this is noticed by the CA, the OP line is released, followed by a release of OPF, too.
[...] activities are carried on until the FF signal goes high, meaning that the right segment border FIFO is full. Hence, the multiplexer selection (bussel) is set to 'Z' and the selection line (selc) goes high, thus delivering the segment's own clock signal to the border logic; the right and left border FIFOs are given the right to transfer / receive (grant-right = '1'; grant-left = '1'). At the end of the transaction, the bus source selector goes low, the grants are removed and the OPF signal of the segment goes high; this is followed by the release of the OP line by the CA. In a similar manner, the target segment (segment 0) operates in order to receive the right-border data. The difference resides in the fact that the target is not another border FIFO, but an internal device.
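The OP/OPF exchange that closes each of the segment operations above behaves like a four-phase handshake between the CA and an SA. The following C sketch models only that exchange, under the assumption that each side is polled in turn; `handshake_t`, `sa_step` and `ca_step` are hypothetical names, and grants, clock selection and FIFO handling are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of lines shared by the CA and one SA. */
typedef struct {
    bool op;   /* set by the CA: segment is part of a granted transfer */
    bool opf;  /* set by the SA: its part of the transfer has finished */
} handshake_t;

/* One polling step of the SA. */
static void sa_step(handshake_t *h, bool transfer_done)
{
    assert(h);
    if (h->op && transfer_done)
        h->opf = true;    /* signal completion to the CA       */
    else if (!h->op)
        h->opf = false;   /* CA released OP: release OPF, too  */
}

/* One polling step of the CA for this segment. */
static void ca_step(handshake_t *h)
{
    assert(h);
    if (h->opf)
        h->op = false;    /* completion noticed: release OP    */
}
```

Starting from OP high and OPF low, the sequence "SA finishes, CA polls, SA polls" returns both lines to low, which matches the order of events described for the initiating segment.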
5. Discussion and Conclusions
We have presented arbitration issues addressing the realization of a synchronous segmented bus platform. We also described the protocol governing communication, from the arbitration point of view. Implementing the arbiters as software modules was aimed at a simpler management of the design and an increased modularity. Most importantly, this processor-based approach also gives us the possibility to bring the platform closer to the application level, an important subject for further developments, which would bring us closer to the network-on-chip paradigm [3]. Compared to the hardware implementation [7], however, the software arbiters offer a much lower operating speed: the average period between two arbitration rounds, at the SA level, amounts to 270 clock cycles, depending on the relative numbering of the requesting devices within the segment. It thus becomes important what happens during this period of time, that is, when incoming requests from the CA or from the borders arrive in between arbitration rounds.
One fast solution is to use interrupts; their appropriate utilization is also a subject of forthcoming studies.
Fig. 9. Middle segment operation.
