Abstract: We approach, in this study, the realization of a segmented-bus platform in which the higher-level arbitration unit, dealing with requests for inter-segment communication, is removed from the system. An alternative communication protocol, built as a token-passing scheme, provides an autonomous solution.
I. INTRODUCTION
Following the rapid technological evolution, complexity has become one of the most constraining aspects in the design of modern digital systems. Power consumption and timing issues add to the difficulties in the realization of system-on-chip applications, where many IPs (Intellectual Property blocks) such as microprocessor cores, memories, DSP processors and peripheral devices are placed together on a single die. These modules communicate, most often, by means of a shared resource, the on-chip bus. The increasing complexity of the individual devices, the increasing demand for higher bandwidth on the bus lines and an operating frequency hitting new limits with almost every new design place the bus concept in the unpleasant position of being the bottleneck of the communication system. In addition, the diversity of IP producers and their varying technological approaches bring new aspects into the development flow, such as multiple clock domains.
One of the proposed solutions to this problem is the segmented bus. A high-level view of a segmented-bus architecture is illustrated in Fig. 1, while Fig. 2 illustrates an architectural view of one segment.
A. Communication Analysis
We start by exploring the current principles that govern the operation of the segmented-bus platform. Transferring data [...] the OP flag, the logic controlling the segment border issues a request to the following segment in line.
Each of the SAs controls one of the logic blocks (composed of one glitch-protection device [1] and some additional logic) that deliver the operating clock signal to the segment border elements. For control purposes, we have chosen to assign each border to the adjacent left segment. The exception to this rule is therefore one of the extreme segments: the rightmost segment has no border of its own to control.
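The border-ownership rule can be stated as a tiny mapping. This Python sketch assumes a numbering that is not in the original: segments 0 to n-1 from left to right, and border i lying between segments i and i+1. The function names are hypothetical.

```python
from __future__ import annotations


def border_owner(border: int, num_segments: int) -> int:
    """Index of the segment whose SA controls the clock logic of `border`.

    Assumed numbering: segments 0..num_segments-1 from left to right,
    border i sits between segment i and segment i + 1.
    """
    if not 0 <= border < num_segments - 1:
        raise ValueError("no such border")
    return border  # each border is assigned to the adjacent left segment


def borders_controlled(segment: int, num_segments: int) -> list[int]:
    """Borders whose clock-delivery logic is controlled by this segment's SA."""
    return [b for b in range(num_segments - 1)
            if border_owner(b, num_segments) == segment]


if __name__ == "__main__":
    # Four segments: the rightmost one controls no border.
    print([borders_controlled(s, 4) for s in range(4)])  # [[0], [1], [2], []]
```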
CENTRAL ARBITRATION
The communication between the CA and the segments of the system follows the protocol visualized in Fig. 3.
While the flags signaling a forthcoming inter-segment transfer (OP and the accompanying information regarding the direction and the ends of the transfer) are set simultaneously for all the participating segments, they are reset in a cascaded manner. Thus, after finishing the current local transfer, the initiating segment starts operating on the requested and granted inter-segment transfer. It fills the corresponding border buffer (as specified by the DIR flag) with the data to be transferred towards the destination segment. The termination of this activity is signalled to the CA by raising its own OF flag. Shortly afterwards, the CA responds by resetting the respective OP line, keeping the other ones, on the way to the target, set.
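To make the "set together, reset in cascade" behaviour concrete, the following Python sketch models only the CA-side flag handling. The class `CentralArbiter`, the helper `run_transfer` and the linear segment numbering are hypothetical; the sketch assumes the participating segments finish their parts in path order and ignores timing and the border buffers themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CentralArbiter:
    """Toy model of how a CA handles OP flags (names are hypothetical)."""
    op: dict[int, bool] = field(default_factory=dict)  # OP flag per segment
    log: list[str] = field(default_factory=list)

    def grant(self, path: list[int]) -> None:
        # The OP flags of all participating segments are set simultaneously.
        for seg in path:
            self.op[seg] = True
        self.log.append(f"set OP for {path}")

    def segment_done(self, seg: int) -> None:
        # A segment signalled the end of its part of the transfer: only its
        # OP line is reset, the others on the way to the target stay set.
        self.op[seg] = False
        still_set = sorted(s for s, v in self.op.items() if v)
        self.log.append(f"reset OP of {seg}; still set: {still_set}")


def run_transfer(ca: CentralArbiter, src: int, dst: int) -> None:
    """Drive one inter-segment transfer from segment src to segment dst."""
    step = 1 if dst > src else -1          # plays the role of the DIR flag
    path = list(range(src, dst + step, step))
    ca.grant(path)
    for seg in path:                       # cascaded resets, source first
        ca.segment_done(seg)


if __name__ == "__main__":
    ca = CentralArbiter()
    run_transfer(ca, 1, 3)
    for line in ca.log:
        print(line)
```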
Disadvantages of central arbitration. The centralized arbitration unit acts as an omniscient director over all the segmented-bus transfers. If the information about the tasks to be done is complete and accurate, the arbiter algorithms can choose the optimum path.
Unfortunately, there is not enough time to run complicated optimization algorithms adapted to the immediate, concrete task. The major advantage of centralized arbitration is therefore poorly exploited, because the decision must be taken quickly. Moreover, the links between the CA and the segments are constrained to a star topology, which is inefficient for a bus technology that permits only a small length increase for each added segment.
The major disadvantage of the CA, however, is the complexity of the decisions to be made simultaneously for the whole bus.
All segmented portions of the bus send information to be processed and need the appropriate responses as soon as possible. The CA has to do too much work in too little time.
These are the reasons that led us to consider other ways to administer the bus segments.
IV. NEW ARCHITECTURE
It comes naturally to consider that, if there is a strong correlation between the activities performed in the different segments composing a segmented-bus platform, the order in which the corresponding requests are served at the CA level can be expressed as a (machine-code) programme. The same applies to the local arbitration units, but this is not the focus of the present study.
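As a minimal sketch of such a programme, assume the CA steps cyclically through a fixed list of segment indices and grants the bus only when the named segment has a pending request. The function `programmed_grants` and its parameters are hypothetical illustrations, not the CA described by the authors.

```python
from __future__ import annotations

from itertools import cycle, islice


def programmed_grants(programme: list[int], pending: set[int], steps: int) -> list[int]:
    """Serve inter-segment requests in the order fixed by a static programme.

    `programme` is an assumed, precomputed sequence of segment indices that
    reflects the known correlation between segment activities. Requests from
    segments not named at the current step simply wait.
    """
    waiting = set(pending)            # do not mutate the caller's set
    granted: list[int] = []
    for seg in islice(cycle(programme), steps):
        if seg in waiting:
            granted.append(seg)
            waiting.discard(seg)
    return granted


if __name__ == "__main__":
    print(programmed_grants([0, 2, 1, 2], {1, 2}, 4))  # [2, 1]
    print(programmed_grants([0, 1], {2}, 10))          # []
```

The second call illustrates the weakness raised in the next paragraph: a segment that the programme never names is never served.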
If such a strong correlation is not characteristic of a specific application that also targets a segmented-bus implementation, then we run into fairness/starvation problems at the level of the central arbitration unit. Certain measures must be enforced so that each request is guaranteed eventual service. Another solution to this situation is described in the following paragraphs. It is inspired by the looped container virtual circuits of [6], which are used to implement guaranteed bandwidth and latency for any possible virtual circuit (transactions between a source and a destination for the package) in the context of the Nostrum network-on-chip system.
A. Container-Based Approach
Suppose that we consider a virtual container, of the size of the bus package, traveling back and forth on the bus lines, through the segment borders, from one end of the bus to the other. The container is accompanied in its excursions by information concerning its status (empty or full), its presence at the border location ("here") and the direction in which it travels. Conveniently, these three information items can be coded on a two-bit additional bus. The presence of this virtual container at any of the segment borders is available for reading to every SA. If a given (internal-segment) SA identifies the presence of the container at one of its delimiting borders, identifies that the container is empty, and has a request targeting a module in the direction in which the container moves, it will "hold" the container at that location until it is ready to fill it with the transaction data. The operation proceeds in the usual manner. Upon completion, the data and the status of the container are forwarded to the next segment border. The following SA notices the presence of the container and, whenever possible (after finishing any local activity), forwards the package in the corresponding direction. When it arrives at the target specified by the initiating segment, the container is unloaded, after which the corresponding SA sends it further with the status changed to "empty".
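The per-SA decision described above can be sketched as a pure function. This is an illustration under stated assumptions, not the authors' logic: the `Container` fields, the `target` field and the numbering in which segment s is delimited by borders s-1 and s are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

RIGHT, LEFT = +1, -1


@dataclass
class Container:
    border: int                   # border where the container currently is
    direction: int                # RIGHT or LEFT
    full: bool = False            # status: empty or full
    target: Optional[int] = None  # hypothetical destination field while full


def borders_of(seg: int) -> tuple[int, int]:
    # Assumed numbering: segment s is delimited by borders s - 1 (left) and s (right).
    return seg - 1, seg


def sa_decide(seg: int, c: Container, request_dir: Optional[int]) -> str:
    """What the SA of segment `seg` does when it samples the container."""
    if c.border not in borders_of(seg):
        return "ignore"          # container is not at one of our borders
    if c.full and c.target == seg:
        return "unload"          # we are the target: unload, then send it on empty
    if not c.full and request_dir == c.direction:
        return "hold_and_load"   # empty and moving our way: hold it and fill it
    return "forward"             # pass it on after any local activity
```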
Whenever the container reaches one end of the bus, its direction is changed. Each segment, through its SA, is responsible for one of its two neighboring borders, depending on the direction of the container.
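The resulting back-and-forth movement can be traced with a short Python sketch. It assumes borders numbered 0 to n-2 from left to right, one border hop per step, and a turnaround that keeps the container at the end border for one step with its direction flipped; none of these timing details come from the original.

```python
from __future__ import annotations


def shuttle(num_segments: int, steps: int) -> list[tuple[int, str]]:
    """(border, direction) positions visited by a single container.

    Assumed model: borders 0..num_segments-2 from left to right, one hop per
    step, and a one-step turnaround at either end of the bus.
    """
    last = num_segments - 2
    if last < 0:
        raise ValueError("a segmented bus needs at least two segments")
    border, direction = 0, +1
    visited: list[tuple[int, str]] = []
    for _ in range(steps):
        visited.append((border, "R" if direction > 0 else "L"))
        nxt = border + direction
        if nxt < 0 or nxt > last:
            direction = -direction   # end of the bus reached: turn around
        else:
            border = nxt
    return visited


if __name__ == "__main__":
    print(shuttle(4, 7))
    # [(0, 'R'), (1, 'R'), (2, 'R'), (2, 'L'), (1, 'L'), (0, 'L'), (0, 'R')]
```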
The top-level view of the platform is represented in Fig. 4. The resulting impact on the segment architecture is visualized in Fig. 5.
B. Analysis
The container cycling between the limits of the segmented bus actually resembles, in a virtual manner, a performance-oriented round-robin algorithm, as it could have been described in the CA: the chance of transferring data to external modules is offered sequentially to all the segments, but the first segment able to take up such an offer is served. Fairness is nevertheless improved, because once the target of an inter-segment transfer has been reached and the data has been unloaded, the target itself may win access to the bus lines if it has appropriate pending requests (that is, requests for initiating a transfer in the same direction as the one specified by the info field of the container).
In order to take advantage of such possibilities, the arbitration policies at the SA level should allow two inter-segment requests to be preserved, one for each direction.
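A two-slot request store of this kind might look like the following Python sketch. The class `SegmentRequests` and its methods are hypothetical; the sketch assumes the direction of a request is derived by comparing the target segment index with the segment's own index.

```python
from __future__ import annotations

from typing import Optional

RIGHT, LEFT = +1, -1


class SegmentRequests:
    """Hypothetical SA-side store holding at most one pending
    inter-segment request per direction."""

    def __init__(self) -> None:
        self._slots: dict[int, Optional[int]] = {RIGHT: None, LEFT: None}

    def post(self, target: int, own_index: int) -> bool:
        """Queue a request towards `target`; False if that direction is busy."""
        if target == own_index:
            raise ValueError("not an inter-segment request")
        direction = RIGHT if target > own_index else LEFT
        if self._slots[direction] is not None:
            return False              # slot occupied: local arbiter retries later
        self._slots[direction] = target
        return True

    def take(self, direction: int) -> Optional[int]:
        """Pop the request matching the container's direction, if any."""
        target, self._slots[direction] = self._slots[direction], None
        return target
```

For example, right after unloading a container that travels to the right, the target SA would call `take(RIGHT)` and, if a request comes back, reload the same container immediately.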
One can generalize the container approach to chains of consecutive bus segments that are useful to the contenders for the bus resource. An initialization process sets up several segment-bus chains. Each of them is the base path for a virtual circuit (VC) with a virtual circuit number (VCN), so when a transfer is requested over one segment-bus chain, the specified VCN is associated with the information. Only a few bits are needed to specify VCNs, as the whole bus is usually not very fragmented, and the advantage of parallel transfers decreases with the number n of fragments. (The largest gain, 100%, comes from splitting the bus into 2 fragments; adding a third yields 50% more performance, a fourth only 33%, and so on: going from n-1 to n fragments gives a relative gain of 1/(n-1).)
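Both numerical claims can be checked with a few lines of Python. The sketch assumes that n fragments allow at most n simultaneous transfers (one per fragment), so that going from n-1 to n fragments raises peak parallelism by a factor n/(n-1); the helper names are hypothetical.

```python
def marginal_gain(n: int) -> float:
    """Relative gain in peak parallelism when going from n - 1 to n fragments.

    Assumes each fragment carries one transfer at a time, so n fragments
    allow at most n simultaneous transfers: n / (n - 1) - 1 = 1 / (n - 1).
    """
    if n < 2:
        raise ValueError("a segmented bus has at least two fragments")
    return 1 / (n - 1)


def vcn_bits(num_vcs: int) -> int:
    """Bits needed to encode VC numbers 0 .. num_vcs - 1."""
    if num_vcs < 1:
        raise ValueError("need at least one VC")
    return max(1, (num_vcs - 1).bit_length())


if __name__ == "__main__":
    print([f"{marginal_gain(n):.0%}" for n in (2, 3, 4)])  # ['100%', '50%', '33%']
    print(vcn_bits(4), vcn_bits(5))                         # 2 3
```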
In the VC approach, each SA can locally store the information for up to a maximum number of VCs at each input. The direction need not be stored, since it is implied by the associated input. Only whether the VC continues or returns has to be recorded, which requires a single bit per VC.
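The one-bit-per-VC storage can be sketched as a bit vector attached to one SA input. The class `InputVCTable` is hypothetical; it assumes VCNs are small integers 0 to max_vcs-1 and that a set bit means "continues through this segment" while a cleared bit means "returns here".

```python
class InputVCTable:
    """Hypothetical per-input VC store of one SA, held as a bit vector.

    The direction of a VC is implied by the input this table belongs to,
    so each VC needs one bit only: 1 if it continues through the segment,
    0 if it returns (turns back) here.
    """

    def __init__(self, max_vcs: int) -> None:
        self.max_vcs = max_vcs
        self.bits = 0  # max_vcs bits of storage in total

    def _check(self, vcn: int) -> None:
        if not 0 <= vcn < self.max_vcs:
            raise IndexError("VCN out of range for this input")

    def set_route(self, vcn: int, continues: bool) -> None:
        self._check(vcn)
        if continues:
            self.bits |= 1 << vcn
        else:
            self.bits &= ~(1 << vcn)

    def continues(self, vcn: int) -> bool:
        self._check(vcn)
        return bool((self.bits >> vcn) & 1)
```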
The containers are rolled continuously over the VCs. Mechanisms such as Temporally Disjoint Networks (TDN) and buffer stages, presented in [6], keep the containers from interfering with each other, so no starvation appears and the time delay is guaranteed. When information needs to be transported, the source chooses a convenient VC and waits for the container to arrive.
The information is "loaded", and the destination is reached by following the direction together with some means of addressing. A convenient way is to add a hop number, decremented at each transfer between segments. The destination segment "unloads" the container when the hop number reaches zero and sets the container status line to "empty".
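Hop-number addressing reduces to a countdown, as in this Python sketch. The `Container` fields and the `deliver` helper are hypothetical, and segment indices are assumed to grow in the direction +1.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    payload: Optional[str] = None
    hops: int = 0        # hypothetical hop-number field carried with the data
    full: bool = False


def deliver(src: int, direction: int, hops: int,
            payload: str) -> tuple[int, Optional[str], Container]:
    """Carry one package from segment `src` across `hops` segment borders.

    Returns the index of the unloading segment, the unloaded payload and
    the container, which is left empty for the next user.
    """
    if hops < 1:
        raise ValueError("an inter-segment transfer needs at least one hop")
    c = Container(payload=payload, hops=hops, full=True)  # source loads it
    seg = src
    while c.hops > 0:
        seg += direction   # one transfer between neighbouring segments
        c.hops -= 1        # hop number decremented at each transfer
    # Null hop number: this segment unloads and marks the container empty.
    data, c.payload, c.full = c.payload, None, False
    return seg, data, c
```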
In the virtual-circuit approach, transfers on the segmented bus are flexible and exploit maximum parallelism. The minimal hardware added to each SA is justified by the simplicity of the arbitration mechanism and the efficiency of controlling information transfer. In the present set-up, we cannot yet establish a guarantee regarding the moment when a request may be served. However, once a request can be served by the passing container, the arrival time is still established as analyzed in [5].
Also, since a segmented bus is much shorter than the connections that may need to be established in a networked chip, the situation is not dramatic.
Moreover, on platforms containing more than three segments, the presence of multiple containers would ease the access to
