Abstract. This paper describes a method for speeding up parallel processes in distributed arbitration schemes such as the one used in Futurebus+. It is based on special arbitration codes that decrease the maximal arbitration time to a specified value. Such codes can be applied with few, if any, minor changes to the hardware. The general structure of these codes is given.
this time each processor decides whether it participates in the arbitration process or not.
The objective of the arbitration process is to find the processor of highest priority among all participating ones.
Arbitration is done in a decentralized way. Each processor has its own local arbitration circuit, which is connected to m common arbitration lines. On these lines the logical OR of the corresponding outputs of the local arbitration circuits is formed. Fig. 1 shows this circuit with outputs in positive logic, corresponding to expressions (1). In a real implementation negative logic would be used, i.e., the outputs would be inverted so that the OR function could be realized as a wired-OR on open-collector lines. It is important to distinguish between the state of an arbitration line and the state of the corresponding output of an arbitration circuit. A single processor can change such an output by asserting or withdrawing bits of its arbitration word. This may or may not influence the state of the arbitration line.
Usually the arbitration circuits are realized as combinatorial logic and the OR functions are computed as a wired-OR using open-collector drivers. The output signals depend on the word assigned to the processor and on the signals currently formed on the arbitration lines.
Any change of the output signals of a processor, whether an assertion or a withdrawal, will be called a switch of the processor. If a processor is not participating in the process it does not assert its output signals, i.e., it keeps them in the "zero" state. When the arbitration process starts, all participating processors assert all bits of their words onto the arbitration lines. Simultaneously they compare their own bit values with the current state of the corresponding lines. If a line is set to a logic "one" and the corresponding bit of the processor's word is zero, the processor switches. This means it determines the most significant bit that is not equal to the state of the corresponding arbitration line and withdraws this and all its less significant bits from the lines. If the condition is no longer true the processor switches back to the previous state.
Let all processors have the same speed. Then the arbitration process for the j-th participating processor can be described by the following expressions.¹

$$
\begin{aligned}
b_{1,j}(t+\tau) &= s(t)\,a_{1,j},\\
b_{2,j}(t+\tau) &= s(t)\,a_{2,j}\,\bigl(a_{1,j} \vee \neg B_1(t)\bigr),\\
b_{3,j}(t+\tau) &= s(t)\,a_{3,j}\,\bigl(a_{1,j} \vee \neg B_1(t)\bigr)\bigl(a_{2,j} \vee \neg B_2(t)\bigr),\\
&\;\;\vdots\\
b_{m,j}(t+\tau) &= s(t)\,a_{m,j}\prod_{r=1}^{m-1}\bigl(a_{r,j} \vee \neg B_r(t)\bigr),
\end{aligned}
\tag{1}
$$

¹ If the local arbitration circuits of the processors taking part in the arbitration process have different delays τ for switching their signals on the arbitration lines, it is necessary to take into account the results from [6] to minimize the maximal arbitration time.
where $j \in P$, $P$ is the set of processors participating in the process, $t$ is the current time ($-\infty < t < \infty$), and $a_{i,j}$ is bit $i$ of the word assigned to processor $j$ ($a_{i,j} \in \{0,1\}$; $a_{1,j}$ is the most significant bit).

Duration of the arbitration process. The arbitration process is a sequence of switches. From expressions (1) it follows that every processor can execute at most m switches in one arbitration process. The following example shows that m switches can indeed occur. Let m=4, let three processors participate in the process, and let them have the words ‹1010›, ‹1001›, and ‹0111›. The processors need a first switch to assert these numbers onto the arbitration lines. After the first switch, the lines are in the state ‹1111›.
The second switch is needed to withdraw the third bit of word ‹1010›, the fourth bit of word ‹1001›, and the last three bits of word ‹0111›. The arbitration lines will therefore be in the state ‹1000› after the second switch. Now the first and the second processors assert their last three bits again. Thus, after the third switch, the arbitration lines will be in the state ‹1011›. Finally, the withdrawal of the last bit by the second processor represents the fourth switch. The word ‹1010› will appear on the arbitration lines and will be stable. By comparing this word with its own arbitration word, each processor now decides whether it has the maximal priority.
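The four-switch example can be replayed with a small simulation. The following is only a sketch, not part of the original method description: it assumes a synchronous model in which every processor updates once per delay τ, takes s(t)=1 for all participants, and treats the line state B_i(t) as the OR of all outputs b_{i,j}(t). The function name `arbitrate` is illustrative.

```python
def arbitrate(words):
    """Replay expressions (1) step by step and return the line state after each switch.

    Assumptions: identical delays, s(t) = 1 for every participant, lines start at zero.
    `words` are equal-length bit strings, most significant bit first.
    """
    m = len(words[0])
    codes = [[int(c) for c in w] for w in words]
    lines = [0] * m               # state B_1..B_m of the arbitration lines
    history = []
    for _ in range(m + 1):        # at most m switches, plus one step to detect stability
        outputs = []
        for a in codes:
            enabled = 1           # running product of (a_r OR NOT B_r) over more significant bits
            out = []
            for i in range(m):
                out.append(a[i] & enabled)
                enabled &= a[i] | (1 - lines[i])
            outputs.append(out)
        new_lines = [max(column) for column in zip(*outputs)]  # OR of all outputs per line
        if new_lines == lines:
            break
        lines = new_lines
        history.append("".join(str(b) for b in lines))
    return history


print(arbitrate(["1010", "1001", "0111"]))
# ['1111', '1000', '1011', '1010']
```

The returned list has one entry per switch, so its length is the number of switches counted in the example.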
The duration of the arbitration process is proportional to the number of switches executed. Since we consider a situation such as Futurebus+, neither the delays of other modules nor the modules participating in the process nor their arbitration words are known. Because the duration of the process cannot be determined dynamically as a function of the participants and their delays, every module must always assume the worst case, i.e., that the maximal number of switches will occur and that they will be executed by the slowest modules. We can, therefore, assume without loss of generality that the delays are identical. Thus, to get a correct result of the process for any combination of m-bit arbitration words, the participating processors have to wait the time m·τ before they make their decision. However, the maximal number of switches can be considerably smaller than m, as shown below.
Denote by |K| the number of words in code K, i.e., its capacity. If a code of width m has a capacity $|K| < 2^m$, it is possible to reduce the maximal number of switches that can occur when any combination of words from K participates in the process. Consider, e.g., a code of width 4 and of capacity 15. If the words ‹1010› and ‹1001› are included in the code K and both participate in the process, in the worst case it will require four switches as shown above. However, if we exclude from K either ‹1010› (the only word possibly requiring four switches) or ‹1001› (the only word creating this kind of conflict with ‹1010›), then the arbitration process will require at most three switches for all combinations of the 15 remaining 4-bit arbitration words. In this case a speed-up of 25% is obtained.
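The claim about the 15-word codes can be checked by exhaustive search. The sketch below assumes the same synchronous model with identical delays and counts a switch whenever the state of the arbitration lines changes. Words are stored as integers, and the helper names `switches` and `depth` are hypothetical.

```python
from itertools import combinations


def switches(words):
    """Number of line-state changes until the arbitration lines are stable."""
    lines, count = 0, 0
    while True:
        new_lines = 0
        for a in words:
            conflict = ~a & lines          # lines at one where the word has a zero
            h = conflict.bit_length()      # bits at or below the highest conflict are withdrawn
            new_lines |= (a >> h) << h
        if new_lines == lines:
            return count
        lines, count = new_lines, count + 1


def depth(code):
    """Worst-case number of switches over all non-empty subsets of the code."""
    return max(
        switches(subset)
        for r in range(1, len(code) + 1)
        for subset in combinations(code, r)
    )


all_words = list(range(16))
depth_without_1001 = depth([w for w in all_words if w != 0b1001])
depth_without_1010 = depth([w for w in all_words if w != 0b1010])

print(switches([0b1010, 0b1001, 0b0111]))       # 4
print(depth_without_1001, depth_without_1010)   # 3 3
```

The search visits every subset of each 15-word code, which is feasible only for small widths such as m=4.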
The process starts at t=0. For t<0 the arbitration lines are set to zero. Thus, the arbitration process of the single word $0^{(m)}$ does not require any switches. Any other word generates at least one switch during which its non-zero bit values are asserted onto the arbitration lines.
Lemma 2. The code $Ex^{1}_{m}$ has the capacity $m+1$ and is constructed by any bit permutation applied to all words of
Proof. Let $a_1 = \langle a_{1,1} \ldots a_{m,1} \rangle$ and $a_2 = \langle a_{1,2} \ldots a_{m,2} \rangle$, with $a_1 > a_2$, be two words of a code $Ex^{1}_{m}$. If $a_{i,1} = 0$ ($i \in [1,m]$), then $a_{i,2} = 0$ also has to be true. Otherwise, the arbitration process of $\{a_1, a_2\}$ could require two switches: the assertion of $a_1$ and $a_2$, and at least the withdrawal of the bit $a_{i,2} = 1$. This is impossible for a code of depth 1. Therefore, $w_{a_1} > w_{a_2}$, where $w_{a_1}$ and $w_{a_2}$ are the numbers of non-zero bits in the words $a_1$ and $a_2$. Let us arrange all words of the code in increasing order and prove that their number s is equal to m+1. If $w_{a_j}$ is the number of non-zero bits in the word $a_j$, then the expression $0 \le w_{a_1} < w_{a_2} < \ldots < w_{a_s} \le m$ is valid. Obviously, the number of non-zero bits in the word $a_j$ is $(j-1)$ and the maximal possible value of s is m+1.
These conditions are only satisfied for the codes $Ex^{1}_{m}$ defined in Lemma 2. For them the signals $B_i(t)$ are settled after the first assertion of the maximal word, i.e., after the first switch. Fig. 2 shows examples of codes $Ex^{1}_{4}$ with the corresponding permutation of the words defined by (3).
The code $X^{(m)}$ has the maximal possible capacity according to its definition and can cause a sequence of m switches but not more, as was shown in [3, 4].
Theorem. The capacity of any extremal code $Ex^{n}_{m}$ is given by the equation

$$
|Ex^{n}_{m}| = \sum_{i=0}^{n} C^{i}_{m}, \qquad m \ge 1,\ m \ge n \ge 0,
\tag{5}
$$

where $C^{i}_{m}$ are the binomial coefficients.
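The capacity formula of the theorem is easy to tabulate. The short sketch below uses Python's `math.comb` for the binomial coefficients; the function name `capacity` is an illustrative choice.

```python
from math import comb


def capacity(m, n):
    """Capacity of an extremal code of width m and depth n: sum of C(m, i) for i = 0..n."""
    return sum(comb(m, i) for i in range(n + 1))


for n in range(5):
    print(n, capacity(4, n))
# 0 1
# 1 5
# 2 11
# 3 15
# 4 16
```

For width 4 the value at depth 3 is 15, the capacity of the 15-word code discussed earlier, and the value at depth m is the full 2^m.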
Proof. Equality (5) is obviously true for the cases n=0, 1, m that have been considered in the previous lemmas. We will prove (5) for m>n>1 by a double induction with parameters n and m. The hypothesis is that (5) is true for the codes $Ex^{n}_{m-1}$ and $Ex^{n-2}_{n-1}, Ex^{n-2}_{n}, \ldots, Ex^{n-2}_{m-2}$.
From this we will derive that equation (5) ...

Basis. Assume that the following equations are true.

$$
|Ex^{n}_{m-1}| = \sum_{i=0}^{n} C^{i}_{m-1},
\tag{6}
$$

$$
|Ex^{n-2}_{k-1}| = \sum_{i=0}^{n-2} C^{i}_{k-1}, \qquad k = n, (n+1), \ldots, (m-1).
\tag{7}
$$
Induction. We intend to prove (5) for the code $Ex^{n}_{m}$ for m>n>1 assuming the truth of the basis. First we construct a certain code $R^{n}_{m}$ and consider its depth and capacity. The code

$$
R^{n}_{m} = \bigcup_{j=n-1}^{m} E_j
\tag{8}
$$

is the union of the following subcodes (Fig. 3a):

$$
\begin{aligned}
E_{n-1} &= \{1^{(m-n+1)} * X^{(n-1)}\},\\
E_{k} &= \{1^{(m-k)} * 0 * Ex^{n-2}_{k-1}\}, \qquad k = n, (n+1), \ldots, (m-1),\\
E_{m} &= \{0 * Ex^{n}_{m-1}\}.
\end{aligned}
$$
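Obviously $E_p \cap E_q = \emptyset$ for any pair of distinct subcodes ($p, q \in [n-1, m]$, $p \ne q$). Thus, the capacity of the code $R^{n}_{m}$ is the sum of the capacities of all subcodes:

$$
\begin{aligned}
|R^{n}_{m}| &= |E_{n-1}| + |E_{n}| + |E_{n+1}| + \ldots + |E_{m-1}| + |E_{m}|\\
&= |X^{(n-1)}| + |Ex^{n-2}_{n-1}| + |Ex^{n-2}_{n}| + \ldots + |Ex^{n-2}_{m-2}| + |Ex^{n}_{m-1}|.
\end{aligned}
$$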
Taking into account the expressions (6) and (7), assumed to be true as the basis, we can derive (see Appendix)
$$
|R^{n}_{m}| = 2^{n-1} + \sum_{i=0}^{n} C^{i}_{m-1} + \sum_{k=n}^{m-1} \sum_{i=0}^{n-2} C^{i}_{k-1} = \sum_{i=0}^{n} C^{i}_{m}.
\tag{9}
$$

Below it will be shown that the depth of $R^{n}_{m}$ is n. All words of $E_{n-1}$, $E_{k}$, and $E_{m}$ consist of two parts: a leading part, which is identical for all words of the subcode, and a trailing part, which is different for every word (Fig. 3a). Thus, if only words of one of the subcodes $E_{n-1}$, $E_{k}$, or $E_{m}$ participate in the arbitration, switches happen only in the second part. Therefore, the process takes no more than n-1, n-2, or n switches, respectively.
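Identity (9) can be spot-checked numerically before turning to the derivation in the Appendix. The sketch below evaluates both sides for a range of widths and depths with m>n>1; the function names `lhs` and `rhs` are illustrative.

```python
from math import comb


def lhs(m, n):
    """Left-hand side of (9): sum of the subcode capacities."""
    first = 2 ** (n - 1)                                       # |X^(n-1)|
    last = sum(comb(m - 1, i) for i in range(n + 1))           # |Ex^n_{m-1}|
    middle = sum(comb(k - 1, i)                                # |Ex^{n-2}_{k-1}|, k = n..m-1
                 for k in range(n, m) for i in range(n - 1))
    return first + last + middle


def rhs(m, n):
    """Right-hand side of (9): sum of C(m, i) for i = 0..n."""
    return sum(comb(m, i) for i in range(n + 1))


identity_holds = all(lhs(m, n) == rhs(m, n)
                     for m in range(3, 30) for n in range(2, m))
print(lhs(4, 2), rhs(4, 2), identity_holds)
# 11 11 True
```

A numerical check of this kind covers only the tested range and does not replace the derivation.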
Below we show that if words from different subcodes participate in the process then it cannot have more than n switches. Consider the case when words from $E_{m}$ and $E_{m-1}$ are participating in an arbitration process simultaneously. The words from $E_{m-1}$ are headed by the bit ‹1›, and the words of $E_{m}$ are headed by ‹0›. Thus, after the first switch the logical OR of the words of $E_{m}$ and $E_{m-1}$ appears on the bus. After the second switch all words of $E_{m}$ will be withdrawn and only the words of $E_{m-1}$ are left. In the worst case an additional n-2 switches are required to determine the maximum word among them. The same argument applies to any other combination of subcodes from $R^{n}_{m}$, which proves that the depth of $R^{n}_{m}$ is n.
The code $R^{n}_{m}$ has depth n, width m, and the capacity defined by (5). Let us show that there exists no other code of the same depth and width which has a capacity greater than $|R^{n}_{m}|$, i.e., $R^{n}_{m}$ is the extremal code. So, no code of depth n and width m can have a higher capacity than determined by
We proved that the capacity of the extremal codes is determined by expression (5) for all m and n. Some values are given in Tab. 1. In [4] it was shown that the capacity of the binomial codes of width m and depth n is also determined by the same expression.
Therefore, the binomial codes are extremal.
Construction of the Extremal Codes.
During the proof of (5) it has been shown that any code $R^{n}_{m}$ constructed in accordance with (8) has the depth n and the maximum capacity. It has also been proven that any code different from $R^{n}_{m}$ has a lower capacity.
In view of this, expression (8) represents the general structure of the extremal codes for any n (n≠0,1,m). We can, therefore, rewrite (8) as
$$
Ex^{n}_{m} = \{0 * Ex^{n}_{m-1}\} \cup \{1^{(m-n+1)} * X^{(n-1)}\} \cup \bigcup_{i=1}^{m-n} \{1^{(i)} * 0 * Ex^{n-2}_{m-i-1}\}, \qquad n \ne 0, 1, m.
\tag{11}
$$

Using expressions (2)-(4) and (11)

Here the maximal number of processors is the width of the arbitration words plus 1.
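The recursion (11) translates directly into a short recursive function. The sketch below is one possible reading, not the authors' implementation. The base cases are assumptions: the single all-zero word for n=0, all m-bit words X^(m) for n=m, and, for n=1, the particular depth-1 code whose words are k leading ones followed by zeros (one of the admissible choices, picked here only for illustration). The name `extremal` is hypothetical.

```python
from itertools import product
from math import comb


def extremal(n, m):
    """Words (bit strings, most significant bit first) of one extremal code Ex^n_m."""
    if n == 0:
        return ["0" * m]                                             # assumed base case
    if n == m:
        return ["".join(bits) for bits in product("01", repeat=m)]   # X^(m)
    if n == 1:
        return ["1" * k + "0" * (m - k) for k in range(m + 1)]       # assumed depth-1 code
    code = ["0" + w for w in extremal(n, m - 1)]                     # {0 * Ex^n_{m-1}}
    code += ["1" * (m - n + 1) + "".join(bits)                       # {1^(m-n+1) * X^(n-1)}
             for bits in product("01", repeat=n - 1)]
    for i in range(1, m - n + 1):                                    # {1^(i) * 0 * Ex^{n-2}_{m-i-1}}
        code += ["1" * i + "0" + w for w in extremal(n - 2, m - i - 1)]
    return code


code = extremal(3, 4)
print(len(code), "1001" in code)
# 15 False
```

With these base cases the width-4, depth-3 code contains 15 words and omits ‹1001›, in line with the earlier example. The sketch checks capacities only; it does not verify the depth of the constructed codes.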
Thus, the highest speed-up is obtained with a relatively small number of processors.
Nowadays this is the most relevant case. Consider, e.g., a Futurebus+ system with nine processors. Since Futurebus+ uses a priority word of width m=8, it is possible to achieve a speed-up s=8 by applying the extremal code of depth n=1. For a higher number of processors the achievable speed-up decreases because codes Ex n m of increasing depth are required to assign arbitration words to all processors.
Tab. 3 shows the achievable speed-up for a system where all arbitration words are used. In this case the speed-up is obtained by providing additional arbitration lines, which makes it possible to apply extremal codes of greater width and smaller depth. Note that the arbitration time can be reduced by a factor of ≈2 by adding only a single line [4]. Higher speed-ups can only be obtained using additional arbitration lines.
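Both cases can be reproduced from the capacity formula (5). The sketch below finds the smallest depth whose extremal code offers enough words for a given number of processors. The function names are illustrative, and the speed-up figures in the comments follow from taking the ratio of the numbers of switches.

```python
from math import comb


def capacity(m, n):
    """Capacity of an extremal code of width m and depth n, as in (5)."""
    return sum(comb(m, i) for i in range(n + 1))


def min_depth(m, processors):
    """Smallest depth n for which a width-m extremal code has enough words."""
    for n in range(m + 1):
        if capacity(m, n) >= processors:
            return n
    raise ValueError("more processors than m-bit words")


print(min_depth(8, 9))     # 1: nine processors on 8 lines, 8 switches reduced to 1
print(min_depth(8, 256))   # 8: all 8-bit words in use, no reduction possible
print(min_depth(9, 256))   # 4: one additional line halves the 8 switches
```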
Note that after the maximal word is finally asserted onto the arbitration lines, switches in the local arbitration circuits of processors can still occur. Consider, e.g., an arbitration between the two words {‹1111›,‹1101›}. The arbitration process takes only one switch to assert the word ‹1111›. After this moment the processor having the word ‹1101› withdraws its least significant non-zero bit. Depending on the technology used, this withdrawal might or might not influence the arbitration line, which is already set to a logic "one". If, e.g., open-collector lines are used to implement wired-OR logic, the withdrawal of a signal from a line in the logic "one" state may result in a "zero" glitch [5]. It is possible to avoid this glitch by using the code
as the $Ex^{1}_{m}$ code. In this case the code $Ex^{n}_{m}$ is uniquely defined for any m and n. For even n it coincides with the binomial code.
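The late withdrawal in the example {‹1111›,‹1101›} can be made visible by tracing the outputs of each processor. The sketch below assumes a synchronous model with identical delays and stores words as integers; the helper names are hypothetical. The second check uses a depth-1 code of the form "k leading ones followed by zeros", which is only an assumed illustration of a code without such withdrawals and not necessarily the code referred to in the text.

```python
from itertools import combinations


def drive(a, lines):
    """Output of one processor: bits at or below the highest conflicting line are withdrawn."""
    h = (~a & lines).bit_length()
    return (a >> h) << h


def trace(words, m):
    """Return (outputs, line state) after each of m + 1 synchronous steps."""
    lines, steps = 0, []
    for _ in range(m + 1):
        outputs = [drive(a, lines) for a in words]
        lines = 0
        for out in outputs:
            lines |= out
        steps.append((outputs, lines))
    return steps


def late_withdrawal(words, m):
    """True if an output drops a bit on a step at which the line state stays the same."""
    steps = trace(words, m)
    for (prev_out, prev_lines), (out, lines) in zip(steps, steps[1:]):
        if lines == prev_lines and any(p & ~o for p, o in zip(prev_out, out)):
            return True
    return False


leading_ones = [0b0000, 0b1000, 0b1100, 0b1110, 0b1111]   # assumed, illustrative code
clean = not any(late_withdrawal(list(subset), 4)
                for r in range(1, len(leading_ones) + 1)
                for subset in combinations(leading_ones, r))

print(late_withdrawal([0b1111, 0b1101], 4))   # True
print(clean)                                  # True
```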
Conclusions. In this paper we derived the structure of codes which reduce the maximal arbitration time to a defined limit. We obtained a recursive formula that allows one to construct codes of highest capacity for all possible numbers of switches. No higher speed-up can be achieved by using any other code. It was shown that, in some cases, a considerable number of extremal codes exist. Binomial codes are only one representative of these codes. The recursive formula (11) that gives us the general structure of extremal codes is quite complicated. However, one has to use it only once when the arbitration system is set up. For most arbitration systems the codes can be applied directly without changes of the hardware.
APPENDIX.
Here we present the detailed derivation of (9). The properties
are used below. For any m and n (m>n>1) we can derive 
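One way to verify (9) is sketched here. It assumes that Pascal's rule $C^{i}_{m} = C^{i}_{m-1} + C^{i-1}_{m-1}$ (with $C^{-1}_{m-1} = 0$) and the identity $\sum_{i=0}^{n-1} C^{i}_{n-1} = 2^{n-1}$ are available; the auxiliary symbol $S_k$ is introduced only for this sketch.

$$
\begin{aligned}
\sum_{i=0}^{n} C^{i}_{m} &= \sum_{i=0}^{n} C^{i}_{m-1} + \sum_{i=0}^{n-1} C^{i}_{m-1}, \\
S_k := \sum_{i=0}^{n-1} C^{i}_{k} \quad &\Rightarrow \quad S_k - S_{k-1} = \sum_{i=1}^{n-1} C^{i-1}_{k-1} = \sum_{i=0}^{n-2} C^{i}_{k-1}, \\
\sum_{k=n}^{m-1} \sum_{i=0}^{n-2} C^{i}_{k-1} &= S_{m-1} - S_{n-1} = \sum_{i=0}^{n-1} C^{i}_{m-1} - 2^{n-1}, \\
2^{n-1} + \sum_{i=0}^{n} C^{i}_{m-1} + \sum_{k=n}^{m-1} \sum_{i=0}^{n-2} C^{i}_{k-1} &= \sum_{i=0}^{n} C^{i}_{m-1} + \sum_{i=0}^{n-1} C^{i}_{m-1} = \sum_{i=0}^{n} C^{i}_{m}.
\end{aligned}
$$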
