remaining steps assume no type-1 or type-2 degeneracies are in the data set. The benign degeneracy signaled by Degen-type-3 set true can in fact be ignored and is dealt with below.
IV. CONCLUSION
We have presented an optimal algorithm for computing illumination regions and associated illuminated vertex sets for one convex polygon P contained in another convex polygon K. This O(n + m) algorithm can be used to replace the less efficient construction procedure 1 in [1] for deciding simplex coverability of similarly situated convex polygons. After the set of illumination regions and illuminated vertex sets has been obtained using the optimal illumination region algorithm presented here, one can solve the simplex coverability problem by applying construction procedure 2 in Silio [1]. Note that line 3 in step 2 of construction procedure 2 in [1] contains a typographical error and incorrectly reads: "while NEXT1 ≠ false and NEXT2 ≠ false, do." This line should be corrected to read "while NEXT1 ≠ false or NEXT2 ≠ false, do," thereby replacing the "and" with "or." With this correction in place, construction procedure 2 in [1] proceeds in linear time to produce a collection of what are called coverable candidate sets of vertices of P with associated sets of illumination region combinations to be used as constraint sets. To determine the existence of a covering triangle, one solves systems of second order equations, also in linear time, subject to the finite list of constraint sets provided by construction procedure 2 in [1]. Hence, by using the optimal illumination region algorithm presented here, coupled with construction procedure 2 in [1], an optimal simplex coverability algorithm for convex polygons results.
When applying these optimal algorithms to stochastic sequential machine covering problems, the input vertices may be randomly related sets of points, in which case one must first apply an appropriate convex hull algorithm (e.g., [9], [10]) to find the hulls in time O(n log n + m log m). One can then apply the O(n + m) procedure described here first to compute and name illumination regions, and then apply the O(n + m) algorithm from [1] that uses the illumination regions to generate and solve constrained simultaneous equations to determine simplex coverability.
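The hull preprocessing step mentioned above can be illustrated with a short sketch. The code below is not the algorithm of [9] or [10]; it uses Andrew's monotone chain method, a well-known O(n log n) alternative, chosen here only because it is compact. The names `cross` and `convex_hull` are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the hull vertices in counterclockwise order (collinear points dropped)."""
    pts = sorted(set(points))  # the O(n log n) sort dominates the running time
    if len(pts) <= 2:
        return pts
    # Build the lower chain from left to right.
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper chain from right to left.
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Applied once to the n points of P and once to the m points of K, this gives the O(n log n + m log m) preprocessing cost quoted above, after which the linear-time procedures operate on the ordered hull vertices.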
[10] R. Graham, "An efficient algorithm for determining the convex hull of a finite planar set," Inform. Proc

or the multiple-bus. We consider a system with N processors and N memory modules, in which the processor requests to the memory modules are independent and uniformly distributed random variables. We consider two cases: in the first, the processor makes another request immediately after a memory service, and in the second, there is some internal processing time.
The results of simulations show that the multiple-bus interconnection network with a number of buses slightly higher than N/2 produces a very small degradation with respect to the crossbar.
In addition, we propose an organization with partial buses that is more economical than the multiple-bus for the same effective bandwidth.
Index Terms: Bus arbitration, memory bandwidth, multiple buses, multiprocessors, shared memory.
I. INTRODUCTION
In many multiprocessor systems the shared memory is divided into independent modules so that each of these modules permits one access per cycle. The processors are connected to the memories by means of an interconnection network (Fig. 1).
Two types of operation of the system have been proposed: SIMD, in which all processors execute the same instruction on different data, and MIMD, in which each processor executes a different instruction [1]. In this paper we are interested in the MIMD case.
Several interconnection networks have been proposed for these systems, such as the crossbar [2], single-bus [3]-[5], multiple-bus [6], shuffle-exchange [7], and others [8], [9]. Of these, the crossbar provides the largest potential bandwidth because there are no conflicts in the network. Nevertheless, it has a high cost, which is prohibitive for a large number of processors [10] [13]. For N = 16 the value is obtained from the approximate model proposed in [15], equation (14). We conclude that, under the indicated hypotheses, the potential bandwidth of the crossbar is not fully utilized because of memory conflicts. Given the high cost of the crossbar for large N, its poor fault tolerance, and the degradation due to conflicts, it seems worthwhile to consider alternative networks.
In this paper we study the performance of the multiple-bus interconnection network. The N processors are connected to the M memory modules by means of B ≤ min(N, M) buses. For B < min(N, M) the network produces a degradation in the bandwidth with respect to the crossbar, due to conflicts that occur in the network when the number of requests to different memory modules is greater than B. We are interested in evaluating this degradation as a function of B and N.
To evaluate this degradation we could use the exact or approximate models proposed in [6]. Nevertheless, the hypotheses used there are restrictive (exponential request and service times), and the models are difficult to evaluate when the number of processors, buses, and/or memory modules is large. For this reason, we use simulations, which are sufficient to validate our conclusions. The results obtained indicate that for B slightly larger than N/2 (for N = M), the degradation with respect to the crossbar is approximately 5 percent.
We also performed simulations for the case in which the processor does not issue a new request as soon as the previous one is serviced. This represents the case in which there is some internal processing. Following our hypothesis that the processors are synchronized, and using the validation performed in [15], we assume that in each cycle a processor issues a request with a fixed probability p. As will be seen from the simulation results, for p = 0.5 the crossbar is very underutilized, and therefore for the same degradation (of 5 percent with respect to the crossbar) fewer buses are required than for p = 1.
As is discussed in the following sections, the multiple-bus network is less expensive than the crossbar. Nevertheless, for large N and B, its cost is still significant because of arbitration time and complexity, capacitive loads, and drive requirements. To further reduce the network cost, we propose a network with partial buses, discussed in Section IV, in which the arbitration is simplified and the drive requirements are reduced. This network produces a lower bandwidth than the multiple-bus for the same number of buses. The bandwidth is determined by simulation, and we conclude that there are configurations with partial buses that produce roughly the same bandwidth as the multiple-bus at a lower cost.
II. MULTIPLE-BUS ORGANIZATION AND COMPARISON WITH THE CROSSBAR
As was indicated in the introduction, we are interested in evaluating the performance of the multiple-bus interconnection network. In this case, the N processors and M memory modules are connected through B buses. Each processor is connected to all buses and each bus to all memory modules, so that a processor can access any memory module through any of the buses (Fig. 2).
The number of connections of the multiple-bus network is proportional to B(N + M). The number of wires is proportional to B, and each of the buses supports a capacitive load proportional to M + N + K(M + N - 1). The value of K depends on the technology and, for present-day tristate circuits, is much smaller than one; therefore, the capacitive load is proportional to M + N. The multiple-bus network requires an arbiter to assign the buses to the outstanding requests. To assign the buses to the memory modules, an M-users B-servers arbiter is needed. This arbiter selects min(B, J) of the J memory modules with at least one outstanding request. Once a bus is granted to access a memory module, only one of the processors that demanded that memory module must be chosen. This choice is implemented by an N-users 1-server type arbiter, since there are N demand inputs (each associated with a processor), and only one
The multiple-bus interconnection network is fault-tolerant because it can operate in a degraded mode after the failure of a subset of the buses.
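The two-stage arbitration described above can be sketched in software as follows. This is an illustrative model only, not a description of the arbiter hardware. The function name `arbitrate`, the rotating priority pointer `start`, and the random choice in the second stage are assumptions made for the sketch.

```python
import random
from typing import Dict, List, Optional


def arbitrate(requests: List[Optional[int]], num_modules: int, num_buses: int,
              start: int = 0, rng: Optional[random.Random] = None) -> Dict[int, int]:
    """Return {memory module: granted processor} for one cycle.

    requests[i] is the module requested by processor i, or None if it is idle.
    `start` is a hypothetical rotating priority pointer used in stage 1.
    """
    rng = rng or random.Random()
    # Group the requesting processors by the module they want.
    waiting: Dict[int, List[int]] = {}
    for proc, module in enumerate(requests):
        if module is not None:
            waiting.setdefault(module, []).append(proc)
    # Stage 1 (M-users B-servers): scan the modules cyclically from `start`
    # and give a bus to the first min(B, J) modules that have requests.
    order = [(start + k) % num_modules for k in range(num_modules)]
    served = [m for m in order if m in waiting][:num_buses]
    # Stage 2 (N-users 1-server, one per served module): pick one processor.
    return {m: rng.choice(waiting[m]) for m in served}
```

For example, with requests `[0, 0, 1, 2, None]`, four modules, two buses, and `start=1`, modules 1 and 2 receive the buses, and the two processors contending for module 0 must wait.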
The above-mentioned characteristics are compared with those for the crossbar in Table II. The arbitration is simpler for the crossbar, since it only needs M N-users 1-server type arbiters, each of them controlling the access to a memory module. Also, the time such an arbiter requires to decide which processors will access memory is less than the time the arbiter needs for the multiple-bus structure [17]. The crossbar is less fault-tolerant than the multiple-bus structure because a failure in one of the M buses completely disconnects one memory module.
III
e) The propagation delays and arbitration times associated with the interconnection network are not included explicitly but may be thought of as forming part of the memory cycle.
f) In each cycle, the buses are assigned cyclically to the memory modules that have at least one outstanding request. For a module that receives a bus, a processor is selected at random from those with outstanding requests for that module.
With these hypotheses we performed a simulation using the technique of multiple independent repetitions. The relation defined by Lavenberg [23]
To further validate these simulations, we compared them with known results. For B = N (corresponding to the crossbar), the simulation results coincide with those determined using an exact mathematical model in [13] for N = 4 and 8, and with the value obtained using equation (8) in [16] for N = 16.
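A simulation of this kind might look like the following sketch. It is not the authors' simulator. It assumes synchronized cycles, requests uniformly distributed over the modules, a new request issued with probability p by each idle processor, and cyclic bus assignment with random processor selection as in hypothesis f). It also assumes that a rejected request is held and retried at the same module, which is a guess because the corresponding hypothesis is not legible in the text above. The function name and its parameters are illustrative.

```python
import random
from typing import Dict, List, Optional


def simulate_bandwidth(n: int, m: int, b: int, p: float = 1.0,
                       cycles: int = 20000, seed: int = 0) -> float:
    """Estimate the mean number of memory requests served per cycle."""
    rng = random.Random(seed)
    pending: List[Optional[int]] = [None] * n  # module each processor waits on
    pointer = 0                                # cyclic bus-assignment pointer
    served_total = 0
    for _ in range(cycles):
        # Idle processors issue a new, uniformly distributed request with prob. p.
        for i in range(n):
            if pending[i] is None and rng.random() < p:
                pending[i] = rng.randrange(m)
        # Group the waiting processors by requested module.
        waiting: Dict[int, List[int]] = {}
        for i, mod in enumerate(pending):
            if mod is not None:
                waiting.setdefault(mod, []).append(i)
        # Assign at most b buses cyclically to the modules that have requests.
        order = [(pointer + k) % m for k in range(m)]
        granted = [mod for mod in order if mod in waiting][:b]
        for mod in granted:
            winner = rng.choice(waiting[mod])  # random processor per module
            pending[winner] = None             # assumption: losers retry later
            served_total += 1
        if granted:
            pointer = (granted[-1] + 1) % m
    return served_total / cycles
```

Running the function with several different seeds and averaging the results corresponds to the technique of multiple independent repetitions. Setting b equal to n models the crossbar, so the relative degradation of a configuration with fewer buses can be estimated as 1 - simulate_bandwidth(n, m, b) / simulate_bandwidth(n, m, n).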
In view of the results of Table III 
IV. MULTIPLE-BUS ORGANIZATION WITH PARTIAL BUSES
The results obtained in the previous section show that the multiple-bus organization might be an attractive alternative to the crossbar. Nevertheless, the former structure still might be too costly for large N in some applications, due to the arbitration and drive requirements. This has led us to consider another network which is based on the multiple-bus, but has a lower cost.
This network also consists of B buses to connect the N processors to the M memory modules. Each of these buses is connected to all N processors, but only to a subset of M/g memory modules. That is, the memory modules are divided into g groups, and in each group, all memory modules are connected to the same B/g buses. Fig. 3 shows the case in which g = 2. In this type of network the number of connections is B(N + M/g) and the load of each bus is proportional to N + M/g.
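The connection counts quoted for the two organizations can be compared with a small sketch. The formulas B(N + M) and B(N + M/g) are taken from the text. The function names, the requirement that g divide both M and B, and the particular numbering of modules and buses in `buses_for_module` are assumptions made for illustration.

```python
from typing import List


def multiple_bus_connections(n: int, m: int, b: int) -> int:
    """Connections in the multiple-bus network: B(N + M)."""
    return b * (n + m)


def partial_bus_connections(n: int, m: int, b: int, g: int) -> int:
    """Connections in the partial-bus network: B(N + M/g)."""
    if m % g or b % g:
        raise ValueError("this sketch assumes g divides both M and B")
    return b * (n + m // g)


def buses_for_module(module: int, m: int, b: int, g: int) -> List[int]:
    """Buses reachable from a module, assuming group k owns modules
    k*M/g .. (k+1)*M/g - 1 and buses k*B/g .. (k+1)*B/g - 1."""
    group = module // (m // g)
    per_group = b // g
    return list(range(group * per_group, (group + 1) * per_group))
```

For N = M = 16 and B = 8, the multiple-bus network has 8(16 + 16) = 256 connections, while the partial-bus network with g = 2 has 8(16 + 8) = 192.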
As can be seen, the number of connections and the loads are reduced with respect to the multiple-bus. Also, the partial-bus network requires g arbiters, but these arbiters are less complex and faster than those for the multiple-bus, because the arbitration time is a function of the number of buses and memory modules connected to each bus.
As was discussed in Section III, in the multiple-bus structure with B buses, the maximum possible bandwidth is not Tables III and IV shows the additional degradation produced by the network with partial buses with respect to the multiple-bus case.
Moreover, for the values of N considered, the configuration with (N/4 + 1, N/4 + 1) partial buses produces a higher bandwidth than the one produced by the multiple-bus structure with N/2 + 1 buses. As discussed before, the network cost and arbitration time are lower in the first case.
In Table V we present simulation results for the multiple-bus network for N = 4, 8, 12, and 16 with p = 0.5 and p = 1 (for comparison). As might be expected, when p = 0.5 the bandwidth obtained for the crossbar (values for B = N in Table V) is less than for p = 1, and therefore, the crossbar is even more underutilized. Moreover, the degradation of the multiple-bus with respect to the crossbar is smaller for p = 0.5 than for p = 1. For example, for the case N = 16 and B = 8, for p = 1, a degradation of 17.8 percent is obtained, while for p = 0.5 the degradation is only 2.1 percent. Therefore, the multiple-bus configuration is even more attractive for p < 1.
In Table VI we present simulation results obtained for the partial-bus organization. Again, it is evident that for p < 1 the degradation with (N/2, N/2) buses is less than that for p = 1. In Fig. 4 Similarly, Fig. 5 shows the values obtained when p = 0.5.
VI. CONCLUSIONS
In this paper we have compared the effective bandwidth of multiprocessors with shared memory using crossbar and multiple-bus interconnection networks. This work is motivated by the high cost and low fault tolerance of the crossbar.
We have assumed that the processors are synchronized, that the memory module cycle time is constant, and that the processor requests are independent and uniformly distributed random variables. We conclude that with a number of buses slightly larger than N/2, the effective bandwidth of the multiple-bus organization is less than 5 percent smaller than that produced by the crossbar. In practical realizations that satisfy the model hypotheses, it would be better to use the multiple-bus structure because of its better cost and reliability characteristics.
For large N the cost of the multiple-bus structure can still be large, as a result of the number of connections, the loads, and the complexity of the arbitration hardware. This finding has led us to propose the partial-bus organization, which provides a bandwidth similar to that of the multiple-bus structure at a lower cost.
Finally, we have simulated the case in which the processor requests memory with probability p = 0.5. This represents the case in which not all clock cycles are devoted to memory accesses. The results indicate that the crossbar is even more underutilized than for the case in which p = 1 and, therefore, that the multiple-bus and partial-bus organizations are even more attractive because the 5 percent performance degradation is obtained for a smaller number of buses.
