Abstract: The MASC model is a multi-SIMD model that uses control parallelism to coordinate the interaction of data parallel threads. It supports a generalized associative style of parallel computation. The power of this model has been compared to that of priority CRCW PRAM and enhanced meshes. In this paper, we present the work on simulations between MASC and reconfigurable bus-based models, in particular, different versions of the Reconfigurable Multiple Bus Machine (RMBM). It is shown that MASC and the Basic RMBM (B-RMBM) can simulate each other in constant time if the number of buses on the B-RMBM is $\Theta(j)$, where j is the number of MASC instruction streams. Thus, when these two models satisfy the preceding condition, they have the same power.
Introduction
The MASC (for Multiple Associative Computing) model is a multi-SIMD model that uses control parallelism to coordinate the interaction of data parallel threads. The MASC model extends the concept of associative computing and provides a complete paradigm to support general parallel computation. Motivated by the STARAN computer built in the 1970s by Goodyear Aerospace, the MASC model was proposed by Potter and Baker in 1994 [14] and has been studied at Kent State University as a practical model for years. Since the MASC model is a strictly synchronous model that supports SIMD computation, it is sometimes called a multi-SIMD or MSIMD model (i.e., a SIMD enhanced with multiple instruction streams).
In contrast to a number of other parallel models, the MASC model possesses certain constant time global properties such as constant time broadcasting, constant time global reduction operations, and constant time associative search, which have all been justified in [7]. These properties have made MASC able not only to solve general parallel processing problems effectively [2, 7, 13, 17] but also to solve some problems in special areas such as real-time air traffic control in an extremely efficient manner [9, 10]. Possible techniques for implementing the MASC model have been explored in [5, 6, 19, 21].
The power of a computational model is indicated both by the efficiency of the algorithms it supports and by the efficiency with which it can simulate other computational models. In order to evaluate the power of the MASC model, simulation and comparison with other popular parallel models such as PRAM (Parallel Random Access Machine) and MMB (Mesh with Multiple Buses) have been studied previously. Constant time simulation of PRAM with MASC and constant time simulations between MMB and MASC have both been established [16, 3]. In this paper, we present our work on simulations between MASC and reconfigurable bus-based models.
Related Parallel Models
In this section, we provide a short summary of the parallel models used in this paper, i.e., the MASC model, the RMBM model, and the RM model.
The MASC model
The MASC model is a multi-SIMD model that has been enhanced with associative properties, as shown in Figure 1.
Figure 1. The MASC model.
The constant time global operations of the model include the AND of binary values, the maximum and minimum of integer or real values, and the associative search by a given search pattern. Following an associative search, an IS can select (or "pick one") in constant time an arbitrary cell from those of its active cells whose data items match the search pattern. An IS can instruct the selected cell to place a value on the bus, and all other cells in its set receive this value in constant time. The feasibility of these assumptions has been justified in [11] and more details can be found there.
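The associative operations described above can be illustrated with a minimal sequential sketch. It assumes a single IS and represents each cell as a Python dictionary; the function names (`associative_search`, `pick_one`, `global_max`, `broadcast`) and the field names are hypothetical and chosen only for illustration. Each function stands for an operation that the model assumes takes constant time, although the sketch itself loops over the cells.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical layout: one cell = a PE plus its local memory, stored as a dict.
Cell = Dict[str, int]


def associative_search(cells: List[Cell], pattern: Callable[[Cell], bool]) -> List[int]:
    """Return the indices of the cells (responders) whose data match the pattern."""
    return [i for i, cell in enumerate(cells) if pattern(cell)]


def pick_one(responders: List[int]) -> Optional[int]:
    """Select one responder. The model allows any choice; this sketch takes the lowest index."""
    return min(responders) if responders else None


def global_max(cells: List[Cell], responders: List[int], field: str) -> Optional[int]:
    """Global maximum reduction of one field over the responders."""
    return max((cells[i][field] for i in responders), default=None)


def broadcast(cells: List[Cell], field: str, value: int) -> None:
    """The IS places a value on the bus and every cell in its set receives it."""
    for cell in cells:
        cell[field] = value


cells = [{"key": 7, "val": 10}, {"key": 3, "val": 25}, {"key": 7, "val": 40}]
responders = associative_search(cells, lambda c: c["key"] == 7)
chosen = pick_one(responders)
largest = global_max(cells, responders, "val")
# The selected cell's value is placed on the bus and received by all cells.
broadcast(cells, "recv", cells[chosen]["val"])
```

In this example the search for key 7 yields two responders, the lowest-indexed one is selected, and its value is broadcast to every cell.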
The number of ISs is assumed to be considerably smaller than the number of PEs. ISs are coordinated using control parallelism and communicate using the IS network, which may be implemented by a bus, shared memory, or other network. Further information about techniques for coordinating multiple ISs can be found in [5, 6].
A standard associative language that supports the one-IS version of MASC, or MASC(n, 1), also called ASC, has been implemented across a number of platforms. It provides true portability for parallel algorithms [13]. An extension of the ASC compiler is used in [6] to automatically execute the branches of an ASC program simultaneously using multiple ISs.
The MASC model is able to support general algorithm development and analysis. A wide range of different types of ASC algorithms and several large programs have been implemented using the ASC language. These include graph algorithms, computational geometry algorithms, string matching, image processing, database management, compiler optimization, and a real-time air traffic control system. Examples are given in [2, 7, 9, 10, 13, 17].
Simulations of PRAM and enhanced meshes using MASC have been studied in [16, 3]. These simulations allow PRAM algorithms and MMB algorithms to be executed by MASC using the same number of processors. As in [16], let PRAM(n, m) denote a PRAM with n processors and m shared memory locations. MASC(n, j) without a cell network can simulate priority CRCW PRAM(n, O(j)) in O(1) time with high probability. The "high probability" indicates an average running time. In [3], MASC(n, j) with a 2-D mesh can simulate a $\sqrt{n} \times \sqrt{n}$ MMB in $O(\sqrt{n}/j)$ time. When $j = \Omega(\sqrt{n})$, the simulation takes O(1) time. Simulation of MASC(n, j) with a $\sqrt{n} \times \sqrt{n}$ MMB takes $O(jn^{1/6})$ time. It is shown that MASC(n, 1) with a 2-D mesh is more powerful than a $\sqrt{n} \times \sqrt{n}$ mesh with a global bus, and that MASC(n, j) with a 2-D mesh is also more powerful than a $\sqrt{n} \times \sqrt{n}$ MMB when $j = \Omega(\sqrt{n})$.
Reconfigurable Mesh (RM)
A 2-D RM (Reconfigurable Mesh) is a basic mesh model enhanced with reconfigurable buses. Each processor has four ports, referred to as N, S, E, and W, that can be controlled locally such that disjoint buses can be established dynamically. With the ability to reconfigure bus connections during algorithm execution, RM can create different communication patterns based on the needs of the algorithm.
Local connections of a processor can be restricted in different ways to obtain variants of RM. For example, if a processor is allowed to set at most one pair of the ports {EW} or {NS}, the restricted RM is called a BRM (Basic Reconfigurable Mesh). If all processors connect their ports as {EW} and {NS}, the restricted RM is essentially an MMB.
The RM model has been widely accepted as an extremely powerful model. A number of constant time algorithms have been discovered for this model [1, 4, 11, 12] (assuming a constant time bus broadcast), while the same problems require non-constant time on other models [1, 12, 20]. Due to the wide acceptance of RM, we wish to establish a relationship between MASC and RM. This relationship could be a useful tool in evaluating the power of the MASC model. However, dissimilarities between the MASC model and a general RM hinder a direct simulation between the two models. Instead, we consider a bridge model, the RMBM model, which has been shown to be equally powerful to RM. In this paper, we present simulations between the MASC model and the RMBM model, aiming to establish a relationship between MASC and RM. The RMBM model is introduced next.
Reconfigurable Multiple Bus Machine (RMBM)
The Reconfigurable Multiple Bus Machine (RMBM) is also a reconfigurable bus-based model, proposed by Trahan et al. in [15, 18]. As shown in Figure 2 (cited from [15, 18]), the RMBM model consists of a set of processors and a set of buses which are used for processor communication. A processor can connect itself to a bus through the use of its local settings. A bus can be split into segments or fused to connect to another bus by a processor. Each processor has a number of packs of switches, with each pack dedicated to one bus. There are up to five switches in each pack. Two switches control the connection of the bus to the processor's reading port and writing port, respectively. The remaining switches are used to segment the bus and to fuse buses.
Depending on the switches available in the processors, four versions of RMBM are defined in [15, 18]:
B-RMBM: A processor only has switches to connect its reading and writing ports to a bus.
S-RMBM: In addition to read/write switches, a processor can segment the bus.
F-RMBM: In addition to read/write switches, a processor can fuse buses using its fuse line.
E-RMBM: All five switches are functional for a processor; namely, a processor can connect to a bus for read/write, split the bus into two segments, and fuse the bus to a separate fuse line. This is the strongest version of RMBM.
Concurrent read (CR) on an RMBM bus is assumed in this paper. Although either exclusive write (EW) or concurrent write (CW) on a bus can be assumed, we only consider CW in our simulations. The reason is as follows. When there are multiple processors that wish to write on a bus, there is a predefined rule (e.g., common, priority, or arbitrary) to resolve the collision. One processor will win the competition and is allowed to write on the bus. This takes constant time on a CW bus. MASC can also handle CRCW-type operations. When an IS broadcasts a value to its PEs, all the PEs receive the value in one step. When an IS needs to obtain a value from multiple PEs (i.e., a CW situation), the IS uses an associative operation (e.g., global OR, global MAXIMUM, or PickOne) to determine which PE wins, which also takes constant time.
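The three collision rules named above can be made concrete with a short sketch. It is an illustration only: the function name `resolve_concurrent_write` is hypothetical, the priority order is assumed to favor the lowest processor index, and the "arbitrary" rule is modeled by taking the first writer in dictionary order.

```python
from typing import Dict, Optional


def resolve_concurrent_write(writes: Dict[int, int], rule: str = "priority") -> Optional[int]:
    """Return the value that ends up on the bus, or None if no processor writes.

    writes maps a processor index to the value it wants to write.
    """
    if not writes:
        return None
    if rule == "priority":
        # Assumed priority order: the lowest-indexed writer wins.
        return writes[min(writes)]
    if rule == "arbitrary":
        # Any writer may win; this sketch takes the first one in dict order.
        return next(iter(writes.values()))
    if rule == "common":
        # All writers must agree on the value being written.
        values = set(writes.values())
        if len(values) != 1:
            raise ValueError("common rule violated: writers disagree")
        return values.pop()
    raise ValueError(f"unknown rule: {rule}")


on_bus = resolve_concurrent_write({4: 9, 2: 5, 7: 1}, rule="priority")
```

On MASC, the priority case corresponds to a global minimum over the indices of the writing PEs, and the arbitrary case corresponds to PickOne.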
In [15, 18], the relationships among different versions of RMBM have been studied. The following three sections present simulations between the MASC model and different versions of the RMBM model. We start with the weakest version, B-RMBM.
MASC vs. B-RMBM
Simulating MASC with B-RMBM is fairly straightforward, as there are some similarities between the structures of the two models. Let MASC(n, j) denote a MASC with n PEs and j ISs.
Let B-RMBM(n, m) denote a B-RMBM with n processors and m buses. We use B-RMBM(n+j, j) to simulate MASC(n, j) in O(1) time. Let each of the first j processors on the B-RMBM simulate a unique MASC IS. The read/write ports of each of these processors are always connected to a unique bus. The other n processors simulate the n MASC PEs. For each of these n processors, the bus that its reading and writing ports are connected to is determined by the IS that the corresponding MASC PE listens to. Each MASC step therefore takes O(1) time on the B-RMBM. We have the following theorem.
Theorem 1. MASC(n, j) can be simulated by B-RMBM(n+j, j) in O(1) time.
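The processor-to-bus assignment used in this simulation can be sketched as follows. This is a minimal sequential illustration under assumed conventions: processors and buses are indexed from 0, the list `listens_to` (a hypothetical name) gives the IS that each PE listens to, and only a single broadcast step is modeled.

```python
from typing import List, Optional


def bus_connections(n: int, j: int, listens_to: List[int]) -> List[int]:
    """Bus index for each of the n + j B-RMBM processors.

    Processors 0..j-1 simulate the ISs, and processor k stays on bus k.
    Processor j + p simulates PE p and connects to the bus of the IS it listens to.
    """
    assert len(listens_to) == n and all(0 <= s < j for s in listens_to)
    return list(range(j)) + list(listens_to)


def simulate_broadcast(n: int, j: int, listens_to: List[int],
                       is_values: List[int]) -> List[Optional[int]]:
    """One MASC broadcast step: each IS writes on its bus, each PE reads its bus."""
    conn = bus_connections(n, j, listens_to)
    bus: List[Optional[int]] = [None] * j
    for k in range(j):
        # Each IS-simulating processor writes to its own dedicated bus.
        bus[conn[k]] = is_values[k]
    # Each PE-simulating processor reads from the bus it is connected to.
    return [bus[conn[j + p]] for p in range(n)]


received = simulate_broadcast(n=5, j=2, listens_to=[0, 1, 1, 0, 1], is_values=[10, 20])
```

Because every IS owns its bus exclusively, no write collision arises in this direction of the simulation.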
Next, we consider simulating B-RMBM using MASC. Assume a B-RMBM(n, m) with n processors and m buses and a MASC(n, j) with n PEs and j ISs. If j > m, let each MASC PE simulate one of the n B-RMBM processors and each MASC IS simulate one of the m buses. Depending on how a B-RMBM processor sets its reading or writing port to a bus in a particular step, the MASC PE listens to the corresponding simulating MASC IS. The reading and writing ports can be differentiated by setting a read/write flag in the PE. This takes O(1) time.
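One bus-access step of this simulation can be sketched as below. The sketch is sequential and rests on several assumptions: each processor accesses at most one bus per step, write collisions are resolved by a priority rule favoring the lowest index, and the names `PortSetting` and `simulate_bus_step` are hypothetical. The outer loop over buses stands for work that the ISs would perform in parallel on MASC.

```python
from typing import Dict, List, Optional, Tuple

# (bus, mode, value): mode is "r" or "w"; value is used only by writers.
# None means the processor does not access any bus in this step.
PortSetting = Optional[Tuple[int, str, Optional[int]]]


def simulate_bus_step(settings: List[PortSetting], m: int) -> Dict[int, Optional[int]]:
    """Simulate one B-RMBM bus-access step; returns {reader PE index: value read}."""
    received: Dict[int, Optional[int]] = {}
    for bus in range(m):
        # PEs whose port is set to this bus listen to the IS simulating it.
        listeners = [(pe, s) for pe, s in enumerate(settings)
                     if s is not None and s[0] == bus]
        # The read/write flag separates writers from readers.
        writers = [pe for pe, s in listeners if s[1] == "w"]
        if not writers:
            continue
        # Assumed priority rule: a global minimum over the writer indices.
        winner = min(writers)
        value = settings[winner][2]
        for pe, s in listeners:
            if s[1] == "r":
                # The IS broadcasts the winning value to its reading PEs.
                received[pe] = value
    return received


settings: List[PortSetting] = [
    (0, "w", 5), (0, "r", None), (1, "w", 8), (0, "w", 9), (1, "r", None), None,
]
result = simulate_bus_step(settings, m=2)
```

Here PEs 0 and 3 both write on bus 0, PE 0 wins under the assumed priority rule, and the readers on each bus receive the winning value.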
MASC vs. S-RMBM
Since S-RMBM is a B-RMBM whose processors are enhanced with the ability to split a bus into bus segments, the constant time simulation of MASC using S-RMBM can be obtained from the simulation of MASC using B-RMBM shown in Section 3. Now, we consider the simulation of S-RMBM using MASC. Assume an S-RMBM(n, m) with n processors and m buses and a MASC(nm, max(n, m)) with nm PEs and max(n, m) ISs. The nm PEs on the MASC are divided into m groups with n PEs in each group. Initially, only the n PEs in the first group are activated and each PE listens to one of n ISs. For any non-bus-access operation, each of these n PEs is used to simulate an S-RMBM processor. Each step takes O(1) time.
For a bus-access operation on an S-RMBM processor, the simulation proceeds as follows.
1) First, all PEs on the MASC are regrouped as follows. Let PE_{kn+i} listen to IS_i for k = 0, ..., m-1 (see Figure 3(a)). With this grouping, any data item to be written from PE_i is duplicated to PE_{n+i}, PE_{2n+i}, ..., PE_{(m-1)n+i} by an IS_i broadcast. Since an S-RMBM processor is allowed to read from or write to at most one bus at any point of time and there are max(n, m) ISs, this takes O(1) time.
2) Second, all PEs are regrouped with different IS assignments. The first n PEs listen to IS_1, the second n PEs listen to IS_2, ..., and the last n PEs listen to IS_m (see Figure 3(b)). IS_i then finds the two smallest index numbers of its active PEs, which determine the two ends of the first bus segment. Then, IS_i activates the PEs whose indices fall between these two ends and whose reading and/or writing ports are connected to b_i. If there is any read or write operation, it takes O(1) time to complete it. After that, IS_i removes all these PEs from its PE set except the PE representing the second end of the bus segment. IS_i iterates the above steps to process the second segment, the third segment, and so on, until all bus segments on b_i are processed. Since the maximum number of segments on a bus is n/2, the worst case time is O(n).
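The segment-by-segment processing of a single bus in step 2 can be sketched as follows. This is a simplified sequential illustration, not the exact procedure: it assumes that a processor which sets its segment switch starts a new segment at its own position, that segments are half-open index ranges, and that write collisions within a segment are resolved by lowest index. The name `simulate_segmented_bus` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def simulate_segmented_bus(
    n: int,
    cuts: List[int],
    writers: Dict[int, int],
    readers: List[int],
) -> Tuple[Dict[int, Optional[int]], int]:
    """Process the segments of one bus from the lowest index upward.

    cuts: indices of processors whose segment switch is set on this bus.
    writers: {processor index: value written}; readers: processor indices.
    Returns ({reader: value read, or None if its segment has no writer},
             number of iterations used by the simulating IS).
    """
    # Segment boundaries; cuts at position 0 or beyond n-1 add no new segment.
    bounds = [0] + sorted(c for c in set(cuts) if 0 < c < n) + [n]
    received: Dict[int, Optional[int]] = {}
    rounds = 0
    for lo, hi in zip(bounds, bounds[1:]):
        # One iteration of the simulating IS per bus segment.
        rounds += 1
        seg_writers = [p for p in writers if lo <= p < hi]
        # Assumed priority rule: the lowest-indexed writer in the segment wins.
        value = writers[min(seg_writers)] if seg_writers else None
        for p in readers:
            if lo <= p < hi:
                received[p] = value
    return received, rounds


values, rounds = simulate_segmented_bus(
    n=8, cuts=[3, 6], writers={1: 11, 4: 22, 5: 33}, readers=[0, 2, 3, 7]
)
```

The iteration count equals the number of segments on the bus, which is why the running time of this part of the simulation grows with the number of segment switches that are set.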
MASC vs. F-RMBM and E-RMBM
Since both F-RMBM and E-RMBM are more powerful models than B-RMBM and S-RMBM, it takes O(1) time to simulate MASC(n, j) using either F-RMBM(n+j, j) or E-RMBM(n+j, j) without setting any fuse or segment switches.
To simulate F-RMBM(n, m) using MASC(n, m) (assuming n > m), let each of the n PEs on the MASC simulate an F-RMBM processor and each MASC IS simulate an F-RMBM bus. Each PE on MASC takes O(1) time to simulate a non-bus-access operation.
For a bus-access operation, the simulation is less efficient. We present a brief description here due to the space limit. More details will be included in an expanded version of this paper.
On the MASC, an intuitive approach is as follows. Initially, if an F-RMBM processor sets its reading/writing port connected to bus b_i (i = 1, ..., m), the simulating PE is assigned to listen to IS_i. However, an F-RMBM processor may set its reading/writing port to a bus other than those buses it fuses. So a simulating IS may need to read the fuse information from one of its PEs. This fuse information represents the fuse switch settings of the simulated F-RMBM processor. The IS then passes this information to other simulating ISs to instruct their listening PEs for switching. In order to communicate among ISs, an IS network is needed for the MASC. Assume an array is defined in each PE to store the numbers of the buses on which its simulated F-RMBM processor sets its fuse switches.
To simulate E-RMBM, we may divide the simulation into two parts. One simulates fuse lines as in F-RMBM, and the other simulates bus segments as in S-RMBM. We combine the simulation results from Section 4 and this section in the following Theorem 7.
Theorem 7. An E-RMBM(n, m) can be simulated by MASC(nm, max(n, m)) with an IS network in O(nm routing(m)) time, where routing(m) is the time to route a data item on the IS network.
Extension on relationships among MASC, PRAM and RM
In [15, 18], the relative powers of the PRAM, RMBM, and RM models have been well studied. Some results related to our work can be described as follows. Related models can be placed into two groups.
Based on these relationships, we have the following observations.
1) Since MASC and B-RMBM have the same power when the number of B-RMBM buses is restricted as in Corollary 3, MASC has the same power as CRCW PRAM (with appropriate restrictions).
2) Since S-RMBM has the same power as B-RMBM, it has the same power as MASC (when the number of buses is appropriately restricted). Although our simulation of S-RMBM using MASC in Section 4 takes non-constant time, a constant time simulation is possible.
3) It can be shown that S-RMBM has the same power as BRM (the proof is omitted). Therefore, a constant time BRM simulation using MASC is also possible.
4) Since MASC has the same power as B-RMBM, it is less powerful than RM, E-RMBM and F-RMBM. The main reason is that the limited number of ISs restricts the data movements.
Conclusion
In this paper, we have shown simulations between MASC and versions of RMBM, which are reconfigurable bus-based models that can be as powerful as RM. By taking these simulations as a bridge, we have analyzed the power of the MASC model relative to RM. The power of the MASC model is comparable to that of CRCW PRAM but less than that of RM. For some special cases of RM with restrictions, a constant time simulation can be obtained (e.g., MMB and possibly BRM).
Some problems still remain open. As seen in the paper, when we increase the number of PEs and the number of ISs, the simulation time can be reduced but it also degrades efficiency. Determining minimal cost simulation models (i.e., ones of minimal size) which achieve simulation time optimality is a problem that needs further work. Although the MASC model is less powerful than RM, finding a lower bound for the simulation time is another problem that we plan to study further.
