Abstract
Much research has gone into techniques that provide redundant paths, thereby making Multistage Interconnection Networks (MINs) fault tolerant. So far, redundant paths in MINs have been realized by adding hardware such as extra stages or duplicated data links. This paper presents a new MIN topology called the Hierarchical MIN (HMIN). The proposed MIN is constructed with 2.5N-4 switching elements, considerably fewer than in the classical MINs. Although it uses less hardware than the classical MINs, the HMIN retains the full-access property and also provides alternative paths for fault tolerance. Furthermore, because the HMIN contains a shortcut for localized communication, it can exploit the locality of reference in multiprocessor systems. Its performance under varying degrees of localized communication is analysed and simulated.
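To make the hardware claim concrete, the following minimal sketch tabulates the 2.5N-4 switching-element (SE) count stated above against a classical baseline. The baseline is an assumption, not something the text specifies: an Omega-type network of 2×2 SEs with log₂ N stages of N/2 elements each. The function names are hypothetical.

```python
def hmin_se_count(n_ports):
    """SE count of the HMIN, using the 2.5N - 4 figure quoted in the abstract."""
    return (5 * n_ports) // 2 - 4


def classical_se_count(n_ports):
    """SE count of the ASSUMED baseline: an Omega-type MIN built from 2x2 SEs,
    i.e. log2(N) stages with N/2 elements per stage. N must be a power of two."""
    stages = n_ports.bit_length() - 1  # log2(N) for a power of two
    return (n_ports // 2) * stages


if __name__ == "__main__":
    for n in (8, 16, 32, 64, 256, 1024):
        print(n, hmin_se_count(n), classical_se_count(n))
```

Under this assumed baseline, the HMIN count drops below the classical count from N = 32 onward, and the gap widens as N grows.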
Introduction
The shared memory model, in which all N processors access a common memory in constant time, has been widely used in parallel processing systems [4, 7, 21, 23].
Crossbar switches are too expensive to construct for large N. Among the many interconnection types, Multistage Interconnection Networks (MINs) have been used in most MIMD and SIMD multiprocessor systems because they have a good diameter of log₂ N, scale well, and employ simple distributed self-routing [7, 11, 13, 18-20]. The classical MINs - Omega [13, 21] additional hardware such as extra stages [1-3, 5, 8, 22] or multiple links [1, 10, 12, 14, 15, 24]. Also, the classical MINs have another problem in that they cannot exploit the locality of reference.
Interprocessor communications are determined by the algorithms used and the allocation of tasks to processors.
It has been shown that the optimum cluster size is application-dependent. If the communication probability distribution function for an application is available, the cluster size that minimizes interprocessor communication delay can be determined [13]. However, the latency between source and destination in a classical MIN grows with log₂ N, the number of stages in the network. Classical MINs therefore cannot exploit locality, because every source-destination pair incurs the same latency. This loss of locality hurts their performance compared with the n-cube [16] and the Hypertree [10]. So far, there has been little progress in exploiting the locality of reference and shortcut paths in the classical MINs.
In this paper, we propose a new topology MIN, 
Network Topology Description
in each stage j, 0 ≤j<n-1, 
Dual Register Switch for Shortcut
To overcome message blockage and to provide a shortcut, we propose a new SE model. The main difference from the conventional switch is that each input/output port has dual registers, marked Up (U) and Low (L). The details of the new switch design are illustrated in Fig. 4. The assumption that each switch output port sends at most one packet per clock cycle remains valid; therefore, each SE input link still receives at most one packet per clock cycle.
Routing Strategy
The pair of source and destination can be classified by register if di=1. The proper connection is made at stage i using the ith digit of the destination tag. Even when the shortcut path is busy, packets can be re-routed over the alternative upper-link paths. For example, the routing tag for a connection between source 0 (0000) and destination 3 (0011) in Fig. 6 is R=111, which takes the shortcut.
[ Fig. 6 ] Alternative paths for fault tolerance in a 16×16 HMIN
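The idea of using the ith destination digit at stage i can be illustrated with a minimal sketch of destination-tag self-routing. This sketch models a classical Omega network, not the HMIN: it covers neither the dual-register shortcut nor the alternative upper links, and the function name and return format are hypothetical.

```python
def omega_route(src, dst, n):
    """Trace destination-tag routing through a classical N x N Omega network,
    N = 2**n, built from 2x2 switching elements.

    Returns one (stage, tag_bit, position) tuple per stage, where tag_bit is
    the destination digit used at that stage (0 = upper output, 1 = lower
    output) and position is the link address after the stage.
    """
    mask = (1 << n) - 1
    pos = src
    path = []
    for stage in range(n):
        # Perfect shuffle between stages: rotate the n-bit address left by one.
        pos = ((pos << 1) | (pos >> (n - 1))) & mask
        # The switch is set by one destination digit, most significant first.
        bit = (dst >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append((stage, bit, pos))
    return path


if __name__ == "__main__":
    # Source 0 (0000) to destination 3 (0011) in a 16 x 16 network.
    for stage, bit, pos in omega_route(0, 3, 4):
        print(stage, bit, format(pos, "04b"))
```

In this classical scheme every connection crosses all n stages, which is the fixed latency that a shortcut is intended to avoid for local traffic.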
Performance Evaluation
In this section, we discuss the performance of HMIN.
In the classical MINs, the latency is the same for every input-output connection. In the HMIN, however, the latency depends on the locality of the communication. To evaluate the performance of the HMIN, we therefore first define the window w [5].
For any input-output connection, there are n-1 windows, as shown in Fig. 5.
ii) Degree of locality: the degree of locality is the probability of referring to the window w.
Simulation Results
The simulations were conducted on the network to study various network parameters. The principal measures used to evaluate and compare the performance of packet switching networks are throughput and delay. We investigate the performance of the HMIN for the following distributions [16] .
[ Fig. 8 ] Expected bandwidth
Uniform distribution: In this distribution, the probability that an input node i sends a message to an output node j is the same for all i and j.
Degree of locality: An input node i may exchange messages more frequently with nodes in a particular area. A node sends messages to nodes within window w with a given probability (the degree of locality) and to nodes outside window w with the complementary probability. Fig. 9 shows how the throughput varies with the degree of locality. As expected, the throughput decreases as the window size increases.
[ Fig. 9 ] Variation of throughput with window size and degree of locality
The HMIN also has alternative paths and the shortcut.
In large-scale multiprocessor systems where communication is localized within small sets of processor and memory-module pairs, the HMIN performs better than classical MINs of the same size.
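The localized traffic pattern used in the simulations can be sketched as a destination sampler. The window definition here is an assumption made only for illustration: the window of a source is taken to be the aligned block of `window_size` consecutive addresses that contains it, and `window_size` is assumed to divide the number of nodes. All names are hypothetical.

```python
import random


def pick_destination(src, n_nodes, window_size, locality, rng=random):
    """Draw a destination for a message sent by node `src`.

    With probability `locality` the destination is uniform over the source's
    window; otherwise it is uniform over the nodes outside that window.
    ASSUMPTION: the window is the aligned block of `window_size` consecutive
    addresses containing `src`, and `window_size` divides `n_nodes`.
    """
    start = (src // window_size) * window_size
    if window_size >= n_nodes or rng.random() < locality:
        return rng.randrange(start, start + window_size)
    # Pick uniformly among the n_nodes - window_size addresses outside the
    # window by skipping over the window's address range.
    k = rng.randrange(n_nodes - window_size)
    return k if k < start else k + window_size


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [pick_destination(5, 16, 4, 0.8, rng) for _ in range(10)]
    print(sample)
```

Setting `locality` to `window_size / n_nodes` makes every destination equally likely, so the uniform distribution is a special case of this sampler.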
