

# Power Analysis with Variable Traffic Loads for Next Generation Interconnection Networks

The interconnection network plays a vital role in determining system performance and power usage. The processing nodes of a supercomputer communicate with each other, as well as with the memory units, through the interconnection network [13]. The power usage of the interconnection network depends on the router power and the link power. In the Alpha 21364 microprocessor, the integrated routers and links consume about 20% of the total chip power (about 25W of the total 125W) (a 128 core 2D-torus network fabricated with a 180nm process). In MPC systems, the total number of outgoing links, both on-chip and off-chip, is a major concern because of power usage and high latency [4]. The on-chip network consumes up to 50% of the total chip power [5], and the number of off-chip links defines the network bandwidth. Consequently, the network topology at the lowest level should keep the number of physical outgoing links as low as possible to reduce power usage. Hence, the main target of this research is to analyze the power required by various networks under variable traffic loads.

Conventional flat networks such as torus networks show better performance than mesh networks [6]. However, they consume more static power than mesh networks because of the extra wrap-around connections. On the other hand, a 3D NoC is preferable to a 2D NoC because of its low on-chip power usage, given the reduced vertical channels provided by Through Silicon Vias (TSV) [7]. According to our analysis, however, the 3D-torus interconnect (used in the Cray T3D) requires at least about 17.08% higher network power than the 3D-mesh network (used in the MIT M-Machine) with 64 nodes. In contrast, a hierarchical interconnection network requires fewer off-chip connections than conventional networks, which is one of the key factors in reducing power usage [4].

The rest of the paper describes the architecture of 3D-TTN, reviews the routing algorithm, estimates the on-chip power consumption of 3D-TTN under variable loads and finally presents a topological analysis covering packing density and network fault tolerance.



Fig. 1. A  $(4\times 4\times 4)$  basic module of 3D-TTN(2, L, 0) [12]

# **SECTION II.** Related Works

The 16-tile MIT RAW on-chip network requires about 36% of the total chip power [8]. Similarly, power estimation for CMOS VLSI chips shows that interconnects and off-chip drivers can scale up to 20% and 65% of the total power consumption; if off-chip power is excluded, the power related to wires can reach 46% of the chip power [9]. Modern supercomputers, on the other hand, rely on hierarchically structured connections at the off-chip levels, such as node-to-node and node-to-rack connections. However, these hierarchical connections are not optimized in a conventional network. In addition, an Infiniband QDR 40Gbps switch typically requires about 1W of electrical power per link. Hence, power consumption is heavily affected by an increased number of off-chip connections. Another key feature of hierarchical interconnection networks (HIN) is the ability to maintain a different link structure at each network level.

Recently introduced interconnection networks such as Slim Fly [10] and Flattened Butterfly [11] are low-diameter topologies that maintain a high radix for each network size. Such high-radix networks are not suitable for connecting millions of cores because the router cost increases with the router radix. For example, Flattened Butterfly requires a radix of about 48 for a network of only 4K nodes, giving a router cost of about \$15,926.9 (f(K) = 350.4K - 892.3) [10]. In contrast, 3D-TTN maintains a fixed router radix of 9 links, which requires \$2,261.3 per router. The static performance of 3D-TTN shows about 21% better diameter and about 12% better average distance at the highest level, and about 32.48% lower router power usage (for a 1% traffic load), than the 5D-torus network [12]. It also outperforms other HINs such as TTN [13] in network performance.
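The router cost figures quoted above follow directly from the linear cost model f(K) = 350.4K - 892.3 cited from [10]. The following minimal sketch simply evaluates that model for the two radices mentioned in the text; the function name `router_cost_usd` is a hypothetical helper introduced here for illustration.

```python
def router_cost_usd(radix: int) -> float:
    """Evaluate the linear router cost model f(K) = 350.4K - 892.3 quoted from [10]."""
    return 350.4 * radix - 892.3


# Radix values taken from the text: 48 for a 4K-node Flattened Butterfly,
# 9 for the fixed-radix 3D-TTN router.
flattened_butterfly_cost = router_cost_usd(48)
ttn_3d_cost = router_cost_usd(9)

print(f"Flattened Butterfly router: ${flattened_butterfly_cost:,.1f}")
print(f"3D-TTN router:              ${ttn_3d_cost:,.1f}")
print(f"Cost ratio:                 {flattened_butterfly_cost / ttn_3d_cost:.2f}x")
```

Under this model, the radix-48 router is roughly seven times as expensive as the radix-9 router.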

# **SECTION III.** Power Model for On-Chip Analysis

The on-chip power model in this paper is estimated with the Orion energy model [14] using a 45nm fabrication process, considering the static and dynamic power dissipation of the routers and the inter-router interconnects. The 45nm process (used in the Blue Gene/Q PowerPC A2 processor) is more recent than the 65nm process considered in our earlier research. The GARNET network simulator [15] is used for dynamic traffic simulation together with the Orion energy model. Dynamic power and static power are the main sources of power consumption in both the router and link power models. The total router energy is the sum of the energy of all reads and writes to the router buffers, all activity at the local and global arbiters and all crossbar traversals. Equation 1 shows the total energy consumption inside the router [15].

$$E_{router} = E_{buffer\_write} + E_{buffer\_read} + E_{vc\_arb} + E_{sw\_arb} + E_{xb}$$
(1)

The dynamic energy of the routers is defined by $E = 0.5\,\alpha C V^2$, where $\alpha$ is the switching activity, $C$ is the capacitance and $V$ is the supply voltage [14]. The dynamic power model of the links is estimated from the charging and discharging of capacitive loads (the wire capacitance and the input capacitance of the next-stage repeater). Power is formulated as $P = E f_{clk}$, where $f_{clk}$ is the clock frequency. The link dynamic power is defined as $P_{link} = \alpha C_l V_{dd}^2 f_{clk}$, where $C_l$ is the load capacitance and $V_{dd}$ is the supply voltage.
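To make the two formulas above concrete, here is a minimal sketch that evaluates the dynamic energy and the link dynamic power. The supply voltage (1.0V) and clock frequency (1GHz) match the simulation conditions used later in the paper; the 1 pF load capacitance and the switching activity values are hypothetical placeholders chosen only for illustration, and the function names are not taken from Orion.

```python
def dynamic_energy_j(alpha: float, capacitance_f: float, voltage_v: float) -> float:
    """Dynamic energy E = 0.5 * alpha * C * V^2, in joules."""
    return 0.5 * alpha * capacitance_f * voltage_v ** 2


def link_dynamic_power_w(alpha: float, load_capacitance_f: float,
                         vdd_v: float, f_clk_hz: float) -> float:
    """Link dynamic power P_link = alpha * C_load * Vdd^2 * f_clk, in watts."""
    return alpha * load_capacitance_f * vdd_v ** 2 * f_clk_hz


# Hypothetical load capacitance (1 pF); NOT a value from the paper or from Orion.
LOAD_CAPACITANCE_F = 1e-12

# Illustrative switching activities; they are assumptions, not measured values.
for alpha in (0.10, 0.30):
    power_w = link_dynamic_power_w(alpha, LOAD_CAPACITANCE_F, vdd_v=1.0, f_clk_hz=1e9)
    print(f"alpha = {alpha:.2f}: P_link = {power_w * 1e3:.3f} mW")
```

Because the link dynamic power is linear in the switching activity, tripling the activity triples the dynamic component while the static component is unaffected.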

# **SECTION IV.** Architecture of 3D-TTN

Hierarchical interconnection networks are a promising way to achieve low power usage because of their limited number of off-chip connections. 3D-TTN is an HIN that was introduced earlier [12]. To clarify the architectural details, this section reviews the network structure of 3D-TTN.

# **IV.** Definition

A BM of the 3D-TTN(m, L, q) network is similar to a 3D-torus network. It consists of $2^{3m}$ connected processing elements (PEs) arranged in $2^m$ rows and $2^m \times 2^m$ columns, where m is a positive integer, L is the number of hierarchy levels and q is the inter-level connectivity [12].

## A. Basic Module

3D-TTN is constructed from several levels of interconnection. The lowest-level network of 3D-TTN(m, L, q) is defined as the Basic Module (BM). A $(2^m \times 2^m \times 2^m)$ BM of 3D-TTN interconnects $2^{3m}$ nodes and has $2^{2m+2}$ free ports for higher-level interconnection. Fig. 1 shows a BM of 64 nodes $(4 \times 4 \times 4)$, which has $2^{2 \times 2+2} = 64$ free ports. Each BM uses $2^m \times 4 \times (2^q) = 2^{m+q+2}$ of its free links for each upper-level network, where $2(2^{m+q})$ free links are used for vertical connections and $2(2^{m+q})$ free links are used for horizontal connections. Here, q is the inter-level connectivity $(q \in \{0, 1, \ldots, m\})$. For m = 2 and q = 1, $2^{2+1+2} = 32$ of the free ports and their associated links are used for each higher level of interconnection; 16 of them are used for horizontal connections and the other 16 for vertical connections. With q increased to 1, the maximum number of network levels equals three, and the number of links in each of the directions vertical\_in, vertical\_out, horizontal\_in and horizontal\_out increases to 8. In this paper, we consider the network class 3D-TTN(2, L, 0).
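The port and link counts of a Basic Module can be checked with a short calculation. The sketch below evaluates the expressions $2^{2m+2}$ and $2^{m+q+2}$ from the text; the function names and the dictionary layout are illustrative choices, not part of the original definition.

```python
def bm_free_ports(m: int) -> int:
    """Free ports of a (2^m x 2^m x 2^m) Basic Module: 2^(2m+2)."""
    return 2 ** (2 * m + 2)


def bm_links_per_level(m: int, q: int) -> dict:
    """Free links a BM uses for each higher level: 2^(m+q+2) in total,
    split evenly into vertical/horizontal and then into in/out directions."""
    total = 2 ** (m + q + 2)
    return {
        "total": total,
        "vertical": total // 2,
        "horizontal": total // 2,
        "per_direction": total // 4,  # vertical_in, vertical_out, horizontal_in, horizontal_out
    }


print(bm_free_ports(2))            # free ports of the 4 x 4 x 4 BM
print(bm_links_per_level(2, q=1))  # the m = 2, q = 1 example from the text
print(bm_links_per_level(2, q=0))  # the 3D-TTN(2, L, 0) class used in this paper
```

For m = 2 this gives 64 free ports; q = 1 uses 32 links per level (8 per direction), while q = 0 uses 16 links per level (4 per direction).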

## **B. Higher Level of 3D-TTN**

The higher levels of 3D-TTN follow a 2D-torus structured recursive interconnection of the immediately lower-level subnetworks. Therefore, a level-2 network consists of a certain number of level-1 networks. Fig. 2 illustrates the higher-level interconnection of 3D-TTN up to level 3. For example, a level-3 network is built from $2^{2\times2} = 16$ level-2 3D-TTN subnetworks. Similarly, a level-5 network is built from 16 level-4 subnetworks. The total number of nodes at a given level of 3D-TTN is $N = (2^{2mL} \times 2^m)$. For example, the total number of nodes of a level-2 3D-TTN(3, 2, 0) network is $N = (2^{12} \times 2^3) = 32,768$. The largest possible number of nodes in 3D-TTN is obtained at the maximum network level, which depends entirely on the inter-level connectivity q. The highest level of 3D-TTN is given by $L_{max} = 2^{m-q} + 1$. Hence, for m = 2 and inter-level connectivity q = 1, $L_{max} = 3$, so level 3 is the maximum possible level. With m = 2, the total number of nodes at level 3 is then $N = (2^{2\times2\times3} \times 2^2) = 16,384$. If we increase m to 3, the total number of nodes at level 3 becomes $N = (2^{2\times3\times3} \times 2^3) = 2,097,152$. Table I generalizes the architectural parameters of 3D-TTN.



Fig. 2. Higher-level interconnection for 3D-TTN(2, L, 0) [12]

**Table I.** Generalization for 3D-TTN[12]

| Basic Module                  | Max Levels                | Total Nodes                         |
|-------------------------------|---------------------------|-------------------------------------|
| $(2^m \times 2^m \times 2^m)$ | $L_{max} = 2^{m - q} + 1$ | $\mathbf{N} = (2^{2mL} \times 2^m)$ |
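
The expressions in Table I can be evaluated directly. The following sketch computes the maximum level and the total node count for the examples given in the text; the function names are hypothetical helpers introduced for illustration.

```python
def max_level(m: int, q: int) -> int:
    """Maximum hierarchy level: L_max = 2^(m - q) + 1."""
    return 2 ** (m - q) + 1


def total_nodes(m: int, level: int) -> int:
    """Total nodes of a level-L network: N = 2^(2mL) * 2^m."""
    return 2 ** (2 * m * level) * 2 ** m


print(max_level(m=2, q=1))        # maximum level for m = 2, q = 1
print(total_nodes(m=2, level=3))  # level-3 network with 4 x 4 x 4 BMs
print(total_nodes(m=3, level=2))  # level-2 3D-TTN(3, 2, 0)
print(total_nodes(m=3, level=3))  # level-3 network with 8 x 8 x 8 BMs
```

With q = 0 and m = 2, the same formulas give a maximum level of 5 and more than four million nodes at that level.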

# **SECTION V.** Routing Algorithm for 3D-TTN

The routing algorithm for 3D-TTN has already been defined in our earlier research [12]. In this section, we review it (Algorithm 1). The deterministic dimension-order routing (DOR) algorithm is considered for 3D-TTN. In DOR, each packet traverses one dimension until its distance to the destination node in that dimension becomes zero, and then moves on to the next dimension. When a packet starts routing from the source node, it first checks whether the destination node lies in the same BM. If the destination lies in a different BM, the source node sends the packet to the outlet\_node, which connects to the outer BM at the level where the routing is performed. The function SP\_routing selects the shortest route at the higher levels. The functions outlet\_x and outlet\_y return the x coordinate value $s_1$ and the y coordinate value $s_2$ such that a link exists for $(s, d, l, D, \alpha)$. Here, s is the source node, d is the destination node, l is the higher level $(2 \le l \le L)$, $D \in \{V, H\}$ is the dimension and $\alpha \in \{+, -\}$ is the direction obtained from the SP\_routing function. Hence, the vertical and horizontal directions are represented by V+, V-, H+ and H-. If we consider a source node $s = [(s_{2L}, s_{2L-1}) \dots (s_4, s_3)(s_2, s_1, s_0)]$ and a destination node $d = [(d_{2L}, d_{2L-1}) \dots (d_4, d_3)(d_2, d_1, d_0)]$, the routing tag can be defined as $\mathbf{t} = [(\mathbf{t}_{2L}, \mathbf{t}_{2L-1}) \dots (\mathbf{t}_4, \mathbf{t}_3)(\mathbf{t}_2, \mathbf{t}_1, \mathbf{t}_0)]$.

Algorithm 1. Routing algorithm for 3D-TTN [12]



We further clarify the 3D-TTN routing algorithm using Fig. 3, where the source node is [(1, 2) (1, 2) (1, 2, 0)] and the destination node is [(2, 1) (2, 1) (2, 1, 0)]. Routing starts from the source BM at the highest network level (level 3). As the destination BM differs from the source BM, the source node sends the message to the outlet\_node [(1, 2) (1, 2) (3, 0, 0)], and the message follows the routing path to reach level-3(2, 2). Similarly, it reaches the destination level-3(2, 1) network from level-3(2, 2). Level-2 routing then starts from level-2(1, 2) and reaches level-2(2, 1) in the same way as at level 3. Finally, routing inside the destination BM delivers the message from the receiving node (0, 3, 0) to the destination node (2, 1, 0).

Fig. 3. Routing path for 3D-TTN(2, L, 0)
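
The dimension-order step described in this section (finish one dimension, then move to the next) can be sketched for a single Basic Module as follows. This is not Algorithm 1 from [12]: it only covers routing inside one BM, which is modeled here as a k x k x k torus with wrap-around links, and the dimension order and the tie-breaking rule (positive direction when both directions are equally short) are assumptions made for illustration. The function name `dor_path` is hypothetical.

```python
def dor_path(src, dst, k=4):
    """Return the node sequence visited by dimension-order routing on a
    k-ary torus, resolving the dimensions in index order 0, 1, 2, ..."""
    current = list(src)
    path = [tuple(current)]
    for dim in range(len(current)):
        forward = (dst[dim] - current[dim]) % k   # hops needed in the + direction
        step = 1 if forward <= k - forward else -1  # pick the shorter way around the ring
        while current[dim] != dst[dim]:
            current[dim] = (current[dim] + step) % k
            path.append(tuple(current))
    return path


path = dor_path((0, 0, 0), (2, 3, 1))
print(path)
print("hops:", len(path) - 1)
```

In this example the packet first moves two hops along the first dimension, then takes one wrap-around hop in the second dimension, and finally one hop in the third dimension.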

# **SECTION VI.** Estimation of Power Consumption

Power reduction is essential for achieving an exa-scale system. With current technology, an exa-scale system can be built only at an unacceptable cost. For instance, Tianhe-2, the 2<sup>nd</sup> most powerful supercomputer on earth, achieves about 33.86 petaflops with 3,120,000 cores and requires about 18MW of electrical power (enough to power 18,000 homes); scaled to exa-scale performance, it would require about 540MW of electrical power (~ 1 nuclear power plant) [16]. This section examines the effect of the interconnection network on supercomputers through an on-chip power analysis of 3D-TTN and several other networks under variable network traffic loads.

## A. Assumptions for Power Model

On-chip power consumption accounts for up to 50% of the total chip power. Hence, in this paper, we consider only the on-chip power estimation for 3D-TTN. The target of this power analysis is to examine the leakage and dynamic power of both the links and the routers using an on-chip power model simulator with variable traffic loads. The analysis includes a comparison with the 5D-torus network used in the Blue Gene/Q supercomputer [17]. It considers traffic loads of 10% and 30% with a uniform traffic pattern and the same link length of 3mm between directly connected nodes. Our earlier research considered only a 1% traffic load and only the router power. Hence, this paper presents a more accurate and complete power analysis for the various networks.

## **B.** Power Consumption for Various Interconnections

In our setup, the clock frequency is 1GHz, the message size is 128 bits, the supply voltage is 1.0V, and a uniform traffic pattern with a link length of 3mm is used; Table II shows the simulation conditions for the on-chip 3D-TTN. The simulation uses only one virtual channel with the default dimension-order routing. The most important parameter in this paper is the message injection rate, which is set to a 10% or a 30% load over a common 1,000 simulation cycles. Every simulation considers 64 nodes. Fig. 4 shows the total link power consumption of the 64 nodes of each network, and Fig. 5 shows the total router power, including the dynamic and static power dissipation along with the clock power, at the 10% traffic load. These two simulations confirm that, with fixed-size links, 3D-TTN outperforms the 4D-torus and 5D-torus networks. Fig. 6 shows the power distribution of 3D-TTN, which confirms that the clock power requires less than the 36% required in the Intel 80-core router [14].

For the 30% traffic load, the link power usage is shown in Fig. 7 and the router power usage along with the clock power in Fig. 8. This simulation also confirms that 3D-TTN is a better choice than the 4D-torus and 5D-torus as the on-chip network. Fig. 9 shows the increase in the router dynamic power (4.79%) and the link dynamic power (3.09%) of 3D-TTN at the 30% traffic load relative to the earlier simulations at the 10% traffic load.

As the number of links required by the 4D-torus and 5D-torus networks is much higher than that of 3D-TTN, the number of off-chip connections also has a large impact on the power analysis. Table III shows that 3D-TTN requires about 29.75% less network power than the 4D-torus and about 39.96% less than the 5D-torus. If we apply this estimate to a real scenario such as the Blue Gene/Q 5D-torus network, where 20PF/s is achieved by 1.57M cores requiring 6.6MW of electrical power, 3D-TTN would require about 198.132MW of electrical power at a 10% load for exa-scale performance, whereas the 5D-torus would require about 330MW. This comparison confirms that adopting 3D-TTN can reduce the electrical power by about 130MW.



Fig. 4. Link power estimation for 10% load with 64 nodes

Fig. 5. Router power estimation for 10% load with 64 nodes

Fig. 6. Power distribution for 10% load on 3D-TTN with 64 nodes

Fig. 7. Link power estimation for 30% load with 64 nodes

Fig. 8. Router power estimation for 30% load with 64 nodes

Fig. 9. Power distribution for 30% load on 3D-TTN with 64 nodes

Table II. Simulation conditions for power analysis

| Parameter              | Value             | Units            |
|------------------------|-------------------|------------------|
| Fabrication process    | 45                | nm               |
| Number of nodes        | 64                | -                |
| Link length            | 3                 | mm               |
| Operating frequency    | $1 \times 10^{9}$ | Hz               |
| Transistor type        | NVT               | -                |
| Supply voltage         | 1                 | V                |
| Traffic pattern        | Uniform           | -                |
| Message injection rate | 0.10 or 0.30      | Flits/cycle/node |
| Message size           | 128               | Bits             |
| Simulation cycles      | 1,000             | -                |

Table III: Power comparison for various traffic loads against 3D-TTN

| Network  | Traffic | Power Model   | Total Difference (3D-TTN vs. network) |
|----------|---------|---------------|---------------------------------------|
| 5D-torus | 1%      | Router Power  | 32.48% less                           |
| 5D-torus | 10%     | Router + Link | 39.96% less                           |
| 4D-torus | 10%     | Router + Link | 29.75% less                           |
| 3D-mesh  | 10%     | Router + Link | 17.08% higher                         |
| 5D-torus | 30%     | Router + Link | 38.42% less                           |
| 4D-torus | 30%     | Router + Link | 28.3% less                            |
| 3D-mesh  | 30%     | Router + Link | 23.26% higher                         |
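
The exa-scale extrapolation in Section VI-B can be reproduced with a few lines of arithmetic. The sketch below only mirrors the calculation described in the text: it scales the Blue Gene/Q figures (20PF/s at 6.6MW) linearly to 1,000PF/s and applies the 39.96% difference reported for the 10% load. The linear scaling and the variable names are assumptions made for illustration.

```python
BGQ_PERFORMANCE_PFLOPS = 20.0   # Blue Gene/Q performance quoted in the text
BGQ_POWER_MW = 6.6              # Blue Gene/Q electrical power quoted in the text
EXASCALE_PFLOPS = 1000.0        # 1 exaflops expressed in petaflops
TTN_SAVING_VS_5D_TORUS = 0.3996  # 10% load, Router + Link row for the 5D-torus

# Assumption: power grows linearly with performance.
scale_factor = EXASCALE_PFLOPS / BGQ_PERFORMANCE_PFLOPS
torus_5d_power_mw = BGQ_POWER_MW * scale_factor
ttn_3d_power_mw = torus_5d_power_mw * (1.0 - TTN_SAVING_VS_5D_TORUS)
saving_mw = torus_5d_power_mw - ttn_3d_power_mw

print(f"5D-torus at exa-scale: {torus_5d_power_mw:.3f} MW")
print(f"3D-TTN at exa-scale:   {ttn_3d_power_mw:.3f} MW")
print(f"Estimated saving:      {saving_mw:.3f} MW")
```

The result matches the 330MW and 198.132MW figures in the text, with a difference of roughly 130MW.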

# **SECTION VII.** Topological Analysis

Performance is the first concern for interconnection networks. An interconnection network with low cost, low degree, low congestion, high connectivity and high fault tolerance is preferable to others [18]. In this section, we present a topological analysis of 3D-TTN in terms of node degree, cost, packing density and message traffic density.

## A. Node Degree

The node degree of an interconnection network is the maximum number of outgoing links from a single node. The input/output interface cost of a node is equivalent to its degree. Hence, a small and constant node degree is highly recommended for interconnection networks. A high-degree network also increases the total link power consumption. Moreover, a constant node degree is easy to maintain and reduces the network complexity. Since each node of 3D-TTN has eight outgoing links, the node degree of 3D-TTN is 8, whereas the 5D-torus has a node degree of 10.

## **B.** Cost

Cost is also a static network parameter, which depends on the diameter and the node degree of the network [19]. Node-to-node distance, network congestion and fault tolerance depend on the diameter and the node degree. Hence, the product of diameter and node degree is treated as the cost of an interconnection network (Equation 2). Cost can be a useful parameter for next generation supercomputers because it reflects both the diameter and the node degree. Fig. 10 shows the cost of the 3D-TTN(2, L, 0) network: it is much better than that of the 5D-torus network as well as the 2D and 3D mesh and torus networks, but slightly worse than that of the TTN and TESH networks because of its slightly higher node degree [13].

$$\text{Cost} = \text{Total Node Degree} \times \text{Diameter}$$
(2)



Fig. 10. Cost performance analysis of various networks

## **C. Packing Density**

A higher packing density is preferable because it reduces the chip area required for the VLSI layout. The chip size also has a large impact on the required power usage. Network cost is defined as the product of network diameter and node degree, and the packing density is defined as the ratio of the total number of nodes to the cost [18]. Equation 3 shows the definition of packing density.

$$\text{Packing Density} = \frac{\text{Total number of nodes}}{\text{Degree} \times \text{Diameter}}$$
(3)

Fig. 11 shows the packing density of 3D-TTN, which is higher than that of the 5D-torus network and also higher than that of the 2D and 3D mesh and torus networks. At the maximum level, 3D-TTN shows a packing density similar to that of the 3D-TESH network.



Fig. 11. Packing density for various networks
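
Equations 2 and 3 are simple enough to evaluate by hand, but a small sketch makes the relationship explicit. The example below uses a plain 64-node 4 x 4 x 4 3D torus (degree 6, diameter 6) purely as an illustration; it does not reproduce the 3D-TTN values plotted in Fig. 10 and Fig. 11, and the helper `torus_diameter` relies on the standard diameter of a k-ary n-dimensional torus, which is not stated in the paper.

```python
def network_cost(degree: int, diameter: int) -> int:
    """Static cost (Equation 2): node degree times diameter."""
    return degree * diameter


def packing_density(nodes: int, degree: int, diameter: int) -> float:
    """Packing density (Equation 3): nodes divided by cost."""
    return nodes / network_cost(degree, diameter)


def torus_diameter(k: int, n: int) -> int:
    """Diameter of a k-ary n-dimensional torus: n * floor(k / 2)."""
    return n * (k // 2)


# Illustrative 4 x 4 x 4 3D torus: 64 nodes, 2 links per dimension per node.
nodes, degree = 4 ** 3, 2 * 3
diameter = torus_diameter(k=4, n=3)
print("cost:", network_cost(degree, diameter))
print("packing density:", round(packing_density(nodes, degree, diameter), 3))
```

A network improves its packing density either by connecting more nodes at the same cost or by lowering the degree-diameter product for the same number of nodes.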

## **D. Message Traffic Density**

The message traffic density of a network can be evaluated from the average distance between source and destination nodes. An efficient network should have a low message traffic density to reduce traffic congestion and to provide wide network bandwidth. Message traffic density is the ratio of the product of the total number of nodes and the average distance to the total number of links [20], as given in Equation 4.

$$\text{Message Traffic Density, } \rho \equiv \frac{\bar{d}N}{E}$$
(4)

Here, $\bar{d}$ is the average distance, N is the total number of nodes and E is the total number of links. The total number of links of 3D-TTN can be derived from Equation 5.

$$E = N_{BM} \times \text{inner } L_1 \text{ links} + \sum_{i=2}^{L} N_{BM} \times \text{outer } L_i \text{ links}, \quad [L \ge 2]$$
(5)

Here, $N_{BM}$ is the number of basic modules at the current level, "inner $L_1$ links" is the number of level-1 links inside a BM and "outer $L_i$ links" is the number of i-th level links per BM. The total number of nodes is given by Equation 6.

$$\text{The total number of nodes, } N = (2^{2mL} \times 2^m)$$
(6)

Table IV shows the simulated values of $\bar{d}$, N, E and $\rho$ for the 3D-TTN(2, L, 0) network. Using those values, we also compare the message traffic density with that of other networks in Fig. 12, which shows that 3D-TTN has a lower message traffic density than the 2D-torus, 2D-mesh, 3D-torus and 3D-TESH networks.



Fig. 12. Message traffic density of various networks

Table IV: Message traffic density of 3D-TTN network

| Level | $\bar{d}$ | N       | E       | $\rho$  |
|-------|-----------|---------|---------|---------|
| 1     | 3         | 64      | 192     | 1       |
| 2     | 7.44477   | 1,024   | 3,200   | 2.38326 |
| 3     | 11.5945   | 16,384  | 53,248  | 3.56754 |
| 4     | 17.8446   | 262,144 | 884,736 | 5.28729 |
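
The link counts in Table IV can be cross-checked with a short calculation. The sketch below assumes that the sum in Equation 5 runs over the hierarchy levels 2 to L, takes the 192 inner links per BM from the level-1 row of Table IV, and uses 8 outer links per BM per higher level; the value 8 is an inference from the level-2 row ((3200 - 16 x 192) / 16 = 8), not a number stated in the text. The average distances are copied from Table IV.

```python
INNER_LINKS_PER_BM = 192  # level-1 row of Table IV (one 4 x 4 x 4 BM)
OUTER_LINKS_PER_BM = 8    # assumption inferred from the level-2 row of Table IV


def total_nodes(level: int, m: int = 2) -> int:
    """Equation 6: N = 2^(2mL) * 2^m."""
    return 2 ** (2 * m * level) * 2 ** m


def total_links(level: int, m: int = 2) -> int:
    """Equation 5, with the sum taken over levels 2..L (assumption)."""
    n_bm = 2 ** (2 * m * (level - 1))  # number of BMs in a level-L network
    return n_bm * INNER_LINKS_PER_BM + (level - 1) * n_bm * OUTER_LINKS_PER_BM


def traffic_density(avg_distance: float, nodes: int, links: int) -> float:
    """Equation 4: rho = d_avg * N / E."""
    return avg_distance * nodes / links


AVG_DISTANCE = {1: 3.0, 2: 7.44477, 3: 11.5945, 4: 17.8446}  # from Table IV

for level, d_avg in AVG_DISTANCE.items():
    n, e = total_nodes(level), total_links(level)
    print(level, n, e, round(traffic_density(d_avg, n, e), 5))
```

Under these assumptions the computed node and link counts agree with the N and E columns of Table IV for levels 1 to 4, and the printed densities can be compared against the $\rho$ column.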

# **SECTION VIII.** Fault Tolerance

Faults caused by link or node failures are common in any interconnection network. Even on-chip processors such as the PowerPC A2 used in the Blue Gene/Q supercomputer contain one extra core for fault tolerance [21]. The fault tolerance of a graph is defined as the maximum number of vertices that can be removed while the graph remains connected. Hence, the fault tolerance of a graph is one less than its connectivity [22]. A network is k-fault tolerant if it can sustain up to k link failures. In the case of 3D-TTN, the connectivity is less than the node degree. Hence, at the BM level, 3D-TTN can tolerate up to 5 link failures. In contrast, a level-5 3D-TTN node can tolerate up to 7 link failures. Fig. 13 illustrates that 3D-TTN has six paths in total at the BM level between each source-destination pair, which is more than in the 2D and 3D mesh networks and equal to the 3D-torus network. In this section, we measure the fault handling capability of 3D-TTN through fault tolerance, arc connectivity and fault diameter.



Fig. 13. Communication paths for each source-destination

## A. Arc or Graph Connectivity

Arc or graph connectivity indicates the robustness of a network. Arc connectivity is the minimum number of links that must be removed to break a network into two disjoint parts. High connectivity improves network performance by avoiding link congestion and improves fault tolerance. A network is maximally fault tolerant if its connectivity is equal to its node degree. Table V shows the arc connectivity of 3D-TTN, which is higher than that of the 2D networks 2D-torus, TESH and TTN, and also higher than that of the 3D-TESH network. On the other hand, the 2D-torus is maximally fault tolerant, unlike 3D-TTN, because its node degree and arc connectivity have the same value.

| Parameters       | 2DT | TESH | TTN | 3D-TESH | 3D-TTN |
|------------------|-----|------|-----|---------|--------|
| Node Degree      | 4   | 4    | 6   | 6       | 8      |
| Arc Connectivity | 4   | 2    | 4   | 4       | 6      |

Table V: Arc connectivity for various networks
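
The two rules used in this section (fault tolerance is one less than the connectivity, and a network is maximally fault tolerant when its connectivity equals its node degree) can be applied to the Table V values as follows. This is only a restatement of those rules in code; the dictionary and function names are illustrative.

```python
# (node degree, arc connectivity) pairs copied from Table V
NETWORKS = {
    "2DT": (4, 4),
    "TESH": (4, 2),
    "TTN": (6, 4),
    "3D-TESH": (6, 4),
    "3D-TTN": (8, 6),
}


def fault_tolerance(arc_connectivity: int) -> int:
    """Number of failures tolerated: one less than the connectivity [22]."""
    return arc_connectivity - 1


def is_maximally_fault_tolerant(degree: int, arc_connectivity: int) -> bool:
    """True when the connectivity equals the node degree."""
    return degree == arc_connectivity


for name, (degree, connectivity) in NETWORKS.items():
    print(f"{name:8s} tolerates {fault_tolerance(connectivity)} failures, "
          f"maximally fault tolerant: {is_maximally_fault_tolerant(degree, connectivity)}")
```

For 3D-TTN the arc connectivity of 6 gives a tolerance of 5 failures at the BM level, and only the 2D torus satisfies the maximal fault tolerance condition among the listed networks.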

## **B.** Fault Diameter

The fault diameter is the network diameter when faults occur, for example because of a faulty processing node. The fault diameter $d_f$ of a graph with fault tolerance *f* is defined as the maximum diameter of any graph obtained by deleting at most *f* vertices [22].

## Theorem 1

The fault diameter for 3D-TTN(2, L, 0) is given by:

$$d_f = \max\left(D_z + D_s + \sum_{i=2}^{L} (D_{si} + D_i) + D_d\right)$$

## Proof

The fault tolerance of 3D-TTN at the BM level is five. Hence, a 3D-TTN node can communicate with other nodes through at most six paths at the BM level. If we consider a single link or node failure in the 3D-TTN(2, L, 0) network, a message has to travel through the wrap-around connection, and the network requires the same number of hops to reach the destination node. Fig. 14 shows a $(4 \times 4 \times 1)$ plane of the 3D-TTN(2, L, 0) network in which node (1, 0, 0) is temporarily faulty. A message from the source node (0, 0, 0) destined for node (2, 0, 0) travels through node (3, 0, 0) over the wrap-around connection and then reaches the destination node (2, 0, 0). Hence, the fault diameter of the 3D-TTN(2, L, 0) network is equal to its original diameter.



Fig. 14. Fault node routing for 3D-TTN  $(4\times 4\times 1)$ 
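
The example in the proof can be checked with a breadth-first search on a small torus. The sketch below models only the $(4 \times 4 \times 1)$ plane of Fig. 14 as a 4 x 4 2D torus (the z coordinate is dropped) and counts the hops between two nodes with and without a faulty node. It is an illustration of the argument, not the routing algorithm of [12], and the function name `shortest_hops` is hypothetical.

```python
from collections import deque


def shortest_hops(src, dst, k=4, faulty=frozenset()):
    """Breadth-first search hop count on a k x k torus that skips faulty
    nodes; returns None if the destination cannot be reached."""
    if src in faulty or dst in faulty:
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        (x, y), hops = queue.popleft()
        if (x, y) == dst:
            return hops
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = ((x + dx) % k, (y + dy) % k)  # modulo gives the wrap-around links
            if nxt not in seen and nxt not in faulty:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


print(shortest_hops((0, 0), (2, 0)))                    # fault-free plane
print(shortest_hops((0, 0), (2, 0), faulty={(1, 0)}))   # node (1, 0) faulty
```

Both calls return the same hop count, because the wrap-around route through node (3, 0) is as short as the direct route through the faulty node (1, 0).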

# **SECTION IX.** Conclusions

In this research, our main objective was to compare the power of various on-chip networks under variable traffic loads using the Orion power model. This paper considers only the uniform traffic pattern with a 45nm fabrication process and 64 nodes for each network, with a single virtual channel.

The power efficiency of 3D-TTN has been compared with various networks, which shows that 3D-TTN requires 29.75% less total power than the 4D-torus and 39.96% less than the 5D-torus at a 10% traffic load. When the traffic load is increased to 30%, it requires 28.3% less total power than the 4D-torus network and 38.42% less than the 5D-torus. These differences at the on-chip level indicate that 3D-TTN can save more than 130MW of electrical power over the 5D-torus network in an exa-scale system. In contrast, it requires only about 17.08% higher power than the 3D-mesh network. Our power simulation shows that, when the traffic level increases from 10% to 30%, 3D-TTN requires 3.09% more link dynamic power and 4.70% more router dynamic power. However, the other power components, such as link static power, router static power and clock power, remain the same even with variable traffic loads.

From our earlier analysis, we found that 3D-TTN at the maximum level is better than the 2D and 3D mesh and torus networks in diameter and average distance. It also outperforms the 5D-torus network by about 21% in diameter and 12% in average distance at over 4 million nodes. The analysis in this paper shows that 3D-TTN outperforms the 5D-torus network at the maximum level in cost and packing density. It also performs better than the mesh and torus networks in message traffic density, which indicates that 3D-TTN will have less network congestion than the others. In terms of fault tolerance, 3D-TTN has higher fault tolerance than the 2D and 3D mesh and torus networks and the 3D-TESH network. Moreover, 3D-TTN has higher arc connectivity than the 2D-torus, TESH, TTN and 3D-TESH networks.

## ACKNOWLEDGMENT

This research is partly supported by JSPS KAKENHI GRANT NUMBER 24300016. The authors are grateful to the anonymous reviewers for their constructive comments.

#### Keywords

#### **IEEE Keywords**

Multiprocessor interconnection, Routing, System-on-chip, Telecommunication traffic, Power demand, Supercomputers, Next generation networking

#### **INSPEC:** Controlled Indexing

power aware computing, multiprocessor interconnection networks, parallel processing

#### **INSPEC: Non-Controlled Indexing**

massively parallel computer, power analysis, traffic load variability, next generation interconnection network, power consumption, next generation supercomputer, 3D-TTN, hierarchical interconnection network

#### **Author Keywords**

estimation of power consumption, 3D-TTN, topological analysis, message traffic density, packing density, fault tolerance, routing algorithm

#### Authors

Faiz Al Faisal Sch. of Inf. Sci., JAIST, Ishikawa, Japan

M. M. Hafizur Rahman Dept. of Comput. Sci., IIUM, Kuala Lumpur, Malaysia

Yasushi Inoguchi Res. Center for Adv. Com. Infr., JAIST, Ishikawa, Japan

