Millimeter-wave (mm-Wave) technology has been widely adopted in recent wireless network-on-chip (WiNoC) designs since it is fully compatible with current CMOS processes. Efficient channel multiplexing mechanisms can improve the performance of WiNoCs, but the improvement is limited: the wireless channels are generally shared by multiple pairs of communicating nodes, and the multiplexing mechanisms perform worse as the network size scales up. In this work, more physically achievable mm-Wave channels are introduced into WiNoCs, and on this basis a high-performance mm-Wave multichannel WiNoC architecture is elaborated, including the designs of its topology, routing and MAC mechanism. In addition, to relieve congestion at the hubs, a congestion-aware adaptive channel selection (CAACS) mechanism is proposed. Simulation results show that the architecture increases the saturated throughput by 16%∼98%, and that introducing the CAACS mechanism improves the saturated throughput by up to a further 17%. The average packet delay is also significantly reduced, with negligible area and energy overhead.
I. INTRODUCTION
As silicon technology scales down into deep sub-micron nodes, the number of cores integrated on a single chip has increased dramatically, and traditional bus-based interconnects¹ cannot satisfy the performance demands because of their low bandwidth, high delay and poor scalability. Networks-on-chip (NoCs), which offer high bandwidth, low delay and good scalability, have become the design paradigm for chip multiprocessors (CMPs). However, their inherent multi-hop nature causes excessive packet transmission delay and energy consumption.
To break through the performance bottleneck of traditional metallic NoCs, several novel interconnects have been proposed, for example 3D NoCs [4], photonic NoCs [1] and multi-band RF-Interconnect (RF-I) NoCs [2]. Such designs reduce the delay and energy consumption of traditional metallic NoCs but still face unsolved problems [5]. Limited by manufacturing technology, mass production of 3D and photonic NoCs is not yet feasible. Although multi-band RF-I is compatible with the existing CMOS process, a dedicated transmission line must be laid across the whole chip to achieve a high data rate. Compared with these interconnects, the wireless network-on-chip is fully compatible with current CMOS technology and requires no special physical medium such as a photonic waveguide or an RF transmission line, which dramatically reduces implementation complexity and cost. While providing express links for data transmission, the mm-Wave WiNoC has unique advantages in energy efficiency, communication distance and transmission delay over other wireless interconnects [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Nitin Nitin.
¹ In this work, the word "interconnect" refers to "on-chip interconnect".
The mm-Wave frequency band spans 30 GHz to 300 GHz. To utilize this band more efficiently, prior work has addressed the following two aspects.
The first aspect is how to utilize the available mm-Wave channel more efficiently. To this end, channel multiplexing schemes such as frequency division multiple access (FDMA), code division multiple access (CDMA) and time division multiple access (TDMA) have been proposed.
To apply FDMA in mm-Wave WiNoCs, the mm-Wave frequency band must be partitioned into a number of non-overlapping channels. However, such a partition cannot be achieved with current transceiver design technology, which makes the application of FDMA a real challenge. Besides, owing to the fixed band partition, such a mechanism cannot satisfy the scalability requirement of WiNoCs.
In [6] , orthogonal Walsh code is deployed to encode data from different channels to multiplex the mm-Wave channel. However, such a mechanism must synchronize the clocks of different transceivers located across the whole chip to maintain the orthogonality between channels, which imposes a challenge for its implementation.
The TDMA based token-passing MAC mechanism is adopted in multiple mm-Wave WiNoCs [3] , [8] , [9] for its simplicity and distributed control characteristics. However, due to temporal variation of workload and spatial variation of traffic distribution [10] , the equally allocated time slot for every wireless node will cause serious waste of channel resources.
The second aspect is how to provide WiNoCs with more channels in the mm-Wave frequency band. This problem can be addressed by advanced transceiver design technologies.
In [7], a transceiver operating at 900 MHz and 1.8 GHz is designed for dual-band applications, which is the first attempt to design a dual-band wireless transceiver. In [9], a mm-Wave transceiver with channels at 30, 60 and 90 GHz is proposed and initially applied to WiNoCs, while [14] presents another mm-Wave transceiver design using three non-overlapping channels at 31, 57.5 and 122 GHz.
It should be noted that the improvement in network performance achieved by channel multiplexing is limited: the wireless channel is generally shared by multiple pairs of communicating nodes, and the multiplexing mechanisms perform worse as the network size scales up.
In this work, we present a high-performance WiNoC architecture, comprising topology, routing and MAC mechanism designs, based on the mm-Wave multichannel transceiver proposed in [9]. The architecture incorporates several mm-Wave channels. Since the congestion of each channel varies from moment to moment, choosing the least congested channel for packet transmission relieves congestion and improves the performance of the network. Therefore, a congestion-aware adaptive channel selection (CAACS) mechanism is also presented. In addition, this mechanism inherently avoids deadlock in cross-level communication between the wired and wireless levels of the hybrid WiNoC architecture without introducing virtual channels.
The major contributions of this work are as follows:
(1) a high-performance mm-Wave multichannel WiNoC architecture and its supporting designs of topology, routing and MAC mechanisms;
(2) a congestion-aware adaptive channel selection mechanism which can inherently avoid deadlock in cross-level communication;
(3) a comprehensive and in-depth discussion of the mm-Wave multichannel WiNoC architecture, motivated by the limited gains that channel multiplexing offers to mm-Wave single-channel WiNoCs, aiming to provide new ideas for high-performance mm-Wave WiNoC design.
The rest of this paper is organized as follows: section II elaborates the mm-Wave multichannel transceiver, the basis of CM³ WiNoCs; section III details the CM³ WiNoC architecture; section IV explains the CAACS mechanism; section V gives the simulation results of the CM³ WiNoC architecture; and section VI concludes the paper.
II. MILLIMETER-WAVE MULTICHANNEL WIRELESS TRANSCEIVER
As the basic component of CM³ WiNoCs, the architecture of the mm-Wave multichannel wireless transceiver is introduced in this section.
A. TYPICAL SINGLE-CHANNEL MILLIMETER-WAVE TRANSCEIVER
Transceiver design largely determines the performance of WiNoCs. A good transceiver design should offer high bandwidth and low energy consumption to satisfy the throughput and power requirements of WiNoCs.
A well-designed mm-Wave transceiver can provide relatively sufficient bandwidth without severe signal degradation while the low power design of a mm-Wave transceiver should be taken into account at both circuit and architecture level.
To reduce the implementation complexity and power consumption of the transceiver, non-coherent on-off keying (OOK) is deployed in the transceiver design, which avoids complicated carrier recovery circuits or local oscillators (LOs) in the receiving end. As shown in Fig. 1(a), the transmitting end of the transceiver employs a direct-conversion topology consisting of a serializer, an OOK modulator and a power amplifier (PA), while the receiving end includes a low-noise amplifier (LNA), an envelope detector (ED), a baseband amplifier (BA) and a deserializer for non-coherent demodulation.
B. MILLIMETER-WAVE MULTICHANNEL TRANSCEIVER
The millimeter-wave band occupies the frequency spectrum from 30 GHz to 300 GHz and offers abundant bandwidth for WiNoCs, yet the number of available mm-Wave channels is very limited with current transceiver design technology. To provide WiNoCs with more mm-Wave channels for parallel data transmission, the design of a triple-band mm-Wave transceiver is presented in [9].
The transmitting end of the transceiver is shown in Fig. 1(b). For simultaneous data transmission in three non-overlapping channels, three groups of duplicated RF components are introduced, including three zigzag antennas, three power amplifiers and three OOK modulators. A 30 GHz voltage-controlled oscillator (VCO) in combination with two multipliers serves as the shared local oscillator (LO) in the transmitting end, which reduces the overall power consumption of the triple-band transceiver compared with using a separate VCO for each front end. The receiving end of the triple-band transceiver is similar to that of a single-band one, as shown in Fig. 1(a).
The triple-band transceiver can operate on 30 GHz, 60 GHz and 90 GHz channels, each of which differs in power and data rate as shown in Table 1. Only channels ch1 and ch2 are deployed in this work to present the design of CM³ WiNoCs, as explained later in section V-C-2. All three channels could similarly be adopted to establish a congestion-aware millimeter-wave triple-band wireless network-on-chip.
III. MILLIMETER-WAVE MULTICHANNEL WINOC ARCHITECTURE
The topology, routing and MAC mechanism of the CM³ WiNoC architecture are detailed in this section.
A. TOPOLOGY
A wireless/wired hybrid on-chip network architecture is adopted in this work. There are two types of nodes in the network, namely tile nodes and hub nodes. A tile node consists of a processing element (PE) and a router through which the local PE can communicate with neighboring PEs. A hub node is equipped with two wireless interfaces (WIs), operating on channel ch1 and channel ch2 respectively as indicated in section II-B, and provides distant communicating tiles with long-distance single-hop wireless links, which reduces congestion and improves network performance significantly. Figure 2 shows the traffic distribution of an 8×8 mesh NoC. Each square in the figure represents a tile, and the darker the tile, the more flits flow through it. The traffic is clearly concentrated in the central area of the network. Hubs placed near the center can therefore offer more packets long-distance single-hop wireless links, so that packets with remote destinations bypass the central area and the congestion of the network is reduced. The four black boxes indicate the locations of the tiles that can use wireless links. As shown in Fig. 3(a), with four hubs, each having two WIs, placed symmetrically near the central area of the network, the traffic distribution becomes much more balanced.
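The concentration of traffic in the center of a mesh can be reproduced with a short calculation. The sketch below is illustrative only and is not the setup used to produce Fig. 2: it assumes that every tile sends one packet to every other tile and that packets follow XY routing, then counts how many paths visit each tile. The function name `xy_tile_load` is hypothetical.

```python
def xy_tile_load(n=8):
    """Count how many XY-routed paths visit each tile of an n x n mesh.

    Assumption: one packet for every ordered (source, destination) pair.
    """
    load = [[0] * n for _ in range(n)]
    tiles = [(x, y) for y in range(n) for x in range(n)]
    for sx, sy in tiles:
        for dx, dy in tiles:
            if (sx, sy) == (dx, dy):
                continue
            x, y = sx, sy
            load[y][x] += 1              # the source tile itself
            while x != dx:               # X dimension first
                x += 1 if dx > x else -1
                load[y][x] += 1
            while y != dy:               # then Y dimension
                y += 1 if dy > y else -1
                load[y][x] += 1
    return load


load = xy_tile_load(8)
for row in load:
    print(" ".join(f"{v:5d}" for v in row))
```

Under these assumptions the printed matrix has its largest values in the middle rows and columns and its smallest values at the corners, which is the qualitative picture described for Fig. 2.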
To balance the utilization of wireless and wireline links, a wireless channel can only be used when the following two conditions are met during packet transmission: (1) the destination of the packet is connected directly to a hub and (2) the routing path of the packet contains at least one tile directly connected to a hub. For example, when a packet from tile A1 needs to be forwarded to its destination tile A2 following XY routing, it is first routed along horizontal wireline links until it arrives at the tile connected directly to hub H2. Since the destination tile A2 is connected to hub H4, the packet is then transmitted to hub H4 through a single-hop wireless link and from there reaches its destination. However, a packet from tile B1 destined for tile B2 cannot access the high-speed wireless links since its destination is not connected to any hub.
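The two conditions can be expressed as a small check over the XY path. The following is a minimal sketch under stated assumptions: the hub placement in `HUB_OF` (four hubs, each attached to a 2×2 block of tiles) is hypothetical and not taken from Fig. 3(a), and the sketch additionally assumes that the entry hub must differ from the destination hub, since otherwise condition (2) would be satisfied trivially by the destination tile itself.

```python
def xy_path(src, dst):
    """Tiles visited by dimension-order (XY) routing: X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def block(x0, y0):
    """The 2 x 2 block of tiles whose lower corner is (x0, y0)."""
    return [(x0 + dx, y0 + dy) for dx in (0, 1) for dy in (0, 1)]


# Hypothetical placement: each hub serves one 2 x 2 block of an 8 x 8 mesh.
HUB_ORIGINS = {"H1": (1, 1), "H2": (5, 1), "H3": (1, 5), "H4": (5, 5)}
HUB_OF = {tile: hub for hub, origin in HUB_ORIGINS.items()
          for tile in block(*origin)}


def wireless_entry(src, dst, hub_of=HUB_OF):
    """Return the first tile on the XY path where the packet may switch to a
    wireless link, or None if the packet must stay on wireline links."""
    if dst not in hub_of:
        return None                      # condition (1) fails
    for tile in xy_path(src, dst):
        # Assumption: the entry hub must differ from the destination hub.
        if tile in hub_of and hub_of[tile] != hub_of[dst]:
            return tile                  # condition (2) holds
    return None


print(wireless_entry((0, 1), (6, 6)))    # enters at a tile served by H1
print(wireless_entry((0, 0), (7, 7)))    # destination has no hub
```

With this placement, a packet whose destination is not attached to a hub always stays on the wired mesh, mirroring the B1 to B2 case in the text.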
B. MAC MECHANISM
In this work, a token-based MAC mechanism is adopted to ensure fair and efficient channel access by any hub in the network. In each channel, a token is passed among all hubs in a round-robin fashion, and the tokens of different channels do not interfere with each other (as shown in Fig. 3(b)). A hub can transmit data through a wireless channel only after capturing that channel's token. While one hub holds the token, the others must wait until the token-holding hub finishes its packet transmission. The mm-Wave multichannel transceiver described in section II-B allows a token to be granted in every individual channel simultaneously, so more hubs can access the wireless links at the same time.
In the traditional TDMA-based token-passing MAC mechanism, the token possession period (TPP) of each hub is equal [3], [8], [10]. However, due to the spatial variation of traffic distribution and the temporal variation of workload [10], assigning equal transmission durations to each hub results in low channel utilization. In this work, the token possession period of each hub is dynamically adjusted according to the size of the packet in transmission: once the tail flit of the packet has been transmitted, the hub holding the token releases it to the next hub in order. Such a mechanism greatly improves the utilization of mm-Wave channels. Since packets vary in size, the TPP of each hub varies over time.
When a flit arrives, its type is determined by its first two bits: a token flit is transferred to the token management unit (TMU), and any other flit is put into the receiving buffer. If the Next_WI field of the received token matches the content of register ID_self, the TMU sets register Has_token and generates a new token whose Next_WI field is written with the value of register ID_next (as shown in Fig. 4 and Fig. 5). When the tail flit of a packet in the antenna transmitting buffer (ABTx) completes its transmission, signal ls_tail_flit goes high and the wireless interface (WI) can then be used to transmit the token flit generated by the TMU. Such a MAC mechanism greatly improves the utilization of mm-Wave channels, which leads to a significant enhancement of network performance.
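The packet-sized token possession period can be illustrated with a small behavioral model of one channel. This is a sketch under stated assumptions, not the hardware of Fig. 4 and Fig. 5: a hub sends at most one packet per token capture, passing the token costs one flit time on the channel, and the function name `run_token_ring` and its parameters are hypothetical.

```python
from collections import deque


def run_token_ring(queues, cycles_per_flit, max_cycles=10_000):
    """Simulate round-robin token passing on a single wireless channel.

    queues: one deque per hub holding pending packet sizes (in flits).
    Returns a list of (start_cycle, hub_index, packet_flits) transmissions.
    """
    n = len(queues)
    holder, t, idle_passes = 0, 0, 0
    log = []
    while t < max_cycles and idle_passes < n:
        if queues[holder]:
            size = queues[holder].popleft()
            log.append((t, holder, size))
            t += size * cycles_per_flit   # token held until the tail flit is sent
            idle_passes = 0
        else:
            idle_passes += 1              # nothing to send, release immediately
        t += cycles_per_flit              # assumption: the token flit takes one flit time
        holder = (holder + 1) % n         # pass to the next hub in order
    return log


queues = [deque([4]), deque(), deque([2, 3]), deque()]
for start, hub, flits in run_token_ring(queues, cycles_per_flit=4):
    print(f"cycle {start:3d}: hub {hub} sends a {flits}-flit packet")
```

Because the token is released as soon as the tail flit has been sent, an idle hub costs only the token-passing time rather than a full fixed slot. Running two independent instances of this loop would model the two channels, each with its own token.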
IV. CONGESTION-AWARE ADAPTIVE CHANNEL SELECTION
A. CHANNEL CONGESTION ANALYSIS
The performance of the network is closely related to its congestion degree [11] . When congestion happens, the power consumption and the packet delay of the network increase dramatically [12] , [13] . In WiNoCs, wireless hubs are generally shared by many tiles and get congested more easily.
Section III-A presents the overall architecture of CM³ WiNoCs, in which a single hub is shared by four tiles. This sharing increases congestion, which occurs specifically at the antenna transmitting buffer (ABTx).
The internal architecture of a radio hub is shown in Fig. 6. Any tile requesting a wireless link must (1) first forward its packet to the hub's input buffer (IB), after which (2) the packet is forwarded to one of the ABTx buffers. Subsequently, (3) the packet is transmitted through the wireless link to its destination once the hub has captured the token of the corresponding channel.
Among these three steps, the second and the third are generally where congestion happens. As shown in Fig. 6, a packet from buffer IB1 is being forwarded to buffer ABTx_Ch1 while a packet from buffer IB2 is being forwarded to buffer ABTx_Ch2. If another packet from buffer IB4 requests a wireless channel at the same time, it has to wait until the tail flits of the two packets complete their transmission, since no wireless channel is available. As a result, flits from buffer IB4 cannot be forwarded in time and accumulate, resulting in congestion.
Flits in buffer ABTx are transmitted through wireless links to their destinations, and the time (in cycles) spent in transmitting a single flit is given by:
$$T_{ch_i} = \frac{FS}{DR_{ch_i} \times CP}$$
where $DR_{ch_i}$ is the data rate of channel $i$, $CP$ is the clock period and $FS$ is the flit size. Assuming a $DR_{ch_i}$ of 10 Gbps, a $CP$ of 1 ns and an $FS$ of 32 bits, the time spent on the transmission of a single flit is 3.2 cycles, while the flit injection rate of buffer ABTx can be as high as one flit per cycle. As a result, flits in buffer ABTx cannot be transmitted in time and accumulate, resulting in congestion. In practice, flits are not always injected into buffer ABTx at such a high rate. Depending on the number of connected tiles, the arrival rate may vary greatly, with one flit per cycle as its maximum. The more tiles are connected to the hub, the more congested the ABTx buffer becomes.
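The 3.2-cycle figure and the resulting backlog can be checked with a few lines of arithmetic. The sketch below uses exact fractions to avoid floating-point rounding; the helper names and the idealized backlog model (unbounded buffer, constant injection at the maximum rate of one flit per cycle) are assumptions made for illustration.

```python
from fractions import Fraction


def flit_time_cycles(flit_bits, data_rate_gbps, clock_period_ns):
    """Cycles needed to send one flit: FS / (DR * CP).

    1 Gbps x 1 ns = 1 bit, so DR * CP is the number of bits sent per cycle.
    """
    bits_per_cycle = Fraction(data_rate_gbps) * Fraction(clock_period_ns)
    return Fraction(flit_bits) / bits_per_cycle


def backlog_after(cycles, service_time, inject_per_cycle=1):
    """Flits left in an unbounded ABTx buffer after `cycles` cycles
    (idealized: constant injection, one flit drained every service_time)."""
    arrived = cycles * inject_per_cycle
    served = min(arrived, cycles // service_time)
    return arrived - served


t = flit_time_cycles(flit_bits=32, data_rate_gbps=10, clock_period_ns=1)
print(float(t))                  # 3.2 cycles per flit
print(backlog_after(100, t))     # flits still queued after 100 cycles
```

In this idealized case roughly two thirds of the injected flits are still waiting after 100 cycles, which is why the ABTx buffer is the point where congestion builds up.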
Besides, the size of the antenna receiving buffer (ABRx) at receiving end also impacts the congestion level of buffer ABTx since a full ABRx buffer will reject the forwarding request initiated by buffer ABTx at transmitting end so that its flits can't be transmitted in time and accumulate rapidly.
To reduce congestion, bigger ABTx or ABRx buffers are preferred, but excessive buffer size introduces significant power and area overhead. It follows that if the wireless interface supports a higher data rate, smaller ABTx/ABRx buffers suffice to satisfy the performance demands of the network.
B. CONGESTION-AWARE ADAPTIVE CHANNEL SELECTION
The congestion degree of channel $i$ can be defined as:
$$\eta_i = \frac{n_i}{N_i}$$
where $N_i$ and $n_i$ indicate the total and used space of buffer ABTx in flits respectively. The bigger $\eta_i$ is, the more seriously buffer ABTx is congested. When a packet from input buffer IB1 requests a wireless channel for transmission, either channel ch1 or channel ch2 can be chosen (as shown in Fig. 6). However, the congestion levels of the two channels may differ, and an appropriate channel selection mechanism can alleviate congestion in buffer ABTx and improve the performance of CM³ WiNoCs. In this work, the metric $\eta_i$ is adopted to evaluate the channel congestion level, according to which a reasonable channel selection can be made. Although the metric is very simple in form, it comprehensively reflects the channel data rate, the congestion level of the antenna receiving buffer, and the size of the packet being transmitted. The mm-Wave channel with the smaller $\eta_i$ is always chosen to reduce congestion. Fig. 7 shows the flow control and channel selection in a hub. Credit signals C1 and C2, indicating the congestion status of buffers ABTx_Ch1 and ABTx_Ch2, are supplied to a channel selection module, which chooses the less congested channel.
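The selection rule amounts to picking the channel whose ABTx buffer has the smallest occupied fraction. The following is a minimal sketch of that rule, not the channel selection module of Fig. 7; the function names, the dictionary interface and the tie-breaking policy (first listed channel wins) are assumptions.

```python
from fractions import Fraction


def congestion_degree(used_flits, total_flits):
    """eta_i = n_i / N_i for one ABTx buffer."""
    return Fraction(used_flits, total_flits)


def select_channel(buffers):
    """Pick the channel with the smallest congestion degree.

    buffers maps a channel name to (used_flits, total_flits).
    Assumption: on a tie, the first channel listed is chosen.
    """
    return min(buffers, key=lambda ch: congestion_degree(*buffers[ch]))


print(select_channel({"ch1": (6, 8), "ch2": (2, 8)}))   # ch2
print(select_channel({"ch1": (3, 4), "ch2": (5, 8)}))   # ch2
```

Using the occupied fraction rather than the raw flit count keeps the comparison meaningful when the two ABTx buffers differ in size, as in the second call.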
C. DEADLOCK ANALYSIS
The wired level of the network employs a dimension order (XY) routing which is shown to provide the shortest deadlock-free path while, in the wireless level of the network, the token-passing MAC mechanism ensures only one transmitting hub at the same time and the resources (buffers in this case) dependency arising from cyclic packet movement between hubs can never happen.
It should be noted that the CAACS mechanism inherently avoids deadlock in cross-level communication between the wired and wireless levels of the CM³ WiNoC without introducing virtual channels. Channel selection is performed packet by packet. If, for instance, channel ch1 in Fig. 6 tends toward deadlock (meaning that channel ch1 has become sufficiently congested), the CAACS mechanism chooses channel ch2 for the packet transmission, and the resource dependency is thereby broken.
Since deadlocks cannot happen within the wired or wireless level or between the wired and wireless levels of the network, the CM³ WiNoC is deadlock free.
V. EXPERIMENTAL EVALUATION
A. OVERHEAD OF THE MM-WAVE MULTICHANNEL TRANSCEIVER
To evaluate the overhead of the millimeter-wave multichannel transceiver, the design is fabricated in 65-nm bulk CMOS technology with a 1 V supply. Fig. 8 shows the micrographs of the major components of the transceiver. The chip size is limited by pads and on-wafer probing requirements. The area overhead of the triple-band LO is 0.39 mm², while the OOK transmitter and OOK receiver introduce 0.36 mm² and 0.38 mm² of area overhead respectively.
Since the wireless transceiver is an energy-intensive component of the architecture, the evaluation towards energy efficiency of the transceiver is also conducted. The triple-band LO made of a 30 GHz VCO and two multipliers consumes a power of 6.3 mW equivalent to 2.1 mW per band.
Such a design significantly reduces the energy consumption of the transceiver. For example, compared with an individual 60 GHz VCO directly integrated on the chip which consumes a power of 3.4 mW, such a design achieves 62% power saving. Depending on the operating frequencies of the LO, the transceiver supports three data rates of 10 Gbps, 16 Gbps and 22 Gbps with bit energy consumption of 1.61 pJ, 1.95 pJ and 2.28 pJ respectively.
B. OVERHEAD OF TOKEN MANAGEMENT UNIT
To evaluate the overhead of the token management unit presented in section III-B, an RTL simulation of the TMU is conducted. The unit is first implemented in Verilog HDL and then synthesized with Synopsys Design Compiler using a TSMC 65 nm process at typical PVT (tt, 1.20 V, 25 °C). The results show that the TMU consumes 12.26 pW while introducing an area overhead of 2.88 µm² and a delay of 330 ps at an operating frequency of 2.5 GHz. Because the TMU is implemented entirely in combinational logic, its cost is negligible.
C. SYSTEMATIC PERFORMANCE EVALUATION
1) SIMULATION SETUP
System-level simulation is carried out on a cycle-accurate NoC simulator. A 64-node network with 4 radio hubs is considered, as shown in Fig. 3(a), which can be regarded as a two-level network. The first level is a traditional wired network organized in a mesh topology, and the second level is a wireless network consisting of 4 radio hubs, each of which is shared by 4 tiles of the first-level network. The detailed simulation setup is shown in Table 2.
2) PERFORMANCE EVALUATION OF EVERY INDIVIDUAL CHANNEL IN CM³ WINOCS
The mm-Wave multichannel transceiver presented in section II-B can provide WiNoCs with three RF channels for parallel packet transmission. However, such a digital system is driven by a clock, and a flit can only be consumed on a clock edge. For example, under the configuration listed in Table 2, transmitting a 32-bit flit through a 10 Gbps channel takes 3.2 cycles, but the flit is not written into the ABRx buffer at the receiving end until the fourth cycle. Likewise, whether a flit is transmitted through a 16 Gbps channel (taking 2 cycles) or a 22 Gbps channel (taking 1.45 cycles), it occupies the same 2 cycles, even though the nominal data rate of the 22 Gbps channel is 37.5% higher. Besides, the bit energy of the channels differs considerably (Table 1): with the bit energies reported in section V-A, the 22 Gbps channel consumes about 17% more energy than the 16 Gbps channel when transmitting the same amount of data. Thus, for a dual-channel radio hub implementation, the pair of channels ch1 and ch2 is the best choice; in actual performance, the 16 Gbps channel is twice as fast as the 10 Gbps channel (2 cycles versus 4 cycles per flit).
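The effect of clock quantization on the three channels can be tabulated directly. The sketch below rounds the raw flit time up to whole cycles and uses the bit energies reported in section V-A; the function name `effective_cycles`, the integer-only parameters and the 1 ns clock period (the value assumed in section IV-A) are illustrative assumptions.

```python
import math
from fractions import Fraction

# Data rate (Gbps) -> bit energy (pJ/bit), as reported in section V-A.
BIT_ENERGY_PJ = {10: 1.61, 16: 1.95, 22: 2.28}


def effective_cycles(flit_bits, rate_gbps, clock_ns=1):
    """Whole clock cycles a flit occupies: ceil(FS / (DR * CP)).

    Assumption: rate_gbps and clock_ns are integers, so the ratio is exact.
    """
    return math.ceil(Fraction(flit_bits, rate_gbps * clock_ns))


for rate, energy in BIT_ENERGY_PJ.items():
    cycles = effective_cycles(32, rate)
    print(f"{rate} Gbps: {cycles} cycles per 32-bit flit, "
          f"{32 * energy:.2f} pJ per flit")

extra = BIT_ENERGY_PJ[22] / BIT_ENERGY_PJ[16] - 1
print(f"22 Gbps vs 16 Gbps energy: +{extra:.0%}")
```

The 16 Gbps and 22 Gbps channels both occupy 2 cycles per flit, so the faster channel buys no latency advantage at this clock rate while costing more energy per bit.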
As presented in section III-A, each hub in CM³ WiNoCs is shared by four neighboring tiles and located near the central area of the network, so if the congestion at the hubs is alleviated, the network performance will be greatly improved. A faster mm-Wave channel provided by the transceiver integrated in the hub can transmit packets in a shorter time, thus alleviating the congestion at the hub. Fig. 9 shows the impact of channel data rate on network performance. The 10 Gbps, 16 Gbps and 22 Gbps channels are physically achievable, while the remaining three are hypothetical and included only for comparison. The figure shows that a faster channel improves the saturated throughput and reduces the average packet delay of the network. Since the 16 Gbps, 22 Gbps and 28 Gbps channels take the same time to transmit a single 32-bit flit, as explained above, the corresponding curves nearly coincide. In a single-channel WiNoC, multiple hubs share a single millimeter-wave channel and only the hub holding the token can transmit packets through it. If a hub supports more than one channel, more packets can be forwarded at the same time, so congestion at the hub is alleviated. Networks integrating the three different channels and their combinations are simulated, and the results in Fig. 10 show that dual-band WiNoCs outperform their single-band counterparts in both throughput and delay.
3) PERFORMANCE EVALUATION FOR CM³ WINOCS
To verify the performance of CM³ WiNoCs, the 10 Gbps and 16 Gbps channels are chosen to construct the architecture. This choice takes both energy consumption and data rate into account, as analyzed in section V-C-2. For comparison, three types of network architectures are defined, namely Single_band_16Gbps, Dual_band_10-16Gbps_RANDOM and Dual_band_10-16Gbps_CAACS.
The first two architectures differ in hub configuration: the former integrates a single-channel transceiver in each hub, while the latter integrates a dual-channel transceiver in each hub. The Dual_band_10-16Gbps_RANDOM architecture employs a random channel selection mechanism. The last architecture also integrates dual-channel transceivers in each hub and additionally introduces the congestion-aware adaptive channel selection mechanism, detailed in section IV-B, to further optimize the performance of the network. Five synthetic traffic patterns are deployed in the experiment, namely butterfly, shuffle, transpose1, transpose2 and random. They can be divided into two classes: the first four are non-uniform traffic patterns and the last is a uniform random traffic pattern.
In the uniform random traffic pattern, every source node in the network transmits packets to every other node with the same probability, and every destination node is randomly generated, while the non-uniform traffic patterns produce traffic more regularly and locally, as shown in Fig. 11. The results were obtained with a traditional wireline mesh topology under the same packet injection rate (PIR). The transpose traffic patterns are the most likely to cause congestion, while the shuffle traffic pattern is the least likely to do so.
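For readers unfamiliar with these pattern names, the sketch below gives destination functions for a 64-node (8×8) network. The paper does not define the patterns, so these are assumptions: shuffle and butterfly follow the bit-permutation definitions commonly used in the interconnection-network literature, and transpose1 and transpose2 follow a coordinate-swapping convention found in some NoC simulators. The exact definitions used in the experiments may differ.

```python
import random


def shuffle(node, bits=6):
    """Rotate the node address left by one bit."""
    mask = (1 << bits) - 1
    return ((node << 1) | (node >> (bits - 1))) & mask


def butterfly(node, bits=6):
    """Swap the most and least significant address bits."""
    msb = (node >> (bits - 1)) & 1
    lsb = node & 1
    if msb == lsb:
        return node
    return node ^ ((1 << (bits - 1)) | 1)


def transpose1(xy, n=8):
    """(x, y) -> (n-1-y, n-1-x): mirror across the anti-diagonal."""
    x, y = xy
    return (n - 1 - y, n - 1 - x)


def transpose2(xy):
    """(x, y) -> (y, x): mirror across the main diagonal."""
    x, y = xy
    return (y, x)


def uniform_random(src, nodes=64, rng=random):
    """Pick any node other than src with equal probability."""
    dst = rng.randrange(nodes - 1)
    return dst if dst < src else dst + 1


print(shuffle(0b100001), butterfly(0b100000))
print(transpose1((1, 2)), transpose2((1, 2)))
```

Under these definitions each non-uniform pattern maps a source to one fixed destination, which is why its traffic is more regular than the uniform random case.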
The average packet delay under various non-uniform traffic scenarios is shown in Fig. 12. The results indicate that the dual-channel architecture always outperforms the single-channel architecture. By introducing the CAACS mechanism, the network performance of the Dual_band_10-16Gbps_CAACS architecture is further improved.
Since a dual-channel hub provides WiNoCs with more alternative channels, more packets can be transmitted through long-distance single-hop wireless links at the same time, and the average packet delay decreases. Besides, the CAACS mechanism always chooses the less congested channel for packet transmission, which balances the utilization between channels and alleviates the congestion at the radio hub.
It should be noted that if the PIR is too small or too large, introducing more channels or the additional CAACS mechanism does not improve the performance of the network appreciably. Both measures target congestion alleviation: a PIR that is too small causes no congestion to alleviate, while a PIR that is too large blocks the network so severely that no congestion alleviation mechanism remains effective.
The average packet delay for the three architectures is also measured under the uniform random traffic pattern, as shown in Fig. 13. Fig. 14 shows the saturated throughput and the corresponding average packet delay under uniform and non-uniform traffic patterns. By introducing extra channels, the saturated throughput of the Dual_band_10-16Gbps_RANDOM architecture increases by 98% under the uniform traffic pattern compared with its single-channel counterpart, and the CAACS mechanism achieves a further improvement in saturated throughput, as indicated by the simulation result of the Dual_band_10-16Gbps_CAACS architecture. Since the transpose traffic patterns cause the most congestion, the performance gain achieved by providing more channels is relatively small and the CAACS mechanism has little effect on throughput under such traffic scenarios. However, although the CAACS mechanism contributes little to saturated throughput under the transpose1 and transpose2 traffic patterns, it achieves 51% and 31% decreases in average packet delay respectively.
VI. CONCLUSION
Because the performance of mm-Wave single-channel WiNoCs can be improved only to a limited extent by channel multiplexing schemes, this paper gives an in-depth and comprehensive discussion of the millimeter-wave multichannel wireless network-on-chip architecture. As an instance, a high-performance WiNoC architecture is proposed based on an existing mm-Wave multichannel transceiver. It is shown that, by introducing multiple non-overlapping millimeter-wave channels, the performance of WiNoCs can be significantly improved. A deadlock-free congestion-aware adaptive channel selection mechanism is also proposed to improve the network performance further.
