On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating billions of transistors. However, with this ever-increasing complexity, traditional design approaches face key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) are an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. One such pattern is multicast communication, where a source sends packets to an arbitrary number of destinations. Multicast is common not only in parallel computing, such as for cache coherency, but also in emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs.
This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as to significantly advance the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there have been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computer-aided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network’s switches are speculative and always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex with higher-radix switches and also is more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission.
In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch.
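The local speculation idea from the first contribution can be illustrated with a toy model. The sketch below is not the thesis's switch design: it assumes a single binary fan-out tree, a hypothetical function `route_multicast`, and a made-up cost measure (link traversals). Speculative levels broadcast to both children without inspecting the destinations, while non-speculative levels forward only towards real destinations and so throttle redundant copies.

```python
def route_multicast(depth, dests, speculative_levels):
    """Toy model of one multicast in a binary fan-out tree with 2**depth leaves.

    depth: number of switch levels (level 0 is the root switch).
    dests: set of destination leaf indices.
    speculative_levels: switch levels that broadcast every packet (assumption).
    Returns (set of leaves that received a wanted packet, link traversals).
    """
    delivered, hops = set(), 0
    frontier = [(0, 0)]  # (level, node index within that level)
    while frontier:
        level, idx = frontier.pop()
        if level == depth:
            # Terminal: keep the packet only if this leaf is a real destination.
            if idx in dests:
                delivered.add(idx)
            continue
        span = 2 ** (depth - level - 1)  # number of leaves below each child
        for child in (2 * idx, 2 * idx + 1):
            lo = child * span
            wanted = any(lo <= d < lo + span for d in dests)
            # Speculative switches always forward; non-speculative ones
            # forward only towards destinations, dropping redundant copies.
            if level in speculative_levels or wanted:
                hops += 1
                frontier.append((level + 1, child))
    return delivered, hops


if __name__ == "__main__":
    print(route_multicast(3, {0, 5}, set()))  # fully non-speculative
    print(route_multicast(3, {0, 5}, {1}))    # middle level speculative
```

In this toy run, both configurations deliver to leaves 0 and 5; the fully non-speculative tree uses 6 link traversals and the hybrid uses 8. The extra copies are the cost of speculation in this model, which the thesis reports is outweighed by the simplicity and speed of the speculative switches.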
Routing and Wavelength Assignment for Multicast Communication in Optical Network-on-Chip
An Optical Network-on-Chip (ONoC) is an emerging chip-level optical interconnection technology to realise high-performance and power-efficient inter-core communication for many-core processors. Within the field, multicast communication is one of the most important inter-core communication forms. It is not only widely used in parallel computing applications in Chip Multi-Processors (CMPs), but also common in emerging areas such as neuromorphic computing. While many studies have been conducted on designing ONoC architectures and routing schemes to support multicast communication, most existing solutions adopt the methods that were initially proposed for electrical interconnects. These solutions can neither fully take advantage of optical communication nor address the special requirements of an ONoC. Moreover, most of them focus only on the optimisation of one multicast, which limits the practical applications because real systems often have to handle multiple multicasts requested from various applications. Hence, this thesis will address the design of a high-performance communication scheme for multiple multicasts by taking into account the unique characteristics and constraints of an ONoC.
This thesis studies the problem from a network-level perspective. The design methodology is to optimally route all multicasts requested simultaneously from the applications in an ONoC, with the objective of efficiently utilising available wavelengths. The novelty is to adopt multicast-splitting strategies, where a multicast can be split into several sub-multicasts according to the distribution of multicast nodes, in order to reduce the conflicts between different multicasts. As the routing and wavelength assignment problem is NP-hard, heuristic approaches that use the multicast-splitting strategy are proposed in this thesis. Specifically, three routing and wavelength assignment schemes for multiple multicasts in an ONoC are proposed for different problem domains.
Firstly, PRWAMM, a Path-based Routing and Wavelength Assignment for Multiple Multicasts in an ONoC, is proposed. Due to the low manufacturing complexity requirement of an ONoC, e.g., no splitters, path-based routing is studied in PRWAMM. Two wavelength-assignment strategies for multiple multicasts under path-based routing are proposed. One is an intra-multicast wavelength assignment, which assigns wavelength(s) for one multicast. The other is an inter-multicast wavelength assignment, which assigns wavelength(s) for different multicasts, according to the distributions of multicasts. Simulation results show that PRWAMM can reduce the average number of wavelengths by 15% compared to other path-based schemes.
Secondly, RWADMM, a Routing and Wavelength Assignment scheme for Distribution-based Multiple Multicasts in a 2D ONoC, is proposed. Because path-based routing lacks flexibility, it cannot reduce link conflicts effectively. Hence, RWADMM is designed around the distribution of different multicasts and comprises two algorithms. One is an optimal routing and wavelength assignment algorithm for special distributions of multicast nodes. The other is a heuristic routing and wavelength assignment algorithm for random distributions of multicast nodes. Simulation results show that RWADMM can reduce the number of wavelengths by 21.85% on average, compared to the state-of-the-art solutions in a 2D ONoC.
Thirdly, CRRWAMM, a Cluster-based Routing and Reusable Wavelength Assignment scheme for Multiple Multicasts in a 3D ONoC, is proposed. Because a 3D ONoC differs architecturally from a 2D ONoC (e.g., in the layout of nodes and optical routers), the methods designed for a 2D ONoC cannot be simply extended to a 3D ONoC. In CRRWAMM, the distribution of multicast nodes in a mesh-based 3D ONoC is analysed first. Then, routing theorems for special instances are derived. Based on the theorems, a general routing scheme, which includes a cluster-based routing method and a reusable wavelength assignment method, is proposed. Simulation results show that CRRWAMM can reduce the number of wavelengths by 33.2% on average, compared to other schemes in a 3D ONoC.
Overall, the three routing and wavelength assignment schemes can achieve high-performance multicast communication for multiple multicasts in their respective problem domains in an ONoC. They all have the advantages of a low routing complexity, a low wavelength requirement, and good scalability, compared to their respective counterparts. These methods make an ONoC a flexible high-performance computing platform to execute various parallel applications with different multicast requirements.
As future work, I will investigate the power consumption of various routing schemes for multicasts. Using a multicast-splitting strategy may increase power consumption since it needs different wavelengths to send packets to different destinations for one multicast, though the reduction of wavelengths used in the schemes can also potentially decrease overall power consumption. Therefore, how to achieve the best trade-off between the total number of wavelengths used and the number of sub-multicasts in order to reduce power consumption is an interesting direction for future research.
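The wavelength-assignment problem underlying the three schemes can be illustrated with a generic greedy heuristic. The sketch below is not PRWAMM, RWADMM, or CRRWAMM: it assumes that each multicast (or sub-multicast) has already been routed, that a route is represented as a set of directed links, and that two routes sharing a link must use different wavelengths. The function name `assign_wavelengths` and the example routes are hypothetical.

```python
def assign_wavelengths(routes):
    """Greedy wavelength assignment by conflict colouring (illustrative only).

    routes: dict mapping a multicast name to the set of directed links
            (u, v) its route occupies.
    Returns a dict mapping each name to a wavelength index, such that
    routes sharing at least one link never share a wavelength.
    """
    wavelengths = {}
    # Assumed ordering: routes occupying the most links are handled first,
    # with the name as a tie-breaker so the result is deterministic.
    for name in sorted(routes, key=lambda n: (-len(routes[n]), n)):
        used = {
            wavelengths[other]
            for other in wavelengths
            if routes[other] & routes[name]  # non-empty intersection = conflict
        }
        w = 0
        while w in used:  # pick the lowest wavelength not used by a conflicting route
            w += 1
        wavelengths[name] = w
    return wavelengths


if __name__ == "__main__":
    routes = {
        "m1": {(0, 1), (1, 2)},
        "m2": {(1, 2), (2, 3)},  # shares link (1, 2) with m1
        "m3": {(4, 5)},          # conflicts with nothing
    }
    result = assign_wavelengths(routes)
    print(result, "wavelengths used:", len(set(result.values())))
```

Here `m1` and `m2` share a link and receive different wavelengths, while `m3` reuses wavelength 0, so two wavelengths suffice. Under this model, splitting a multicast into sub-multicasts changes the link sets and therefore the conflicts, which is the lever the thesis's heuristics exploit.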