In this chapter we describe the architecture of a torus interconnect and its implementation on FPGAs, which so far has been used in two different HPC systems. The network design is optimized for applications that benefit from a tightly coupled network, and it allows relatively small messages to be exchanged between nearest neighbours at a high rate. Examples of such applications are Lattice Quantum Chromodynamics (LQCD) simulations and fluid dynamics applications using the Lattice Boltzmann Method (LBM). We describe the implementation of our torus network architecture for two massively parallel machines, QPACE and Aurora, and report the FPGA resource usage. Furthermore, we discuss the optimizations that were necessary to fit the design. Finally, we provide an outlook on possible implementation changes when using more recent generations of FPGAs.
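
To make the nearest-neighbour communication pattern concrete, the following is a minimal sketch of how neighbour coordinates wrap around in a torus. The function name `torus_neighbours`, the coordinate layout, and the three-dimensional 4 x 4 x 4 shape in the usage note are assumptions chosen for illustration; none of them is taken from the QPACE or Aurora implementation.

```python
from typing import Dict, Tuple

Coord = Tuple[int, ...]


def torus_neighbours(coord: Coord, dims: Coord) -> Dict[Tuple[int, int], Coord]:
    """Return the nearest neighbours of `coord` in a torus of shape `dims`.

    Hypothetical helper for illustration only. Keys are (axis, direction)
    pairs, where direction is +1 or -1.
    """
    if len(coord) != len(dims):
        raise ValueError("coord and dims must have the same number of axes")
    neighbours: Dict[Tuple[int, int], Coord] = {}
    for axis, size in enumerate(dims):
        for step in (+1, -1):
            shifted = list(coord)
            # The modulo closes each axis into a ring, so the last node
            # along an axis is adjacent to the first one.
            shifted[axis] = (coord[axis] + step) % size
            neighbours[(axis, step)] = tuple(shifted)
    return neighbours
```

For an assumed 4 x 4 x 4 torus, `torus_neighbours((0, 0, 0), (4, 4, 4))` yields six neighbours, and the neighbour in the negative direction of axis 0 is `(3, 0, 0)` because of the wrap-around link.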