6 research outputs found
Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems
Modern Graphics Processing Units (GPUs) are now established accelerators for
general-purpose computation. Tight interaction between the GPU and the
interconnection network is what unlocks the full capability-computing
potential of a multi-GPU system on large HPC clusters; this is why an
efficient and scalable interconnect is a key technology for delivering GPUs
to scientific HPC. In this paper we show the latest
architectural and performance improvements of the APEnet+ network fabric, an
FPGA-based PCIe board with 6 fully bidirectional off-board links offering
34 Gbps of raw bandwidth per direction, and a PCIe x8 Gen2 interface towards
the host PC. The board implements a Remote Direct Memory Access (RDMA)
protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and
Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU
transfers. Finally, we report on
the development activities for 2013 focusing on the adoption of the latest
generation 28 nm FPGAs and the preliminary tests performed on this new
platform.
Comment: Proceedings of the 20th International Conference on Computing in
High Energy and Nuclear Physics (CHEP)
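The zero-copy RDMA transfer described above can be sketched as a toy model in plain Python (the `Endpoint` class and buffer names are invented for illustration and are not APEnet+ APIs; no real GPU, driver, or network is involved): the sender writes directly into a buffer the receiver has registered, with no intermediate staging copy.

```python
# Toy model of an RDMA PUT into a registered (pinned) buffer. Illustrative
# only: the class and names are hypothetical, not the APEnet+ interface.

class Endpoint:
    def __init__(self):
        self.registered = {}  # buffer_id -> bytearray open to remote writes

    def register(self, buffer_id, size):
        """Pin a buffer so remote peers may write it directly (zero-copy target)."""
        buf = bytearray(size)
        self.registered[buffer_id] = buf
        return buf

def rdma_put(src, dst_endpoint, dst_buffer_id, offset=0):
    """Write src straight into the remote registered buffer: no staging copy."""
    dst = dst_endpoint.registered[dst_buffer_id]
    dst[offset:offset + len(src)] = src

receiver = Endpoint()
receiver.register("gpu_frame", 16)
rdma_put(b"hello", receiver, "gpu_frame")
print(bytes(receiver.registered["gpu_frame"][:5]))  # b'hello'
```

The point of the sketch is the data path: the payload lands in the destination buffer in a single write, which is what P2P GPU transfers achieve by letting the NIC address GPU memory directly.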
A heterogeneous many-core platform for experiments on scalable custom interconnects and management of fault and critical events, applied to many-process applications: Vol. II, 2012 technical report
This is the second of a planned collection of four yearly volumes describing
the deployment of a heterogeneous many-core platform for experiments on
scalable custom interconnects and management of fault and critical events,
applied to many-process applications. This volume covers several topics,
among them: 1- a system for awareness of faults and critical events (named
LO|FA|MO)
on experimental heterogeneous many-core hardware platforms; 2- the integration
and test of the experimental hardware heterogeneous many-core platform QUoNG,
based on the APEnet+ custom interconnect; 3- the design of a
Software-Programmable Distributed Network Processor architecture (DNP) using
ASIP technology; 4- the initial stages of design of a new DNP generation on a
28 nm FPGA. These developments were performed in the framework of the EURETILE
European Project under Grant Agreement no. 247846.
Comment: 119 pages
Analysis of performance improvements for host and GPU interface of the APEnet+ 3D Torus network
APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect network optimized for hybrid CPU-GPU clusters dedicated to High Performance scientific Computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI-express board with 6 bidirectional off-board links showing 34 Gbps of raw bandwidth per direction, and leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. The minimization of APEnet+ transfer latency is achieved through the adoption of an RDMA protocol implemented in FPGA with specialized hardware blocks tightly coupled with an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions: preliminary results are encouraging, showing a 50% bandwidth increase for large packet sizes. In this paper we describe the APEnet+ architecture, detail the hardware implementation, and discuss the impact of such specialized RDMA hardware on host interface latency and bandwidth.
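As a rough illustration of descriptor-based offload (a simplified sketch; the field names and classes are invented, and the real APEnet+ logic lives in FPGA hardware blocks plus an embedded microprocessor), the host CPU only enqueues small descriptors while a separate engine performs the actual data movement:

```python
from collections import deque

# Simplified sketch of a transmit descriptor queue serviced by an offload
# engine. Descriptor fields are hypothetical, chosen only for illustration.

class TxQueue:
    def __init__(self):
        self.ring = deque()

    def post(self, src_buf, dst_node, dst_addr):
        # The host CPU writes a small descriptor and returns immediately;
        # it never touches the payload again.
        self.ring.append({"src": src_buf, "node": dst_node, "addr": dst_addr})

def offload_engine(txq, network):
    """Drain the ring, performing the data movement on the host CPU's behalf."""
    sent = 0
    while txq.ring:
        d = txq.ring.popleft()
        network.setdefault(d["node"], {})[d["addr"]] = bytes(d["src"])
        sent += 1
    return sent

txq = TxQueue()
fabric = {}  # stands in for the 3D torus network
txq.post(b"payload-0", dst_node=(0, 1, 0), dst_addr=0x1000)
txq.post(b"payload-1", dst_node=(0, 1, 0), dst_addr=0x2000)
print(offload_engine(txq, fabric))  # 2 descriptors processed
```

Decoupling descriptor posting from data movement is what lets the transmit and receive paths run as an offload engine without blocking the host.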
Solutions for the optimization of the software interface on an FPGA-based NIC
The theme of the research is the study of solutions for the optimization of the software interface on FPGA-based Network Interface Cards. The research activity was carried out in the APE group at INFN (Istituto Nazionale di Fisica Nucleare), which has been historically active in the design of high-performance scalable networks for clusters of hybrid (CPU/GPU) nodes.
The results of the research are validated on two projects the APE group is currently working on, both allowing fast prototyping of solutions and hardware-software co-design: APEnet (a PCIe FPGA-based 3D torus network controller) and NaNet (an FPGA-based family of NICs mainly dedicated to real-time, low-latency computing systems such as fast control systems or High Energy Physics Data Acquisition Systems). NaNet is also used to validate a GPU-controlled device driver that improves network performance, i.e. further lowers communication latency, when used in combination with existing user-space software.
This research is also producing results in the "Horizon2020 FET-HPC ExaNeSt project", which aims to prototype and develop solutions for some of the crucial problems on the way towards production of Exascale-level supercomputers; the APE group is actively contributing to the development of the network/interconnect infrastructure.
Characterization and optimization of network traffic in cortical simulation
Considering the great variety of obstacles Exascale systems will have to face
in the near future, particular attention is given in this thesis to the
interconnect and to power consumption.
The data movement challenge involves the whole hierarchical organization
of components in HPC systems — i.e. registers, cache, memory, disks.
Running scientific applications requires the most effective methods of data
transport among the levels of this hierarchy. On current petaflop systems,
memory access at all levels is the limiting factor in almost all applications.
This drives the requirement for an interconnect achieving adequate rates of
data transfer, or throughput, and reducing time delays, or latency, between
the levels.
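The throughput-versus-latency trade-off above is often summarized by the simple linear cost model T(s) = L + s/B for moving s bytes over a link with fixed latency L and asymptotic bandwidth B. A short sketch, with numbers chosen only for illustration (they are not figures from the thesis):

```python
def transfer_time(size_bytes, latency_s, bandwidth_bps):
    """Linear cost model: fixed latency plus size over asymptotic bandwidth."""
    return latency_s + size_bytes / bandwidth_bps

def effective_bandwidth(size_bytes, latency_s, bandwidth_bps):
    """Achieved rate for a single transfer of the given size."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth_bps)

# Illustrative numbers only: 5 us latency, 4 GB/s asymptotic bandwidth.
LAT, BW = 5e-6, 4e9
# Small transfers are latency-dominated; large ones approach the link rate.
small = effective_bandwidth(4 * 1024, LAT, BW)
large = effective_bandwidth(4 * 1024 * 1024, LAT, BW)
print(f"4 KiB: {small / 1e9:.2f} GB/s, 4 MiB: {large / 1e9:.2f} GB/s")
```

The model makes the requirement concrete: an interconnect must both raise B (throughput between levels) and lower L (latency), since for small transfers no amount of raw bandwidth compensates for a high fixed delay.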
Power consumption is identified as the largest hardware research challenge.
The power cost to operate an Exascale system built with current technology
would exceed 2.5 B$ per year. Research into alternative power-efficient
computing devices is therefore mandatory for the procurement of future HPC
systems.
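As a back-of-the-envelope check of the quoted figure (assuming an illustrative electricity price of $0.10 per kWh, which is an assumption and not a number from the thesis), a 2.5 B$ annual power bill corresponds to a continuous draw in the gigawatt range:

```python
# Rough check of the quoted annual power cost; the $0.10/kWh price is an
# assumption for illustration, not a figure from the thesis.
annual_cost_usd = 2.5e9
price_per_kwh = 0.10
hours_per_year = 365 * 24  # 8760

energy_kwh = annual_cost_usd / price_per_kwh       # 2.5e10 kWh per year
avg_power_gw = energy_kwh / hours_per_year / 1e6   # kW -> GW
print(f"{avg_power_gw:.2f} GW average draw")
```

For comparison, commonly cited Exascale targets are in the tens of megawatts, which is why power efficiency is identified as the largest hardware research challenge.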
In this thesis, a preliminary approach is offered to the critical process of
co-design. Co-design is defined as the simultaneous design of both hardware
and software to implement a desired function. This process both integrates
all components of the Exascale initiative and illuminates the trade-offs that
must be made within this complex undertaking.