sPIN: High-performance streaming Processing in the Network
Optimizing communication performance is imperative for large-scale computing
because communication overheads limit the strong scalability of parallel
applications. Today's network cards contain rather powerful processors
optimized for data movement. However, these devices are limited to fixed
functions, such as remote direct memory access. We develop sPIN, a portable
programming model to offload simple packet processing functions to the network
card. To demonstrate the potential of the model, we design a cycle-accurate
simulation environment by combining the network simulator LogGOPSim and the CPU
simulator gem5.
We implement offloaded message matching, datatype processing, and collective
communications and demonstrate transparent full-application speedups.
Furthermore, we show how sPIN can be used to accelerate redundant in-memory
filesystems and several other use cases.
Our work investigates a portable packet-processing network acceleration model
similar to compute acceleration with CUDA or OpenCL. We show how such network
acceleration enables an ecosystem that can significantly speed up applications
and system services.
PsPIN: A high-performance low-power architecture for flexible in-network compute
The ability to offload data and control tasks to the network is becoming
increasingly important, especially as network speeds grow faster than CPU
frequencies. In-network compute alleviates the host
CPU load by running tasks directly in the network, enabling additional
computation/communication overlap and potentially improving overall application
performance. However, sustaining the bandwidths provided by next-generation
networks, e.g., 400 Gbit/s, is a challenge. sPIN is a programming model
for in-NIC compute in which users specify handler functions that are executed
on the NIC for each incoming packet belonging to a given message or flow. It
enables a CUDA-like acceleration, where the NIC is equipped with lightweight
processing elements that process network packets in parallel. We investigate
the architectural specialties that a sPIN NIC should provide to enable
high-performance, low-power, and flexible packet processing. We introduce
PsPIN, a first open-source sPIN implementation, based on a multi-cluster RISC-V
architecture and designed according to the identified architectural
specialties. We investigate the performance of PsPIN with cycle-accurate
simulations, showing that it can process packets at 400 Gbit/s for several use
cases, introducing minimal latencies (26 ns for 64 B packets) and occupying a
total area of 18.5 mm² (22 nm FDSOI).