This thesis discusses the design and implementation of an FPGA-based
Network Processor for scientific computing, such as Lattice Quantum Chromodynamics
(LQCD) and fluid-dynamics applications based on Lattice Boltzmann
Methods (LBM). State-of-the-art programs in these (and other similar)
applications have a large degree of available parallelism, which can be easily
exploited on massively parallel systems, provided the underlying communication
network offers not only high bandwidth but also low latency.
I have designed in detail, built, and tested in hardware, firmware, and
software an implementation of a Network Processor (NWP), tailored for the most
recent families of multi-core processors. The implementation has been developed
on an FPGA device to easily interface the logic of the NWP with the CPU
I/O sub-system.
In this work I have assessed several ways to move data between the main
memory of the CPU and the I/O sub-system to achieve high data throughput
and low latency, enabling the use of “Programmed Input Output” (PIO),
“Direct Memory Access” (DMA) and “Write Combining” memory settings.
On the software side, I developed and tested a device driver for the Linux
operating system to access the NWP device, as well as a system library to
efficiently access the network device from user applications.
This thesis demonstrates the feasibility of a network infrastructure that
saturates the maximum bandwidth of the I/O sub-systems available on recent
CPUs, and reduces communication latencies to values very close to those
needed by the processor to move data across the chip boundary.