Improving resource usage in large FPGA accelerators

Abstract

In modern FPGA devices, place and route has become a difficult task for the FPGA implementation tools because of the increase in device size and complexity. As devices grow in size and number of resources, their topology also grows in complexity, and larger devices are divided into different regions. While this allows a larger number of resources to be packed into a single device, it creates a new set of challenges for obtaining good quality of results while using as many resources as possible. Devices such as Xilinx’s Alveo accelerators are composed of multiple regions called Super Logic Regions (SLRs). Crossing from one region to another adds delay to signal propagation. This can hurt overall timing if the implementation tool decides to scatter a single accelerator across different SLRs, so the design may not reach the operating frequency expected by the user. Similarly, these devices usually have multiple independent memory banks that interface with DDR modules. Memory allocation and interconnection must then be managed manually, which places an extra burden on the user; otherwise, the design cannot take advantage of the aggregate available bandwidth. We propose methods to improve resource and bandwidth usage that allow a user to direct how a design is built and implemented while maintaining device abstraction and minimal development overhead.
