Harmonic-summing Module of SKA on FPGA--Optimising the Irregular Memory Accesses
The Square Kilometre Array (SKA), which will be the world's largest radio
telescope, will enhance and boost a large number of science projects, including
the search for pulsars. The frequency domain acceleration search is an
efficient approach to search for binary pulsars. A significant part of it is
the harmonic-summing module, which is the research subject of this paper. Most
of the operations in the harmonic-summing module are relatively cheap
operations for FPGAs. The main challenge is the large number of point accesses
to off-chip memory which are not consecutive but irregular. Although
harmonic-summing alone might not be targeted for FPGA acceleration, it is a
part of the pulsar search pipeline that contains many other compute-intensive
modules, which are efficiently executed on FPGA. Hence having the
harmonic-summing also on the FPGA will avoid off-board communication, which
could destroy other acceleration benefits. Two types of harmonic-summing
approaches are investigated in this paper: 1) storing intermediate data in
off-chip memory and 2) processing the input signals directly without storing.
For the second type, two approaches to caching data are proposed and evaluated:
1) preloading points that are frequently touched and 2) preloading all necessary
points that are used to generate a chunk of output points. OpenCL is adopted to
implement the proposed approaches. In an extensive experimental evaluation, the
same OpenCL kernel codes are evaluated on FPGA boards and GPU cards. Of the
proposed preloading methods, preloading all necessary points while reordering
the input signals is faster than all the others. Although a single FPGA board
cannot compete with a GPU in raw performance, the GPU consumes up to 2.6 times
more energy than the FPGAs when executing the same NDRange kernels.
Comment: 14 pages, 12 figures, 7 tables, 30 references
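To make the irregular access pattern concrete, here is a minimal Python sketch of a one-dimensional harmonic sum and of the "preload all necessary points for a chunk of outputs" idea. The definition used (output i sums the input at indices i, 2i, ..., Hi) is an assumption made for illustration, as are all function names; the abstract does not specify the exact formula, and the paper's kernels are written in OpenCL, not Python.

```python
def harmonic_sum_direct(power, max_harmonics):
    """Assumed definition: out[i] = sum of power[h * i] for h = 1..max_harmonics,
    skipping harmonics that fall beyond the end of the input."""
    n = len(power)
    out = []
    for i in range(n):
        total = 0.0
        for h in range(1, max_harmonics + 1):
            idx = h * i  # the stride between reads grows with i: irregular access
            if idx >= n:
                break
            total += power[idx]
        out.append(total)
    return out


def points_needed(chunk_start, chunk_end, max_harmonics, n):
    """Input indices required to produce outputs chunk_start..chunk_end-1."""
    return sorted({h * i
                   for i in range(chunk_start, chunk_end)
                   for h in range(1, max_harmonics + 1)
                   if h * i < n})


def harmonic_sum_chunked(power, max_harmonics, chunk_size):
    """Same result as harmonic_sum_direct, but each chunk of outputs first
    gathers every input point it needs into a small local buffer."""
    n = len(power)
    out = [0.0] * n
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Stand-in for preloading from off-chip memory into on-chip storage.
        cache = {idx: power[idx]
                 for idx in points_needed(start, end, max_harmonics, n)}
        for i in range(start, end):
            out[i] = sum(cache[h * i]
                         for h in range(1, max_harmonics + 1)
                         if h * i < n)
    return out
```

In this sketch the dictionary plays the role of on-chip storage: the scattered reads are gathered once per chunk, and the inner loop then touches only the local buffer. Whether this corresponds closely to the paper's implementation is not something the abstract confirms.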
Combining Multiple Optimised FPGA-based Pulsar Search Modules Using OpenCL
Field-Programmable Gate Arrays (FPGAs) are widely used in the central signal
processing design of the Square Kilometre Array (SKA) as acceleration hardware.
The frequency domain acceleration search (FDAS) module is an important part of
the SKA1-MID pulsar search engine. To develop for hardware that is yet to be
finalised, to support cross-discipline interoperability, and to achieve fast
prototyping, OpenCL is employed as a high-level FPGA synthesis approach to
create the sub-modules of FDAS. The FT convolution and the harmonic-summing
plus some other minor sub-modules are elements in the FDAS module that have
been well-optimised separately before. In this paper, we explore the design
space of combining well-optimised designs, dealing with the ensuing need to
trade off and compromise. Pipeline computing is employed to handle multiple
input arrays at high speed. The hardware target is to employ multiple high-end
FPGAs to process the combined FDAS module. The results show interesting
consequences, where the best individual solutions are not necessarily the best
solutions for the speed of a pipeline where FPGA resources and memory bandwidth
need to be shared. By applying multiple buffering techniques to the pipeline,
the combined FDAS module can achieve up to 2x speedup over implementations
without pipeline computing. We perform an extensive experimental evaluation on
multiple FPGA boards (Arria 10) hosted in a workstation and compare the results
with those of a technologically comparable mid-range GPU.
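As a rough illustration of why pipelining with buffering can give a speedup of this order, the following Python sketch uses an idealised timing model. It is not taken from the paper: the stage times are hypothetical, and the model assumes there are enough buffers for every stage to work on a different input array at the same time, with no contention for memory bandwidth.

```python
def sequential_time(stage_times, num_inputs):
    """Each input array passes through every stage before the next one starts."""
    return num_inputs * sum(stage_times)


def pipelined_time(stage_times, num_inputs):
    """Idealised pipeline: after the first input fills the pipeline, one result
    completes every max(stage_times), the time of the slowest stage."""
    if num_inputs == 0:
        return 0.0
    return sum(stage_times) + (num_inputs - 1) * max(stage_times)


def speedup(stage_times, num_inputs):
    """Ratio of sequential to pipelined time under the model above."""
    return sequential_time(stage_times, num_inputs) / pipelined_time(stage_times, num_inputs)
```

Under these assumptions, two equally long stages approach a 2x speedup as the number of input arrays grows, while unbalanced stages gain less because the slowest stage sets the pace. This is in line with the observation that the best individual designs are not necessarily the best choice for the combined pipeline.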