2 research outputs found

    Harmonic-summing Module of SKA on FPGA--Optimising the Irregular Memory Accesses

    The Square Kilometre Array (SKA), which will be the world's largest radio telescope, will benefit a large number of science projects, including the search for pulsars. The frequency domain acceleration search is an efficient approach to searching for binary pulsars. A significant part of it is the harmonic-summing module, which is the subject of this paper. Most of the operations in the harmonic-summing module are relatively cheap for FPGAs. The main challenge is the large number of point accesses to off-chip memory, which are not consecutive but irregular. Although harmonic-summing alone might not be targeted for FPGA acceleration, it is part of a pulsar search pipeline that contains many other compute-intensive modules, which are efficiently executed on FPGAs. Hence, having the harmonic-summing on the FPGA as well avoids off-board communication, which could destroy other acceleration benefits. Two types of harmonic-summing approaches are investigated in this paper: 1) storing intermediate data in off-chip memory and 2) processing the input signals directly without storing. For the second type, two approaches to caching data are proposed and evaluated: 1) preloading points that are frequently touched and 2) preloading all necessary points that are used to generate a chunk of output points. OpenCL is adopted to implement the proposed approaches. In an extensive experimental evaluation, the same OpenCL kernel codes are evaluated on FPGA boards and GPU cards. Among the proposed preloading methods, preloading all necessary points while reordering the input signals is faster than all the other methods. While a single FPGA board cannot compete with a GPU in raw performance, in terms of energy dissipation the GPU consumes up to 2.6 times more energy than the FPGAs when executing the same NDRange kernels.
    Comment: 14 pages, 12 figures, 7 tables, 30 references
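    To illustrate why the memory accesses in harmonic summing are irregular, the following is a minimal sketch of a simplified one-dimensional harmonic sum. It is not the paper's OpenCL kernel; the function name `harmonic_sum`, its parameters, and the definition S[i] = P[i] + P[2i] + ... + P[Hi] are assumptions chosen for illustration only.

```python
def harmonic_sum(power, num_harmonics):
    """Simplified 1-D harmonic sum (illustrative assumption, not the paper's kernel).

    Returns S where S[i] = sum of power[k * i] for k = 1..num_harmonics,
    for every i such that num_harmonics * i is still a valid index.
    """
    if num_harmonics < 1:
        raise ValueError("num_harmonics must be >= 1")
    if not power:
        return []
    # Largest fundamental bin whose highest harmonic stays inside the spectrum.
    n_out = (len(power) - 1) // num_harmonics + 1
    out = []
    for i in range(n_out):
        total = 0.0
        for k in range(1, num_harmonics + 1):
            # The read index k * i advances with stride k, so neighbouring
            # outputs touch non-consecutive input points for k > 1.
            total += power[k * i]
        out.append(total)
    return out
```

    For an eight-point input `[0, 1, 2, 3, 4, 5, 6, 7]` and two harmonics, the second harmonic reads indices 0, 2, 4, 6, which is the kind of strided, non-consecutive access pattern the abstract identifies as the main challenge.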

    Combining Multiple Optimised FPGA-based Pulsar Search Modules Using OpenCL

    Field-Programmable Gate Arrays (FPGAs) are widely used as acceleration hardware in the central signal processing design of the Square Kilometre Array (SKA). The frequency domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for hardware that is yet to be finalised, to support cross-discipline interoperability, and to achieve fast prototyping, OpenCL is employed as a high-level FPGA synthesis approach to create the sub-modules of FDAS. The FT convolution and the harmonic-summing, plus some other minor sub-modules, are elements of the FDAS module that have previously been well optimised separately. In this paper, we explore the design space of combining well-optimised designs, dealing with the ensuing need to trade off and compromise. Pipeline computing is employed to handle multiple input arrays at high speed. The hardware target is to employ multiple high-end FPGAs to process the combined FDAS module. The results show that the best individual solutions are not necessarily the best solutions for the speed of a pipeline in which FPGA resources and memory bandwidth need to be shared. By applying multiple buffering techniques to the pipeline, the combined FDAS module can achieve up to 2x speedup over implementations without pipeline computing. We perform an extensive experimental evaluation on multiple FPGA boards (Arria 10) hosted in a workstation and compare against a technology-comparable mid-range GPU.