In this work, we propose a configurable many-core overlay for
high-performance embedded computing. The size of internal memory, supported
operations and number of ports can be configured independently for each core of
the overlay. The overlay was evaluated with matrix multiplication, LU
decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform.
The results show that using a system-level many-core overlay avoids complex
hardware design and still provides good performance results.Comment: Presented at First International Workshop on FPGAs for Software
Programmers (FSP 2014) (arXiv:1408.4423