Micro-core architectures combine many low-memory, low-power compute cores
in a single package. These are attractive for use as accelerators, but
due to limited on-chip memory and multiple levels of memory hierarchy, the way
in which programmers offload kernels needs to be carefully considered. In this
paper we use Python as a vehicle for exploring the semantics and abstractions
of higher level programming languages to support the offloading of
computational kernels to these devices. By moving to a pass-by-reference model,
along with leveraging memory kinds, we demonstrate the ability to easily and
efficiently take advantage of multiple levels in the memory hierarchy, even
ones that are not directly accessible to the micro-cores. Using a machine
learning benchmark, we perform experiments on both Epiphany-III and MicroBlaze
based micro-cores, demonstrating the ability to compute with data sets of
arbitrarily large size. To provide context for our results, we explore the
performance and power efficiency of these technologies, demonstrating that
whilst these two micro-core technologies are competitive within their own
embedded class of hardware, there is still a way to go to reach HPC class GPUs.Comment: Accepted manuscript of paper in Journal of Parallel and Distributed
Computing 13