33,553 research outputs found
cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
Modern processor architectures, in addition to having still more cores, also
require still more consideration to memory-layout in order to run at full
capacity. The usefulness of most languages is deprecating as their
abstractions, structures or objects are hard to map onto modern processor
architectures efficiently.
The work in this paper introduces a new abstract machine framework, cphVB,
that enables vector oriented high-level programming languages to map onto a
broad range of architectures efficiently. The idea is to close the gap between
high-level languages and hardware optimized low-level implementations. By
translating high-level vector operations into an intermediate vector bytecode,
cphVB enables specialized vector engines to efficiently execute the vector
operations.
The primary success parameters are to maintain a complete abstraction from
low-level details and to provide efficient code execution across different,
modern, processors. We evaluate the presented design through a setup that
targets multi-core CPU architectures. We evaluate the performance of the
implementation using Python implementations of well-known algorithms: a jacobi
solver, a kNN search, a shallow water simulation and a synthetic stencil
simulation. All demonstrate good performance
Programming MPSoC platforms: Road works ahead
This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains does not only mean a radical change in computer architecture. Even more important from a SW developer´s viewpoint, at the same time the classical sequential von Neumann programming model needs to be overcome. Efficient utilization of the MPSoC HW resources demands for radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful\ud
deployment of heterogeneous embedded MPSoC. On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands for a more pragmatic, gradual tool and code replacement strategy
Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs
As modern FPGAs evolve to include more het- erogeneous processing elements,
such as ARM cores, it makes sense to consider these devices as processors first
and FPGA accelerators second. As such, the conventional FPGA develop- ment
environment must also adapt to support more software- like programming
functionality. While high-level synthesis tools can help reduce FPGA
development time, there still remains a large expertise gap in order to realize
highly performing implementations. At a system-level the skill set necessary to
integrate multiple custom IP hardware cores, interconnects, memory interfaces,
and now heterogeneous processing elements is complex. Rather than drive FPGA
development from the hardware up, we consider the impact of leveraging Python
to ac- celerate application development. Python offers highly optimized
libraries from an incredibly large developer community, yet is limited to the
performance of the hardware system. In this work we evaluate the impact of
using PYNQ, a Python development environment for application development on the
Xilinx Zynq devices, the performance implications, and bottlenecks associated
with it. We compare our results against existing C-based and hand-coded
implementations to better understand if Python can be the glue that binds
together software and hardware developers.Comment: To appear in 2017 IEEE 25th Annual International Symposium on
Field-Programmable Custom Computing Machines (FCCM'17
- …