Asynchronous Execution of Python Code on Task Based Runtime Systems
Despite advancements in the areas of parallel and distributed computing, the
complexity of programming on High Performance Computing (HPC) resources has
deterred many domain experts, especially in the areas of machine learning and
artificial intelligence (AI), from utilizing the performance benefits of such
systems. Researchers and scientists favor high-productivity languages to avoid
the inconvenience of programming in low-level languages and the cost of
acquiring the skills required for programming at this level. In recent years,
Python, with the support of linear algebra libraries like NumPy, has gained
popularity despite limitations that prevent such code from running on
distributed systems. Here we present a solution that maintains both high-level
programming abstractions and parallel and distributed efficiency. Phylanx is an
asynchronous array processing toolkit which transforms Python and NumPy
operations into code which can be executed in parallel on HPC resources by
mapping Python and NumPy functions and variables into a dependency tree
executed by HPX, a general purpose, parallel, task-based runtime system written
in C++. Phylanx additionally provides introspection and visualization
capabilities for debugging and performance analysis. We have tested the
foundations of our approach by comparing our implementation of widely used
machine learning algorithms to accepted NumPy standards.
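
The idea of mapping array operations onto a dependency tree that a task-based runtime evaluates can be sketched in a few lines. This is a minimal illustration only, not Phylanx's API: Python's `asyncio` stands in for HPX (so the tasks here are concurrent, not truly parallel), and the `Node` class and the `dot` and `evaluate` helpers are hypothetical names introduced for the example.

```python
import asyncio
import operator


class Node:
    """Hypothetical tree node: a leaf holds a value, an inner node holds an operation."""

    def __init__(self, op=None, children=(), value=None):
        self.op = op
        self.children = list(children)
        self.value = value


def dot(xs, ys):
    # Plain-Python stand-in for a NumPy dot product.
    return sum(x * y for x, y in zip(xs, ys))


async def evaluate(node):
    # A leaf is ready immediately.
    if node.op is None:
        return node.value
    # Child subtrees are scheduled together; the parent operation runs
    # only once all of its dependencies have produced a result.
    args = await asyncio.gather(*(evaluate(child) for child in node.children))
    return node.op(*args)


# Tree for the expression dot([1, 2], [3, 4]) + 10
tree = Node(
    operator.add,
    [Node(dot, [Node(value=[1, 2]), Node(value=[3, 4])]), Node(value=10)],
)
result = asyncio.run(evaluate(tree))
```

Each node depends only on its children, so independent subtrees could in principle be executed by separate workers, which is the property a task-based runtime exploits.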
Just-In-Time Compilation of NumPy Vector Operations
In this paper, we introduce JIT compilation for the high-productivity framework Python/NumPy in order to boost performance significantly. The JIT compilation of Python/NumPy is completely transparent to the user: the runtime system will automatically JIT compile and execute the NumPy instructions encountered in a Python application. In other words, we introduce a framework that provides the high productivity of Python while maintaining the high performance of a low-level, compiled language. We transform NumPy vector instructions into an Abstract Syntax Tree representation that creates the basis for further optimizations. From the AST we auto-generate C code, which we compile into computational kernels and execute. These incorporate temporary array removal and loop fusion, which are the main contributors to the achieved speedups. In order to amortize the overhead of creation, we also implement a cache for the compiled kernels. We evaluate the JIT compilation by executing several scientific computing benchmarks on an AMD processor. Compared to NumPy, we achieve speedups of a factor of 4.72 for an N-body application and 7.51 for a Jacobi stencil application executing on a single CPU core.
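
The pipeline described in this abstract (expression tree, generated kernel, fused loop without temporaries, kernel cache) can be illustrated with a small sketch. It is not the authors' implementation: the paper generates C code, whereas this example generates Python source as a stand-in, and every name in it (`Var`, `Const`, `BinOp`, `compile_kernel`, `evaluate`) is hypothetical.

```python
class Expr:
    """Base class: operators build an expression tree instead of computing."""

    def __add__(self, other):
        return BinOp("+", self, wrap(other))

    def __radd__(self, other):
        return BinOp("+", wrap(other), self)

    def __mul__(self, other):
        return BinOp("*", self, wrap(other))

    def __rmul__(self, other):
        return BinOp("*", wrap(other), self)


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def code(self):
        return f"{self.name}[i]"  # element access inside the fused loop

    def names(self):
        return {self.name}


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def code(self):
        return repr(self.value)

    def names(self):
        return set()


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def code(self):
        return f"({self.left.code()} {self.op} {self.right.code()})"

    def names(self):
        return self.left.names() | self.right.names()


def wrap(x):
    return x if isinstance(x, Expr) else Const(x)


_kernel_cache = {}  # generated source signature -> compiled kernel


def compile_kernel(expr):
    body = expr.code()
    params = sorted(expr.names()) + ["n"]
    key = (body, tuple(params))
    if key not in _kernel_cache:
        # One loop evaluates the whole expression per element, so no
        # intermediate array is allocated for sub-expressions.
        src = (
            f"def kernel({', '.join(params)}):\n"
            f"    out = [0.0] * n\n"
            f"    for i in range(n):\n"
            f"        out[i] = {body}\n"
            f"    return out\n"
        )
        namespace = {}
        exec(src, namespace)
        _kernel_cache[key] = namespace["kernel"]
    return _kernel_cache[key]


def evaluate(expr, **arrays):
    kernel = compile_kernel(expr)
    n = len(next(iter(arrays.values())))
    return kernel(*(arrays[name] for name in sorted(expr.names())), n)


a, b = Var("a"), Var("b")
fused = evaluate(a * b + 2, a=[1.0, 2.0], b=[3.0, 4.0])
```

An eager evaluation of `a * b + 2` would first materialize `a * b` and then add 2 in a second pass; the generated kernel does both in one loop. Building the same expression again reuses the cached kernel, which is the amortization the abstract refers to.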
cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications
Modern processor architectures, in addition to having ever more cores, also
require ever more attention to memory layout in order to run at full
capacity. The usefulness of most languages is diminishing as their
abstractions, structures, or objects are hard to map onto modern processor
architectures efficiently.
The work in this paper introduces a new abstract machine framework, cphVB,
that enables vector oriented high-level programming languages to map onto a
broad range of architectures efficiently. The idea is to close the gap between
high-level languages and hardware optimized low-level implementations. By
translating high-level vector operations into an intermediate vector bytecode,
cphVB enables specialized vector engines to efficiently execute the vector
operations.
The primary success criteria are to maintain a complete abstraction from
low-level details and to provide efficient code execution across different
modern processors. We evaluate the presented design through a setup that
targets multi-core CPU architectures. We evaluate the performance of the
implementation using Python implementations of well-known algorithms: a Jacobi
solver, a kNN search, a shallow water simulation, and a synthetic stencil
simulation. All demonstrate good performance.
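
The translation of high-level vector operations into an intermediate vector bytecode that a separate engine executes can be sketched as follows. This is only an illustration of the general idea: the instruction layout, the opcode names `ADD`, `SUB`, and `MUL`, and the `run_bytecode` function are assumptions made for the example and are not taken from cphVB.

```python
import operator

# Hypothetical opcode table; a real vector engine would map each opcode
# to an implementation tuned for the target hardware.
OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_bytecode(program, registers):
    """Execute (opcode, dst, lhs, rhs) instructions over whole arrays."""
    for opcode, dst, lhs, rhs in program:
        fn = OPS[opcode]
        registers[dst] = [fn(x, y) for x, y in zip(registers[lhs], registers[rhs])]
    return registers


# Bytecode that a front end might emit for: out = (x + y) * x
program = [
    ("ADD", "t0", "x", "y"),
    ("MUL", "out", "t0", "x"),
]
registers = run_bytecode(program, {"x": [1, 2, 3], "y": [4, 5, 6]})
```

Because the front end only emits instructions, the same program could be handed to a different engine without changing the high-level code, which is the separation the abstract describes.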