Search CORE

40,105 research outputs found

Asynchronous Execution of Python Code on Task Based Runtime Systems

Author: Amini Parsa
Brandt Steven
Diehl Patrick
Huck Kevin
Isaacs Kate
Kaiser Hartmut
Kheirkhahan Alireza
Serio Adrian
Shirzad Shahrzad
Tohid R.
Wagle Bibek
Williams Katy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/10/2018
Field of study

Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and artificial intelligence (AI), from utilizing performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and costs of acquiring the necessary skills required for programming at this level. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite facing limitations which prevent this code from distributed runs. Here we present a solution which maintains both high level programming abstractions as well as parallel and distributed efficiency. Phylanx, is an asynchronous array processing toolkit which transforms Python and NumPy operations into code which can be executed in parallel on HPC resources by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards

arXiv.org e-Print Archive

Crossref

The University of Arizona

CUDArray: CUDA-based NumPy

Author: Larsen Anders Boesen Lindbo
Publication venue: Technical University of Denmark
Publication date: 01/01/2014
Field of study

Online Research Database In Technology

Just-In-Time Compilation of NumPy Vector Operations

Author: . Brian Vinter
. Johannes Lund
. Mads R. B. Kristensen
. Simon A. F. Lund
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 01/12/2013
Field of study

In this paper, we introduce JIT compilation for thehigh-productivity framework Python/NumPy in order to boost theperformance significantly. The JIT compilation of Python/NumPyis completely transparent to the user – the runtime system willautomatically JIT compile and execute the NumPy instructionsencountered in a Python application. In other words, we introducea framework that provides the high-productivity from Pythonwhile maintaining the high-performance of a low-level, compiledlanguage.We transforms NumPy vector instruction into an AbstractSyntax Tree representation that creates the basis for furtheroptimizations. From the AST we auto-generate C code whichwe compile into computational kernels and execute. These incorporatetemporary array removal and loop-fusion which are mainbenefactors in the achieved speedups. In order to amortize theoverhead of creation, we also implement a cache for the compiledkernels.We evaluate the JIT compilation by executing several scientificcomputing benchmarks on an AMD. Compared to NumPy, weachieve speedups of a factor 4.72 for a N-Body application and7.51 for a Jacobi Stencil application executing on a single CPUcore

Copenhagen University Research Information System

GSTF Digital Library (GSTF-DL): Open Journal Systems (Global Science and Technology Forum)

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Author: Blum Troels
Kristensen Mads Ruben Burgdorff
Lund Simon Andreas Frimann
Vinter Brian
Publication venue
Publication date: 01/01/2012
Field of study

Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions, structures or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract machine framework, cphVB, that enables vector oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to efficiently execute the vector operations. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different, modern, processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a jacobi solver, a kNN search, a shallow water simulation and a synthetic stencil simulation. All demonstrate good performance

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System