40,105 research outputs found

    Asynchronous Execution of Python Code on Task Based Runtime Systems

    Despite advancements in parallel and distributed computing, the complexity of programming High Performance Computing (HPC) resources has deterred many domain experts, especially in machine learning and artificial intelligence (AI), from utilizing the performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and the cost of acquiring the skills required for programming at that level. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite limitations that prevent such code from running in distributed settings. Here we present a solution that maintains both high-level programming abstractions and parallel and distributed efficiency. Phylanx is an asynchronous array processing toolkit that transforms Python and NumPy operations into code that can be executed in parallel on HPC resources. It does so by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general-purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards.

    CUDArray: CUDA-based NumPy

    Just-In-Time Compilation of NumPy Vector Operations

    In this paper, we introduce JIT compilation for the high-productivity framework Python/NumPy in order to boost performance significantly. The JIT compilation of Python/NumPy is completely transparent to the user: the runtime system automatically JIT compiles and executes the NumPy instructions encountered in a Python application. In other words, we introduce a framework that provides the high productivity of Python while maintaining the high performance of a low-level, compiled language. We transform NumPy vector instructions into an Abstract Syntax Tree (AST) representation that forms the basis for further optimizations. From the AST we auto-generate C code, which we compile into computational kernels and execute. These kernels incorporate temporary array removal and loop fusion, which are the main contributors to the achieved speedups. In order to amortize the overhead of kernel creation, we also implement a cache for the compiled kernels. We evaluate the JIT compilation by executing several scientific computing benchmarks on an AMD. Compared to NumPy, we achieve speedups of a factor of 4.72 for an N-body application and 7.51 for a Jacobi stencil application executing on a single CPU core.
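
The two optimizations named in this abstract, temporary array removal and loop fusion, can be illustrated with a minimal sketch. This is not the paper's generated code: the function names `unfused` and `fused` are hypothetical, and the explicit Python loop only stands in for the structure that a compiled C kernel might have.

```python
import numpy as np

def unfused(a, b, c):
    # Plain NumPy evaluates each operator separately: a * b allocates a
    # temporary array, then the addition makes a second pass over memory.
    return a * b + c

def fused(a, b, c):
    # Hand-written stand-in for a fused kernel: a single pass over the
    # data with no intermediate array. (Illustrative only; in pure Python
    # this loop is slower than NumPy. The benefit appears once the loop
    # is compiled to native code.)
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = a[i] * b[i] + c[i]
    return out

a = np.arange(5, dtype=np.float64)
b = np.full(5, 2.0)
c = np.ones(5)

result_unfused = unfused(a, b, c)
result_fused = fused(a, b, c)
```

Both functions compute the same values; the difference is the number of passes over memory and the number of arrays allocated.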

    cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

    Modern processor architectures, in addition to having ever more cores, also require ever more consideration of memory layout in order to run at full capacity. The usefulness of most languages is diminishing, as their abstractions, structures, or objects are hard to map onto modern processor architectures efficiently. The work in this paper introduces a new abstract machine framework, cphVB, that enables vector-oriented high-level programming languages to map onto a broad range of architectures efficiently. The idea is to close the gap between high-level languages and hardware-optimized low-level implementations. By translating high-level vector operations into an intermediate vector bytecode, cphVB enables specialized vector engines to execute the vector operations efficiently. The primary success parameters are to maintain a complete abstraction from low-level details and to provide efficient code execution across different modern processors. We evaluate the presented design through a setup that targets multi-core CPU architectures. We evaluate the performance of the implementation using Python implementations of well-known algorithms: a Jacobi solver, a kNN search, a shallow water simulation, and a synthetic stencil simulation. All demonstrate good performance.
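
The idea of translating vector operations into an intermediate bytecode that a vector engine then executes can be sketched as follows. The opcode names, the instruction layout `(opcode, out, in1, in2)`, and the `run` function are all hypothetical and chosen for illustration; they are not cphVB's actual instruction set or API.

```python
import numpy as np

# Hypothetical opcodes for illustration; not cphVB's actual instruction set.
OPS = {"ADD": np.add, "MUL": np.multiply, "SUB": np.subtract}

def run(program, registers):
    """Toy 'vector engine': execute each instruction over whole arrays."""
    for opcode, out, lhs, rhs in program:
        registers[out] = OPS[opcode](registers[lhs], registers[rhs])
    return registers

# Bytecode a front end might emit for:  t = x * y;  r = t + x
program = [("MUL", "t", "x", "y"), ("ADD", "r", "t", "x")]

regs = run(program, {
    "x": np.array([1.0, 2.0, 3.0]),
    "y": np.array([4.0, 5.0, 6.0]),
})
result = regs["r"]
```

Because the front end only emits instructions, the same program could in principle be handed to a different engine (for example one targeting multiple cores) without changing the high-level code.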