41 research outputs found

    AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests

    Recent improvements in programming languages, programming models, and frameworks have focused on abstracting users away from many programming issues. Among other things, recent programming frameworks offer simpler syntax, automatic memory management and garbage collection, simpler code reuse through library packages, and easily configurable deployment tools. For instance, Python has risen to the top of the list of programming languages thanks to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Moreover, the community has helped to develop a large number of libraries and modules, tuning them to obtain great performance. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelization of affine loop nests and executes them in parallel on a distributed computing infrastructure. This parallelization can also include building data blocks to increase task granularity and thereby achieve good execution performance. Moreover, AutoParallel is based on sequential programming and requires only a small annotation in the form of a Python decorator, so that anyone with basic programming skills can scale up an application to hundreds of cores. Comment: Accepted to the 8th Workshop on Python for High-Performance and Scientific Computing (PyHPC 2018).

    Scheduling policies for Big Data workflows

    The aim of this master's thesis is both to give the programmer guidelines for achieving good scalability with task-based programming models and to improve the capabilities of the COMPSs runtime scheduler so that it can reach these scaling objectives.

    Development of linear algebra codes with PyCOMPSs (original title: Desenvolupament de codis d'àlgebra lineal amb PyCOMPSs)

    Interest in using the NumPy libraries with MKL in distributed environments led to a collaboration between Intel and the Workflows and Distributed Computing department of the BSC. This work explores several lines of investigation that aim to improve the performance and ease the simultaneous use of MKL through NumPy and PyCOMPSs to carry out linear algebra operations in distributed environments. Regarding the algorithms, solutions are first explored to maximize the performance of matrix multiplication by decomposing the matrices into square blocks. Next, the code already present at the BSC for computing the Cholesky factorization is adapted to initialize the matrix in a distributed way and to increase its parallelism. Finally, an algorithm is implemented that computes the QR decomposition by decomposing the matrix into square blocks, also with distributed initialization. Given the importance of scheduling for application performance, the COMPSs scheduler is then refactored to ease the creation of new scheduling policies, and several new policies are added: FIFO, LIFO, and a modified FIFO that prioritizes data locality by scheduling child tasks on their parent's worker to minimize the amount of data transfers. Lastly, a library is designed and implemented that wraps the entire NumPy library, allowing the distributed algorithms to be introduced progressively rather than all at once, while guaranteeing that the user retains access to all the functionality of the original library, including the parts not yet reimplemented.
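
    The three scheduling policies named in this abstract (FIFO, LIFO, and a FIFO variant that favours data locality) can be illustrated with a toy ready queue. This is a minimal sketch and not the COMPSs scheduler: the class `ReadyQueue`, the helper `drain`, and the idea of tagging each task with a single preferred worker are all assumptions made for illustration.

```python
from collections import deque


class ReadyQueue:
    """Toy ready-task queue (hypothetical, not the COMPSs implementation)."""

    def __init__(self, policy="fifo"):
        if policy not in ("fifo", "lifo", "locality"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        # Each entry is (task_name, preferred_worker); preferred_worker is
        # an assumed stand-in for "the worker that already holds the data".
        self.tasks = deque()

    def add(self, name, preferred_worker=None):
        self.tasks.append((name, preferred_worker))

    def next_for(self, worker):
        """Return the task name an idle worker should run, or None if empty."""
        if not self.tasks:
            return None
        if self.policy == "lifo":
            return self.tasks.pop()[0]
        if self.policy == "locality":
            # Pick the oldest task whose data already lives on this worker.
            for index, (name, preferred) in enumerate(self.tasks):
                if preferred == worker:
                    del self.tasks[index]
                    return name
        # Plain FIFO, which is also the fallback of the locality policy.
        return self.tasks.popleft()[0]


def drain(policy, worker):
    """Run every task on a single worker and record the execution order."""
    queue = ReadyQueue(policy)
    queue.add("a", preferred_worker="w1")
    queue.add("b", preferred_worker="w2")
    queue.add("c", preferred_worker="w1")
    order = []
    while (name := queue.next_for(worker)) is not None:
        order.append(name)
    return order
```

    With worker `w2`, the locality policy runs `b` first because its data is assumed to be local, then falls back to FIFO order for the remaining tasks.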

    Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs

    Python is a popular programming language thanks to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped to put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python. This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). Javier Conejero's postdoctoral contract is co-financed by the Ministry of Economy and Competitiveness under Juan de la Cierva Formación postdoctoral fellowship number FJCI-2015-24651. Cristian Ramon-Cortes's predoctoral contract is financed by the Ministry of Economy and Competitiveness under contract BES-2016-076791. This work is supported by the Intel-BSC Exascale Lab. This work has been supported by the European Commission through the Horizon 2020 Research and Innovation program under contract 687584 (TANGO project). Peer reviewed. Postprint (published version).
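
    The task-based style described in this abstract, where a decorated function call becomes an asynchronous task, can be sketched with the standard library alone. The names `task`, `wait_on`, `dot`, and `matvec` below are hypothetical stand-ins built on `concurrent.futures`; they are not the PyCOMPSs API and they run on local threads rather than on a distributed infrastructure.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# Local thread pool standing in for a runtime that would place tasks on workers.
_pool = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Hypothetical decorator: calling the function submits it as a task."""
    @wraps(func)
    def submit(*args, **kwargs):
        # Returns a Future immediately instead of the computed value.
        return _pool.submit(func, *args, **kwargs)
    return submit


def wait_on(futures):
    """Block until every task has finished and return the results in order."""
    return [future.result() for future in futures]


@task
def dot(row, vector):
    """One task: the dot product of a matrix row with a vector."""
    return sum(x * y for x, y in zip(row, vector))


def matvec(matrix, vector):
    """Matrix-vector product written sequentially, one task per row."""
    futures = [dot(row, vector) for row in matrix]
    return wait_on(futures)
```

    The calling code in `matvec` still reads like a sequential loop; only the decorator and the final synchronisation reveal that the rows are processed as independent tasks.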

    AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests

    Recent improvements in programming languages, programming models, and frameworks have focused on abstracting users away from many programming issues. Among other things, recent programming frameworks offer simpler syntax, automatic memory management and garbage collection, simpler code reuse through library packages, and easily configurable deployment tools. For instance, Python has risen to the top of the list of programming languages thanks to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Moreover, the community has helped to develop a large number of libraries and modules, tuning the most commonly used ones to obtain great performance. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelization of affine loop nests and executes them in parallel on a distributed computing infrastructure. This parallelization can also include building data blocks to increase task granularity and thereby achieve good execution performance. Moreover, AutoParallel is based on sequential programming and requires only a small annotation in the form of a Python decorator, so that anyone with basic programming skills can scale up an application to hundreds of cores.

    AutoParallel: Automatic parallelisation and distributed execution of affine loop nests in Python

    Recent improvements in programming languages and models have focused on simplicity and abstraction, leading Python to the top of the list of programming languages. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelisation of affine loop nests and executes them in parallel on a distributed computing infrastructure. It is based on sequential programming and requires one single annotation (in the form of a Python decorator), so that anyone with intermediate-level programming skills can scale up an application to hundreds of cores. The evaluation demonstrates that AutoParallel goes one step further in easing the development of distributed applications. On the one hand, the programmability evaluation highlights the benefits of using a single Python decorator instead of manually annotating each task and its parameters or, even worse, having to develop the parallel code explicitly (e.g., using OpenMP or MPI). On the other hand, the performance evaluation demonstrates that AutoParallel is capable of automatically generating task-based workflows from sequential Python code while achieving the same performance as manually taskified versions of established state-of-the-art algorithms (i.e., Cholesky, LU, and QR decompositions). Finally, AutoParallel is also capable of automatically building data blocks to increase the tasks' granularity, freeing the user from creating the data chunks and redesigning the algorithm. For advanced users, we believe that this feature can be useful as a baseline for designing blocked algorithms.
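
    The idea of building data blocks to increase task granularity can be illustrated with a hand-written blocked matrix multiplication. This is a minimal pure-Python sketch, not the code AutoParallel generates: the functions `split_into_blocks`, `multiply_accumulate`, `merge_blocks`, and `blocked_matmul` are hypothetical, and each call to `multiply_accumulate` stands for what would be one coarse-grained task.

```python
def split_into_blocks(matrix, block_size):
    """Split a square matrix (list of lists) into a grid of square blocks."""
    n = len(matrix)
    if n % block_size != 0:
        raise ValueError("matrix size must be a multiple of block_size")
    grid = n // block_size
    return [
        [
            [row[j * block_size:(j + 1) * block_size]
             for row in matrix[i * block_size:(i + 1) * block_size]]
            for j in range(grid)
        ]
        for i in range(grid)
    ]


def multiply_accumulate(c, a, b):
    """One coarse-grained unit of work: c += a * b on small square blocks."""
    size = len(a)
    for i in range(size):
        for j in range(size):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(size))


def merge_blocks(blocks):
    """Reassemble a grid of blocks into a single list-of-lists matrix."""
    merged = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            merged.append([value for block in block_row for value in block[r]])
    return merged


def blocked_matmul(a, b, block_size):
    """Multiply two square matrices block by block."""
    a_blocks = split_into_blocks(a, block_size)
    b_blocks = split_into_blocks(b, block_size)
    grid = len(a_blocks)
    c_blocks = [
        [[[0] * block_size for _ in range(block_size)] for _ in range(grid)]
        for _ in range(grid)
    ]
    for i in range(grid):
        for j in range(grid):
            for k in range(grid):
                # In a task-based runtime, this call would be one task.
                multiply_accumulate(c_blocks[i][j], a_blocks[i][k], b_blocks[k][j])
    return merge_blocks(c_blocks)
```

    A larger `block_size` means fewer but heavier units of work, which is the granularity trade-off the abstract refers to.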

    Enabling Python to execute efficiently in heterogeneous distributed infrastructures with PyCOMPSs

    Python has been adopted as a programming language by a large number of scientific communities. In addition to its easy programming interface, the large number of libraries and modules made available by its many contributors has taken this language to the top of the list of the most popular programming languages for scientific applications. However, one main drawback of Python is the lack of support for concurrency or parallelism. PyCOMPSs is a proven approach to supporting task-based parallelism in Python that enables applications to be executed in parallel on distributed computing platforms. This paper presents PyCOMPSs and how it has been tailored to execute tasks in heterogeneous and multi-threaded environments. We present an approach to combine the task-level parallelism provided by PyCOMPSs with the thread-level parallelism provided by MKL. Performance and behavioral results on heterogeneous distributed computing clusters show the benefits and capabilities of PyCOMPSs in both HPC and Big Data infrastructures. This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). Javier Conejero's postdoctoral contract is co-financed by the Ministry of Economy and Competitiveness under Juan de la Cierva Formación postdoctoral fellowship number FJCI-2015-24651. Cristian Ramon-Cortes's predoctoral contract is financed by the Ministry of Economy and Competitiveness under contract BES-2016-076791. This work is supported by the Intel-BSC Exascale Lab. This work has been supported by the European Commission through the Horizon 2020 Research and Innovation program under contract 687584 (TANGO project). Peer reviewed. Postprint (author's final draft).

    Model calibration for leak localization, a real application

    The localization of leaks in water distribution networks is highly relevant in terms of environmental and economic efficiency. This localization is generally carried out in situ by human operators using time-consuming methods such as acoustic loggers. Nevertheless, the automated aid provided to the operators is continuously increasing thanks to the extensive use of models. These models have to be calibrated and updated in order to provide proper help and improve the leak search. This paper presents an experience of leak localization using steady-state models combined with a demand calibration algorithm. The calibration produces a notable improvement in localization accuracy and signals changes in the network configuration. The results presented are based on real data and a real leak deliberately induced for the test. Peer reviewed. Postprint (published version).