AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests
Recent improvements in programming languages, programming models, and frameworks have focused on abstracting users away from many programming issues. Among other features, recent programming frameworks offer simpler syntax and automatic memory management with garbage collection, simplify code reuse through library packages, and provide easily configurable deployment tools. For instance, Python has risen to the top of the list of programming languages thanks to the simplicity of its syntax, while still achieving good performance even though it is an interpreted language. Moreover, the community has developed a large number of libraries and modules, tuning them for performance.
However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelization of affine loop nests and executes them in parallel on a distributed computing infrastructure. This parallelization can also build data blocks to increase task granularity and thereby achieve good execution performance. Moreover, AutoParallel is based on sequential programming and requires only a small annotation in the form of a Python decorator, so that anyone with basic programming skills can scale an application up to hundreds of cores.
Comment: Accepted at the 8th Workshop on Python for High-Performance and Scientific Computing (PyHPC 2018).
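The core idea can be illustrated with a plain-Python sketch (this is not AutoParallel's actual API; the `cell_task` helper and the rewriting shown here are illustrative only): the user writes a sequential affine loop nest, and the tool detects that the outer iterations are independent and can become tasks.

```python
def matmul_sequential(a, b, c, n):
    """The input AutoParallel would analyze: a plain affine loop nest."""
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]

def matmul_taskified(a, b, c, n):
    """What taskification conceptually produces: the two outer loops carry
    no dependences, so each (i, j) cell becomes an independent task."""
    def cell_task(i, j):  # hypothetical generated task
        acc = 0
        for k in range(n):
            acc += a[i][k] * b[k][j]
        return acc
    for i in range(n):
        for j in range(n):
            # a distributed runtime would launch these asynchronously
            c[i][j] = cell_task(i, j)
```

Both versions compute the same result; the taskified form simply exposes the independent units of work that a runtime can schedule in parallel.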
Scheduling policies for Big Data workflows
The aim of this master thesis is twofold: to give programmers guidelines for achieving good scalability with task-based programming models, and to improve the COMPSs runtime scheduler's capabilities to reach these scaling objectives.
Development of linear algebra codes with PyCOMPSs
Following the interest in using NumPy's MKL-backed libraries in distributed environments, a collaboration was created between Intel and the Workflows and Distributed Computing department at BSC.
This work explores several lines of investigation that aim to improve performance and ease the simultaneous use of MKL, through NumPy and PyCOMPSs, to carry out linear algebra operations in distributed environments.
Regarding the mathematical algorithms, solutions are first explored to achieve maximum performance of matrix multiplication by decomposing the matrices into square blocks. Next, the code already present at BSC that computes the Cholesky factorization has been adapted to initialize the matrix in a distributed way and to increase its parallelism. Finally, an algorithm to compute the QR decomposition, based on decomposition into square blocks with distributed initialization, has been implemented.
Given the importance of scheduling for good application performance, the COMPSs scheduler has then been refactored to ease the introduction of new scheduling policies. In particular, schedulers following a FIFO policy, a LIFO policy, and a FIFO policy modified to prioritize data locality, thereby minimizing the number of transfers, have been added.
Finally, a library that wraps the NumPy library has been designed and implemented, allowing the distributed implementations to be introduced progressively without having to implement the whole library at once, while guaranteeing that the user keeps access to all the functionality of the original library, even the parts not yet reimplemented.
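The scheduling policies described above can be sketched in a few lines (an illustrative model only, not the COMPSs scheduler; the class names and the `(task_id, data_ids)` tuple shape are assumptions):

```python
from collections import deque

class FifoScheduler:
    """Ready tasks are served strictly in arrival order."""
    def __init__(self):
        self.ready = deque()
    def add(self, task):
        self.ready.append(task)
    def next_task(self, worker_data=frozenset()):
        return self.ready.popleft() if self.ready else None

class LifoScheduler(FifoScheduler):
    """Most recently added ready task is served first."""
    def next_task(self, worker_data=frozenset()):
        return self.ready.pop() if self.ready else None

class LocalityFifoScheduler(FifoScheduler):
    """FIFO, but a task whose input data already resides on the worker
    jumps ahead, reducing the number of data transfers."""
    def next_task(self, worker_data=frozenset()):
        for i, task in enumerate(self.ready):
            if set(task[1]) <= set(worker_data):
                del self.ready[i]
                return task
        return self.ready.popleft() if self.ready else None
```

The locality-aware variant falls back to plain FIFO order when no ready task has all of its inputs on the requesting worker.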
Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs
Python is a popular programming language due to the simplicity of its syntax, while still achieving good performance even though it is an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.
This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). Javier Conejero's postdoctoral contract is co-financed by the Ministry of Economy and Competitiveness under Juan de la Cierva Formación postdoctoral fellowship number FJCI-2015-24651. Cristian Ramon-Cortes' predoctoral contract is financed by the Ministry of Economy and Competitiveness under contract BES-2016-076791. This work is supported by the Intel-BSC Exascale Lab, and has also been supported by the European Commission through the Horizon 2020 Research and Innovation program under contract 687584 (TANGO project).
Peer Reviewed. Postprint (published version).
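As a stdlib-only illustration of the task-based model the paper builds on (PyCOMPSs' real decorator and distributed runtime differ; the `task` decorator below is a hypothetical stand-in), a function call can be turned into an asynchronous task on a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def task(fn):
    """Hypothetical stand-in for a task annotation:
    calls return futures instead of running synchronously."""
    def submit(*args, **kwargs):
        return _pool.submit(fn, *args, **kwargs)
    return submit

@task
def dot(u, v):
    # the task body: an ordinary sequential function
    return sum(x * y for x, y in zip(u, v))

# Tasks are launched asynchronously; synchronization
# happens only when a result is actually needed.
futures = [dot([1, 2, 3], [i, i, i]) for i in range(3)]
results = [f.result() for f in futures]
```

In a real task-based runtime the same pattern is backed by a distributed dependency tracker rather than a local thread pool, but the programming model seen by the user is analogous.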
AutoParallel: Automatic parallelisation and distributed execution of affine loop nests in Python
The latest improvements in programming languages and models have focused on simplicity and abstraction, leading Python to the top of the list of programming languages. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelisation of affine loop nests and executes them in parallel on a distributed computing infrastructure. It is based on sequential programming and contains one single annotation (in the form of a Python decorator) so that anyone with intermediate-level programming skills can scale an application up to hundreds of cores. The evaluation demonstrates that AutoParallel goes one step further in easing the development of distributed applications. On the one hand, the programmability evaluation highlights the benefits of using a single Python decorator instead of manually annotating each task and its parameters or, even worse, having to develop the parallel code explicitly (e.g., using OpenMP or MPI). On the other hand, the performance evaluation demonstrates that AutoParallel is capable of automatically generating task-based workflows from sequential Python code while achieving the same performance as manually taskified versions of established state-of-the-art algorithms (i.e., Cholesky, LU, and QR decompositions). Finally, AutoParallel is also capable of automatically building data blocks to increase task granularity, freeing the user from creating the data chunks and redesigning the algorithm. For advanced users, we believe this feature can be useful as a baseline to design blocked algorithms.
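The automatic data blocking mentioned above can be sketched with stdlib-only helpers (an illustrative sketch, not AutoParallel's actual implementation; the function names are assumptions) that partition a list-of-lists matrix into square blocks and reassemble it:

```python
def to_blocks(m, block):
    """Split an n x n list-of-lists matrix into square blocks of size `block`,
    returned as a grid of blocks (each task then works on one block)."""
    n = len(m)
    return [[[row[j:j + block] for row in m[i:i + block]]
             for j in range(0, n, block)]
            for i in range(0, n, block)]

def from_blocks(blocks):
    """Reassemble the flat matrix from its grid of blocks."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([x for blk in block_row for x in blk[r]])
    return out
```

Working on blocks rather than single elements is what raises task granularity: each task amortises its scheduling and transfer overhead over a whole chunk of the matrix.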
Enabling Python to execute efficiently in heterogeneous distributed infrastructures with PyCOMPSs
Python has been adopted as a programming language by a large number of scientific communities. In addition to its easy programming interface, the large number of libraries and modules made available by many contributors has taken this language to the top of the list of the most popular programming languages in scientific applications. However, one main drawback of Python is its lack of support for concurrency and parallelism. PyCOMPSs is a proven approach to support task-based parallelism in Python that enables applications to be executed in parallel on distributed computing platforms.
This paper presents PyCOMPSs and how it has been tailored to execute tasks in heterogeneous and multi-threaded environments. We present an approach to combine the task-level parallelism provided by PyCOMPSs with the thread-level parallelism provided by MKL. Performance and behavioral results on heterogeneous distributed computing clusters show the benefits and capabilities of PyCOMPSs in both HPC and Big Data infrastructures.
This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). Javier Conejero's postdoctoral contract is co-financed by the Ministry of Economy and Competitiveness under Juan de la Cierva Formación postdoctoral fellowship number FJCI-2015-24651. Cristian Ramon-Cortes' predoctoral contract is financed by the Ministry of Economy and Competitiveness under contract BES-2016-076791. This work is supported by the Intel-BSC Exascale Lab, and has also been supported by the European Commission through the Horizon 2020 Research and Innovation program under contract 687584 (TANGO project).
Peer Reviewed. Postprint (author's final draft).
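One common way to compose task-level and thread-level parallelism (a sketch under assumptions, not necessarily the paper's exact mechanism) is to cap the threads each task's MKL kernel may use so that concurrent tasks do not oversubscribe the node; `MKL_NUM_THREADS` and `OMP_NUM_THREADS` are real environment knobs, while the bookkeeping around them here is illustrative:

```python
import os

CORES = os.cpu_count() or 4
TASKS_IN_FLIGHT = 2                       # tasks the runtime runs concurrently (assumed)
THREADS_PER_TASK = max(1, CORES // TASKS_IN_FLIGHT)

def configure_kernel_threads():
    """Thread caps a worker process would apply before calling into MKL,
    so tasks_in_flight * threads_per_task stays within the node's cores."""
    os.environ["MKL_NUM_THREADS"] = str(THREADS_PER_TASK)
    os.environ["OMP_NUM_THREADS"] = str(THREADS_PER_TASK)
    return THREADS_PER_TASK
```

Note that MKL typically reads these variables when the worker process starts, so in practice they must be set before the numerical library is loaded.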
Model calibration for leak localization, a real application
The localization of leaks in water distribution networks is of major relevance in terms of environmental and economic efficiency. This localization is generally carried out in situ by human operators using time-consuming methods such as acoustic loggers. Nevertheless, the automated aid provided to operators is continuously increasing thanks to the extensive use of models, which have to be calibrated and updated in order to provide proper help and improve the leak search. This paper presents an experience of leak localization using steady-state models combined with a demand calibration algorithm. The calibration produces a notable improvement in localization accuracy and signals changes in the network configuration. The results presented are based on real data and a real leak provoked for the test.
Peer Reviewed. Postprint (published version).
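A minimal sketch of the model-based localization idea under simplifying assumptions (not the paper's calibration algorithm; the function and data shapes are illustrative): simulate sensor pressures for a candidate leak at each node and pick the candidate whose simulated pressures best match the measurements.

```python
def locate_leak(measured, simulated_per_node):
    """measured: list of sensor pressures;
    simulated_per_node: dict mapping a candidate leak node to the sensor
    pressures the (calibrated) steady-state model predicts for that leak.
    Returns the candidate with the smallest squared-residual mismatch."""
    def residual(sim):
        return sum((m - s) ** 2 for m, s in zip(measured, sim))
    return min(simulated_per_node,
               key=lambda node: residual(simulated_per_node[node]))
```

This is why calibration matters in the paper: with miscalibrated demands, the simulated pressure signatures are biased and the residual comparison points to the wrong node.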