4 research outputs found

Programmation parallèle à base de tâches pour algorithmes passant à l'échelle : application au produit de matrices [Task-based parallel programming for scalable algorithms: application to matrix multiplication]

Authors: Agullo Emmanuel, Buttari Alfredo, Guermouche Abdou, Herrmann Julien, Jego Antoine
Publication venue: HAL CCSD
Publication date: 01/02/2022

Task-based programming models have gained the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly large and heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features needed to express, in an elegant and compact way, scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow (STF) paradigm can be extended to write a compact yet efficient and scalable General Matrix Multiplication. This extension required few modifications to the StarPU runtime system. The final implementation is shown to be competitive with state-of-the-art libraries on up to 32,768 cores and may outperform them on some specific problem configurations.
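The Sequential Task Flow idea mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the StarPU API and not the authors' implementation: the names `submit_gemm`, `tile_gemm`, and `run` are hypothetical, and the dependency tracking is reduced to "a task waits for the previous writer of the same C tile". Tasks are submitted in plain sequential loop order, and the dependencies are inferred from which tile each task updates.

```python
def submit_gemm(nt):
    """Submit tile tasks C[i][j] += A[i][k] * B[k][j] in sequential program order.

    Returns the task list and, for each task, the earlier task it must wait for
    (the previous writer of the same C tile). This mimics the STF idea only.
    """
    tasks, deps, last_writer = [], {}, {}
    for i in range(nt):
        for j in range(nt):
            for k in range(nt):
                idx = len(tasks)
                tasks.append((i, j, k))
                if (i, j) in last_writer:      # write-after-write on tile C[i][j]
                    deps[idx] = last_writer[(i, j)]
                last_writer[(i, j)] = idx
    return tasks, deps


def tile_gemm(c, a, b):
    """In-place c += a * b for small dense tiles stored as lists of rows."""
    for r in range(len(a)):
        for s in range(len(b[0])):
            c[r][s] += sum(a[r][t] * b[t][s] for t in range(len(b)))


def run(tasks, deps, A, B, C):
    """Execute the tasks in a non-sequential order that still respects deps."""
    done = set()
    order = sorted(range(len(tasks)), key=lambda t: tasks[t][2])  # all k=0 first
    for t in order:
        assert t not in deps or deps[t] in done
        i, j, k = tasks[t]
        tile_gemm(C[i][j], A[i][k], B[k][j])
        done.add(t)
    return order


# Hypothetical test data: a 2x2 grid of 2x2 tiles, with B the tiled identity.
nt, bs = 2, 2
A = [[[[10 * i + k + r + s for s in range(bs)] for r in range(bs)]
      for k in range(nt)] for i in range(nt)]
B = [[[[1 if (k == j and r == s) else 0 for s in range(bs)] for r in range(bs)]
      for j in range(nt)] for k in range(nt)]
C = [[[[0] * bs for _ in range(bs)] for _ in range(nt)] for _ in range(nt)]

tasks, deps = submit_gemm(nt)
order = run(tasks, deps, A, B, C)
```

Because B is the identity here, C ends up equal to A, which gives a simple check that reordering the tasks within the inferred dependencies does not change the result.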

INRIA a CCSD electronic archive server

The effect of mesh partitioning quality on the performance of a scientific application in an HPC environment

Author: António Pedro Araújo Fraga
Publication date: 13/09/2018

The need for fast and reliable methods to solve large linear systems of equations is growing rapidly. Because this is a challenging problem, several techniques have been developed to solve it accurately and efficiently. Geometric Multigrid methods are used to solve these problems because they accelerate convergence to a solution. With these methods, it is possible to use a coarser grid as input, reducing the problem domain and thus the computational cost. The focus of this thesis is the development of new algorithms to generate a sequence of coarser grids from the original grid. By treating this problem as a minimization problem, one can attempt to optimize the overall grid quality by choosing how to merge elements. To evaluate our algorithms, we define how to quantify the overall grid quality and then analyse the grids they produce. We also use the multilevel grid construction paradigm, which is known to be adequate for solving similar problems. Such construction can be done in parallel, adding a small overhead without sacrificing the quality produced by our multilevel constructor. Hence, we can achieve a high level of concurrency.

Repositório Aberto da Universidade do Porto

Parallel and External High Quality Graph Partitioning

Author: Akhremtsev Yaroslav
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2019

Partitioning graphs into k blocks of roughly equal size such that few edges run between the blocks is a key tool for processing and analyzing large, complex real-world networks. The graph partitioning problem has many practical applications in parallel and distributed computation, data storage, image processing, VLSI physical design, and more. Furthermore, the size, variety, and structural complexity of real-world networks have recently grown dramatically. Therefore, there is a demand for efficient graph partitioning algorithms that fully utilize the computational power and memory capacity of modern machines. A popular and successful heuristic for computing high-quality partitions of large networks in reasonable time is the multi-level graph partitioning approach, which contracts the graph while preserving its structure and then partitions it using a complex graph partitioning algorithm. Specifically, the multi-level graph partitioning approach consists of three main phases: coarsening, initial partitioning, and uncoarsening. During the coarsening phase, the graph is recursively contracted, preserving its structure and properties, until it is small enough for an initial partition to be computed during the initial partitioning phase. Afterwards, during the uncoarsening phase, the partition of the contracted graph is projected onto the original graph and refined using, for example, local search. Most of the research on heuristic graph partitioning focuses on sequential algorithms or on parallel algorithms in the distributed-memory model. Unfortunately, previous approaches to graph partitioning are not able to process large networks and rarely take into account several aspects of modern machines. Specifically, the number of cores per chip grows each year, and the price of RAM falls more slowly than real-world graphs grow. Since HDDs and SSDs are 50 to 400 times cheaper than RAM, external memory makes it possible to process large real-world graphs for a reasonable price. Therefore, in order to better utilize contemporary machines, we develop efficient multi-level graph partitioning algorithms for the shared-memory and the external memory models. First, we present an approach to shared-memory parallel multi-level graph partitioning that guarantees balanced solutions, shows high speed-ups for a variety of large graphs, and yields very good quality independently of the number of cores used. Important ingredients include parallel label propagation for both coarsening and uncoarsening, parallel initial partitioning, a simple yet effective approach to parallel localized local search, and fast locality-preserving hash tables that use caches effectively. The main idea of the parallel localized local search is that each processor refines only a small area around a random vertex, reducing interactions between processors. For example, on 79 cores, our algorithm partitions a graph with more than 3 billion edges into 16 blocks, cutting 4.5% fewer edges than the closest competitor while being more than twice as fast. Furthermore, another competitor is not able to partition this graph. We then present an approach to external memory graph partitioning that is able to partition large graphs that do not fit into RAM. Specifically, we consider the semi-external and the external memory model. In both models, a data structure of size proportional to the number of edges does not fit into RAM. The difference is that the former model assumes that a data structure of size proportional to the number of vertices fits into RAM, whereas the latter assumes the opposite. We address the graph partitioning problem in both models by adapting the size-constrained label propagation technique for the semi-external model and by developing a size-constrained clustering algorithm based on graph coloring for the external memory model. Our semi-external size-constrained label propagation algorithm (or external memory clustering algorithm) can be used to compute graph clusterings and is a prerequisite for the (semi-)external graph partitioning algorithm.
The algorithms are then used for both the coarsening and the uncoarsening phase of a multi-level algorithm to compute graph partitions. Our (semi-)external algorithm is able to partition and cluster huge complex networks with billions of edges on cheap commodity machines. Experiments demonstrate that the semi-external graph partitioning algorithm is scalable and can compute high-quality partitions in time comparable to the running time of an efficient internal memory implementation. A parallelization of the algorithm in the semi-external model further reduces running times. Additionally, we develop a speed-up technique for hypergraph partitioning algorithms. Hypergraphs are an extension of graphs that allow a single edge to connect more than two vertices. They therefore describe models and processes more accurately and allow more possibilities for improvement. Most multi-level hypergraph partitioning algorithms perform some computations on vertices and their sets of neighbors. Since these computations can be super-linear, they have a significant impact on the overall running time on large hypergraphs. Therefore, to further reduce the size of hyperedges, we develop a pin-sparsifier based on the min-hash technique that clusters vertices with similar neighborhoods. Vertices that belong to the same cluster are then replaced by a single vertex connected to their neighbors, which reduces the size of the hypergraph. Our algorithm sparsifies a hypergraph such that the resulting graph can be partitioned significantly faster without loss in quality (or with insignificant loss). On average, KaHyPar with the sparsifier performs partitioning about 1.5 times faster while preserving solution quality if hyperedges are large. All aforementioned frameworks are publicly available.
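The size-constrained label propagation named in the abstract can be sketched as follows. This is a small sequential illustration under assumptions of our own, not the thesis's parallel or semi-external algorithm: the function name, the deterministic vertex order, the unit edge weights, and the tie-breaking rule are all hypothetical choices made for readability. Each vertex starts in its own cluster and moves to the neighboring cluster it is most strongly connected to, provided that cluster would not exceed `max_size`.

```python
from collections import defaultdict


def size_constrained_label_propagation(adj, max_size, rounds=5):
    """Cluster vertices of an undirected graph given as {vertex: [neighbors]}.

    A vertex moves to the neighboring cluster with the most connecting edges,
    but only if that cluster stays within max_size vertices (assumed bound).
    """
    labels = {v: v for v in adj}           # every vertex starts alone
    size = defaultdict(int)
    for v in adj:
        size[v] = 1
    for _ in range(rounds):
        moved = False
        for v in sorted(adj):              # fixed order keeps the sketch deterministic
            weight = defaultdict(int)
            for u in adj[v]:
                weight[labels[u]] += 1     # unit edge weights assumed
            current = labels[v]
            best, best_w = current, weight.get(current, 0)
            for c, w in sorted(weight.items()):
                if c != current and w > best_w and size[c] + 1 <= max_size:
                    best, best_w = c, w
            if best != current:
                size[current] -= 1
                size[best] += 1
                labels[v] = best
                moved = True
        if not moved:                      # stop once a full round changes nothing
            break
    return labels


# Two triangles joined by the single edge 2-3, with at most 3 vertices per cluster.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = size_constrained_label_propagation(graph, max_size=3)
```

In a multi-level scheme, each resulting cluster would be contracted into one vertex of the coarser graph; the size bound keeps any single contracted vertex from becoming too heavy for a balanced partition.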

KITopen

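Reducción del Tiempo de Simulación de Redes de Distribución de Agua, mediante el Método de Mallas y la Computación de Altas Prestaciones [Reducing the Simulation Time of Water Distribution Networks by Means of the Loop Method and High-Performance Computing]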

Author: Alvarruiz Bermejo Fernando
Publication venue: Universitat Politècnica de València
Publication date: 14/03/2016

Computer simulation of water distribution networks by means of mathematical models is nowadays an indispensable tool for the design and operation of those networks. Simulation is used not only for the design of new supply systems, or modifications and extensions of existing systems, but also for the normal operation tasks carried out in any network. Two main types of simulation can be distinguished: hydraulic simulation, which computes the pressures and flows in the network, and water quality simulation, whose objective is to obtain information about chemical substance concentrations. The need for simulation often arises in the context of a wider optimization or reliability analysis problem, which requires performing a large number of simulations and thus results in a process of considerable computational complexity. This fact, added to the growing size and level of detail of network models as a consequence of the automatic incorporation of data from Geographical Information Systems, means that the performance of the simulation solver has a great impact on the overall computing time. In this context, this thesis considers and explores different strategies to improve the performance of water distribution network simulation. The first strategy consists of making some contributions to the hydraulic simulation method known as Looped Newton-Raphson (or more simply the loop method), which is based on flow corrections associated with a set of independent loops within the network. Even though the method known as the Global Gradient Algorithm (GGA) is more widely used and accepted, the loop method has the potential to be faster, owing to the smaller size of the underlying linear systems. In this thesis some contributions are presented to improve the performance of the loop method for hydraulic simulation.
Firstly, efficient algorithms are developed for the selection of a suitable set of independent loops, leading to a highly sparse linear system. Secondly, methods are developed for efficient modeling of hydraulic valves, especially pressure reducing/sustaining valves. The second strategy explored is the introduction of high-performance computing in hydraulic simulation using distributed memory platforms. In particular, the code of Epanet, a widely accepted water distribution network simulation software, is taken as the starting point for the introduction of parallel simulation algorithms, using the Message Passing Interface (MPI) for inter-process communication. As a result of this work, firstly a parallel algorithm is presented for the simulation of flows and pressures by means of the GGA method, making use of multifrontal algorithms for the parallel solution of the underlying linear systems. Secondly, a parallel algorithm for water quality simulation by means of the Discrete Volume Element Method (DVEM) is described, based on partitioning the network by means of multilevel recursive bisection algorithms. Thirdly, a parallel method is presented for leakage minimization by finding the optimal pressure settings for a set of pressure-reducing valves. In distributed memory platforms the overhead due to communication and synchronization can be excessively high, counterbalancing the gain derived from dividing the computation among the processors. This effect is less pronounced in shared memory platforms such as multicore systems, which have gained popularity in recent years. This fact motivates the third strategy explored in this thesis, which is the development of parallel algorithms for the simulation of flows and pressures using multicore systems.
OpenMP is the tool used for the parallelization, both of the GGA method as implemented in the Epanet software and of the loop method, including the contributions made to it in this thesis.

Alvarruiz Bermejo, F. (2016). Reducción del Tiempo de Simulación de Redes de Distribución de Agua, mediante el Método de Mallas y la Computación de Altas Prestaciones [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/61764
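The loop-based flow correction described in the abstract can be illustrated on the smallest possible case, a single loop. The sketch below is not taken from the thesis: the function name, the quadratic head-loss law h = r·q·|q|, and the two-pipe example are assumptions chosen for illustration. With several independent loops, the scalar division would become the solution of a sparse linear system, which is where the loop selection discussed in the abstract matters.

```python
def loop_newton_raphson(r, flows, tol=1e-9, max_iter=50):
    """Balance one loop by repeatedly applying a single Newton flow correction.

    r[i]     : resistance of pipe i (assumed head-loss law h = r * q * |q|)
    flows[i] : signed flow in pipe i, positive along the loop orientation
    """
    flows = list(flows)
    for iteration in range(max_iter):
        # Sum of signed head losses around the loop; zero when balanced.
        residual = sum(ri * q * abs(q) for ri, q in zip(r, flows))
        if abs(residual) < tol:
            return flows, iteration
        # Derivative of the residual with respect to the loop flow correction.
        slope = sum(2.0 * ri * abs(q) for ri, q in zip(r, flows))
        dq = -residual / slope
        # The same correction is added to every pipe in the loop,
        # so continuity at the nodes is preserved.
        flows = [q + dq for q in flows]
    return flows, max_iter


# Hypothetical example: two parallel pipes carrying a total of 3 flow units.
# Pipe 1 follows the loop orientation, pipe 2 runs against it (negative sign).
flows, iterations = loop_newton_raphson(r=[1.0, 4.0], flows=[1.5, -1.5])
```

For these resistances the balanced split is 2 units through the first pipe and 1 unit through the second, since 1·2² = 4·1²; the iteration starts from an even split and converges to that state.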

RiuNet