
    A k-way Greedy Graph Partitioning with Initial Fixed Vertices for Parallel Applications

    Graph partitioning is used to solve the problem of distributing computations across a number of processors, in order to improve the performance of time-consuming applications in parallel environments. A common approach to this problem is based on a multilevel framework, where the graph is first coarsened to a smaller instance and then partitioned into a number of parts using recursive bisection (RB) based methods. However, in applications where initial fixed vertices are used to model additional constraints of the problem, RB-based methods often fail to produce partitions of good quality. In this paper, we propose a new direct k-way greedy graph growing algorithm, called KGGGP, that overcomes this issue and produces partitions of better quality than RB while respecting the constraint of fixed vertices. In the experimental section, we present results comparing KGGGP against state-of-the-art methods on graphs from the popular DIMACS'10 collection.
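    As a rough illustration of the direct k-way greedy-growing idea with fixed vertices, consider the minimal sketch below. It is a simplification under stated assumptions, not the paper's exact KGGGP algorithm: the function name, the gain rule (total edge weight to already-assigned neighbors) and the fallback for infeasible moves are illustrative choices.

```python
# Minimal sketch of direct k-way greedy graph growing with fixed vertices.
# Illustrative only; not the exact KGGGP algorithm from the paper.
from collections import defaultdict

def greedy_k_way(adj, weights, k, fixed, max_part_weight):
    """adj: {v: {u: edge_weight}}, weights: {v: vertex_weight},
    fixed: {v: part_id} for pre-assigned vertices."""
    part = dict(fixed)                        # vertex -> part id
    load = defaultdict(int)                   # part id -> total vertex weight
    for v, p in fixed.items():
        load[p] += weights[v]
    free = set(adj) - set(fixed)
    while free:
        best = None
        for v in free:
            # Connectivity of v to each part via already-assigned neighbors.
            conn = defaultdict(int)
            for u, w in adj[v].items():
                if u in part:
                    conn[part[u]] += w
            for p in range(k):
                if load[p] + weights[v] > max_part_weight:
                    continue                  # part p has no room for v
                if best is None or conn[p] > best[0]:
                    best = (conn[p], v, p)
        if best is None:                      # no feasible move: relax balance
            v = next(iter(free))
            p = min(range(k), key=load.__getitem__)
            best = (0, v, p)
        _, v, p = best
        part[v] = p
        load[p] += weights[v]
        free.remove(v)
    return part
```

    Because the gain of a vertex with no assigned neighbors is zero, the partition effectively grows outward from the fixed seed vertices, which is what lets a direct k-way method honor fixed-vertex constraints that recursive bisection struggles with.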

    Reducing data movement on large shared memory systems by exploiting computation dependencies

    Shared memory systems are becoming increasingly complex as they typically integrate several storage devices. That brings different access latencies or bandwidth rates depending on the proximity between the cores where memory accesses are issued and the storage devices containing the requested data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist in migrating threads, memory pages or both, and are generally applied by the system software. We propose techniques at the runtime system level to further mitigate the impact of NUMA effects on parallel applications' performance. We leverage runtime system metadata expressed in terms of a task dependency graph, where nodes are pieces of serial code and edges are control or data dependencies between them, to efficiently reduce data transfers. Our approach, based on graph partitioning, adds negligible overhead and provides performance improvements of up to 1.52×, and average improvements of 1.12×, over the best state-of-the-art approach when deployed on a 288-core shared-memory system. It also reduces coherence traffic by 2.28× on average with respect to the state of the art.

    This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the European Union's Horizon 2020 research and innovation programme (grant agreements 671697 and 779877). I. Sánchez Barrera has been partially supported by the Spanish Ministry of Education, Culture and Sport under Formación del Profesorado Universitario fellowship number FPU15/03612. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104.
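    The abstract above frames data-movement reduction as a graph-partitioning problem over the task dependency graph. The toy sketch below uses hypothetical task names, byte counts and assignments (not the paper's runtime implementation) to illustrate why co-locating dependent tasks on the same NUMA node reduces cross-node traffic:

```python
# Toy model: tasks as nodes, data dependencies as weighted edges.
# All names and sizes are made up for illustration.
tasks = ["init_A", "init_B", "mul", "reduce"]
# Edges: (producer, consumer, bytes passed along the dependency).
deps = [("init_A", "mul", 8 << 20),
        ("init_B", "mul", 8 << 20),
        ("mul", "reduce", 4 << 20)]

def cross_node_traffic(assignment, deps):
    """Bytes crossing NUMA-node boundaries under a task -> node mapping."""
    return sum(nbytes for src, dst, nbytes in deps
               if assignment[src] != assignment[dst])

naive     = {"init_A": 0, "init_B": 1, "mul": 0, "reduce": 1}
colocated = {"init_A": 0, "init_B": 0, "mul": 0, "reduce": 0}
print(cross_node_traffic(naive, deps))      # 12 MiB crosses node boundaries
print(cross_node_traffic(colocated, deps))  # 0 bytes: dependents co-located
```

    A graph partitioner applied to `deps`, balancing task load per node while minimizing the weighted edge cut, automates exactly this choice at runtime.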

    High-Performance Graph Algorithms

    Ever since Euler took a stroll across the bridges of a city then called Königsberg, relationships between entities have been modeled as graphs. Being a useful abstraction whenever the structure of relationships is the significant aspect of a problem, popular uses of graphs include the modeling of social networks, supply chain dependencies, the internet, customer preferences, street networks or the who-eats-whom (a.k.a. food network) in an ecosystem. In parallel computing, the assignment of sub-tasks to processes can massively influence performance, since data dependencies between processes are significantly more expensive than within them. This scenario has been profitably modeled as a graph problem, with sub-tasks as vertices and their communication dependencies as edges.

    Many graphs are governed by an underlying geometry. Some are derived directly from a geometric object, such as street networks or meshes from spatial simulations. Others have a hidden geometry, in which no explicit geometry information is known, but the graph structure follows an underlying geometry. A suitable embedding into this geometry would then show a close relationship between graph distances and geometric distances.

    A subclass of graphs enjoying significant attention are complex networks. Though hard to define exactly, they are commonly characterized by a low diameter, a heterogeneous structure, and a skewed degree distribution, often following a power law. The most famous examples include social networks, the hyperlink network and communication networks. Development of new analysis algorithms for complex networks is ongoing. Especially since the instances of interest often comprise billions of vertices, fast analysis algorithms and approximations are a natural focus of development. To accurately test and benchmark new developments, as well as to gain theoretical insight about network formation, generative graph models are required: a mathematical model describing a family of random graphs, from which instances can be sampled efficiently. Even if real test data is available, interesting instances are often terabytes in size, making storage and transmission inconvenient.

    While the underlying geometry of street networks is the spherical geometry they were built in, there is some evidence that the geometry best fitting complex networks is not Euclidean or spherical, but hyperbolic. Based on this notion, Krioukov et al. proposed a generative graph model for complex networks, called random hyperbolic graphs. They are created by placing vertices randomly within a disk in the hyperbolic plane and connecting pairs of vertices with a probability depending on their distance. An important subclass of this model, called threshold random hyperbolic graphs, connects two vertices exactly if the distance between them is below a threshold. This model has pleasant properties and has received considerable attention from theoreticians. Unfortunately, a straightforward generation algorithm has a complexity quadratic in the number of vertices, which renders it infeasible for instances of more than a few million vertices.

    We developed four faster generation algorithms for random hyperbolic graphs. By projecting hyperbolic geometry into the Euclidean geometry of the Poincaré disk model, we are able to use adapted versions of existing geometric data structures. Our first algorithm uses a polar quadtree to avoid distance calculations and achieves a time complexity of $\mathcal{O}((n^{3/2} + m)\log n)$ whp.; this is the first subquadratic generation algorithm for threshold random hyperbolic graphs.
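    For reference, the straightforward quadratic-time generator that these algorithms improve on can be sketched as follows. The radial density and the hyperbolic law of cosines used here are standard for this model; the parameter names (`alpha` for the radial dispersion, `R` for the disk radius, which doubles as the connection threshold) follow common convention, but exact model details vary across papers.

```python
# Naive O(n^2) generator for threshold random hyperbolic graphs:
# the baseline the subquadratic algorithms described above improve on.
import math, random

def hyperbolic_distance(r1, t1, r2, t2):
    # Hyperbolic law of cosines in native polar coordinates.
    dt = math.pi - abs(math.pi - abs(t1 - t2))   # angle difference in [0, pi]
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dt))
    return math.acosh(max(1.0, x))               # clamp for rounding errors

def naive_threshold_rhg(n, alpha, R):
    # Radii by inverse-CDF sampling of the density a*sinh(a*r)/(cosh(a*R)-1);
    # angles uniform in [0, 2*pi).
    pts = [(math.acosh(1 + (math.cosh(alpha * R) - 1) * random.random()) / alpha,
            random.uniform(0, 2 * math.pi)) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if hyperbolic_distance(*pts[i], *pts[j]) <= R]
    return pts, edges
```

    The all-pairs loop is exactly the quadratic bottleneck: the quadtree and annuli data structures described in this thesis exist to prune the vast majority of these pairwise distance checks.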
    Empirically, our implementation achieves an improvement of three orders of magnitude over a reference implementation of the straightforward algorithm.

    We extend this quadtree data structure further for the generation of general random hyperbolic graphs, in which all edges are probabilistic. Since each edge has a non-zero probability of existing, sampling edges by throwing a biased coin for each pair would again cost quadratic time. We address this issue by sampling jumping widths within leaf cells and aggregating subtrees into virtual leaf cells when the expected number of neighbors in them is below a threshold. With this tradeoff, we bound both the number of distance calculations and the number of examined quadtree cells per edge, resulting in the same time complexity of $\mathcal{O}((n^{3/2} + m)\log n)$ also for general random hyperbolic graphs. We generalize this sampling scenario and define probabilistic neighborhood queries, in which a random sample of a geometric point set is desired, with the probability of inclusion depending on the distance to a query point. Usable to simulate probabilistic spatial spreading, we show a significant speedup in a proof-of-concept disease simulation.

    Our second algorithm for threshold random hyperbolic graphs uses a data structure of concentric annuli in the hyperbolic plane. For each given vertex, the positions of possible neighbors in each annulus can be restricted with the hyperbolic law of cosines, leading to a much reduced number of candidates that need to be checked. This yields a reduced time complexity of $\mathcal{O}(n \log^2 n + m)$ and a further order of magnitude in practice for graphs of a few million vertices. Finally, we extend this data structure to general random hyperbolic graphs as well, with the same time complexity for constant parameters.
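    The jumping-width idea mentioned above is, at its core, the standard geometric-jump trick for Bernoulli sampling: rather than flipping one coin per candidate, jump directly to the next selected candidate with a geometrically distributed stride. Below is a minimal sketch for a fixed inclusion probability `p`; the thesis combines this with per-cell upper bounds on the inclusion probability plus rejection, since the true probability depends on distance.

```python
# Geometric jumps: select each of n candidates with probability p
# using O(expected selections) random draws instead of n coin flips.
import math, random

def sample_indices(n, p):
    """Indices of {0,...,n-1}, each kept independently with
    probability p, where 0 < p < 1."""
    chosen, i = [], -1
    log_q = math.log(1.0 - p)
    while True:
        u = 1.0 - random.random()            # u in (0, 1]
        i += 1 + int(math.log(u) / log_q)    # jump to the next success
        if i >= n:
            return chosen
        chosen.append(i)
```

    The stride `floor(log(u) / log(1-p))` is a geometrically distributed count of skipped failures, so the output has exactly the same distribution as n independent coin flips while the work is proportional to the number of selected elements.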
    The second part of my thesis is in many aspects the opposite of the first. Instead of a hidden geometry, I consider graphs whose geometric information is explicit; instead of using geometry to generate graphs, I use it to decide how to cut graphs into pieces. Given a graph, the graph partitioning problem asks for a disjoint partition of the vertex set so that each subset has a similar number of vertices and some objective function is optimized. Its many applications include parallelizing a computing task while balancing load and minimizing communication, dividing a graph into blocks as preparation for graph analysis tasks, or finding natural cuts in street networks for efficient route planning. Since the graph partitioning problem is NP-complete and hard to approximate, heuristics are used in practice.

    Our first graph partitioner is designed for an application in quantum chemistry. Computing electron density fields is necessary to accurately predict protein interactions, but the required time scales quadratically with the protein's size, with punitive constant factors. This effectively restricts these density-based methods to proteins of at most a few hundred amino acids. It is possible to circumvent this limitation by computing only parts of the target protein at a time, an approach known as subsystem quantum chemistry. However, the interactions between amino acids in different parts are then neglected, and this neglect causes errors in the solution. We model this problem as partitioning a protein graph: each vertex represents one amino acid, each edge an interaction between two of them, and each edge weight the expected error caused by neglecting this interaction. Finding a set of subsets with minimum error is then equivalent to finding a partition of the protein graph with minimum edge cut. The requirements of the chemical simulations impose additional constraints on this partition, as well as a geometry implied by the protein structure. We provide an implementation of the well-known multilevel heuristic together with the local search algorithm by Fiduccia and Mattheyses, both adapted to respect these new constraints. We also provide an optimal dynamic programming algorithm for a restricted scenario with a smaller solution space. In terms of edge cut, we achieve an average improvement of 13.5% over the naive solution previously used by domain scientists.

    Our second graph partitioner targets geometric meshes from numerical simulations on the scale of billions of grid points, parallelized to tens of thousands of processes. In general-purpose graph partitioning, the multilevel heuristic computes high-quality partitions, but its scalability is limited. Due to the large halos in our targeted application, the shape of the partitioned blocks is also important, with convex shapes leading to less communication. We adapt the well-known k-means algorithm, which yields convex shapes, to the partitioning of geometric meshes by including a balancing scheme. We further extend several existing geometric optimizations to the balanced version to achieve fast running times and parallel scalability. The classic k-means algorithm is highly dependent on the choice of initial centers. We select initial centers among the input points along a space-filling curve, thus guaranteeing a distribution similar to that of the input points. The resulting implementation scales to tens of thousands of processes and billions of vertices, partitioning them in seconds. Compared to previous fast geometric partitioners, our method provides partitions with a 10-15% lower communication volume and a correspondingly smaller communication time in a distributed SpMV benchmark.
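    The space-filling-curve initialization mentioned above can be sketched as follows. This is a simplified stand-in, not the thesis implementation: it assumes 2-D points with coordinates pre-scaled to [0, 1], uses a Morton (Z-order) curve with a fixed 16-bit quantization, and ignores the distributed-memory aspects of the real partitioner.

```python
# Picking k initial k-means centers along a Morton (Z-order) curve:
# sorting by curve position and taking every (n/k)-th point yields
# centers distributed roughly like the input point set.

def morton_key(x, y, bits=16):
    """Interleave the bits of the quantized x and y coordinates."""
    xi = int(x * ((1 << bits) - 1))
    yi = int(y * ((1 << bits) - 1))
    key = 0
    for b in range(bits):
        key |= ((xi >> b) & 1) << (2 * b) | ((yi >> b) & 1) << (2 * b + 1)
    return key

def initial_centers(points, k):
    """points: list of (x, y) with coordinates scaled to [0, 1]; len >= k."""
    ordered = sorted(points, key=lambda p: morton_key(*p))
    step = len(ordered) // k
    # Offset by step // 2 so the centers sit mid-stride along the curve.
    return [ordered[step // 2 + i * step] for i in range(k)]
```

    Because nearby positions on the curve correspond to nearby points in space, this seeding avoids the clustered or degenerate starting centers that plain random initialization can produce, which matters for the convex, balanced blocks the partitioner is after.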