
    High-Performance Graph Algorithms

Ever since Euler took a stroll across the bridges of a city then called Königsberg, relationships between entities have been modeled as graphs. Being a useful abstraction whenever the structure of relationships is the significant aspect of a problem, popular uses of graphs include the modeling of social networks, supply chain dependencies, the internet, customer preferences, street networks, or the who-eats-whom (also known as the food web) in an ecosystem. In parallel computing, the assignment of sub-tasks to processes can massively influence performance, since data dependencies between processes are significantly more expensive than within them. This scenario has been profitably modeled as a graph problem, with sub-tasks as vertices and their communication dependencies as edges.

Many graphs are governed by an underlying geometry. Some are derived directly from a geometric object, such as street networks or meshes from spatial simulations. Others have a hidden geometry, in which no explicit geometric information is known, but the graph structure follows an underlying geometry; a suitable embedding into this geometry would then show a close relationship between graph distances and geometric distances.

A subclass of graphs enjoying significant attention is that of complex networks. Though hard to define exactly, they are commonly characterized by a low diameter, a heterogeneous structure, and a skewed degree distribution, often following a power law. The most famous examples include social networks, the hyperlink network, and communication networks. Development of new analysis algorithms for complex networks is ongoing. Especially since the instances of interest often comprise billions of vertices, fast analysis algorithms and approximations are a natural focus of development. To accurately test and benchmark new developments, as well as to gain theoretical insight about network formation, generative graph models are required: mathematical models describing families of random graphs from which instances can be sampled efficiently. Even if real test data is available, interesting instances are often terabytes in size, making storage and transmission inconvenient.

While the underlying geometry of street networks is the spherical geometry they were built in, there is some evidence that the geometry best fitting complex networks is not Euclidean or spherical, but hyperbolic. Based on this notion, Krioukov et al. proposed a generative graph model for complex networks, called Random Hyperbolic Graphs. Instances are created by placing vertices randomly within a disk in the hyperbolic plane and connecting pairs of vertices with a probability depending on their distance. An important subclass of this model, called Threshold Random Hyperbolic Graphs, connects two vertices exactly if the distance between them is below a threshold. This model has pleasant properties and has received considerable attention from theoreticians. Unfortunately, a straightforward generation algorithm has a complexity quadratic in the number of nodes, which renders it infeasible for instances of more than a few million vertices.

We developed four faster generation algorithms for random hyperbolic graphs. By projecting hyperbolic geometry into the Euclidean geometry of the Poincaré disk model, we are able to use adapted versions of existing geometric data structures. Our first algorithm uses a polar quadtree to avoid distance calculations and achieves a time complexity of $\mathcal{O}((n^{3/2} + m)\log n)$ with high probability -- the first subquadratic generation algorithm for threshold random hyperbolic graphs.
Empirically, our implementation achieves an improvement of three orders of magnitude over a reference implementation of the straightforward algorithm.

We extend this quadtree data structure further to the generation of general random hyperbolic graphs, in which all edges are probabilistic. Since each edge has a non-zero probability of existing, sampling edges by throwing a biased coin for each pair would again require quadratic time. We address this issue by sampling jumping widths within leaf cells and aggregating subtrees into virtual leaf cells when the expected number of neighbors in them is below a threshold. With this tradeoff, we bound both the number of distance calculations and the number of examined quadtree cells per edge, resulting in the same time complexity of $\mathcal{O}((n^{3/2} + m)\log n)$ also for general random hyperbolic graphs. We generalize this sampling scenario and define Probabilistic Neighborhood Queries, in which a random sample of a geometric point set is desired, with the probability of inclusion depending on the distance to a query point. Since such queries can be used to simulate probabilistic spatial spreading, we demonstrate a significant speedup in a proof-of-concept disease simulation.

Our second algorithm for threshold random hyperbolic graphs uses a data structure of concentric annuli in the hyperbolic plane. For each given vertex, the positions of possible neighbors in each annulus can be restricted with the hyperbolic law of cosines, leading to a much smaller number of candidates that need to be checked. This yields a reduced time complexity of $\mathcal{O}(n \log^2 n + m)$ and a further order of magnitude in practice for graphs of a few million vertices. Finally, we also extend this data structure to general random hyperbolic graphs, with the same time complexity for constant parameters.
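
To make the baseline that these algorithms improve upon concrete, the following sketch samples vertices in a hyperbolic disk and applies the straightforward quadratic threshold generator, with distances computed by the hyperbolic law of cosines mentioned above. The parameter names (alpha for the radial dispersion, R doubling as disk radius and distance threshold) and the plain-Python form are illustrative assumptions, not the thesis implementation.

    import math
    import random

    def sample_points(n, R, alpha=1.0, seed=None):
        """Place n vertices in a hyperbolic disk of radius R: uniform angles,
        radii drawn by inverse transform from a density proportional to sinh(alpha * r)."""
        rng = random.Random(seed)
        points = []
        for _ in range(n):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            u = rng.random()
            r = math.acosh(1.0 + u * (math.cosh(alpha * R) - 1.0)) / alpha
            points.append((r, theta))
        return points

    def hyperbolic_distance(p, q):
        """Distance between two points in native polar coordinates,
        computed with the hyperbolic law of cosines."""
        r1, t1 = p
        r2, t2 = q
        arg = (math.cosh(r1) * math.cosh(r2)
               - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
        return math.acosh(max(arg, 1.0))  # clamp against rounding error

    def threshold_rhg_naive(n, R, alpha=1.0, seed=None):
        """Quadratic-time baseline: connect two vertices iff their distance is at most R."""
        points = sample_points(n, R, alpha, seed)
        edges = [(i, j)
                 for i in range(n)
                 for j in range(i + 1, n)
                 if hyperbolic_distance(points[i], points[j]) <= R]
        return points, edges

    # toy usage; the all-pairs loop is exactly the quadratic cost that becomes
    # infeasible beyond a few million vertices
    points, edges = threshold_rhg_naive(n=500, R=6.0, alpha=0.75, seed=42)
    print(len(edges), "edges")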

The second part of my thesis is in many aspects the opposite of the first. Instead of a hidden geometry, I consider graphs whose geometric information is explicit; instead of using geometry to generate graphs, I use it to decide how to cut graphs into pieces. Given a graph, the Graph Partitioning Problem asks for a disjoint partition of the vertex set so that each subset has a similar number of vertices and some objective function is optimized. Its many applications include parallelizing a computation while balancing load and minimizing communication, dividing a graph into blocks as preparation for graph analysis tasks, or finding natural cuts in street networks for efficient route planning. Since the graph partitioning problem is NP-complete and hard to approximate, heuristics are used in practice.

Our first graph partitioner is designed for an application in quantum chemistry. Computing electron density fields is necessary to accurately predict protein interactions, but the required time scales quadratically with the protein's size, with prohibitive constant factors. This effectively restricts these density-based methods to proteins of at most a few hundred amino acids. It is possible to circumvent this limitation by computing only parts of the target protein at a time, an approach known as subsystem quantum chemistry. However, the interactions between amino acids in different parts are then neglected; this neglect causes errors in the solution.

We model this problem as partitioning a protein graph: each vertex represents one amino acid, each edge an interaction between two amino acids, and each edge weight the expected error caused by neglecting this interaction. Finding a set of subsets with minimum error is then equivalent to finding a partition of the protein graph with minimum edge cut. The requirements of the chemical simulations impose additional constraints on this partition, as well as an implicit geometry given by the protein structure. We provide an implementation of the well-known multilevel heuristic together with the local search algorithm by Fiduccia and Mattheyses, both adapted to respect these new constraints. We also provide an optimal dynamic programming algorithm for a restricted scenario with a smaller solution space. In terms of edge cut, we achieve an average improvement of 13.5% over the naive solution previously used by domain scientists.

Our second graph partitioner targets geometric meshes from numerical simulations on the scale of billions of grid points, parallelized across tens of thousands of processes. In general-purpose graph partitioning, the multilevel heuristic computes high-quality partitions, but its scalability is limited. Due to the large halos in our targeted application, the shape of the partitioned blocks is also important, with convex shapes leading to less communication. We adapt the well-known $k$-means algorithm, which yields convex shapes, to the partitioning of geometric meshes by including a balancing scheme. We further extend several existing geometric optimizations to the balanced version to achieve fast running times and parallel scalability. The classic $k$-means algorithm is highly dependent on the choice of initial centers; we select initial centers among the input points along a space-filling curve, thus guaranteeing a distribution similar to that of the input points. The resulting implementation scales to tens of thousands of processes and billions of vertices, partitioning them in seconds. Compared to previous fast geometric partitioners, our method provides partitions with a 10-15% lower communication volume and a correspondingly smaller communication time in a distributed SpMV benchmark.
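
As a rough illustration of the second partitioner's structure, the sketch below picks initial centers along a Morton space-filling curve and then runs plain Lloyd iterations; the Morton key, the grid resolution, and the omission of the balancing scheme and geometric optimizations described above are simplifying assumptions, not the actual implementation.

    def morton_key(ix, iy, bits=16):
        """Interleave the bits of two grid coordinates (Z-order / Morton key)."""
        key = 0
        for b in range(bits):
            key |= ((ix >> b) & 1) << (2 * b)
            key |= ((iy >> b) & 1) << (2 * b + 1)
        return key

    def sfc_initial_centers(points, k, bits=16):
        """Choose k initial centers spread along a space-filling curve:
        scale points to an integer grid, sort by Morton key, and take one point
        from every (n/k)-th stretch of the curve. Assumes k <= len(points)."""
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        min_x, min_y = min(xs), min(ys)
        span_x = (max(xs) - min_x) or 1.0
        span_y = (max(ys) - min_y) or 1.0
        scale = (1 << bits) - 1
        order = sorted(range(len(points)), key=lambda i: morton_key(
            int((points[i][0] - min_x) / span_x * scale),
            int((points[i][1] - min_y) / span_y * scale), bits))
        step = len(points) // k
        return [points[order[i * step + step // 2]] for i in range(k)]

    def kmeans_blocks(points, k, iterations=10):
        """Plain Lloyd iterations on SFC-initialized centers; the balancing scheme
        and geometric pruning are deliberately left out of this sketch."""
        centers = sfc_initial_centers(points, k)
        assignment = [0] * len(points)
        for _ in range(iterations):
            for i, (px, py) in enumerate(points):
                assignment[i] = min(range(k), key=lambda c:
                                    (px - centers[c][0]) ** 2 + (py - centers[c][1]) ** 2)
            for c in range(k):
                members = [points[i] for i in range(len(points)) if assignment[i] == c]
                if members:  # keep the old center if a block ran empty
                    centers[c] = (sum(p[0] for p in members) / len(members),
                                  sum(p[1] for p in members) / len(members))
        return assignment

Picking centers along the curve is what ties the initialization to the input distribution: consecutive stretches of the curve contain roughly equal numbers of points, so centers end up where the mesh is dense.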

    Dynamic Load Balancing Techniques for Particulate Flow Simulations

Parallel multiphysics simulations often suffer from load imbalances originating from the coupling of algorithms with spatially and temporally varying workloads. It is thus desirable to minimize these imbalances to reduce the time to solution and to better utilize the available hardware resources. Taking particulate flows as an illustrative example application, we present and evaluate load balancing techniques that tackle this challenging task. This involves a load estimation step in which the currently generated workload is predicted; we describe in detail how such a workload estimator can be developed. In a second step, load distribution strategies such as space-filling curves or graph partitioning are applied to dynamically distribute the load among the available processes. To compare and analyze their performance, we apply these techniques to a benchmark scenario and observe a reduction of the load imbalances by almost a factor of four. This results in a decrease of the overall runtime by 14% for space-filling curves.
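
As a minimal sketch of the second step, assuming the simulation blocks are already ordered along a space-filling curve, the following function splits the sequence of estimated block workloads into contiguous chunks of roughly equal total load; the function name, the greedy splitting rule, and the toy numbers are illustrative assumptions rather than the strategy evaluated in the paper.

    def distribute_by_sfc(block_loads, num_processes):
        """Cut an SFC-ordered sequence of blocks into contiguous chunks of roughly
        equal estimated load (greedy prefix-sum splitting).

        block_loads: estimated workload per block, ordered along a space-filling
        curve so that consecutive blocks are spatially close.
        Returns a list mapping each block index to a process rank.
        """
        total = sum(block_loads)
        target = total / num_processes  # ideal load share per process
        assignment = []
        rank, acc = 0, 0.0
        for load in block_loads:
            # move on to the next rank once the current one has reached its share,
            # but never beyond the last rank
            if acc >= target and rank < num_processes - 1:
                rank += 1
                acc = 0.0
            assignment.append(rank)
            acc += load
        return assignment

    # toy example: 8 blocks with varying estimated workloads, 3 processes
    loads = [4.0, 1.0, 1.0, 6.0, 2.0, 2.0, 5.0, 3.0]
    print(distribute_by_sfc(loads, 3))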