    Tree-based Coarsening and Partitioning of Complex Networks

    Many applications produce massive complex networks whose analysis would benefit from parallel processing. Parallel algorithms, in turn, often require a suitable network partition. For solving optimization tasks such as graph partitioning on large networks, multilevel methods are preferred in practice. Yet, complex networks pose challenges to established multilevel algorithms, in particular to their coarsening phase. One way to specify a (recursive) coarsening of a graph is to rate its edges and then contract the edges as prioritized by the rating. In this paper we (i) define weights for the edges of a network that express the edges' importance for connectivity, (ii) compute a minimum weight spanning tree T^m with respect to these weights, and (iii) rate the network edges based on the conductance values of T^m's fundamental cuts. To this end, we also (iv) develop the first optimal linear-time algorithm to compute the conductance values of all fundamental cuts of a given spanning tree. We integrate the new edge rating into a leading multilevel graph partitioner and equip the latter with a new greedy postprocessing for optimizing the maximum communication volume (MCV). Experiments on bipartitioning frequently used benchmark networks show that the postprocessing already reduces MCV by 11.3%. Our new edge rating further reduces MCV by 10.3% compared to the previously best rating with the postprocessing in place for both ratings. In total, with a modest increase in running time, our new approach reduces the MCV of complex network partitions by 20.4%.
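The quantity behind step (iii) can be illustrated with a short Python sketch that computes the conductance of every fundamental cut of a spanning tree. It is not the paper's optimal linear-time algorithm: lowest common ancestors (LCAs) are found by naively climbing parent pointers, so the running time is O((n + m) · h) for tree height h. The sketch relies on the standard identity that a graph edge lies entirely inside the subtree of v exactly when its LCA does, so the cut induced by the tree edge (v, parent(v)) equals vol(subtree) - 2 · (edges inside the subtree). It assumes an unweighted, connected graph and conductance defined as cut(S) / min(vol(S), vol(V \ S)); the function name and input format are hypothetical.

```python
from collections import defaultdict


def fundamental_cut_conductances(n, edges, tree_edges, root=0):
    """Conductance of every fundamental cut of a spanning tree (hypothetical helper).

    n: number of vertices 0..n-1; edges: unweighted graph edges (u, v);
    tree_edges: subset of edges forming a spanning tree of a connected graph.
    Returns {v: conductance} for each non-root v, where removing the tree edge
    (v, parent(v)) splits the vertices into subtree(v) and the rest.
    Not the paper's linear-time algorithm (uses a naive LCA).
    """
    tree_adj = defaultdict(list)
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)

    # Root the tree with BFS; `order` lists parents before their children.
    parent, depth, order = [-1] * n, [0] * n, [root]
    seen = [False] * n
    seen[root] = True
    for u in order:
        for w in tree_adj[u]:
            if not seen[w]:
                seen[w] = True
                parent[w], depth[w] = u, depth[u] + 1
                order.append(w)

    def lca(a, b):
        # Naive LCA by climbing parent pointers (not linear time overall).
        while depth[a] > depth[b]:
            a = parent[a]
        while depth[b] > depth[a]:
            b = parent[b]
        while a != b:
            a, b = parent[a], parent[b]
        return a

    vol = [0] * n       # becomes the volume (degree sum) of each subtree
    internal = [0] * n  # becomes the number of edges inside each subtree
    for u, v in edges:
        vol[u] += 1
        vol[v] += 1
        internal[lca(u, v)] += 1  # an edge is inside subtree(x) iff its LCA is

    # Accumulate bottom-up: children are processed before their parents.
    for v in reversed(order):
        if parent[v] >= 0:
            vol[parent[v]] += vol[v]
            internal[parent[v]] += internal[v]

    total = vol[root]
    result = {}
    for v in order[1:]:
        cut = vol[v] - 2 * internal[v]  # edges with exactly one endpoint inside
        smaller_side = min(vol[v], total - vol[v])
        result[v] = cut / smaller_side if smaller_side else float("inf")
    return result
```

For a 4-cycle with the path 0-1-2-3 as spanning tree, the middle tree edge separates {0, 1} from {2, 3} and yields conductance 0.5, while the two outer tree edges cut off a single vertex and yield 1.0. How the paper turns these values into an edge rating is not reproduced here.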

    Space-Efficient Graph Coarsening with Applications to Succinct Planar Encodings

    We present a novel space-efficient graph coarsening technique for n-vertex planar graphs G, called cloud partition, which partitions the vertices V(G) into disjoint sets C of size O(log n) such that each C induces a connected subgraph of G. Using this partition P, we construct a so-called structure-maintaining minor F of G via specific contractions within the disjoint sets such that F has O(n/log n) vertices. The combination (F, P) is referred to as a cloud decomposition. For planar graphs, we show that a cloud decomposition can be constructed in O(n) time and using O(n) bits. Given a cloud decomposition (F, P) constructed for a planar graph G, we are able to find a balanced separator of G in O(n/log n) time. In contrast to related publications, we do not make use of an embedding of the planar input graph. We generalize our cloud decomposition from planar graphs to H-minor-free graphs for any fixed graph H. This allows us to construct the succinct encoding scheme for H-minor-free graphs due to Blelloch and Farzan (CPM 2010) in O(n) time and O(n) bits, improving both runtime and space by a factor of Θ(log n). As an additional application of our cloud decomposition, we show that, for H-minor-free graphs, a tree decomposition of width O(n^{1/2 + ε}) for any ε > 0 can be constructed in O(n) bits and in time linear in the size of the tree decomposition. A similar result by Izumi and Otachi (ICALP 2020) constructs a tree decomposition of width O(k √n log n) for graphs of treewidth k ≤ √n in sublinear space and polynomial time.
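The general idea of coarsening by contracting connected vertex sets can be illustrated with a small Python sketch. It is not the paper's cloud partition, which additionally guarantees that the resulting minor has only O(n/log n) vertices and is built within O(n) bits; this greedy variant only bounds region sizes from above. It grows BFS regions of at most `max_size` vertices and contracts each region to a single vertex; because every region induces a connected subgraph, the contracted graph is a minor of the input. Both function names are hypothetical.

```python
from collections import deque


def greedy_connected_clusters(adj, max_size):
    """Hypothetical coarsening helper (not the paper's cloud partition).

    adj: adjacency lists of an undirected graph on vertices 0..n-1.
    Grows BFS regions from unassigned seeds; each region has at most
    max_size vertices and induces a connected subgraph.
    Returns (cluster id per vertex, number of clusters).
    """
    n = len(adj)
    cluster = [-1] * n
    count = 0
    for seed in range(n):
        if cluster[seed] != -1:
            continue
        cluster[seed] = count
        size, queue = 1, deque([seed])
        while queue and size < max_size:
            u = queue.popleft()
            for w in adj[u]:
                # w is adjacent to u, so the region stays connected.
                if cluster[w] == -1 and size < max_size:
                    cluster[w] = count
                    size += 1
                    queue.append(w)
        count += 1
    return cluster, count


def contract_clusters(adj, cluster):
    """Contract every cluster to one vertex; for connected clusters this yields a minor."""
    minor_edges = set()
    for u, neighbors in enumerate(adj):
        for w in neighbors:
            a, b = cluster[u], cluster[w]
            if a != b:
                minor_edges.add((min(a, b), max(a, b)))
    return minor_edges
```

Choosing `max_size` around log2(n) mimics the O(log n) set size, although the greedy variant may still leave many small regions.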

    Parallel heuristics for scalable community detection

    Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is a multi-phase, iterative heuristic for modularity optimization. Originally developed by Blondel et al. (2008), the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains (e.g., internet, citation, biological). Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing absolute speedups of up to 16× using 32 threads.
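The serial template, the Louvain method, repeatedly moves single vertices to the neighboring community that most increases modularity. The following minimal Python sketch computes modularity for an unweighted graph and performs one serial local-moving sweep; it does not implement the paper's parallelization heuristics, and both function names are hypothetical. The move gain is the standard Louvain quantity k_{v,C} - Σ_tot(C) · k_v / (2m), which is proportional (by a factor of 1/m) to the modularity change of inserting v into community C.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum_C [ e_C / m - (a_C / 2m)^2 ] of an unweighted graph."""
    m = len(edges)
    inside = defaultdict(int)      # e_C: edges with both endpoints in C
    degree_sum = defaultdict(int)  # a_C: total degree of vertices in C
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def local_move_sweep(adj, community):
    """One serial Louvain-style sweep; moves vertices in place, returns #moves."""
    m = sum(len(nbrs) for nbrs in adj) / 2
    if m == 0:
        return 0
    degree = [len(nbrs) for nbrs in adj]
    total = defaultdict(int)  # Sigma_tot: total degree per community
    for v, c in enumerate(community):
        total[c] += degree[v]
    moves = 0
    for v in range(len(adj)):
        old = community[v]
        total[old] -= degree[v]  # temporarily remove v from its community
        links = defaultdict(int)  # k_{v,C}: edges from v into community C
        for w in adj[v]:
            if w != v:
                links[community[w]] += 1
        best = old
        best_gain = links[old] - total[old] * degree[v] / (2 * m)
        for c, k_vc in links.items():
            gain = k_vc - total[c] * degree[v] / (2 * m)
            if gain > best_gain:
                best, best_gain = c, gain
        community[v] = best
        total[best] += degree[v]
        moves += best != old
    return moves
```

Because each vertex only moves when its gain is at least that of staying, a sweep never decreases modularity; this sequential dependency between consecutive moves is exactly what makes the method hard to parallelize.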

    Recent Advances in Graph Partitioning

    We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.
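Balanced graph partitioning is commonly formalized as minimizing the edge cut subject to every one of the k blocks containing at most (1 + ε)⌈|V|/k⌉ vertices. The Python sketch below evaluates a given partition against these two criteria; the unweighted formulation and the function names are assumptions made for illustration.

```python
import math


def edge_cut(edges, block):
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block[u] != block[v])


def is_balanced(block, k, eps):
    """Check that every block has at most (1 + eps) * ceil(n / k) vertices."""
    n = len(block)
    bound = (1 + eps) * math.ceil(n / k)
    sizes = [0] * k
    for b in block:
        sizes[b] += 1
    return max(sizes) <= bound
```

For a 4-cycle split into two halves, the cut is 2 and the partition is balanced; putting three vertices in one block violates the bound for ε = 0.03.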

    Algorithms and Software for the Analysis of Large Complex Networks

    The work presented intersects three main areas, namely graph algorithmics, network science and applied software engineering. Each computational method discussed relates to one of the main tasks of data analysis: extracting structural features from network data (e.g., community detection), transforming network data (e.g., sparsifying a network to reduce its size while keeping essential properties), or realistically modeling networks through generative models.

    Efficient Scheduling and High-Performance Graph Partitioning on Heterogeneous CPU-GPU Systems

    Heterogeneous CPU-GPU systems have emerged as a power-efficient platform for high-performance parallel execution of applications. However, effectively exploiting these architectures faces a number of challenges, including differences between the programming models of the CPU (MIMD) and the GPU (SIMD), GPU memory constraints, and the comparatively low communication bandwidth between the CPU and GPU. As a consequence, high-performance execution of applications on these platforms requires new adaptive parallelization methods. In this thesis, we first explore embarrassingly parallel applications, where tasks have no inter-dependencies. Although the massive processing power of GPUs provides an attractive opportunity for high-performance execution of embarrassingly parallel tasks on CPU-GPU systems, the minimum execution time can only be obtained by optimally distributing the tasks between the processors. In contemporary CPU-GPU systems, the scheduler cannot determine the appropriate distribution ratio by itself, so manually dividing the tasks among the processors requires considerable programming effort. Herein, we design and implement a new dynamic scheduling heuristic to minimize the execution time of embarrassingly parallel applications on a heterogeneous CPU-GPU system. The scheduler is integrated into a scheduling framework that provides pre-implemented automated scheduling modules, freeing the user from the complexities of scheduling details. The experimental results show that our scheduling approach achieves similar or better performance compared to some of the scheduling algorithms proposed for CPU-GPU systems. We then investigate task-dependent applications, where the tasks have data dependencies. The computational tasks and their communication patterns are expressed by a task interaction graph. 
Scheduling of the task interaction graph on a cluster can be done by first partitioning the graph into a set of computationally balanced partitions such that the communication cost among the partitions is minimized, and subsequently mapping the partitions onto physical processors. Aside from scheduling, graph partitioning is a common computation phase in many application domains, including social network analysis, data mining, and VLSI design. However, the irregular and data-dependent sub-tasks of graph partitioning pose multiple challenges for efficient utilization of GPUs, which favor regularity. We design and implement a multilevel graph partitioner on a heterogeneous CPU-GPU system that takes advantage of the high parallel processing power of GPUs by executing the computation-intensive parts of the partitioning sub-tasks on the GPU and assigning the parts with less parallelism to the CPU. Our partitioner aims to overcome some of the challenges arising from the irregular nature of the algorithm and from memory constraints on GPUs. We present a lock-free scheme, since fine-grained synchronization among thousands of GPU threads imposes too high a performance overhead. Experimental results demonstrate that our partitioner outperforms serial and parallel MPI-based partitioners and performs similarly to a shared-memory CPU-based parallel graph partitioner. To further optimize the partitioner's performance, we describe an effective and methodical approach to GPU-based multilevel graph partitioning that is tailored specifically to the SIMD architecture. Our solution avoids thread divergence and balances the load over GPU threads by dynamically assigning an appropriate number of threads to process each vertex and its irregularly sized neighbor list. Our optimized design is autonomous, as all steps are carried out by the GPU with minimal CPU interference. We show that this design outperforms a CPU-based parallel graph partitioner. 
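Coarsening in multilevel partitioners is typically driven by a matching whose edges are then contracted. One matching scheme that fits a lock-free setting is "handshaking": each unmatched vertex proposes to its heaviest unmatched neighbor, and mutual proposals become matches, so every vertex writes only its own slot. The Python sketch below runs these rounds sequentially; it illustrates that assumption and is not necessarily the matching used in the thesis, and the function name is hypothetical.

```python
def handshake_matching(adj_w, rounds=3):
    """Heavy-edge matching by mutual proposals (hypothetical helper).

    adj_w: list of dicts {neighbor: edge_weight} for vertices 0..n-1.
    Returns match, where match[v] == v means v is unmatched.
    """
    n = len(adj_w)
    match = list(range(n))
    for _ in range(rounds):
        # Phase 1: every unmatched vertex proposes to its heaviest unmatched
        # neighbor. Each vertex writes only proposal[v], so no locks are needed.
        proposal = [-1] * n
        for v in range(n):
            if match[v] != v:
                continue
            best, best_w = -1, float("-inf")
            for u, w in adj_w[v].items():
                if u != v and match[u] == u and w > best_w:
                    best, best_w = u, w
            proposal[v] = best
        # Phase 2: mutual proposals ("handshakes") become matched pairs.
        for v in range(n):
            u = proposal[v]
            if u != -1 and proposal[u] == v:
                match[v] = u
    return match
```

On a GPU, each phase would map to one kernel launch with one thread per vertex; the two-phase structure replaces fine-grained locking with a barrier between phases.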
Finally, we apply some of our partitioning techniques to another graph processing algorithm, minimum spanning tree (MST), which exhibits load imbalance. We show that extending these techniques helps achieve a high-performance implementation of MST on the GPU.
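Borůvka's algorithm is a common basis for GPU MST implementations because, in each round, every component can select its cheapest outgoing edge independently. The sequential Python sketch below shows that round structure; the abstract does not state which MST algorithm the thesis builds on, so this choice and the function name are assumptions. Ties are broken by edge index so that simultaneous selections never form a cycle.

```python
def boruvka_mst(n, edges):
    """Borůvka MST (hypothetical helper). edges: list of (weight, u, v).

    Returns the list of chosen edges (a spanning forest if disconnected).
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    components = n
    while components > 1:
        # Each component independently picks its cheapest outgoing edge.
        cheapest = [None] * n
        for i, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if cheapest[r] is None or (w, i) < (edges[cheapest[r]][0], cheapest[r]):
                    cheapest[r] = i
        if all(c is None for c in cheapest):
            break  # disconnected graph: stop with a spanning forest
        # Merge components along the selected edges.
        for r in range(n):
            i = cheapest[r]
            if i is None:
                continue
            w, u, v = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
                components -= 1
    return mst
```

The per-component work in each round varies with component degree, which is one source of the load imbalance mentioned above.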

    Stochastic Optimization Algorithms under Uncertainty on Complex Structures. Convergence and Applications

    The main topics of this thesis involve the development of stochastic algorithms for optimization under uncertainty, the study of their theoretical properties, and their applications. The proposed algorithms are modified versions of simulated annealing that use only unbiased estimators of the cost function. We study their convergence using tools developed in the theory of Markov processes: we use properties of infinitesimal generators and functional inequalities to measure the distance between their probability law and a target law. The first part is concerned with quantum graphs endowed with a probability measure on their vertex set. Quantum graphs are continuous versions of undirected weighted graphs. The starting point of the present work was the question of finding Fréchet means on such a graph. 
The Fréchet mean is an extension of the Euclidean mean to general metric spaces and is defined as an element that minimizes the sum of weighted squared distances to all vertices. Our method relies on a Langevin formulation of a noisy simulated annealing, dealt with using homogenization. In order to establish the convergence in probability of the process, we study the evolution of the relative entropy of its law with respect to a convenient Gibbs measure. Using functional inequalities (Poincaré and Sobolev) and Gronwall's lemma, we then show that the relative entropy goes to zero. We test our method on some real data sets and propose a heuristic method to adapt the algorithm to huge graphs, using a preliminary clustering. In the same framework, we introduce a definition of principal component analysis for quantum graphs. This implies, once more, a stochastic optimization problem, this time on the space of the graph's geodesics. We suggest an algorithm for finding the first principal component and conjecture the convergence of the associated Markov process to the desired set. In the second part, we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem on a finite space. Our approach is inspired by the general field of Monte Carlo methods and relies on a Markov chain whose transition probability at each step is defined with the help of mini-batches of increasing (random) size. We prove the algorithm's convergence in probability towards the optimal set, provide its convergence rate, and give an optimized parametrization that ensures a minimal number of evaluations for a given accuracy and a confidence level close to 1. This work is complemented by a set of numerical experiments assessing the practical performance both on benchmark test cases and on real-world examples.
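The mini-batch variant of simulated annealing from the second part can be sketched in Python as follows: a proposed move is accepted with a Metropolis rule applied to a mini-batch average of unbiased noisy cost samples, and the batch grows over time. The logarithmic temperature schedule, the deterministic batch size t (the thesis uses random increasing sizes), and all names are hypothetical simplifications, not the thesis's optimized parametrization.

```python
import math
import random


def minibatch_annealing(start, neighbors, sample_cost, steps, seed=0):
    """Simulated annealing driven only by unbiased noisy cost samples.

    neighbors(x) -> list of candidate states; sample_cost(x, rng) -> one
    unbiased sample of the cost of x. Schedules below are hypothetical.
    """
    rng = random.Random(seed)
    x = start
    for t in range(1, steps + 1):
        temperature = 1.0 / math.log(t + 1)  # hypothetical cooling schedule
        batch = t                            # hypothetical growing batch size
        y = rng.choice(neighbors(x))
        # Mini-batch estimate of cost(y) - cost(x).
        diff = sum(sample_cost(y, rng) - sample_cost(x, rng) for _ in range(batch)) / batch
        # Metropolis acceptance rule on the estimated difference.
        if diff <= 0 or rng.random() < math.exp(-diff / temperature):
            x = y
    return x
```

Growing the batch shrinks the variance of the estimated cost difference as the temperature falls, so late, noise-induced uphill moves become increasingly unlikely.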

    Parallel and External High Quality Graph Partitioning

    Partitioning graphs into k blocks of roughly equal size such that few edges run between the blocks is a key tool for processing and analyzing large complex real-world networks. The graph partitioning problem has multiple practical applications in parallel and distributed computations, data storage, image processing, VLSI physical design, and many more. Furthermore, the size, variety, and structural complexity of real-world networks have recently grown dramatically. Therefore, there is a demand for efficient graph partitioning algorithms that fully utilize the computational power and memory capacity of modern machines. A popular and successful heuristic for computing high-quality partitions of large networks in reasonable time is the multi-level graph partitioning approach, which contracts the graph while preserving its structure and then partitions it using a complex graph partitioning algorithm. Specifically, the multi-level graph partitioning approach consists of three main phases: coarsening, initial partitioning, and uncoarsening. During the coarsening phase, the graph is recursively contracted, preserving its structure and properties, until it is small enough for its initial partition to be computed during the initial partitioning phase. Afterwards, during the uncoarsening phase, the partition of the contracted graph is projected onto the original graph and refined using, for example, local search. Most research on heuristic graph partitioning focuses on sequential algorithms or on parallel algorithms in the distributed-memory model. Unfortunately, previous approaches to graph partitioning are not able to process large networks and rarely take into account several aspects of modern computational machines. Specifically, the number of cores per chip grows every year, while the price of RAM decreases more slowly than real-world graphs grow.
Since HDDs and SSDs are 50 to 400 times cheaper than RAM, external memory makes it possible to process large real-world graphs at a reasonable price. Therefore, to better utilize contemporary computational machines, we develop efficient multi-level graph partitioning algorithms for the shared-memory and the external memory models. First, we present an approach to shared-memory parallel multi-level graph partitioning that guarantees balanced solutions, shows high speed-ups for a variety of large graphs, and yields very good quality independently of the number of cores used. Important ingredients include parallel label propagation for both coarsening and uncoarsening, parallel initial partitioning, a simple yet effective approach to parallel localized local search, and fast locality-preserving hash tables that effectively utilize caches. The main idea of the parallel localized local search is that each processor refines only a small area around a random vertex, reducing interactions between processors. For example, on 79 cores, our algorithm partitions a graph with more than 3 billion edges into 16 blocks, cutting 4.5% fewer edges than the closest competitor while being more than two times faster. Furthermore, another competitor is not able to partition this graph. We then present an approach to external memory graph partitioning that is able to partition large graphs that do not fit into RAM. Specifically, we consider the semi-external and the external memory models. In both models, a data structure of size proportional to the number of edges does not fit into RAM. The difference is that the former model assumes that a data structure of size proportional to the number of vertices fits into RAM, whereas the latter assumes the opposite.
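Size-constrained label propagation lets each vertex join the neighboring cluster to which it has the strongest connection, provided the cluster stays within a size bound. The minimal sequential Python sketch below uses unit vertex weights and omits the parallel, semi-external, and cache-aware aspects described above; the function name, the round limit, and the tie-breaking rule (keep the current cluster unless a strictly better one fits) are assumptions.

```python
from collections import defaultdict


def size_constrained_lp(adj, max_size, rounds=5):
    """Size-constrained label propagation clustering (hypothetical helper).

    adj: list of dicts {neighbor: edge_weight}; vertices have unit weight.
    Returns a cluster label per vertex; no cluster exceeds max_size vertices.
    """
    n = len(adj)
    label = list(range(n))  # start with singleton clusters
    size = [1] * n          # size[c] = number of vertices labelled c
    for _ in range(rounds):
        changed = False
        for v in range(n):
            conn = defaultdict(float)  # edge weight from v to each cluster
            for u, w in adj[v].items():
                if u != v:
                    conn[label[u]] += w
            current = label[v]
            best, best_conn = current, conn.get(current, 0.0)
            for c, weight in conn.items():
                # Only move into clusters that can absorb v within the bound.
                if c != current and size[c] + 1 <= max_size and weight > best_conn:
                    best, best_conn = c, weight
            if best != current:
                size[current] -= 1
                size[best] += 1
                label[v] = best
                changed = True
        if not changed:
            break
    return label
```

Contracting the resulting clusters gives one coarsening step; during uncoarsening, the same rule restricted to the k blocks of a partition acts as a simple refinement.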
We address the graph partitioning problem in both models by adapting the size-constrained label propagation technique to the semi-external model and by developing a size-constrained clustering algorithm based on graph coloring for the external memory model. Our semi-external size-constrained label propagation algorithm (or external memory clustering algorithm) can be used to compute graph clusterings and is a prerequisite for the (semi-)external graph partitioning algorithm. The algorithms are then used for both the coarsening and the uncoarsening phase of a multi-level algorithm to compute graph partitions. Our (semi-)external algorithm is able to partition and cluster huge complex networks with billions of edges on cheap commodity machines. Experiments demonstrate that the semi-external graph partitioning algorithm is scalable and can compute high-quality partitions in time comparable to the running time of an efficient internal memory implementation. A parallelization of the algorithm in the semi-external model further reduces running times. Additionally, we develop a speed-up technique for hypergraph partitioning algorithms. Hypergraphs are an extension of graphs that allow a single edge to connect more than two vertices. They can therefore describe models and processes more accurately, which also opens up more possibilities for improvement. Most multi-level hypergraph partitioning algorithms perform computations on vertices and their sets of neighbors. Since these computations can be super-linear, they have a significant impact on the overall running time on large hypergraphs. Therefore, to reduce the size of hyperedges, we develop a pin-sparsifier based on the min-hash technique that clusters vertices with similar neighborhoods. Vertices that belong to the same cluster are then replaced by a single vertex that is connected to their neighbors, thereby reducing the size of the hypergraph.
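The min-hash idea behind the pin-sparsifier can be sketched as follows: each vertex is summarized by the minima of a few random hash functions over its incident hyperedges, vertices with equal signatures form a cluster, and each cluster is contracted to a single vertex. Identical neighborhoods always collide, and similar ones collide with a probability that grows roughly with their Jaccard similarity. The number of hash functions, the linear hash family, exact-signature grouping, and all names are assumptions of this Python sketch, not the thesis's implementation.

```python
import random


def minhash_clusters(vertex_nets, num_hashes=2, seed=0):
    """Group vertices whose min-hash signatures over incident hyperedges agree.

    vertex_nets: list of sets of hyperedge ids incident to each vertex.
    Uses hypothetical linear hashes h(e) = (a * e + b) mod p.
    """
    rng = random.Random(seed)
    p = 2_147_483_647  # a Mersenne prime
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    cluster_of_signature = {}
    label = []
    for nets in vertex_nets:
        signature = tuple(
            min((a * e + b) % p for e in nets) if nets else -1 for a, b in params
        )
        label.append(cluster_of_signature.setdefault(signature, len(cluster_of_signature)))
    return label


def contract_pins(hyperedges, label):
    """Replace every pin by its cluster id and drop duplicate pins per hyperedge."""
    return [sorted({label[pin] for pin in pins}) for pins in hyperedges]
```

Using more hash functions makes the grouping stricter (fewer, more similar merges), trading sparsification strength against the risk of losing partition quality.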
Our algorithm sparsifies a hypergraph such that the resulting hypergraph can be partitioned significantly faster without loss in quality (or with insignificant loss). On average, KaHyPar with the sparsifier performs partitioning about 1.5 times faster while preserving solution quality if hyperedges are large. All aforementioned frameworks are publicly available.

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume
