8 research outputs found

    Streaming Matching and Edge Cover in Practice

    Get PDF
    Graph algorithms with polynomial space and time requirements often become infeasible for massive graphs with billions of edges or more. State-of-the-art approaches therefore employ approximate serial, parallel, and distributed algorithms to tackle these challenges. However, such approaches require storing the entire graph in memory and thus need access to costly computing resources such as clusters and supercomputers. In this paper, we present practical streaming approaches for solving massive graph problems using limited memory for two prototypical graph problems: maximum weighted matching and minimum weighted edge cover. For matching, we conduct a thorough computational study on two of the semi-streaming algorithms including a recent breakthrough result that achieves a 1/(2+ε)-approximation of the weight while using O(n log W /ε) memory (here n is the number of vertices and W is the maximum edge weight), designed by Paz and Schwartzman [SODA, 2017]. Empirically, we show that the semi-streaming algorithms produce matchings whose weight is close to the best 1/2-approximate offline algorithm while requiring less time and an order-of-magnitude less memory. For minimum weighted edge cover, we develop three novel semi-streaming algorithms. Two of these algorithms require a single pass through the input graph, require O(n log n) memory, and provide a 2-approximation guarantee on the objective. We also leverage a relationship between approximate maximum weighted matching and approximate minimum weighted edge cover to develop a two-pass 3/2+ε-approximate algorithm with the memory requirement of Paz and Schwartzman’s semi-streaming matching algorithm. These streaming approaches are compared against the state-of-the-art 3/2-approximate offline algorithm. The semi-streaming matching and the novel edge cover algorithms proposed in this paper can process graphs with several billions of edges in under 30 minutes using 6 GB of memory, which is at least an order of magnitude improvement from the offline (non-streaming) algorithms. For the largest graph, the best alternative offline parallel approximation algorithm (GPA+ROMA) could not finish in three hours even while employing hundreds of processors and 1 TB of memory. We also demonstrate an application of semi-streaming algorithm by computing a matching using linearly bounded memory on intersection graphs derived from three machine learning datasets, while the existing offline algorithms could not complete on one of these datasets since its memory requirement exceeded 1TB

    Approximate Bipartite bb-Matching using Multiplicative Auction

    Full text link
    Given a bipartite graph G(V=(AB),E)G(V= (A \cup B),E) with nn vertices and mm edges and a function b ⁣:VZ+b \colon V \to \mathbb{Z}_+, a bb-matching is a subset of edges such that every vertex vVv \in V is incident to at most b(v)b(v) edges in the subset. When we are also given edge weights, the Max Weight bb-Matching problem is to find a bb-matching of maximum weight, which is a fundamental combinatorial optimization problem with many applications. Extending on the recent work of Zheng and Henzinger (IPCO, 2023) on standard bipartite matching problems, we develop a simple auction algorithm to approximately solve Max Weight bb-Matching. Specifically, we present a multiplicative auction algorithm that gives a (1ε)(1 - \varepsilon)-approximation in O(mε1logε1logβ)O(m \varepsilon^{-1} \log \varepsilon^{-1} \log \beta) worst case time, where β\beta the maximum bb-value. Although this is a logβ\log \beta factor greater than the current best approximation algorithm by Huang and Pettie (Algorithmica, 2022), it is considerably simpler to present, analyze, and implement.Comment: 14 pages; Accepted as a refereed paper in the 2024 INFORMS Optimization Society conferenc

    Semi-Streaming Algorithms for Weighted k-Disjoint Matchings

    Get PDF
    We design and implement two single-pass semi-streaming algorithms for the maximum weight k-disjoint matching (k-DM) problem. Given an integer k, the k-DM problem is to find k pairwise edge-disjoint matchings such that the sum of the weights of the matchings is maximized. For k ≥ 2, this problem is NP-hard. Our first algorithm is based on the primal-dual framework of a linear programming relaxation of the problem and is 1/(3+ε)-approximate. We also develop an approximation preserving reduction from k-DM to the maximum weight b-matching problem. Leveraging this reduction and an existing semi-streaming b-matching algorithm, we design a (1/(2+ε))(1 - 1/(k+1))-approximate semi-streaming algorithm for k-DM. For any constant ε > 0, both of these algorithms require O(nk log_{1+ε}² n) bits of space. To the best of our knowledge, this is the first study of semi-streaming algorithms for the k-DM problem. We compare our two algorithms to state-of-the-art offline algorithms on 95 real-world and synthetic test problems, including thirteen graphs generated from data center network traces. On these instances, our streaming algorithms used significantly less memory (ranging from 6× to 512× less) and were faster in runtime than the offline algorithms. Our solutions were often within 5% of the best weights from the offline algorithms. We highlight that the existing offline algorithms run out of 1 TB memory for most of the large instances (> 1 billion edges), whereas our streaming algorithms can solve these problems using only 100 GB memory for k = 8

    AGS-GNN: Attribute-guided Sampling for Graph Neural Networks

    Full text link
    We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits node features and connectivity structure of a graph while simultaneously adapting for both homophily and heterophily in graphs. (In homophilic graphs vertices of the same class are more likely to be connected, and vertices of different classes tend to be linked in heterophilic graphs.) While GNNs have been successfully applied to homophilic graphs, their application to heterophilic graphs remains challenging. The best-performing GNNs for heterophilic graphs do not fit the sampling paradigm, suffer high computational costs, and are not inductive. We employ samplers based on feature-similarity and feature-diversity to select subsets of neighbors for a node, and adaptively capture information from homophilic and heterophilic neighborhoods using dual channels. Currently, AGS-GNN is the only algorithm that we know of that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, which was not used in this context prior to our work. The sampling distribution is pre-computed and highly parallel, achieving the desired scalability. Using an extensive dataset consisting of 35 small (\le 100K nodes) and large (>100K nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN compare to the current approaches in the literature. AGS-GNN achieves comparable test accuracy to the best-performing heterophilic GNNs, even outperforming methods using the entire graph for node classification. AGS-GNN also converges faster compared to methods that sample neighborhoods randomly, and can be incorporated into existing GNN models that employ node or graph sampling.Comment: The paper has been accepted to KDD'24 in the research trac

    GreediRIS: Scalable Influence Maximization using Distributed Streaming Maximum Cover

    Full text link
    Influence maximization--the problem of identifying a subset of k influential seeds (vertices) in a network--is a classical problem in network science with numerous applications. The problem is NP-hard, but there exist efficient polynomial time approximations. However, scaling these algorithms still remain a daunting task due to the complexities associated with steps involving stochastic sampling and large-scale aggregations. In this paper, we present a new parallel distributed approximation algorithm for influence maximization with provable approximation guarantees. Our approach, which we call GreediRIS, leverages the RandGreedi framework--a state-of-the-art approach for distributed submodular optimization--for solving a step that computes a maximum k cover. GreediRIS combines distributed and streaming models of computations, along with pruning techniques, to effectively address the communication bottlenecks of the algorithm. Experimental results on up to 512 nodes (32K cores) of the NERSC Perlmutter supercomputer show that GreediRIS can achieve good strong scaling performance, preserve quality, and significantly outperform the other state-of-the-art distributed implementations. For instance, on 512 nodes, the most performant variant of GreediRIS achieves geometric mean speedups of 28.99x and 36.35x for two different diffusion models, over a state-of-the-art parallel implementation. We also present a communication-optimized version of GreediRIS that further improves the speedups by two orders of magnitude

    Physics-constrained graph modeling for building thermal dynamics

    No full text
    In this paper, we propose a graph model embedded with compact physical equations for modeling the thermal dynamics of buildings. The principles of heat flow across various components in the building, such as walls and doors, fit the message-passing strategy used by Graph Neural networks (GNNs). The proposed method is to represent the multi-zone building as a graph, in which only zones are considered as nodes, and any heat flow between zones is modeled as an edge based on prior knowledge of the building structure. Furthermore, the thermal dynamics of these components are described by compact models in the graph. GNNs are further employed to train model parameters from collected data. During model training, our proposed method enforces physical constraints (e.g., zone sizes and connections) on model parameters and propagates the penalty in the loss function of GNN. Such constraints are essential to ensure model robustness and interpretability. We evaluate the effectiveness of the proposed modeling approach on a realistic dataset with multiple zones. The results demonstrate a satisfactory accuracy in the prediction of multi-zone temperature. Moreover, we illustrate that the new model can reliably learn hidden physical parameters with incomplete data
    corecore