Streaming Matching and Edge Cover in Practice
Graph algorithms with polynomial space and time requirements often become infeasible for massive graphs with billions of edges or more. State-of-the-art approaches therefore employ approximate serial, parallel, and distributed algorithms to tackle these challenges. However, such approaches require storing the entire graph in memory and thus need access to costly computing resources such as clusters and supercomputers. In this paper, we present practical streaming approaches for solving massive graph problems using limited memory, for two prototypical graph problems: maximum weighted matching and minimum weighted edge cover. For matching, we conduct a thorough computational study of two semi-streaming algorithms, including a recent breakthrough result of Paz and Schwartzman [SODA, 2017] that achieves a 1/(2+ε)-approximation of the weight while using O(n log W / ε) memory (here n is the number of vertices and W is the maximum edge weight). Empirically, we show that the semi-streaming algorithms produce matchings whose weight is close to that of the best 1/2-approximate offline algorithm while requiring less time and an order of magnitude less memory.
For minimum weighted edge cover, we develop three novel semi-streaming algorithms. Two of these algorithms require a single pass through the input graph, require O(n log n) memory, and provide a 2-approximation guarantee on the objective. We also leverage a relationship between approximate maximum weighted matching and approximate minimum weighted edge cover to develop a two-pass (3/2 + ε)-approximate algorithm with the memory requirement of Paz and Schwartzman's semi-streaming matching algorithm. These streaming approaches are compared against the state-of-the-art 3/2-approximate offline algorithm.
The semi-streaming matching and the novel edge cover algorithms proposed in this paper can process graphs with several billions of edges in under 30 minutes using 6 GB of memory, which is at least an order-of-magnitude improvement over the offline (non-streaming) algorithms. For the largest graph, the best alternative offline parallel approximation algorithm (GPA+ROMA) could not finish in three hours even while employing hundreds of processors and 1 TB of memory. We also demonstrate an application of the semi-streaming matching algorithm by computing a matching using linearly bounded memory on intersection graphs derived from three machine learning datasets, while the existing offline algorithms could not complete on one of these datasets since its memory requirement exceeded 1 TB.
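As a concrete illustration of the single-pass paradigm, the following is a minimal Python sketch in the spirit of the Paz-Schwartzman local-ratio algorithm: keep only edges whose weight clearly exceeds the current vertex potentials, then unwind the stack into a matching. The thresholding details are simplified and all names are our own, not the paper's implementation.

```python
def stream_matching(edges, eps=0.1):
    """Single-pass local-ratio sketch (in the spirit of Paz-Schwartzman).

    edges: iterable of (u, v, w) triples seen one at a time.
    Returns a matching as a list of (u, v) pairs.
    """
    phi = {}      # vertex potentials
    stack = []    # edges kept for the post-processing pass
    for u, v, w in edges:
        pu, pv = phi.get(u, 0.0), phi.get(v, 0.0)
        gain = w - pu - pv
        # Skip edges whose residual gain is too small to matter.
        if gain <= eps * (pu + pv):
            continue
        stack.append((u, v))
        phi[u] = pu + gain
        phi[v] = pv + gain
    # Unwind the stack greedily: later (heavier residual) edges win.
    matched, matching = set(), []
    while stack:
        u, v = stack.pop()
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching
```

Only the potentials and the stack are stored, so memory stays proportional to the number of vertices rather than edges.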
Approximate Bipartite b-Matching using Multiplicative Auction
Given a bipartite graph with n vertices and m edges and a function b : V → Z+, a b-matching is a subset of edges such that every vertex v is incident to at most b(v) edges in the subset. When we are also given edge weights, the Max Weight b-Matching problem is to find a b-matching of maximum weight, which is a fundamental combinatorial optimization problem with many applications. Extending the recent work of Zheng and Henzinger (IPCO, 2023) on standard bipartite matching problems, we develop a simple auction algorithm to approximately solve Max Weight b-Matching. Specifically, we present a multiplicative auction algorithm that gives a (1 − ε)-approximation in O(m ε⁻¹ log ε⁻¹ log β) worst-case time, where β is the maximum b-value. Although this is a log β factor greater than the current best approximation algorithm by Huang and Pettie (Algorithmica, 2022), it is considerably simpler to present, analyze, and implement. Comment: 14 pages; Accepted as a refereed paper in the 2024 INFORMS Optimization Society conference
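The paper's multiplicative auction is not reproduced here; as background on the auction paradigm it extends, below is a sketch of the classical Bertsekas-style auction for max weight perfect bipartite matching (b ≡ 1). All names are illustrative, and the bidding increment is additive rather than multiplicative.

```python
def auction_matching(weights, eps=0.01):
    """Bertsekas-style auction for max weight perfect bipartite matching.

    weights[i][j]: weight of edge between bidder i and item j (n x n).
    Returns owner, where owner[j] = i means item j is assigned to bidder i.
    The result is within n*eps of the optimal weight.
    """
    n = len(weights)
    prices = [0.0] * n
    owner = [None] * n
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Value of each item to bidder i at current prices.
        values = [weights[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=lambda j: values[j])
        second = max((values[j] for j in range(n) if j != best), default=0.0)
        # Bid: raise the price by the gap to the second-best item, plus eps.
        prices[best] += values[best] - second + eps
        if owner[best] is not None:
            unassigned.append(owner[best])  # displaced bidder re-enters
        owner[best] = i
    return owner
```

The multiplicative variant of Zheng and Henzinger replaces the additive eps step with geometrically spaced price levels, which is what yields the stated worst-case bound.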
Semi-Streaming Algorithms for Weighted k-Disjoint Matchings
We design and implement two single-pass semi-streaming algorithms for the maximum weight k-disjoint matching (k-DM) problem. Given an integer k, the k-DM problem is to find k pairwise edge-disjoint matchings such that the sum of the weights of the matchings is maximized. For k ≥ 2, this problem is NP-hard. Our first algorithm is based on the primal-dual framework of a linear programming relaxation of the problem and is 1/(3+ε)-approximate. We also develop an approximation preserving reduction from k-DM to the maximum weight b-matching problem. Leveraging this reduction and an existing semi-streaming b-matching algorithm, we design a (1/(2+ε))(1 - 1/(k+1))-approximate semi-streaming algorithm for k-DM. For any constant ε > 0, both of these algorithms require O(nk log_{1+ε}² n) bits of space. To the best of our knowledge, this is the first study of semi-streaming algorithms for the k-DM problem.
We compare our two algorithms to state-of-the-art offline algorithms on 95 real-world and synthetic test problems, including thirteen graphs generated from data center network traces. On these instances, our streaming algorithms used significantly less memory (ranging from 6× to 512× less) and were faster in runtime than the offline algorithms. Our solutions were often within 5% of the best weights from the offline algorithms. We highlight that the existing offline algorithms run out of memory (1 TB) for most of the large instances (> 1 billion edges), whereas our streaming algorithms can solve these problems using only 100 GB of memory for k = 8.
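The reduction-based algorithm can be pictured as follows: compute a b-matching with b(v) = k, which is a subgraph of maximum degree at most k, split its edges into disjoint matchings by edge coloring, and keep the k heaviest classes. Below is a simplified Python sketch; note that the greedy coloring used here may need up to 2k − 1 classes, whereas the paper's (1 − 1/(k+1)) factor relies on a (k+1)-coloring.

```python
from collections import defaultdict

def k_disjoint_matchings(b_matching_edges, k):
    """Partition a b-matching (max degree <= k) into edge-disjoint
    matchings via greedy edge coloring, then keep the k heaviest classes.

    b_matching_edges: list of (u, v, w) triples.
    Returns a list of at most k matchings (lists of weighted edges).
    """
    used = defaultdict(set)      # vertex -> colors already used at it
    classes = defaultdict(list)  # color -> edges of that matching
    for u, v, w in sorted(b_matching_edges, key=lambda e: -e[2]):
        c = 0
        while c in used[u] or c in used[v]:
            c += 1               # first color free at both endpoints
        used[u].add(c)
        used[v].add(c)
        classes[c].append((u, v, w))
    # Keep the k color classes with the largest total weight.
    return sorted(classes.values(),
                  key=lambda es: -sum(w for _, _, w in es))[:k]
```

Each color class is a matching by construction, since no two edges of the same color share an endpoint.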
AGS-GNN: Attribute-guided Sampling for Graph Neural Networks
We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph
Neural Networks (GNNs) that exploits node features and connectivity structure
of a graph while simultaneously adapting for both homophily and heterophily in
graphs. (In homophilic graphs vertices of the same class are more likely to be
connected, and vertices of different classes tend to be linked in heterophilic
graphs.) While GNNs have been successfully applied to homophilic graphs, their
application to heterophilic graphs remains challenging. The best-performing
GNNs for heterophilic graphs do not fit the sampling paradigm, suffer high
computational costs, and are not inductive. We employ samplers based on
feature-similarity and feature-diversity to select subsets of neighbors for a
node, and adaptively capture information from homophilic and heterophilic
neighborhoods using dual channels. To our knowledge, AGS-GNN is the only algorithm that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, which had not been used in this context prior to our work. The sampling distribution is pre-computed and highly parallelizable, achieving the desired scalability. Using an extensive dataset consisting of 35 small (≤ 100K nodes) and large (> 100K nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN compared to the current approaches in the literature. AGS-GNN achieves comparable test accuracy to the best-performing heterophilic GNNs, even outperforming methods using the entire graph for node classification. AGS-GNN also converges faster compared to methods that sample neighborhoods randomly, and can be incorporated into existing GNN models that employ node or graph sampling. Comment: The paper has been accepted to KDD'24 in the research track
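The two samplers can be sketched as follows, assuming nonnegative feature vectors. `similarity_probs` and `diverse_sample` are illustrative names of our own, and the facility-location objective below stands in as one standard submodular choice, not necessarily the exact function AGS-GNN uses.

```python
import numpy as np

def similarity_probs(x_v, neighbor_feats):
    """Sampling distribution over neighbors, proportional to cosine
    similarity with node v's feature vector (homophilic channel)."""
    sims = neighbor_feats @ x_v / (
        np.linalg.norm(neighbor_feats, axis=1) * np.linalg.norm(x_v) + 1e-12)
    sims = np.clip(sims, 0.0, None) + 1e-12  # keep probabilities positive
    return sims / sims.sum()

def diverse_sample(neighbor_feats, budget):
    """Greedy maximization of a facility-location (submodular) objective,
    returning `budget` mutually diverse neighbor indices."""
    sim = neighbor_feats @ neighbor_feats.T
    n = len(neighbor_feats)
    covered = np.zeros(n)   # best similarity of each neighbor to the chosen set
    chosen = []
    for _ in range(min(budget, n)):
        # Marginal gain of adding each candidate to the chosen set.
        gains = np.maximum(sim, covered).sum(axis=1) - covered.sum()
        gains[chosen] = -1.0           # never pick the same neighbor twice
        best = int(np.argmax(gains))
        chosen.append(best)
        covered = np.maximum(covered, sim[best])
    return chosen
```

Because the greedy objective is submodular, the usual (1 − 1/e) guarantee applies to the diverse selection, and both routines operate per node, so they parallelize trivially across the graph.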
AMG Preconditioners based on Parallel Hybrid Coarsening and Multi-objective Graph Matching
GreediRIS: Scalable Influence Maximization using Distributed Streaming Maximum Cover
Influence maximization--the problem of identifying a subset of k influential
seeds (vertices) in a network--is a classical problem in network science with
numerous applications. The problem is NP-hard, but there exist efficient
polynomial time approximations. However, scaling these algorithms remains
a daunting task due to the complexities associated with steps involving
stochastic sampling and large-scale aggregations. In this paper, we present a
new parallel distributed approximation algorithm for influence maximization
with provable approximation guarantees. Our approach, which we call GreediRIS,
leverages the RandGreedi framework--a state-of-the-art approach for distributed
submodular optimization--for solving a step that computes a maximum k-cover.
GreediRIS combines distributed and streaming models of computations, along with
pruning techniques, to effectively address the communication bottlenecks of the
algorithm. Experimental results on up to 512 nodes (32K cores) of the NERSC
Perlmutter supercomputer show that GreediRIS can achieve good strong scaling
performance, preserve quality, and significantly outperform the other
state-of-the-art distributed implementations. For instance, on 512 nodes, the
most performant variant of GreediRIS achieves geometric mean speedups of 28.99x
and 36.35x for two different diffusion models, over a state-of-the-art parallel
implementation. We also present a communication-optimized version of GreediRIS
that further improves the speedups by two orders of magnitude.
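The maximum k-cover step that RandGreedi distributes can be illustrated by the standard sequential greedy algorithm, which already carries the (1 − 1/e) approximation guarantee; the distributed framework partitions the sets across machines and merges locally greedy solutions. Names below are our own sketch, not the GreediRIS code.

```python
def greedy_max_cover(sets, k):
    """Standard greedy (1 - 1/e)-approximation for maximum k-cover:
    repeatedly pick the set covering the most still-uncovered elements.

    sets: list of Python sets over some universe.
    Returns (chosen indices, covered elements).
    """
    covered = set()
    chosen = []
    remaining = dict(enumerate(sets))
    for _ in range(min(k, len(sets))):
        # Set with the largest marginal coverage at this step.
        best = max(remaining, key=lambda i: len(remaining[i] - covered))
        if not remaining[best] - covered:
            break  # nothing new can be covered; stop early
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

In influence maximization, each "set" is the collection of vertices reached by one reverse-reachable sample, and the k chosen sets identify the k seed vertices.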
Physics-constrained graph modeling for building thermal dynamics
In this paper, we propose a graph model embedded with compact physical equations for modeling the thermal dynamics of buildings. The principles of heat flow across various components in the building, such as walls and doors, fit the message-passing strategy used by Graph Neural Networks (GNNs). The proposed method represents the multi-zone building as a graph, in which only zones are considered as nodes, and any heat flow between zones is modeled as an edge based on prior knowledge of the building structure. Furthermore, the thermal dynamics of these components are described by compact models in the graph. GNNs are further employed to train model parameters from collected data. During model training, our proposed method enforces physical constraints (e.g., zone sizes and connections) on model parameters and propagates the penalty through the loss function of the GNN. Such constraints are essential to ensure model robustness and interpretability. We evaluate the effectiveness of the proposed modeling approach on a realistic dataset with multiple zones. The results demonstrate satisfactory accuracy in the prediction of multi-zone temperature. Moreover, we illustrate that the new model can reliably learn hidden physical parameters from incomplete data.
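The kind of compact per-edge physics such a graph embeds can be illustrated with a lumped RC zone model, where one explicit-Euler time step plays the role of a message-passing round: each zone aggregates heat-flow "messages" from its neighbors. This is a generic sketch under our own naming and parameterization, not the paper's exact equations.

```python
def step_temps(temps, neighbors, R, C, q, dt):
    """One explicit-Euler step of a lumped RC multi-zone thermal model.

    temps:     dict zone -> temperature
    neighbors: dict zone -> list of adjacent zones (graph edges)
    R:         dict (i, j) with i < j -> thermal resistance between zones
    C:         dict zone -> thermal capacitance (the "zone size" parameter)
    q:         dict zone -> internal heat gains
    dt:        time step
    """
    new = {}
    for i, Ti in temps.items():
        # Sum of heat flows into zone i along its incident edges.
        flow = sum((temps[j] - Ti) / R[min(i, j), max(i, j)]
                   for j in neighbors[i])
        new[i] = Ti + dt * (flow + q[i]) / C[i]
    return new
```

In the learned model, R and C become trainable parameters, and the physical constraints mentioned above (zone sizes, which zones are connected) restrict their signs, magnitudes, and sparsity pattern.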
