Classic Graph Neural Network (GNN) inference approaches, designed for static
graphs, are ill-suited for streaming graphs that evolve with time. The dynamism
intrinsic to streaming graphs necessitates constant updates, posing unique
challenges for GPU acceleration. We address these challenges based on two key
insights: (1) within the k-hop neighborhood of the modified edges, a significant
fraction of nodes is not impacted when the model uses min or max as its
aggregation function; and (2) when the model weights remain static while the
graph structure changes, node embeddings can be updated incrementally over time
by computing only the impacted part of the neighborhood. With these insights, we
propose a novel method, InkStream, designed for real-time inference with
minimal memory access and computation, while producing outputs identical to
those of conventional methods. InkStream operates on the principle of propagating and
fetching data only when necessary. It uses an event-based system to control
inter-layer effect propagation and intra-layer incremental updates of node
embeddings. InkStream is highly extensible and easily configurable by allowing
users to create and process customized events. We show that fewer than 10
lines of additional user code are needed to support popular GNN models such as
GCN, GraphSAGE, and GIN. Our experiments with three GNN models on four large
graphs demonstrate that InkStream accelerates inference by 2.5-427× on a CPU
cluster and by 2.4-343× on two different GPU clusters, while producing outputs
identical to those of GNN model inference on the latest graph snapshot.
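The event-driven propagation idea can be sketched as follows. This is a minimal, hypothetical illustration, not InkStream's implementation. It assumes a k-layer model with elementwise-max aggregation and a made-up static update ReLU(h_v + agg). It also recomputes each candidate node's max from its in-neighbors rather than applying InkStream's finer-grained intra-layer rules, and every function name in it is illustrative. The point it shows is the stopping rule: a node whose recomputed embedding equals its old one emits no events for the next layer. As a result, only impacted nodes are touched, yet the result matches full recomputation on the new snapshot.

```python
from typing import Dict, List, Set, Tuple

Vec = List[float]
Graph = Dict[int, Set[int]]


def build_adjacency(nodes: List[int], edges: List[Tuple[int, int]]) -> Tuple[Graph, Graph]:
    """Return (in-neighbors, out-neighbors) for a directed edge list u -> v."""
    in_nbrs: Graph = {v: set() for v in nodes}
    out_nbrs: Graph = {v: set() for v in nodes}
    for u, v in edges:
        in_nbrs[v].add(u)
        out_nbrs[u].add(v)
    return in_nbrs, out_nbrs


def elementwise_max(vecs: List[Vec], dim: int) -> Vec:
    # Assumption of this sketch: an empty neighborhood aggregates to zeros.
    if not vecs:
        return [0.0] * dim
    return [max(col) for col in zip(*vecs)]


def combine(h_self: Vec, agg: Vec) -> Vec:
    # Hypothetical static-weight update: ReLU(h_self + agg).
    return [max(0.0, a + b) for a, b in zip(h_self, agg)]


def full_inference(feats: Dict[int, Vec], in_nbrs: Graph, k: int) -> List[Dict[int, Vec]]:
    """Baseline: recompute all k layers on the current graph snapshot."""
    dim = len(next(iter(feats.values())))
    hs = [dict(feats)]
    for _ in range(k):
        prev = hs[-1]
        hs.append({v: combine(prev[v], elementwise_max([prev[u] for u in in_nbrs[v]], dim))
                   for v in prev})
    return hs


def incremental_update(hs: List[Dict[int, Vec]], in_nbrs: Graph, out_nbrs: Graph,
                       changed_edges: List[Tuple[int, int]], k: int) -> int:
    """Patch hs in place; the adjacency must already reflect changed_edges.

    Returns how many node embeddings were recomputed.
    """
    dim = len(next(iter(hs[0].values())))
    # An inserted or deleted edge u -> v alters v's aggregation at every layer.
    edge_dsts: Set[int] = {v for _, v in changed_edges}
    frontier = set(edge_dsts)  # events for layer 1
    recomputed = 0
    for l in range(k):
        prev, cur = hs[l], hs[l + 1]
        changed: Set[int] = set()
        for v in frontier:
            new_h = combine(prev[v], elementwise_max([prev[u] for u in in_nbrs[v]], dim))
            recomputed += 1
            if new_h != cur[v]:  # unchanged output: no event is emitted
                cur[v] = new_h
                changed.add(v)
        # Next-layer events: edge destinations, changed nodes (self term),
        # and out-neighbors of changed nodes (aggregation input).
        frontier = set(edge_dsts) | changed
        for v in changed:
            frontier |= out_nbrs[v]
    return recomputed
```

A node outside the final frontier has an unchanged self embedding, unchanged in-neighbor embeddings, and an unchanged neighbor set. Under a static update function its output therefore cannot change, which is why skipping it is exact rather than approximate.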