While Graph Neural Networks (GNNs) have recently become the de facto standard
for modeling relational data, they impose a strong assumption on the
availability of the node or edge features of the graph. In many real-world
applications, however, features are only partially available; for example, in
social networks, age and gender are available only for a small subset of users.
We present a general approach for handling missing features in graph machine
learning applications that is based on minimization of the Dirichlet energy and
leads to a diffusion-type differential equation on the graph. The
discretization of this equation produces a simple, fast and scalable algorithm
which we call Feature Propagation. We experimentally show that the proposed
approach outperforms previous methods on seven common node-classification
benchmarks and can withstand surprisingly high rates of missing features: on
average we observe only around 4% relative accuracy drop when 99% of the
features are missing. Moreover, it takes only 10 seconds to run on a graph with
∼2.5M nodes and ∼123M edges on a single GPU