Node representation learning on attributed graphs -- whose nodes are
associated with rich attributes (e.g., texts and protein sequences) -- plays a
crucial role in many important downstream tasks. To encode the attributes and
graph structures simultaneously, recent studies integrate pre-trained models
with graph neural networks (GNNs), where pre-trained models serve as node
encoders (NEs) to encode the attributes. As jointly training large NEs and GNNs
on large-scale graphs suffers from severe scalability issues, many methods
propose to train NEs and GNNs separately. Consequently, they do not take the
feature convolutions in GNNs into account during the training phase of NEs,
leading to a significant learning bias relative to joint training. To
address this challenge, we propose an efficient label regularization technique,
namely Label Deconvolution (LD), to alleviate the learning bias via a novel and
highly scalable approximation to the inverse mapping of GNNs. The inverse
mapping yields an objective function equivalent to that of joint training,
while effectively incorporating GNNs into the training phase of NEs to
counteract the learning bias. More importantly, we show that LD converges to
the optimal objective function value of joint training under mild
assumptions. Experiments demonstrate that LD significantly outperforms
state-of-the-art methods on Open Graph Benchmark datasets.
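
To make the idea concrete, below is a minimal, hypothetical sketch (not the
paper's implementation). It assumes a purely linear, GCN-style GNN whose
forward pass multiplies node features by a K-hop propagation matrix, and it
approximates the inverse mapping by solving a least-squares system that
"deconvolves" the labels into targets for the node encoder alone. The names
(`deconvolve_labels`, `normalized_adjacency`) and the least-squares
approximation are illustrative assumptions, not LD's actual scalable
approximation.

```python
import numpy as np

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style symmetrically normalized adjacency with self-loops."""
    adj = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def deconvolve_labels(adj: np.ndarray, labels: np.ndarray,
                      num_hops: int = 2) -> np.ndarray:
    """Map labels through an approximate inverse of K-hop propagation.

    For a linear GNN whose forward pass is P @ features with
    P = A_hat ** num_hops, solving P @ y_tilde ~= labels in the
    least-squares sense gives "deconvolved" targets y_tilde that the
    node encoder can be trained against on its own.
    """
    propagation = np.linalg.matrix_power(normalized_adjacency(adj), num_hops)
    y_tilde, *_ = np.linalg.lstsq(propagation, labels, rcond=None)
    return y_tilde

# Toy usage: 4 nodes on a path graph, 2 classes (one-hot labels).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
labels = np.eye(2)[[0, 0, 1, 1]]
targets = deconvolve_labels(adj, labels)
# The node encoder would be trained to regress f(attributes) onto
# `targets`; propagating its outputs through the GNN then recovers an
# objective aligned with joint training.
print(targets)
```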