Recent years have witnessed the great potential of attention mechanisms in
graph representation learning. However, while variants of attention-based GNNs
are setting new benchmarks on numerous real-world datasets, recent works have
pointed out that their induced attention is less robust and less generalizable
on noisy graphs because it lacks direct supervision. In this paper, we
present a new framework that uses tools from causality to provide a
powerful supervision signal for learning attention functions.
Specifically, we estimate the direct causal effect of attention on the final
prediction, and then maximize this effect to guide attention toward more
meaningful neighbors. Our method can serve as a plug-and-play module for any
canonical attention-based GNN in an end-to-end fashion. Extensive experiments
on a wide range of benchmark datasets illustrate that, by directly supervising
attention functions, the model converges faster with a clearer
decision boundary, and thus yields better performance.
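
The idea of supervising attention through its causal effect can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: the counterfactual is taken to be uniform attention over the neighbors, the effect is measured as the difference between factual and counterfactual logits, and the effect is "maximized" by adding a cross-entropy term on that difference. All names (`aggregate`, `attention_effect_logits`, `total_loss`, `lam`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate(attn, neighbor_feats, weight):
    # Attention-weighted sum of neighbor features, then a linear classifier.
    # attn: (k,), neighbor_feats: (k, d), weight: (d, c) -> logits: (c,)
    return (attn @ neighbor_feats) @ weight

def attention_effect_logits(attn, neighbor_feats, weight):
    # Assumed counterfactual: replace learned attention with uniform attention.
    # The effect is the change in logits attributable to the learned attention.
    attn = np.asarray(attn, dtype=float)
    uniform = np.full_like(attn, 1.0 / attn.shape[-1])
    factual = aggregate(attn, neighbor_feats, weight)
    counterfactual = aggregate(uniform, neighbor_feats, weight)
    return factual - counterfactual

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

def total_loss(attn, neighbor_feats, weight, label, lam=1.0):
    # Standard prediction loss plus a supervision term on the attention effect.
    # Minimizing the second term pushes the effect toward the true label.
    factual = aggregate(np.asarray(attn, dtype=float), neighbor_feats, weight)
    effect = attention_effect_logits(attn, neighbor_feats, weight)
    return cross_entropy(factual, label) + lam * cross_entropy(effect, label)
```

In this toy setting, attention that concentrates on a neighbor whose features support the true label yields a lower loss than attention concentrated on a misleading neighbor, while uniform attention has zero effect by construction. In an actual GNN the same extra term would be added to the training loss and backpropagated through the attention function.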