End-to-end neural NLP architectures are notoriously difficult to understand,
which has given rise to numerous efforts toward model explainability in recent
years. An essential principle of model explanation is Faithfulness, i.e., that an
explanation should accurately represent the reasoning process behind the
model's prediction. This survey first discusses the definition and evaluation
of Faithfulness, as well as its significance for explainability. We then
survey recent advances in faithful explanation, grouping approaches
into five categories: similarity methods, analysis of model-internal
structures, backpropagation-based methods, counterfactual intervention, and
self-explanatory models. Each category is illustrated with representative
studies, along with its advantages and shortcomings. Finally, we discuss the
common virtues and limitations of these methods and reflect on directions for
future work toward faithful explainability. For researchers
interested in studying interpretability, this survey will offer an accessible
and comprehensive overview of the area, laying the groundwork for further
exploration. For users hoping to better understand their own models, this
survey serves as an introductory manual for choosing the most suitable
explanation method(s).