The VQA Natural Language Explanation (VQA-NLE) task aims to explain the
decision-making process of VQA models in natural language. Compared with
traditional attention or gradient analysis, free-text rationales can be easier
to understand and can help gain users' trust. Existing methods mostly use
post-hoc or self-rationalization models to obtain plausible explanations. However, these
frameworks are bottlenecked by two challenges: 1) the explanations cannot
faithfully reflect the reasoning process and suffer from logical
inconsistency; 2) human-annotated explanations are expensive and
time-consuming to collect. In this paper, we propose a new Semi-Supervised
VQA-NLE via Self-Critical Learning (S3C) framework, which evaluates candidate
explanations with answering rewards to improve the logical consistency between
answers and rationales. Through semi-supervised learning, S3C can benefit
from a large number of samples that lack human-annotated explanations.
Extensive automatic metrics and human evaluations all
show the effectiveness of our method. In addition, the framework achieves new
state-of-the-art performance on two VQA-NLE datasets.
Comment: CVPR202
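The idea of scoring candidate explanations with an answering reward can be illustrated with a generic self-critical sequence training (SCST) objective. The sketch below is hypothetical and is not the S3C implementation. It assumes that the reward is exact-match accuracy of the answer predicted from an explanation, that the reward of the greedily decoded explanation serves as the baseline, and that plain floats stand in for a deep-learning framework's tensors. The names `answer_reward` and `self_critical_loss` are illustrative.

```python
from typing import Sequence


def answer_reward(predicted_answer: str, gold_answer: str) -> float:
    # Hypothetical reward: 1.0 if the answer inferred from an explanation
    # matches the gold answer (case- and whitespace-insensitive), else 0.0.
    return 1.0 if predicted_answer.strip().lower() == gold_answer.strip().lower() else 0.0


def self_critical_loss(
    sampled_log_probs: Sequence[float],
    sampled_answer: str,
    greedy_answer: str,
    gold_answer: str,
) -> float:
    """Generic SCST-style REINFORCE loss with a greedy-decoding baseline.

    sampled_log_probs: per-token log-probabilities of a sampled explanation.
    sampled_answer / greedy_answer: answers predicted from the sampled and the
    greedily decoded explanation (assumed to come from some answering model).
    """
    # Advantage: how much better the sampled explanation's answer is than the baseline's.
    advantage = answer_reward(sampled_answer, gold_answer) - answer_reward(greedy_answer, gold_answer)
    # Minimizing -advantage * log p(sample) raises the sample's likelihood when the
    # advantage is positive and lowers it when the advantage is negative.
    return -advantage * sum(sampled_log_probs)


if __name__ == "__main__":
    # Sampled explanation leads to the correct answer, greedy one does not.
    print(self_critical_loss([-0.5, -1.0], "yes", "no", "yes"))
```

In a real training loop the summed log-probabilities would be an autograd tensor, so gradient descent on this loss favors sampled explanations whose predicted answer beats the greedy baseline. Using the model's own greedy output as the baseline is what makes the objective "self-critical".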