The long-standing one-to-many issue of the open-domain dialogues poses
significant challenges for automatic evaluation methods, i.e., there may be
multiple suitable responses which differ in semantics for a given
conversational context. To tackle this challenge, we propose a novel
learning-based automatic evaluation metric (CMN), which can robustly evaluate
open-domain dialogues by augmenting Conditional Variational Autoencoders
(CVAEs) with a Next Sentence Prediction (NSP) objective and employing Mutual
Information (MI) to model the semantic similarity of text in the latent space.
Experimental results on two open-domain dialogue datasets demonstrate the
superiority of our method compared with a wide range of baselines, especially
in handling responses which are distant to the golden reference responses in
semantics.Comment: Accepted at ACL202