Conditional variational autoencoders (CVAEs) have recently been used for
diverse response generation; they introduce latent variables to represent the
relationship between a dialog context and its potential responses. However, the
diversity of the responses generated by a CVAE model is limited by the
oversimplified assumption of an isotropic Gaussian prior. We propose
Dior-CVAE, a hierarchical CVAE model with an informative prior produced by a
diffusion model. Dior-CVAE derives a series of layer-wise latent variables
using an attention mechanism and infuses them into the corresponding decoder
layers. We further propose memory dropout during latent infusion to alleviate
posterior collapse.
The prior distribution of the latent variables is parameterized by a diffusion
model, which allows the prior to be multimodal. Overall, experiments on two
popular open-domain dialog datasets indicate the advantages of our approach
over previous Transformer-based variational dialog models in dialog response
generation. We publicly release the code for reproducing Dior-CVAE and all
baselines at
https://github.com/SkyFishMoon/Latent-Diffusion-Response-Generation
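
To make the idea of a diffusion-parameterized prior more concrete, the following is a minimal sketch of the forward (noising) step that a standard DDPM-style diffusion model applies to a latent vector; the prior model is then trained to reverse this corruption. The abstract does not state the noise schedule or any implementation details, so the linear schedule, its default values, and the function names `linear_betas`, `alpha_bars`, and `q_sample` are illustrative assumptions and are not taken from the released code.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (an assumption, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t), one value per diffusion step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(z0, t, abar, rng):
    """Draw z_t ~ N(sqrt(abar[t]) * z0, (1 - abar[t]) * I) for a latent vector z0."""
    scale = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    return [scale * z + sigma * rng.gauss(0.0, 1.0) for z in z0]


rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
z0 = [1.0, -2.0, 0.5]                   # toy latent vector (illustrative values)
z_early = q_sample(z0, 0, abar, rng)    # first step: almost unchanged
z_late = q_sample(z0, 999, abar, rng)   # last step: close to pure Gaussian noise
```

At the first step the latent is barely perturbed, while at the last step almost no signal from `z0` remains; a learned reverse process that starts from noise can therefore represent a more flexible, potentially multimodal distribution over latents than a single Gaussian.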