The ability to generate diverse and faithful responses grounded in factual
knowledge is paramount for creating a human-like, trustworthy dialogue
system. Common strategies either adopt a two-step paradigm, which optimizes
knowledge selection and response generation separately and may overlook the
inherent correlation between these two tasks, or leverage a conditional
variational method, which jointly optimizes knowledge selection and response
generation by employing an inference network. In this paper, we present an
end-to-end learning framework, termed Sequential Posterior Inference (SPI),
capable of selecting knowledge and generating dialogues by approximately
sampling from the posterior distribution. Unlike other methods, SPI does not
require an inference network or assume a simple geometry of the posterior
distribution. The straightforward and intuitive inference procedure of SPI
directly queries the response generation model, allowing for accurate knowledge
selection and generation of faithful responses. Beyond these modeling
contributions, experimental results on two common dialogue datasets (Wizard
of Wikipedia and Holl-E) demonstrate that SPI outperforms previous strong
baselines according to both automatic and human evaluation metrics.