DIRECTOR: Generator-Classifiers For Supervised Language Modeling
Current language models achieve low perplexity but their resulting
generations still suffer from toxic responses, repetitiveness and
contradictions. The standard language modeling setup fails to address these
issues. In this paper, we introduce a new architecture, Director, that
consists of a unified generator-classifier with both a language modeling and a
classification head for each output token. Training is conducted jointly using
both standard language modeling data and data labeled with desirable and
undesirable sequences. Experiments in several settings show that the model has
competitive training and decoding speed compared to standard language models
while yielding superior results, alleviating known issues while maintaining
generation quality. It also outperforms existing model guiding approaches in
terms of both accuracy and efficiency.
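As a rough illustration of the two-headed design described above, the sketch below combines a per-token language modeling score with a per-token classifier score when choosing the next token. The abstract does not state how the two heads are combined, so the product rule, the `gamma` weight, the function names, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine_heads(lm_logits, class_logits, gamma=1.0):
    # Assumed combination rule (not taken from the abstract):
    # score(token) = p_lm(token) * p_desirable(token) ** gamma, renormalized.
    lm_probs = softmax(lm_logits)
    good_probs = [sigmoid(z) for z in class_logits]
    scores = [p * (g ** gamma) for p, g in zip(lm_probs, good_probs)]
    total = sum(scores)
    return [s / total for s in scores]

# Toy vocabulary and logits, invented for this example.
vocab = ["hello", "thanks", "idiot"]
lm_logits = [1.0, 0.5, 2.0]       # the language modeling head prefers "idiot"
class_logits = [3.0, 3.0, -4.0]   # the classifier head flags "idiot" as undesirable

lm_only = vocab[max(range(len(vocab)), key=lm_logits.__getitem__)]
probs = combine_heads(lm_logits, class_logits)
best = vocab[max(range(len(vocab)), key=probs.__getitem__)]
```

In this toy setting the language modeling head alone would pick the undesirable token, while the combined score moves the choice to a token both heads accept. Larger values of `gamma` give the classifier more influence.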
Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion
Language models (LMs) have recently been shown to generate more factual
responses by employing modularity (Zhou et al., 2021) in combination with
retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et
al. (2021) to include internet search as a module. Our SeeKeR (Search
engine->Knowledge->Response) method thus applies a single LM to three modular
tasks in succession: search, generating knowledge, and generating a final
response. We show that, when using SeeKeR as a dialogue model, it outperforms
the state-of-the-art model BlenderBot 2 (Chen et al., 2021) on open-domain
knowledge-grounded conversations for the same number of parameters, in terms of
consistency, knowledge and per-turn engagingness. SeeKeR applied to topical
prompt completions as a standard language model outperforms GPT2 (Radford et
al., 2019) and GPT3 (Brown et al., 2020) in terms of factuality and topicality,
despite GPT3 being a vastly larger model. Our code and models are made publicly
available.
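The three-stage flow described above (search, then knowledge, then response, all served by one model) can be sketched as a simple chain of calls. This is a minimal sketch under stated assumptions: the `seeker_respond` function, the task prefixes in the prompts, and the toy `toy_lm` and `toy_search` stand-ins are hypothetical and are not the released SeeKeR code or API.

```python
from typing import Callable, List

def seeker_respond(context: str,
                   lm: Callable[[str], str],
                   search: Callable[[str], List[str]]) -> str:
    # Stage 1: the LM writes a search query for the context.
    query = lm("search: " + context)
    # The retrieved documents come from an external search module.
    documents = search(query)
    # Stage 2: the same LM condenses the documents into a knowledge sentence.
    knowledge = lm("knowledge: " + context + "\n" + "\n".join(documents))
    # Stage 3: the same LM writes the final response given that knowledge.
    return lm("response: " + context + "\n" + knowledge)

calls = []  # records which task the single LM was asked to perform, in order

def toy_lm(prompt: str) -> str:
    # Hypothetical stand-in for a real language model: canned output per task.
    task = prompt.split(":", 1)[0]
    calls.append(task)
    canned = {
        "search": "tallest mountain",
        "knowledge": "Mount Everest is 8,849 m tall.",
        "response": "It is Mount Everest, at 8,849 m.",
    }
    return canned[task]

def toy_search(query: str) -> List[str]:
    # Hypothetical stand-in for an internet search engine.
    return ["Mount Everest is 8,849 m tall.", "K2 is 8,611 m tall."]

reply = seeker_respond("What is the tallest mountain?", toy_lm, toy_search)
```

Passing the same `lm` callable to every stage mirrors the abstract's point that a single model handles all three tasks. Only the prompt changes between stages.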