
DIRECTOR: Generator-Classifiers For Supervised Language Modeling

Authors: Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston
Publication date: 25/11/2022

Current language models achieve low perplexity, but their generations still suffer from toxic responses, repetitiveness and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, DIRECTOR, that consists of a unified generator-classifier with both a language modeling head and a classification head for each output token. Training is conducted jointly using both standard language modeling data and data labeled with desirable and undesirable sequences. Experiments in several settings show that the model has training and decoding speed competitive with standard language models while yielding superior results: it alleviates these known issues and maintains generation quality. It also outperforms existing model-guiding approaches in terms of both accuracy and efficiency.

arXiv.org e-Print Archive
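The abstract describes one model with a language modeling head and a classification head for every output token. To illustrate how such per-token scores could steer decoding, here is a minimal Python sketch. It assumes the two heads are combined multiplicatively, p(x) ∝ p_LM(x) · p_C(desirable | x)^γ, with a hypothetical weight `gamma`; this combination rule, the function names and the toy scores are assumptions for illustration, not details stated in the abstract.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax over the vocabulary.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def log_sigmoid(x: float) -> float:
    # Stable log(sigmoid(x)), i.e. log-probability that a token is "desirable".
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def guided_next_token_dist(
    lm_logits: List[float], cls_logits: List[float], gamma: float = 1.0
) -> List[float]:
    """Combine per-token LM and classifier scores into one next-token distribution.

    Assumption (illustrative only): p(x) is proportional to
    p_LM(x) * p_C(desirable | x) ** gamma.
    """
    if len(lm_logits) != len(cls_logits):
        raise ValueError("both heads must score the same vocabulary")
    lm_logprobs = log_softmax(lm_logits)
    # Add the weighted classifier log-probability to each token's LM log-probability.
    scores = [lp + gamma * log_sigmoid(c) for lp, c in zip(lm_logprobs, cls_logits)]
    # Renormalise so the combined scores form a probability distribution.
    return [math.exp(s) for s in log_softmax(scores)]


if __name__ == "__main__":
    # Toy 4-token vocabulary; token 1 is likely under the LM but scored undesirable.
    vocab = ["hello", "undesirable_token", "friend", "."]
    lm_logits = [2.0, 2.5, 1.0, 0.0]
    cls_logits = [3.0, -4.0, 3.0, 2.0]
    for gamma in (0.0, 1.0):
        probs = guided_next_token_dist(lm_logits, cls_logits, gamma)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        print(gamma, vocab[best], [round(p, 3) for p in probs])
```

With `gamma = 0` the classifier is ignored and the sketch reduces to the plain language model; larger values shift probability mass away from tokens the classifier head scores as undesirable.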

Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

Authors: Leonard Adolphs, Mojtaba Komeili, Stephen Roller, Kurt Shuster, Arthur Szlam, Jason Weston
Publication date: 01/01/2022

Language models (LMs) have recently been shown to generate more factual responses by employing modularity (Zhou et al., 2021) in combination with retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et al. (2021) to include internet search as a module. Our SeeKeR (Search engine->Knowledge->Response) method thus applies a single LM to three modular tasks in succession: search, generating knowledge, and generating a final response. We show that, when used as a dialogue model, SeeKeR outperforms the state-of-the-art model BlenderBot 2 (Chen et al., 2021) on open-domain knowledge-grounded conversations for the same number of parameters, in terms of consistency, knowledge and per-turn engagingness. When applied to topical prompt completion as a standard language model, SeeKeR outperforms GPT2 (Radford et al., 2019) and GPT3 (Brown et al., 2020) in terms of factuality and topicality, despite GPT3 being a vastly larger model. Our code and models are publicly available.

arXiv.org e-Print Archive
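The abstract describes a single LM applied to three tasks in succession: search, knowledge generation and response generation. The following Python sketch shows only that control flow. The `lm` and `search` callables, the bracketed task prefixes and the toy stubs are hypothetical placeholders, not the released SeeKeR code or its actual prompt format.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not the SeeKeR API):
# an LM maps a prompt to generated text; a search engine maps a query to documents.
LM = Callable[[str], str]
SearchEngine = Callable[[str], List[str]]


def seeker_respond(context: str, lm: LM, search: SearchEngine) -> str:
    """Run one dialogue turn as three successive calls to the same LM."""
    # 1. Search: the LM turns the dialogue context into a search query.
    query = lm(f"[search] {context}")
    documents = search(query)

    # 2. Knowledge: the same LM condenses the retrieved documents into a
    #    short knowledge sentence relevant to the context.
    knowledge = lm(f"[knowledge] {context} | documents: {' '.join(documents)}")

    # 3. Response: the same LM writes the final reply, conditioned on the
    #    context and the generated knowledge.
    return lm(f"[response] {context} | knowledge: {knowledge}")


if __name__ == "__main__":
    # Toy stubs standing in for a real model and a real search engine.
    def toy_lm(prompt: str) -> str:
        if prompt.startswith("[search]"):
            return "SeeKeR dialogue model"
        if prompt.startswith("[knowledge]"):
            return "SeeKeR chains search, knowledge and response generation."
        return "It searches first, then answers using what it found."

    def toy_search(query: str) -> List[str]:
        return [f"(document retrieved for '{query}')"]

    print(seeker_respond("How does SeeKeR work?", toy_lm, toy_search))
```

Because the same `lm` object serves all three steps, the modularity lives in the prompts and intermediate outputs rather than in separate models, which mirrors the "single LM" framing in the abstract.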
