Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order
Masked language models and autoregressive language models are two types of
language models. While pretrained masked language models such as BERT dominate
natural language understanding (NLU) tasks, autoregressive language
models such as GPT are especially capable at natural language generation (NLG).
In this paper, we propose a probabilistic masking scheme for the masked
language model, which we call probabilistically masked language model (PMLM).
We implement a specific PMLM, named u-PMLM, with a uniform prior distribution
on the masking ratio. We prove that u-PMLM is equivalent to an autoregressive
permuted language model. One main advantage of the model is that it supports
text generation in arbitrary order with surprisingly good quality, which could
potentially enable new applications over traditional unidirectional generation.
In addition, the pretrained u-PMLM outperforms BERT on a set of downstream NLU
tasks.
Comment: Accepted by ACL 202
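The masking scheme described in this abstract can be illustrated with a small
sketch. It assumes that the masking ratio is drawn from Uniform(0, 1) and that
each token is then masked independently with that probability; the abstract
states only the uniform prior, so the per-token rule is an assumption. The names
MASK, uniform_prior_mask, generate_in_order, and predict are hypothetical and do
not come from the paper.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (an assumption, not from the abstract)


def uniform_prior_mask(tokens, rng):
    """Sample r ~ Uniform(0, 1), then mask each token independently with probability r."""
    ratio = rng.random()
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < ratio:
            masked.append(MASK)
            targets[position] = token  # a model would be trained to recover this token
        else:
            masked.append(token)
    return ratio, masked, targets


def generate_in_order(length, order, predict):
    """Fill an all-mask sequence one position at a time, following `order`.

    `predict(sequence, position)` is a hypothetical stand-in for a trained model.
    """
    sequence = [MASK] * length
    for position in order:
        sequence[position] = predict(sequence, position)
    return sequence


if __name__ == "__main__":
    rng = random.Random(0)
    print(uniform_prior_mask("the cat sat on the mat".split(), rng))
```

Because generate_in_order accepts any permutation of positions, the same loop
covers left-to-right, right-to-left, and arbitrary-order generation.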
M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis
Conditional image synthesis aims to create an image according to some
multi-modal guidance in the forms of textual descriptions, reference images,
and image blocks to preserve, as well as their combinations. In this paper,
instead of investigating these control signals separately, we propose a new
two-stage architecture, M6-UFC, to unify any number of multi-modal controls. In
M6-UFC, both the diverse control signals and the synthesized image are
uniformly represented as a sequence of discrete tokens to be processed by a
Transformer. Unlike existing two-stage autoregressive approaches such
as DALL-E and VQGAN, M6-UFC adopts non-autoregressive generation (NAR) at the
second stage to enhance the holistic consistency of the synthesized image, to
support preserving specified image blocks, and to improve the synthesis speed.
Further, we design a progressive algorithm that iteratively improves the
non-autoregressively generated image, with the help of two estimators developed
for evaluating the compliance with the controls and evaluating the fidelity of
the synthesized image, respectively. Extensive experiments on a newly collected
large-scale clothing dataset M2C-Fashion and a facial dataset Multi-Modal
CelebA-HQ verify that M6-UFC can synthesize high-fidelity images that comply
with flexible multi-modal controls.
Comment: Accepted by NeurIPS2
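To make the idea of non-autoregressive generation with preserved blocks more
concrete, here is a minimal sketch of generic mask-predict style decoding. It is
not the progressive algorithm of M6-UFC, which the abstract says relies on two
learned estimators (control compliance and image fidelity); this sketch
substitutes a single per-token confidence score. The names MASK,
iterative_refine, and predict are hypothetical.

```python
import math

MASK = -1  # placeholder id for a not-yet-decided token (assumption)


def iterative_refine(length, preserved, predict, steps=4):
    """Parallel decoding that keeps the `preserved` positions fixed.

    `preserved` maps position -> token id that must survive unchanged.
    `predict(tokens)` is a hypothetical model call returning a list of
    (token_id, confidence) pairs, one per position.
    """
    tokens = [preserved.get(i, MASK) for i in range(length)]
    free = [i for i in range(length) if i not in preserved]
    for step in range(1, steps + 1):
        proposals = predict(tokens)
        # Accept every proposal for the free positions in parallel.
        for i in free:
            tokens[i] = proposals[i][0]
        # Re-mask the least confident free positions; none remain at the last step.
        n_remask = math.floor(len(free) * (1 - step / steps))
        for i in sorted(free, key=lambda i: proposals[i][1])[:n_remask]:
            tokens[i] = MASK
    return tokens
```

All free positions are predicted in one pass per step, so the number of model
calls is `steps` rather than the sequence length, which is where the speed
advantage over token-by-token decoding comes from.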