Creating layouts is a fundamental step in graphic design. In this work, we
propose using text as guidance for creating graphic layouts, i.e.,
Text-to-Layout, aiming to lower the barrier to design. Text-to-Layout is a
challenging task because it must handle the implicit, combined, and
incomplete layout constraints in text, none of which has been studied in
previous work. To address this, we present a two-stage approach, named
parse-then-place. The approach introduces an intermediate representation (IR)
between text and layout to represent diverse layout constraints. With IR,
Text-to-Layout is decomposed into a parse stage and a place stage. The parse
stage takes a textual description as input and generates an IR, in which the
implicit constraints from the text are transformed into explicit ones. The
place stage generates layouts based on the IR. To model combined and incomplete
constraints, we use a Transformer-based layout generation model and carefully
design a way to represent constraints and layouts as sequences. In addition, we
adopt a pretrain-then-finetune strategy to boost the performance of the
layout generation model with large-scale unlabeled layouts. To evaluate our
approach, we construct two Text-to-Layout datasets and conduct experiments on
them. Quantitative results, qualitative analysis, and user studies demonstrate
the effectiveness of our approach.
Comment: Accepted by ICCV202