PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout
Content-aware visual-textual presentation layout aims at arranging space on
the given canvas for pre-defined elements, including text, logo, and underlay,
which is key to automatic template-free creative graphic design. In practical
applications, e.g., poster design, the canvas is originally non-empty, and both
inter-element and inter-layer relationships should be considered when
generating a proper layout. A few recent works deal with both simultaneously,
but they still suffer from poor graphic performance, such as a lack of layout
variety or spatial non-alignment. Since
content-aware visual-textual presentation layout is a novel task, we first
construct a new dataset named PosterLayout, which consists of 9,974
poster-layout pairs and 905 images, i.e., non-empty canvases. It is more
challenging and more useful owing to its greater layout variety, domain
diversity, and content diversity. Then, we propose design sequence formation
(DSF), which reorganizes elements in layouts to imitate the design process of
human designers, and present a novel CNN-LSTM-based conditional generative
adversarial network (GAN) to generate proper layouts. Specifically, the
discriminator is design-sequence-aware and supervises the "design" process
of the generator. Experimental results verify the usefulness of the new
benchmark and the effectiveness of the proposed approach, which achieves the
best performance by generating suitable layouts for diverse canvases.
Comment: Accepted to CVPR 2023. Dataset and code are available at
https://github.com/PKU-ICST-MIPL/PosterLayout-CVPR202
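The abstract does not spell out how DSF orders elements. As a rough illustration only, the sketch below turns an unordered layout into a sequence that a generator could emit one element at a time, using a hypothetical rule: a fixed type priority, then top-to-bottom and left-to-right position. `Element`, `TYPE_PRIORITY`, and `form_design_sequence` are invented names, and the paper's actual DSF criterion may well differ.

```python
from dataclasses import dataclass

# Hypothetical priority (assumption, not from the paper): lower values are
# "designed" earlier in the sequence.
TYPE_PRIORITY = {"text": 0, "logo": 1, "underlay": 2}


@dataclass
class Element:
    kind: str   # "text", "logo", or "underlay"
    box: tuple  # (left, top, right, bottom) in canvas pixels


def form_design_sequence(elements):
    """Reorder layout elements into a design sequence (illustrative only).

    Sort by the hypothetical type priority, then by the top edge, then by
    the left edge; unknown kinds go last.
    """
    return sorted(
        elements,
        key=lambda e: (TYPE_PRIORITY.get(e.kind, len(TYPE_PRIORITY)), e.box[1], e.box[0]),
    )


layout = [
    Element("underlay", (40, 300, 460, 380)),
    Element("logo", (20, 20, 80, 60)),
    Element("text", (60, 310, 440, 360)),
    Element("text", (60, 120, 440, 170)),
]
print([e.kind for e in form_design_sequence(layout)])  # ['text', 'text', 'logo', 'underlay']
```

A fixed, deterministic ordering like this is what lets a sequence model (such as an LSTM) and a sequence-aware discriminator agree on which element comes "next".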
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
The pre-training-fine-tuning paradigm based on layout-aware multimodal
pre-trained models has achieved significant progress on document image question
answering. However, domain pre-training and task fine-tuning for additional
visual, layout, and task modules prevent them from directly utilizing
off-the-shelf instruction-tuning language foundation models, which have
recently shown promising potential in zero-shot learning. Instead of aligning
language models to the domain of document image question answering, we align
document image question answering to off-the-shelf instruction-tuning language
foundation models to exploit their zero-shot capability. Specifically, we
propose a layout- and task-aware instruction prompt called LATIN-Prompt, which
consists of layout-aware document content and task-aware descriptions. The
former recovers the layout information among text segments produced by OCR
tools by inserting appropriate spaces and line breaks. The latter ensures that
the model generates answers that meet the requirements, especially format
requirements, through a detailed description of the task. Experimental results
on three benchmarks show that LATIN-Prompt can improve the zero-shot
performance of instruction-tuning language foundation models on document image
question answering and help them achieve performance comparable to SOTA methods
based on the pre-training-fine-tuning paradigm. Quantitative and qualitative
analyses demonstrate the
effectiveness of LATIN-Prompt. We provide the code in the supplementary
material and will release it to facilitate future research.
Comment: Code is available at https://github.com/WenjinW/LATIN-Promp
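The layout-aware document content described above can be approximated by grouping OCR segments into lines by vertical position and turning horizontal offsets into runs of spaces. This is a minimal sketch under stated assumptions, not the LATIN-Prompt code: the `(text, x0, y0, x1, y1)` segment format, the fixed `char_width`, the `line_tol` threshold, and the function name `layout_aware_text` are all illustrative choices.

```python
def layout_aware_text(segments, char_width=10.0, line_tol=5.0):
    """Render OCR segments as plain text whose spaces and line breaks
    roughly preserve the 2-D layout (illustrative sketch).

    segments:   list of (text, x0, y0, x1, y1) boxes in pixels (assumed format).
    char_width: assumed pixel width of one character, used to convert a
                horizontal position into a column index.
    line_tol:   max difference in vertical centre for a segment to join the
                current line (compared with that line's first segment).
    """
    # Sort top-to-bottom by vertical centre, then left-to-right.
    items = sorted(segments, key=lambda s: ((s[2] + s[4]) / 2, s[1]))

    lines = []  # each entry: [vertical centre of first segment, [segments]]
    for seg in items:
        cy = (seg[2] + seg[4]) / 2
        if lines and abs(lines[-1][0] - cy) <= line_tol:
            lines[-1][1].append(seg)
        else:
            lines.append([cy, [seg]])

    out = []
    for _, segs in lines:
        row = ""
        for text, x0, *_ in sorted(segs, key=lambda s: s[1]):
            col = int(round(x0 / char_width))
            if row:
                # Pad up to the segment's column, keeping at least one space.
                row += " " * max(1, col - len(row))
            else:
                row = " " * col
            row += text
        out.append(row)
    return "\n".join(out)


segs = [
    ("Name:", 0, 0, 50, 10),
    ("Alice", 100, 2, 150, 12),
    ("Date:", 0, 20, 50, 30),
    ("2023", 100, 20, 140, 30),
]
print(layout_aware_text(segs))
```

The resulting string keeps "Alice" and "2023" aligned in the same column, which is the kind of spatial cue a text-only language model can pick up from the prompt.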
AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation
Advertising posters, a form of information presentation, combine visual and
linguistic modalities. Creating a poster involves multiple steps and
necessitates design experience and creativity. This paper introduces
AutoPoster, a highly automatic and content-aware system for generating
advertising posters. With only product images and titles as inputs, AutoPoster
can automatically produce posters of varying sizes through four key stages:
image cleaning and retargeting, layout generation, tagline generation, and
style attribute prediction. To ensure visual harmony of posters, two
content-aware models are incorporated for layout and tagline generation.
Moreover, we propose a novel multi-task Style Attribute Predictor (SAP) to
jointly predict visual style attributes. In addition, we introduce what is, to
our knowledge, the first poster generation dataset with visual attribute
annotations, covering over 76k posters. Qualitative and quantitative outcomes from
user studies and experiments substantiate the efficacy of our system and the
aesthetic superiority of the generated posters compared to other poster
generation methods.
Comment: Accepted for ACM MM 202
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Graphic layout designs play an essential role in visual communication. Yet
handcrafting layout designs is skill-demanding, time-consuming, and
non-scalable to batch production. Generative models have emerged to make design
automation scalable, but it remains non-trivial to produce designs that comply
with designers' multimodal desires, i.e., constrained by background images and
driven by foreground content. We propose LayoutDETR, which inherits the high
quality and realism from generative modeling, while reformulating content-aware
requirements as a detection problem: we learn to detect in a background image
the reasonable locations, scales, and spatial relations for multimodal
foreground elements in a layout. Our solution sets a new state-of-the-art
performance for layout generation on public benchmarks and on our newly-curated
ad banner dataset. We integrate our solution into a graphical system that
facilitates user studies, and show that users prefer our designs over baselines
by significant margins. Our code, models, dataset, graphical system, and demos
are available at https://github.com/salesforce/LayoutDETR
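Reformulating layout generation as detection means the model outputs boxes on the background image. Assuming LayoutDETR keeps DETR's convention of predicting normalised (cx, cy, w, h) boxes in [0, 1] (an assumption; the abstract does not state the box parametrisation), mapping predictions onto a concrete canvas is a small conversion. `boxes_to_canvas` is a hypothetical helper, not part of the released code.

```python
def boxes_to_canvas(pred_boxes, canvas_w, canvas_h):
    """Convert DETR-style normalised (cx, cy, w, h) boxes into pixel
    (x0, y0, x1, y1) boxes on a canvas_w x canvas_h background image,
    clipped to the canvas bounds."""
    out = []
    for cx, cy, w, h in pred_boxes:
        x0 = (cx - w / 2) * canvas_w
        y0 = (cy - h / 2) * canvas_h
        x1 = (cx + w / 2) * canvas_w
        y1 = (cy + h / 2) * canvas_h
        # Clip so that no element extends past the background image.
        out.append((max(0.0, x0), max(0.0, y0),
                    min(float(canvas_w), x1), min(float(canvas_h), y1)))
    return out


print(boxes_to_canvas([(0.5, 0.25, 0.5, 0.25)], 800, 600))  # [(200.0, 75.0, 600.0, 225.0)]
```

Because the boxes are normalised, the same prediction can be rendered on banners of different sizes, which suits the variable-size ad banners mentioned above.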
On Comments in Visual Languages
Visual languages based on node-link diagrams can be used to develop software and, like textual languages, offer the possibility to write explanatory comments. Which node a comment refers to is usually not made explicit, but is implicitly clear to readers through placement and content. While automatic layout algorithms can make working with diagrams more productive, they tend to destroy such implicit clues because they are not aware of them and thus do not preserve the relative placement of comments and the nodes they refer to. Implicit clues thus need to be inferred and made explicit to be taken into account by layout algorithms. This is what we call the comment attachment problem. In this paper, we improve upon a previous paper on the subject [9] by introducing further heuristics that aim to describe relations between comments and nodes. Based on an analysis of comment placement in a set of example diagrams, we develop a general comment attachment framework and evaluate the quality of its inferred attachments.
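The abstract does not list the heuristics themselves. A natural baseline, assumed here rather than taken from the paper, attaches each comment to the node whose bounding box is closest and leaves it unattached beyond a cut-off distance. The box format, `max_distance`, and the function names are hypothetical; the paper's framework combines further heuristics that this sketch does not attempt to reproduce.

```python
import math


def rect_distance(a, b):
    """Euclidean gap between two axis-aligned boxes (x0, y0, x1, y1);
    0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)


def attach_comments(comments, nodes, max_distance=50.0):
    """Map each comment id to the id of its nearest node, or to None if no
    node lies within max_distance (a hypothetical cut-off).
    Ties go to the node encountered first."""
    attachments = {}
    for cid, cbox in comments.items():
        best, best_d = None, math.inf
        for nid, nbox in nodes.items():
            d = rect_distance(cbox, nbox)
            if d < best_d:
                best, best_d = nid, d
        attachments[cid] = best if best_d <= max_distance else None
    return attachments


comments = {"c1": (0, 0, 40, 20), "c2": (500, 500, 540, 520)}
nodes = {"A": (50, 0, 100, 30), "B": (0, 100, 60, 140)}
print(attach_comments(comments, nodes))  # {'c1': 'A', 'c2': None}
```

Once attachments are explicit like this, a layout algorithm can keep each comment next to its node instead of placing it independently.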