Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
We propose an inference procedure for deep convolutional neural networks
(CNNs) when partial evidence is available. Our method consists of a general
feedback-based propagation approach (feedback-prop) that boosts the prediction
accuracy for an arbitrary set of unknown target labels when the values for a
non-overlapping arbitrary set of target labels are known. We show that existing
models trained in a multi-label or multi-task setting can readily take
advantage of feedback-prop without any retraining or fine-tuning. Our
feedback-prop inference procedure is general, simple, and reliable, and it works
on a range of challenging visual recognition tasks. We present two variants of
feedback-prop based on layer-wise and residual iterative updates. We experiment
using several multi-task models and show that feedback-prop is effective in all
of them. Our results unveil a previously unreported but interesting dynamic
property of deep CNNs. We also present an associated technical approach that
takes advantage of this property for inference under partial evidence in
general visual recognition tasks.
Comment: Accepted to CVPR 2018
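At a high level, layer-wise feedback-prop freezes the network weights and instead optimizes an intermediate activation so that the outputs for the known labels match the given evidence, then reads off the refined predictions for the unknown labels. Below is a minimal PyTorch sketch of this idea; the backbone/head split, the loss choice, and the hyperparameters are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

def feedback_prop(backbone, head, x, known_idx, known_labels,
                  unknown_idx, steps=10, lr=0.1):
    """Refine predictions for unknown labels given known labels by
    updating an intermediate activation rather than the weights."""
    for p in head.parameters():          # keep the model weights frozen
        p.requires_grad_(False)
    with torch.no_grad():
        feat = backbone(x)               # intermediate activation to optimize
    feat = feat.clone().requires_grad_(True)
    optimizer = torch.optim.SGD([feat], lr=lr)
    criterion = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        logits = head(feat)
        # the loss touches only the labels whose values are known
        loss = criterion(logits[:, known_idx], known_labels)
        loss.backward()
        optimizer.step()                 # gradient step on the activation
    with torch.no_grad():
        return torch.sigmoid(head(feat))[:, unknown_idx]
```

Here `backbone` maps the image to the chosen intermediate layer and `head` maps that layer to multi-label logits; the same loop can be applied at several layers, which is the layer-wise variant the abstract mentions.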
Towards Diverse and Consistent Typography Generation
In this work, we consider the typography generation task, which aims to
produce diverse typographic styling for a given graphic document. We
formulate typography generation as a fine-grained attribute generation for
multiple text elements and build an autoregressive model to generate diverse
typography that matches the input design context. We further propose a simple
yet effective sampling approach that respects the consistency and distinction
principle of typography so that generated examples share consistent typographic
styling across text elements. Our empirical study shows that our model
successfully generates diverse typographic designs while preserving a
consistent typographic structure.
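As an illustration, one simple way to realize a consistency-respecting sampler is to restrict later elements to attribute values that earlier elements have already used, falling back to the full distribution when no shared value is available. The sketch below assumes a hypothetical autoregressive model interface (`init_state`/`step` returning per-attribute logits) and is not the paper's exact procedure.

```python
import torch

def sample_consistent(model, context, num_elements, temperature=1.0):
    """Sample one attribute per element, reusing values already chosen
    where possible so that styling stays consistent across elements."""
    used = set()                               # attribute ids sampled so far
    outputs = []
    state = model.init_state(context)          # hypothetical API
    for _ in range(num_elements):
        logits, state = model.step(state)      # hypothetical API, shape (V,)
        probs = torch.softmax(logits / temperature, dim=-1)
        if used:
            # keep probability mass only on already-used values,
            # falling back to the full distribution if none remain
            mask = torch.zeros_like(probs)
            mask[list(used)] = 1.0
            restricted = probs * mask
            if restricted.sum() > 0:
                probs = restricted / restricted.sum()
        choice = torch.multinomial(probs, 1).item()
        used.add(choice)
        outputs.append(choice)
    return outputs
```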
Generative Colorization of Structured Mobile Web Pages
Color is a critical design factor for web pages, affecting viewer emotions
and the overall trust in and satisfaction with a website.
Effective coloring requires design knowledge and expertise, but if this process
could be automated through data-driven modeling, efficient exploration and
alternative workflows would be possible. However, this direction remains
underexplored due to the lack of a formalization of the web page colorization
problem, datasets, and evaluation protocols. In this work, we propose a new
dataset consisting of e-commerce mobile web pages in a tractable format, which
are created by simplifying the pages and extracting canonical color styles with
a common web browser. The web page colorization problem is then formalized as a
task of estimating plausible color styles for a given web page content with a
given hierarchical structure of the elements. We present several
Transformer-based methods that are adapted to this task by prepending
structural message passing to capture hierarchical relationships between
elements. Experimental results, including a quantitative evaluation designed
for this task, demonstrate the advantages of our methods over statistical and
image colorization methods. The code is available at
https://github.com/CyberAgentAILab/webcolor.
Comment: Accepted to WACV 2023
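To make the architectural idea concrete, the sketch below prepends one round of parent-to-child message passing over the element hierarchy before a standard Transformer encoder that predicts a discrete color style per element. The module layout, vocabulary sizes, and the simple parent-averaging step are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class StructuredColorizer(nn.Module):
    def __init__(self, num_colors, vocab=1000, d_model=256,
                 nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)   # element content tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, num_colors)  # discrete color styles

    def forward(self, tokens, parent_idx):
        # tokens: (B, N) element tokens; parent_idx: (B, N) index of each
        # element's parent in the DOM-like hierarchy (roots point to self)
        h = self.embed(tokens)
        parent = torch.gather(
            h, 1, parent_idx.unsqueeze(-1).expand_as(h))
        h = 0.5 * (h + parent)   # one round of structural message passing
        h = self.encoder(h)      # global self-attention over all elements
        return self.head(h)      # per-element color logits
```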
Towards Flexible Multi-modal Document Models
Creative workflows for generating graphical documents involve complex
inter-related tasks, such as aligning elements, choosing appropriate fonts, or
employing aesthetically harmonious colors. In this work, we attempt to build
a holistic model that can jointly solve many different design tasks. Our model,
which we denote by FlexDM, treats vector graphic documents as a set of
multi-modal elements, and learns to predict masked fields such as element type,
position, styling attributes, image, or text, using a unified architecture.
Through the use of explicit multi-task learning and in-domain pre-training, our
model can better capture the multi-modal relationships among the different
document fields. Experimental results corroborate that our single FlexDM is
able to successfully solve a multitude of different design tasks, while
achieving performance that is competitive with task-specific and costly
baselines.
Comment: To be published in CVPR2023 (highlight), project page:
https://cyberagentailab.github.io/flex-dm
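A rough sketch of the masked-field idea: each element is a set of discrete fields, randomly selected fields are replaced by a [MASK] token, and a shared Transformer predicts the hidden values from the visible ones. The flat field encoding and per-field heads below are assumptions for illustration, not the released FlexDM implementation.

```python
import torch
import torch.nn as nn

class MaskedFieldModel(nn.Module):
    def __init__(self, vocab_sizes, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # one embedding/classifier pair per field (e.g. element type,
        # position bin, font, color); each vocabulary gets a [MASK] slot
        self.embeds = nn.ModuleList(nn.Embedding(v + 1, d_model)
                                    for v in vocab_sizes)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.heads = nn.ModuleList(nn.Linear(d_model, v)
                                   for v in vocab_sizes)

    def forward(self, fields, mask):
        # fields: (B, N, F) integer field values per element
        # mask:   (B, N, F) booleans marking fields to hide and predict
        B, N, F = fields.shape
        tokens = []
        for f in range(F):
            x = fields[..., f].clone()
            x[mask[..., f]] = self.embeds[f].num_embeddings - 1  # [MASK] id
            tokens.append(self.embeds[f](x))
        h = torch.stack(tokens, dim=2).reshape(B, N * F, -1)
        h = self.encoder(h)              # attend across all fields jointly
        h = h.reshape(B, N, F, -1)
        return [self.heads[f](h[:, :, f]) for f in range(F)]
```

Different design tasks then reduce to different masking patterns: masking positions yields layout prediction, masking font and color fields yields styling suggestion, and so on.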
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
Controllable layout generation aims at synthesizing a plausible arrangement of
element bounding boxes with optional constraints, such as type or position of a
specific element. In this work, we try to solve a broad range of layout
generation tasks in a single model that is based on discrete state-space
diffusion models. Our model, named LayoutDM, naturally handles the structured
layout data in the discrete representation and learns to progressively infer a
noiseless layout from the initial input, where we model the layout corruption
process by modality-wise discrete diffusion. For conditional generation, we
propose to inject layout constraints in the form of masking or logit adjustment
during inference. We show in the experiments that our LayoutDM successfully
generates high-quality layouts and outperforms both task-specific and
task-agnostic baselines on several layout tasks.
Comment: To be published in CVPR2023, project page:
https://cyberagentailab.github.io/layout-dm
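For intuition, constraint injection during sampling can be sketched as follows: at every denoising step, token positions whose values are given are overwritten with those values (masking), and an optional additive bias on the logits implements logit adjustment. The denoiser interface and the simplified categorical sampling step below are hypothetical stand-ins; the actual model samples from a modality-wise discrete diffusion posterior.

```python
import torch

@torch.no_grad()
def sample_with_constraints(denoiser, x_t, known_mask, known_tokens,
                            num_steps, logit_bias=None):
    """Simplified constrained sampling loop for a discrete diffusion model.

    x_t:          (B, L) current noisy layout tokens
    known_mask:   (B, L) booleans marking constrained positions
    known_tokens: (B, L) the given token values at those positions
    logit_bias:   optional (B, L, V) additive bias for soft constraints
    """
    for t in reversed(range(num_steps)):
        logits = denoiser(x_t, t)            # (B, L, V), hypothetical API
        if logit_bias is not None:
            logits = logits + logit_bias     # logit adjustment
        probs = torch.softmax(logits, dim=-1)
        x_t = torch.multinomial(
            probs.reshape(-1, probs.size(-1)), 1).reshape(x_t.shape)
        # masking: write the known tokens back at every step so the
        # generated layout always honors the hard constraints
        x_t = torch.where(known_mask, known_tokens, x_t)
    return x_t
```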