Colorization as a Proxy Task for Visual Understanding
We investigate and improve self-supervision as a drop-in replacement for
ImageNet pretraining, focusing on automatic colorization as the proxy task.
Self-supervised training has been shown to be more promising for utilizing
unlabeled data than other, traditional unsupervised learning methods. We build
on this success and evaluate the ability of our self-supervised network in
several contexts. On VOC segmentation and classification tasks, we present
results that are state-of-the-art among methods not using ImageNet labels for
pretraining representations.
Moreover, we present the first in-depth analysis of self-supervision via
colorization, concluding that formulation of the loss, training details and
network architecture play important roles in its effectiveness. This
investigation is further expanded by revisiting the ImageNet pretraining
paradigm, asking questions such as: How much training data is needed? How many
labels are needed? How much do features change when fine-tuned? We relate these
questions back to self-supervision by showing that colorization provides a
similarly powerful supervisory signal as various flavors of ImageNet
pretraining.
Comment: CVPR 2017 (Project page: http://people.cs.uchicago.edu/~larsson/color-proxy/)
A survey of comics research in computer science
Graphic novels such as comics and manga are well known all over the world.
The digital transition has started to change the way people read comics,
more and more on smartphones and tablets and less and less on paper. In
recent years, a wide variety of research about comics has been proposed and
might change the way comics are created, distributed and read in future years.
Early work focuses on low-level document image analysis: comic books are
complex, containing text, drawings, balloons, panels, onomatopoeia, etc.
Different fields of computer science, such as multimedia, artificial
intelligence and human-computer interaction, have covered research about user
interaction and content generation, each with different sets of values. We propose in
this paper to review the previous research about comics in computer science, to
state what has been done and to give some insights about the main outlooks.
Unsupervised Diverse Colorization via Generative Adversarial Networks
Colorization of grayscale images has been a hot topic in computer vision.
Previous research mainly focuses on producing a colored image to match the
original one. However, since many colors share the same gray value, an input
grayscale image can be colored in diverse ways while remaining realistic. In
this paper, we design a novel solution for unsupervised diverse colorization.
Specifically, we leverage conditional generative adversarial networks to model
the distribution of real-world item colors, in which we develop a fully
convolutional generator with multi-layer noise to enhance diversity, with
multi-layer condition concatenation to maintain realism, and with stride 1 to
keep spatial information. With such a novel network architecture, the model
yields highly competitive performance on the open LSUN bedroom dataset. A
Turing test with 80 human participants further indicates that our generated
color schemes are highly convincing.
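
The abstract's premise that many colors share the same gray value can be illustrated with a small sketch. It assumes the Rec. 601 luma weights for the grayscale conversion, which the abstract does not specify; the function name and the two sample colors are hypothetical and chosen only for illustration.

```python
# Assumption: grayscale is computed with the Rec. 601 luma weights
# (0.299, 0.587, 0.114); the abstract does not say which conversion is used.
# The weights are scaled by 1000 so the arithmetic stays in exact integers.
LUMA_WEIGHTS = (299, 587, 114)


def gray_value(rgb):
    """Return the 8-bit gray level of an (R, G, B) triple with 0-255 channels."""
    weighted = sum(w * c for w, c in zip(LUMA_WEIGHTS, rgb))
    return (weighted + 500) // 1000  # round to the nearest integer


# Two clearly different colors (hypothetical samples).
green_ish = (100, 131, 50)
blue_ish = (101, 100, 207)

print(gray_value(green_ish), gray_value(blue_ish))  # prints "112 112"
```

Both colors collapse to the same gray level, so a colorization model that sees only that gray value cannot tell which one was the original. This ambiguity is what motivates generating several plausible colorings rather than a single one.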
Language-Based Image Editing with Recurrent Attentive Models
We investigate the problem of Language-Based Image Editing (LBIE). Given a
source image and a natural language description, we want to generate a target
image by editing the source image based on the description. We propose a
generic modeling framework for two sub-tasks of LBIE: language-based image
segmentation and image colorization. The framework uses recurrent attentive
models to fuse image and language features. Instead of using a fixed step size,
we introduce for each region of the image a termination gate to dynamically
determine after each inference step whether to continue extrapolating
additional information from the textual description. The effectiveness of the
framework is validated on three datasets. First, we introduce a synthetic
dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE
system. Second, we show that the framework leads to state-of-the-art
performance on image segmentation on the ReferIt dataset. Third, we present the
first language-based colorization result on the Oxford-102 Flowers dataset.
Comment: Accepted to CVPR 2018 as a Spotlight