Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning
The success of deep learning in computer vision is rooted in the ability of
deep networks to scale up model complexity as demanded by challenging visual
tasks. As complexity is increased, so is the need for large amounts of labeled
data to train the model. This is associated with a costly human annotation
effort. To address this concern, with the long-term goal of leveraging the
abundance of cheap unlabeled data, we explore methods of unsupervised
"pre-training." In particular, we propose to use self-supervised automatic
image colorization.
We show that traditional methods for unsupervised learning, such as
layer-wise clustering or autoencoders, remain inferior to supervised
pre-training. In search of an alternative, we develop a fully automatic image
colorization method. Our method sets a new state-of-the-art in revitalizing old
black-and-white photography, without requiring human effort or expertise.
Additionally, it gives us a method for self-supervised representation learning.
In order for the model to appropriately re-color a grayscale object, it must
first be able to identify it. This ability, learned in an entirely
self-supervised manner,
can be used to improve other visual tasks, such as classification and semantic
segmentation. As a future direction for self-supervision, we investigate if
multiple proxy tasks can be combined to improve generalization. This turns out
to be a challenging open problem. We hope that our contributions to this
endeavor will provide a foundation for future efforts in making
self-supervision compete with supervised pre-training.
Comment: Ph.D. thesis
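The colorization proxy task described above can be sketched in a few lines: the color channels of an unlabeled image are withheld and become the training target, so no human annotation is needed. This is a minimal stand-in, not the thesis's actual pipeline; the helper names are hypothetical, and the thesis works in a perceptual color space rather than raw RGB.

```python
import numpy as np

def make_colorization_pair(rgb):
    """Split an RGB image into a grayscale input and a color target.

    The proxy task: predict the color channels from luminance alone,
    which pushes the model toward recognizing what the objects are.
    (Hypothetical helper; a simplification of the method in the abstract.)
    """
    rgb = rgb.astype(np.float32) / 255.0
    # Standard luminance weights (ITU-R BT.601).
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return gray[..., None], rgb  # input (H, W, 1), target (H, W, 3)

def colorization_loss(pred_rgb, target_rgb):
    """Per-pixel regression loss for the colorization proxy task."""
    return float(np.mean((pred_rgb - target_rgb) ** 2))

# Toy usage: a random array stands in for one unlabeled image.
img = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)
x, y = make_colorization_pair(img)
# A trivial baseline "model" that predicts the gray value for every channel.
pred = np.repeat(x, 3, axis=-1)
loss = colorization_loss(pred, y)
```

In the pre-training setting, the network trained to minimize this loss supplies its encoder weights as initialization for downstream tasks such as classification or segmentation.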
Deep Geometrized Cartoon Line Inbetweening
We aim to address a significant but understudied problem in the anime
industry, namely the inbetweening of cartoon line drawings. Inbetweening
involves generating intermediate frames between two black-and-white line
drawings and is a time-consuming and expensive process that can benefit from
automation. However, existing frame interpolation methods that rely on matching
and warping whole raster images are unsuitable for line inbetweening and often
produce blurring artifacts that damage the intricate line structures. To
preserve the precision and detail of the line drawings, we propose a new
approach, AnimeInbet, which geometrizes raster line drawings into graphs of
endpoints and reframes the inbetweening task as a graph fusion problem with
vertex repositioning. Our method can effectively capture the sparsity and
unique structure of line drawings while preserving the details during
inbetweening. This is made possible via our novel modules, i.e., vertex
geometric embedding, a vertex correspondence Transformer, an effective
mechanism for vertex repositioning and a visibility predictor. To train our
method, we introduce MixamoLine240, a new dataset of line drawings with ground
truth vectorization and matching labels. Our experiments demonstrate that
AnimeInbet synthesizes high-quality, clean, and complete intermediate line
drawings, outperforming existing methods quantitatively and qualitatively,
especially in cases with large motions. Data and code are available at
https://github.com/lisiyao21/AnimeInbet.
Comment: ICCV 202
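The core idea of reframing inbetweening as vertex repositioning can be illustrated with a toy example: once endpoints of the two line drawings are put in correspondence, an intermediate frame is obtained by moving each matched vertex between its two keyframe positions. The linear interpolation below is a deliberately simple stand-in for AnimeInbet's learned repositioning module, and the function name is hypothetical.

```python
import numpy as np

def inbetween_vertices(v0, v1, t):
    """Reposition corresponded vertices between two keyframe drawings.

    v0, v1: (N, 2) arrays of matched endpoint coordinates from the two
    input line drawings; t in [0, 1] selects the intermediate frame.
    Linear interpolation is a toy substitute for the learned module.
    """
    return (1.0 - t) * v0 + t * v1

# Two matched endpoints translating 10 pixels to the right.
v0 = np.array([[0.0, 0.0], [5.0, 5.0]])
v1 = np.array([[10.0, 0.0], [15.0, 5.0]])
mid = inbetween_vertices(v0, v1, 0.5)
# mid == [[5, 0], [10, 5]]
```

Because the graph's edges (the line segments connecting endpoints) are carried along with the repositioned vertices, the intermediate frame keeps sharp, unbroken lines instead of the blur produced by warping raster pixels.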