StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
Our paper seeks to transfer the hairstyle of a reference image to an input
photo for virtual hair try-on. We target a variety of challenging scenarios,
such as transforming a long hairstyle with bangs to a pixie cut, which requires
removing the existing hair and inferring how the forehead would look, or
transferring partially visible hair from a hat-wearing person in a different
pose. Past solutions leverage StyleGAN for hallucinating any missing parts and
producing a seamless face-hair composite through so-called GAN inversion or
projection. However, there remains a challenge in controlling the
hallucinations to accurately transfer hairstyle and preserve the face shape and
identity of the input. To overcome this, we propose a multi-view optimization
framework that uses "two different views" of reference composites to
semantically guide occluded or ambiguous regions. Our optimization shares
information between two poses, which allows us to produce high fidelity and
realistic results from incomplete references. Our framework produces
high-quality results and outperforms prior work in a user study that consists
of significantly more challenging hair transfer scenarios than previously
studied. Project page: https://stylegan-salon.github.io/.
Comment: Accepted to CVPR 2023
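The paper's multi-view optimization runs through StyleGAN itself, which cannot be reproduced here, but the core idea of fitting one shared latent against two reference views can be sketched with a toy linear "generator" (the matrices, dimensions, and gradient loop below are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

# Toy sketch of multi-view latent optimization: a linear map stands in
# for the StyleGAN generator of each view (an assumption for illustration).
rng = np.random.default_rng(0)
d_latent, d_image = 8, 32
G_front = rng.normal(size=(d_image, d_latent))   # "frontal view" generator
G_side = rng.normal(size=(d_image, d_latent))    # "side view" generator

w_true = rng.normal(size=d_latent)
target_front = G_front @ w_true                  # reference composite, view 1
target_side = G_side @ w_true                    # reference composite, view 2

# One shared latent w is optimized against both views, so information from
# one pose constrains regions that are occluded or ambiguous in the other.
w = np.zeros(d_latent)
lr = 0.002
for _ in range(5000):
    grad = G_front.T @ (G_front @ w - target_front) \
         + G_side.T @ (G_side @ w - target_side)
    w -= lr * grad

loss = np.sum((G_front @ w - target_front) ** 2) \
     + np.sum((G_side @ w - target_side) ** 2)
```

With a real generator the gradients would come from backpropagation through the network plus perceptual losses, but the structure of the loop, one latent shared across per-view reconstruction terms, is the same.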
Informative Features for Model Comparison
Given two candidate models, and a set of target observations, we address the
problem of measuring the relative goodness of fit of the two models. We propose
two new statistical tests which are nonparametric, computationally efficient
(runtime complexity is linear in the sample size), and interpretable. As a
unique advantage, our tests can produce a set of examples (informative
features) indicating the regions in the data domain where one model fits
significantly better than the other. In a real-world problem of comparing GAN
models, the test power of our new test matches that of the state-of-the-art
test of relative goodness of fit, while being one order of magnitude faster.
Comment: Accepted to NIPS 2018
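The relative-fit idea can be illustrated with a simple kernel discrepancy: compare each candidate model's samples to the target observations and take the difference. The sketch below uses a biased quadratic-time MMD estimate for brevity; the paper's actual tests are linear-time and additionally return informative features, which this toy does not:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian kernel matrix between two sample sets.
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-gamma * np.sum(d * d, axis=-1))

def mmd2(x, y, gamma=1.0):
    # Biased quadratic-time MMD^2 estimate (a simpler stand-in for the
    # paper's linear-time statistics).
    return rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean() \
        - 2 * rbf(x, y, gamma).mean()

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=(500, 2))      # target observations R
model_p = rng.normal(0.0, 1.0, size=(500, 2))   # candidate P (matches R)
model_q = rng.normal(2.0, 1.0, size=(500, 2))   # candidate Q (shifted)

# Negative value => model P fits the data better than model Q.
rel_stat = mmd2(model_p, data) - mmd2(model_q, data)
```

A calibrated version would also estimate the variance of this statistic to obtain a p-value, which is what the proposed tests do.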
TextureGAN: Controlling Deep Image Synthesis with Texture Patches
In this paper, we investigate deep image synthesis guided by sketch, color,
and texture. Previous image synthesis methods can be controlled by sketch and
color strokes but we are the first to examine texture control. We allow a user
to place a texture patch on a sketch at arbitrary locations and scales to
control the desired output texture. Our generative network learns to synthesize
objects consistent with these texture suggestions. To achieve this, we develop
a local texture loss in addition to adversarial and content loss to train the
generative network. We conduct experiments using sketches generated from real
images and textures sampled from a separate texture database and results show
that our proposed algorithm is able to generate plausible images that are
faithful to user controls. Ablation studies show that our proposed pipeline can
generate more realistic images than adapting existing methods directly.
Comment: CVPR 2018 spotlight
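A common way to express a texture objective is via Gram-matrix statistics restricted to the user-placed patch region. The helper below is a simplified numpy stand-in for such a local texture loss (the function names and shapes are assumptions; TextureGAN's actual loss operates on deep network features and is combined with adversarial and content terms):

```python
import numpy as np

def gram(feat):
    # feat: (C, H, W) feature map; the Gram matrix captures texture
    # statistics while discarding spatial layout.
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (h * w)

def local_texture_loss(gen_feat, ref_feat, top, left, size):
    # Compare texture statistics only inside the user-placed patch region,
    # a simplified stand-in for TextureGAN's local texture loss.
    g = gen_feat[:, top:top + size, left:left + size]
    r = ref_feat[:, top:top + size, left:left + size]
    d = gram(g) - gram(r)
    return float(np.sum(d * d))

rng = np.random.default_rng(0)
gen = rng.random((4, 16, 16))   # generated-image features (toy values)
ref = rng.random((4, 16, 16))   # reference texture features (toy values)
same = local_texture_loss(gen, gen, 2, 2, 8)   # identical texture -> 0
diff = local_texture_loss(gen, ref, 2, 2, 8)   # mismatched texture -> > 0
```

Restricting the Gram comparison to the patch is what lets the rest of the image remain governed by the sketch, color, and adversarial objectives.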
Kernel Mean Matching for Content Addressability of GANs
We propose a novel procedure which adds "content-addressability" to any given
unconditional implicit model e.g., a generative adversarial network (GAN). The
procedure allows users to control the generative process by specifying a set
(arbitrary size) of desired examples based on which similar samples are
generated from the model. The proposed approach, based on kernel mean matching,
is applicable to any generative model that transforms latent vectors into
samples, and does not require retraining of the model. Experiments on various
high-dimensional image generation problems (CelebA-HQ, LSUN bedroom, bridge,
tower) show that our approach is able to generate images which are consistent
with the input set, while retaining the image quality of the original model. To
our knowledge, this is the first work that attempts to construct, at test time,
a content-addressable generative model from a trained marginal model.
Comment: Wittawat Jitkrittum and Patsorn Sangkloy contributed equally to this work
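The kernel mean matching objective can be written down directly: the squared RKHS distance between the mean embedding of the model's samples and that of the user's input set. The toy below evaluates it on low-dimensional vectors rather than images, and does not include the latent-space optimization the paper performs (those details are assumptions left out here):

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    # Gaussian kernel matrix between two sample sets.
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-gamma * np.sum(d * d, axis=-1))

def kmm_objective(samples, targets, gamma=0.5):
    # Squared RKHS distance between the mean embedding of model samples
    # and that of the user-provided target set; minimizing this over the
    # latent inputs adds content-addressability without retraining.
    return (rbf(samples, samples, gamma).mean()
            - 2 * rbf(samples, targets, gamma).mean()
            + rbf(targets, targets, gamma).mean())

rng = np.random.default_rng(3)
targets = rng.normal(0.0, 0.5, size=(50, 4))   # user-specified example set
close = rng.normal(0.0, 0.5, size=(50, 4))     # samples resembling the set
far = rng.normal(3.0, 0.5, size=(50, 4))       # unrelated samples
```

Samples resembling the input set yield a lower objective, which is the signal the method descends on when steering the generator's latents.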
Generating Images Instead of Retrieving Them: Relevance Feedback on Generative Adversarial Networks
Finding images matching a user's intention has largely been based on matching a representation of the user's information needs with an existing collection of images, for example, using an example image or a written query to express the information need and retrieving images that share similarities with the query or example image. However, such an approach is limited to retrieving only images that already exist in the underlying collection. Here, we present a methodology for generating images matching the user intention instead of retrieving them. The methodology utilizes a relevance feedback loop between a user and generative adversarial neural networks (GANs). GANs can generate novel photorealistic images which are initially not present in the underlying collection, but generated in response to user feedback. We report experiments (N=29) where participants generate images using four different domains and various search goals with textual and image targets. The results show that the generated images match the tasks and outperform images selected as baselines from a fixed image collection. Our results demonstrate that generating new information can be more useful for users than retrieving it from a collection of existing information.
Peer reviewed
Argoverse: 3D Tracking and Forecasting with Rich Maps
We present Argoverse -- two datasets designed to support autonomous vehicle
machine learning tasks such as 3D tracking and motion forecasting. Argoverse
was collected by a fleet of autonomous vehicles in Pittsburgh and Miami. The
Argoverse 3D Tracking dataset includes 360 degree images from 7 cameras with
overlapping fields of view, 3D point clouds from long range LiDAR, 6-DOF pose,
and 3D track annotations. Notably, it is the only modern AV dataset that
provides forward-facing stereo imagery. The Argoverse Motion Forecasting
dataset includes more than 300,000 5-second tracked scenarios with a particular
vehicle identified for trajectory forecasting. Argoverse is the first
autonomous vehicle dataset to include "HD maps" with 290 km of mapped lanes
with geometric and semantic metadata. All data is released under a Creative
Commons license at www.argoverse.org. In our baseline experiments, we
illustrate how detailed map information such as lane direction, driveable area,
and ground height improves the accuracy of 3D object tracking and motion
forecasting. Our tracking and forecasting experiments represent only an initial
exploration of the use of rich maps in robotic perception. We hope that
Argoverse will enable the research community to explore these problems in
greater depth.
Comment: CVPR 2019
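One concrete way map metadata helps forecasting is as a motion prior: projecting the observed velocity onto the lane direction suppresses lateral observation noise. The toy below is an illustration of that idea, not the Argoverse baseline code (the function and the scenario are assumptions):

```python
import numpy as np

def forecast(history, lane_dir, steps, use_map=True):
    # Constant-velocity extrapolation; with the map, the velocity is
    # projected onto the lane direction, suppressing lateral noise.
    v = history[-1] - history[-2]
    if use_map:
        u = lane_dir / np.linalg.norm(lane_dir)
        v = u * (v @ u)
    return history[-1] + np.outer(np.arange(1, steps + 1), v)

# A car driving along an x-aligned lane, with lateral jitter in the
# last observed position.
history = np.array([[0.0, 0.0], [1.0, 0.3]])
lane = np.array([1.0, 0.0])
truth = np.array([[2.0, 0.3], [3.0, 0.3], [4.0, 0.3]])

err_raw = np.abs(forecast(history, lane, 3, use_map=False) - truth).sum()
err_map = np.abs(forecast(history, lane, 3, use_map=True) - truth).sum()
```

The raw constant-velocity forecast compounds the lateral jitter at every step, while the lane-projected forecast does not, which mirrors how lane direction, driveable area, and ground height improve the baselines in the paper.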
Controllable Content-Based Image Synthesis and Image Retrieval
In this thesis, we address the problem of returning target images that match user queries in image retrieval and image synthesis. We investigate line-drawing sketches as the main query, and explore several additional signals from the user that can help clarify the type of images they are looking for. These additional queries may be expressed in one of the following two convenient forms:
1. visual content (sketch, scribble, texture patch);
2. language content.
For image retrieval, we first look at the problem of sketch-based image retrieval. We construct cross-domain networks that embed a user query and a target image into a shared feature space. We collected the Sketchy Database, a large-scale dataset of matching sketch and image pairs that can be used as training data. The dataset has been made publicly available, and has become one of the few standard benchmarks for sketch-based image retrieval. To incorporate both sketch and language content as queries, we propose a late-fusion dual-encoder approach, similar to CLIP, a recent successful work on vision and language representation learning. We also collected a dataset of 5,000 hand-drawn sketches, which can be combined with existing COCO caption annotations to evaluate the task of image retrieval with sketch and language.
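A late-fusion dual encoder scores each query modality against the image independently and then combines the scores. The sketch below uses toy hand-written embeddings and a fixed fusion weight; the encoders, weight, and gallery are all illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def late_fusion_score(sketch_q, text_q, image_emb, w=0.5):
    # Late fusion: each query modality is scored against the image in its
    # own joint space, and the per-modality scores are then combined.
    return w * cosine(sketch_q, image_emb) + (1 - w) * cosine(text_q, image_emb)

# Toy embeddings standing in for encoder outputs.
sketch_q = np.array([1.0, 0.0, 0.0])
text_q = np.array([0.0, 1.0, 0.0])
gallery = {
    "matches_both": np.array([1.0, 1.0, 0.0]),
    "matches_sketch_only": np.array([1.0, 0.0, 0.0]),
    "matches_neither": np.array([0.0, 0.0, 1.0]),
}
scores = {k: late_fusion_score(sketch_q, text_q, v) for k, v in gallery.items()}
best = max(scores, key=scores.get)
```

Because fusion happens at the score level, either modality can be dropped at query time without retraining, which is the practical appeal of the late-fusion design.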
For image synthesis, we present a general framework that allows users to interactively control the generated images based on specification of visual features (e.g., shape, color, texture).
Ph.D. thesis
PaintsTorch: a User-Guided Anime Line Art Colorization Tool with Double Generator Conditional Adversarial Network
The lack of information provided by line arts makes user-guided colorization a challenging task for computer vision. Recent contributions from the deep learning community based on Generative Adversarial Networks (GANs) have shown incredible results compared to previous techniques. These methods employ user-input color hints as a way to condition the network. The current state of the art has shown the ability to generalize and generate realistic and precise colorizations by introducing a custom dataset and a new model with its training pipeline. Nevertheless, their approach relies on randomly sampled pixels as color hints for training. Thus, in this contribution, we introduce a stroke-simulation-based approach for hint generation, making the model more robust to messy inputs. We also propose a new, cleaner dataset, and explore the use of a double-generator GAN to improve visual fidelity.
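Stroke simulation replaces isolated random hint pixels with short connected scribbles sampled from the ground-truth colors. A minimal sketch of that idea, using a random walk for the stroke shape (the function name, walk model, and parameters are assumptions, not the paper's exact procedure):

```python
import numpy as np

def simulate_stroke_hints(color_img, n_strokes=4, stroke_len=10, seed=0):
    # Sample short random-walk strokes from the ground-truth color image
    # instead of isolated random pixels, mimicking real user scribbles.
    rng = np.random.default_rng(seed)
    h, w, _ = color_img.shape
    hints = np.zeros_like(color_img)
    mask = np.zeros((h, w), dtype=bool)
    for _ in range(n_strokes):
        y, x = int(rng.integers(h)), int(rng.integers(w))
        color = color_img[y, x]          # one color per stroke
        for _ in range(stroke_len):
            hints[y, x] = color
            mask[y, x] = True
            # Drift to a neighbouring pixel, staying inside the canvas.
            y = int(np.clip(y + rng.integers(-1, 2), 0, h - 1))
            x = int(np.clip(x + rng.integers(-1, 2), 0, w - 1))
    return hints, mask

img = np.ones((32, 32, 3))               # stand-in color image
hints, mask = simulate_stroke_hints(img)
```

Training on such connected hints is what makes the model tolerant of the messy, stroke-shaped inputs real users actually draw.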
Computer Vision for Supporting Fashion Creative Processes
Computer vision techniques are powerful tools to support and enhance creative workflows in the fashion industry. In many cases, designer sketches and drawings, made with pen or pencil on raw paper, are the starting point of a fashion workflow. Such hand-drawn sketches must then be imported into software to convert the prototype into a real-world product. This leads to a first important problem, namely, the automatic vectorization of sketches. Moreover, the various outcomes of all creative processes consist of a large number of images, which depict a plethora of products, from clothing to footwear. Recognizing product characteristics and classifying them properly is crucial in order to avoid duplicates and to support marketing campaigns. Each feature could eventually require a different method, ranging from segmentation and image retrieval to machine learning techniques such as deep learning. Some state-of-the-art techniques and a novel proposal for line extraction and thinning, applied to fashion sketches, are described. Newly developed methods are presented and their effectiveness in the recognition of features is discussed.
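A classic baseline for the thinning step is Zhang-Suen skeletonization, which iteratively peels removable boundary pixels until lines are one pixel wide. The implementation below is that standard algorithm, offered as context for the thinning problem rather than the chapter's novel proposal:

```python
import numpy as np

def zhang_suen_thin(img):
    # Classic Zhang-Suen thinning: alternately peel removable boundary
    # pixels until a one-pixel-wide skeleton of the lines remains.
    img = img.astype(bool).copy()
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            p = np.pad(img, 1)
            # 8-neighbourhood P2..P9, clockwise starting from north.
            n = [p[:-2, 1:-1], p[:-2, 2:], p[1:-1, 2:], p[2:, 2:],
                 p[2:, 1:-1], p[2:, :-2], p[1:-1, :-2], p[:-2, :-2]]
            b = sum(x.astype(int) for x in n)  # count of set neighbours
            # Number of 0->1 transitions around the neighbourhood.
            a = sum(((~n[i]) & n[(i + 1) % 8]).astype(int) for i in range(8))
            if step == 0:
                cond = ~(n[0] & n[2] & n[4]) & ~(n[2] & n[4] & n[6])
            else:
                cond = ~(n[0] & n[2] & n[6]) & ~(n[0] & n[4] & n[6])
            remove = img & (b >= 2) & (b <= 6) & (a == 1) & cond
            if remove.any():
                img &= ~remove
                changed = True
    return img

# A 3-pixel-thick horizontal stroke thins down to a 1-pixel line.
stroke = np.zeros((7, 11), dtype=bool)
stroke[2:5, 1:10] = True
skeleton = zhang_suen_thin(stroke)
```

On real fashion sketches this would run after binarization; the resulting skeleton is also a natural input for the subsequent vectorization step.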