From Design Draft to Real Attire: Unaligned Fashion Image Translation
Fashion manipulation has attracted growing interest due to its great
application value, which has inspired much research on fashion images.
However, little attention has been paid to fashion design drafts. In this paper,
we study a new unaligned translation problem between design drafts and real
fashion items, whose main challenge lies in the huge misalignment between the
two modalities. We first collect paired design drafts and real fashion item
images without pixel-wise alignment. To solve the misalignment problem, our
main idea is to train a sampling network to adaptively adjust the input to an
intermediate state with structure alignment to the output. Moreover, built upon
the sampling network, we present a design-draft-to-real fashion item translation
network (D2RNet), in which two separate translation streams that focus on texture
and shape, respectively, are combined to obtain the benefits of both. D2RNet is
able to generate realistic garments with both texture and shape consistency to
their design drafts. We show that this idea can be effectively applied to the
reverse translation problem and present R2DNet accordingly. Extensive
experiments on unaligned fashion design translation demonstrate the superiority
of our method over state-of-the-art methods. Our project website is available
at: https://victoriahy.github.io/MM2020/.
Comment: Accepted by ACM MM 2020.
Dual Attention GANs for Semantic Image Synthesis
In this paper, we focus on the semantic image synthesis task that aims at
transferring semantic label maps to photo-realistic images. Existing methods
lack effective semantic constraints to preserve the semantic information and
ignore the structural correlations in both spatial and channel dimensions,
leading to unsatisfactory blurry and artifact-prone results. To address these
limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize
photo-realistic and semantically-consistent images with fine details from the
input layouts without imposing extra training overhead or modifying the network
architectures of existing methods. We also propose two novel modules, i.e.,
position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention
Module (CAM), to capture semantic structure attention in spatial and channel
dimensions, respectively. Specifically, SAM selectively correlates the pixels
at each position by a spatial attention map, leading to pixels with the same
semantic label being related to each other regardless of their spatial
distances. Meanwhile, CAM selectively emphasizes the scale-wise features at
each channel by a channel attention map, which integrates associated features
among all channel maps regardless of their scales. We finally sum the outputs
of SAM and CAM to further improve feature representation. Extensive experiments
on four challenging datasets show that DAGAN achieves remarkably better results
than state-of-the-art methods, while using fewer model parameters. The source
code and trained models are available at https://github.com/Ha0Tang/DAGAN.
Comment: Accepted to ACM MM 2020, camera ready (9 pages) + supplementary (10
pages)
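The SAM and CAM description in the DAGAN abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (that is in the repository linked above). It assumes plain dot-product similarity with a softmax and omits any learned projections or scaling the real modules may use. The function names `spatial_attention`, `channel_attention`, and `dual_attention` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    """Position-wise attention over a (C, H, W) feature map.

    Every position is rebuilt as a weighted sum of all positions,
    so similar positions are related regardless of distance.
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)            # (C, N), N = H * W
    attn = softmax(flat.T @ flat, axis=-1)   # (N, N), rows sum to 1
    out = flat @ attn.T                      # (C, N)
    return out.reshape(c, h, w)

def channel_attention(feat):
    """Channel-wise attention over a (C, H, W) feature map.

    Every channel is rebuilt as a weighted sum of all channels.
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)            # (C, N)
    attn = softmax(flat @ flat.T, axis=-1)   # (C, C), rows sum to 1
    out = attn @ flat                        # (C, N)
    return out.reshape(c, h, w)

def dual_attention(feat):
    # The abstract states that the outputs of SAM and CAM are summed.
    return spatial_attention(feat) + channel_attention(feat)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 3, 5))  # toy (C, H, W) input
    print(dual_attention(features).shape)
```

The output keeps the input shape, so under these assumptions such a module could be placed between existing layers without changing the surrounding architecture.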
Modeling and Mapping Location-Dependent Human Appearance
Human appearance is highly variable and depends on individual preferences, such as fashion, facial expression, and makeup. These preferences depend on many factors including a person's sense of style, what they are doing, and the weather. These factors, in turn, are dependent upon geographic location and time. In our work, we build computational models to learn the relationship between human appearance, geographic location, and time. The primary contributions are a framework for collecting and processing geotagged imagery of people, a large dataset collected by our framework, and several generative and discriminative models that use our dataset to learn the relationship between human appearance, location, and time. Additionally, we build interactive maps that allow for inspection and demonstration of what our models have learned.