MsCGAN: Multi-scale Conditional Generative Adversarial Networks for Person Image Generation
Synthesizing high-quality person images with arbitrary poses is challenging. In this paper, we propose novel Multi-scale Conditional Generative Adversarial Networks (MsCGAN), aiming to convert an input conditional person image into a synthetic image of any given target pose whose appearance and texture are consistent with the input image. MsCGAN is a multi-scale adversarial network consisting of two generators and two discriminators. One generator globally transforms the conditional person image into a coarse image of the target pose, while the other enhances the detailed quality of the synthetic person image through a local reinforcement network. The outputs of the two generators are then merged into a single high-resolution synthetic image. In turn, the synthetic image is downsampled to multiple resolutions as input to the multi-scale discriminator networks. Because the multi-scale generators and discriminators handle different levels of visual features, they benefit the synthesis of high-resolution person images with realistic appearance and texture. Experiments are conducted on the Market-1501 and DeepFashion datasets to evaluate the proposed model, and both qualitative and quantitative results demonstrate the superior performance of the proposed MsCGAN.
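As a rough illustration of the multi-scale adversarial setup described above, the following PyTorch sketch downsamples a synthetic image to several resolutions and scores each with its own discriminator; all module names and layer sizes are our own assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDiscriminator(nn.Module):
    """A small convolutional discriminator operating at one resolution."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # patch-level real/fake scores
        )
    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Runs one discriminator per resolution of the (downsampled) input."""
    def __init__(self, num_scales=2):
        super().__init__()
        self.discs = nn.ModuleList([PatchDiscriminator() for _ in range(num_scales)])
    def forward(self, img):
        outputs = []
        for d in self.discs:
            outputs.append(d(img))
            img = F.avg_pool2d(img, kernel_size=2)  # halve resolution for next scale
        return outputs
```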
Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis
Despite remarkable advances in image synthesis research, existing works often
fail in manipulating images under the context of large geometric
transformations. Synthesizing person images conditioned on arbitrary poses is
one of the most representative examples where the generation quality largely
relies on the capability of identifying and modeling arbitrary transformations
on different body parts. Current generative models are often built on local convolutions and overlook the key challenges (e.g., heavy occlusions, different views, or dramatic appearance changes) that arise when arbitrary pose manipulations cause distinct geometric changes for each part. This paper aims to
resolve these challenges induced by geometric variability and spatial
displacements via a new Soft-Gated Warping Generative Adversarial Network
(Warping-GAN), which is composed of two stages: 1) it first synthesizes a
target part segmentation map given a target pose, which depicts the
region-level spatial layouts for guiding image synthesis with higher-level
structure constraints; 2) the Warping-GAN equipped with a soft-gated
warping-block learns feature-level mapping to render textures from the original
image into the generated segmentation map. The Warping-GAN is capable of controlling different degrees of transformation for distinct target poses. Moreover, the proposed warping-block is lightweight and flexible enough to be injected into any network. Human perceptual studies and quantitative evaluations demonstrate the superiority of our Warping-GAN, which significantly outperforms all existing methods on two large datasets.
Comment: 17 pages, 14 figures
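The soft-gated warping-block can be pictured roughly as below: source-image features are warped by a sampling grid and blended with target features through a learned per-pixel gate. This PyTorch sketch is schematic; the grid source, shapes, and gating layout are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftGatedWarpingBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv predicts a per-pixel gate in [0, 1] from the target features
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, src_feat, tgt_feat, grid):
        # grid: (N, H, W, 2) normalized sampling coordinates, assumed to come
        # from a pose-driven geometric matcher between source and target
        warped = F.grid_sample(src_feat, grid, align_corners=False)
        g = self.gate(tgt_feat)                 # soft gate: how strongly to warp
        return g * warped + (1 - g) * tgt_feat  # blend warped and target features
```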
A Variational U-Net for Conditional Appearance and Shape Generation
Deep generative models have demonstrated great performance in image
synthesis. However, results deteriorate in the case of spatial deformations, since
they generate images of objects directly, rather than modeling the intricate
interplay of their inherent shape and appearance. We present a conditional
U-Net for shape-guided image generation, conditioned on the output of a
variational autoencoder for appearance. The approach is trained end-to-end on
images, without requiring samples of the same object with varying pose or
appearance. Experiments show that the model enables conditional image
generation and transfer: either shape or appearance can be retained from a query image while freely altering the other. Moreover, appearance can be sampled thanks to its stochastic latent representation, while shape is preserved. In quantitative and qualitative experiments on COCO, DeepFashion, shoes, Market-1501, and handbags, the approach demonstrates significant improvements over the state of the art.
Comment: CVPR 2018 (Spotlight). Project page at https://compvis.github.io/vunet
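One way to picture the appearance pathway is the toy VAE encoder below (PyTorch; all layer sizes are placeholder assumptions): it maps an image to a stochastic latent z, which a shape-conditioned U-Net decoder, omitted here, would render back into an image.

```python
import torch
import torch.nn as nn

class AppearanceVAE(nn.Module):
    def __init__(self, z_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.to_mu = nn.Linear(64, z_dim)
        self.to_logvar = nn.Linear(64, z_dim)

    def forward(self, img):
        h = self.enc(img)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # reparameterization trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar  # a U-Net conditioned on a shape map would consume z
```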
Unsupervised Person Image Generation with Semantic Parsing Transformation
In this paper, we address unsupervised pose-guided person image generation, which is known to be challenging due to non-rigid deformation. Unlike previous methods that learn a hard, direct mapping between human bodies, we propose a new pathway that decomposes the hard mapping into two more accessible subtasks, namely, semantic parsing transformation and appearance generation. Firstly, a
semantic generative network is proposed to transform between semantic parsing
maps, in order to simplify the non-rigid deformation learning. Secondly, an
appearance generative network learns to synthesize semantic-aware textures.
Thirdly, we demonstrate that training our framework in an end-to-end manner
further refines the semantic maps and the final results accordingly. Our method generalizes to other semantic-aware person image generation tasks, e.g., clothing texture transfer and controlled image manipulation. Experimental results demonstrate the superiority of our method on the DeepFashion and Market-1501 datasets, especially in preserving clothing attributes and body shapes.
Comment: Accepted to CVPR 2019 (Oral). Our project is available at https://github.com/SijieSong/person_generation_sp
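Schematically, the two-stage decomposition can be wired as below; both sub-networks are placeholders, and the interfaces are our own assumptions rather than the paper's exact design.

```python
import torch.nn as nn

class TwoStagePersonGenerator(nn.Module):
    def __init__(self, parsing_net: nn.Module, appearance_net: nn.Module):
        super().__init__()
        self.parsing_net = parsing_net        # semantic parsing transformation
        self.appearance_net = appearance_net  # semantic-aware texture synthesis

    def forward(self, src_img, src_parsing, tgt_pose):
        # stage 1: predict the target-pose semantic parsing map
        tgt_parsing = self.parsing_net(src_parsing, tgt_pose)
        # stage 2: render appearance onto the predicted parsing map
        out = self.appearance_net(src_img, tgt_parsing)
        return out, tgt_parsing
```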
A Deep One-Shot Network for Query-based Logo Retrieval
Logo detection in real-world scene images is an important problem with
applications in advertisement and marketing. Existing general-purpose object
detection methods require large training data with annotations for every logo
class. These methods do not satisfy the incremental demand for logo classes necessary for practical deployment, since it is practically impossible to have such annotated data for every new, unseen logo. In this work, we develop an
easy-to-implement query-based logo detection and localization system by
employing a one-shot learning technique. Given an image of a query logo, our
model searches for it within a given target image and predicts the possible
location of the logo by estimating a binary segmentation mask. The proposed
model consists of a conditional branch and a segmentation branch. The former
gives a conditional latent representation of the given query logo which is
combined with feature maps of the segmentation branch at multiple scales in
order to find the matching position of the query logo in a target image, should
it be present. Feature matching between the latent query representation and the multi-scale feature maps of the segmentation branch, using a simple concatenation operation followed by a 1x1 convolution layer, makes our model scale-invariant. Despite its simplicity, our query-based logo retrieval framework achieves superior performance on the FlickrLogos-32 and TopLogos-10 datasets over various existing baselines.
Comment: Accepted in Pattern Recognition, Elsevier (2019)
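The concatenation-plus-1x1-convolution matching step can be sketched as follows (PyTorch; channel sizes are assumed): the query latent is tiled over each spatial position of a segmentation-branch feature map and fused.

```python
import torch
import torch.nn as nn

class QueryFusion(nn.Module):
    """Fuse a query logo's latent vector into one feature map of the
    segmentation branch; apply once per scale for multi-scale matching."""
    def __init__(self, feat_ch, query_dim):
        super().__init__()
        self.fuse = nn.Conv2d(feat_ch + query_dim, feat_ch, kernel_size=1)

    def forward(self, feat_map, query_latent):
        n, _, h, w = feat_map.shape
        # tile the (N, D) latent over every spatial position -> (N, D, H, W)
        q = query_latent.view(n, -1, 1, 1).expand(n, query_latent.size(1), h, w)
        return self.fuse(torch.cat([feat_map, q], dim=1))
```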
A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Abbreviation is a common phenomenon across languages, especially in Chinese.
In most cases, if an expression can be abbreviated, its abbreviation is used more often than its fully expanded form, since people tend to convey information in the most concise way. For various language processing tasks, abbreviations are an obstacle to improving performance, as the textual form of an abbreviation expresses little useful information unless it is expanded to the full form. Abbreviation prediction means associating fully expanded forms with their abbreviations. However, due to the deficiency of abbreviation corpora, current studies of this task are limited, especially considering that general abbreviation prediction should also cover full-form expressions that do not have valid abbreviations, namely the negative full forms (NFFs). Corpora incorporating negative full forms for general abbreviation prediction are few in number. To promote research in this area, we build a dataset for general Chinese abbreviation prediction, which requires a few preprocessing steps, and evaluate several different models on it. The dataset is available at
https://github.com/lancopku/Chinese-abbreviation-datase
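As a toy illustration of one common framing of this task (our assumption, not necessarily the paper's model), abbreviation prediction can be cast as per-character keep/drop tagging, with an NFF corresponding to keeping every character:

```python
from typing import List

def apply_tags(full_form: str, tags: List[int]) -> str:
    """Return the abbreviation implied by per-character keep(1)/drop(0) tags.
    A negative full form (NFF) keeps every character (all-ones tags)."""
    assert len(full_form) == len(tags)
    return "".join(ch for ch, t in zip(full_form, tags) if t == 1)

print(apply_tags("北京大学", [1, 0, 1, 0]))  # -> "北大" (Peking University)
```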
Texture and Structure Incorporated ScatterNet Hybrid Deep Learning Network (TS-SHDL) For Brain Matter Segmentation
Automation of brain matter segmentation from MR images is a challenging task
due to the irregular boundaries between the grey and white matter regions. In
addition, the presence of intensity inhomogeneity in the MR images further
complicates the problem. In this paper, we propose a texture- and vesselness-incorporated version of the ScatterNet Hybrid Deep Learning Network (TS-SHDL) that extracts hierarchical invariant mid-level features, which are used by Fisher vector encoding and a conditional random field (CRF) to perform the desired
segmentation. The performance of the proposed network is evaluated by extensive
experimentation and comparison with the state-of-the-art methods on several 2D
MRI scans taken from the synthetic McGill Brain Web as well as on the MRBrainS
dataset of real 3D MRI scans. The advantages of the TS-SHDL network over supervised deep learning networks are also presented, in addition to its superior performance over the state of the art.
Comment: To appear in the IEEE International Conference on Computer Vision Workshops (ICCVW) 201
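The overall pipeline named in the abstract can be summarized as three stages; in this sketch each stage is an injected placeholder callable, since the actual ScatterNet filters, Fisher vector encoder, and CRF are beyond a short example.

```python
import numpy as np

def segment_brain_matter(mri_slice: np.ndarray,
                         scatternet, fisher_encoder, crf) -> np.ndarray:
    features = scatternet(mri_slice)    # hierarchical invariant mid-level features
    encoded = fisher_encoder(features)  # Fisher vector encoding per pixel/patch
    labels = crf(encoded, mri_slice)    # CRF enforces spatially coherent labels
    return labels                       # grey/white matter label map
```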
Time-series modeling with undecimated fully convolutional neural networks
We present a new convolutional neural network-based time-series model.
Typical convolutional neural network (CNN) architectures rely on the use of
max-pooling operators in between layers, which leads to reduced resolution at
the top layers. Instead, in this work we consider a fully convolutional network
(FCN) architecture that uses causal filtering operations, and allows for the
rate of the output signal to be the same as that of the input signal. We
furthermore propose an undecimated version of the FCN, which we refer to as the undecimated fully convolutional neural network (UFCNN) and which is motivated by the undecimated wavelet transform. Our experimental results verify that using the
undecimated version of the FCN is necessary in order to allow for effective
time-series modeling. The UFCNN has several advantages compared to other time-series models such as the recurrent neural network (RNN) and long short-term memory (LSTM): it does not suffer from either the vanishing or the exploding gradient problem and is therefore easier to train, and its convolution operations can be implemented more efficiently than the recursion involved in RNN-based models. We evaluate the performance of our model on a synthetic target-tracking task using bearing-only measurements generated from a state-space model, on a probabilistic modeling problem for polyphonic music sequences, and on a high-frequency trading task using a time series of ask/bid quotes and their corresponding volumes. Our experimental results on synthetic and real datasets verify the significant advantages of the UFCNN compared to the RNN and LSTM baselines.
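A common way to realize the undecimated idea (our reading; the paper's exact construction may differ) is a stack of causal convolutions with no pooling and growing dilation, mirroring the à trous wavelet transform, so the output keeps the input's rate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad so output is causal
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (N, C, T)
        return self.conv(F.pad(x, (self.pad, 0)))

class UFCNNSketch(nn.Module):
    """No pooling anywhere, so the output length equals the input length."""
    def __init__(self, ch=32, levels=4):
        super().__init__()
        layers = [CausalConv1d(1, ch)]
        for i in range(1, levels):
            layers += [nn.ReLU(), CausalConv1d(ch, ch, dilation=2 ** i)]
        layers += [nn.ReLU(), nn.Conv1d(ch, 1, kernel_size=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```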
Painting on Placement: Forecasting Routing Congestion using Conditional Generative Adversarial Nets
The physical design process commonly consumes hours to days for large designs, and routing is known to be the most critical step. The demand for accurate routing quality prediction has risen to a new level in order to accelerate hardware innovation at advanced technology nodes. This work presents an approach that forecasts the density of all routing channels over the entire floorplan, with features collected up to placement, using conditional GANs. Specifically, routing congestion forecasting is cast as an image translation (colorization) problem. The proposed approach is applied to a) placement exploration for minimum congestion, b) constrained placement exploration, and c) real-time congestion forecasting during incremental placement, using eight designs targeting a fixed FPGA architecture.
Comment: 6 pages, 9 figures, to appear at DAC'1
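The colorization framing can be pictured with a toy conditional generator (PyTorch; the input feature channels and the encoder-decoder are stand-in assumptions): placement-stage maps go in, a congestion heat map comes out.

```python
import torch
import torch.nn as nn

class CongestionGenerator(nn.Module):
    def __init__(self, in_ch=4):   # e.g. pin density, net density, ... (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, placement_features):  # (N, in_ch, H, W)
        return self.net(placement_features) # (N, 1, H, W) congestion heat map
```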
High-Quality Correspondence and Segmentation Estimation for Dual-Lens Smart-Phone Portraits
Estimating correspondence between two images and extracting the foreground
object are two challenges in computer vision. With dual-lens smartphones, such as the iPhone 7 Plus and Huawei P9, coming onto the market, two images of slightly different views provide new information with which to unify the two topics. We propose a joint method to tackle both simultaneously via a joint fully connected conditional random field (CRF) framework. Regional correspondence is used to handle textureless regions in matching and makes our CRF system computationally efficient. Our method is evaluated on 2,000 new image pairs and produces promising results on challenging portrait images.
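As a toy version of the fully connected CRF machinery (not the authors' joint correspondence-plus-segmentation energy), the function below scores a pixel labeling with unary costs plus a Gaussian position/appearance pairwise kernel:

```python
import numpy as np

def dense_crf_energy(unary, labels, pos, rgb, w=1.0, theta_p=3.0, theta_c=10.0):
    """unary: (N, L) label costs; labels: (N,) ints; pos: (N, 2); rgb: (N, 3)."""
    e = unary[np.arange(len(labels)), labels].sum()
    for i in range(len(labels)):                  # O(N^2): fine only for tiny N
        diff = (labels != labels[i]).astype(float)
        k = np.exp(-((pos - pos[i]) ** 2).sum(1) / (2 * theta_p ** 2)
                   - ((rgb - rgb[i]) ** 2).sum(1) / (2 * theta_c ** 2))
        e += w * (k * diff).sum() / 2             # halve: each pair counted twice
    return e
```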