A Novel BiLevel Paradigm for Image-to-Image Translation
Image-to-image (I2I) translation is a pixel-level mapping that requires a large amount of paired training data and often suffers from high diversity and strong category bias across image scenes. To tackle these problems, we propose a novel BiLevel (BiL) learning paradigm that alternates between learning two models: one at an instance-specific (IS) level and one at a general-purpose (GP) level. In each scene, the IS model learns to preserve the scene's specific attributes. It is initialized from the GP model, which learns across all scenes to acquire generalizable translation knowledge. This GP initialization gives the IS model an efficient starting point, enabling fast adaptation to a new scene with scarce training data. We conduct extensive I2I translation experiments on human face and street view datasets. Quantitative results validate that our approach significantly boosts the performance of classical I2I translation models such as PG2 and Pix2Pix. Our visualization results show both higher image quality and more appropriate instance-specific details; e.g., the translated image of a person better preserves that person's identity.
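The alternation the abstract describes amounts to a pretrain-then-adapt loop: train a general-purpose model on data pooled across scenes, then clone it as the starting point for a scene-specific model fine-tuned on that scene's scarce pairs. The PyTorch sketch below is a minimal illustration under that reading; the Translator, synthetic data, loss, and step counts are placeholders, not the paper's PG2/Pix2Pix setups or its actual alternating schedule.

```python
import copy
import itertools

import torch
from torch import nn


class Translator(nn.Module):
    """Toy image-to-image network standing in for a real generator."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


def train_steps(model, pairs, steps, lr=1e-4):
    """Run supervised translation steps on (source, target) pairs."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for src, tgt in itertools.islice(pairs, steps):
        opt.zero_grad()
        loss_fn(model(src), tgt).backward()
        opt.step()


def fake_pairs():
    """Stand-in for a paired-image data loader."""
    while True:
        yield torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)


# General-purpose (GP) level: learn shared translation knowledge
# from data pooled across all scenes.
gp_model = Translator()
train_steps(gp_model, fake_pairs(), steps=100)

# Instance-specific (IS) level: start from the GP weights, then
# adapt quickly on a new scene's scarce paired data.
is_model = copy.deepcopy(gp_model)
train_steps(is_model, fake_pairs(), steps=10)
```

The key design choice this sketch captures is that the IS model never starts from random weights: copying the GP weights is what makes adaptation feasible with only a handful of paired examples per scene.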
VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs
Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to the dearth of class-balanced and diverse training data. Moreover, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive way to address this scarcity of labeled data. Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN) to synthesize semantically rich images conditioned simultaneously on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). The PLC matches the synthesized image in its spatial dimensions, can differ from it only in the channel dimension, and serves as a task-specific input. The FLC is modeled as an attribute vector in the latent space of the generated image, which controls the contributions of the various characteristic attributes germane to the target distribution. We also explore an interpretation of the attribute vector that systematically generates synthetic images by varying a chosen binary macroscopic feature. Experiments on a GPS trajectory dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network. The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation in computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.
Comment: 10 pages, 4 figures. Peer-reviewed and accepted version of the paper published at the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science (IWCTS 2020).
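To make the two conditioning pathways concrete, the PyTorch sketch below is an assumption-laden toy, not the published VAE-Info-cGAN architecture: a pixel-level map is concatenated with the decoder features at full spatial resolution (matching the output in height and width but not channels), while the attribute vector is concatenated with noise in the latent space. Toggling one binary entry of the FLC while holding everything else fixed mirrors the systematic variation the abstract explores.

```python
import torch
from torch import nn


class ConditionalGenerator(nn.Module):
    """Toy generator with both conditioning pathways (illustrative only)."""

    def __init__(self, plc_channels=1, flc_dim=8, noise_dim=16):
        super().__init__()
        # FLC pathway: attribute vector concatenated with noise in latent space.
        self.fc = nn.Linear(noise_dim + flc_dim, 32 * 32)
        # PLC pathway: pixel-level map matching the output height and width,
        # differing from it only in the channel dimension.
        self.decode = nn.Sequential(
            nn.Conv2d(1 + plc_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z, flc, plc):
        h = self.fc(torch.cat([z, flc], dim=1)).view(-1, 1, 32, 32)
        return self.decode(torch.cat([h, plc], dim=1))


g = ConditionalGenerator()
z = torch.randn(2, 16)            # latent noise
plc = torch.randn(2, 1, 32, 32)   # e.g. a rasterized road network
flc_off = torch.zeros(2, 8)       # chosen binary macroscopic feature off
flc_on = flc_off.clone()
flc_on[:, 0] = 1.0                # toggle that one attribute on

# Varying a single binary FLC entry while holding z and the PLC fixed
# probes how that macroscopic attribute shapes the synthesized image.
img_off, img_on = g(z, flc_off, plc), g(z, flc_on, plc)
print(img_off.shape, img_on.shape)  # torch.Size([2, 3, 32, 32]) each
```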