Style Separation and Synthesis via Generative Adversarial Networks
Style synthesis has attracted great interest recently, while few works focus
on its dual problem, "style separation". In this paper, we propose the Style
Separation and Synthesis Generative Adversarial Network (S3-GAN) to
simultaneously implement style separation and style synthesis on object
photographs of specific categories. Based on the assumption that the object
photographs lie on a manifold, and the contents and styles are independent, we
employ S3-GAN to build mappings between the manifold and a latent vector space
for separating and synthesizing the contents and styles. The S3-GAN consists of
an encoder network, a generator network, and an adversarial network. The
encoder network performs style separation by mapping an object photograph to a
latent vector. Two halves of the latent vector represent the content and style,
respectively. The generator network performs style synthesis by taking a
concatenated vector as input. The concatenated vector contains the style half
vector of the style target image and the content half vector of the content
target image. Once images are obtained from the generator network, an
adversarial network is imposed to make them more photo-realistic.
Experiments on CelebA and UT Zappos 50K datasets demonstrate that the S3-GAN
has the capacity for simultaneous style separation and style synthesis, and
can capture various styles in a single model.
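To make the separation mechanics concrete, below is a minimal PyTorch sketch of the latent-vector split and swap described above; the encoder architecture, latent size, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

LATENT_DIM = 128  # assumed latent size; illustrative only


class Encoder(nn.Module):
    """Maps an object photograph to a latent vector (style separation)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, LATENT_DIM),
        )

    def forward(self, x):
        return self.net(x)


def swap_style(z_content_src, z_style_src):
    """Concatenate the content half of one latent with the style half of another."""
    half = LATENT_DIM // 2
    content = z_content_src[:, :half]  # content half of the content target
    style = z_style_src[:, half:]      # style half of the style target
    return torch.cat([content, style], dim=1)


encoder = Encoder()
content_img = torch.randn(1, 3, 64, 64)  # stand-in content target photograph
style_img = torch.randn(1, 3, 64, 64)    # stand-in style target photograph
z_mix = swap_style(encoder(content_img), encoder(style_img))
# z_mix would then be fed to the generator network; the adversarial network
# pushes the generated image toward photo-realism.
```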
Laplacian-Steered Neural Style Transfer
Neural Style Transfer based on Convolutional Neural Networks (CNN) aims to
synthesize a new image that retains the high-level structure of a content
image, rendered in the low-level texture of a style image. This is achieved by
constraining the new image to have high-level CNN features similar to the
content image, and lower-level CNN features similar to the style image.
However, in the traditional optimization objective, low-level features of the
content image are absent, and the low-level features of the style image
dominate the low-level detail structures of the new image. Hence, in the
synthesized image, many details of the content image are lost, and many
inconsistent and unpleasant artifacts appear. As a remedy, we propose to steer
image synthesis
with a novel loss function: the Laplacian loss. The Laplacian matrix
("Laplacian" in short), produced by a Laplacian operator, is widely used in
computer vision to detect edges and contours. The Laplacian loss measures the
difference of the Laplacians, and correspondingly the difference of the detail
structures, between the content image and a new image. It is flexible and
compatible with the traditional style transfer constraints. By incorporating
the Laplacian loss, we obtain a new optimization objective for neural style
transfer named Lapstyle. Minimizing this objective will produce a stylized
image that better preserves the detail structures of the content image and
eliminates the artifacts. Experiments show that Lapstyle produces more
appealing stylized images with fewer artifacts, without compromising their
"stylishness".
Comment: Accepted by the ACM Multimedia Conference (MM) 2017. 9 pages, 65 figures
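For illustration, here is a minimal sketch of a Laplacian loss of this kind, assuming the standard 3x3 Laplacian kernel applied per channel; the paper's exact formulation (e.g., any pooling applied before the operator) may differ.

```python
import torch
import torch.nn.functional as F


def laplacian(img):
    """Apply a 3x3 Laplacian operator to each channel of a (B, C, H, W) image."""
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]], device=img.device)
    c = img.shape[1]
    weight = kernel.repeat(c, 1, 1, 1)  # one kernel per channel (depthwise)
    return F.conv2d(img, weight, padding=1, groups=c)


def laplacian_loss(content_img, synthesized_img):
    """Squared difference of the Laplacians: penalizes lost detail structure."""
    return F.mse_loss(laplacian(synthesized_img), laplacian(content_img))


content = torch.rand(1, 3, 128, 128)
synthesized = content.clone().requires_grad_(True)
loss = laplacian_loss(content, synthesized)  # ~0 for identical images
loss.backward()  # gradients steer the synthesized image toward content detail
```

In the full Lapstyle objective this term would be added to the usual content and style losses, so it complements rather than replaces the traditional constraints.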
PARASOL: Parametric Style Control for Diffusion Image Synthesis
We propose PARASOL, a multi-modal synthesis model that enables disentangled,
parametric control of the visual style of the image by jointly conditioning
synthesis on both content and a fine-grained visual style embedding. We train a
latent diffusion model (LDM) using specific losses for each modality and adapt
the classifier-free guidance for encouraging disentangled control over
independent content and style modalities at inference time. We leverage
auxiliary semantic and style-based search to create training triplets for
supervision of the LDM, ensuring complementarity of content and style cues.
PARASOL shows promise for enabling nuanced control over visual style in
diffusion models for image creation and stylization, as well as generative
search where text-based search results may be adapted to more closely match
user intent by interpolating both content and style descriptors.
Comment: Added Appendix
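As a rough illustration of how classifier-free guidance can be adapted to two independent modalities, the sketch below combines unconditional, content-only, and style-only noise predictions with separate weights; the toy denoiser, null conditions, and weight values are placeholders, not PARASOL's exact scheme.

```python
import torch


def toy_denoiser(x_t, t, content, style):
    """Stand-in for the LDM's noise-prediction network; illustrative only."""
    out = 0.1 * x_t
    if content is not None:
        out = out + 0.01 * content
    if style is not None:
        out = out + 0.01 * style
    return out


def guided_eps(denoiser, x_t, t, content, style, w_c=3.0, w_s=1.5):
    """Combine unconditional, content-only, and style-only predictions so that
    content and style strength can be tuned independently at inference time."""
    e_uncond = denoiser(x_t, t, None, None)
    e_content = denoiser(x_t, t, content, None)
    e_style = denoiser(x_t, t, None, style)
    return e_uncond + w_c * (e_content - e_uncond) + w_s * (e_style - e_uncond)


x_t = torch.randn(1, 4, 32, 32)          # noisy latent at step t
content_emb = torch.randn(1, 4, 32, 32)  # stand-in content embedding
style_emb = torch.randn(1, 4, 32, 32)    # stand-in fine-grained style embedding
eps = guided_eps(toy_denoiser, x_t, t=10, content=content_emb, style=style_emb)
```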
WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models
Text-to-Image synthesis is the task of generating an image according to a
specific text description. Generative Adversarial Networks have been considered
the standard method for image synthesis virtually since their introduction;
today, Denoising Diffusion Probabilistic Models are setting a new baseline,
with remarkable results in Text-to-Image synthesis, among other fields. Aside
from its usefulness per se, Text-to-Image synthesis can also be particularly
relevant as a
tool for data augmentation to aid training models for other document image
processing tasks. In this work, we present a latent diffusion-based method for
styled text-to-text-content-image generation at the word level. Our proposed method
manages to generate realistic word image samples from different writer styles,
by using class-index styles and text content prompts, without the need for
adversarial training, writer recognition, or text recognition. We gauge system
performance with Fréchet Inception Distance, writer recognition accuracy, and
writer retrieval. We show that the proposed model produces samples that are
aesthetically pleasing, help boost text recognition performance, and achieve
writer retrieval scores similar to those of real data.
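A minimal sketch of how class-index style conditioning of this kind can be wired up, assuming a writer-ID embedding table and a pooled character embedding that together condition the latent-diffusion denoiser; all sizes, names, and the pooling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_WRITERS = 339  # assumed number of writer style classes (illustrative)
VOCAB_SIZE = 80    # assumed character vocabulary size (illustrative)
EMB_DIM = 256


class WordConditioner(nn.Module):
    """Builds the conditioning vector: writer-style class index + text prompt."""

    def __init__(self):
        super().__init__()
        self.style_emb = nn.Embedding(NUM_WRITERS, EMB_DIM)  # class-index style
        self.char_emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)    # text content

    def forward(self, writer_id, char_ids):
        style = self.style_emb(writer_id)              # (B, EMB_DIM)
        content = self.char_emb(char_ids).mean(dim=1)  # (B, EMB_DIM), pooled
        return style + content  # fed to the latent-diffusion denoiser


cond = WordConditioner()
c = cond(torch.tensor([5]), torch.randint(0, VOCAB_SIZE, (1, 8)))
print(c.shape)  # torch.Size([1, 256])
```

Because the style signal is just a learned embedding looked up by class index, no adversarial training, writer recognition, or text recognition network is needed during training, matching the claim above.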
Content-Based Image Retrieval of Skin Lesions by Evolutionary Feature Synthesis
This paper gives an example of evolved features that improve image retrieval performance. A content-based image retrieval system for skin lesion images is presented. The aim is to support decision making by retrieving and displaying relevant past cases visually similar to the one under examination. Skin lesions of five common classes, including two non-melanoma cancer types, are used. Colour and texture features are extracted from lesions. Evolutionary algorithms are used to create composite features that optimise a similarity matching function. Experiments on our database of 533 images are performed and results are compared to those obtained using simple features. The use of the evolved composite features improves the precision by about 7%.
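To give a flavour of evolutionary feature synthesis, the toy sketch below evolves weights that combine simple colour/texture features into a composite distance used for retrieval; the fitness function and genetic operators here are illustrative, not the paper's exact setup.

```python
import random


def precision(weights, queries, database, k=5):
    """Retrieval precision@k under a weighted composite-feature distance."""
    def dist(a, b):
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

    hits = 0
    for q_feat, q_class in queries:
        ranked = sorted(database, key=lambda item: dist(q_feat, item[0]))
        hits += sum(1 for _, c in ranked[:k] if c == q_class)
    return hits / (len(queries) * k)


def evolve(queries, database, n_feats, pop_size=20, generations=30):
    """Evolve feature weights that maximise retrieval precision."""
    pop = [[random.random() for _ in range(n_feats)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: precision(w, queries, database), reverse=True)
        parents = pop[: pop_size // 2]          # keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            children.append([(x + y) / 2 + random.gauss(0, 0.05)
                             for x, y in zip(a, b)])  # crossover + mutation
        pop = parents + children
    return max(pop, key=lambda w: precision(w, queries, database))


# Toy data: 2-D colour/texture features with class labels.
db = [([random.random(), random.random()], random.choice("AB"))
      for _ in range(50)]
best = evolve(queries=db[:10], database=db, n_feats=2)
```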