Multiple Texture Boltzmann Machines
We assess the generative power of the mPoT model of [10] with tiled-convolutional weight sharing as a model for visual textures by specifically training on this task, evaluating model performance on texture synthesis and inpainting tasks using quantitative metrics. We also analyze the relative importance of the mean and covariance parts of the mPoT model by comparing its performance to those of its subcomponents, tiled-convolutional versions of the PoT/FoE and Gaussian-Bernoulli restricted Boltzmann machine (GB-RBM). Our results suggest that while state-of-the-art or better performance can be achieved using the mPoT, similar performance can be achieved with the mean-only model. We then develop a model for multiple textures based on the GB-RBM, using a shared set of weights but texture-specific hidden unit biases. We show comparable performance of the multiple texture model to individually trained texture models.
Diversified Texture Synthesis with Feed-forward Networks
Recent progress in deep discriminative and generative modeling has shown
promising results on texture synthesis. However, existing feed-forward
methods trade off generality for efficiency and suffer from several issues,
such as a lack of generality (i.e., building one network per texture), a lack of
diversity (i.e., always producing visually identical output) and suboptimality
(i.e., generating less satisfying visual effects). In this work, we focus on
solving these issues for improved texture synthesis. We propose a deep
generative feed-forward network which enables efficient synthesis of multiple
textures within one single network and meaningful interpolation between them.
Meanwhile, a suite of important techniques is introduced to achieve better
convergence and diversity. With extensive experiments, we demonstrate the
effectiveness of the proposed model and techniques for synthesizing a large
number of textures and show its application to stylization.
Comment: accepted by CVPR201
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low resolution and are still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.
Comment: v2: CVPR camera ready, adding more results for edge-to-photo example
Ambient Sound Provides Supervision for Visual Learning
The sound of crashing waves, the roar of fast-moving cars -- sound conveys
important information about the objects in our surroundings. In this work, we
show that ambient sounds can be used as a supervisory signal for learning
visual models. To demonstrate this, we train a convolutional neural network to
predict a statistical summary of the sound associated with a video frame. We
show that, through this process, the network learns a representation that
conveys information about objects and scenes. We evaluate this representation
on several recognition tasks, finding that its performance is comparable to
that of other state-of-the-art unsupervised learning methods. Finally, we show
through visualizations that the network learns units that are selective to
objects that are often associated with characteristic sounds.
Comment: ECCV 201