Learning Dilation Factors for Semantic Segmentation of Street Scenes
Contextual information is crucial for semantic segmentation. However, finding
the optimal trade-off between keeping desired fine details and at the same time
providing sufficiently large receptive fields is non-trivial. This is even more
so when objects or classes present in an image vary significantly in size.
Dilated convolutions have proven valuable for semantic segmentation, because
they allow the receptive field to be enlarged without sacrificing
image resolution. However, in current state-of-the-art methods, dilation
parameters are hand-tuned and fixed. In this paper, we present an approach for
learning dilation parameters adaptively per channel, consistently improving
semantic segmentation results on street-scene datasets like Cityscapes and
CamVid.
Comment: GCPR201
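The abstract does not spell out how a dilation factor is made learnable. As a hedged illustration only, the sketch below shows (a) how dilation grows the receptive field of stacked stride-1 convolutions, (b) a "same"-padded 1-D dilated convolution whose output keeps the input length, i.e. no loss of resolution, and (c) one possible way, assumed here and not taken from the paper, to allow a real-valued dilation: sampling the input at fractional offsets with linear interpolation, so the output varies continuously with the dilation value. All function names are hypothetical.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field (along one axis) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A kernel of size k with dilation d spans d*(k-1)+1 input positions,
        # so each layer widens the receptive field by d*(k-1).
        rf += d * (kernel_size - 1)
    return rf


def dilated_conv1d(signal, weights, dilation):
    """'Same'-padded 1-D dilated convolution (cross-correlation), stride 1."""
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    # Tap j of output i reads input position i - pad + j*dilation.
    return [
        sum(w * padded[i + j * dilation] for j, w in enumerate(weights))
        for i in range(len(signal))
    ]


def _sample_linear(signal, x):
    """Linearly interpolate `signal` at real position x, treating outside as 0."""
    x0 = math.floor(x)
    t = x - x0

    def at(i):
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - t) * at(x0) + t * at(x0 + 1)


def fractional_dilated_conv1d(signal, weights, dilation):
    """Hypothetical variant accepting a real-valued dilation.

    Taps sit at offsets (j - center) * dilation around each output position;
    for integer dilations this matches dilated_conv1d.
    """
    center = (len(weights) - 1) / 2
    return [
        sum(w * _sample_linear(signal, i + (j - center) * dilation)
            for j, w in enumerate(weights))
        for i in range(len(signal))
    ]
```

For example, three 3x3 layers with dilations 1, 2 and 4 give `receptive_field(3, [1, 2, 4]) == 15`, against 7 for three undilated layers, while each layer's output stays at input resolution. A per-channel learned dilation would, under this assumption, keep one such real-valued factor per channel.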
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken at a different season (e.g. during winter), weather
condition (e.g. on a cloudy day) or time of the day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while keeping the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require the reference
style image that most appearance or style transfer approaches rely on.
Moreover, it can manipulate a given scene according to a diverse set of
transient attributes simultaneously within a single model, eliminating the need
to train a separate network for each translation task.
Our comprehensive set of qualitative and quantitative results demonstrates the
effectiveness of our approach compared with competing methods.
Comment: Accepted for publication in ACM Transactions on Graphics
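The second stage transfers the hallucinated look onto the input photo. The abstract does not describe that transfer step in detail, so the sketch below swaps in a deliberately simple, classical stand-in: per-channel mean and standard-deviation matching in RGB. It is not the paper's method; it only illustrates the idea that colour statistics can come from the hallucinated image while the spatial layout comes from the input. The function name and the pixel-list representation are assumptions.

```python
import statistics


def match_color_statistics(content, reference):
    """Per-channel mean/std matching (a simple stand-in for look transfer).

    content, reference: lists of (r, g, b) pixel tuples.
    Each channel of `content` is shifted and scaled so its mean and population
    standard deviation equal those of `reference`. The map is affine per
    channel, so it changes colours but not where each pixel sits.
    """
    out_channels = []
    for c in range(3):
        src = [p[c] for p in content]
        ref = [p[c] for p in reference]
        mu_s, sd_s = statistics.fmean(src), statistics.pstdev(src)
        mu_r, sd_r = statistics.fmean(ref), statistics.pstdev(ref)
        # A flat source channel has no contrast to rescale; map it to the mean.
        scale = sd_r / sd_s if sd_s > 0 else 0.0
        out_channels.append([(v - mu_s) * scale + mu_r for v in src])
    # Re-assemble per-pixel (r, g, b) tuples from the three channel lists.
    return list(zip(*out_channels))
```

A global statistic match like this cannot, for instance, add snow only to the ground; that kind of semantically aware change is what a learned transfer step is meant to handle.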
Dual-Domain Image Synthesis using Segmentation-Guided GAN
We introduce a segmentation-guided approach to synthesise images that
integrate features from two distinct domains. Images synthesised by our
dual-domain model belong to one domain within the semantic mask, and to another
in the rest of the image, with the two smoothly integrated. We build on the
successes of few-shot StyleGAN and single-shot semantic segmentation to minimise
the amount of training required to use two domains. The method combines a few-shot
cross-domain StyleGAN with a latent optimiser to achieve images containing
features of two distinct domains. We use a segmentation-guided perceptual loss,
which compares both pixel values and network activations between domain-specific
and dual-domain synthetic images. Results demonstrate qualitatively and
quantitatively that our model is capable of synthesising dual-domain images on
a variety of objects (faces, horses, cats, cars), domains (natural, caricature,
sketches) and part-based masks (eyes, nose, mouth, hair, car bonnet). The code
is publicly available at:
https://github.com/denabazazian/Dual-Domain-Synthesis
Comment: CVPR2022 Workshops. 14 pages, 19 figures
Learning Object Categories From Internet Image Searches
In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images for a given keyword are typically highly variable, with a large fraction being unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly from Internet images, we remove the need to laboriously compile training data sets, as required by most other recognition approaches; this opens up the possibility of learning object category models “on-the-fly.” We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search results; and second, to recognize objects in other image data sets.
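pLSA treats each image as a "document" and, assuming each image has already been quantized into a bag of visual words, models P(w|d) = Σ_z P(w|z) P(z|d) over latent topics z. The sketch below fits plain pLSA by expectation-maximisation, as used for text; it is not either of the two pLSA-derived variants described in the paper. The reranking helper, which orders images by P(z|d) for one chosen topic, illustrates the first application in the spirit of the abstract. Function names, the random initialisation and the fixed iteration count are assumptions.

```python
import math
import random


def _normalize(row):
    """Scale non-negative numbers to sum to 1 (uniform if all are zero)."""
    s = sum(row)
    if s == 0:
        return [1.0 / len(row)] * len(row)
    return [v / s for v in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit plain pLSA by EM.

    counts[d][w]: how often visual word w occurs in image d.
    Returns (p_w_given_z, p_z_given_d, history), where history[i] is the
    log-likelihood sum_{d,w} n(d,w) * log P(w|d) before update i.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation breaks the symmetry between topics.
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    history = []
    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                for z in range(n_topics):
                    # Expected count n(d, w) * P(z | d, w) for the M-step.
                    r = n * joint[z] / p_w_d
                    exp_w_z[z][w] += r
                    exp_z_d[d][z] += r
        history.append(log_lik)
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
    return p_w_z, p_z_d, history


def rerank(p_z_d, topic):
    """Image indices ordered by P(topic | image), most likely first."""
    return sorted(range(len(p_z_d)), key=lambda d: p_z_d[d][topic], reverse=True)
```

Because this is exact EM, the recorded log-likelihood never decreases between iterations. How the topic corresponding to the query's object category is selected before calling `rerank` is left open here.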
- …