Weakly Supervised Learning for Multi-Image Synthesis
Machine learning-based approaches have been achieving state-of-the-art results on many computer vision tasks. While deep learning and convolutional networks have become incredibly popular, these approaches come at the expense of the huge amounts of labeled data required for training. Manually annotating large amounts of data, often millions of images in a single dataset, is costly and time-consuming. To deal with the problem of data annotation, the research community has been exploring approaches that require less labeled data.
The central problem that we consider in this research is image synthesis without any manual labeling. Image synthesis is a classic computer vision task that requires understanding of image contents and their semantic and geometric properties. We propose to train image synthesis models with weakly supervised learning, relying on sequences of images taken from videos; large amounts of such unlabeled data are freely available on the internet. We set up the training in a multi-image setting so that we can use one of the images as the target, which allows us to rely only on images for training and removes the need for manual annotations. We demonstrate three main contributions in this work.
First, we present a method for fusing multiple noisy overhead images into a single, artifact-free image, with a weakly supervised variant that relies on crowd-sourced labels from online maps and a completely unsupervised variant that only requires a series of satellite images as input. Second, we propose a single-image novel view synthesis method for complex outdoor scenes: a learning-based method that uses pairs of nearby images captured on urban roads and their respective GPS coordinates as supervision. We show that a model trained with this automatically captured data can render a new view of a scene that can be as far as 10 meters from the input image. Third, we consider the problem of synthesizing new images of a scene under different conditions, such as time of day and season, based on a single input image. As opposed to existing methods, we do not need manual annotations for transient attributes, such as fog or snow, for training. We train our model by using streams of images captured from outdoor webcams and time-lapse videos.
Through these applications, we show several settings where state-of-the-art deep learning methods can be trained without manual annotations. Across the three image synthesis tasks considered in this work, we use weakly supervised learning and remove the need for manual annotations by relying on sequences of images. Our approach is in line with research efforts that aim to minimize the labels required for training machine learning methods.
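As a rough illustration of the multi-image training setup described above, the sketch below (in PyTorch, with a hypothetical synthesis network and optimizer supplied by the caller) holds out the last frame of each image sequence as the reconstruction target, so no manual labels are needed; the exact losses and architectures in the thesis may differ.

```python
import torch.nn.functional as F

def self_supervised_step(model, optimizer, frames):
    """One training step on an unlabeled image sequence.

    frames: tensor of shape (B, T, C, H, W).  The last frame of every
    sequence is held out as the target, so the only supervision comes
    from the images themselves.
    """
    inputs, target = frames[:, :-1], frames[:, -1]   # conditioning frames / held-out target
    prediction = model(inputs)                       # hypothetical image synthesis network
    loss = F.l1_loss(prediction, target)             # pixel-level reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```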
Self-supervised Outdoor Scene Relighting
Outdoor scene relighting is a challenging problem that requires good
understanding of the scene geometry, illumination and albedo. Current
techniques are completely supervised, requiring high quality synthetic
renderings to train a solution. Such renderings are synthesized using priors
learned from limited data. In contrast, we propose a self-supervised approach
for relighting. Our approach is trained only on corpora of images collected
from the internet without any user supervision. This virtually endless source
of training data allows training a general relighting solution. Our approach
first decomposes an image into its albedo, geometry and illumination. A novel
relighting is then produced by modifying the illumination parameters. Our
solution captures shadows using a dedicated shadow prediction map, and does not
rely on accurate geometry estimation. We evaluate our technique subjectively
and objectively using a new dataset with ground-truth relighting. Results show
the ability of our technique to produce photo-realistic and physically
plausible results, that generalizes to unseen scenes.Comment: Published in ECCV '20,
http://gvv.mpi-inf.mpg.de/projects/SelfRelight
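The decompose-then-relight idea can be sketched with a simple Lambertian stand-in for the paper's illumination model; the function below assumes estimated albedo, normals, and a predicted shadow map as inputs and only illustrates how modified illumination parameters yield a relit image.

```python
import numpy as np

def relight(albedo, normals, shadow, light_dir, light_color, ambient=0.1):
    """Recompose a relit image from estimated intrinsic components.

    albedo:  (H, W, 3) diffuse reflectance in [0, 1]
    normals: (H, W, 3) unit surface normals
    shadow:  (H, W)    predicted shadow map in [0, 1] (1 = fully lit)
    A simple Lambertian shading stands in for the paper's illumination model.
    """
    l = np.asarray(light_dir, dtype=np.float32)
    l /= np.linalg.norm(l)
    shading = np.clip(normals @ l, 0.0, None)      # per-pixel cosine term
    shading = ambient + shading * shadow           # shadowed direct light plus ambient
    return np.clip(albedo * shading[..., None] * np.asarray(light_color), 0.0, 1.0)
```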
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken in a different season (e.g. during winter), weather
condition (e.g. on a cloudy day), or time of the day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while preserving the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image as commonly utilized in most of the appearance or style transfer
approaches. Moreover, it allows a given scene to be manipulated simultaneously
according to a diverse set of transient attributes within a single model,
eliminating the need to train a separate network for each translation task.
Our comprehensive set of qualitative and quantitative results demonstrates the
effectiveness of our approach against competing methods.
Comment: Accepted for publication in ACM Transactions on Graphics
Time-of-Day Neural Style Transfer for Architectural Photographs
Architectural photography is a genre of photography that focuses on capturing
a building or structure in the foreground with dramatic lighting in the
background. Inspired by recent successes in image-to-image translation methods,
we aim to perform style transfer for architectural photographs. However, the
special composition in architectural photography poses great challenges for
style transfer in this type of photograph. Existing neural style transfer
methods treat an architectural image as a single entity, which generates
mismatched chrominance and destroys geometric features of the original
architecture, yielding unrealistic lighting, wrong color rendition, and visual
artifacts such as ghosting, appearance distortion, or color mismatching. In
this paper, we specialize a neural style transfer method for architectural
photography. Our method addresses the composition of the foreground and
background in an architectural photograph with a two-branch neural network that
considers the style transfer of the foreground and the background separately.
Our method comprises a segmentation module, a learning-based
image-to-image translation module, and an image blending optimization module.
We trained our image-to-image translation neural network with a new dataset of
unconstrained outdoor architectural photographs captured at different magic
times of the day, utilizing additional semantic information for better
chrominance matching and geometry preservation. Our experiments show that our
method can produce photorealistic lighting and color rendition on both the
foreground and background, and outperforms general image-to-image translation
and arbitrary style transfer baselines quantitatively and qualitatively. Our
code and data are available at
https://github.com/hkust-vgd/architectural_style_transfer.
Comment: Updated version with corrected equations. Paper published at the International Conference on Computational Photography (ICCP) 2022. 12 pages of content with 6 pages of supplementary material.
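A toy sketch of the two-branch composition described above; segment_fn, fg_translate, and bg_translate are placeholder callables for the segmentation and translation modules, and a soft-mask composite stands in for the paper's blending optimization.

```python
import numpy as np

def two_branch_transfer(photo, segment_fn, fg_translate, bg_translate):
    """Foreground/background style transfer for an architectural photo.

    segment_fn:   returns an (H, W) mask in [0, 1], 1 = foreground building
    fg_translate: stylizes the foreground (building) branch
    bg_translate: stylizes the background (sky) branch
    """
    mask = segment_fn(photo)[..., None]
    fg = fg_translate(photo)       # image-to-image translation for the foreground
    bg = bg_translate(photo)       # image-to-image translation for the background
    return np.clip(fg * mask + bg * (1.0 - mask), 0.0, 1.0)
```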
Data-driven hallucination of different times of day from a single outdoor photo
We introduce "time hallucination": synthesizing a plausible image at a different time of day from an input image. This challenging task often requires dramatically altering the color appearance of the picture. In this paper, we introduce the first data-driven approach to automatically creating a plausible-looking photo that appears as though it were taken at a different time of day. The time of day is specified by a semantic time label, such as "night".
Our approach relies on a database of time-lapse videos of various scenes. These videos provide rich information about the variations in color appearance of a scene throughout the day. Our method transfers the color appearance from videos of scenes similar to the input photo. We propose a locally affine model learned from the video for the transfer, allowing our model to synthesize new color data while retaining image details. We show that this model can hallucinate a wide range of different times of day. The model generates a large sparse linear system, which can be solved by off-the-shelf solvers. We validate our method by transforming photos of various outdoor scenes to four times of interest: daytime, the golden hour, the blue hour, and nighttime.
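The locally affine transfer can be illustrated with a simplified global variant: the paper fits a locally affine transform per pixel (hence the large sparse linear system), whereas the sketch below fits a single affine color map by least squares from a matched time-lapse frame pair and applies it to the input photo.

```python
import numpy as np

def affine_color_transfer(photo, match_src, match_tgt):
    """Global simplification of the locally affine color transfer model.

    photo:     (H, W, 3) input image in [0, 1]
    match_src: (H, W, 3) time-lapse frame matching the input's time of day
    match_tgt: (H, W, 3) same scene at the target time of day
    """
    Xs = match_src.reshape(-1, 3).astype(np.float64)
    Ys = match_tgt.reshape(-1, 3).astype(np.float64)
    X = np.concatenate([Xs, np.ones((Xs.shape[0], 1))], axis=1)
    A, *_ = np.linalg.lstsq(X, Ys, rcond=None)            # (4, 3) affine color map
    P = np.concatenate([photo.reshape(-1, 3),
                        np.ones((photo.shape[0] * photo.shape[1], 1))], axis=1)
    return np.clip((P @ A).reshape(photo.shape), 0.0, 1.0)
```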
Factored Time-Lapse Video
We describe a method for converting time-lapse photography captured with outdoor cameras into Factored Time-Lapse Video (FTLV): a video in which time appears to move faster (i.e., lapsing) and where data at each pixel has been factored into shadow, illumination, and reflectance components. The factorization allows a user to easily relight the scene, recover a portion of the scene geometry (normals), and perform advanced image editing operations. Our method is easy to implement, robust, and provides a compact representation with good reconstruction characteristics. We show results using several publicly available time-lapse sequences.
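A very rough per-pixel factorization in the spirit of the abstract (not the paper's actual algorithm): a per-frame illumination curve, a per-pixel reflectance-like term, and a multiplicative residual that captures shadows.

```python
import numpy as np

def factor_time_lapse(frames):
    """Crude factorization of a grayscale time-lapse stack of shape (T, H, W)."""
    illumination = frames.mean(axis=(1, 2))                      # (T,) global brightness per frame
    normalized = frames / (illumination[:, None, None] + 1e-6)   # remove global lighting changes
    reflectance = np.median(normalized, axis=0)                  # (H, W) stable, albedo-like term
    shadow = normalized / (reflectance[None] + 1e-6)             # (T, H, W) residual; < 1 in shadow
    return reflectance, illumination, shadow
```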