WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Low-end and compact mobile cameras demonstrate limited photo quality mainly
due to space, hardware and budget constraints. In this work, we propose a deep
learning solution that translates photos taken by cameras with limited
capabilities into DSLR-quality photos automatically. We tackle this problem by
introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image
Generative Adversarial Network-based architecture. The proposed model is
trained under weak supervision: unlike previous works, there is no need for
strong supervision in the form of a large annotated dataset of aligned
original/enhanced photo pairs. The sole requirement is two distinct datasets:
one from the source camera, and one composed of arbitrary high-quality images
that can generally be crawled from the Internet - the visual content they
exhibit may be unrelated. Hence, our solution is repeatable for any camera:
collecting the data and training can be achieved in a couple of hours. In this
work, we place emphasis on an extensive evaluation of the obtained results. Besides
standard objective metrics and subjective user study, we train a virtual rater
in the form of a separate CNN that mimics human raters on Flickr data and use
this network to get reference scores for both original and enhanced photos. Our
experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from
several generations of smartphones demonstrate that WESPE produces qualitative
results comparable to or better than those of state-of-the-art strongly
supervised methods.
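
The weakly supervised setup can be illustrated with a minimal training-step sketch. This is not the authors' implementation: the generator/discriminator modules, the simple L1 content term (standing in for WESPE's content- and texture-preservation losses), and the loss weight are all assumptions for illustration.

```python
# Illustrative sketch of one weakly supervised training step (assumed
# PyTorch; not the authors' code). `generator` maps source-camera photos
# toward the high-quality domain; `discriminator` judges realism against
# unpaired high-quality images.
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt, source, target):
    """source: batch from the source camera; target: unpaired high-quality
    photos (e.g. crawled from the web). No aligned pairs are needed."""
    enhanced = generator(source)

    # Discriminator step: real high-quality photos vs. enhanced outputs.
    d_opt.zero_grad()
    real_logits = discriminator(target)
    fake_logits = discriminator(enhanced.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    d_loss.backward()
    d_opt.step()

    # Generator step: fool the discriminator while staying close to the
    # input content (an L1 stand-in for WESPE's content/texture losses;
    # the 10.0 weight is an assumption).
    g_opt.zero_grad()
    fake_logits = discriminator(enhanced)
    adv_loss = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits)
    )
    g_loss = adv_loss + 10.0 * F.l1_loss(enhanced, source)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```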
Invertible Rescaling Network and Its Extensions
Image rescaling is a commonly used bidirectional operation, which first
downscales high-resolution images to fit various display screens or to be
storage- and bandwidth-friendly, and afterward upscales the corresponding
low-resolution images to recover the original resolution or the details in the
zoom-in images. However, the non-injective downscaling mapping discards
high-frequency contents, leading to the ill-posed problem for the inverse
restoration task. This can be abstracted as a general image
degradation-restoration problem with information loss. In this work, we propose
a novel invertible framework to handle this general problem, which models the
bidirectional degradation and restoration from a new perspective, i.e.
invertible bijective transformation. The invertibility enables the framework to
model the information loss of pre-degradation in the form of distribution,
which could mitigate the ill-posed problem during post-restoration. To be
specific, we develop invertible models to generate valid degraded images and
meanwhile transform the distribution of lost contents to the fixed distribution
of a latent variable during the forward degradation. Then restoration is made
tractable by applying the inverse transformation on the generated degraded
image together with a randomly-drawn latent variable. We start from image
rescaling and instantiate the model as Invertible Rescaling Network (IRN),
which can be easily extended to the similar decolorization-colorization task.
We further propose to combine the invertible framework with existing
degradation methods such as image compression for wider applications.
Experimental results demonstrate the significant improvement of our model over
existing methods in terms of both quantitative and qualitative evaluations of
upscaling and colorizing reconstruction from downscaled and decolorized images,
and rate-distortion of image compression.
Comment: Accepted by IJC
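
As a toy illustration of the invertible degradation-restoration idea, a single additive coupling layer can stand in for the full invertible network; the split into low- and high-frequency parts and all names below are assumptions for illustration, not the published IRN code.

```python
# Toy sketch of invertible degradation-restoration (assumed names; not the
# published IRN code). An additive coupling layer is bijective, so nothing
# is lost in the forward direction: the "lost" high-frequency part is
# re-expressed as a latent z that training would push toward a fixed prior.
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """Bijection (x1, x2) <-> (x1, x2 + t(x1))."""
    def __init__(self, dim):
        super().__init__()
        self.t = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x1, x2):              # degradation direction
        return x1, x2 + self.t(x1)

    def inverse(self, y1, z):               # restoration direction
        return y1, z - self.t(y1)

dim = 16
layer = AdditiveCoupling(dim)
x_low = torch.randn(4, dim)                 # stands in for low-frequency content
x_high = torch.randn(4, dim)                # stands in for high-frequency content

# Forward: keep y_low as the "degraded image"; z is discarded after being
# matched to a fixed distribution (e.g. a standard Gaussian) during training.
y_low, z = layer(x_low, x_high)

# Inverse: restore with a freshly drawn latent in place of the discarded z.
rec_low, rec_high = layer.inverse(y_low, torch.randn_like(z))
```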
A new adaptive colorization filter for video decompression
Demand for HD content is growing, and HD video requires substantial bandwidth. In this paper, a new real-time adaptive colorization filter for HD videos is presented. This approach reduces the required bandwidth by reducing non-key frames in the HD video sequence to grayscale and colorizing these frames at the decompression stage. Additionally, the technique determines each frame's status based on the image information.
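
The key-frame idea can be sketched as follows; the helper names and the simple chroma-reuse strategy are hypothetical stand-ins, since the paper's actual adaptive filter and frame-status decision are not reproduced here. Assumes OpenCV and NumPy.

```python
# Hypothetical sketch: store non-key frames as luma only, then recolor them
# at decompression from the most recent key frame (illustrative names; not
# the paper's implementation).
import cv2
import numpy as np

def compress_frame(frame_bgr, is_key):
    """Key frames keep full color; non-key frames keep only the Y channel."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    return ycrcb if is_key else ycrcb[:, :, 0]

def decompress_frame(stored, last_key_ycrcb):
    """Recolor a luma-only frame using the most recent key frame's chroma."""
    if stored.ndim == 3:                    # key frame: already full color
        return cv2.cvtColor(stored, cv2.COLOR_YCrCb2BGR)
    ycrcb = last_key_ycrcb.copy()
    ycrcb[:, :, 0] = stored                 # swap in the stored luma
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

# A non-key frame stores one channel instead of three.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
key = compress_frame(frame, is_key=True)
gray = compress_frame(frame, is_key=False)
restored = decompress_frame(gray, last_key_ycrcb=key)
```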
Cross Pixel Optical Flow Similarity for Self-Supervised Learning
We propose a novel method for learning convolutional neural image
representations without manual supervision. We use motion cues in the form of
optical flow, to supervise representations of static images. The obvious
approach of training a network to predict flow from a single image can be
needlessly difficult due to intrinsic ambiguities in this prediction task. We
instead propose a much simpler learning goal: embed pixels such that the
similarity between their embeddings matches that between their optical flow
vectors. At test time, the learned deep network can be used without access to
video or flow information and transferred to tasks such as image
classification, detection, and segmentation. Our method, which significantly
simplifies previous attempts at using motion for self-supervision, achieves
state-of-the-art results in self-supervision using motion cues, competitive
results for self-supervision in general, and is overall state of the art in
self-supervised pretraining for semantic image segmentation, as demonstrated on
standard benchmarks.
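
The core objective, matching pairwise embedding similarity to pairwise flow similarity, can be sketched as below; the Gaussian flow kernel, temperature, and cross-entropy formulation are illustrative assumptions rather than the paper's exact loss.

```python
# Illustrative sketch of the cross pixel similarity objective (assumed
# kernels and normalization; not the paper's exact formulation).
import torch
import torch.nn.functional as F

def cross_pixel_flow_loss(embeddings, flow, temperature=1.0):
    """embeddings: (N, D) per-pixel embeddings from a CNN on a static image.
    flow: (N, 2) optical flow vectors at the same N pixels (training only)."""
    emb = F.normalize(embeddings, dim=1)
    emb_sim = emb @ emb.t() / temperature          # (N, N) embedding kernel

    flow_sim = torch.exp(-torch.cdist(flow, flow) / temperature)  # flow kernel

    # Align the embedding kernel to the flow kernel: each row is treated as
    # a distribution over "pixels that move alike".
    target = flow_sim / flow_sim.sum(dim=1, keepdim=True)
    log_pred = F.log_softmax(emb_sim, dim=1)
    return -(target * log_pred).sum(dim=1).mean()

# At test time the flow branch is dropped; only the embedding network transfers.
loss = cross_pixel_flow_loss(torch.randn(64, 32), torch.randn(64, 2))
```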