WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Low-end and compact mobile cameras demonstrate limited photo quality mainly
due to space, hardware and budget constraints. In this work, we propose a deep
learning solution that translates photos taken by cameras with limited
capabilities into DSLR-quality photos automatically. We tackle this problem by
introducing a weakly supervised photo enhancer (WESPE), a novel image-to-image
Generative Adversarial Network-based architecture. The proposed model is
trained under weak supervision: unlike previous works, there is no need for
strong supervision in the form of a large annotated dataset of aligned
original/enhanced photo pairs. The sole requirement is two distinct datasets:
one from the source camera, and one composed of arbitrary high-quality images
which can generally be crawled from the Internet; the visual content they
exhibit may be unrelated. Hence, our solution is repeatable for any camera:
collecting the data and training can be achieved in a couple of hours. In this
work, we place particular emphasis on extensive evaluation of the obtained results. Besides
standard objective metrics and subjective user study, we train a virtual rater
in the form of a separate CNN that mimics human raters on Flickr data and use
this network to get reference scores for both original and enhanced photos. Our
experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from
several generations of smartphones demonstrate that WESPE produces qualitative
results comparable to or better than those of state-of-the-art strongly
supervised methods.
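As a rough illustration of this training setup, the PyTorch sketch below runs one GAN step on unpaired batches: a generator enhances source-camera images, a discriminator sees only arbitrary high-quality images, and a backward generator enforces cycle-style content consistency in place of aligned pairs. The tiny networks, loss weights, and tensor sizes are illustrative assumptions, not the WESPE architecture (which uses additional losses and discriminators).

```python
# Minimal sketch of weakly supervised enhancement, assuming toy networks.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy image-to-image network standing in for the enhancer."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class TinyDiscriminator(nn.Module):
    """Toy patch discriminator for the high-quality image domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1),
        )
    def forward(self, x):
        return self.net(x)

G = TinyGenerator()       # source camera -> enhanced
G_back = TinyGenerator()  # enhanced -> source, for content consistency
D = TinyDiscriminator()   # unpaired high-quality images vs. enhanced ones
opt_g = torch.optim.Adam(list(G.parameters()) + list(G_back.parameters()), 1e-4)
opt_d = torch.optim.Adam(D.parameters(), 1e-4)
bce = nn.BCEWithLogitsLoss()

source = torch.rand(4, 3, 64, 64)  # batch from the source camera
target = torch.rand(4, 3, 64, 64)  # unrelated high-quality images

# Discriminator step: real high-quality images vs. enhanced source images.
fake = G(source).detach()
d_real, d_fake = D(target), D(fake)
d_loss = bce(d_real, torch.ones_like(d_real)) + \
         bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool D, plus cycle-style content loss -- no aligned pairs.
enhanced = G(source)
g_fake = D(enhanced)
adv = bce(g_fake, torch.ones_like(g_fake))
content = nn.functional.l1_loss(G_back(enhanced), source)
loss_g = adv + 10.0 * content  # 10.0 is an arbitrary illustrative weight
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

The key point is that the discriminator's "real" batch is drawn from an unrelated high-quality dataset, so no annotated original/enhanced pairs enter the loss.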
StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation
Unconditional video generation is a challenging task that involves
synthesizing high-quality videos that are both coherent and of extended
duration. To address this challenge, researchers have used pretrained StyleGAN
image generators for high-quality frame synthesis and focused on motion
generator design. The motion generator is trained in an autoregressive manner
using heavy 3D convolutional discriminators to ensure motion coherence during
video generation. In this paper, we introduce a novel motion generator design
that uses a learning-based inversion network for GANs. The encoder in our method
captures rich and smooth priors from encoding images to latents, and given the
latent of an initially generated frame as guidance, our method can generate
smooth future latents by modulating the inversion encoder temporally. Our method
enjoys the advantage of sparse training and naturally constrains the generation
space of our motion generator with the inversion network guided by the initial
frame, eliminating the need for heavy discriminators. Moreover, our method
supports style transfer with simple fine-tuning when the encoder is paired with
a pretrained StyleGAN generator. Extensive experiments conducted on various
benchmarks demonstrate the superiority of our method in generating long and
high-resolution videos with decent single-frame quality and temporal
consistency.
Comment: ICCV 2023. Code: https://github.com/johannwyh/StyleInV Project page: https://www.mmlab-ntu.com/project/styleinv/index.htm
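As a concrete illustration of this design, the toy PyTorch sketch below conditions an inversion-style encoder on the initial frame and a timestep, applying a time-dependent scale-and-shift to the encoder features so that each sampled timestep yields a latent, which a frozen generator decodes into a frame. All module shapes and names here are illustrative assumptions standing in for the pretrained StyleGAN and StyleInV's actual modulation scheme.

```python
# Minimal sketch of a temporally modulated inversion encoder (assumed shapes).
import torch
import torch.nn as nn

LATENT = 128

class InversionEncoder(nn.Module):
    """Toy image-to-latent encoder whose features are modulated by time."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_latent = nn.Linear(32, LATENT)
        # Temporal modulation: per-channel scale/shift predicted from time t.
        self.mod = nn.Linear(1, 2 * 32)

    def forward(self, frame0, t):
        feat = self.backbone(frame0)        # (B, 32) features of frame 0
        scale, shift = self.mod(t).chunk(2, dim=1)
        feat = feat * (1 + scale) + shift   # style-like modulation by t
        return self.to_latent(feat)         # latent for the frame at time t

class FrozenGenerator(nn.Module):
    """Stand-in for a pretrained StyleGAN synthesis network (kept frozen)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT, 3 * 64 * 64)
    def forward(self, w):
        return self.net(w).view(-1, 3, 64, 64)

enc, gen = InversionEncoder(), FrozenGenerator()
for p in gen.parameters():
    p.requires_grad_(False)  # only the motion/inversion encoder is trained

frame0 = torch.rand(2, 3, 64, 64)
# Sparse training: sample a few timesteps instead of dense autoregression.
for t in torch.tensor([[0.1], [0.5]]):
    w_t = enc(frame0, t.expand(2, 1))  # latent at time t, guided by frame 0
    frame_t = gen(w_t)                 # decode with the frozen generator
    print(t.item(), frame_t.shape)
```

Because the generator stays frozen and every future latent is anchored to the first frame's features, the sketch mirrors how the generation space is constrained without any heavy video discriminator.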
Distributed Rate Allocation Policies for Multi-Homed Video Streaming over Heterogeneous Access Networks
We consider the problem of rate allocation among multiple simultaneous video
streams sharing multiple heterogeneous access networks. We develop and evaluate
an analytical framework for optimal rate allocation based on observed available
bit rate (ABR) and round-trip time (RTT) over each access network and video
distortion-rate (DR) characteristics. The rate allocation is formulated as a
convex optimization problem that minimizes the total expected distortion of all
video streams. We present a distributed approximation of its solution and
compare its performance against H-infinity optimal control and two heuristic
schemes based on TCP-style additive-increase/multiplicative-decrease (AIMD)
principles. The various rate allocation schemes are evaluated in simulations of
multiple high-definition (HD) video streams sharing multiple access networks.
Our results demonstrate that, in comparison with heuristic AIMD-based schemes,
both media-aware allocation and H-infinity optimal control benefit from
proactive congestion avoidance and reduce the average packet loss rate from 45%
to below 2%. The improvement in average received video quality ranges from 1.5
to 10.7 dB in PSNR for various background traffic loads and video playout
deadlines. Media-aware allocation further exploits its knowledge of the video
DR characteristics to achieve a more balanced video quality among all streams.
Comment: 12 pages, 22 figures
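As a sketch of the convex formulation, the following Python example allocates rates for two streams across two access networks by minimizing total expected distortion subject to per-network ABR budgets. It assumes the common parametric distortion-rate model D(R) = D0 + theta/(R - R0) and solves centrally with SciPy's SLSQP; all parameter values are hypothetical, and the paper's distributed approximation is not reproduced here.

```python
# Minimal sketch of media-aware rate allocation as a convex program.
import numpy as np
from scipy.optimize import minimize

# Two streams, two access networks; x[i, j] = rate of stream i on network j.
ABR = np.array([4.0, 6.0])   # Mbps available on each network (assumed)
D0 = np.array([0.5, 0.8])    # DR-model parameters per stream (assumed)
theta = np.array([10.0, 15.0])
R0 = np.array([0.2, 0.3])    # Mbps

def total_distortion(x_flat):
    x = x_flat.reshape(2, 2)
    R = x.sum(axis=1)        # total rate per stream across both networks
    return np.sum(D0 + theta / (R - R0))

constraints = [
    # Aggregate rate on each network must fit within its observed ABR.
    {"type": "ineq", "fun": lambda x, j=j: ABR[j] - x.reshape(2, 2)[:, j].sum()}
    for j in range(2)
]
bounds = [(0.3, None)] * 4   # keep each stream's rate strictly above R0

res = minimize(total_distortion, np.full(4, 1.0), bounds=bounds,
               constraints=constraints, method="SLSQP")
rates = res.x.reshape(2, 2)
print("per-stream, per-network rates (Mbps):\n", rates)
print("total distortion:", total_distortion(res.x))
```

Since theta/(R - R0) is convex in R for R > R0 and each R is linear in the allocation variables, the objective is convex and the solver finds the global optimum; a distributed scheme approximates this solution from locally observed ABR and RTT.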