
    WESPE: Weakly Supervised Photo Enhancer for Digital Cameras

    Low-end and compact mobile cameras demonstrate limited photo quality, mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that automatically translates photos taken by cameras with limited capabilities into DSLR-quality photos. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can be crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we emphasize extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data, and use this network to obtain reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets, as well as pictures from several generations of smartphones, demonstrate that WESPE produces qualitative results comparable to or better than those of state-of-the-art strongly supervised methods.
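
    As a rough sketch of what this kind of weak supervision can look like, the snippet below composes a cycle-style content loss with separate color and texture critics, in the spirit of the abstract. It is one plausible reading, not the authors' released code: the networks G (enhancer), F_inv (inverse mapping), D_color, D_texture and vgg (a perceptual feature extractor), as well as the loss weights, are all placeholders.

```python
import torch
import torch.nn.functional as F

def blur(x):   # low-pass the image so the color critic ignores texture
    return F.avg_pool2d(x, kernel_size=5, stride=1, padding=2)

def gray(x):   # luminance only, so the texture critic ignores color
    return x.mean(dim=1, keepdim=True)

def tv(x):     # total-variation prior for spatially smooth results
    return ((x[..., 1:, :] - x[..., :-1, :]).abs().mean()
            + (x[..., :, 1:] - x[..., :, :-1]).abs().mean())

def generator_loss(G, F_inv, D_color, D_texture, vgg, x_src,
                   w=(1.0, 1.0, 1.0, 10.0)):
    """One weakly supervised generator step: x_src needs no aligned target.
    The critics are trained separately on unpaired high-quality photos."""
    y = G(x_src)                        # candidate enhanced photo
    x_back = F_inv(y)                   # map back to the source domain
    content = F.mse_loss(vgg(x_back), vgg(x_src))   # cycle-style content consistency
    adv_color = -D_color(blur(y)).mean()            # fool the color critic
    adv_texture = -D_texture(gray(y)).mean()        # fool the texture critic
    return (w[0] * content + w[1] * adv_color
            + w[2] * adv_texture + w[3] * tv(y))
```

    Because nothing in this objective ever sees an aligned before/after pair, retargeting to a new source camera only means re-collecting unpaired x_src batches, which matches the repeatability claim in the abstract.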

    StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation

    Unconditional video generation is a challenging task that involves synthesizing high-quality videos that are both coherent and of extended duration. To address this challenge, researchers have used pretrained StyleGAN image generators for high-quality frame synthesis and focused on motion generator design. The motion generator is typically trained in an autoregressive manner using heavy 3D convolutional discriminators to ensure motion coherence during video generation. In this paper, we introduce a novel motion generator design that uses a learning-based GAN inversion network. The encoder in our method captures rich and smooth priors from encoding images to latents, and, given the latent of an initially generated frame as guidance, our method can generate smooth future latents by modulating the inversion encoder temporally. Our method enjoys the advantage of sparse training and naturally constrains the generation space of our motion generator with the inversion network guided by the initial frame, eliminating the need for heavy discriminators. Moreover, our method supports style transfer with simple fine-tuning when the encoder is paired with a pretrained StyleGAN generator. Extensive experiments conducted on various benchmarks demonstrate the superiority of our method in generating long and high-resolution videos with decent single-frame quality and temporal consistency.
    Comment: ICCV 2023. Code: https://github.com/johannwyh/StyleInV Project page: https://www.mmlab-ntu.com/project/styleinv/index.htm
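
    To make the mechanism concrete, here is a small self-contained PyTorch sketch of a temporally style-modulated inversion encoder: features of the initial frame are scaled and shifted by a code computed from the timestep and a motion noise vector, so the latent for any frame can be sampled directly (and sparsely) rather than autoregressively. The module is deliberately tiny and every name in it is hypothetical; it is not the StyleInV architecture itself.

```python
import torch
import torch.nn as nn

class ModulatedInversionEncoder(nn.Module):
    """Toy stand-in for a temporally style-modulated inversion encoder."""
    def __init__(self, img_ch=3, feat=64, latent=512, motion=128):
        super().__init__()
        self.backbone = nn.Sequential(                # encodes the initial frame
            nn.Conv2d(img_ch, feat, 3, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(feat, feat, 3, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1))
        self.style = nn.Linear(motion + 1, feat * 2)  # (timestep, noise) -> scale/shift
        self.head = nn.Linear(feat, latent)

    def forward(self, x0, t, z_motion):
        h = self.backbone(x0).flatten(1)              # smooth priors from the first frame
        scale, shift = self.style(
            torch.cat([t[:, None], z_motion], dim=1)).chunk(2, dim=1)
        h = h * (1 + scale) + shift                   # temporal style modulation
        return self.head(h)                           # latent w_t for the frame at time t

enc = ModulatedInversionEncoder()
x0 = torch.randn(2, 3, 64, 64)               # stand-in for two generated initial frames
z = torch.randn(2, 128)                      # one motion trajectory code per clip
w_t = enc(x0, torch.tensor([0.25, 0.5]), z)  # latents a StyleGAN synthesis net would decode
```

    Since every w_t is a function of the same initial frame, the sampled latents stay anchored near that frame's inversion, which is the property that lets the method drop heavy 3D convolutional discriminators.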

    Distributed Rate Allocation Policies for Multi-Homed Video Streaming over Heterogeneous Access Networks

    We consider the problem of rate allocation among multiple simultaneous video streams sharing multiple heterogeneous access networks. We develop and evaluate an analytical framework for optimal rate allocation based on the observed available bit rate (ABR) and round-trip time (RTT) over each access network, together with the video distortion-rate (DR) characteristics. The rate allocation is formulated as a convex optimization problem that minimizes the total expected distortion of all video streams. We present a distributed approximation of its solution and compare its performance against H-infinity optimal control and two heuristic schemes based on TCP-style additive-increase/multiplicative-decrease (AIMD) principles. The various rate allocation schemes are evaluated in simulations of multiple high-definition (HD) video streams sharing multiple access networks. Our results demonstrate that, in comparison with heuristic AIMD-based schemes, both media-aware allocation and H-infinity optimal control benefit from proactive congestion avoidance and reduce the average packet loss rate from 45% to below 2%. The improvement in average received video quality ranges from 1.5 to 10.7 dB in PSNR across various background traffic loads and video playout deadlines. Media-aware allocation further exploits its knowledge of the video DR characteristics to achieve a more balanced video quality among all streams.
    Comment: 12 pages, 22 figures
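
    The abstract leaves the convex program implicit, but under a common parametric DR model D_i(r_i) = d0_i + theta_i / (r_i - r0_i), minimizing total distortion over a single shared capacity C has a closed-form KKT solution that equalizes marginal distortion across streams: r_i = r0_i + sqrt(theta_i) * (C - sum_j r0_j) / sum_j sqrt(theta_j). A toy worked example follows; the model, names and numbers are my own illustration, not the paper's multi-network algorithm.

```python
import math

def allocate(theta, r0, C):
    """Minimize sum_i theta_i / (r_i - r0_i) subject to sum_i r_i = C.
    KKT: theta_i / (r_i - r0_i)**2 is equal for all i, which gives
    r_i = r0_i + sqrt(theta_i) * (C - sum(r0)) / sum_j sqrt(theta_j)."""
    slack = C - sum(r0)
    assert slack > 0, "capacity must exceed the sum of the rate offsets"
    s = sum(math.sqrt(t) for t in theta)
    return [r + math.sqrt(t) * slack / s for t, r in zip(theta, r0)]

# Two HD streams sharing C = 12 Mbps; stream 1 has the steeper DR curve.
print(allocate(theta=[9.0, 4.0], r0=[2.0, 2.0], C=12.0))  # ~[6.8, 5.2] Mbps
```

    The steeper stream receives more rate until the marginal distortion reduction theta_i / (r_i - r0_i)^2 is equal on both, which is the kind of balance a media-aware scheme can exploit and a media-blind AIMD heuristic cannot.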