164 research outputs found

    Self-Asymmetric Invertible Network for Compression-Aware Image Rescaling

    High-resolution (HR) images are usually downscaled to low-resolution (LR) ones for better display and afterward upscaled back to the original size to recover details. Recent work in image rescaling formulates downscaling and upscaling as a unified task and learns a bijective mapping between HR and LR via invertible networks. However, in real-world applications (e.g., social media), most images are compressed for transmission. Lossy compression will lead to irreversible information loss on LR images, hence damaging the inverse upscaling procedure and degrading the reconstruction accuracy. In this paper, we propose the Self-Asymmetric Invertible Network (SAIN) for compression-aware image rescaling. To tackle the distribution shift, we first develop an end-to-end asymmetric framework with two separate bijective mappings for high-quality and compressed LR images, respectively. Then, based on empirical analysis of this framework, we model the distribution of the lost information (including downscaling and compression) using isotropic Gaussian mixtures and propose the Enhanced Invertible Block to derive high-quality/compressed LR images in one forward pass. Besides, we design a set of losses to regularize the learned LR images and enhance the invertibility. Extensive experiments demonstrate the consistent improvements of SAIN across various image rescaling datasets in terms of both quantitative and qualitative evaluation under standard image compression formats (i.e., JPEG and WebP).
    Comment: Accepted by AAAI 2023. Code is available at https://github.com/yang-jin-hai/SAI

    HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization

    Contemporary image rescaling aims at embedding a high-resolution (HR) image into a low-resolution (LR) thumbnail image that contains embedded information for HR image reconstruction. Unlike traditional image super-resolution, this enables high-fidelity HR image restoration faithful to the original one, given the embedded information in the LR thumbnail. However, state-of-the-art image rescaling methods do not optimize the LR image file size for efficient sharing and fall short of real-time performance for ultra-high-resolution (e.g., 6K) image reconstruction. To address these two challenges, we propose a novel framework (HyperThumbnail) for real-time 6K rate-distortion-aware image rescaling. Our framework first embeds an HR image into a JPEG LR thumbnail by an encoder with our proposed quantization prediction module, which minimizes the file size of the embedding LR JPEG thumbnail while maximizing HR reconstruction quality. Then, an efficient frequency-aware decoder reconstructs a high-fidelity HR image from the LR one in real time. Extensive experiments demonstrate that our framework outperforms previous image rescaling baselines in rate-distortion performance and can perform 6K image reconstruction in real time.
    Comment: Accepted by CVPR 2023; GitHub Repository: https://github.com/AbnerVictor/HyperThumbnai

    Invertible Rescaling Network and Its Extensions

    Image rescaling is a commonly used bidirectional operation, which first downscales high-resolution images to fit various display screens or to be storage- and bandwidth-friendly, and afterward upscales the corresponding low-resolution images to recover the original resolution or the details in the zoom-in images. However, the non-injective downscaling mapping discards high-frequency contents, leading to the ill-posed problem for the inverse restoration task. This can be abstracted as a general image degradation-restoration problem with information loss. In this work, we propose a novel invertible framework to handle this general problem, which models the bidirectional degradation and restoration from a new perspective, i.e. invertible bijective transformation. The invertibility enables the framework to model the information loss of pre-degradation in the form of distribution, which could mitigate the ill-posed problem during post-restoration. To be specific, we develop invertible models to generate valid degraded images and meanwhile transform the distribution of lost contents to the fixed distribution of a latent variable during the forward degradation. Then restoration is made tractable by applying the inverse transformation on the generated degraded image together with a randomly-drawn latent variable. We start from image rescaling and instantiate the model as Invertible Rescaling Network (IRN), which can be easily extended to the similar decolorization-colorization task. We further propose to combine the invertible framework with existing degradation methods such as image compression for wider applications. Experimental results demonstrate the significant improvement of our model over existing methods in terms of both quantitative and qualitative evaluations of upscaling and colorizing reconstruction from downscaled and decolorized images, and rate-distortion of image compression.
    Comment: Accepted by IJC
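
    The "invertible bijective transformation" mentioned in this abstract can be illustrated with a generic affine coupling layer. The sketch below is not the IRN architecture: the function `scale_and_shift` is a hypothetical stand-in for a learned sub-network, and the split into two halves is an assumption made for illustration. It only shows why such a layer can be inverted exactly without inverting the sub-network itself.

```python
import math


def scale_and_shift(x1):
    # Hypothetical stand-in for a learned sub-network. It can be any
    # function of x1, because the inverse pass only re-evaluates it.
    s = [math.tanh(v) for v in x1]
    t = [0.5 * v for v in x1]
    return s, t


def forward(x):
    # Split the input (even length assumed), keep the first half
    # unchanged, and transform the second half conditioned on the first.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_and_shift(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    return x1 + y2


def inverse(y):
    # The first half passed through untouched, so s and t can be
    # recomputed exactly and the affine map undone.
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_and_shift(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2


if __name__ == "__main__":
    x = [0.3, -1.2, 2.0, 0.7]
    y = forward(x)
    print(y)
    print(inverse(y))  # matches x up to floating-point error
```

    Stacking several such layers, with the roles of the two halves alternated, gives a bijection whose inverse costs about as much as its forward pass.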

    Real-world super-resolution of face-images from surveillance cameras

    Most existing face image Super-Resolution (SR) methods assume that the Low-Resolution (LR) images were artificially downsampled from High-Resolution (HR) images with bicubic interpolation. This operation changes the natural image characteristics and reduces noise. Hence, SR methods trained on such data most often fail to produce good results when applied to real LR images. To solve this problem, we propose a novel framework for generation of realistic LR/HR training pairs. Our framework estimates realistic blur kernels, noise distributions, and JPEG compression artifacts to generate LR images with similar image characteristics as the ones in the source domain. This allows us to train an SR model using high quality face images as Ground-Truth (GT). For better perceptual quality we use a Generative Adversarial Network (GAN) based SR model where we have exchanged the commonly used VGG-loss [24] with LPIPS-loss [52]. Experimental results on both real and artificially corrupted face images show that our method results in more detailed reconstructions with less noise compared to existing State-of-the-Art (SoTA) methods. In addition, we show that the traditional non-reference Image Quality Assessment (IQA) methods fail to capture this improvement and demonstrate that the more recent NIMA metric [16] correlates better with human perception via Mean Opinion Rank (MOR)

    Digital Multimedia Forensics and Anti-Forensics

    As the use of digital multimedia content such as images and video has increased, so have the means and the incentive to create digital forgeries. Presently, powerful editing software allows forgers to create perceptually convincing digital forgeries. Accordingly, there is a great need for techniques capable of authenticating digital multimedia content. In response to this, researchers have begun developing digital forensic techniques capable of identifying digital forgeries. These forensic techniques operate by detecting imperceptible traces left by editing operations in digital multimedia content. In this dissertation, we propose several new digital forensic techniques to detect evidence of editing in digital multimedia content. We begin by identifying the fingerprints left by pixel value mappings and show how these can be used to detect the use of contrast enhancement in images. We use these fingerprints to perform a number of additional forensic tasks such as identifying cut-and-paste forgeries, detecting the addition of noise to previously JPEG compressed images, and estimating the contrast enhancement mapping used to alter an image. Additionally, we consider the problem of multimedia security from the forger's point of view. We demonstrate that an intelligent forger can design anti-forensic operations to hide editing fingerprints and fool forensic techniques. We propose an anti-forensic technique to remove compression fingerprints from digital images and show that this technique can be used to fool several state-of-the-art forensic algorithms. We examine the problem of detecting frame deletion in digital video and develop both a technique to detect frame deletion and an anti-forensic technique to hide frame deletion fingerprints. We show that this anti-forensic operation leaves behind fingerprints of its own and propose a technique to detect the use of frame deletion anti-forensics. 
The ability of a forensic investigator to detect both editing and the use of anti-forensics results in a dynamic interplay between the forger and forensic investigator. We develop a game-theoretic framework to analyze this interplay and identify the set of actions that each party will rationally choose. Additionally, we show that anti-forensics can be used to protect against reverse engineering. To demonstrate this, we propose an anti-forensic module that can be integrated into digital cameras to protect color interpolation methods
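
    The idea that a pixel value mapping leaves a detectable fingerprint can be illustrated on a toy histogram. The sketch below is not the dissertation's detector: the gamma curve, the function names, and the two summary statistics are assumptions chosen for illustration. It applies a non-linear mapping to 8-bit values and counts the empty bins (gaps) and the tallest bin (peaks) that the mapping introduces.

```python
def gamma_map(value, gamma):
    # Hypothetical contrast-enhancement mapping on 8-bit pixel values.
    return round(255 * (value / 255) ** gamma)


def histogram(values):
    # Count how often each 8-bit value occurs.
    counts = [0] * 256
    for v in values:
        counts[v] += 1
    return counts


def fingerprint_stats(values):
    # Return (number of empty bins, height of the tallest bin).
    hist = histogram(values)
    empty_bins = sum(1 for c in hist if c == 0)
    return empty_bins, max(hist)


if __name__ == "__main__":
    ramp = list(range(256))  # every value exactly once
    enhanced = [gamma_map(v, 0.5) for v in ramp]
    print(fingerprint_stats(ramp))      # no gaps, flat histogram
    print(fingerprint_stats(enhanced))  # gaps and peaks appear
```

    Because the mapping sends several input values to one output value in some ranges and skips output values in others, the enhanced histogram shows gaps and peaks that an unaltered image with a smooth histogram would not.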

    Post-stack seismic data compression with multidimensional deep autoencoders

    Seismic data are surveys from the Earth's subsurface with the goal of representing the geophysical characteristics from the region where they were obtained in order to be interpreted. These data can occupy hundreds of Gigabytes of storage, motivating their compression. In this work, we approach the problem of compressing three-dimensional post-stack seismic data using models based on deep autoencoders. The deep autoencoder is a neural network that allows representing most of the information of seismic data with a lower cost in comparison to its original representation. To the best of our knowledge, this is the first work to deal with seismic compression using deep learning. Four compression methods for post-stack data are proposed: two based on a bi-dimensional compression, named 2D-based Seismic Data Compression (2DSC) and 2D-based Seismic Data Compression using Multi-resolution (2DSC-MR), and two based on three-dimensional compression, named 3D-based Seismic Data Compression (3DSC) and 3D-based Seismic Data Compression using Vector Quantization (3DSC-VQ). The 2DSC is our simplest method for seismic compression, in which the volume is compressed through its bi-dimensional sections. The 2DSC-MR extends the previous method by introducing the data compression in multiple resolutions. The 3DSC extends the 2DSC method by allowing the seismic data compression by using the three-dimensional volume instead of 2D slices. This method considers the similarity between sections to compress a whole volume with the cost of a single section. The 3DSC-VQ uses vector quantization aiming to extract more information from the seismic volumes in the encoding part. Our main goal is to compress the seismic data at low bit rates, attaining a high quality reconstruction. 
Experiments show that our methods can compress seismic data yielding PSNR values over 40 dB and bit rates below 1.0 bpv.
Seismic data are mappings of the Earth's subsurface whose purpose is to represent the geophysical characteristics of the region where they were obtained so that they can be interpreted. These data can occupy hundreds of Gigabytes of storage, motivating their compression. In this work, the problem of compressing three-dimensional post-stack seismic data is addressed using models based on deep autoencoders. The deep autoencoder is a neural network that can represent most of the information contained in seismic data at a lower cost than its original representation. To the best of our knowledge, this is the first work to deal with seismic data compression using deep learning. Thus, through successive approximations, four compression methods for three-dimensional post-stack data are proposed: two based on two-dimensional compression, called 2D Seismic Data Compression Method (2DSC) and 2D Seismic Data Compression Method using Multi-resolution (2DSC-MR), and two based on three-dimensional compression, called 3D Seismic Data Compression Method (3DSC) and 3D Seismic Data Compression Method using Vector Quantization (3DSC-VQ). The 2DSC method is our simplest seismic data compression method, in which the volume is compressed from its two-dimensional sections. The 2DSC-MR method extends the previous method by introducing compression of the data at multiple resolutions. The 3DSC method extends the 2DSC method by allowing the seismic data to be compressed in its volumetric form, considering the similarity between sections to represent a whole volume at the cost of a single section. The 3DSC-VQ method uses vector quantization to relax the encoding step of the previous method, giving the network more freedom to extract information from the seismic volumes. 
The goal of this work is to compress seismic data at low bit rates and with high reconstruction quality in terms of PSNR and bits per voxel (bpv). Experiments show that the four methods can compress seismic data, providing PSNR values above 40 dB at bit rates below 1.0 bpv.
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superio
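
    The two figures of merit quoted in this abstract, PSNR in dB and bit rate in bits per voxel (bpv), can be computed as in the sketch below. It uses the standard PSNR definition, 10·log10(peak² / MSE). The function names, the flat-list data layout, and the choice of `peak` are assumptions for illustration, since the abstract does not state how the authors normalise amplitudes.

```python
import math


def psnr(original, reconstructed, peak):
    # Peak signal-to-noise ratio in dB between two equally sized
    # flat sequences of samples; `peak` is the maximum signal value.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


def bits_per_voxel(compressed_bytes, num_voxels):
    # Size of the compressed representation divided by the voxel count.
    return 8 * compressed_bytes / num_voxels


if __name__ == "__main__":
    volume = [0.0] * 8000            # toy 20 x 20 x 20 volume, flattened
    decoded = [1.0] * 8000           # uniform error of one unit
    print(psnr(volume, decoded, peak=255))
    print(bits_per_voxel(1000, len(volume)))
```

    Under these definitions, a result of "over 40 dB below 1.0 bpv" means the compressed file uses less than one bit per voxel of the volume while keeping the mean squared error at least four orders of magnitude below the squared peak amplitude.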

    Deep Learning Methods for Streaming Image Reconstruction in Fixed-camera Settings

    A streaming video reconstruction system is described and implemented as a convolutional neural network. The system performs combined 2x super-resolution and H.264 artefacts removal with a processing speed of about 6 frames per second at 1920×1080 output resolution on current workstation-grade hardware. In 4x super-resolution mode, the system can output 3840×2160 video at a similar rate. The base system provides quality improvements of 0.010–0.025 SSIM over Lanczos filtering. Scene-specific training, in which the system automatically adapts to the current scene viewed by the camera, is shown to achieve up to 0.030 SSIM additional improvement in some scenarios. It is further shown that scene-specific training can provide some improvement even when reconstructing an unfamiliar scene, as long as the camera and capture settings remain the same.
    Many cameras are permanently mounted and film the same place every day. What if cameras could be trained to remember what they have seen