Self-Asymmetric Invertible Network for Compression-Aware Image Rescaling
High-resolution (HR) images are usually downscaled to low-resolution (LR)
ones for better display and afterward upscaled back to the original size to
recover details. Recent work in image rescaling formulates downscaling and
upscaling as a unified task and learns a bijective mapping between HR and LR
via invertible networks. However, in real-world applications (e.g., social
media), most images are compressed for transmission. Lossy compression will
lead to irreversible information loss on LR images, hence damaging the inverse
upscaling procedure and degrading the reconstruction accuracy. In this paper,
we propose the Self-Asymmetric Invertible Network (SAIN) for compression-aware
image rescaling. To tackle the distribution shift, we first develop an
end-to-end asymmetric framework with two separate bijective mappings for
high-quality and compressed LR images, respectively. Then, based on empirical
analysis of this framework, we model the distribution of the lost information
(including downscaling and compression) using isotropic Gaussian mixtures and
propose the Enhanced Invertible Block to derive high-quality/compressed LR
images in one forward pass. Besides, we design a set of losses to regularize
the learned LR images and enhance the invertibility. Extensive experiments
demonstrate the consistent improvements of SAIN across various image rescaling
datasets in terms of both quantitative and qualitative evaluation under
standard image compression formats (i.e., JPEG and WebP).
Comment: Accepted by AAAI 2023. Code is available at
https://github.com/yang-jin-hai/SAI
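The bijective mappings referenced above are typically built from coupling layers, which stay exactly invertible no matter how complex their internal transform is. A minimal additive-coupling sketch in NumPy (a simplification for illustration; SAIN's Enhanced Invertible Block is more elaborate and models the lost information with Gaussian mixtures):

```python
import numpy as np

def coupling_forward(x1, x2, net):
    # Additive coupling: y1 = x1, y2 = x2 + net(x1).
    # Invertible in closed form no matter how complex `net` is.
    return x1, x2 + net(x1)

def coupling_inverse(y1, y2, net):
    # The inverse re-evaluates `net` on the untouched half and subtracts.
    return y1, y2 - net(y1)

# Toy "network": any deterministic function is allowed here.
net = lambda t: np.tanh(3.0 * t)

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=8), rng.normal(size=8)
y1, y2 = coupling_forward(x1, x2, net)
r1, r2 = coupling_inverse(y1, y2, net)
assert np.allclose(r1, x1) and np.allclose(r2, x2)  # exact inversion
```

Because inversion never requires inverting `net` itself, stacks of such blocks give the HR-to-LR forward pass and LR-to-HR restoration pass as one shared set of weights.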
HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization
Contemporary image rescaling aims at embedding a high-resolution (HR) image
into a low-resolution (LR) thumbnail image that contains embedded information
for HR image reconstruction. Unlike traditional image super-resolution, this
enables high-fidelity HR image restoration faithful to the original one, given
the embedded information in the LR thumbnail. However, state-of-the-art image
rescaling methods do not optimize the LR image file size for efficient sharing
and fall short of real-time performance for ultra-high-resolution (e.g., 6K)
image reconstruction. To address these two challenges, we propose a novel
framework (HyperThumbnail) for real-time 6K rate-distortion-aware image
rescaling. Our framework first embeds an HR image into a JPEG LR thumbnail by
an encoder with our proposed quantization prediction module, which minimizes
the file size of the embedded LR JPEG thumbnail while maximizing HR
reconstruction quality. Then, an efficient frequency-aware decoder reconstructs
a high-fidelity HR image from the LR one in real time. Extensive experiments
demonstrate that our framework outperforms previous image rescaling baselines
in rate-distortion performance and can perform 6K image reconstruction in real
time.
Comment: Accepted by CVPR 2023; GitHub repository:
https://github.com/AbnerVictor/HyperThumbnai
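The rate-distortion trade-off that the quantization prediction module learns can be illustrated with a toy uniform quantizer: for each candidate quantization step, weigh a rate proxy (entropy of the quantization indices) against distortion (MSE), and pick the step minimizing the Lagrangian cost. The names below are illustrative, not from the paper's code:

```python
import numpy as np

def rd_cost(signal, q, lam):
    # Uniform quantization with step q: larger q -> fewer bits, more error.
    idx = np.round(signal / q)
    recon = idx * q
    distortion = np.mean((signal - recon) ** 2)
    # Crude rate proxy: empirical entropy (bits/sample) of the indices.
    _, counts = np.unique(idx, return_counts=True)
    p = counts / counts.sum()
    rate = -np.sum(p * np.log2(p))
    return rate + lam * distortion

rng = np.random.default_rng(1)
signal = rng.normal(size=4096)
steps = [0.1, 0.25, 0.5, 1.0, 2.0]
# Pick the quantization step that minimizes the Lagrangian R + lambda*D.
best_q = min(steps, key=lambda q: rd_cost(signal, q, lam=10.0))
```

HyperThumbnail performs this selection in learned form per JPEG block, predicting quantization that keeps the thumbnail small while preserving the bits the decoder needs for HR reconstruction.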
Invertible Rescaling Network and Its Extensions
Image rescaling is a commonly used bidirectional operation, which first
downscales high-resolution images to fit various display screens or to be
storage- and bandwidth-friendly, and afterward upscales the corresponding
low-resolution images to recover the original resolution or the details in the
zoom-in images. However, the non-injective downscaling mapping discards
high-frequency contents, leading to the ill-posed problem for the inverse
restoration task. This can be abstracted as a general image
degradation-restoration problem with information loss. In this work, we propose
a novel invertible framework to handle this general problem, which models the
bidirectional degradation and restoration from a new perspective, i.e.
invertible bijective transformation. The invertibility enables the framework to
model the information loss of pre-degradation in the form of distribution,
which could mitigate the ill-posed problem during post-restoration. To be
specific, we develop invertible models to generate valid degraded images and
meanwhile transform the distribution of lost contents to the fixed distribution
of a latent variable during the forward degradation. Then restoration is made
tractable by applying the inverse transformation on the generated degraded
image together with a randomly-drawn latent variable. We start from image
rescaling and instantiate the model as Invertible Rescaling Network (IRN),
which can be easily extended to the similar decolorization-colorization task.
We further propose to combine the invertible framework with existing
degradation methods such as image compression for wider applications.
Experimental results demonstrate the significant improvement of our model over
existing methods in terms of both quantitative and qualitative evaluations of
upscaling and colorizing reconstruction from downscaled and decolorized images,
and rate-distortion of image compression.
Comment: Accepted by IJC
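IRN's core idea, transforming the lost high-frequency content into a latent variable with a fixed distribution and then sampling that latent at restoration time, can be sketched with the simplest invertible rescaling: a 1-D Haar split. This is only an analogy under simplified assumptions; IRN itself uses learned invertible blocks on 2-D images:

```python
import numpy as np

def haar_forward(x):
    # Forward "downscaling": a half-length LR signal (local means) plus a
    # latent z holding the lost high-frequency detail.
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_inverse(lr, z):
    # Exact inverse of the forward transform.
    even, odd = (lr + z) / np.sqrt(2), (lr - z) / np.sqrt(2)
    x = np.empty(lr.size * 2)
    x[0::2], x[1::2] = even, odd
    return x

rng = np.random.default_rng(2)
x = rng.normal(size=16)
lr, z = haar_forward(x)
assert np.allclose(haar_inverse(lr, z), x)  # invertible with the true latent
# Restoration in the IRN setting instead samples z from the fixed prior:
x_hat = haar_inverse(lr, rng.normal(size=z.size))
```

Training pushes the true z toward the fixed prior, so a freshly sampled latent at inference time yields a plausible restoration rather than the exact original.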
Real-world super-resolution of face-images from surveillance cameras
Most existing face image Super-Resolution (SR) methods assume that the
Low-Resolution (LR) images were artificially downsampled from High-Resolution
(HR) images with bicubic interpolation. This operation changes the natural
image characteristics and reduces noise. Hence, SR methods trained on such data
most often fail to produce good results when applied to real LR images. To
solve this problem, we propose a novel framework for generation of realistic
LR/HR training pairs. Our framework estimates realistic blur kernels, noise
distributions, and JPEG compression artifacts to generate LR images with
similar image characteristics as the ones in the source domain. This allows us
to train an SR model using high-quality face images as Ground-Truth (GT). For
better perceptual quality we use a Generative Adversarial Network (GAN) based
SR model where we have exchanged the commonly used VGG-loss [24] with
LPIPS-loss [52]. Experimental results on both real and artificially corrupted
face images show that our method results in more detailed reconstructions with
less noise compared to existing State-of-the-Art (SoTA) methods. In addition,
we show that the traditional non-reference Image Quality Assessment (IQA)
methods fail to capture this improvement and demonstrate that the more recent
NIMA metric [16] correlates better with human perception via Mean Opinion Rank
(MOR).
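The pair-generation pipeline described above can be sketched as three stages: blur with an estimated kernel, downsample, and inject noise from the estimated distribution. A hypothetical NumPy version under simplifying assumptions (the paper additionally models JPEG compression artifacts, omitted here, and uses kernels and noise estimated from the source domain rather than the fixed Gaussian stand-ins below):

```python
import numpy as np

def degrade(hr, kernel, noise_sigma, rng):
    # 1) Blur with the (estimated) kernel via direct 2-D convolution.
    kh, kw = kernel.shape
    pad = kh // 2
    padded = np.pad(hr, pad, mode="edge")
    blurred = np.zeros_like(hr)
    for i in range(hr.shape[0]):
        for j in range(hr.shape[1]):
            blurred[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    # 2) Downsample 2x (bicubic in practice; strided here for brevity).
    lr = blurred[::2, ::2]
    # 3) Inject noise drawn from the estimated distribution.
    lr = lr + rng.normal(0.0, noise_sigma, size=lr.shape)
    return np.clip(lr, 0.0, 1.0)

# 5x5 Gaussian kernel as a stand-in for an estimated real-world kernel.
g = np.exp(-0.5 * (np.arange(5) - 2.0) ** 2)
kernel = np.outer(g, g) / np.outer(g, g).sum()

rng = np.random.default_rng(6)
hr = rng.random((32, 32))
lr = degrade(hr, kernel, noise_sigma=0.02, rng=rng)
```

Training on pairs produced this way exposes the SR model to the same blur, noise, and compression statistics it will meet in real surveillance footage, instead of the clean bicubic degradation it would otherwise assume.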
Digital Multimedia Forensics and Anti-Forensics
As the use of digital multimedia content such as images and video has increased, so has the means and the incentive to create digital forgeries. Presently, powerful editing software allows forgers to create perceptually convincing digital forgeries. Accordingly, there is a great need for techniques capable of authenticating digital multimedia content. In response to this, researchers have begun developing digital forensic techniques capable of identifying digital forgeries. These forensic techniques operate by detecting imperceptible traces left by editing operations in digital multimedia content. In this dissertation, we propose several new digital forensic techniques to detect evidence of editing in digital multimedia content.
We begin by identifying the fingerprints left by pixel value mappings and show how these can be used to detect the use of contrast enhancement in images. We use these fingerprints to perform a number of additional forensic tasks such as identifying cut-and-paste forgeries, detecting the addition of noise to previously JPEG compressed images, and estimating the contrast enhancement mapping used to alter an image.
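The pixel-value-mapping fingerprints described above can be made concrete: any nonlinear monotone mapping applied to 8-bit data leaves empty histogram bins wherever the mapping is locally expansive. A toy detector sketch (the `histogram_gap_score` name and the bare gap-fraction score are illustrative, not the dissertation's actual statistic):

```python
import numpy as np

def histogram_gap_score(img_u8):
    # A nonlinear pixel-value mapping on 8-bit data leaves empty bins
    # (gaps) inside the occupied intensity range wherever the mapping is
    # locally expansive -- the fingerprint of contrast enhancement.
    hist = np.bincount(img_u8.ravel(), minlength=256)
    occupied = hist > 0
    lo = occupied.argmax()
    hi = 255 - occupied[::-1].argmax()
    return np.mean(hist[lo:hi + 1] == 0)  # fraction of empty inner bins

rng = np.random.default_rng(3)
img = rng.integers(30, 226, size=(128, 128)).astype(np.uint8)
# Gamma correction is one such pixel-value mapping.
enhanced = (255.0 * (img / 255.0) ** 0.5).astype(np.uint8)
assert histogram_gap_score(enhanced) > histogram_gap_score(img)
```

The same reasoning runs in reverse for anti-forensics: smoothing the histogram to refill those gaps is exactly the kind of fingerprint-hiding operation considered later in the dissertation.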
Additionally, we consider the problem of multimedia security from the forger's point of view. We demonstrate that an intelligent forger can design anti-forensic operations to hide editing fingerprints and fool forensic techniques. We propose an anti-forensic technique to remove compression fingerprints from digital images and show that this technique can be used to fool several state-of-the-art forensic algorithms. We examine the problem of detecting frame deletion in digital video and develop both a technique to detect frame deletion and an anti-forensic technique to hide frame deletion fingerprints. We show that this anti-forensic operation leaves behind fingerprints of its own and propose a technique to detect the use of frame deletion anti-forensics. The ability of a forensic investigator to detect both editing and the use of anti-forensics results in a dynamic interplay between the forger and forensic investigator. We develop a game-theoretic framework to analyze this interplay and identify the set of actions that each party will rationally choose. Additionally, we show that anti-forensics can be used to protect against reverse engineering. To demonstrate this, we propose an anti-forensic module that can be integrated into digital cameras to protect color interpolation methods.
Post-stack seismic data compression with multidimensional deep autoencoders
Seismic data are surveys of the Earth's subsurface that aim to represent the
geophysical characteristics of the region where they were acquired so that they
can be interpreted. These data can occupy hundreds of gigabytes of storage,
motivating their compression. In this work, we approach the problem of compressing
three-dimensional post-stack seismic data using models based on deep autoencoders.
The deep autoencoder is a neural network that can represent most of the information
of seismic data at a lower cost than its original representation. To the best of
our knowledge, this is the first work to deal with seismic compression using deep
learning. Four compression methods for post-stack data are proposed: two based on
bi-dimensional compression, named 2D-based Seismic Data Compression (2DSC) and
2D-based Seismic Data Compression using
Multi-resolution (2DSC-MR), and two based on three-dimensional compression, named
3D-based Seismic Data Compression (3DSC) and 3D-based Seismic Data Compression
using Vector Quantization (3DSC-VQ). The 2DSC is our simplest method for seismic
compression, in which the volume is compressed through its bi-dimensional sections. The
2DSC-MR extends the previous method by introducing data compression at multiple
resolutions. The 3DSC extends the 2DSC method by compressing the three-dimensional
volume directly instead of its 2D slices; it exploits the similarity between
sections to compress a whole volume at the cost of a single section.
The 3DSC-VQ uses vector quantization in the encoding stage, aiming to extract more
information from the seismic volumes. Our main goal is to compress seismic data at
low bit rates while attaining high-quality reconstruction. Experiments show that our methods
can compress seismic data yielding PSNR values over 40 dB and bit rates below 1.0 bpv.
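The simplest instance of the autoencoder idea above is a linear autoencoder, which is equivalent to PCA: encoding projects sections onto a small basis, decoding projects back. A NumPy sketch on synthetic low-rank "sections" (the thesis uses deep nonlinear autoencoders; this only illustrates the compression-by-latent-code principle, and all names are illustrative):

```python
import numpy as np

def fit_linear_autoencoder(patches, k):
    # A linear autoencoder with tied weights is equivalent to PCA: the
    # optimal encoder projects onto the top-k principal directions.
    mean = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:k]

def encode(patches, mean, basis):
    return (patches - mean) @ basis.T      # n x k latent codes

def decode(codes, mean, basis):
    return codes @ basis + mean

def psnr(a, b, peak=1.0):
    return 10.0 * np.log10(peak ** 2 / np.mean((a - b) ** 2))

rng = np.random.default_rng(4)
# Synthetic "sections": a few smooth modes plus weak noise, i.e. the kind
# of redundancy between sections that the autoencoder exploits.
t = np.linspace(0.0, 1.0, 64)
modes = np.stack([np.sin(2 * np.pi * k * t) for k in range(1, 5)])
patches = rng.normal(size=(512, 4)) @ modes + 0.01 * rng.normal(size=(512, 64))
mean, basis = fit_linear_autoencoder(patches, k=8)
codes = encode(patches, mean, basis)       # 8x smaller than the input
recon = decode(codes, mean, basis)
```

Storing only the codes (plus the shared basis) is what drives the bit rate below the raw representation; the deep versions replace the linear projection with learned nonlinear encoders and, in 3DSC-VQ, a vector-quantized code.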
Deep Learning Methods for Streaming Image Reconstruction in Fixed-camera Settings
A streaming video reconstruction system is described and implemented as a convolutional neural network. The system performs combined 2x super-resolution and H.264 artefact removal with a processing speed of about 6 frames per second at 1920×1080 output resolution on current workstation-grade hardware. In 4x super-resolution mode, the system can output 3840×2160 video at a similar rate. The base system provides quality improvements of 0.010–0.025 SSIM over Lanczos filtering. Scene-specific training, in which the system automatically adapts to the current scene viewed by the camera, is shown to achieve up to 0.030 SSIM additional improvement in some scenarios. It is further shown that scene-specific training can provide some improvement even when reconstructing an unfamiliar scene, as long as the camera and capture settings remain the same.
Many cameras are permanently mounted and film the same place every day. Imagine if cameras could be trained to remember what they have seen
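The SSIM figures quoted above average a local statistic over image windows; a single-window ("global") version shows the structure of the metric. A minimal sketch for intuition only (reported SSIM values use the standard 11×11 Gaussian-windowed formulation, so this simplified score is not directly comparable to the numbers above):

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    # Single-window SSIM: luminance/contrast/structure comparison with the
    # standard stabilizing constants. Reported SSIM averages this statistic
    # over local (e.g. 11x11 Gaussian) windows.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(5)
img = rng.random((64, 64))
assert np.isclose(ssim_global(img, img), 1.0)  # identical images score 1
degraded = np.clip(img + 0.1 * rng.normal(size=img.shape), 0.0, 1.0)
assert ssim_global(img, degraded) < 1.0
```

Differences of 0.01–0.03 SSIM, as reported above, are therefore small shifts of a bounded score whose maximum is 1, which is why the thesis pairs them with visual comparisons.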
- …