Perceptual Image Compression with Cooperative Cross-Modal Side Information
The explosion of data has resulted in more and more associated text being
transmitted along with images. Inspired by distributed source coding, many
works use image side information to enhance image compression. However,
existing methods generally do not consider using text as side information to
enhance the perceptual compression of images, even though the benefits of
multimodal synergy have been widely demonstrated in research. This raises the
following question: how can we effectively transfer text-level semantic
dependencies, which are available only to the decoder, to help image compression?
In this work, we propose a novel deep image compression method with text-guided
side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial
Aware block to fuse the text and image features. This is done by predicting a
semantic mask to guide the learned text-adaptive affine transformation at the
pixel level. Furthermore, we design a text-conditional generative adversarial
network to improve the perceptual quality of reconstructed images. Extensive
experiments involving four datasets and ten image quality assessment metrics
demonstrate that the proposed approach achieves superior results in terms of
rate-perception trade-off and semantic distortion.
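As a rough illustration of the mask-guided, text-adaptive affine transformation described above, the sketch below modulates an image feature map with a per-channel scale and shift predicted from a text embedding (a random vector standing in for a CLIP text embedding), applied only where a predicted semantic mask is active. The projection matrices, the 1x1 mask predictor and the modulation form `feat * (1 + mask * gamma) + mask * beta` are all assumptions chosen for illustration; this is not the authors' Semantic-Spatial Aware block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def text_adaptive_affine(feat, text_emb, w_gamma, w_beta, w_mask):
    """Hypothetical mask-guided, text-adaptive affine transform.

    feat:     image features, shape (C, H, W)
    text_emb: text embedding (stand-in for a CLIP text encoder output), shape (D,)
    w_gamma, w_beta: assumed projections from text to per-channel scale/shift, (C, D)
    w_mask:   assumed 1x1 "conv" weights that predict a spatial mask, shape (C,)
    """
    gamma = w_gamma @ text_emb                    # (C,) per-channel scale from text
    beta = w_beta @ text_emb                      # (C,) per-channel shift from text
    # Semantic mask in [0, 1] marks the pixels the text modulation should affect.
    mask = sigmoid(np.einsum("c,chw->hw", w_mask, feat))   # (H, W)
    m = mask[None, :, :]                          # broadcast over channels
    out = feat * (1.0 + m * gamma[:, None, None]) + m * beta[:, None, None]
    return out, mask

rng = np.random.default_rng(0)
C, H, W, D = 8, 4, 4, 16
feat = rng.standard_normal((C, H, W))
text_emb = rng.standard_normal(D)                 # placeholder text embedding
w_gamma = rng.standard_normal((C, D))
w_beta = rng.standard_normal((C, D))
w_mask = rng.standard_normal(C)
out, mask = text_adaptive_affine(feat, text_emb, w_gamma, w_beta, w_mask)
print(out.shape, mask.shape)  # (8, 4, 4) (4, 4)
```

Where the mask is near zero the features pass through almost unchanged, so the text only reshapes the regions the mask selects; with zero scale and shift projections the transform reduces to the identity.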
Bayesian Domain Invariant Learning via Posterior Generalization of Parameter Distributions
Domain invariant learning aims to learn models that extract invariant
features over various training domains, resulting in better generalization to
unseen target domains. Recently, Bayesian Neural Networks have achieved
promising results in domain invariant learning, but most works concentrate on
aligning feature distributions rather than parameter distributions. Inspired
by the principles of Bayesian Neural Networks, we attempt to directly learn the
domain invariant posterior distribution of network parameters. We first propose
a theorem to show that the invariant posterior of parameters can be implicitly
inferred by aggregating posteriors on different training domains. Our
assumption is more relaxed and allows us to extract more domain invariant
information. We also propose a simple yet effective method, named PosTerior
Generalization (PTG), that can be used to estimate the invariant parameter
distribution. PTG fully exploits variational inference to approximate parameter
distributions, including the invariant posterior and the posteriors on training
domains. Furthermore, we develop a lightweight version of PTG for broader
applicability. PTG shows competitive performance on various domain
generalization benchmarks on DomainBed. Additionally, PTG can use any existing
domain generalization method as its prior, and when combined with the previous
state-of-the-art method, its performance improves further. Code will be
made public.
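The abstract states that the invariant posterior can be inferred by aggregating the posteriors from different training domains. One simple way to aggregate diagonal Gaussian posteriors, shown here purely as an assumed illustration and not necessarily the rule in PTG's theorem, is a renormalized product of Gaussians: precisions add, and the aggregated mean is the precision-weighted average of the per-domain means.

```python
import numpy as np

def aggregate_gaussian_posteriors(means, variances):
    """Combine per-domain diagonal Gaussian posteriors N(mu_d, var_d) by taking
    their renormalized product. Illustrative assumption, not PTG's exact rule.

    means, variances: array-likes of shape (num_domains, num_params)
    returns: (mean, variance) of the aggregated Gaussian, each (num_params,)
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances                   # per-domain precision
    agg_precision = precisions.sum(axis=0)         # precisions add under a product
    agg_mean = (precisions * means).sum(axis=0) / agg_precision
    return agg_mean, 1.0 / agg_precision

# Two training domains disagree on parameter 0 but agree on parameter 1.
mu, var = aggregate_gaussian_posteriors(
    means=[[0.0, 1.0], [2.0, 1.0]],
    variances=[[1.0, 0.5], [1.0, 0.5]],
)
# mu == [1.0, 1.0], var == [0.5, 0.25]
```

Because each per-domain posterior already contains the prior, a plain product counts the prior once per domain; a more careful aggregation would divide out the extra prior factors.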
Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression
In this paper, we propose a progressive learning paradigm for
transformer-based variable-rate image compression. Our approach covers a wide
range of compression rates with the assistance of the Layer-adaptive Prompt
Module (LPM). Inspired by visual prompt tuning, we use LPM to extract prompts
for input images and hidden features at the encoder side and decoder side,
respectively, which are fed as additional information into the Swin Transformer
layer of a pre-trained transformer-based image compression model to affect the
allocation of attention regions and bits, which in turn changes the target
compression ratio of the model. To keep the network lightweight, we
integrate prompt networks with fewer convolutional layers.
Exhaustive experiments show that compared to methods based on multiple models,
which are optimized separately for different target rates, the proposed method
achieves the same performance with 80% savings in parameter storage and 90%
savings in datasets. Meanwhile, our model outperforms all current
variable-bitrate image compression methods in terms of rate-distortion
performance and approaches the state-of-the-art fixed-bitrate image compression
methods trained from scratch.
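To make the prompt mechanism more concrete, the sketch below follows the general visual-prompt-tuning pattern: extra prompt tokens are concatenated to a layer's token sequence as additional keys and values, so they change where attention goes while the pre-trained projection weights stay frozen. The single-head attention, the shapes, and the choice to feed prompts in only as keys/values are assumptions; the actual LPM and how it enters the Swin Transformer layers may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)       # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_prompts(tokens, prompts, wq, wk, wv):
    """Single-head self-attention in which prompt tokens act as extra keys/values.

    tokens:  (N, d) tokens of one window from the frozen, pre-trained model
    prompts: (P, d) prompt tokens (produced by a prompt module; any array here)
    wq, wk, wv: (d, d) frozen projection weights
    returns: (N, d) updated tokens and (N, P + N) attention weights
    """
    context = np.concatenate([prompts, tokens], axis=0)   # (P + N, d)
    q = tokens @ wq
    k = context @ wk
    v = context @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(1)
d = 8
tokens = rng.standard_normal((16, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
# Hypothetical: different prompts (e.g. one per target rate) reuse the same frozen weights.
prompts_low = rng.standard_normal((4, d))
prompts_high = rng.standard_normal((4, d))
out_low, attn_low = attention_with_prompts(tokens, prompts_low, wq, wk, wv)
out_high, attn_high = attention_with_prompts(tokens, prompts_high, wq, wk, wv)
```

Only the prompts differ between the two calls, which is the property that lets a single pre-trained model be steered toward different operating points without retraining its weights.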
Exploring the Relationship between Architecture and Adversarially Robust Generalization
Adversarial training has been demonstrated to be one of the most effective
remedies for defending against adversarial examples, yet it often suffers from a
huge robust generalization gap on unseen testing adversaries, known as the
adversarially robust generalization problem. Despite preliminary studies of
adversarially robust generalization, little is known from the architectural
perspective. To bridge this gap, this paper is the first to systematically
investigate the relationship between adversarially robust generalization and
architectural design. In particular, we comprehensively evaluate the 20 most
representative adversarially trained architectures on the ImageNette and
CIFAR-10 datasets against multiple ℓp-norm adversarial attacks.
Based on extensive experiments, we find that, under aligned settings,
Vision Transformers (e.g., PVT, CoAtNet) often yield better adversarially
robust generalization, while CNNs tend to overfit to specific attacks and fail
to generalize to multiple adversaries. To better understand the reason behind
this, we conduct a theoretical analysis through the lens of Rademacher
complexity. We reveal that higher weight sparsity contributes significantly
to the better adversarially robust generalization of Transformers, which
can often be achieved by the specially designed attention blocks. We hope our
paper helps to better understand the mechanisms for designing robust DNNs.
Our model weights can be found at http://robust.art
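The evaluation above uses adversaries bounded in different ℓp norms. As a minimal sketch of what one such attack step looks like, the code below implements a projected gradient descent (PGD) step with projection onto an ℓ∞ or ℓ2 ball of radius `eps` around the clean input. The loss gradient is passed in as an argument (in practice it would come from the attacked model via autodiff), and the step sizes and radii are placeholders, not the attack settings used in the paper.

```python
import numpy as np

def project(delta, eps, norm):
    """Project a perturbation onto the l_p ball of radius eps (p = inf or 2)."""
    if norm == "linf":
        return np.clip(delta, -eps, eps)
    if norm == "l2":
        n = np.linalg.norm(delta)
        return delta if n <= eps else delta * (eps / n)
    raise ValueError(f"unsupported norm: {norm}")

def pgd_step(x_clean, x_adv, grad, step, eps, norm):
    """One PGD ascent step on the loss, then projection and clipping to [0, 1]."""
    if norm == "linf":
        direction = np.sign(grad)                           # steepest ascent for l_inf
    else:
        direction = grad / (np.linalg.norm(grad) + 1e-12)   # unit-norm ascent for l2
    delta = project(x_adv + step * direction - x_clean, eps, norm)
    # Clipping to the valid pixel range only moves points toward x_clean,
    # so the result stays inside the eps-ball.
    return np.clip(x_clean + delta, 0.0, 1.0)

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=(3, 8, 8))        # stand-in image in [0, 1]
x_linf, x_l2 = x.copy(), x.copy()
for _ in range(10):
    # Random vectors stand in for model gradients in this sketch.
    x_linf = pgd_step(x, x_linf, rng.standard_normal(x.shape), 0.01, 8 / 255, "linf")
    x_l2 = pgd_step(x, x_l2, rng.standard_normal(x.shape), 0.5, 0.5, "l2")
```

Evaluating one model against several such adversaries (different norms and radii) is what exposes whether robustness learned against one attack transfers to others.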
Measurement of the high-temperature strain of UHTC materials using chemical composition gratings
Seeing through the hedge: Phylogenomics of Thuja (Cupressaceae) reveals prominent incomplete lineage sorting and ancient introgression for Tertiary relict flora
Rate and risk factors of recurrent immune checkpoint inhibitor-related pneumonitis in patients with lung cancer