Semantic and generative models for lossy text compression
The apparent divergence between the research paradigms of text and image compression has led us to consider the potential for applying methods developed for one domain to the other. This paper examines the idea of "lossy" text compression, which transmits an approximation to the input text rather than the text itself. In image coding, lossy techniques have proven to yield compression factors that are vastly superior to those of the best lossless schemes, and we show that this is also the case for text. Two different methods are described here, one inspired by the use of fractals in image compression. They can be combined into an extremely effective technique that provides much better compression than the present state of the art and yet preserves a reasonable degree of match between the original and received text. The major challenge for lossy text compression is identified as the reliable evaluation of the quality of this match.
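The abstract's core claim — that accepting an approximate reconstruction can buy substantially better compression — can be illustrated with a toy sketch. The vowel-dropping transform below is purely a hypothetical stand-in (the paper's actual semantic and generative methods are not specified here); it only shows how a lossy transform shrinks what a standard lossless codec must store.

```python
import zlib


def lossless_size(text: str) -> int:
    """Bytes needed to store the text with a standard lossless codec (zlib)."""
    return len(zlib.compress(text.encode("utf-8")))


def lossy_approximation(text: str) -> str:
    """A deliberately crude lossy transform: drop interior vowels of longer words.

    This is a toy illustration only, not the paper's method; real lossy text
    compression would rely on semantic or generative models.
    """
    vowels = set("aeiou")
    words = []
    for word in text.split():
        if len(word) > 3:
            word = word[0] + "".join(c for c in word[1:-1] if c not in vowels) + word[-1]
        words.append(word)
    return " ".join(words)


text = ("lossy coding transmits an approximation to the input text rather "
        "than the text itself, trading fidelity for a lower rate")
approx = lossy_approximation(text)
# The approximation is shorter, so its lossless encoding can be no larger.
assert lossless_size(approx) <= lossless_size(text)
```

The reconstruction remains readable to a human ("lssy cdng trnsmts…"), which hints at the evaluation problem the abstract raises: quantifying how well the received text matches the original.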
Multi-Modality Deep Network for Extreme Learned Image Compression
Image-based single-modality compression learning approaches have demonstrated
exceptionally powerful encoding and decoding capabilities in the past few
years, but suffer from blur and severe semantic loss at extremely low bitrates. To
address this issue, we propose a multimodal machine learning method for
text-guided image compression, in which the semantic information of text is
used as prior information to guide image compression for better compression
performance. We fully study the role of text description in different
components of the codec, and demonstrate its effectiveness. In addition, we
adopt the image-text attention module and image-request complement module to
better fuse image and text features, and propose an improved multimodal
semantic-consistent loss to produce semantically complete reconstructions.
Extensive experiments, including a user study, prove that our method can obtain
visually pleasing results at extremely low bitrates, and achieves a comparable
or even better performance than state-of-the-art methods, even though those
methods operate at 2x to 4x our bitrate.
Comment: 13 pages, 14 figures, accepted by AAAI 202
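The "multimodal semantic-consistent loss" mentioned above can take several forms; one common pattern in multimodal learning is to penalize the cosine distance between an embedding of the reconstructed image and an embedding of its text description. The sketch below is an illustrative guess at that pattern, not the paper's exact formulation, and the embedding inputs are assumed to come from some external encoders.

```python
import numpy as np


def semantic_consistency_loss(img_emb: np.ndarray, txt_emb: np.ndarray) -> float:
    """One plausible semantic-consistency loss: 1 - cosine similarity between
    the reconstruction's image embedding and the text embedding.

    Illustrative sketch only; the actual loss in the paper may differ.
    """
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return float(1.0 - img @ txt)


# Perfectly aligned embeddings incur zero loss; orthogonal ones incur 1.0.
v = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])
assert abs(semantic_consistency_loss(v, v)) < 1e-9
assert abs(semantic_consistency_loss(v, w) - 1.0) < 1e-9
```

Minimizing such a term pushes the decoder toward reconstructions that are semantically complete with respect to the guiding text, even when pixel-level fidelity is sacrificed at extreme bitrates.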
An Introduction to Neural Data Compression
Neural compression is the application of neural networks and other machine
learning methods to data compression. Recent advances in statistical machine
learning have opened up new possibilities for data compression, allowing
compression algorithms to be learned end-to-end from data using powerful
generative models such as normalizing flows, variational autoencoders,
diffusion probabilistic models, and generative adversarial networks. The
present article aims to introduce this field of research to a broader machine
learning audience by reviewing the necessary background in information theory
(e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image
quality assessment, perceptual metrics), and providing a curated guide through
the essential ideas and methods in the literature thus far.
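Of the information-theoretic background the survey reviews, the most basic fact is that entropy lower-bounds the rate of any lossless code under a given probabilistic model. A minimal sketch, using the empirical (memoryless) symbol distribution:

```python
import math
from collections import Counter


def empirical_entropy_bits(data: bytes) -> float:
    """Shannon entropy of the empirical symbol distribution, in bits per
    symbol -- the lossless rate lower bound under a memoryless model."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


data = b"abracadabra"
h = empirical_entropy_bits(data)  # about 2.04 bits/symbol
total_bound = h * len(data)       # entropy-coding lower bound for the sequence
```

Learned neural compressors improve on this by supplying better probability models (e.g., via normalizing flows or variational autoencoders) for the entropy coder, lowering the achievable rate; the rate-distortion theory the article reviews extends the same bound to lossy reconstruction.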
Perceptual Image Compression with Cooperative Cross-Modal Side Information
The explosion of data has resulted in more and more associated text being
transmitted along with images. Inspired by distributed source coding, many
works utilize image side information to enhance image compression. However,
existing methods generally do not consider using text as side information to
enhance perceptual compression of images, even though the benefits of
multimodal synergy have been widely demonstrated in research. This raises the
following question: how can we effectively transfer text-level semantic
dependencies, available only at the decoder, to aid image compression?
In this work, we propose a novel deep image compression method with text-guided
side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial
Aware block to fuse the text and image features. This is done by predicting a
semantic mask to guide the learned text-adaptive affine transformation at the
pixel level. Furthermore, we design a text-conditional generative adversarial
network to improve the perceptual quality of reconstructed images. Extensive
experiments involving four datasets and ten image quality assessment metrics
demonstrate that the proposed approach achieves superior results in terms of
rate-perception tradeoff and semantic distortion.
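The mask-guided, text-adaptive affine transformation described above follows a recognizable pattern: text-derived scale and shift parameters modulate image features only where a predicted semantic mask is active. The sketch below assumes this pattern with illustrative names and scalar parameters; the paper's actual Semantic-Spatial Aware block is more elaborate (the mask, gamma, and beta would all be predicted by learned networks from CLIP text features).

```python
import numpy as np


def text_adaptive_affine(feat: np.ndarray, gamma: float, beta: float,
                         mask: np.ndarray) -> np.ndarray:
    """Sketch of a semantic-mask-guided affine transform: the text-derived
    scale (gamma) and shift (beta) apply only where the mask is active,
    blending back to the unmodified feature elsewhere.

    Names and shapes are illustrative, not the paper's exact formulation.
    """
    modulated = gamma * feat + beta
    return mask * modulated + (1.0 - mask) * feat


feat = np.ones((2, 2))
# An inactive mask leaves features untouched; a fully active mask applies
# the affine transform everywhere.
assert np.allclose(text_adaptive_affine(feat, 2.0, 1.0, np.zeros((2, 2))), feat)
assert np.allclose(text_adaptive_affine(feat, 2.0, 1.0, np.ones((2, 2))), 3.0 * feat)
```

A soft mask (values in [0, 1]) lets the text guidance act with spatially varying strength, which is what allows pixel-level control without hard segmentation boundaries.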