796 research outputs found
Operational one-to-one mapping between coherence and entanglement measures
We establish a general operational one-to-one mapping between coherence
measures and entanglement measures: Any entanglement measure of bipartite pure
states is the minimum of a suitable coherence measure over product bases. Any
coherence measure of pure states, with extension to mixed states by convex
roof, is the maximum entanglement generated by incoherent operations acting on
the system and an incoherent ancilla. Remarkably, the generalized CNOT gate is
the universal optimal incoherent operation. In this way, all convex-roof
coherence measures, including the coherence of formation, are endowed with
(additional) operational interpretations. By virtue of this connection, many
results on entanglement can be translated to the coherence setting, and vice
versa. As applications, we provide tight observable lower bounds for
generalized entanglement concurrence and coherence concurrence, which enable
experimentalists to quantify entanglement and coherence of the maximal
dimension in real experiments.
Comment: 14 pages, 1 figure, new results added, published in PR
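Schematically, the two directions of the mapping can be written as follows (the notation here is a paraphrase of the abstract, not taken from the paper):

```latex
% Direction 1: the entanglement of a bipartite pure state is the minimum
% of a suitable coherence measure over local product bases {|a_i> (x) |b_j>}:
E\bigl(|\psi\rangle_{AB}\bigr) \;=\;
  \min_{\{|a_i\rangle \otimes |b_j\rangle\}} C\bigl(|\psi\rangle_{AB}\bigr)

% Direction 2: the coherence of a state is the maximal entanglement
% generated by an incoherent operation \Lambda acting on the system
% together with an incoherent ancilla (initialized incoherently):
C(\rho) \;=\; \max_{\Lambda \in \mathrm{IO}}
  E\bigl(\Lambda[\rho \otimes |0\rangle\langle 0|]\bigr)
```

Per the abstract, the maximum in the second direction is attained universally by the generalized CNOT gate.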
Lossy Image Compression with Quantized Hierarchical VAEs
Recent research has shown a strong theoretical connection between variational
autoencoders (VAEs) and the rate-distortion theory. Motivated by this, we
consider the problem of lossy image compression from the perspective of
generative modeling. Starting with ResNet VAEs, which are originally designed
for data (image) distribution modeling, we redesign their latent variable model
using a quantization-aware posterior and prior, enabling easy quantization and
entropy coding at test time. Along with improved neural network architecture,
we present a powerful and efficient model that outperforms previous methods on
natural image lossy compression. Our model compresses images in a
coarse-to-fine fashion and supports parallel encoding and decoding, leading to
fast execution on GPUs. Code is available at
https://github.com/duanzhiihao/lossy-vae.
Comment: WACV 2023 Best Algorithms Paper Award, revised version
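A quantization-aware posterior can be sketched with the standard relaxation used in learned compression: additive uniform noise during training, hard rounding at test time. This is a minimal illustration under that assumption; the paper's exact parameterization may differ.

```python
import numpy as np

def posterior_sample(mu, training=True, rng=None):
    """Sample a latent from a quantization-aware posterior centered at `mu`.

    During training, additive uniform noise in [-0.5, 0.5) acts as a
    differentiable surrogate for rounding; at test time the latent is
    hard-rounded to integers so it can be entropy-coded losslessly.
    """
    if training:
        rng = rng or np.random.default_rng(0)
        return mu + rng.uniform(-0.5, 0.5, size=np.shape(mu))
    return np.round(mu)
```

At test time the rounded integers are entropy-coded under the prior, which must assign probability mass to the same integer grid.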
An Improved Upper Bound on the Rate-Distortion Function of Images
Recent work has shown that Variational Autoencoders (VAEs) can be used to
upper-bound the information rate-distortion (R-D) function of images, i.e., the
fundamental limit of lossy image compression. In this paper, we report an
improved upper bound on the R-D function of images implemented by (1)
introducing a new VAE model architecture, (2) applying variable-rate
compression techniques, and (3) proposing a novel \ourfunction{} to stabilize
training. We demonstrate that at least a 30% BD-rate reduction w.r.t. the intra
prediction mode in VVC codec is achievable, suggesting that there is still
great potential for improving lossy image compression. Code is made publicly
available at https://github.com/duanzhiihao/lossy-vae.
Comment: Conference paper at ICIP 2023. The first two authors share equal
contribution
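The VAE-based bound rests on a simple identity: any trained model realizes an achievable (rate, distortion) pair, which upper-bounds the R-D function. A minimal sketch of converting a VAE's loss terms into such a point (the function name and interface are illustrative, not the paper's):

```python
import math

def rd_upper_bound_point(kl_nats_total, mse, num_pixels):
    """Convert a VAE's loss terms into an achievable (R, D) point.

    Any point achieved by a concrete model upper-bounds the image R-D
    function: the rate is the total KL(posterior || prior), converted
    from nats to bits per pixel; the distortion is the reconstruction MSE.
    """
    bits_per_pixel = kl_nats_total / (num_pixels * math.log(2))
    return bits_per_pixel, mse
```

Sweeping the rate-distortion trade-off (e.g., via a variable-rate model) traces out a full upper-bounding curve rather than a single point.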
Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation
Text-to-Image (T2I) generation with diffusion models allows users to control
the semantic content in the synthesized images given text conditions. As a
further step toward a more customized image creation application, we introduce
a new multi-modality generation setting that synthesizes images based on not
only the semantic-level textual input but also on the pixel-level visual
conditions. Existing literature first converts the given visual information to
semantic-level representation by connecting it to languages, and then
incorporates it into the original denoising process. Though seemingly
intuitive, this design loses the pixel values during the semantic transition,
and thus fails in task scenarios where preservation of low-level visual
detail is desired (e.g., the identity of a given face image). To this end, we propose
Cyclic One-Way Diffusion (COW), a training-free framework for creating
customized images with respect to semantic text and pixel-visual conditioning.
Notably, we observe that sub-regions of an image interfere with one another,
much like physical diffusion, before reaching harmony along the denoising
trajectory. We therefore propose to repeatedly utilize the given visual condition
in a cyclic way, by planting the visual condition as a high-concentration
"seed" at the initialization step of the denoising process, and "diffuse" it
into a harmonious picture by controlling a one-way information flow from the
visual condition. We repeat this destroy-and-construct process multiple times,
gradually but steadily driving the internal diffusion process within the image.
Experiments on the challenging one-shot face and text-conditioned image
synthesis task demonstrate our superiority in terms of speed, image quality,
and conditional fidelity compared to learning-based text-vision conditional
methods. Project page is available at: https://bigaandsmallq.github.io/COW/
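The destroy-and-construct loop can be sketched as below; `denoise_step` stands in for a pretrained diffusion denoiser, and all names and the exact cycle structure are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def cyclic_one_way_diffusion(x_seed, mask, denoise_step,
                             num_cycles=3, num_steps=10, rng=None):
    """Training-free sketch of Cyclic One-Way Diffusion (COW).

    Each cycle "destroys" the current state by re-planting the pixel-level
    visual condition (`x_seed` where `mask` is True) and then "constructs"
    via denoising, so information flows one way from the seed region into
    the rest of the image.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(x_seed.shape)  # start from pure noise
    for _ in range(num_cycles):
        x = np.where(mask, x_seed, x)      # plant the high-concentration "seed"
        for _ in range(num_steps):
            x = denoise_step(x)            # diffuse the seed into its surroundings
    return x
```

Because planting happens at the start of every cycle, the seed repeatedly re-injects the pixel-level condition while the denoiser harmonizes it with the surrounding content.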