By optimizing the rate-distortion-realism trade-off, generative compression
approaches produce detailed, realistic images, even at low bit rates, instead
of the blurry reconstructions produced by rate-distortion optimized models.
However, previous methods do not explicitly control how much detail is
synthesized, which leads to a common criticism of these methods: users may
worry that the generated reconstruction is misleading and far from the input
image. In this work, we alleviate these concerns by training a decoder that
can bridge the two regimes and navigate the distortion-realism trade-off. From
a single compressed representation, the receiver can choose to decode a
reconstruction with low mean squared error that is close to the input,
a realistic reconstruction with high perceptual quality, or anything in
between. With our method, we set a new state of the art in distortion-realism,
pushing the frontier of achievable distortion-realism pairs, i.e., our method
achieves lower distortion at high realism and better realism at low
distortion than previous methods.