We propose conditional perceptual quality, an extension of the perceptual
quality defined in \citet{blau2018perception}, by conditioning it on
user-defined information. Specifically, we extend the original perceptual quality
$d(p_X, p_{\hat{X}})$ to the conditional perceptual quality
$d(p_{X|Y}, p_{\hat{X}|Y})$, where $X$ is the original image, $\hat{X}$ is the
reconstructed image, $Y$ is side information defined by the user, and
$d(\cdot,\cdot)$ is a divergence. We show that conditional perceptual quality has
theoretical properties similar to those of the rate-distortion-perception
trade-off \citep{blau2019rethinking}.
Based on these theoretical results, we propose an optimal framework for
conditional perceptual quality preserving compression. Experimental results
show that our codec successfully maintains high perceptual quality and semantic
quality at all bitrates. Furthermore, by providing a lower bound on the common
randomness required, we settle the previous debate on whether randomness should be
incorporated into the generator for (conditional) perceptual quality compression.
The source code is provided in the supplementary material.