We introduce a new method for diverse foreground generation with explicit
control over various factors. Existing image-inpainting-based foreground
generation methods often struggle to generate diverse results and rarely allow
users to explicitly control specific factors of variation (e.g., varying the
facial identity or expression for face inpainting results). We leverage
contrastive learning with latent codes to generate diverse foreground results
for the same masked input. Specifically, we define two sets of latent codes,
where one controls a pre-defined factor (``known''), and the other controls the
remaining factors (``unknown''). The sampled latent codes from the two sets
jointly bi-modulate the convolution kernels to guide the generator to
synthesize diverse results. Experiments demonstrate the superiority of our
method over state-of-the-art approaches in result diversity and generation
controllability.

Comment: ECCV 202
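
To illustrate how two sampled latent codes might jointly modulate a convolution kernel, the following is a minimal NumPy sketch. It assumes a StyleGAN2-style weight modulation and demodulation scheme, which the abstract does not confirm. The names `bi_modulate` and `conv2d`, the affine maps `A_known` and `A_unknown`, and the choice to apply both scales per input channel are hypothetical and serve only as an example.

```python
import numpy as np

def bi_modulate(weight, s_known, s_unknown, eps=1e-8):
    """Scale a kernel by two per-input-channel vectors, then demodulate.

    weight: (out_ch, in_ch, k, k); s_known, s_unknown: (in_ch,).
    Assumption: both codes act on input channels (illustrative only).
    """
    scale = (s_known * s_unknown)[None, :, None, None]
    w = weight * scale
    # Demodulate so that each output filter has (approximately) unit L2 norm.
    norm = np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w / norm

def conv2d(x, w):
    """Naive stride-1 'valid' cross-correlation. x: (in_ch, H, W)."""
    out_ch, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((out_ch, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
in_ch, out_ch, k, dim = 3, 4, 3, 8

weight = rng.normal(size=(out_ch, in_ch, k, k))  # shared base kernel
A_known = rng.normal(size=(in_ch, dim))          # hypothetical affine map
A_unknown = rng.normal(size=(in_ch, dim))        # hypothetical affine map

x = rng.normal(size=(in_ch, 8, 8))               # stand-in for masked-input features
z_known = rng.normal(size=dim)                   # code for the pre-defined factor
z_unknown_a = rng.normal(size=dim)               # two samples for the remaining factors
z_unknown_b = rng.normal(size=dim)

s_known = 1.0 + A_known @ z_known
w_a = bi_modulate(weight, s_known, 1.0 + A_unknown @ z_unknown_a)
w_b = bi_modulate(weight, s_known, 1.0 + A_unknown @ z_unknown_b)

# Same input and same "known" code, different "unknown" codes.
y_a = conv2d(x, w_a)
y_b = conv2d(x, w_b)
print(y_a.shape, float(np.abs(y_a - y_b).mean()))
```

In this sketch, holding `z_known` fixed while resampling the unknown code changes the effective kernel and therefore the output features for the same input, which is the kind of per-factor variation the two code sets are meant to provide.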