46 research outputs found
Collage Diffusion
We seek to give users precise control over diffusion-based image generation
by modeling complex scenes as sequences of layers, which define the desired
spatial arrangement and visual attributes of objects in the scene. Collage
Diffusion harmonizes the input layers to make objects fit together -- the key
challenge involves minimizing changes in the positions and key visual
attributes of the input layers while allowing other attributes to change in the
harmonization process. We ensure that objects are generated in the correct
locations by modifying text-image cross-attention with the layers' alpha masks.
We preserve key visual attributes of input layers by learning specialized text
representations per layer and by extending ControlNet to operate on layers.
Layer input allows users to control the extent of image harmonization on a
per-object basis, and users can even iteratively edit individual objects in
generated images while keeping other objects fixed. By leveraging the rich
information present in layer input, Collage Diffusion generates globally
harmonized images that maintain desired object characteristics better than
prior approaches
Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning
An ideal learned representation should display transferability and
robustness. Supervised contrastive learning (SupCon) is a promising method for
training accurate models, but produces representations that do not capture
these properties due to class collapse -- when all points in a class map to the
same representation. Recent work suggests that "spreading out" these
representations improves them, but the precise mechanism is poorly understood.
We argue that creating spread alone is insufficient for better representations,
since spread is invariant to permutations within classes. Instead, both the
correct degree of spread and a mechanism for breaking this invariance are
necessary. We first prove that adding a weighted class-conditional InfoNCE loss
to SupCon controls the degree of spread. Next, we study three mechanisms to
break permutation invariance: using a constrained encoder, adding a
class-conditional autoencoder, and using data augmentation. We show that the
latter two encourage clustering of latent subclasses under more realistic
conditions than the former. Using these insights, we show that adding a
properly-weighted class-conditional InfoNCE loss and a class-conditional
autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer
across 5 standard datasets and 4.7 points on worst-group robustness on 3
datasets, setting state-of-the-art on CelebA by 11.5 points