109 research outputs found
F?D: On understanding the role of deep feature spaces on face generation evaluation
Perceptual metrics, like the Fréchet Inception Distance (FID), are widely
used to assess the similarity between synthetically generated and ground truth
(real) images. The key idea behind these metrics is to compute errors in a deep
feature space that captures perceptually and semantically rich image features.
Despite their popularity, the effect that different deep features and their
design choices have on a perceptual metric has not been well studied. In this
work, we perform a causal analysis linking differences in semantic attributes
and distortions between face image distributions to Fréchet distances (FD)
using several popular deep feature spaces. A key component of our analysis is
the creation of synthetic counterfactual faces using deep face generators. Our
experiments show that the FD is heavily influenced by its feature space's
training dataset and objective function. For example, FD using features
extracted from ImageNet-trained models heavily emphasizes hats over regions like
the eyes and mouth. Moreover, FD using features from a face gender classifier
emphasizes hair length more than distances in an identity (recognition) feature
space. Finally, we evaluate several popular face generation models across
feature spaces and find that StyleGAN2 consistently ranks higher than other
face generators, except with respect to identity (recognition) features. This
suggests the need for considering multiple feature spaces when evaluating
generative models and using feature spaces that are tuned to nuances of the
domain of interest. Comment: Code and dataset to be released soon.
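To make the metric concrete, below is a minimal sketch of how a Fréchet distance between two feature sets is typically computed: fit a Gaussian to each set, then compare their means and covariances. The backbone that produces `feats_real` and `feats_fake` is exactly the design choice studied above; the function itself is illustrative, not the paper's released code.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake, eps=1e-6):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: (N, D) arrays of deep features extracted from
    real and generated images with some chosen backbone (the choice of
    backbone is what determines which image attributes the score emphasizes).
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)

    diff = mu1 - mu2
    # Matrix square root of the product of the two covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if not np.isfinite(covmean).all():
        # Regularize if the product is near-singular.
        offset = np.eye(sigma1.shape[0]) * eps
        covmean, _ = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset), disp=False)
    covmean = covmean.real

    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```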
Correlators of Mixed Symmetry Operators in Defect CFTs
We use the embedding formalism to study correlation functions of a
d-dimensional Euclidean CFT in the presence of a codimension-q defect. The
defect breaks the global conformal group SO(d+1,1) into SO(p+1,1) × SO(q). We calculate all possible invariant structures that can appear in
one-point, two-point and three-point correlation functions of bulk and defect
operators in mixed symmetry representations. Their generalization to n-point
correlation functions is also worked out. Correlation functions in the
presence of a defect, in arbitrary representations of SO(q), are also
calculated. Comment: 39 pages, 3 figures. v2: published version. Corrected typos and results from section 4.3 of v1.
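For orientation, the simplest such invariant structure takes a standard defect-CFT form (textbook expressions, not results specific to this paper): with the residual symmetry above, the one-point function of a bulk scalar of dimension Δ is fixed up to a single constant by the transverse distance to the defect.

```latex
% Residual symmetry of a flat p-dimensional defect in d = p + q dimensions:
SO(d+1,1) \;\longrightarrow\; SO(p+1,1) \times SO(q)
% The simplest invariant structure: the bulk scalar one-point function is
% fixed, up to the coefficient a_O, by the transverse distance |x_\perp|:
\langle O_{\Delta}(x) \rangle_{\mathrm{defect}} = \frac{a_{O}}{|x_{\perp}|^{\Delta}}
```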
An Unsupervised Learning Model for Deformable Medical Image Registration
We present a fast learning-based algorithm for deformable, pairwise 3D
medical image registration. Current registration methods optimize an objective
function independently for each pair of images, which can be time-consuming for
large data. We define registration as a parametric function, and optimize its
parameters given a set of images from a collection of interest. Given a new
pair of scans, we can quickly compute a registration field by directly
evaluating the function using the learned parameters. We model this function
using a convolutional neural network (CNN), and use a spatial transform layer
to reconstruct one image from another while imposing smoothness constraints on
the registration field. The proposed method does not require supervised
information such as ground truth registration fields or anatomical landmarks.
We demonstrate registration accuracy comparable to state-of-the-art 3D image
registration, while operating orders of magnitude faster in practice. Our
method promises to significantly speed up medical image analysis and processing
pipelines, while facilitating novel directions in learning-based registration
and its applications. Our code is available at
https://github.com/balakg/voxelmorph. Comment: 9 pages, in CVPR 2018.
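A minimal PyTorch-style sketch of the unsupervised objective described above: warp the moving image toward the fixed one with the predicted field, score image similarity (MSE here as a stand-in), and regularize the field to stay smooth. `warp_fn` is a placeholder for a spatial-transform (resampling) layer; this illustrates the idea and is not the released VoxelMorph code.

```python
import torch
import torch.nn.functional as F

def smoothness_loss(flow):
    # Penalize spatial gradients of the predicted registration field.
    # flow: (B, 3, D, H, W) displacement in voxels for a 3D volume pair.
    dz = flow[:, :, 1:, :, :] - flow[:, :, :-1, :, :]
    dy = flow[:, :, :, 1:, :] - flow[:, :, :, :-1, :]
    dx = flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]
    return (dz ** 2).mean() + (dy ** 2).mean() + (dx ** 2).mean()

def registration_loss(moving, fixed, flow, warp_fn, lam=0.01):
    """Unsupervised loss: warp `moving` toward `fixed` with `flow`,
    measure image similarity, and keep the field smooth.
    `warp_fn` is a spatial-transform layer (e.g., trilinear resampling);
    no ground-truth registration fields or landmarks are needed."""
    warped = warp_fn(moving, flow)
    similarity = F.mse_loss(warped, fixed)
    return similarity + lam * smoothness_loss(flow)
```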
Visualizing chest X-ray dataset biases using GANs
Recent work demonstrates that images from various chest X-ray datasets
contain visual features that are strongly correlated with protected demographic
attributes like race and gender. This finding raises issues of fairness, since
some of these factors may be used by downstream algorithms for clinical
predictions. In this work, we propose a framework, using generative adversarial
networks (GANs), to visualize what features are most different between X-rays
belonging to two demographic subgroups. Comment: Medical Imaging with Deep Learning (MIDL) 2023.
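One illustrative way such a GAN-based visualization can work (an assumption for illustration, not necessarily this paper's framework): estimate a latent direction that separates the two subgroups and visualize how a generated X-ray changes when its latent code is moved along that direction. The `generator` and latent codes below are hypothetical placeholders.

```python
import numpy as np

def subgroup_direction(latents_a, latents_b):
    """Unit vector in GAN latent space pointing from subgroup A toward B.
    latents_a, latents_b: (N, D) arrays of latent codes of real X-rays
    projected into a pretrained generator (projection step not shown)."""
    d = latents_b.mean(axis=0) - latents_a.mean(axis=0)
    return d / np.linalg.norm(d)

def subgroup_difference_map(generator, latent, direction, step=3.0):
    """Generate an X-ray and a counterfactual shifted toward the other
    subgroup, then return a heatmap of where the image changes most.
    `generator` is a hypothetical pretrained GAN decoder: latent -> image."""
    img = generator(latent)
    img_counterfactual = generator(latent + step * direction)
    return np.abs(img_counterfactual - img)
```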
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Diffusion models have revolutionized image generation in recent years, yet
they are still limited to a few sizes and aspect ratios. We propose
ElasticDiffusion, a novel training-free decoding method that enables pretrained
text-to-image diffusion models to generate images with various sizes.
ElasticDiffusion attempts to decouple the generation trajectory of a pretrained
model into local and global signals. The local signal controls low-level pixel
information and can be estimated on local patches, while the global signal is
used to maintain overall structural consistency and is estimated with a
reference image. We test our method on CelebA-HQ (faces) and LAION-COCO
(objects/indoor/outdoor scenes). Our experiments and qualitative results show
superior image coherence quality across aspect ratios compared to
MultiDiffusion and the standard decoding strategy of Stable Diffusion. Project
page: https://elasticdiffusion.github.io/ Comment: Accepted at CVPR 2024. Project Page: https://elasticdiffusion.github.io
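A schematic sketch of the decoding idea described above: at each denoising step, a global signal is estimated on a reference-sized view of the latent while a local signal is estimated on overlapping patches at the model's training resolution, and the two are combined. The function names, the `denoiser` signature, and the combination rule are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def _positions(size, patch, stride):
    # Top-left offsets of overlapping patches; last patch flush with the edge.
    pos = list(range(0, size - patch + 1, stride))
    if pos[-1] != size - patch:
        pos.append(size - patch)
    return pos

def elastic_step(denoiser, x_t, t, prompt_emb, null_emb,
                 ref_size=(64, 64), patch=64, stride=48, guidance=7.5):
    """One illustrative denoising step with global/local content separation.

    x_t: (B, C, H, W) latent at an arbitrary target size (H, W >= patch).
    denoiser(x, t, emb): hypothetical pretrained noise predictor, trained
    only at `ref_size`."""
    B, C, H, W = x_t.shape

    # Global signal: text guidance estimated on a reference-sized view of the
    # latent, then upsampled so it enforces one consistent overall structure.
    ref = F.interpolate(x_t, size=ref_size, mode="bilinear", align_corners=False)
    global_signal = denoiser(ref, t, prompt_emb) - denoiser(ref, t, null_emb)
    global_signal = F.interpolate(global_signal, size=(H, W),
                                  mode="bilinear", align_corners=False)

    # Local signal: unconditional noise estimated patch by patch at the
    # resolution the model was trained on, averaged over patch overlaps.
    local_eps = torch.zeros_like(x_t)
    count = torch.zeros_like(x_t)
    for top in _positions(H, patch, stride):
        for left in _positions(W, patch, stride):
            crop = x_t[:, :, top:top + patch, left:left + patch]
            local_eps[:, :, top:top + patch, left:left + patch] += denoiser(crop, t, null_emb)
            count[:, :, top:top + patch, left:left + patch] += 1.0
    local_eps = local_eps / count

    # Combine: local pixel-level detail plus globally consistent guidance.
    return local_eps + guidance * global_signal
```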
GELDA: A generative language annotation framework to reveal visual biases in datasets
Bias analysis is a crucial step in the process of creating fair datasets for
training and evaluating computer vision models. The bottleneck in dataset
analysis is annotation, which typically requires: (1) specifying a list of
attributes relevant to the dataset domain, and (2) classifying each
image-attribute pair. While the second step has made rapid progress in
automation, the first has remained human-centered, requiring an experimenter to
compile lists of in-domain attributes. However, an experimenter may have
limited foresight leading to annotation "blind spots," which in turn can lead
to flawed downstream dataset analyses. To combat this, we propose GELDA, a
nearly automatic framework that leverages large generative language models
(LLMs) to propose and label various attributes for a domain. GELDA takes a
user-defined domain caption (e.g., "a photo of a bird," "a photo of a living
room") and uses an LLM to hierarchically generate attributes. In addition,
GELDA uses the LLM to decide which of a set of vision-language models (VLMs) to
use to classify each attribute in images. Results on real datasets show that
GELDA can generate accurate and diverse visual attribute suggestions, and
uncover biases such as confounding between class labels and background
features. Results on synthetic datasets demonstrate that GELDA can be used to
evaluate the biases of text-to-image diffusion models and generative
adversarial networks. Overall, we show that while GELDA is not accurate enough
to replace human annotators, it can serve as a complementary tool to help
humans analyze datasets in a cheap, low-effort, and flexible manner. Comment: 21 pages, 15 figures, 9 tables.
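A condensed sketch of the two-stage pipeline the abstract describes: an LLM proposes candidate attributes for a user-defined domain caption, and a CLIP-style VLM then labels each image-attribute pair zero-shot. The prompt, the parsing, and the `llm` callable are illustrative stand-ins, not GELDA's actual prompts or model choices.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

def propose_attributes(llm, domain_caption):
    """Stage 1: ask a language model for candidate attributes of the domain.
    `llm` is any callable mapping a prompt string to a text reply (stub here)."""
    prompt = (f"For images matching '{domain_caption}', list visual attribute "
              "categories and possible values, one per line as "
              "'category: value1, value2, ...'.")
    attrs = {}
    for line in llm(prompt).splitlines():
        if ":" in line:
            category, values = line.split(":", 1)
            attrs[category.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return attrs

def classify_attribute(image_path, values, model, processor, domain_caption):
    """Stage 2: zero-shot label one attribute with a CLIP-style VLM."""
    image = Image.open(image_path).convert("RGB")
    texts = [f"{domain_caption}, {value}" for value in values]
    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return values[int(probs.argmax())]

# Example usage with a public CLIP checkpoint:
# model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
# processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# attrs = propose_attributes(my_llm, "a photo of a living room")
# label = classify_attribute("img.jpg", attrs["flooring"], model, processor,
#                            "a photo of a living room")
```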