F?D: On understanding the role of deep feature spaces on face generation evaluation
Perceptual metrics, like the Fréchet Inception Distance (FID), are widely used to assess the similarity between synthetically generated and ground-truth (real) images. The key idea behind these metrics is to compute errors in a deep feature space that captures perceptually and semantically rich image features. Despite their popularity, the effect that different deep features and their design choices have on a perceptual metric has not been well studied. In this work, we perform a causal analysis linking differences in semantic attributes and distortions between face image distributions to Fréchet distances (FD) computed with several popular deep feature spaces. A key component of our analysis is the creation of synthetic counterfactual faces using deep face generators. Our experiments show that the FD is heavily influenced by its feature space's training dataset and objective function. For example, FD computed with features from ImageNet-trained models emphasizes hats far more than regions like the eyes and mouth. Moreover, FD computed with features from a face gender classifier emphasizes hair length more than distances in an identity (recognition) feature space do. Finally, we evaluate several popular face generation models across feature spaces and find that StyleGAN2 consistently ranks higher than other face generators, except with respect to identity (recognition) features. This suggests the need to consider multiple feature spaces when evaluating generative models, and to use feature spaces tuned to the nuances of the domain of interest.

Comment: Code and dataset to be released soon.
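For reference, the Fréchet distance that underlies FID-style metrics fits a Gaussian to the deep features of each image set and compares the two Gaussians in closed form: FD = ||μ_r − μ_f||² + Tr(Σ_r + Σ_f − 2(Σ_r Σ_f)^{1/2}). The sketch below is a minimal NumPy/SciPy rendering of that standard formula, not the authors' released code; the feature extractor (Inception, a face-recognition network, etc.) is assumed to be supplied separately, and frechet_distance is a hypothetical helper name.

```python
# Minimal sketch: Fréchet distance between two sets of deep features,
# as used by FID-style metrics. Feature extraction itself is out of scope;
# we take two (N, D) feature arrays and fit a Gaussian to each.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Closed-form FD between Gaussians fit to (N, D) feature arrays."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the covariance product; drop tiny imaginary
    # parts introduced by numerical error.
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Toy usage with random "features" standing in for real extractor outputs.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 64))
fake = rng.normal(0.1, 1.1, size=(1000, 64))
print(frechet_distance(real, fake))
```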
GELDA: A generative language annotation framework to reveal visual biases in datasets
Bias analysis is a crucial step in the process of creating fair datasets for training and evaluating computer vision models. The bottleneck in dataset analysis is annotation, which typically requires: (1) specifying a list of attributes relevant to the dataset domain, and (2) classifying each image-attribute pair. While the second step has made rapid progress in automation, the first has remained human-centered, requiring an experimenter to compile lists of in-domain attributes. However, an experimenter may have limited foresight, leading to annotation "blind spots," which in turn can lead to flawed downstream dataset analyses. To combat this, we propose GELDA, a nearly automatic framework that leverages large generative language models (LLMs) to propose and label various attributes for a domain. GELDA takes a user-defined domain caption (e.g., "a photo of a bird," "a photo of a living room") and uses an LLM to hierarchically generate attributes. In addition, GELDA uses the LLM to decide which of a set of vision-language models (VLMs) to use to classify each attribute in images. Results on real datasets show that GELDA can generate accurate and diverse visual attribute suggestions, and uncover biases such as confounding between class labels and background features. Results on synthetic datasets demonstrate that GELDA can be used to evaluate the biases of text-to-image diffusion models and generative adversarial networks. Overall, we show that while GELDA is not accurate enough to replace human annotators, it can serve as a complementary tool to help humans analyze datasets in a cheap, low-effort, and flexible manner.

Comment: 21 pages, 15 figures, 9 tables.
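The two-stage pipeline described above (an LLM proposes attributes for a domain caption, then each attribute is routed to a VLM for per-image classification) can be pictured with a short sketch. This is an illustrative outline under assumed interfaces, not the GELDA implementation: the llm, vlm, and choose_vlm callables and the function names are hypothetical placeholders.

```python
# Illustrative sketch of an LLM-driven annotation pipeline:
# stage 1 proposes attribute categories and values for a domain caption,
# stage 2 routes each attribute to a vision-language model for labeling.
from typing import Callable, Dict, List

def propose_attributes(llm: Callable[[str], List[str]],
                       domain_caption: str) -> Dict[str, List[str]]:
    """Hierarchical attribute generation: categories first, then values."""
    categories = llm(f"List attribute categories relevant to: {domain_caption}")
    return {
        cat: llm(f"List possible values of '{cat}' for: {domain_caption}")
        for cat in categories
    }

def annotate_image(image, attributes: Dict[str, List[str]],
                   choose_vlm: Callable[[str], Callable]) -> Dict[str, str]:
    """For each attribute, pick a VLM and classify the image into one value."""
    labels = {}
    for cat, values in attributes.items():
        vlm = choose_vlm(cat)  # e.g., a CLIP-style or VQA-style classifier
        labels[cat] = vlm(image, values)
    return labels

# Toy usage with stub callables standing in for real models.
stub_llm = lambda p: ["color", "background"] if "categories" in p else ["red", "blue"]
stub_vlm = lambda image, values: values[0]
attrs = propose_attributes(stub_llm, "a photo of a bird")
print(annotate_image("img.jpg", attrs, lambda cat: stub_vlm))
```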
Diverse R-PPG: Contactless Smartphone Camera-based Heart Rate Estimation for Diverse Skin Tones and Scenes
Heart rate (HR) is an essential clinical measure for the assessment of cardiorespiratory instability. The growing telemedicine market creates an urgent need for scalable yet affordable remote HR estimation. Smartphones that use built-in camera modules to measure HR from facial videos offer a more economical solution than mass deployment of wearable sensors. However, existing computer vision methods that estimate HR from facial videos exhibit biased performance against dark skin tones. This is a major concern, since communities of color are disproportionately affected by both COVID-19 and cardiovascular disease. We identify the origin of this bias and present a novel physics-driven algorithm that boosts performance on darker skin tones in our reported data. We assess the performance of our method through the creation of the first telemedicine-focused remote vital signs dataset, the VITAL dataset: 472 videos (~944 minutes) of 59 subjects with diverse skin tones, recorded under realistic scene conditions with corresponding vital sign data. Our method reduces errors due to lighting changes, shadows, and specular highlights, and imparts unbiased performance gains across skin tones, setting the stage for making non-contact HR sensing technologies a viable reality for patients across skin tones using just smartphone cameras.
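For context, camera-based HR estimators of this kind build on a common remote-PPG recipe: spatially average the skin pixels of a facial region in each frame, band-pass the resulting signal to the physiological range, and read off the dominant frequency. The sketch below is that simple green-channel baseline under assumed inputs (an RGB face crop and a fixed frame rate), not the paper's physics-driven, skin-tone-robust algorithm; estimate_hr_bpm is a hypothetical helper.

```python
# Minimal green-channel rPPG baseline: spatial average -> band-pass -> FFT peak.
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_hr_bpm(roi_frames: np.ndarray, fps: float) -> float:
    """roi_frames: (T, H, W, 3) RGB frames cropped to a skin region."""
    # 1. Spatially average the green channel to get a 1-D pulse signal.
    signal = roi_frames[..., 1].reshape(len(roi_frames), -1).mean(axis=1)
    signal = signal - signal.mean()
    # 2. Band-pass to 0.7-4 Hz (42-240 bpm), the plausible heart-rate band.
    b, a = butter(3, [0.7, 4.0], btype="bandpass", fs=fps)
    filtered = filtfilt(b, a, signal)
    # 3. Dominant frequency via FFT -> heart rate in beats per minute.
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    power = np.abs(np.fft.rfft(filtered)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return float(freqs[band][np.argmax(power[band])] * 60.0)

# Synthetic check: a 1.2 Hz (72 bpm) pulse embedded in pixel noise.
fps, t = 30.0, np.arange(300) / 30.0
pulse = 2.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None, None]
frames = 128 + pulse + np.random.default_rng(0).normal(0, 1, (300, 8, 8, 3))
print(estimate_hr_bpm(frames, fps))  # ~72 bpm
```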