Bias analysis is a crucial step in the process of creating fair datasets for
training and evaluating computer vision models. The bottleneck in dataset
analysis is annotation, which typically requires: (1) specifying a list of
attributes relevant to the dataset domain, and (2) classifying each
image-attribute pair. While the second step has made rapid progress in
automation, the first has remained human-centered, requiring an experimenter to
compile lists of in-domain attributes. However, an experimenter may have
limited foresight, leading to annotation "blind spots" that can in turn produce
flawed downstream dataset analyses. To combat this, we propose GELDA, a
nearly automatic framework that leverages large generative language models
(LLMs) to propose and label various attributes for a domain. GELDA takes a
user-defined domain caption (e.g., "a photo of a bird," "a photo of a living
room") and uses an LLM to hierarchically generate attributes. In addition,
GELDA uses the LLM to decide which of a set of vision-language models (VLMs) to
use to classify each attribute in images. Results on real datasets show that
GELDA can generate accurate and diverse visual attribute suggestions, and
uncover biases such as confounding between class labels and background
features. Results on synthetic datasets demonstrate that GELDA can be used to
evaluate the biases of text-to-image diffusion models and generative
adversarial networks. Overall, we show that while GELDA is not accurate enough
to replace human annotators, it can serve as a complementary tool to help
humans analyze datasets in a cheap, low-effort, and flexible manner.

Comment: 21 pages, 15 figures, 9 tables
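
As a rough illustration of the two LLM-driven steps described above (hierarchical attribute generation from a domain caption, then choosing a VLM per attribute), consider the minimal sketch below. It is not GELDA's implementation: the prompts, the function names generate_attributes and choose_vlm, the LLM callable interface, and the two VLM names are all assumptions made for illustration, and a canned fake_llm stands in for a real model so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical interface: an LLM is any callable that maps a prompt
# to a list of short strings. A real system would wrap an API call here.
LLM = Callable[[str], List[str]]

# Placeholder model names for illustration, not taken from the paper.
VLMS = ["image_level_vlm", "region_level_vlm"]


def generate_attributes(caption: str, llm: LLM) -> Dict[str, List[str]]:
    """Two-level generation: attribute categories first, then values per category."""
    categories = llm(f"List visual attribute categories for: {caption}")
    return {
        category: llm(f"List possible values of '{category}' for: {caption}")
        for category in categories
    }


def choose_vlm(category: str, vlms: List[str], llm: LLM) -> str:
    """Ask the LLM which VLM suits a category; fall back to the first VLM."""
    answer = llm(f"Which of {vlms} best classifies '{category}'?")
    return answer[0] if answer and answer[0] in vlms else vlms[0]


def fake_llm(prompt: str) -> List[str]:
    """Canned stand-in for a real LLM (assumed responses, for the demo only)."""
    if prompt.startswith("List visual attribute categories"):
        return ["color", "background"]
    if prompt.startswith("List possible values of 'color'"):
        return ["red", "blue"]
    if prompt.startswith("List possible values of 'background'"):
        return ["sky", "water"]
    if prompt.startswith("Which of"):
        return ["region_level_vlm"] if "'color'" in prompt else ["image_level_vlm"]
    return []


attributes = generate_attributes("a photo of a bird", fake_llm)
plan = {category: choose_vlm(category, VLMS, fake_llm) for category in attributes}
```

Here attributes maps each generated category to its candidate values, and plan records which (placeholder) VLM would label each category. The fallback in choose_vlm is one simple way to guard against an LLM answer that names no available model.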