Few images on the Web receive alt-text descriptions that would make them
accessible to blind and low vision (BLV) users. Image-based NLG systems have
progressed to the point where they can begin to address this persistent
societal problem, but these systems will not be fully successful unless we
evaluate them on metrics that guide their development correctly. Here, we argue
against current referenceless metrics -- those that do not rely on
human-generated ground-truth descriptions -- on the grounds that they do not
align with the needs of BLV users. The fundamental shortcoming of these metrics
is that they cannot take context into account, whereas contextual information
is highly valued by BLV users. To substantiate these claims, we present a study
with BLV participants who rated descriptions along a variety of dimensions. An
in-depth analysis reveals that the lack of context-awareness makes current
referenceless metrics inadequate for advancing image accessibility, requiring a
rethinking of referenceless evaluation metrics for image-based NLG systems.