Large vision-language models, such as CLIP, learn generalized representations
and have shown promising zero-shot performance. Few-shot adaptation methods
based on prompt tuning have also been shown to further improve performance on
downstream datasets. However, these models are not hierarchically consistent.
Frequently, they infer incorrect labels at coarser taxonomic class levels, even
when the inference at the leaf level (original class labels) is correct. This
is problematic, given their support for open-set classification and, in
particular, open-granularity classification, where practitioners define label
sets at various levels of granularity. To address this problem, we propose a prompt
tuning technique to calibrate the hierarchical consistency of model
predictions. Two metrics of hierarchical consistency, the Hierarchical
Consistent Accuracy (HCA) and the Mean Treecut Accuracy (MTA), are first
proposed to benchmark model performance in the open-granularity setting. A
prompt tuning technique, denoted as Prompt Tuning for Hierarchical Consistency
(ProTeCt), is then proposed to calibrate classification across all possible
label set granularities. Results show that ProTeCt can be combined with
existing prompt tuning methods to significantly improve open-granularity
classification performance without degradation of the original classification
performance at the leaf level.