TransHP: Image Classification with Hierarchical Prompting
This paper explores a hierarchical prompting mechanism for the hierarchical
image classification (HIC) task. Different from prior HIC methods, our
hierarchical prompting is the first to explicitly inject ancestor-class
information as a tokenized hint that benefits the descendant-class
discrimination. We believe this closely imitates human visual recognition, i.e.,
humans may use the ancestor class as a prompt to draw focus on the subtle
differences among descendant classes. We model this prompting mechanism into a
Transformer with Hierarchical Prompting (TransHP). TransHP consists of three
steps: 1) learning a set of prompt tokens to represent the coarse (ancestor)
classes, 2) on-the-fly predicting the coarse class of the input image at an
intermediate block, and 3) injecting the prompt token of the predicted coarse
class into the intermediate feature. Though the parameters of TransHP remain
the same for all input images, the injected coarse-class prompt conditions
(modifies) the subsequent feature extraction and encourages a dynamic focus on
relatively subtle differences among the descendant classes. Extensive
experiments show that TransHP improves image classification in accuracy (e.g.,
improving ViT-B/16 by +2.83% ImageNet classification accuracy), training data
efficiency (e.g., +12.69% improvement under 10% ImageNet training data), and
model explainability. Moreover, TransHP also performs favorably against prior
HIC methods, showing that TransHP exploits the hierarchical information well.
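
The three steps listed in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `inject_coarse_prompt`, the mean-pooled linear coarse classifier (`coarse_weights`), and the choice to append the prompt as an extra token are all assumptions made for illustration, and the "learned" prompt tokens are random placeholders here.

```python
import numpy as np

def inject_coarse_prompt(tokens, prompt_tokens, coarse_weights):
    """Hypothetical sketch of coarse-class prompting at one intermediate block.

    tokens:         (n_tokens, dim) intermediate features for one image
    prompt_tokens:  (n_coarse, dim) one prompt token per coarse class (step 1)
    coarse_weights: (dim, n_coarse) assumed linear coarse classifier
    """
    # Step 2: predict the coarse class on the fly from the intermediate feature.
    pooled = tokens.mean(axis=0)                # (dim,)
    logits = pooled @ coarse_weights            # (n_coarse,)
    coarse_id = int(np.argmax(logits))

    # Step 3: inject the prompt token of the predicted coarse class.
    # Appending it as an extra token is an assumption of this sketch.
    prompt = prompt_tokens[coarse_id][None, :]  # (1, dim)
    conditioned = np.concatenate([tokens, prompt], axis=0)
    return conditioned, coarse_id

# Placeholder shapes: 197 tokens of width 768, 20 coarse classes.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(197, 768))
prompt_tokens = rng.normal(size=(20, 768))
coarse_weights = rng.normal(size=(768, 20))

conditioned, coarse_id = inject_coarse_prompt(tokens, prompt_tokens, coarse_weights)
print(conditioned.shape)  # (198, 768)
```

The weights are identical for every image, but the appended token differs with the predicted coarse class, so any later attention layers would see a class-dependent context.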
Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with
low-level features using either global methods or local methods. In global
methods, the entire image is used as a unit. Local methods divide images either
into blocks, where fixed-size sub-image blocks are adopted as sub-units, or
into regions, where segmented regions serve as sub-units. In contrast to
typical automatic image annotation methods that use either global or local
features exclusively, several recent methods incorporate both kinds of
information, on the premise that combining the two levels of features is
beneficial in annotating images. In this paper, we provide a survey of
automatic image annotation techniques from one perspective, feature extraction,
and, to complement existing surveys in the literature, we focus on the emerging
hybrid methods that combine global and local features for image representation.
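
To make the global/local distinction concrete, here is a small NumPy sketch of a hybrid representation. It is only an illustration under stated assumptions: intensity histograms stand in for the low-level features, fixed-size blocks stand in for the local sub-units, and simple concatenation is assumed as the way to combine them. The function names `gray_histogram` and `hybrid_features` are hypothetical and not taken from the survey.

```python
import numpy as np

def gray_histogram(pixels, bins=8):
    """Normalized intensity histogram of a grayscale pixel array."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def hybrid_features(image, block=16, bins=8):
    """Concatenate one global histogram with one histogram per block."""
    # Global feature: the entire image is treated as a single unit.
    global_feat = gray_histogram(image, bins)

    # Local features: fixed-size, non-overlapping sub-image blocks.
    height, width = image.shape
    local_feats = [
        gray_histogram(image[r:r + block, c:c + block], bins)
        for r in range(0, height - block + 1, block)
        for c in range(0, width - block + 1, block)
    ]
    # Hybrid representation (assumed here to be plain concatenation).
    return np.concatenate([global_feat, *local_feats])

# Random 64x64 grayscale image as placeholder data.
image = np.random.default_rng(0).integers(0, 256, size=(64, 64))
features = hybrid_features(image)
print(features.shape)  # (136,): 8 global bins + 16 blocks * 8 bins
```

A region-based variant would replace the block loop with segmented regions.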