Hierarchical classification (HC) assigns each object multiple labels
organized into a hierarchical structure. Existing deep-learning-based HC
methods usually classify an instance by descending from the root node until a
leaf node is reached. However, in the real world, images degraded by noise,
occlusion, blur, or low resolution may not provide sufficient information for
the classification at subordinate levels. To address this issue, we propose a
novel semantic guided level-category hybrid prediction network (SGLCHPN) that
can jointly perform level and category prediction in an end-to-end manner.
SGLCHPN comprises two modules: a visual transformer that extracts feature
vectors from the input images, and a semantic guided cross-attention module
that uses category word embeddings as queries to guide the learning of
category-specific representations. To evaluate the proposed method, we
construct two new datasets in which images span a broad range of quality and
are therefore labeled to different levels (depths) in the hierarchy according to
their individual quality. Experimental results demonstrate the effectiveness of
our proposed HC method.
Comment: 3 figures
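
The semantic guided cross-attention idea described in the abstract, where category word embeddings act as queries over visual features, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, no learned projection matrices, and toy 2-dimensional vectors, and the names `semantic_cross_attention`, `category_queries`, and `visual_tokens` are hypothetical. It only shows how each category query pools the visual tokens into one category-specific representation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def semantic_cross_attention(category_queries, visual_tokens):
    """Return one pooled feature vector per category query.

    Assumption: queries, keys, and values share the same dimension, and
    the visual tokens serve as both keys and values (no learned weights).
    """
    dim = len(visual_tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in category_queries:
        # Scaled dot-product scores between this category and every token.
        weights = softmax([dot(query, tok) / scale for tok in visual_tokens])
        # Attention-weighted sum of the visual tokens.
        pooled = [
            sum(w * tok[i] for w, tok in zip(weights, visual_tokens))
            for i in range(dim)
        ]
        outputs.append(pooled)
    return outputs


# Toy stand-ins: two category embeddings and three visual tokens.
category_queries = [[1.0, 0.0], [0.0, 1.0]]
visual_tokens = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
category_features = semantic_cross_attention(category_queries, visual_tokens)
```

In this sketch each category attends most strongly to the visual tokens aligned with its own embedding, so `category_features` holds one representation per category. In a full model, such representations would presumably feed the level and category prediction heads, whose exact form the abstract does not specify.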