Hierarchical Text Categorization (HTC) is becoming increasingly important
with the rapidly growing amount of text data available on the World Wide Web.
Among the different strategies proposed to cope with HTC, the Local Classifier
per Node (LCN) approach attains good performance by mirroring the underlying
class hierarchy while enforcing a top-down strategy in the testing step.
However, the problem of exploiting hierarchical information (parent-child
relationships) to improve the performance of HTC systems remains open. A
confidence evaluation method for a selected route in the hierarchy is proposed
to assess the reliability of the final candidate labels in an HTC system. To
exploit the information embedded in the hierarchy, weight factors are
assigned to reflect the importance of each level. An
acceptance/rejection strategy for the top-down decision-making process is also
proposed, which improves the overall categorization accuracy by rejecting a
small percentage of samples, i.e., those with a low reliability score.
Experimental results on the Reuters benchmark dataset (RCV1-v2) confirm the
effectiveness of the proposed method compared with other state-of-the-art HTC
methods.
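
The weighted route confidence and the acceptance/rejection step described above can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the confidence is taken to be a weighted average of the per-level classifier scores along the selected route, and the function names, weights, threshold, and category labels are hypothetical.

```python
from typing import List, Optional, Sequence


def route_confidence(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-level scores along one top-down route.

    Assumption: the abstract only says that weight factors reflect the
    importance of each level; the weighted average is an illustrative choice.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


def accept_or_reject(
    route: Sequence[str],
    scores: Sequence[float],
    weights: Sequence[float],
    threshold: float,
) -> Optional[List[str]]:
    """Return the route if its confidence reaches the threshold, else None (rejected)."""
    if route_confidence(scores, weights) >= threshold:
        return list(route)
    return None


# Hypothetical three-level route with one classifier score per level.
route = ["CCAT", "C15", "C151"]
scores = [0.9, 0.8, 0.4]
weights = [3.0, 2.0, 1.0]  # assumed: upper levels weigh more

print(route_confidence(scores, weights))
print(accept_or_reject(route, scores, weights, threshold=0.7))
print(accept_or_reject(route, scores, weights, threshold=0.8))
```

In this sketch a rejected sample is simply returned as `None`; how rejected samples are handled afterwards is not specified in the abstract.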