Unsupervised Label Refinement Improves Dataless Text Classification
Dataless text classification is capable of classifying documents into
previously unseen labels by assigning a score to any document paired with a
label description. While promising, it crucially relies on accurate
descriptions of the label set for each downstream task. This reliance causes
dataless classifiers to be highly sensitive to the choice of label descriptions
and hinders the broader application of dataless classification in practice. In
this paper, we ask the following question: how can we improve dataless text
classification using the unlabeled inputs of the downstream task dataset? Our primary
solution is a clustering-based approach. Given a dataless classifier, our
approach refines its set of predictions using k-means clustering. We
demonstrate the broad applicability of our approach by improving the
performance of two widely used classifier architectures, one that encodes
text-category pairs with two independent encoders and one with a single joint
encoder. Experiments show that our approach consistently improves dataless
classification across different datasets and makes the classifier more robust
to the choice of label descriptions.
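The abstract does not specify how the k-means step is wired to the classifier, so what follows is only a minimal sketch of one plausible reading, not the authors' implementation. It makes several assumptions. Document and label-description embeddings are precomputed. The dataless scorer is a dual-encoder-style cosine similarity. The scheme uses one cluster per label, with each centroid seeded from the mean embedding of the documents the classifier assigns to that label, so that cluster k keeps the meaning of label k after the Lloyd iterations. The function names `dataless_scores` and `refine_with_kmeans` are hypothetical, and so are the toy embeddings.

```python
import numpy as np


def dataless_scores(doc_emb, label_emb):
    """Score every (document, label description) pair by cosine similarity.

    Hypothetical dual-encoder-style scorer: documents and label
    descriptions are assumed to be embedded independently beforehand.
    """
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    l = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    return d @ l.T  # shape (n_docs, n_labels)


def refine_with_kmeans(doc_emb, scores, n_iter=20):
    """Refine dataless predictions with k-means (hypothetical sketch).

    One cluster per label; centroid k starts at the mean embedding of the
    documents the dataless classifier assigns to label k, so cluster
    indices stay aligned with label indices.
    """
    n_labels = scores.shape[1]
    labels = scores.argmax(axis=1)
    centroids = np.stack([
        doc_emb[labels == k].mean(axis=0) if np.any(labels == k)
        # Assumed fallback for a label that receives no documents:
        # seed it with its single highest-scoring document.
        else doc_emb[scores[:, k].argmax()]
        for k in range(n_labels)
    ])
    for _ in range(n_iter):
        # Assignment step: each document goes to its nearest centroid.
        dists = np.linalg.norm(doc_emb[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Update step: recompute the centroid of every non-empty cluster.
        for k in range(n_labels):
            if np.any(new_labels == k):
                centroids[k] = doc_emb[new_labels == k].mean(axis=0)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Toy data: two well-separated groups of documents, and a skewed
# description for label 1 that pulls the document [0.8, 0.6] toward it.
doc_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.6],
                    [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]])
label_emb = np.array([[1.0, 0.0], [0.5, 1.0]])

scores = dataless_scores(doc_emb, label_emb)
initial = scores.argmax(axis=1)
refined = refine_with_kmeans(doc_emb, scores)
print(initial.tolist())  # [0, 0, 1, 1, 1, 1]
print(refined.tolist())  # [0, 0, 0, 1, 1, 1]
```

In this toy run, the skewed description for label 1 misassigns the third document. Clustering the unlabeled document embeddings moves it back to the group it sits closest to. That behavior is the kind of robustness to label descriptions the abstract describes. Seeding the centroids from the classifier's own predictions is a design choice of this sketch. It avoids a separate step that would match clusters to labels after k-means finishes.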