Fast Nearest-Neighbor Classification using RNN in Domains with Large Number of Classes
In scenarios involving text classification where the number of classes is
large (in the tens of thousands) and training samples for each class are few and
often verbose, nearest neighbor methods are effective but very slow in
computing a similarity score with training samples of every class. On the other
hand, machine learning models are fast at runtime but training them adequately
is not feasible using the few available training samples per class. In this paper,
we propose a hybrid approach that cascades 1) a fast but less-accurate
recurrent neural network (RNN) model and 2) a slow but more-accurate
nearest-neighbor model using bag of syntactic features.
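The two-stage cascade described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fast stage is a hypothetical token-overlap scorer standing in for the RNN, the slow stage uses cosine similarity over plain bag-of-words counts in place of the paper's syntactic features, and the names `cascade_classify`, `toy_fast_scores`, `TRAIN`, and the error codes are invented for illustration.

```python
import math
from collections import Counter

# Toy training set (hypothetical): class label -> list of training texts.
TRAIN = {
    "E101": ["printer does not print after driver update"],
    "E202": ["cannot connect to vpn from home network"],
    "E303": ["email client crashes when opening attachment"],
    "E404": ["vpn client crashes after password reset"],
}

def bag(text):
    """Bag-of-words counts (a stand-in for the paper's syntactic features)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def toy_fast_scores(query):
    """Assumed stand-in for the fast model: one cheap score per class."""
    q = set(query.lower().split())
    return {c: len(q & set(s[0].split())) for c, s in TRAIN.items()}

def cascade_classify(query, fast_scores, train, k, top_n=1):
    """Stage 1 keeps the k best classes; stage 2 re-ranks only those."""
    scores = fast_scores(query)
    candidates = sorted(scores, key=scores.get, reverse=True)[:k]
    q = bag(query)
    # The slow nearest-neighbor comparison runs on k classes, not all of them.
    ranked = sorted(
        candidates,
        key=lambda c: max(cosine(q, bag(s)) for s in train[c]),
        reverse=True,
    )
    return ranked[:top_n]

if __name__ == "__main__":
    print(cascade_classify("vpn crashes after reset", toy_fast_scores, TRAIN, k=2))
```

The cutoff `k` controls the trade-off: a smaller `k` means fewer similarity computations in the slow stage, but a higher chance that the fast stage has already discarded the correct class.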
Using the cascaded approach, our experiments, performed on a data set from IT
support services where customer complaint text needs to be classified to return
top- possible error codes, show that the query-time of the slow system is
reduced to while its accuracy is being improved. Our approach
outperforms an LSH-based baseline for query-time reduction. We also derive a
lower bound on the accuracy of the cascaded model in terms of the accuracies of
the individual models. In any two-stage approach, choosing the right number of
candidates to pass on to the second stage is crucial. We prove a result that
aids in choosing this cutoff number for the cascaded system.
Comment: 12 Pages, 2 Theorems, 6 Figures, 1 Algorith