Quantifying the Uncertainty of Precision Estimates for Rule-Based Text Classifiers
Rule-based classifiers that use the presence and absence of key sub-strings
to make classification decisions have a natural mechanism for quantifying the
uncertainty of their precision. For a binary classifier, the key insight is to
treat partitions of the sub-string set induced by the documents as Bernoulli
random variables. The mean value of each random variable is an estimate of the
classifier's precision when presented with a document inducing that partition.
These means can be compared, using standard statistical tests, to a desired or
expected classifier precision. A set of binary classifiers can be combined into
a single, multi-label classifier by an application of the Dempster-Shafer
theory of evidence. The utility of this approach is demonstrated with a
benchmark problem.
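The core idea above can be sketched in a few lines. This is not the paper's implementation: the function name, counts, and target precision are hypothetical, and the comparison uses a simple normal approximation to the binomial rather than whatever test the authors chose.

```python
from math import sqrt

def partition_precision_test(correct, total, target):
    """Treat the documents inducing one partition of the sub-string set
    as Bernoulli trials: a trial succeeds when the classifier's label is
    correct. Returns the estimated precision (the Bernoulli mean) and a
    z-statistic comparing it to a target precision, using the normal
    approximation to the binomial."""
    p_hat = correct / total                     # precision estimate for this partition
    se = sqrt(target * (1 - target) / total)    # standard error under the target
    z = (p_hat - target) / se
    return p_hat, z

# Hypothetical partition: 45 of 50 documents classified correctly,
# tested against a desired precision of 0.9.
p_hat, z = partition_precision_test(45, 50, target=0.9)
```

A partition whose z-statistic is significantly negative signals documents on which the classifier falls short of the desired precision.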
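The combination step can likewise be illustrated with Dempster's rule, the standard combination operator in Dempster-Shafer theory. The label names and mass values below are hypothetical; the sketch assumes each binary classifier's output has been expressed as a mass function over subsets of the label set.

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions, each a
    dict mapping frozensets of labels to belief mass. Mass assigned to
    empty intersections (conflict) is discarded and the remainder is
    renormalized."""
    combined = {}
    conflict = 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            a = b & c
            if a:
                combined[a] = combined.get(a, 0.0) + mb * mc
            else:
                conflict += mb * mc
    norm = 1.0 - conflict
    return {a: v / norm for a, v in combined.items()}

# Hypothetical evidence from two binary classifiers over {'sports', 'politics'}:
# each assigns some mass to a specific label and the rest to "don't know"
# (the full label set).
m1 = {frozenset({'sports'}): 0.6, frozenset({'sports', 'politics'}): 0.4}
m2 = {frozenset({'politics'}): 0.5, frozenset({'sports', 'politics'}): 0.5}
m = dempster_combine(m1, m2)
```

After combination, mass concentrated on a singleton set reflects joint support for that single label, yielding the multi-label decision.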