Most classifiers rely on discriminative boundaries that separate instances of
each class from everything else. We argue that discriminative boundaries are
counter-intuitive as they define semantics by what-they-are-not; and should be
replaced by generative classifiers which define semantics by what-they-are.
Unfortunately, generative classifiers are significantly less accurate. This may
be caused by the tendency of generative models to focus on easy to model
semantic generative factors and ignore non-semantic factors that are important
but difficult to model. We propose a new generative model in which semantic
factors are accommodated by shell theory's hierarchical generative process and
non-semantic factors by an instance specific noise term. We use the model to
develop a classification scheme which suppresses the impact of noise while
preserving semantic cues. The result is a surprisingly accurate generative
classifier, that takes the form of a modified nearest-neighbor algorithm; we
term it distance classification. Unlike discriminative classifiers, a distance
classifier: defines semantics by what-they-are; is amenable to incremental
updates; and scales well with the number of classes.Comment: accepted by IJC