We introduce a new method for building classification models when we have
prior knowledge of how the classes can be arranged in a hierarchy, based on how
easily they can be distinguished. The new method uses a Bayesian form of the
multinomial logit (MNL, a.k.a. ``softmax'') model, with a prior that introduces
correlations between the parameters for classes that are nearby in the tree. We
compare the performance on simulated data of the new method, the ordinary MNL
model, and a model that uses the hierarchy in a different way. We also test the
new method on a document labelling problem, and find that it performs better
than the other methods, particularly when the amount of training data is small.
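
To make the idea of tree-induced prior correlation concrete, the following minimal sketch shows one way such a prior could arise. It assumes, purely for illustration, that each class's parameter is the sum of independent zero-mean parameters attached to the branches on the path from the root to that class, so that classes sharing more branches are more strongly correlated a priori. The tree, the branch names, and the unit variances are hypothetical and are not taken from the text above.

```python
from math import sqrt

# Hypothetical four-class tree: each class maps to the branches on its
# path from the root. Names are illustrative assumptions only.
PATHS = {
    "c1": ["A", "A1"],
    "c2": ["A", "A2"],
    "c3": ["B", "B1"],
    "c4": ["B", "B2"],
}

# Assumed prior variance of the independent zero-mean parameter on each branch.
BRANCH_VAR = {branch: 1.0 for path in PATHS.values() for branch in path}


def prior_cov(j, k):
    """Prior covariance between the parameters of classes j and k.

    With independent branch parameters, only branches shared by both
    paths contribute, each adding its own variance.
    """
    shared = set(PATHS[j]) & set(PATHS[k])
    return sum(BRANCH_VAR[b] for b in shared)


def prior_corr(j, k):
    """Prior correlation implied by the shared-branch covariance."""
    return prior_cov(j, k) / sqrt(prior_cov(j, j) * prior_cov(k, k))


if __name__ == "__main__":
    for j, k in [("c1", "c2"), ("c1", "c3")]:
        print(j, k, prior_corr(j, k))
```

Under these assumptions, sibling classes such as c1 and c2 share one of their two branches and so have a positive prior correlation, while classes in different subtrees, such as c1 and c3, share none and are uncorrelated. Changing the branch variances would change how strongly nearby classes are tied together.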