We introduce an architecture for large-scale image categorization that
enables the end-to-end learning of separate visual features for the different
classes to distinguish. The proposed model consists of a deep CNN shaped like a
tree. The stem of the tree includes a sequence of convolutional layers common
to all classes. The stem then splits into multiple branches implementing
parallel feature extractors, which are ultimately connected to the final
classification layer via learned gated connections. These learned gates
determine for each individual class the subset of features to use. Such a
scheme naturally encourages the learning of a heterogeneous set of specialized
features through the separate branches and it allows each class to use the
subset of features that are optimal for its recognition. We show the generality
of our proposed method by reshaping several popular CNNs from the literature
into our proposed architecture. Our experiments on the CIFAR100, CIFAR10, and
Synth datasets show that in each case our resulting model yields a substantial
improvement in accuracy over the original CNN. Our empirical analysis also
suggests that our scheme acts as a form of beneficial regularization improving
generalization performance.Comment: WACV 201