Most deep architectures for image classification--even those that are trained
to classify a large number of diverse categories--learn shared image
representations with a single model. Intuitively, however, categories that are
more similar should share more information than those that are very different.
While hierarchical deep networks address this problem by learning separate
features for subsets of related categories, current implementations require
simplified models using fixed architectures specified via heuristic clustering
methods. Instead, we propose Blockout, a method for regularization and model
selection that simultaneously learns both the model architecture and
parameters. A generalization of Dropout, our approach gives a novel
parametrization of hierarchical architectures that allows for structure
learning via back-propagation. To demonstrate its utility, we evaluate Blockout
on the CIFAR and ImageNet datasets, demonstrating improved classification
accuracy, better regularization performance, faster training, and the clear
emergence of hierarchical network structures