In this paper we propose cross-modal convolutional neural networks (X-CNNs),
a novel, biologically inspired type of CNN architecture that treats gradient
descent-specialised CNNs as individual units of processing in a larger-scale
network topology, while allowing for unconstrained information flow and/or
weight sharing between analogous hidden layers of the network---thus
generalising the already well-established concept of neural network ensembles
(where information typically may flow only between the output layers of the
individual networks). Each constituent network is designed to learn the output
function on its own subset of the input data; cross-connections are then
introduced after each pooling operation, allowing the networks to exchange
information periodically. This injection of
knowledge into a model (by prior partition of the input data through domain
knowledge or unsupervised methods) is expected to yield the greatest returns in
sparse data environments, which are typically less suitable for training CNNs.
For evaluation, we compare a standard four-layer CNN and a more sophisticated
FitNet4 architecture against their cross-modal variants on the
CIFAR-10 and CIFAR-100 datasets, with varying percentages of the training data
removed. We find that at lower levels of data availability, the X-CNNs
significantly outperform their baselines (typically providing a 2--6% benefit,
depending on the dataset size and whether data augmentation is used), while
still maintaining an edge on all of the full-dataset tests.

Comment: To appear in the 7th IEEE Symposium Series on Computational
Intelligence (IEEE SSCI 2016), 8 pages, 6 figures. Minor revisions, in
response to reviewers' comments.
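
The exchange of information after a pooling operation, as described in the abstract, can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the abstract does not specify the form of the cross-connections, so the sketch assumes that each stream keeps its own pooled feature maps and receives a 1x1-convolved (pointwise mixed) copy of the other stream's maps, joined by channel concatenation. All function names (`max_pool_2x2`, `pointwise_mix`, `cross_connect`), the channel counts, and the random weights are hypothetical.

```python
import random


def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a [channels][rows][cols] nested list."""
    pooled = []
    for channel in fmap:
        rows, cols = len(channel), len(channel[0])
        pooled.append([
            [max(channel[r][c], channel[r][c + 1],
                 channel[r + 1][c], channel[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)
        ])
    return pooled


def pointwise_mix(fmap, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels.

    `weights` has shape [out_channels][in_channels].
    """
    rows, cols = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[i][r][c] for i, w in enumerate(out_w))
          for c in range(cols)]
         for r in range(rows)]
        for out_w in weights
    ]


def cross_connect(stream_a, stream_b, w_a_to_b, w_b_to_a):
    """Assumed cross-connection: concatenate each stream's own maps with a
    pointwise-mixed copy of the other stream's maps (channel concatenation)."""
    new_a = stream_a + pointwise_mix(stream_b, w_b_to_a)
    new_b = stream_b + pointwise_mix(stream_a, w_a_to_b)
    return new_a, new_b


rng = random.Random(0)


def random_fmap(channels, size):
    """Random stand-in for the feature maps produced by a convolutional layer."""
    return [[[rng.random() for _ in range(size)] for _ in range(size)]
            for _ in range(channels)]


# Two constituent networks, each fed a different subset of the input.
stream_a = random_fmap(channels=2, size=4)
stream_b = random_fmap(channels=3, size=4)

# Pooling happens inside each stream independently...
pooled_a = max_pool_2x2(stream_a)
pooled_b = max_pool_2x2(stream_b)

# ...and the cross-connection is applied right after it.
w_a_to_b = [[rng.uniform(-1, 1) for _ in range(2)]]  # 1 output channel from A's 2
w_b_to_a = [[rng.uniform(-1, 1) for _ in range(3)]]  # 1 output channel from B's 3
merged_a, merged_b = cross_connect(pooled_a, pooled_b, w_a_to_b, w_b_to_a)
```

In this sketch each stream leaves the cross-connection with its own channels plus one channel derived from the other stream; in a trained network the mixing weights would be learned by gradient descent rather than drawn at random.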