Numerosity, the number of objects in a set, is a basic property of a given
visual scene. Many animals develop the perceptual ability to subitize: the
near-instantaneous identification of the numerosity in small sets of visual
items. In computer vision, it has been shown that numerosity emerges as a
statistical property in neural networks during unsupervised learning from
simple synthetic images. In this work, we focus on more complex natural images
using unsupervised hierarchical neural networks. Specifically, we show that
variational autoencoders are able to spontaneously perform subitizing after
training without supervision on a large amount images from the Salient Object
Subitizing dataset. While our method is unable to outperform supervised
convolutional networks for subitizing, we observe that the networks learn to
encode numerosity as basic visual property. Moreover, we find that the learned
representations are likely invariant to object area; an observation in
alignment with studies on biological neural networks in cognitive neuroscience