What spatial frequency information do humans and neural networks use to
recognize objects? In neuroscience, critical band masking is an established
tool that can reveal the frequency-selective filters used for object
recognition. Critical band masking measures the sensitivity of recognition
performance to noise added at each spatial frequency. Existing critical band
masking studies show that humans recognize periodic patterns (gratings) and
letters by means of a spatial-frequency filter (or "channel'') that has a
frequency bandwidth of one octave (doubling of frequency). Here, we introduce
critical band masking as a task for network-human comparison and test 14 humans
and 76 neural networks on 16-way ImageNet categorization in the presence of
narrowband noise. We find that humans recognize objects in natural images using
the same one-octave-wide channel that they use for letters and gratings, making
it a canonical feature of human object recognition. On the other hand, the
neural network channel, across various architectures and training strategies,
is 2-4 times as wide as the human channel. In other words, networks are
vulnerable to high and low frequency noise that does not affect human
performance. Adversarial and augmented-image training are commonly used to
increase network robustness and shape bias. Does this training align network
and human object recognition channels? Three network channel properties
(bandwidth, center frequency, peak noise sensitivity) correlate strongly with
shape bias (53% variance explained) and with robustness of
adversarially-trained networks (74% variance explained). Adversarial training
increases robustness but expands the channel bandwidth even further away from
the human bandwidth. Thus, critical band masking reveals that the network
channel is more than twice as wide as the human channel, and that adversarial
training only increases this difference.Comment: Accepted to Neural Information Processing Systems (NeurIPS) 2023
(Oral Presentation