This paper presents a theory by which idealized models of auditory receptive
fields can be derived in a principled axiomatic manner, from a set of
structural properties that enable invariance of receptive field responses under
natural sound transformations and ensure internal consistency between
spectro-temporal receptive fields at different temporal and spectral scales.
For defining a time-frequency transformation of a purely temporal sound
signal, it is shown that the framework allows for a new way of deriving the
Gabor and Gammatone filters as well as a novel family of generalized Gammatone
filters, with additional degrees of freedom to obtain different trade-offs
between the spectral selectivity and the temporal delay of time-causal temporal
window functions.
When applied to the definition of a second layer of receptive fields from a
spectrogram, it is shown that the framework leads to two canonical families of
spectro-temporal receptive fields, in terms of spectro-temporal derivatives of
either spectro-temporal Gaussian kernels for non-causal time or the combination
of a time-causal generalized Gammatone filter over the temporal domain and a
Gaussian filter over the logspectral domain. For each filter family, the
spectro-temporal receptive fields can be either separable over the
time-frequency domain or be adapted to local glissando transformations that
represent variations in logarithmic frequencies over time. Within each domain
of either non-causal or time-causal time, these receptive field families are
derived by uniqueness from the assumptions.
It is demonstrated how the presented framework allows for computation of
basic auditory features for audio processing and that it leads to predictions
about auditory receptive fields with good qualitative similarity to biological
receptive fields measured in the inferior colliculus (ICC) and primary auditory
cortex (A1) of mammals.
Comment: 55 pages, 22 figures, 3 tables