Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation
To date a number of studies have shown that receptive field shapes of early
sensory neurons can be reproduced by optimizing coding efficiency of natural
stimulus ensembles. A still unresolved question is whether the efficient coding
hypothesis explains formation of neurons which explicitly represent
environmental features of different functional importance. This paper proposes
that the spatial selectivity of higher auditory neurons emerges as a direct
consequence of learning efficient codes for natural binaural sounds. Firstly,
it is demonstrated that a linear efficient coding transform, Independent
Component Analysis (ICA), trained on spectrograms of naturalistic simulated
binaural sounds, extracts spatial information present in the signal. A simple
hierarchical ICA extension allowing for decoding of sound position is proposed.
Furthermore, it is shown that units revealing spatial selectivity can be
learned from a binaural recording of a natural auditory scene. In both cases a
relatively small subpopulation of learned spectrogram features suffices to
perform accurate sound localization. Representation of the auditory space is
therefore learned in a purely unsupervised way by maximizing the coding
efficiency and without any task-specific constraints. These results imply that
efficient coding is a useful strategy for learning structures which allow for
making behaviorally vital inferences about the environment. Comment: 22 pages, 9 figures
Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus
We have developed a sparse mathematical representation of speech that
minimizes the number of active model neurons needed to represent typical speech
sounds. The model learns several well-known acoustic features of speech such as
harmonic stacks, formants, onsets and terminations, but we also find more
exotic structures in the spectrogram representation of sound such as localized
checkerboard patterns and frequency-modulated excitatory subregions flanked by
suppressive sidebands. Moreover, several of these novel features resemble
neuronal receptive fields reported in the Inferior Colliculus (IC), as well as
auditory thalamus and cortex, and our model neurons exhibit the same tradeoff
in spectrotemporal resolution as has been observed in IC. To our knowledge,
this is the first demonstration that receptive fields of neurons in the
ascending mammalian auditory pathway beyond the auditory nerve can be predicted
based on coding principles and the statistical properties of recorded sounds. Comment: For Supporting Information, see PLoS website:
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.100259
Impaired Auditory Temporal Selectivity in the Inferior Colliculus of Aged Mongolian Gerbils
Aged humans show severe difficulties in temporal auditory processing tasks (e.g., speech recognition in noise, low-frequency sound localization, gap detection). A degradation of auditory function with age is also evident in experimental animals. To investigate age-related changes in temporal processing, we compared extracellular responses to temporally variable pulse trains and human speech in the inferior colliculus of young adult (3 months) and aged (3 years) Mongolian gerbils. We observed a significant decrease of selectivity to the pulse trains in neuronal responses from aged animals. This decrease in selectivity led, on the population level, to an increase in signal correlations and therefore a decrease in heterogeneity of temporal receptive fields and a decreased efficiency in encoding of speech signals. A decrease in selectivity to temporal modulations is consistent with a downregulation of the inhibitory transmitter system in aged animals. These alterations in temporal processing could underlie declines in the aging auditory system, which are unrelated to peripheral hearing loss. These declines cannot be compensated by traditional hearing aids (that rely on amplification of sound) but may rather require pharmacological treatment.
Audio Event Detection using Weakly Labeled Data
Acoustic event detection is essential for content analysis and description of
multimedia recordings. The majority of current literature on the topic learns
the detectors through fully-supervised techniques employing strongly labeled
data. However, the labels available for the majority of multimedia data are
generally weak and do not provide sufficient detail for such methods to be
employed. In this paper we propose a framework for learning acoustic event
detectors using only weakly labeled data. We first show that audio event
detection using weak labels can be formulated as a multiple-instance learning
problem. We then suggest two frameworks for solving multiple-instance learning,
one based on support vector machines, and the other on neural networks. The
proposed methods can help in removing the time-consuming and expensive process
of manually annotating data to facilitate fully supervised learning. Moreover,
they can not only detect events in a recording but can also provide temporal
locations of events in the recording. This helps in obtaining a complete
description of the recording and is notable since temporal information was
never known in the first place in weakly labeled data. Comment: ACM Multimedia 201
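The multiple-instance formulation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard MIL convention that a recording (a "bag") is positive if at least one of its segments (an "instance") is positive. The function names, the per-segment scores, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of the multiple-instance view of weakly labeled audio.
# A recording is a "bag" of segments; only the recording-level label is known.

def bag_score(segment_scores):
    """Recording-level score under the standard MIL assumption:
    the bag is as positive as its most positive segment."""
    return max(segment_scores)

def localize(segment_scores, segment_len_s, threshold=0.5):
    """Return (start, end) times in seconds for segments scoring above threshold."""
    return [(i * segment_len_s, (i + 1) * segment_len_s)
            for i, s in enumerate(segment_scores) if s > threshold]

# Made-up per-segment detector outputs for one recording (assumption).
scores = [0.1, 0.2, 0.9, 0.8, 0.1]
recording_score = bag_score(scores)
event_times = localize(scores, segment_len_s=1.0)
```

Because the bag score is driven by individual segments, a detector trained only on recording-level labels still yields per-segment scores, which is one way the temporal locations described in the abstract could be recovered.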
Recognition of Harmonic Sounds in Polyphonic Audio using a Missing Feature Approach: Extended Report
A method based on local spectral features and missing feature techniques
is proposed for the recognition of harmonic sounds in mixture
signals. A mask estimation algorithm is proposed for identifying
spectral regions that contain reliable information for each sound
source and then bounded marginalization is employed to treat the
feature vector elements that are determined as unreliable. The proposed
method is tested on musical instrument sounds due to the
extensive availability of data but it can be applied on other sounds
(e.g., animal sounds, environmental sounds), whenever these are harmonic.
In simulations the proposed method clearly outperformed a
baseline method for mixture signals.
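Bounded marginalization, as named in this abstract, can be sketched as follows. The sketch assumes a diagonal Gaussian class model and spectral energies bounded below by zero. Both are illustrative assumptions, as are all function and parameter names; the report's actual model is not described here. Reliable features contribute their density. Unreliable features contribute the probability mass between the lower bound and the observed value, since the target's energy cannot exceed that of the mixture.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu, sigma):
    """Cumulative distribution of a 1-D Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def log_likelihood(obs, mask, mu, sigma, lower=0.0):
    """Log-likelihood of a feature vector under a diagonal Gaussian (assumed model).
    mask[i] is True where feature i is judged reliable."""
    total = 0.0
    for x, reliable, m, s in zip(obs, mask, mu, sigma):
        if reliable:
            total += math.log(gauss_pdf(x, m, s))
        else:
            # Integrate the density over [lower, x]: the true value is only
            # known to lie below the observed mixture energy.
            mass = gauss_cdf(x, m, s) - gauss_cdf(lower, m, s)
            total += math.log(max(mass, 1e-300))  # guard against log(0)
    return total

# Made-up example: the second feature is dominated by an interfering source.
obs, mu, sigma = [1.0, 5.0], [1.0, 1.0], [1.0, 1.0]
ll_masked = log_likelihood(obs, [True, False], mu, sigma)
ll_unmasked = log_likelihood(obs, [True, True], mu, sigma)
```

In this toy case, treating the corrupted feature as unreliable keeps it from heavily penalizing the correct class, which is the intended effect of the mask.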