Unsupervised Learning of Semantic Audio Representations
Even in the absence of any explicit semantic annotation, vast collections of
audio recordings provide valuable information for learning the categorical
structure of sounds. We consider several class-agnostic semantic constraints
that apply to unlabeled nonspeech audio: (i) noise and translations in time do
not change the underlying sound category, (ii) a mixture of two sound events
inherits the categories of the constituents, and (iii) the categories of events
in close temporal proximity are likely to be the same or related. Without
labels to ground them, these constraints are incompatible with classification
loss functions. However, they may still be leveraged to identify geometric
inequalities needed for triplet loss-based training of convolutional neural
networks. The result is low-dimensional embeddings of the input spectrograms
that recover 41% and 84% of the performance of their fully-supervised
counterparts when applied to downstream query-by-example sound retrieval and
sound event classification tasks, respectively. Moreover, in
limited-supervision settings, our unsupervised embeddings double the
state-of-the-art classification performance.Comment: Submitted to ICASSP 201
CNN Architectures for Large-Scale Audio Classification
Convolutional Neural Networks (CNNs) have proven very effective in image
classification and show promise for audio. We use various CNN architectures to
classify the soundtracks of a dataset of 70M training videos (5.24 million
hours) with 30,871 video-level labels. We examine fully connected Deep Neural
Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We
investigate varying the size of both training set and label vocabulary, finding
that analogs of the CNNs used in image classification do well on our audio
classification task, and larger training and label sets help up to a point. A
model using embeddings from these classifiers does much better than raw
features on the Audio Set [5] Acoustic Event Detection (AED) classification
task.
Comment: Accepted for publication at ICASSP 2017. Changes: added definitions of mAP, AUC, and d-prime; updated mAP/AUC/d-prime numbers for Audio Set based on changes in the latest Audio Set revision; changed wording to fit the 4-page limit with the new additions.
Receptor-Mediated Gonadotropin Action in Ovary
Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/66088/1/j.1432-1033.1981.tb05481.x.pd
The bii4africa dataset of faunal and floral population intactness estimates across Africa’s major land uses
Sub-Saharan Africa is under-represented in global biodiversity datasets, particularly regarding the impact of land use on species’ population abundances. Drawing on recent advances in expert elicitation to ensure data consistency, 200 experts were convened using a modified-Delphi process to estimate ‘intactness scores’: the remaining proportion of an ‘intact’ reference population of a species group in a particular land use, on a scale from 0 (no remaining individuals) to 1 (same abundance as the reference) and, in rare cases, to 2 (populations that thrive in human-modified landscapes). The resulting bii4africa dataset contains intactness scores representing terrestrial vertebrates (tetrapods: ±5,400 amphibians, reptiles, birds, mammals) and vascular plants (±45,000 forbs, graminoids, trees, shrubs) in sub-Saharan Africa across the region’s major land uses (urban, cropland, rangeland, plantation, protected, etc.) and intensities (e.g., large-scale vs. smallholder cropland). This dataset was co-produced as part of the Biodiversity Intactness Index for Africa Project. Additional uses include assessing ecosystem condition; rectifying geographic/taxonomic biases in global biodiversity indicators and maps; and informing the Red List of Ecosystems.
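The score scale above (0 = no individuals remain, 1 = reference abundance, occasionally above 1 in human-modified landscapes) lends itself to simple aggregation, e.g. an average intactness per land use. A small sketch, with field names that are purely illustrative and not the dataset's actual schema:

```python
from collections import defaultdict

def mean_intactness(records):
    """Average intactness scores per land-use class.

    Each record is assumed to carry a 'land_use' label and a 'score' in
    [0, ~2]; these field names are hypothetical, chosen for the example.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for rec in records:
        entry = sums[rec["land_use"]]
        entry[0] += rec["score"]
        entry[1] += 1
    return {lu: total / n for lu, (total, n) in sums.items()}

# Toy records mimicking the 0-1 score scale described above.
scores = [
    {"land_use": "cropland", "score": 0.4},
    {"land_use": "cropland", "score": 0.6},
    {"land_use": "protected", "score": 0.9},
]
result = mean_intactness(scores)
```

Real uses of the dataset would weight such averages by species richness or area, which this sketch omits.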
Noise-invariant Neurons in the Avian Auditory Cortex: Hearing the Song in Noise
Given the extraordinary ability of humans and animals to recognize communication signals over a background of noise, describing noise-invariant neural responses is critical not only to pinpoint the brain regions that are mediating our robust perceptions but also to understand the neural computations that are performing these tasks and the underlying circuitry. Although invariant neural responses, such as rotation-invariant face cells, are well described in the visual system, high-level auditory neurons that can represent the same behaviorally relevant signal in a range of listening conditions have yet to be discovered. Here we found neurons in a secondary area of the avian auditory cortex that exhibit noise-invariant responses in the sense that they responded with similar spike patterns to song stimuli presented in silence and over a background of naturalistic noise. By characterizing the neurons' tuning in terms of their responses to modulations in the temporal and spectral envelope of the sound, we then show that noise invariance is partly achieved by selectively responding to long sounds with sharp spectral structure. Finally, to demonstrate that such computations could explain noise invariance, we designed a biologically inspired noise-filtering algorithm that can be used to separate song or speech from noise. This novel noise-filtering method performs as well as other state-of-the-art de-noising algorithms and could be used in clinical or consumer-oriented applications. Our biologically inspired model also shows how high-level noise-invariant responses could be created from neural responses typically found in primary auditory cortex.
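The tuning result above (selectivity for long sounds with sharp spectral structure) corresponds to keeping slow temporal modulations and fine spectral modulations of a spectrogram. A simplified modulation-domain mask in that spirit can be sketched as follows; this is a stand-in inspired by the idea, not the authors' STRF-weighted algorithm, and the cutoff values are arbitrary assumptions.

```python
import numpy as np

def modulation_filter(spectrogram, max_temporal_mod=0.2, min_spectral_mod=0.1):
    """Mask the 2-D modulation spectrum of a (freq x time) spectrogram.

    Retains energy at low temporal modulation (long, sustained sounds) and
    high spectral modulation (sharp spectral structure), zeroing the rest.
    Cutoffs are in normalized cycles per frame/bin and purely illustrative.
    """
    n_freq, n_time = spectrogram.shape
    mod = np.fft.fft2(spectrogram)
    wt = np.abs(np.fft.fftfreq(n_time))   # temporal modulation axis
    wf = np.abs(np.fft.fftfreq(n_freq))   # spectral modulation axis
    mask = (wt[None, :] <= max_temporal_mod) & (wf[:, None] >= min_spectral_mod)
    # mask is symmetric in +/- frequency, so the inverse transform is real
    return np.fft.ifft2(mod * mask).real

# Toy "noisy spectrogram": random values standing in for a log-spectrogram.
rng = np.random.default_rng(0)
noisy = rng.normal(size=(16, 32))
cleaned = modulation_filter(noisy)
```

Because the mask only zeroes modulation components, the filtered output can never gain energy; a real denoiser would instead learn soft weights (the paper's positively and negatively weighted STRFs play that role).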
Model STRFs for noise reduction.
A. The eight most positively (top) and most negatively (bottom) weighted STRFs from the noise reduction algorithm trained with a background of colony noise. B. Same as in A, but for the model trained with a background of modulation-limited noise. C. The ensemble modulation transfer functions for the top 16 and bottom 16 STRFs for the model trained in colony noise, sorted as in A. D. Same as in C, but for the model trained in modulation-limited noise.