Exact asymptotics of monomer-dimer model on rectangular semi-infinite lattices
By using the asymptotic theory of Pemantle and Wilson, exact asymptotic
expansions of the free energy of the monomer-dimer model on rectangular lattices are obtained in terms of the dimer density for small values
of the lattice width, at both the high and low dimer density limits. In the high dimer density
limit, the theoretical results confirm the dependence of the free energy on the
parity of the lattice width, a result obtained previously by computational methods. In the
low dimer density limit, the free energy on a cylindrical
lattice strip has exactly the same leading terms in its series expansion as
that of the infinite lattice. Comment: 9 pages, 6 tables
Surrey-CVSSP system for DCASE2017 challenge task 4
In this technical report, we present our methods for task 4 of the
Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017)
challenge. This task evaluates systems for the large-scale detection of sound
events using weakly labeled training data. The data are YouTube video excerpts
focusing on transportation and warnings, chosen for their industry applications.
There are two subtasks, audio tagging and sound event detection from weakly
labeled data. A convolutional neural network (CNN) and a gated recurrent unit (GRU)
based recurrent neural network (RNN) are adopted as our basic framework. We
propose a learnable gating activation function for selecting informative local
features. An attention-based scheme is used for localizing the specific events in
a weakly-supervised mode. A new batch-level balancing strategy is also proposed
to tackle the data imbalance problem. Fusion of posteriors from different
systems is found to be effective in improving performance. In summary, we obtain a
61% F-value for the audio tagging subtask and a 0.73 error rate (ER) for the
sound event detection subtask on the development set, while the official
multilayer perceptron (MLP) based baseline obtains only a 13.1% F-value for
audio tagging and an ER of 1.02 for sound event detection. Comment: DCASE2017 challenge ranked 1st system, task 4, tech report
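As a hedged illustration of the learnable gating activation mentioned in this abstract, the PyTorch sketch below shows a convolutional block whose output is the element-wise product of a feature branch and a learned sigmoid gate, so the network can select informative local time-frequency features. The module name, layer sizes, and input shape are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of a gated convolutional block for weakly labeled sound event
# detection, assuming a log-mel spectrogram input of shape (batch, 1, frames, mels).
# Illustrates the "learnable gating activation" idea; not the authors' released code.
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Feature branch and gate branch share the same receptive field.
        self.feat = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The sigmoid gate lies in [0, 1] and decides how much of each
        # local feature is passed on to the next layer.
        return self.feat(x) * torch.sigmoid(self.gate(x))

if __name__ == "__main__":
    x = torch.randn(4, 1, 240, 64)   # (batch, channel, frames, mel bins) -- assumed shape
    y = GatedConvBlock(1, 32)(x)
    print(y.shape)                   # torch.Size([4, 32, 240, 64])
```

Where the gate saturates near zero, local features are suppressed; where it is near one, they pass through, which is the sense in which the activation "selects" informative regions.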
Audio Set classification with attention model: A probabilistic perspective
This paper investigates the classification of the Audio Set dataset. Audio
Set is a large scale weakly labelled dataset of sound clips. Previous work used
multiple instance learning (MIL) to classify weakly labelled data. In MIL, a
bag consists of several instances, and a bag is labelled positive if at least
one instances in the audio clip is positive. A bag is labelled negative if all
the instances in the bag are negative. We propose an attention model to tackle
the MIL problem and explain this attention model from a novel probabilistic
perspective. We define a probability space on each bag, where each instance in
the bag has a trainable probability measure for each class. Then the
classification of a bag is the expectation of the classification output of the
instances in the bag with respect to the learned probability measure.
Experimental results show that our proposed attention model modeled by fully
connected deep neural network obtains mAP of 0.327 on Audio Set dataset,
outperforming the Google's baseline of 0.314 and recurrent neural network of
0.325.Comment: Accepted by ICASSP 201
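The probabilistic view described above can be made concrete with a short, hedged sketch: each instance in a bag receives a non-negative trainable weight that is normalised over the bag (the probability measure), and the bag-level prediction is the expectation of the per-instance classifier outputs under that measure. The layer sizes, the softmax normalisation, and the module name below are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch of decision-level attention pooling for multiple instance
# learning: bag prediction = E_{i ~ p}[f(x_i)], where p is a learned,
# per-class probability measure over the instances in a bag.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, emb_dim: int, n_classes: int):
        super().__init__()
        self.cla = nn.Linear(emb_dim, n_classes)   # per-instance classifier f(x_i)
        self.att = nn.Linear(emb_dim, n_classes)   # unnormalised attention scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_instances, emb_dim)
        cla = torch.sigmoid(self.cla(x))          # per-instance class probabilities
        att = torch.softmax(self.att(x), dim=1)   # normalise over instances -> probability measure
        return torch.sum(att * cla, dim=1)        # expectation over the bag: (batch, n_classes)

if __name__ == "__main__":
    bags = torch.randn(8, 100, 128)               # 8 clips, 100 instances, 128-d embeddings (assumed)
    probs = AttentionPooling(128, 527)(bags)      # Audio Set has 527 classes
    print(probs.shape)                            # torch.Size([8, 527])
```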
A joint separation-classification model for sound event detection of weakly labelled data
Source separation (SS) aims to separate individual sources from an audio
recording. Sound event detection (SED) aims to detect sound events in an
audio recording. We propose a joint separation-classification (JSC) model
trained only on weakly labelled audio data, that is, only the tags of an audio
recording are known while the times of the events are unknown. First, we propose a
separation mapping from the time-frequency (T-F) representation of an audio recording to
the T-F segmentation masks of the audio events. Second, a classification
mapping is built from each T-F segmentation mask to the presence probability of
each audio event. In the source separation stage, the sources of the audio events and the
times of the sound events can be obtained from the T-F segmentation masks. The
proposed method achieves an equal error rate (EER) of 0.14 in SED,
outperforming a deep neural network baseline of 0.29. A source separation signal-to-distortion ratio (SDR) of
8.08 dB is obtained by using global weighted rank pooling (GWRP) as the probability
mapping, outperforming the global max pooling (GMP) based probability mapping,
which gives an SDR of 0.03 dB. The source code of our work is published. Comment: Accepted by ICASSP 2018
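The two probability mappings compared above can be illustrated with a short, hedged sketch of global weighted rank pooling (GWRP) versus global max pooling (GMP) applied to a single T-F segmentation mask; the decay hyper-parameter and mask shape are illustrative assumptions rather than the authors' released code.

```python
# Hedged NumPy sketch of global weighted rank pooling (GWRP) vs. global max
# pooling (GMP): both map one T-F segmentation mask to a single presence
# probability for the corresponding sound event.
import numpy as np

def gmp(mask: np.ndarray) -> float:
    # Global max pooling: presence probability is the single largest mask value.
    return float(mask.max())

def gwrp(mask: np.ndarray, d: float = 0.99) -> float:
    # Global weighted rank pooling: sort mask values in descending order,
    # weight the j-th largest value by d**j, then normalise by the weight sum.
    v = np.sort(mask.ravel())[::-1]
    w = d ** np.arange(v.size)
    return float(np.sum(w * v) / np.sum(w))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = rng.random((240, 64))      # assumed (frames, mel bins) segmentation mask
    print(gmp(mask), gwrp(mask))
```

With d close to 1 GWRP reduces to mean pooling and with d close to 0 it reduces to max pooling, so it interpolates between the two extremes.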