Approximate Message Passing for Underdetermined Audio Source Separation
Approximate message passing (AMP) algorithms have shown great promise in
sparse signal reconstruction due to their low computational requirements and
fast convergence to an exact solution. Moreover, they provide a probabilistic
framework that is often more intuitive than alternatives such as convex
optimisation. In this paper, AMP is used for audio source separation from
underdetermined instantaneous mixtures. In the time-frequency domain, it is
typical to assume a priori that the sources are sparse, so we solve the
corresponding sparse linear inverse problem using AMP. We present a block-based
approach that uses AMP to process multiple time-frequency points
simultaneously. Two algorithms known as AMP and vector AMP (VAMP) are evaluated
in particular. Results show that they are promising in terms of artefact
suppression.
Comment: Paper accepted for 3rd International Conference on Intelligent Signal
Processing (ISP 2017)
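For orientation, the basic AMP recursion from the compressed sensing literature can be written as below. This is a sketch of the standard real-valued form, not necessarily the block-based variant the abstract describes. All symbols are assumptions introduced here: y is the observed mixture, A is the M x N mixing matrix with M < N, x^t is the current source estimate, z^t is the corrected residual, eta_t is a sparsity-promoting denoiser such as soft thresholding, delta = M/N, and the angle brackets denote the average over vector entries.

```latex
\begin{aligned}
\mathbf{z}^{t} &= \mathbf{y} - \mathbf{A}\mathbf{x}^{t}
  + \frac{1}{\delta}\,\mathbf{z}^{t-1}
  \big\langle \eta_{t-1}'\big(\mathbf{x}^{t-1} + \mathbf{A}^{\mathsf{T}}\mathbf{z}^{t-1}\big) \big\rangle, \\
\mathbf{x}^{t+1} &= \eta_{t}\big(\mathbf{x}^{t} + \mathbf{A}^{\mathsf{T}}\mathbf{z}^{t}\big).
\end{aligned}
```

The extra term in the residual update (the Onsager correction) is what distinguishes AMP from plain iterative thresholding.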
Environmental Sound Classification with Parallel Temporal-spectral Attention
Convolutional neural networks (CNNs) are one of the best-performing neural
network architectures for environmental sound classification (ESC). Recently,
temporal attention mechanisms have been used in CNNs to capture the useful
information from the relevant time frames for audio classification, especially
for weakly labelled data where the onset and offset times of the sound events
are not provided. In these methods, however, the inherent spectral
characteristics and variations are not explicitly exploited when obtaining the
deep features. In this paper, we propose a novel parallel temporal-spectral
attention mechanism for CNNs to learn discriminative sound representations,
which enhances the temporal and spectral features by capturing the importance
of different time frames and frequency bands. Parallel branches are constructed
to allow temporal attention and spectral attention to be applied respectively
in order to mitigate interference from the segments without the presence of
sound events. The experiments on three environmental sound classification (ESC)
datasets and two acoustic scene classification (ASC) datasets show that our
method improves the classification performance and also exhibits robustness to
noise.
Comment: submitted to INTERSPEECH202
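The idea of weighting time frames and frequency bands in two parallel branches can be illustrated with a toy, parameter-free sketch. This is not the paper's model: the attention scores here are plain frame and band means rather than learned convolutional outputs, combining the branches by summation is an assumption, and the name `parallel_attention` is hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def parallel_attention(feat):
    """feat[t][f]: a list of T time frames, each holding F frequency bins.

    Assumption: scores are simple means; a real model would learn them.
    """
    n_frames, n_bands = len(feat), len(feat[0])
    # Temporal branch: one weight per time frame.
    t_w = softmax([sum(row) / n_bands for row in feat])
    # Spectral branch: one weight per frequency band.
    s_w = softmax([
        sum(feat[t][f] for t in range(n_frames)) / n_frames
        for f in range(n_bands)
    ])
    # Both branches reweight the same input; their outputs are summed.
    out = [
        [feat[t][f] * t_w[t] + feat[t][f] * s_w[f] for f in range(n_bands)]
        for t in range(n_frames)
    ]
    return out, t_w, s_w
```

With an input whose energy sits in one frame and one band, both branches assign their largest weight to that frame and band, while silent frames contribute nothing to the output.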
Structural Deep Embedding for Hyper-Networks
Network embedding has recently attracted a lot of attention in data mining.
Existing network embedding methods mainly focus on networks with pairwise
relationships. In the real world, however, the relationships among data points
could go beyond pairwise, i.e., three or more objects are involved in each
relationship represented by a hyperedge, thus forming hyper-networks. These
hyper-networks pose great challenges to existing network embedding methods when
the hyperedges are indecomposable, that is to say, any subset of nodes in a
hyperedge cannot form another hyperedge. These indecomposable hyperedges are
especially common in heterogeneous networks. In this paper, we propose a novel
Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with
indecomposable hyperedges. More specifically, we theoretically prove that any
linear similarity metric in embedding space commonly used in existing methods
cannot maintain the indecomposability property in hyper-networks, and thus
propose a new deep model to realize a non-linear tuplewise similarity function
while preserving both local and global proximities in the formed embedding
space. We conduct extensive experiments on four different types of
hyper-networks, including a GPS network, an online social network, a drug
network and a semantic network. The empirical results demonstrate that our
method can significantly and consistently outperform the state-of-the-art
algorithms.
Comment: Accepted by AAAI 1
Investigations of supernovae and supernova remnants in the era of SKA
Two main physical mechanisms are used to explain supernova explosions:
thermonuclear explosion of a white dwarf (Type Ia) and core collapse of a
massive star (Type II and Type Ib/Ic). Type Ia supernovae serve as distance
indicators that led to the discovery of the accelerating expansion of the
Universe. The exact nature of their progenitor systems, however, remains unclear.
Radio emission from the interaction between the explosion shock front and its
surrounding CSM or ISM provides an important probe into the progenitor star's
last evolutionary stage. No radio emission has yet been detected from Type Ia
supernovae by current telescopes. The SKA will hopefully detect radio emission
from Type Ia supernovae due to its much better sensitivity and resolution.
There is a 'supernovae rate problem' for the core collapse supernovae because
the optically dim ones are missed due to being intrinsically faint and/or due
to dust obscuration. A number of dust-enshrouded optically hidden supernovae
should be discovered via SKA1-MID/survey, especially for those located in the
innermost regions of their host galaxies. Meanwhile, the detection of
intrinsically dim SNe will also benefit from SKA1. The detection rate will
provide unique information about the current star formation rate and the
initial mass function. A supernova explosion triggers a shock wave which expels
and heats the surrounding CSM and ISM, and forms a supernova remnant (SNR). It
is expected that more SNRs will be discovered by the SKA. This may decrease the
discrepancy between the expected and observed numbers of SNRs. Several SNRs
have been confirmed to accelerate protons, the main component of cosmic rays,
to very high energy by their shocks. This brings hope of solving the puzzle of
the origin of Galactic cosmic rays by combining observations of SNRs in the low
frequency (SKA) and very high frequency (Cherenkov Telescope Array: CTA) bands.
Comment: To be published in: "Advancing Astrophysics with the Square Kilometre
Array", Proceedings of Science, PoS(AASKA14)
Matrix of Polynomials Model based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling
We study the problem of dictionary learning for signals that can be
represented as polynomials or polynomial matrices, such as convolutive signals
with time delays or acoustic impulse responses. Recently, we developed a method
for polynomial dictionary learning based on the fact that a polynomial matrix
can be expressed as a polynomial with matrix coefficients, where the
coefficient of the polynomial at each time lag is a scalar matrix. However, a
polynomial matrix can equally be represented as a matrix with polynomial
elements. In this paper, we develop an alternative method for learning a
polynomial dictionary and a sparse representation method for polynomial signal
reconstruction based on this model. The proposed methods can be used directly
to operate on the polynomial matrix without having to access its coefficient
matrices. We demonstrate the performance of the proposed method for acoustic
impulse response modeling.
Comment: 5 pages, 2 figures
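The two equivalent views of a polynomial matrix mentioned in this abstract can be made concrete with a small sketch. It assumes nested Python lists as the storage format, and the function name `to_polynomial_elements` is hypothetical; it only converts between layouts and does not implement the dictionary learning method itself.

```python
def to_polynomial_elements(coeff_matrices):
    """Convert a polynomial with matrix coefficients into a matrix of polynomials.

    coeff_matrices[lag][i][j] is entry (i, j) of the coefficient matrix at a
    given time lag. The result stores, for each entry (i, j), the list of its
    polynomial coefficients ordered by lag: result[i][j][lag].
    """
    n_lags = len(coeff_matrices)
    n_rows = len(coeff_matrices[0])
    n_cols = len(coeff_matrices[0][0])
    return [
        [
            [coeff_matrices[lag][i][j] for lag in range(n_lags)]
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# A(z) = A0 + A1 * z^-1, written as two 2 x 2 coefficient matrices.
A0 = [[1, 2], [3, 4]]
A1 = [[5, 6], [7, 8]]
elements = to_polynomial_elements([A0, A1])
# Entry (0, 1) is the scalar polynomial 2 + 6 * z^-1.
```

Both layouts hold the same numbers; only the indexing order changes, which is why a method can work on one view without unpacking the other.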
A joint separation-classification model for sound event detection of weakly labelled data
Source separation (SS) aims to separate individual sources from an audio
recording. Sound event detection (SED) aims to detect sound events from an
audio recording. We propose a joint separation-classification (JSC) model
trained only on weakly labelled audio data, that is, only the tags of an audio
recording are known but the times of the events are unknown. First, we propose a
separation mapping from the time-frequency (T-F) representation of an audio
recording to
the T-F segmentation masks of the audio events. Second, a classification
mapping is built from each T-F segmentation mask to the presence probability of
each audio event. In the source separation stage, sources of audio events and
time of sound events can be obtained from the T-F segmentation masks. The
proposed method achieves an equal error rate (EER) of 0.14 in SED,
outperforming a deep neural network baseline of 0.29. A source separation SDR of
8.08 dB is obtained by using global weighted rank pooling (GWRP) as probability
mapping, outperforming the global max pooling (GMP) based probability mapping,
which gives an SDR of 0.03 dB. The source code of our work is published.
Comment: Accepted by ICASSP 201
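Global weighted rank pooling can be sketched in a few lines. This is a minimal illustration under assumptions rather than the authors' published code: the function name `gwrp` and the decay parameter `r` are introduced here, and the input is a flat list of mask values for one event class.

```python
def gwrp(values, r):
    """Global weighted rank pooling over a flat list of T-F mask values.

    Values are sorted in descending order and the j-th largest value gets
    weight r**j, with 0 <= r <= 1. The result is the weighted average.
    """
    ranked = sorted(values, reverse=True)
    weights = [r ** j for j in range(len(ranked))]
    return sum(w * v for w, v in zip(weights, ranked)) / sum(weights)


mask = [0.1, 0.9, 0.5]
as_max = gwrp(mask, 0.0)    # r = 0 keeps only the largest value (max pooling)
as_mean = gwrp(mask, 1.0)   # r = 1 weights all values equally (average pooling)
in_between = gwrp(mask, 0.5)
```

Setting `r` between 0 and 1 interpolates between global max pooling and global average pooling, which is the design motivation for using it as the probability mapping.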
Large-scale weakly supervised audio classification using gated convolutional neural network
In this paper, we present a gated convolutional neural network and a temporal
attention-based localization method for audio classification, which won the 1st
place in the large-scale weakly supervised sound event detection task of
Detection and Classification of Acoustic Scenes and Events (DCASE) 2017
challenge. The audio clips in this task, which are extracted from YouTube
videos, are manually labeled with one or a few audio tags but without
timestamps of the audio events, which is called weakly labeled data. Two
sub-tasks are defined in this challenge including audio tagging and sound event
detection using this weakly labeled data. A convolutional recurrent neural
network (CRNN) with learnable gated linear units (GLUs) non-linearity applied
on the log Mel spectrogram is proposed. In addition, a temporal attention
method is proposed along the frames to predict the locations of each audio
event in a chunk from the weakly labeled data. As a team, we ranked 1st and 2nd
in these two sub-tasks of the DCASE 2017 challenge, with an F value of 55.6% and
an equal error of 0.73, respectively.
Comment: submitted to ICASSP2018, summary on the 1st place system in DCASE2017
task4 challenge
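The gated linear unit non-linearity named in this abstract multiplies a linear path by a sigmoid gate. The sketch below works on plain lists of pre-computed activations; the names `glu`, `linear` and `gate` are assumptions, and the convolutions that would produce these activations in a CRNN are left out.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(linear, gate):
    """Gated linear unit: element-wise product of a linear path and a gate.

    `linear` and `gate` are equal-length lists that a real network would
    compute with two separate convolutions over the same input.
    """
    return [a * sigmoid(g) for a, g in zip(linear, gate)]
```

A strongly positive gate passes the linear value almost unchanged and a strongly negative gate suppresses it, so the network can learn which T-F units to attend to.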
Audio Set classification with attention model: A probabilistic perspective
This paper investigates the classification of the Audio Set dataset. Audio
Set is a large scale weakly labelled dataset of sound clips. Previous work used
multiple instance learning (MIL) to classify weakly labelled data. In MIL, a
bag consists of several instances, and a bag is labelled positive if at least
one instance in the audio clip is positive. A bag is labelled negative if all
the instances in the bag are negative. We propose an attention model to tackle
the MIL problem and explain this attention model from a novel probabilistic
perspective. We define a probability space on each bag, where each instance in
the bag has a trainable probability measure for each class. Then the
classification of a bag is the expectation of the classification output of the
instances in the bag with respect to the learned probability measure.
Experimental results show that our proposed attention model, modelled by a fully
connected deep neural network, obtains a mAP of 0.327 on the Audio Set dataset,
outperforming Google's baseline of 0.314 and a recurrent neural network at
0.325.
Comment: Accepted by ICASSP 201
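The MIL labelling rule and the expectation-based bag prediction described above can be sketched as follows. The function names and the use of non-negative scores normalised to sum to one are assumptions for illustration; in the paper both the instance classifier and the probability measure are learned by a neural network.

```python
def mil_bag_label(instance_labels):
    """Classic MIL rule: a bag is positive if at least one instance is."""
    return any(instance_labels)


def attention_pool(instance_probs, attention_scores):
    """Bag-level prediction as an expectation over instances.

    instance_probs[i]: classifier output for instance i (one class).
    attention_scores[i]: non-negative score, normalised here so the scores
    form a probability measure over the instances in the bag.
    """
    total = sum(attention_scores)
    measure = [s / total for s in attention_scores]
    return sum(p * f for p, f in zip(measure, instance_probs))
```

With uniform scores the bag prediction is the plain average of the instance outputs, and concentrating the scores on one instance makes the bag prediction follow that instance.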