SEGAN: Speech Enhancement Generative Adversarial Network
Current speech enhancement techniques operate in the spectral domain and/or
exploit some higher-level features. The majority of them tackle a limited number
of noise conditions and rely on first-order statistics. To circumvent these
issues, deep networks are being increasingly used, thanks to their ability to
learn complex functions from large example sets. In this work, we propose the
use of generative adversarial networks for speech enhancement. In contrast to
current techniques, we operate at the waveform level, training the model
end-to-end, and incorporate 28 speakers and 40 different noise conditions into
the same model, such that model parameters are shared across them. We evaluate
the proposed model using an independent, unseen test set with two speakers and
20 alternative noise conditions. The enhanced samples confirm the viability of
the proposed model, and both objective and subjective evaluations confirm its
effectiveness. With that, we open the exploration of generative
architectures for speech enhancement, which may progressively incorporate
further speech-centric design choices to improve their performance.
Comment: 5 pages, 4 figures, accepted in INTERSPEECH 201
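Adversarial training can be made concrete with a sketch of one plausible objective for a waveform-level enhancement GAN. It combines a least-squares adversarial loss with an L1 term that pulls the enhanced waveform toward the clean one. The abstract does not state the loss, so the least-squares form, the L1 term, the weight of 100, and the function names are all assumptions. Here `d_real` and `d_fake` stand for discriminator scores on (clean, noisy) and (enhanced, noisy) pairs.

```python
from typing import Sequence


def lsgan_d_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    # Discriminator (assumed least-squares form): push scores on
    # (clean, noisy) pairs toward 1 and on (enhanced, noisy) pairs toward 0.
    real = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake = sum(s ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * (real + fake)


def generator_loss(
    d_fake: Sequence[float],
    enhanced: Sequence[float],
    clean: Sequence[float],
    l1_weight: float = 100.0,  # hypothetical weighting of the L1 term
) -> float:
    # Adversarial part: the generator wants its outputs scored as "clean" (1).
    adv = 0.5 * sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)
    # Reconstruction part: mean absolute sample-wise error to the clean waveform.
    l1 = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return adv + l1_weight * l1
```

The L1 term anchors the generator to the clean target. The adversarial term rewards outputs that the discriminator cannot tell apart from clean speech.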
ScarGAN: Chained Generative Adversarial Networks to Simulate Pathological Tissue on Cardiovascular MR Scans
Medical images with specific pathologies are scarce, but a large amount of
data is usually required for a deep convolutional neural network (DCNN) to
achieve good accuracy. We consider the problem of segmenting the left
ventricular (LV) myocardium on late gadolinium enhancement (LGE) cardiovascular
magnetic resonance (CMR) scans, only some of which contain scar
tissue. We propose ScarGAN to simulate scar tissue on healthy myocardium using
chained generative adversarial networks (GANs). Our novel approach factorizes
the simulation process into 3 steps: 1) a mask generator to simulate the shape
of the scar tissue; 2) a domain-specific heuristic to produce the initial
simulated scar tissue from the simulated shape; 3) a refining generator to add
details to the simulated scar tissue. Unlike other approaches that generate
samples from scratch, we simulate scar tissue on normal scans, resulting in
highly realistic samples. We show that experienced radiologists are unable to
distinguish between real and simulated scar tissue. Training a U-Net with
additional scans with scar tissue simulated by ScarGAN increases the percentage
of scar pixels correctly included in LV myocardium prediction from 75.9% to
80.5%.
Comment: 12 pages, 5 figures. To appear in MICCAI DLMIA 201
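The three-step factorization can be sketched as a chain of functions, where each stage consumes the previous stage's output. This is a toy sketch and not ScarGAN. Both generators are replaced by hypothetical deterministic stand-ins (`generate_scar_mask`, `refine`). The heuristic is an assumed constant-intensity fill, motivated by scar appearing bright on LGE images. Only the chaining structure reflects the abstract.

```python
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[bool]]


def generate_scar_mask(myocardium: Mask, cols: range) -> Mask:
    # Stage 1 stand-in (hypothetical): instead of a mask generator network,
    # mark myocardium pixels that fall inside a fixed column band as scar.
    return [[inside and c in cols for c, inside in enumerate(row)] for row in myocardium]


def paint_scar(image: Image, scar: Mask, intensity: float) -> Image:
    # Stage 2 stand-in (assumed heuristic): give every scar pixel a bright
    # constant value, since scar is hyperenhanced on LGE scans.
    return [
        [intensity if s else v for v, s in zip(img_row, scar_row)]
        for img_row, scar_row in zip(image, scar)
    ]


def refine(painted: Image, original: Image, scar: Mask, alpha: float = 0.8) -> Image:
    # Stage 3 stand-in (hypothetical): instead of a refining generator, blend a
    # fraction of the original tissue texture back in, inside the scar only.
    return [
        [alpha * p + (1 - alpha) * o if s else o for p, o, s in zip(p_row, o_row, s_row)]
        for p_row, o_row, s_row in zip(painted, original, scar)
    ]


def simulate_scar(image: Image, myocardium: Mask, cols: range,
                  intensity: float = 1.0) -> Tuple[Image, Mask]:
    # Chain the three stages: shape -> initial fill -> refinement.
    scar = generate_scar_mask(myocardium, cols)
    painted = paint_scar(image, scar, intensity)
    return refine(painted, image, scar), scar
```

Because scar is added to an existing healthy scan rather than generated from scratch, every pixel outside the simulated scar is left exactly as it was.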
Deep Learning for Audio Signal Processing
Given the recent surge of developments in deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
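The log-mel representation named above can be illustrated with one common recipe: a Hann-windowed power spectrum, then a triangular mel filterbank, then a logarithm. The review does not prescribe these details. The HTK-style mel formula, the 40-filter default, the `eps` floor, and all function names are assumptions chosen for illustration. Production code would use an FFT routine rather than the naive DFT used here.

```python
import cmath
import math
from typing import List


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale (other conventions exist, e.g. Slaney's).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    # Periodic Hann window followed by a naive O(N^2) DFT over bins 0..N/2.
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append(abs(acc) ** 2)
    return spec


def mel_filterbank(n_mels: int, n_fft: int, sr: int) -> List[List[float]]:
    # Triangular filters whose edge frequencies are equally spaced in mel.
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo <= f <= mid:
                row.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f <= hi:
                row.append((hi - f) / (hi - mid))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(frame: List[float], sr: int, n_mels: int = 40, eps: float = 1e-10) -> List[float]:
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sr)
    # eps keeps log() finite for silent frames.
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]
```

For example, `log_mel([math.sin(2 * math.pi * 1000 * i / 16000) for i in range(512)], 16000)` yields 40 features for one 32 ms frame of a 1 kHz tone. A spectrogram would stack such vectors over successive, overlapping frames.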