SNR-Based Teachers-Student Technique for Speech Enhancement
It is very challenging for speech enhancement methods to achieve robust
performance under both high signal-to-noise ratio (SNR) and low SNR
simultaneously. In this paper, we propose a method that integrates an SNR-based
teachers-student technique and time-domain U-Net to deal with this problem.
Specifically, this method consists of multiple teacher models and a student
model. We first train the teacher models under multiple small-range SNRs that
do not coincide with each other so that they can perform speech enhancement
well within the specific SNR range. Then, we choose different teacher models to
supervise the training of the student model according to the SNR of the
training data. Eventually, the student model can perform speech enhancement
under both high SNR and low SNR. To evaluate the proposed method, we
constructed a dataset with an SNR ranging from -20dB to 20dB based on the
public dataset. We experimentally analyzed the effectiveness of the SNR-based
teachers-student technique and compared the proposed method with several
state-of-the-art methods.
Comment: Published in 2020 IEEE International Conference on Multimedia and
Expo (ICME 2020)
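The teacher-selection step described in this abstract can be illustrated with a minimal sketch. The abstract states only that the teachers cover non-coinciding SNR ranges and that the dataset spans -20 dB to 20 dB; the four range boundaries below, and the names `TEACHER_RANGES`, `snr_db`, and `select_teacher`, are assumptions made for illustration and are not taken from the paper.

```python
import math

# Hypothetical, non-overlapping SNR ranges in dB, one per teacher model.
# The boundaries are assumed; the abstract only gives the overall -20 dB to 20 dB span.
TEACHER_RANGES = [(-20.0, -10.0), (-10.0, 0.0), (0.0, 10.0), (10.0, 20.0)]


def snr_db(clean, noise):
    """Return the SNR in dB of a clean signal relative to additive noise."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum(n * n for n in noise)
    return 10.0 * math.log10(signal_power / noise_power)


def select_teacher(snr):
    """Return the index of the teacher whose range [low, high) contains snr.

    The last range also includes its upper edge so that 20 dB is covered.
    """
    for i, (low, high) in enumerate(TEACHER_RANGES):
        is_last = i == len(TEACHER_RANGES) - 1
        if low <= snr < high or (is_last and snr == high):
            return i
    raise ValueError(f"SNR {snr} dB is outside the covered range")
```

During student training, each training example would be routed to `select_teacher(snr_db(clean, noise))`, and that teacher's output would serve as the supervision signal for the example. How the supervision is combined with the ground-truth loss is not specified in the abstract.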
The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
With recent breakthroughs in artificial neural networks, deep generative
models have become one of the leading techniques for computational creativity.
Despite very promising progress on image and short sequence generation,
symbolic music generation remains a challenging problem since the structure of
compositions is usually complicated. In this study, we attempt to solve the
melody generation problem constrained by the given chord progression. This
music meta-creation problem can also be incorporated into a plan recognition
system with user inputs and predictive structural outputs. In particular, we
explore the effect of explicit architectural encoding of musical structure via
comparing two sequential generative models: LSTM (a type of RNN) and WaveNet
(dilated temporal-CNN). As far as we know, this is the first study of applying
WaveNet to symbolic music generation, as well as the first systematic
comparison between temporal-CNN and RNN for music generation. We conducted a
survey to evaluate the generated melodies and applied the Variable Markov Oracle
to music pattern discovery. Experimental results show that encoding structure
more explicitly using a stack of dilated convolution layers improves
performance significantly, and that a global encoding of the underlying chord
progression into the generation procedure improves it even further.
Comment: 8 pages, 13 figures
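One way to see why a stack of dilated convolutions encodes long-range structure explicitly is to compute its receptive field, which is 1 plus the sum of (kernel size - 1) times the dilation over all layers. The sketch below uses this standard formula; the kernel size of 2 and the doubling dilation schedule are assumptions in the style of WaveNet, since the abstract does not give the configuration used in the study.

```python
def receptive_field(kernel_size, dilations):
    """Number of input time steps seen by one output of a stacked 1-D convolution."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed configuration: kernel size 2, four layers.
dilated = receptive_field(2, [1, 2, 4, 8])    # dilation doubles per layer
undilated = receptive_field(2, [1, 1, 1, 1])  # ordinary convolutions
print(dilated, undilated)
```

With the same number of layers, the dilated stack covers 16 time steps while the undilated stack covers 5, so the receptive field grows exponentially with depth instead of linearly.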
RawNet: Fast End-to-End Neural Vocoder
Neural network-based vocoders have recently demonstrated a powerful
ability to synthesize high-quality speech. These models usually generate
samples by conditioning on some spectrum features, such as Mel-spectrum.
However, these features are extracted using a speech analysis module that includes
some processing based on human knowledge. In this work, we propose RawNet,
a truly end-to-end neural vocoder, which uses a coder network to learn a
higher-level representation of the signal and an autoregressive voder network to
generate speech sample by sample. The coder and voder together act like an
auto-encoder network and can be jointly trained directly on raw waveforms
without any human-designed features. Experiments on the copy-synthesis
tasks show that RawNet achieves synthesized speech quality comparable to
LPCNet, with a smaller model architecture and faster speech generation at
the inference step.
Comment: Submitted to Interspeech 2019, Graz, Austria
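The sample-by-sample generation mentioned in this abstract follows the general autoregressive pattern, in which each new sample is predicted from previously generated samples plus a conditioning signal. The sketch below shows only that generic loop. It is not the RawNet architecture: `generate`, `toy_predictor`, and the per-sample conditioning values are hypothetical stand-ins for the voder network and the coder output.

```python
def generate(predict_next, conditioning, context_size):
    """Generate one sample per conditioning value, feeding past outputs back in."""
    samples = []
    for cond in conditioning:
        # Only the most recent context_size samples are visible to the predictor.
        context = samples[-context_size:] if context_size > 0 else []
        samples.append(predict_next(context, cond))
    return samples


def toy_predictor(context, cond):
    """Stand-in for a trained network: decay the previous sample and add the conditioning."""
    previous = context[-1] if context else 0.0
    return 0.5 * previous + cond


print(generate(toy_predictor, [1.0, 0.0, 0.0], context_size=1))  # [1.0, 0.5, 0.25]
```

Because every step depends on the previous output, the loop cannot be parallelized across time, which is why model size and per-step cost dominate inference speed in autoregressive vocoders.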