A Fully Time-domain Neural Model for Subband-based Speech Synthesizer
This paper introduces a deep neural network model for a subband-based speech
synthesizer. The model exploits the narrow bandwidth of the subband signals
to reduce the complexity of the time-domain speech generator. We employ
multi-level wavelet analysis/synthesis to decompose/reconstruct the signal into
subbands in the time domain. Inspired by WaveNet, a convolutional neural
network (CNN) model predicts the subband speech signals fully in the time domain.
Owing to the narrow bandwidth of the subbands, a simple network architecture
suffices to learn the simple subband patterns accurately. In the ground-truth
experiments with teacher-forcing, the subband synthesizer outperforms the
fullband model significantly in terms of both subjective and objective
measures. In addition, by conditioning the model on a phoneme sequence obtained
from a pronunciation dictionary, we obtain a fully time-domain neural model
for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end.
The generated speech of the subband TTS shows quality comparable to the
fullband one, with a lighter network architecture for each subband.
Comment: 5 pages, 3 figures
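The wavelet analysis/synthesis stage the abstract describes can be illustrated with a minimal one-level Haar filter bank (a sketch, not the paper's model; the function names are mine): the signal is split into a low band and a high band, each at half the sample rate, and perfect reconstruction recombines them.

```python
import numpy as np

# Minimal sketch of one level of Haar wavelet analysis/synthesis, showing the
# kind of subband decomposition a time-domain subband synthesizer operates on.
# Assumes an even-length signal; names are illustrative, not the paper's code.

def haar_analysis(x):
    """Split a signal into a low-band (approximation) and high-band (detail)."""
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # lowpass + downsample by 2
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # highpass + downsample by 2
    return lo, hi

def haar_synthesis(lo, hi):
    """Perfectly reconstruct the signal from its two subbands."""
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2.0)
    x[1::2] = (lo - hi) / np.sqrt(2.0)
    return x

x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 64, endpoint=False))
lo, hi = haar_analysis(x)
x_rec = haar_synthesis(lo, hi)
print(np.max(np.abs(x - x_rec)))  # near machine precision
```

Applying the analysis step recursively to the low band gives the multi-level decomposition; each resulting subband is narrowband, which is what lets a small network model it.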
Revisiting Multi-Step Nonlinearity Compensation with Machine Learning
For the efficient compensation of fiber nonlinearity, one of the guiding
principles appears to be: fewer steps are better and more efficient. We
challenge this assumption and show that carefully designed multi-step
approaches can lead to better performance-complexity trade-offs than their
few-step counterparts.
Comment: 4 pages, 3 figures. This is a preprint of a paper submitted to the
2019 European Conference on Optical Communication
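The multi-step idea can be sketched with a toy split-step model (illustrative parameters, not the paper's setup): forward propagation alternates a linear dispersion step with a nonlinear Kerr phase rotation, and multi-step digital backpropagation inverts those steps in reverse order with negated parameters.

```python
import numpy as np

# Hypothetical split-step sketch (beta2, gamma, dz are illustrative values):
# forward fiber propagation alternates a frequency-domain dispersion step and
# a nonlinear Kerr phase rotation; backpropagation undoes both in reverse.

def propagate(u, n_steps, beta2=1.0, gamma=0.01, dz=1.0):
    w = 2 * np.pi * np.fft.fftfreq(len(u))     # angular frequency grid
    disp = np.exp(-0.5j * beta2 * w**2 * dz)   # linear (dispersion) operator
    for _ in range(n_steps):
        u = np.fft.ifft(np.fft.fft(u) * disp)           # linear half-step
        u = u * np.exp(1j * gamma * np.abs(u)**2 * dz)  # nonlinear half-step
    return u

def backpropagate(u, n_steps, beta2=1.0, gamma=0.01, dz=1.0):
    w = 2 * np.pi * np.fft.fftfreq(len(u))
    disp = np.exp(+0.5j * beta2 * w**2 * dz)   # inverse dispersion
    for _ in range(n_steps):
        u = u * np.exp(-1j * gamma * np.abs(u)**2 * dz)  # undo nonlinearity
        u = np.fft.ifft(np.fft.fft(u) * disp)            # undo dispersion
    return u

rng = np.random.default_rng(0)
tx = rng.standard_normal(256) + 1j * rng.standard_normal(256)
rx = propagate(tx, n_steps=10)
eq = backpropagate(rx, n_steps=10)
print(np.max(np.abs(tx - eq)))  # exact inverse when the step counts match
```

In this idealized noiseless model, more backpropagation steps track the forward steps exactly; the paper's point is that carefully designed multi-step structures can also win on the real performance-complexity trade-off.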
A Structurally Regularized CNN Architecture via Adaptive Subband Decomposition
We propose a generalized convolutional neural network (CNN) architecture that
first decomposes the input signal into subbands by an adaptive filter bank
structure, and then uses convolutional layers to extract features from each
subband independently. Fully connected layers finally combine the extracted
features to perform classification. The proposed architecture restrains each of
the subband CNNs from learning using the entire input signal spectrum,
resulting in structural regularization. Our proposed CNN architecture is fully
compatible with the end-to-end learning mechanism of typical CNN architectures
and learns the subband decomposition from the input dataset. We show that the
proposed CNN architecture has attractive properties, such as robustness to
input and weight-and-bias quantization noise, compared to regular full-band CNN
architectures. Importantly, the proposed architecture significantly reduces
computational costs, while maintaining state-of-the-art classification
accuracy.
Experiments on image classification tasks using the MNIST, CIFAR-10/100,
Caltech-101, and ImageNet-2012 datasets show that the proposed architecture
achieves accuracy surpassing state-of-the-art results. On the ImageNet-2012
dataset, we achieved top-5 and top-1 validation set accuracy of 86.91% and
69.73%, respectively. Notably, the proposed architecture offers over 90%
reduction in computation cost in the inference path and approximately 75%
reduction in back-propagation (per iteration) with just a single-layer subband
decomposition. With a two-layer subband decomposition, the computational gains
are even larger, with accuracy comparable to that of the single-layer
decomposition.
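The forward structure described above can be sketched in a few lines (a toy numpy sketch, not the authors' code; the rectangular-mask filter bank stands in for the learned adaptive one): the input is split into subbands, each subband is processed by its own small convolution so no branch sees the full spectrum, and the concatenated features feed a fully connected head.

```python
import numpy as np

# Illustrative sketch of the subband-CNN forward pass. The filter bank here is
# a fixed rectangular partition of FFT bins; in the paper it is learned
# end-to-end together with the per-subband CNNs.

def subband_split(x, n_bands):
    """Split a 1-D signal into n_bands signals, one per spectral band."""
    X = np.fft.rfft(x)
    edges = np.linspace(0, len(X), n_bands + 1, dtype=int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = np.zeros_like(X)
        mask[lo:hi] = X[lo:hi]                  # keep only this band's bins
        bands.append(np.fft.irfft(mask, n=len(x)))
    return bands                                # bands sum back to x

def forward(x, conv_kernels, head_w):
    feats = []
    for band, k in zip(subband_split(x, len(conv_kernels)), conv_kernels):
        h = np.convolve(band, k, mode="same")   # per-subband conv layer
        feats.append(np.maximum(h, 0.0).mean()) # ReLU + global pooling
    return np.asarray(feats) @ head_w           # fully connected head

rng = np.random.default_rng(1)
x = rng.standard_normal(128)
kernels = [rng.standard_normal(5) for _ in range(4)]   # 4 subband branches
head = rng.standard_normal((4, 3))                     # 3 output classes
print(forward(x, kernels, head).shape)                 # (3,)
```

Because each branch's convolution only ever sees one band, its weights cannot fit full-spectrum patterns, which is the structural regularization the abstract refers to.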
Wideband Time-Domain Digital Backpropagation via Subband Processing and Deep Learning
We propose a low-complexity sub-banded DSP architecture for digital
backpropagation where the walk-off effect is compensated using simple delay
elements. For a simulated 96-Gbaud signal and 2500 km optical link, our method
achieves a 2.8 dB SNR improvement over linear equalization.
Comment: 3 pages, 3 figures
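The delay-based walk-off compensation can be illustrated with a minimal sketch (illustrative delays, not the paper's parameters): dispersion makes each subband travel at a different group velocity, so after the subband split, a plain per-subband delay line re-aligns the bands before the nonlinear processing.

```python
import numpy as np

# Minimal sketch of walk-off compensation with simple delay elements. Walk-off
# between subbands is modeled as integer-sample shifts (circular here for
# simplicity); undoing it needs only a delay per subband, not a full filter.

def apply_walkoff(bands, delays):
    """Model inter-band walk-off as per-band integer-sample delays."""
    return [np.roll(b, d) for b, d in zip(bands, delays)]

def compensate_walkoff(bands, delays):
    """Re-align the subbands with plain delay elements."""
    return [np.roll(b, -d) for b, d in zip(bands, delays)]

rng = np.random.default_rng(2)
bands = [rng.standard_normal(64) for _ in range(3)]
delays = [0, 4, 9]                       # samples of relative group delay
received = apply_walkoff(bands, delays)
aligned = compensate_walkoff(received, delays)
print(all(np.allclose(a, b) for a, b in zip(aligned, bands)))  # True
```

Replacing per-band FFT filtering with delay elements is what keeps the DSP complexity low in the proposed architecture.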
Adaptive Noise Reduction for Sound Event Detection Using Subband-Weighted NMF
Sound event detection in real-world environments suffers from the interference of non-stationary and time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF). First, a noise dictionary is learned from the input noisy signal by robust NMF, which supports adaptation to noise variations. The estimated noise dictionary is then used in a supervised source separation framework in combination with a pre-trained event dictionary. Second, to improve the separation quality, we extend the basic NMF model to a weighted form, with the aim of varying the relative importance of the different components when separating a target sound event from noise. With properly designed weights, the separation process is forced to rely more on the dominant event components, while the noise is greatly suppressed. The proposed method is evaluated on the dataset of the rare sound event detection task of the DCASE 2017 challenge, and achieves results comparable to the top-ranking system based on convolutional recurrent neural networks (CRNNs). The proposed weighted NMF method shows excellent noise reduction ability and improves the F-score by 5% compared to the unweighted approach.
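The weighted NMF extension can be sketched with multiplicative updates (a hedged sketch: the matrix names `V`, `B`, `H` and the per-entry weight matrix `M` are illustrative, and the objective is the weighted Frobenius loss rather than the paper's exact formulation). Larger weights force the factorization to fit the dominant event components more closely.

```python
import numpy as np

# Sketch of weighted NMF minimizing ||M * (V - B @ H)||_F^2 with
# multiplicative updates. M holds per-entry weights; M = 1 everywhere
# recovers plain NMF. Names and defaults are illustrative.

def weighted_nmf(V, M, rank, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    B = rng.random((V.shape[0], rank)) + eps   # basis (dictionary)
    H = rng.random((rank, V.shape[1])) + eps   # activations
    for _ in range(n_iter):
        WV = M * V
        B *= (WV @ H.T) / ((M * (B @ H)) @ H.T + eps)
        H *= (B.T @ WV) / (B.T @ (M * (B @ H)) + eps)
    return B, H

rng = np.random.default_rng(3)
V = rng.random((20, 5)) @ rng.random((5, 30))  # rank-5 nonnegative data
M = np.ones_like(V)                            # uniform weights = plain NMF
B, H = weighted_nmf(V, M, rank=5)
print(np.linalg.norm(M * (V - B @ H)) / np.linalg.norm(V))  # small residual
```

In the paper's setting, the weights would be chosen per subband/component to emphasize the event dictionary's dominant atoms during separation.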
Residual echo signal in critically sampled subband acoustic echo cancellers based on IIR and FIR filter banks