Multi-scale Multi-band DenseNets for Audio Source Separation
This paper deals with the problem of audio source separation. To handle the
complex and ill-posed nature of the problems of audio source separation, the
current state-of-the-art approaches employ deep neural networks to obtain
instrumental spectra from a mixture. In this study, we propose a novel network
architecture that extends the recently developed densely connected
convolutional network (DenseNet), which has shown excellent results on image
classification tasks. To deal with the specific problem of audio source
separation, an up-sampling layer, block skip connection and band-dedicated
dense blocks are incorporated on top of DenseNet. The proposed approach takes
advantage of long contextual information and outperforms state-of-the-art
results on the SiSEC 2016 competition by a large margin in terms of
signal-to-distortion ratio. Moreover, the proposed architecture requires
significantly fewer parameters and considerably less training time compared
with other methods. Comment: to appear at WASPAA 201
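The abstract above reports results in terms of signal-to-distortion ratio (SDR). As a rough illustration only, the sketch below computes a simplified SDR as the energy ratio between a reference signal and the estimation error. This is an assumption made for illustration: evaluation campaigns typically use a more elaborate definition, and the function name `sdr_db` and the `eps` guard are hypothetical, not taken from the paper.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: reference energy over error energy.

    Illustrative definition only; not the evaluation code used in the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero and log of zero for silent or perfect signals
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


# Toy example: the estimate is the reference scaled by 0.9,
# so the error energy is 1% of the signal energy (roughly 20 dB).
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9 * r for r in reference]
print(round(sdr_db(reference, estimate), 2))
```

A higher value means the estimated source is closer to the reference.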
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Generative adversarial network (GAN)-based vocoders have been intensively
studied because they can synthesize high-fidelity audio waveforms faster than
real-time. However, it has been reported that most GANs fail to obtain the
optimal projection for discriminating between real and fake data in the feature
space. In the literature, it has been demonstrated that slicing adversarial
network (SAN), an improved GAN training framework that can find the optimal
projection, is effective in the image generation task. In this paper, we
investigate the effectiveness of SAN in the vocoding task. For this purpose, we
propose a scheme to modify least-squares GAN, which most GAN-based vocoders
adopt, so that their loss functions satisfy the requirements of SAN. Through
our experiments, we demonstrate that SAN can improve the performance of
GAN-based vocoders, including BigVGAN, with small modifications. Our code is
available at https://github.com/sony/bigvsan. Comment: Submitted to ICASSP 202
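For context, the abstract above takes least-squares GAN as its starting point. The sketch below shows the standard least-squares GAN objectives under one common convention (target 1 for real samples, 0 for generated ones). It does not show the SAN modification proposed in the paper, and it is not code from the BigVSAN repository; the function names are hypothetical, and scaling factors differ between implementations.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def lsgan_discriminator_loss(real_scores, fake_scores):
    """Push discriminator outputs toward 1 on real data and 0 on generated data."""
    real_term = mean([(s - 1.0) ** 2 for s in real_scores])
    fake_term = mean([s ** 2 for s in fake_scores])
    return real_term + fake_term


def lsgan_generator_loss(fake_scores):
    """Push discriminator outputs on generated data toward 1."""
    return mean([(s - 1.0) ** 2 for s in fake_scores])


# Toy discriminator outputs (assumed values, for illustration only)
real_scores = [0.9, 1.1]
fake_scores = [0.2, -0.1]
print(lsgan_discriminator_loss(real_scores, fake_scores))
print(lsgan_generator_loss(fake_scores))
```

Both losses reach zero only when the discriminator outputs hit their targets exactly, which is what makes the objective a least-squares regression rather than a classification loss.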
Mode Domain Spatial Active Noise Control Using Sparse Signal Representation
Active noise control (ANC) over a sizeable space requires a large number of
reference and error microphones to satisfy the spatial Nyquist sampling
criterion, which limits the feasibility of practical realization of such
systems. This paper proposes a mode-domain feedforward ANC method to attenuate
the noise field over a large space while reducing the number of microphones
required. We adopt a sparse reference signal representation to precisely
calculate the reference mode coefficients. The proposed system consists of
circular reference and error microphone arrays, which capture the reference
noise signal and residual error signal, respectively, and a circular
loudspeaker array to drive the anti-noise signal. Experimental results indicate
that above the spatial Nyquist frequency, our proposed method can perform well
compared to conventional methods. Moreover, the proposed method can even
reduce the number of reference microphones while achieving better noise
attenuation. Comment: to appear at ICASSP 201