182 research outputs found

Multi-scale Multi-band DenseNets for Audio Source Separation

Authors: Mitsufuji Yuki; Takahashi Naoya
Publication date: 29/06/2017

This paper deals with the problem of audio source separation. To handle the complex and ill-posed nature of audio source separation, current state-of-the-art approaches employ deep neural networks to obtain instrumental spectra from a mixture. In this study, we propose a novel network architecture that extends the recently developed densely connected convolutional network (DenseNet), which has shown excellent results on image classification tasks. To deal with the specific problem of audio source separation, an up-sampling layer, block skip connection and band-dedicated dense blocks are incorporated on top of DenseNet. The proposed approach takes advantage of long contextual information and outperforms state-of-the-art results on the SiSEC 2016 competition by a large margin in terms of signal-to-distortion ratio. Moreover, the proposed architecture requires significantly fewer parameters and considerably less training time compared with other methods.

Comment: to appear at WASPAA 201

arXiv.org e-Print Archive

Crossref

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

Authors: Mitsufuji Yuki; Shibuya Takashi; Takida Yuhta
Publication date: 06/09/2023

Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between real and fake data in the feature space. In the literature, it has been demonstrated that slicing adversarial network (SAN), an improved GAN training framework that can find the optimal projection, is effective in the image generation task. In this paper, we investigate the effectiveness of SAN in the vocoding task. For this purpose, we propose a scheme to modify least-squares GAN, which most GAN-based vocoders adopt, so that their loss functions satisfy the requirements of SAN. Through our experiments, we demonstrate that SAN can improve the performance of GAN-based vocoders, including BigVGAN, with small modifications. Our code is available at https://github.com/sony/bigvsan.

Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive

Mode Domain Spatial Active Noise Control Using Sparse Signal Representation

Authors: Abhayapala Thushara D.; Maeno Yu; Mitsufuji Yuki
Publication date: 28/02/2018

Active noise control (ANC) over a sizeable space requires a large number of reference and error microphones to satisfy the spatial Nyquist sampling criterion, which limits the feasibility of practical realization of such systems. This paper proposes a mode-domain feedforward ANC method to attenuate the noise field over a large space while reducing the number of microphones required. We adopt a sparse reference signal representation to precisely calculate the reference mode coefficients. The proposed system consists of circular reference and error microphone arrays, which capture the reference noise signal and residual error signal, respectively, and a circular loudspeaker array to drive the anti-noise signal. Experimental results indicate that above the spatial Nyquist frequency, our proposed method can perform well compared to conventional methods. Moreover, the proposed method can even reduce the number of reference microphones while achieving better noise attenuation.

Comment: to appear at ICASSP 201

arXiv.org e-Print Archive

Crossref

The Australian National University

Clusterin inhibition induces cellular senescence and reduces the proliferation of pancreatic cancer

Author: Mitsufuji Suguru (ミツフジスグル, 光藤傑)

Osaka University Knowledge Archive