630 research outputs found

Foreground-Background Ambient Sound Scene Separation

Author: Gasso Gilles
Olvera Michel
Serizel Romain
Vincent Emmanuel
Publication date: 27/07/2020

Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background. We consider the task of separating these events from the background, which we call foreground-background ambient sound scene separation. We propose a deep learning-based separation framework with a suitable feature normalization scheme and an optional auxiliary network capturing the background statistics, and we investigate its ability to handle the great variety of sound classes encountered in ambient sound scenes, which have often not been seen in training. To do so, we create single-channel foreground-background mixtures using isolated sounds from the DESED and Audioset datasets, and we conduct extensive experiments with mixtures of seen or unseen sound classes at various signal-to-noise ratios. Our experimental findings demonstrate the generalization ability of the proposed approach.
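As a rough illustration of how a foreground event can be mixed with a background at a chosen signal-to-noise ratio, the sketch below rescales the foreground so that the foreground-to-background power ratio hits a target value in dB. The function names `mix_at_snr` and `measure_snr_db` are hypothetical and chosen for this example; this is not the authors' data pipeline.

```python
import math

def power(samples):
    """Mean power (average squared amplitude) of a list of samples."""
    return sum(s * s for s in samples) / len(samples)

def measure_snr_db(signal, noise):
    """Power ratio of `signal` to `noise`, expressed in decibels."""
    return 10 * math.log10(power(signal) / power(noise))

def mix_at_snr(foreground, background, target_snr_db):
    """Scale `foreground` to reach `target_snr_db` relative to `background`,
    then add the two. Returns (mixture, scaled_foreground)."""
    if len(foreground) != len(background):
        raise ValueError("foreground and background must have equal length")
    p_fg, p_bg = power(foreground), power(background)
    if p_fg == 0 or p_bg == 0:
        raise ValueError("both signals must have non-zero power")
    # Required amplitude gain: gain^2 * p_fg / p_bg = 10^(SNR/10)
    gain = math.sqrt(p_bg / p_fg * 10 ** (target_snr_db / 10))
    scaled = [gain * s for s in foreground]
    mixture = [f + b for f, b in zip(scaled, background)]
    return mixture, scaled
```

Returning the scaled foreground alongside the mixture keeps the separation target available for training or evaluation.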

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

INRIA a CCSD electronic archive server

Reverberant Audio Source Separation via Sparse and Low-Rank Modeling

Author: Arberet Simon
Vandergheynst Pierre
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/12/2013

The performance of audio source separation from underdetermined convolutive mixtures assuming known mixing filters can be significantly improved by using an analysis sparse prior optimized by a reweighting ℓ1 scheme and a wideband data-fidelity term, as demonstrated by a recent article. In this letter, we show that the performance can be improved even more significantly by exploiting a low-rank prior on the source spectrograms. We present a new algorithm to estimate the sources based on i) an analysis sparse prior, ii) a reweighting scheme so as to increase the sparsity, iii) a wideband data-fidelity term in a constrained form, and iv) a low-rank constraint on the source spectrograms. Evaluation on reverberant music mixtures shows that the resulting algorithm improves state-of-the-art methods by more than 2 dB of signal-to-distortion ratio.
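To make the two priors more concrete, here is a minimal sketch of generic building blocks often used for them: singular value soft-thresholding as a way to push a spectrogram-like matrix toward low rank, and the common inverse-magnitude rule for reweighted ℓ1 weights. Both are standard textbook operators assumed here for illustration; the abstract does not specify the solver, so this should not be read as the authors' algorithm. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink each entry of x toward zero by tau (proximal operator of the l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(S, tau):
    """Soft-threshold the singular values of S, which favours a low-rank result."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * soft_threshold(s, tau)) @ Vt

def reweighted_l1_weights(x, eps=1e-3):
    """Assumed common reweighting rule: small coefficients get large weights,
    so the next weighted-l1 pass penalises them more and sparsity increases."""
    return 1.0 / (np.abs(x) + eps)
```

In an iterative scheme, operators like these would typically alternate with a data-fidelity step; that step depends on the mixing filters and is omitted here.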

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Deep Clustering and Conventional Networks for Music Separation: Stronger Together

Author: Chen Zhuo
Hershey John R.
Luo Yi
Mesgarani Nima
Roux Jonathan Le
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/06/2017

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because its more flexible objective engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore combining them in a single hybrid network trained via an approach akin to multi-task learning. Remarkably, the combination significantly outperforms either of its components. Comment: Published in ICASSP 201
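The clustering step described in the abstract can be sketched as follows: given one embedding vector per time-frequency bin, group the bins and turn each group into a binary mask. The example assumes the embeddings are already available as a `(T, F, D)` array (the network that produces them is not shown) and uses a small k-means with a deterministic farthest-point initialisation, which is a simplification chosen for this toy example. The names `kmeans` and `masks_from_embeddings` are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Tiny k-means on an (N, D) array; returns one cluster label per point."""
    # Farthest-point initialisation: deterministic and adequate for a toy case.
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def masks_from_embeddings(embeddings, n_sources):
    """Cluster per-bin embeddings of shape (T, F, D) into binary masks
    of shape (n_sources, T, F)."""
    T, F, D = embeddings.shape
    labels = kmeans(embeddings.reshape(T * F, D), n_sources).reshape(T, F)
    return np.stack([(labels == j).astype(float) for j in range(n_sources)])
```

Each mask would then be multiplied element-wise with the mixture spectrogram to obtain one source estimate per cluster.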

arXiv.org e-Print Archive

Crossref

Improved Multiple Birdsong Tracking with Distribution Derivative Method and Markov Renewal Process Clustering

Author: Bonada J
Musevic S
Plumbley MD
Stowell D
Publication date: 01/01/2013

DS & MP are supported by an EPSRC Leadership Fellowship EP/G007144/1

arXiv.org e-Print Archive

Crossref

University of Surrey

UPF Digital Repository

Queen Mary Research Online

Surrey Research Insight

TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

Author: Anil Cem
Bao Xuchan
Grosse Roger B.
Huang Sicong
Li Qiyang
Oore Sageev
Publication date: 01/05/2019

In this work, we address the problem of musical timbre transfer, where the goal is to manipulate the timbre of a sound sample from one instrument to match another instrument while preserving other musical content, such as pitch, rhythm, and loudness. In principle, one could apply image-based style transfer techniques to a time-frequency representation of an audio signal, but this depends on having a representation that allows independent manipulation of timbre as well as high-quality waveform generation. We introduce TimbreTron, a method for musical timbre transfer which applies "image" domain style transfer to a time-frequency representation of the audio signal, and then produces a high-quality waveform using a conditional WaveNet synthesizer. We show that the Constant Q Transform (CQT) representation is particularly well-suited to convolutional architectures due to its approximate pitch equivariance. Based on human perceptual evaluations, we confirmed that TimbreTron recognizably transferred the timbre while otherwise preserving the musical content, for both monophonic and polyphonic samples. Comment: 17 pages, published as a conference paper at ICLR 201
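The pitch equivariance mentioned in the abstract can be illustrated with simple arithmetic: on a geometrically spaced (constant-Q style) frequency axis, transposing a note moves all of its harmonics by the same number of bins, whereas on a linear (STFT style) axis each harmonic moves by a different amount. The sketch below only computes bin positions; it does not compute an actual transform, and the default `fmin_hz`, `bins_per_octave`, `sample_rate` and `n_fft` values are assumptions for the example.

```python
import math

def cqt_bin(freq_hz, fmin_hz=32.70, bins_per_octave=12):
    """Fractional bin index on a geometrically spaced (log-frequency) axis."""
    return bins_per_octave * math.log2(freq_hz / fmin_hz)

def stft_bin(freq_hz, sample_rate=16000, n_fft=1024):
    """Fractional bin index on a linearly spaced frequency axis."""
    return freq_hz * n_fft / sample_rate

def harmonic_shifts(f0_hz, semitones, bin_fn, n_harmonics=5):
    """Bin displacement of each harmonic when the note is transposed."""
    f0_shifted = f0_hz * 2 ** (semitones / 12)
    return [bin_fn(k * f0_shifted) - bin_fn(k * f0_hz)
            for k in range(1, n_harmonics + 1)]

if __name__ == "__main__":
    # Log-frequency axis: every harmonic moves by the same number of bins.
    print(harmonic_shifts(220.0, 3, cqt_bin))
    # Linear axis: the displacement grows with the harmonic number.
    print(harmonic_shifts(220.0, 3, stft_bin))
```

Because the whole harmonic pattern translates rigidly on the log-frequency axis, a convolutional filter that responds to it at one pitch responds to it at another, which is the property the abstract refers to.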

arXiv.org e-Print Archive