An overview of informed audio source separation
Audio source separation consists of recovering different unknown signals, called sources, by filtering their observed mixtures. In music processing, most mixtures are stereophonic songs and the sources are the individual signals played by the instruments, e.g. bass, vocals, or guitar. Source separation is often achieved through classical generalized Wiener filtering, which is controlled by parameters such as the power spectrograms and the spatial locations of the sources. For efficient filtering, these parameters need to be available, and their estimation is the main challenge faced by separation algorithms. In the blind scenario, only the mixtures are available and performance depends strongly on the mixtures considered. In recent years, much research has focused on informed separation, which uses additional available information about the sources to improve separation quality. In this paper, we review some recent trends in this direction.
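For concreteness, with mixture STFT X and source power spectrograms P_j, the single-channel form of the Wiener estimate is the element-wise ratio below (a sketch that omits the spatial parameters the abstract mentions):

```latex
% Single-channel generalized Wiener estimate of source j:
\hat{S}_j(f,t) = \frac{P_j(f,t)}{\sum_k P_k(f,t)} \, X(f,t)
```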
Improving instrument recognition in polyphonic music through system integration
A method is proposed for instrument recognition in polyphonic music which combines two independent detector systems: a polyphonic musical instrument recognition system using a missing-feature approach, and an automatic music transcription system based on shift-invariant probabilistic latent component analysis that includes instrument assignment. We propose a method to integrate the two systems by fusing the instrument contributions estimated by the first system into the transcription system in the form of Dirichlet priors. Both systems, as well as the integrated system, are evaluated using a dataset of continuous polyphonic music recordings. Detailed results highlighting a clear improvement in the performance of the integrated system are reported for different training conditions.
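In EM-style models such as PLCA, a Dirichlet prior typically enters the M-step as prior-weighted pseudo-counts. A minimal sketch of that fusion step, assuming per-frame instrument contributions; the function name and the weight kappa are illustrative, not from the paper:

```python
import numpy as np

def map_update(expected_counts, prior, kappa=1.0):
    """MAP M-step with a Dirichlet prior: the prior enters as
    pseudo-counts added to the expected counts before renormalizing.

    expected_counts: per-instrument contributions from the transcription model
    prior:           instrument distribution estimated by the recognition system
    kappa:           prior strength (illustrative hyperparameter)
    """
    counts = expected_counts + kappa * prior
    return counts / (counts.sum(axis=0, keepdims=True) + 1e-12)
```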
On the Use of Masking Filters in Sound Source Separation
Many sound source separation algorithms, such as NMF and related approaches, disregard phase information and operate only on magnitude or power spectrograms. In this context, generalised Wiener filters have been widely used to generate masks which are applied to the original complex-valued spectrogram before inversion to the time domain, as these masks have been shown to give good results. However, these masks may not be optimal from a perceptual point of view. To address this, we propose new families of masks and compare their performance to generalised Wiener filter masks using three different factorisation-based separation algorithms. Further, to date there has been no analysis of how the performance of masking varies with the number of iterations performed when estimating the separated sources. We perform such an analysis and show that, when using these masks, running to convergence may not be required in order to obtain good separation performance.
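The generalised Wiener baseline can be written as a one-parameter family of masks. A minimal NumPy sketch, assuming nonnegative source estimates V (e.g. from NMF) of shape (n_src, F, T); the exponent p and the eps regulariser are illustrative:

```python
import numpy as np

def generalised_wiener_mask(V, p=2.0, eps=1e-12):
    """Mask family parameterised by exponent p: V_j**p / sum_k V_k**p.

    p = 2 gives the classical Wiener (power) mask, p = 1 a magnitude ratio.
    """
    Vp = V ** p
    return Vp / (Vp.sum(axis=0, keepdims=True) + eps)

# Each mask multiplies the complex mixture STFT before inversion to time domain:
# source_stfts = generalised_wiener_mask(V, p=2.0) * mix_stft[None, :, :]
```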
An RNN-based Music Language Model for Improving Automatic Music Transcription
In this paper, we investigate the use of Music Language Models (MLMs) for improving Automatic Music Transcription (AMT) performance. The MLMs are trained on sequences of symbolic polyphonic music from the Nottingham dataset. We train Recurrent Neural Network (RNN)-based models, as they are capable of capturing the complex temporal structure present in symbolic music data. Similar to the function of language models in automatic speech recognition, we use the MLMs to generate a prior probability for the occurrence of a sequence. The acoustic AMT model is based on probabilistic latent component analysis, and prior information from the MLM is incorporated into the transcription framework using Dirichlet priors. We test our hybrid models on a dataset of multiple-instrument polyphonic music and report a significant 3% improvement in terms of F-measure when compared to using an acoustic-only model.
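Schematically, the fusion can be read as a frame-by-frame loop in which the RNN predicts pitch activity from the decoded history and that prediction enters the acoustic activations as Dirichlet pseudo-counts. A rough sketch under those assumptions; all names and the weight kappa are illustrative, not the paper's implementation:

```python
import numpy as np

def transcribe_with_mlm(acoustic_acts, rnn_predict, kappa=0.5):
    """Fuse acoustic and language-model information frame by frame.

    acoustic_acts: (T, n_pitches) activations from the PLCA acoustic model
    rnn_predict:   callable mapping the decoded history to next-frame probabilities
    kappa:         prior strength (illustrative hyperparameter)
    """
    decoded = []
    for acoustic in acoustic_acts:
        prior = rnn_predict(decoded)      # MLM prior from already-decoded frames
        post = acoustic + kappa * prior   # Dirichlet-style pseudo-counts
        decoded.append(post / (post.sum() + 1e-12))
    return np.array(decoded)
```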
User Assisted Separation of Repeating Patterns in Time and Frequency using Magnitude Projections
In this paper, we propose a simple user-assisted method for the recovery of repeating patterns in time and frequency which can occur in audio mixtures. Here, the user selects a region in a log-frequency spectrogram from which they seek to recover the underlying pattern, such as a repeating chord masked by a cough. Cross-correlation is then performed between the selected region and the spectrogram, revealing similar regions. The most similar region is then selected, and a variant on the PROJET algorithm, termed PROJET-MAG, is used to extract the time-frequency components common to the two regions, as well as the components which are not common. The results obtained are compared to another user-assisted method based on REPET, and PROJET-MAG is demonstrated to give improved results over this baseline.
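The matching step is plain 2-D cross-correlation between the selected patch and the spectrogram. A minimal sketch, assuming a magnitude spectrogram S and a user-selected patch; the zero-mean normalisation is illustrative:

```python
import numpy as np
from scipy.signal import correlate2d

def most_similar_region(S, patch):
    """Find the region of log-frequency spectrogram S most similar to
    the user-selected patch, via 2-D cross-correlation."""
    patch = (patch - patch.mean()) / (patch.std() + 1e-12)  # zero-mean template
    score = correlate2d(S, patch, mode='valid')
    f0, t0 = np.unravel_index(np.argmax(score), score.shape)
    return f0, t0  # top-left (frequency, time) corner of the best match
```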
Generalized Wiener filtering with fractional power spectrograms
In recent years, many studies have focused on the single-sensor separation of independent waveforms using so-called soft-masking strategies, where the short-term Fourier transform of the mixture is multiplied element-wise by a ratio of spectrogram models. When the signals are wide-sense stationary, this strategy is theoretically justified as optimal Wiener filtering: the power spectrograms of the sources are supposed to add up to yield the power spectrogram of the mixture. However, experience shows that using fractional spectrograms instead, such as the amplitude, yields good performance in practice, because they fit the additivity assumption better experimentally. To the best of our knowledge, no probabilistic interpretation of this filtering procedure has been available to date. In this paper, we show that assuming the additivity of fractional spectrograms for the purpose of building soft-masks can be understood as separating locally stationary alpha-stable harmonizable processes, alpha-harmonizable for short, thus justifying the procedure theoretically.
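The soft mask in question raises nonnegative source models v_j to a fractional exponent alpha before taking the ratio; a sketch of the rule, where alpha = 2 recovers the classical Wiener mask and alpha = 1 the amplitude ratio the paper highlights:

```latex
% Fractional (alpha-)soft mask applied to the mixture STFT X:
\hat{S}_j(f,t) = \frac{v_j(f,t)^{\alpha}}{\sum_k v_k(f,t)^{\alpha}} \, X(f,t)
```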