Search CORE

146 research outputs found

Multichannel high resolution NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain

Author: Badeau Roland
Plumbley Mark
Publication venue
Publication date: 01/01/2013
Field of study

Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies

Queen Mary Research Online

Surrey Research Insight

Sound Source Separation

Author: Evangelista G
Marchand S
Plumbley MD
Vincent E
Publication venue: 'Wiley'
Publication date: 01/01/2011
Field of study

This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.26file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.2

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Queen Mary Research Online

Surrey Research Insight

HAL-Rennes 1

Improved Convolutive and Under-Determined Blind Audio Source Separation with MRF Smoothing

Author: Rafał Zdunek
Publication venue: Springer Nature
Publication date: 01/01/2012
Field of study

Convolutive and under-determined blind audio source separation from noisy recordings is a challenging problem. Several computational strategies have been proposed to address this problem. This study is concerned with several modifications to the expectation-minimization-based algorithm, which iteratively estimates the mixing and source parameters. This strategy assumes that any entry in each source spectrogram is modeled using superimposed Gaussian components, which are mutually and individually independent across frequency and time bins. In our approach, we resolve this issue by considering a locally smooth temporal and frequency structure in the power source spectrograms. Local smoothness is enforced by incorporating a Gibbs prior in the complete data likelihood function, which models the interactions between neighboring spectrogram bins using a Markov random field. Simulations using audio files derived from stereo audio source separation evaluation campaign 2008 demonstrate high efficiency with the proposed improvement

Springer - Publisher Connector

PubMed Central

Trennung und Schätzung der Anzahl von Audiosignalquellen mit Zeit- und Frequenzüberlappung

Author: Stöter Fabian-Robert
Publication venue
Publication date: 01/01/2020
Field of study

Everyday audio recordings involve mixture signals: music contains a mixture of instruments; in a meeting or conference, there is a mixture of human voices. For these mixtures, automatically separating or estimating the number of sources is a challenging task. A common assumption when processing mixtures in the time-frequency domain is that sources are not fully overlapped. However, in this work we consider some cases where the overlap is severe — for instance, when instruments play the same note (unison) or when many people speak concurrently ("cocktail party") — highlighting the need for new representations and more powerful models. To address the problems of source separation and count estimation, we use conventional signal processing techniques as well as deep neural networks (DNN). We ﬁrst address the source separation problem for unison instrument mixtures, studying the distinct spectro-temporal modulations caused by vibrato. To exploit these modulations, we developed a method based on time warping, informed by an estimate of the fundamental frequency. For cases where such estimates are not available, we present an unsupervised model, inspired by the way humans group time-varying sources (common fate). This contribution comes with a novel representation that improves separation for overlapped and modulated sources on unison mixtures but also improves vocal and accompaniment separation when used as an input for a DNN model. Then, we focus on estimating the number of sources in a mixture, which is important for real-world scenarios. Our work on count estimation was motivated by a study on how humans can address this task, which lead us to conduct listening experiments, conﬁrming that humans are only able to estimate the number of up to four sources correctly. To answer the question of whether machines can perform similarly, we present a DNN architecture, trained to estimate the number of concurrent speakers. Our results show improvements compared to other methods, and the model even outperformed humans on the same task. In both the source separation and source count estimation tasks, the key contribution of this thesis is the concept of “modulation”, which is important to computationally mimic human performance. Our proposed Common Fate Transform is an adequate representation to disentangle overlapping signals for separation, and an inspection of our DNN count estimation model revealed that it proceeds to ﬁnd modulation-like intermediate features.Im Alltag sind wir von gemischten Signalen umgeben: Musik besteht aus einer Mischung von Instrumenten; in einem Meeting oder auf einer Konferenz sind wir einer Mischung menschlicher Stimmen ausgesetzt. Für diese Mischungen ist die automatische Quellentrennung oder die Bestimmung der Anzahl an Quellen eine anspruchsvolle Aufgabe. Eine häuﬁge Annahme bei der Verarbeitung von gemischten Signalen im Zeit-Frequenzbereich ist, dass die Quellen sich nicht vollständig überlappen. In dieser Arbeit betrachten wir jedoch einige Fälle, in denen die Überlappung immens ist zum Beispiel, wenn Instrumente den gleichen Ton spielen (unisono) oder wenn viele Menschen gleichzeitig sprechen (Cocktailparty) —, so dass neue Signal-Repräsentationen und leistungsfähigere Modelle notwendig sind. Um die zwei genannten Probleme zu bewältigen, verwenden wir sowohl konventionelle Signalverbeitungsmethoden als auch tiefgehende neuronale Netze (DNN). Wir gehen zunächst auf das Problem der Quellentrennung für Unisono-Instrumentenmischungen ein und untersuchen die speziellen, durch Vibrato ausgelösten, zeitlich-spektralen Modulationen. Um diese Modulationen auszunutzen entwickelten wir eine Methode, die auf Zeitverzerrung basiert und eine Schätzung der Grundfrequenz als zusätzliche Information nutzt. Für Fälle, in denen diese Schätzungen nicht verfügbar sind, stellen wir ein unüberwachtes Modell vor, das inspiriert ist von der Art und Weise, wie Menschen zeitveränderliche Quellen gruppieren (Common Fate). Dieser Beitrag enthält eine neuartige Repräsentation, die die Separierbarkeit für überlappte und modulierte Quellen in Unisono-Mischungen erhöht, aber auch die Trennung in Gesang und Begleitung verbessert, wenn sie in einem DNN-Modell verwendet wird. Im Weiteren beschäftigen wir uns mit der Schätzung der Anzahl von Quellen in einer Mischung, was für reale Szenarien wichtig ist. Unsere Arbeit an der Schätzung der Anzahl war motiviert durch eine Studie, die zeigt, wie wir Menschen diese Aufgabe angehen. Dies hat uns dazu veranlasst, eigene Hörexperimente durchzuführen, die bestätigten, dass Menschen nur in der Lage sind, die Anzahl von bis zu vier Quellen korrekt abzuschätzen. Um nun die Frage zu beantworten, ob Maschinen dies ähnlich gut können, stellen wir eine DNN-Architektur vor, die erlernt hat, die Anzahl der gleichzeitig sprechenden Sprecher zu ermitteln. Die Ergebnisse zeigen Verbesserungen im Vergleich zu anderen Methoden, aber vor allem auch im Vergleich zu menschlichen Hörern. Sowohl bei der Quellentrennung als auch bei der Schätzung der Anzahl an Quellen ist ein Kernbeitrag dieser Arbeit das Konzept der “Modulation”, welches wichtig ist, um die Strategien von Menschen mittels Computern nachzuahmen. Unsere vorgeschlagene Common Fate Transformation ist eine adäquate Darstellung, um die Überlappung von Signalen für die Trennung zugänglich zu machen und eine Inspektion unseres DNN-Zählmodells ergab schließlich, dass sich auch hier modulationsähnliche Merkmale ﬁnden lassen

Common Fate Model for Unison source Separation

Author: Badeau Roland
Edler Bernd
Liutkus Antoine
Magron Paul
Stöter Fabian-Robert
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

International audienceIn this paper we present a novel source separation method aiming to overcome the difficulty of modelling non-stationary signals. The method can be applied to mixtures of musical instruments with frequency and/or amplitude modulation, e.g. typically caused by vi-brato. It is based on a signal representation that divides the complex spectrogram into a grid of patches of arbitrary size. These complex patches are then processed by a two-dimensional discrete Fourier transform, forming a tensor representation which reveals spectral and temporal modulation textures. Our representation can be seen as an alternative to modulation transforms computed on magnitude spectrograms. An adapted factorization model allows to decompose different time-varying harmonic sources based on their particular common modulation profile: hence the name Common Fate Model. The method is evaluated on musical instrument mixtures playing the same fundamental frequency (unison), showing improvement over other state-of-the-art methods

INRIA a CCSD electronic archive server

TOWARDS COMPLEX MATRIX DECOMPOSITION OF SPECTROGRAMS BASED ON THE RELATIVE PHASE OFFSETS OF HARMONIC SOUNDS

Author: Badeau R
Dixon S
IEEE
Kirchhoff H
Publication venue
Publication date: 01/01/2014
Field of study

Queen Mary Research Online

Recommended from our members

Time-Frequency Analysis as Probabilistic Inference

Author: Sahani Maneesh
Turner Richard E
Publication venue: IEEE Transactions on Signal Processing
Publication date: 01/12/2014
Field of study

This is the final published version. It was originally published by IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6918491.This paper proposes a new view of time-frequency analysis framed in terms of probabilistic inference. Natural signals are assumed to be formed by the superposition of distinct time-frequency components, with the analytic goal being to infer these components by application of Bayes' rule. The framework serves to unify various existing models for natural time-series; it relates to both the Wiener and Kalman filters, and with suitable assumptions yields inferential interpretations of the short-time Fourier transform, spectrogram, filter bank, and wavelet representations. Value is gained by placing time-frequency analysis on the same probabilistic basis as is often employed in applications such as denoising, source separation, or recognition. Uncertainty in the time-frequency representation can be propagated correctly to application-specific stages, improving the handing of noise and missing data. Probabilistic learning allows modules to be co-adapted; thus, the time-frequency representation can be adapted to both the demands of the application and the time-varying statistics of the signal at hand. Similarly, the application module can be adapted to fine properties of the signal propagated by the initial time-frequency processing. We demonstrate these benefits by combining probabilistic time-frequency representations with non-negative matrix factorization, finding benefits in audio denoising and inpainting tasks, albeit with higher computational cost than incurred by the standard approach.Funding was provided by EPSRC (grant numbers EP/G050821/1 and EP/L000776/1) and Google (R.E.T.) and by the Gatsby Charitable Foundation (M.S.)

Apollo (Cambridge)

ACCOUNTING FOR PHASE CANCELLATIONS IN NON-NEGATIVE MATRIX FACTORIZATION USING WEIGHTED DISTANCES

Author: Ewert S
IEEE
Plumbley MD
Sandler M
Publication venue
Publication date: 01/01/2014
Field of study

(c)2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published in: Proc IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), Florence, Italy, 5-9 May 2014

Queen Mary Research Online

A generative model for natural sounds based on latent force modelling

Author: C Févotte
CE Rasmussen
GJ Mysore
JH McDermott
MA Alvarez
RE Turner
RE Turner
S Särkkä
T Virtanen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/07/2018
Field of study

Generative models based on subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitude modulation to be a crucial component of auditory perception. Probabilistic latent variable analysis can be particularly insightful, but existing approaches don’t incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay or feedback. We use latent force modelling, a probabilistic learning paradigm that encodes physical knowledge into Gaussian process regression, to model correlation across spectral subband envelopes. We augment the standard latent force model approach by explicitly modelling dependencies across multiple time steps. Incorporating this prior knowledge strengthens the interpretation of the latent functions as the source that generated the signal. We examine this interpretation via an experiment showing that sounds generated by sampling from our probabilistic model are perceived to be more realistic than those generated by comparative models based on nonnegative matrix factorisation, even in cases where our model is outperformed from a reconstruction error perspective

arXiv.org e-Print Archive

Crossref

Queen Mary Research Online

A diagonal plus low-rank covariance model for computationally efficient source separation

Author: Liutkus Antoine
Yoshii Kazuyoshi
Publication venue: HAL CCSD
Publication date: 25/09/2017
Field of study

International audienceThis paper presents an accelerated version of positive semidef-inite tensor factorization (PSDTF) for blind source separation. PSDTF works better than nonnegative matrix factoriza-tion (NMF) by dropping the arguable assumption that audio signals can be whitened in the frequency domain by using short-term Fourier transform (STFT). Indeed, this assumption only holds true in an ideal situation where each frame is infinitely long and the target signal is completely stationary in each frame. PSDTF thus deals with full covariance matrices over frequency bins instead of forcing them to be diagonal as in NMF. Although PSDTF significantly outperforms NMF in terms of separation performance, it suffers from a heavy computational cost due to the repeated inversion of big covariance matrices. To solve this problem, we propose an intermediate model based on diagonal plus low-rank covariance matrices and derive the expectation-maximization (EM) algorithm for efficiently updating the parameters of PSDTF. Experimental results showed that our method can dramatically reduce the complexity of PSDTF by several orders of magnitude without a significant decrease in separation performance. Index Terms— Blind source separation, nonnegative matrix factorization, positive semidefinite tensor factorization, low-rank approximation

INRIA a CCSD electronic archive server