
12 research outputs found

Algorithms for nonnegative matrix factorization with the beta-divergence

Author: Févotte Cédric
Idier Jérôme
Publication date: 01/01/2011

This paper describes algorithms for nonnegative matrix factorization (NMF) with the beta-divergence (beta-NMF). The beta-divergence is a family of cost functions parametrized by a single shape parameter beta that takes the Euclidean distance, the Kullback-Leibler divergence and the Itakura-Saito divergence as special cases (beta = 2, 1, 0, respectively). The proposed algorithms are based on a surrogate auxiliary function (a local majorization of the criterion function). We first describe a majorization-minimization (MM) algorithm that leads to multiplicative updates, which differ from standard heuristic multiplicative updates by a beta-dependent power exponent. The monotonicity of the heuristic algorithm can nevertheless be proven for beta in (0,1) using the proposed auxiliary function. We then introduce the concept of a majorization-equalization (ME) algorithm, which produces updates that move along constant level sets of the auxiliary function and lead to larger steps than MM. Simulations on synthetic and real data illustrate the faster convergence of the ME approach. The paper also describes how the proposed algorithms can be adapted to two common variants of NMF: penalized NMF (i.e., when a penalty function of the factors is added to the criterion function) and convex-NMF (when the dictionary is assumed to belong to a known subspace).

Comment: to appear in Neural Computation
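A minimal NumPy sketch of beta-NMF with MM multiplicative updates, assuming strictly positive data V. It uses the beta-dependent exponent gamma(beta) = 1/(2 - beta) for beta < 1, 1 for 1 <= beta <= 2, and 1/(beta - 1) for beta > 2, which is what separates the MM updates from the heuristic ones (where the exponent is always 1). The function names, random initialization and iteration count are illustrative assumptions rather than the authors' code, and the ME variant is not shown.

```python
import numpy as np

def beta_divergence(V, Vhat, beta):
    """Sum of elementwise beta-divergences d_beta(V | Vhat); entries must be > 0."""
    if beta == 0:  # Itakura-Saito
        ratio = V / Vhat
        return float(np.sum(ratio - np.log(ratio) - 1))
    if beta == 1:  # generalized Kullback-Leibler
        return float(np.sum(V * np.log(V / Vhat) - V + Vhat))
    return float(np.sum((V**beta + (beta - 1) * Vhat**beta
                         - beta * V * Vhat**(beta - 1)) / (beta * (beta - 1))))

def gamma(beta):
    """Power exponent applied to the multiplicative factor in the MM updates."""
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)

def mm_nmf(V, K, beta, n_iter=200, seed=0, eps=1e-12):
    """Factor V (F x N) as W @ H with W (F x K), H (K x N), alternating MM updates."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    g = gamma(beta)
    costs = []
    for _ in range(n_iter):
        Vhat = W @ H
        H *= ((W.T @ (Vhat**(beta - 2) * V)) / (W.T @ Vhat**(beta - 1))) ** g
        Vhat = W @ H
        W *= (((Vhat**(beta - 2) * V) @ H.T) / (Vhat**(beta - 1) @ H.T)) ** g
        costs.append(beta_divergence(V, W @ H, beta))
    return W, H, costs
```

Because each half-step minimizes an auxiliary function that majorizes the cost, the values in `costs` should never increase, whatever beta is.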

arXiv.org e-Print Archive

On the use of a spatial cue as prior information for stereo sound source separation based on spatially weighted non-negative tensor factorization

Author: A Cichocki
A Ozerov
A Shashua
C Févotte
D FitzGerald
DD Lee
E Vincent
F Weninger
H Sawada
JM Becker
M Cranitch
M Nakano
M Spiertz
N Bertin
NQ Duong
NQK Duong
O Dikmen
P Smaragdis
R Jaiswal
S Araki
S Arberet
S Doclo
S Ewert
TJ Klasen
TO Virtanen
Y Mitsufuji
Ö Yilmaz
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The DESAM toolbox: spectral analysis of musical audio

Author: Badeau Roland
Bertin Nancy
Daudet Laurent
David Bertrand
Derrien Olivier
Echeveste Jose
Lagrange Mathieu
Marchand Sylvain
Publication venue: HAL CCSD
Publication date: 01/09/2010

This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different "mid-level" representations. After motivating the need for such a toolbox, this paper gives an overview of its overall organization and describes all available functionalities.

HAL-CentraleSupelec

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Real-time detection of overlapping sound events with non-negative matrix factorization

Author: Cont Arshia
Dessein Arnaud
Lemaitre Guillaume
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

In this paper, we investigate the problem of real-time detection of overlapping sound events by employing non-negative matrix factorization techniques. We consider a setup where audio streams arrive in real time and are decomposed onto a dictionary of event templates learned off-line prior to the decomposition. An important drawback of existing approaches in this context is the lack of control over the decomposition. We propose and compare two provably convergent algorithms that address this issue by controlling, respectively, the sparsity of the decomposition and the trade-off of the decomposition between the different frequency components. Sparsity regularization is considered in the framework of convex quadratic programming, while the frequency compromise is introduced by employing the beta-divergence as a cost function. The two algorithms are evaluated on the multi-source detection tasks of polyphonic music transcription, drum transcription and environmental sound recognition. The results show how the proposed approaches can improve detection in such applications while maintaining low computational costs suitable for real-time use.
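The decomposition of an incoming frame onto a fixed, pre-learned dictionary can be sketched as below: only the activations are updated, with beta-divergence multiplicative updates. This is an illustrative stand-in, not the authors' algorithm; the function name, the all-ones initialization, the iteration count and the use of the MM exponent from Févotte and Idier are assumptions, and the sparsity-regularized (quadratic programming) variant is not shown.

```python
import numpy as np

def decompose_frame(v, W, beta=0.5, n_iter=100, eps=1e-12):
    """Estimate activations h >= 0 so that W @ h approximates one spectral frame v
    under the beta-divergence, keeping the template dictionary W fixed."""
    # MM exponent: keeps the beta-divergence non-increasing at every iteration.
    if beta < 1:
        g = 1.0 / (2.0 - beta)
    elif beta <= 2:
        g = 1.0
    else:
        g = 1.0 / (beta - 1.0)
    h = np.full(W.shape[1], 1.0)  # hypothetical flat initialization
    for _ in range(n_iter):
        vhat = W @ h + eps
        h *= ((W.T @ (vhat**(beta - 2) * v)) / (W.T @ vhat**(beta - 1))) ** g
    return h
```

Lowering beta toward 0 makes the fit more scale-invariant, so low-energy (typically high-frequency) bins weigh more in the decomposition. This is the kind of frequency trade-off the abstract refers to.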

INRIA a CCSD electronic archive server

A Perceptual Evaluation of Short-Time Fourier Transform Window Duration and Divergence Cost Function on Audio Source Separation using Non-negative Matrix Factorization

Author: Miller Ryan J.
Publication venue: Belmont Digital Repository
Publication date: 12/05/2020

Non-negative matrix factorization (NMF) is an established method for audio source separation. Previous studies used NMF with supplementary systems to improve performance, but little has been done to investigate the perceptual effects of NMF parameters. The present study aimed to evaluate two NMF parameters for speech enhancement: the short-time Fourier transform (STFT) window duration and the divergence cost function. Two experiments were conducted. The first investigated the effect of STFT window duration on target speech intelligibility in a sentence keyword identification task. In the second, participants rated residual noise levels in target speech separated with three different cost functions: the Euclidean distance (EU), the Kullback-Leibler (KL) divergence, and the Itakura-Saito (IS) divergence. A 92.9 ms window duration produced the highest intelligibility scores, while the IS divergence produced significantly lower residual noise levels than the EU and KL divergences. Additionally, significant positive correlations were found between subjective residual noise scores and objective metrics from the Blind Source Separation (BSS_Eval) and Perceptual Evaluation methods for Audio Source Separation (PEASS) toolboxes. The results suggest that longer window durations, with their increased frequency resolution, allow more accurate distinction between sources, improving intelligibility scores. Additionally, the IS divergence approximates high-frequency and transient components of audio more accurately, increasing the separation of speech and noise. The correlation results suggest that using full-bandwidth stimuli could increase the reliability of objective measures.
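The window-duration/frequency-resolution trade-off mentioned above is simple arithmetic: a window of n samples at sample rate sr lasts n/sr seconds and yields STFT bins spaced sr/n Hz apart. The abstract does not state the sample rate. Assuming 44.1 kHz (our assumption), the reported 92.9 ms corresponds to a 4096-sample window with a bin spacing of about 10.8 Hz.

```python
def stft_resolution(n_fft, sample_rate):
    """Return (window duration in ms, bin spacing in Hz) for an n_fft-sample window."""
    duration_ms = 1000.0 * n_fft / sample_rate
    bin_spacing_hz = sample_rate / n_fft
    return duration_ms, bin_spacing_hz

# Doubling the window halves the bin spacing (finer frequency resolution)
# at the cost of coarser time resolution.
for n_fft in (1024, 2048, 4096):
    duration_ms, spacing_hz = stft_resolution(n_fft, 44100)
    print(f"{n_fft:5d} samples: {duration_ms:6.1f} ms, {spacing_hz:6.2f} Hz per bin")
```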

Belmont Digital Repository (Belmont University)


Bayesian methods in music modelling

Author: Peeling Paul
Publication venue: University of Cambridge
Publication date: 15/03/2011

This thesis presents several hierarchical generative Bayesian models of musical signals designed to improve the accuracy of existing multiple pitch detection systems and other musical signal processing applications whilst remaining feasible for real-time computation. At the lowest level, the signal is modelled as a set of overlapping sinusoidal basis functions. The parameters of these basis functions are built into a prior framework based on principles known from music theory and the physics of musical instruments. The model of a musical note optionally includes phenomena such as frequency and amplitude modulations, damping, volume, timbre and inharmonicity. The occurrence of note onsets in a performance of a piece of music is controlled by an underlying tempo process and by the alignment of the timings to the underlying score. A variety of applications are presented for these models under differing inference constraints. Where full Bayesian inference is possible, reversible-jump Markov chain Monte Carlo is employed to estimate the number of notes and partial frequency components in each frame of music. We also use approximate techniques such as model selection criteria and variational Bayes methods for inference in situations where computation time is limited or the amount of data to be processed is large. For the higher-level score parameters, greedy search and conditional modes algorithms are found to be sufficiently accurate. We emphasize the links between the models and inference algorithms developed in this thesis and those in existing and parallel work, and demonstrate the effects of modifying these models both theoretically and through experimental results.
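To make the lowest-level signal model concrete, here is a forward (synthesis-only) sketch of one note as a sum of damped sinusoidal partials with stiff-string inharmonicity, f_n = n f0 sqrt(1 + B n^2). It is illustrative only: the 1/n amplitude roll-off, the single shared decay time `tau` and all parameter values are hypothetical. The thesis places priors on such parameters and infers them with Bayesian methods, which this sketch does not attempt.

```python
import numpy as np

def synth_note(f0, n_partials, duration, sr, B=0.0, tau=0.5):
    """Sum of exponentially damped sinusoidal partials with inharmonicity B."""
    t = np.arange(int(duration * sr)) / sr
    n = np.arange(1, n_partials + 1)
    freqs = n * f0 * np.sqrt(1.0 + B * n**2)  # inharmonic partial frequencies
    amps = 1.0 / n                            # hypothetical amplitude roll-off
    partials = (amps[:, None]
                * np.exp(-t[None, :] / tau)   # damping shared by all partials
                * np.sin(2 * np.pi * freqs[:, None] * t[None, :]))
    return partials.sum(axis=0), freqs

x, freqs = synth_note(f0=220.0, n_partials=8, duration=1.0, sr=16000, B=1e-4)
```

With B > 0 the upper partials are stretched above exact integer multiples of f0, which is one of the phenomena the note model can optionally include.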

Apollo (Cambridge)

Deep neural network based multichannel audio source separation

Author: Liutkus Antoine
Nugraha Aditya Arie
Vincent Emmanuel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2018

This chapter presents a multichannel audio source separation framework where deep neural networks (DNNs) are used to model the source spectra and combined with the classical multichannel Gaussian model to exploit the spatial information. The parameters are estimated in an iterative expectation-maximization (EM) fashion and used to derive a multichannel Wiener filter. Different design choices and their impact on performance are discussed. They include the cost functions for DNN training, the number of parameter updates, the use of multiple DNNs, and the use of weighted parameter updates. Finally, we present the framework's application to a speech enhancement task and a music separation task. The experimental results show the benefit of the multichannel DNN-based approach over a single-channel DNN-based approach and over the multichannel nonnegative matrix factorization-based iterative EM framework.
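In the multichannel Gaussian model, each source image has covariance v_j(f,n) R_j(f) (a spectral power times a spatial covariance matrix), and the multichannel Wiener filter estimates source j as v_j R_j (sum_k v_k R_k)^{-1} x in every time-frequency bin. The sketch below shows only this filtering step, assuming the power spectra `v` (e.g. DNN outputs) and spatial covariances `R` are already given. The EM updates described in the chapter are omitted, and the array layout is our own choice.

```python
import numpy as np

def multichannel_wiener(x, v, R):
    """Multichannel Wiener filtering of a mixture STFT.

    x: (F, N, I) complex mixture STFT with I channels
    v: (J, F, N) nonnegative source power spectra
    R: (J, F, I, I) Hermitian positive-definite spatial covariances
    Returns (J, F, N, I) estimated source images.
    """
    # Source covariances Sigma_j(f, n) = v_j(f, n) R_j(f).
    Sigma = v[..., None, None] * R[:, :, None, :, :]  # (J, F, N, I, I)
    Sigma_x = Sigma.sum(axis=0)                        # mixture covariance (F, N, I, I)
    # Solve Sigma_x y = x in every bin instead of forming explicit inverses.
    y = np.linalg.solve(Sigma_x, x[..., None])         # (F, N, I, 1)
    return (Sigma @ y[None])[..., 0]                   # (J, F, N, I)
```

Because the source covariances sum to the mixture covariance, the estimated source images add back up to the mixture exactly (up to rounding).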

INRIA a CCSD electronic archive server

Décompositions en éléments sonores et applications musicales

Author: Badeau Roland
Bertin Nancy
Daudet Laurent
David Bertrand
Derrien Olivier
Lagrange Mathieu
Marchand Sylvain
Publication venue: 'Lavoisier'
Publication date: 01/01/2011

This paper presents the DESAM project, which was divided into two parts. The first was devoted to the theoretical and experimental study of parametric and non-parametric techniques for decomposing audio signals into sound elements. The second focused on some musical applications of these decompositions. Most aspects considered in this project have led to new methods, which have been grouped together into the so-called DESAM Toolbox, a set of Matlab® functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. It is rather aimed at providing a range of state-of-the-art signal processing tools […]

Submitted to Traitement du signal. This article summarizes the results of the ANR project DESAM (Décompositions en Éléments Sonores et Applications Musicales: decompositions into sound elements and musical applications). The project had two parts: the first dealt with theoretical advances in techniques for decomposing digital audio signals, and the second with musical applications of these decompositions. Most aspects addressed in the project gave rise to new methods and algorithms, which are gathered in a toolbox, the DESAM Toolbox. It brings together a set of Matlab® functions dedicated to the estimation of spectral models widely used for music signals. The methods studied in this project can of course be useful for music information retrieval, but above all they form a collection of recent tools for decomposing signals according to different models, yielding a variety of mid-level representations that may be useful in other application domains.

HAL-CentraleSupelec

I-Revues

HAL AMU

INRIA a CCSD electronic archive server

HAL-Rennes 1

Exploiting Piano Acoustics in Automatic Transcription

Author: CHENG T
Publication date: 06/01/2017

This work was supported by a joint Queen Mary/China Scholarship Council Scholarship. In this thesis we exploit piano acoustics to automatically transcribe piano recordings into a symbolic representation: the pitch and timing of each detected note. To do so we use approaches based on non-negative matrix factorisation (NMF). To motivate the main contributions of this thesis, we provide two preparatory studies: a study of using a deterministic annealing EM algorithm in a matrix factorisation-based system, and a study of decay patterns of partials in real-world piano tones. Based on these studies, we propose two generative NMF-based models which explicitly model different piano acoustical features. The first is an attack/decay model that takes into account the time-varying timbre and decaying energy of piano sounds. The system divides a piano note into a percussive attack stage and a harmonic decay stage, and models the two parts separately using two sets of templates and amplitude envelopes. The two parts are coupled by the note activations. We simplify the decay envelope to an exponentially decaying function. The proposed method improves the performance of supervised piano transcription. The second model uses the spectral width of partials as an independent indicator of the duration of piano notes. Each partial is represented by a Gaussian function whose standard deviation gives the spectral width. The spectral width is large in the attack part, but gradually decreases to a stable value and remains constant in the decay part. The model offers a new perspective on the time-varying timbre of piano notes, but further investigation is needed to use it effectively to improve piano transcription.
We demonstrate the utility of the proposed systems in piano music transcription and analysis. Results show that explicitly modelling piano acoustical features, especially temporal features, can improve transcription performance.
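The coupling described for the attack/decay model can be sketched as follows. The same note activations drive a percussive part, shaped by a short attack envelope, and a harmonic part, shaped by an exponential decay exp(-alpha t). Both are convolved with the activations and mapped through their own spectral templates. This is a simplified illustration under our own assumptions (a fixed, hypothetical attack envelope, one decay rate per note, frame-level time), not the thesis's exact model or its parameter estimation.

```python
import numpy as np

def attack_decay_model(W_attack, W_decay, onsets, alpha, attack_env):
    """Reconstruct a magnitude spectrogram from note onsets.

    W_attack, W_decay: (F, K) percussive and harmonic templates
    onsets: (K, T) note activations (onset strengths per frame)
    alpha: (K,) per-note decay rates (per frame)
    attack_env: (L,) short percussive envelope, hypothetical shape
    """
    K, T = onsets.shape
    t = np.arange(T)
    # Causal convolutions: each onset triggers an attack burst and an exponential decay.
    H_attack = np.stack([np.convolve(onsets[k], attack_env)[:T] for k in range(K)])
    H_decay = np.stack([np.convolve(onsets[k], np.exp(-alpha[k] * t))[:T] for k in range(K)])
    return W_attack @ H_attack + W_decay @ H_decay
```

Sharing `onsets` between the two parts is what couples them. Once the attack envelope has ended, each note's energy falls by a constant factor exp(-alpha) per frame.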

Queen Mary Research Online

Non-Negative Matrix Factorization Based Algorithms to Cluster Frequency Basis Functions for Monaural Sound Source Separation

Author: Jaiswal Amit
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2013

Monophonic sound source separation (SSS) refers to the process of separating the audio signals produced by the individual sound sources in a given acoustic mixture when the mixture is recorded with a single microphone or directly onto a single reproduction channel. Many audio applications, such as pitch modification and automatic music transcription, would benefit from access to the separated sound sources for further processing. Recently, non-negative matrix factorization (NMF) has found application in monaural audio source separation due to its ability to factorize audio spectrograms into additive, part-based basis functions, where the parts typically correspond to individual notes or chords in music. An advantage of NMF is that there can be a single basis function for each note played by a given instrument, thereby capturing changes in timbre with pitch for each instrument or source. However, these basis functions need to be clustered to their respective sources to reconstruct the individual source signals. Many clustering methods have been proposed to map the separated signals to sources, with considerable success. Recently, to avoid the need for clustering, Shifted NMF (SNMF) was proposed, which assumes that the timbre of a note is constant for all the pitches produced by an instrument. SNMF has two drawbacks. Firstly, the assumption that the timbre of the notes played by an instrument remains constant is not true in general. Secondly, the SNMF method uses the constant-Q transform (CQT), and the lack of a true inverse of the CQT compromises the separation quality of the reconstructed signal. The principal aim of this thesis is to address the problem of clustering NMF basis functions. Our first major contribution is the use of SNMF as a method of clustering the basis functions obtained via standard NMF.
The proposed SNMF clustering method aims to cluster the frequency basis functions obtained via standard NMF to their respective sources by exploiting shift invariance in the log-frequency domain. Further, a minor contribution is an improvement in the separation performance of the standard SNMF algorithm (here used directly to separate sources), obtained through an improved inverse CQT. Here, the standard SNMF algorithm finds shift invariance in a CQ spectrogram, containing the frequency basis functions, that is obtained directly from the spectrogram of the audio mixture. Our next contribution is an improvement to the SNMF clustering algorithm through the incorporation of the CQT matrix inside the SNMF model, which avoids the need for an inverse CQT to reconstruct the clustered NMF basis functions. Another major contribution is the incorporation of a group sparsity (GS) constraint into the SNMF clustering algorithm at two stages to improve clustering. The effect of GS is evaluated on the various SNMF clustering algorithms proposed in this thesis. Finally, we introduce a new family of masks to reconstruct the original signal from the clustered basis functions and compare their performance with generalized Wiener filter masks using three different factorisation-based separation algorithms. We show that better separation performance can be achieved using the proposed family of masks.
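The shift-invariance idea behind the clustering can be illustrated in isolation. In a log-frequency representation with b bins per octave (such as a CQT), transposing a note by one semitone translates its spectrum by b/12 bins. Two basis functions from the same instrument should therefore match closely at some shift. The sketch scores this with the best normalized correlation over a range of shifts. It is a simplified stand-in for illustration, not the SNMF algorithm of the thesis; the function names and the 24-bins-per-octave test template are hypothetical.

```python
import numpy as np

def shift(w, s):
    """Translate a log-frequency template up by s bins (down if s < 0), zero-filling."""
    out = np.zeros_like(w)
    if s >= 0:
        out[s:] = w[:len(w) - s]
    else:
        out[:s] = w[-s:]
    return out

def best_shift_similarity(w_ref, w, max_shift):
    """Return (highest cosine similarity, shift in bins) between w and shifted w_ref."""
    best_sim, best_s = -1.0, 0
    for s in range(-max_shift, max_shift + 1):
        ws = shift(w_ref, s)
        denom = np.linalg.norm(ws) * np.linalg.norm(w)
        sim = float(ws @ w / denom) if denom > 0 else 0.0
        if sim > best_sim:
            best_sim, best_s = sim, s
    return best_sim, best_s
```

Basis functions whose best similarity to a reference template is high could then be grouped into the same source, which is the role the SNMF model plays in the proposed clustering.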

CiteSeerX

Arrow@TUDublin