Search CORE

228 research outputs found

Semi-supervised {NMF} with time-frequency annotations for single-channel source separation

Author: Bach Francis
Févotte Cédric
Lefèvre Augustin
Publication venue: HAL CCSD
Publication date: 08/10/2012
Field of study

International audienceWe formulate a novel extension of nonnegative matrix factorization (NMF) to take into account partial information on source-speciﬁc activity in the spectrogram. This information comes in the form of masking coefﬁcients, such as those found in an ideal binary mask. We show that state-of-the-art results in source separation may be achieved with only a limited amount of correct annotation, and furthermore our algorithm is robust to incorrect annotations. Since in practice ideal annotations are not observed, we propose several supervision scenarios to estimate the ideal mask- ing coefﬁcients. First, manual annotations by a trained user on a dedicated graphical user interface are shown to provide satisfactory performance although they are prone to errors. Second, we investigate simple learning strate- gies to predict the Wiener coefﬁcients based on local information around a given time-frequency bin of the spec- trogram. Results on single-channel source separation show that time-frequency annotations allow to disambiguate the source separation problem, and learned annotations open the way for a completely unsupervised learning procedure for source separation with no human intervention

CiteSeerX

INRIA a CCSD electronic archive server

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Score-Informed Source Separation for Musical Audio Recordings [An overview]

Author: Ewert S
Mueller M
Pardo B
Plumbley MD
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

(c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works

CiteSeerX

Crossref

Queen Mary Research Online

Surrey Research Insight

Audio source separation for music in low-latency and high-latency scenarios

Author: Marxer Piñón Ricard
Publication venue: 'Universitat Pompeu Fabra'
Publication date: 01/01/2013
Field of study

Aquesta tesi proposa mètodes per tractar les limitacions de les tècniques existents de separació de fonts musicals en condicions de baixa i alta latència. En primer lloc, ens centrem en els mètodes amb un baix cost computacional i baixa latència. Proposem l'ús de la regularització de Tikhonov com a mètode de descomposició de l'espectre en el context de baixa latència. El comparem amb les tècniques existents en tasques d'estimació i seguiment dels tons, que són passos crucials en molts mètodes de separació. A continuació utilitzem i avaluem el mètode de descomposició de l'espectre en tasques de separació de veu cantada, baix i percussió. En segon lloc, proposem diversos mètodes d'alta latència que milloren la separació de la veu cantada, gràcies al modelatge de components específics, com la respiració i les consonants. Finalment, explorem l'ús de correlacions temporals i anotacions manuals per millorar la separació dels instruments de percussió i dels senyals musicals polifònics complexes.Esta tesis propone métodos para tratar las limitaciones de las técnicas existentes de separación de fuentes musicales en condiciones de baja y alta latencia. En primer lugar, nos centramos en los métodos con un bajo coste computacional y baja latencia. Proponemos el uso de la regularización de Tikhonov como método de descomposición del espectro en el contexto de baja latencia. Lo comparamos con las técnicas existentes en tareas de estimación y seguimiento de los tonos, que son pasos cruciales en muchos métodos de separación. A continuación utilizamos y evaluamos el método de descomposición del espectro en tareas de separación de voz cantada, bajo y percusión. En segundo lugar, proponemos varios métodos de alta latencia que mejoran la separación de la voz cantada, gracias al modelado de componentes que a menudo no se toman en cuenta, como la respiración y las consonantes. Finalmente, exploramos el uso de correlaciones temporales y anotaciones manuales para mejorar la separación de los instrumentos de percusión y señales musicales polifónicas complejas.This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Author: Heittola Toni
Huttunen Heikki
Parascandolo Giambattista
Virtanen Tuomas
Çakır Emre
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/02/2017
Field of study

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.Comment: Accepted for IEEE Transactions on Audio, Speech and Language Processing, Special Issue on Sound Scene and Event Analysi

arXiv.org e-Print Archive

Trepo - Institutional Repository of Tampere University

Multi-channel approaches for musical audio content analysis

Author: João Carlos Couto Antunes Fonseca
Publication venue
Publication date: 24/07/2020
Field of study

The goal of this research project is to undertake a critical evaluation of signal representations for musical audio content analysis. In particular it will contrast three different means for undertaking the analysis of micro-rhythmic content in Afro-Latin American music, namely through the use of: i) stereo or mono mixed recordings; ii) separated sources obtained via state of the art musical audio source separation techniques; and iii) the use of perfectly separated multi-track stems. In total the project comprises the following four objectives: i) To compile a dataset of mixed and multi-channel recordings of the Brazilian Maracatu musicians; ii) To conceive methods for rhythmical micro-variations analysis and pattern recognition; iii) To explore diverse music source separation approaches that preserve micro-rhythmic content; iv) To evaluate the performance of several automatic onset estimation approaches; and v) To compare the rhythmic analysis obtained from the original multi-channel sources versus the separated ones to evaluate separation quality regarding microtiming identification

Repositório Aberto da Universidade do Porto