
    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes methods to address the limitations of current music source separation techniques in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, which are crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass, and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore the use of temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.

    Transcribing Multi-Instrument Polyphonic Music With Hierarchical Eigeninstruments

    This paper presents a general probabilistic model for transcribing single-channel music recordings containing multiple polyphonic instrument sources. The system requires no prior knowledge of the instruments present in the mixture (other than their number), although it can benefit from information about instrument type if available. In contrast to many existing polyphonic transcription systems, our approach explicitly models the individual instruments and is thereby able to assign detected notes to their respective sources. We use training instruments to learn a set of linear manifolds in model parameter space, which are then used during transcription to constrain the properties of models fit to the target mixture. This leads to a hierarchical mixture-of-subspaces design that makes it possible to supply the system with prior knowledge at different levels of abstraction. The proposed technique is evaluated on both recorded and synthesized mixtures containing two, three, four, and five instruments each. We compare our approach, both with source assignment (i.e., detected pitches must be associated with the correct instrument) and without it, to another multi-instrument transcription system as well as a baseline non-negative matrix factorization (NMF) algorithm. For two-instrument mixtures evaluated with source assignment, we obtain average frame-level F-measures of up to 0.52 in the completely blind transcription setting (i.e., no prior knowledge of the instruments in the mixture) and up to 0.67 if we assume knowledge of the basic instrument types. For transcription without source assignment, these numbers rise to 0.76 and 0.83, respectively.
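The NMF baseline mentioned above factorizes a nonnegative magnitude spectrogram V into spectral templates W (one column per pitch/component) and time activations H, with notes detected by thresholding H. A generic sketch of that baseline idea using multiplicative updates for the Euclidean cost — not the paper's specific algorithm, and with a toy synthetic "spectrogram" rather than real audio:

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9):
    """Basic NMF via Lee-Seung multiplicative updates: V ~= W @ H."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps  # spectral templates
    H = rng.random((n_components, T)) + eps  # time activations
    for _ in range(n_iter):
        # Updates keep W, H nonnegative and monotonically reduce ||V - WH||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy magnitude "spectrogram" built from two known nonnegative templates.
rng = np.random.default_rng(1)
V = rng.random((64, 2)) @ rng.random((2, 100))
W, H = nmf(V, n_components=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)  # typically small here
```

Plain NMF of this kind has no notion of which instrument a component belongs to, which is exactly the gap the paper's hierarchical eigeninstrument model targets: constraining the templates to instrument-specific subspaces lets detected pitches be assigned to sources.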