3 research outputs found

    Advances in independent component analysis and nonnegative matrix factorization

    A fundamental problem in machine learning research, as well as in many other disciplines, is finding a suitable representation of multivariate data, i.e. random vectors. For reasons of computational and conceptual simplicity, the representation is often sought as a linear transformation of the original data. In other words, each component of the representation is a linear combination of the original variables. Well-known linear transformation methods include principal component analysis (PCA), factor analysis, and projection pursuit. In this thesis, we consider two popular and widely used techniques: independent component analysis (ICA) and nonnegative matrix factorization (NMF). ICA is a statistical method in which the goal is to find a linear representation of nongaussian data so that the components are statistically independent, or as independent as possible. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction and signal separation. Starting from ICA, several methods of estimating the latent structure in different problem settings are derived and presented in this thesis. FastICA, one of the most efficient and popular ICA algorithms, is reviewed and discussed, and its local and global convergence and statistical behavior are studied further. A nonnegative FastICA algorithm is also given. Nonnegative matrix factorization is a recently developed technique for finding parts-based, linear representations of nonnegative data. It is a method for dimensionality reduction that respects the nonnegativity of the input data while constructing a low-dimensional approximation. The nonnegativity constraints make the representation purely additive (allowing no subtractions), in contrast to many other linear representations such as principal component analysis and independent component analysis. The thesis also gives a literature survey of nonnegative matrix factorization and presents a novel method, projective nonnegative matrix factorization (P-NMF), together with its applications.
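
As a rough illustration of the purely additive factorization this abstract describes, the sketch below factors a nonnegative matrix V into nonnegative factors W and H using the standard multiplicative updates for the squared-error objective. It is not the thesis's P-NMF method; the function name `nmf_multiplicative`, the rank, the iteration count, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (nonnegative, n x m) as W @ H with W, H >= 0.

    Hypothetical helper: plain multiplicative updates for the
    Frobenius-norm objective, not the P-NMF variant from the thesis.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start so the updates never divide by zero.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so the factors
        # stay nonnegative without any explicit projection step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic nonnegative data with an exact rank-3 structure (assumed example).
data_rng = np.random.default_rng(1)
V = data_rng.random((6, 3)) @ data_rng.random((3, 8))

W, H = nmf_multiplicative(V, rank=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)
```

Because every entry of W and H is nonnegative, each column of V is rebuilt only by adding scaled columns of W, which is the "no subtractions" property the abstract contrasts with PCA and ICA.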

    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, which are crucial steps in many separation methods. We then use and evaluate the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
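
To make the idea of Tikhonov-regularized spectrum decomposition concrete, the sketch below solves the textbook closed-form problem min_g ||B g - y||^2 + lam ||g||^2, where the columns of B are spectral templates and y is one observed spectrum frame. The abstract does not specify the thesis's basis, regularization matrix, or post-processing, so the function name `tikhonov_decompose`, the random templates, and the values of `lam` are assumptions for illustration only.

```python
import numpy as np

def tikhonov_decompose(B, y, lam=0.1):
    """Return gains g minimizing ||B @ g - y||^2 + lam * ||g||^2.

    Hypothetical helper using the standard ridge closed form
    g = (B^T B + lam I)^{-1} B^T y; not the thesis's exact formulation.
    """
    k = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(k), B.T @ y)

# Assumed toy setup: 16 frequency bins, 3 spectral templates.
rng = np.random.default_rng(0)
B = rng.random((16, 3))
true_gains = np.array([1.0, 0.5, 0.0])
y = B @ true_gains  # noiseless frame built from the templates

gains_weak = tikhonov_decompose(B, y, lam=1e-8)   # almost plain least squares
gains_strong = tikhonov_decompose(B, y, lam=10.0)  # heavily shrunk gains
print("weak regularization:  ", gains_weak)
print("strong regularization:", gains_strong)
```

The solution is a single linear solve per frame, with no iteration, which is one plausible reason such a decomposition suits a low-latency setting. Unlike NMF, the resulting gains are not constrained to be nonnegative.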