397 research outputs found

    Sparse and Nonnegative Factorizations For Music Understanding

    In this dissertation, we propose methods for sparse and nonnegative factorization that are specifically suited for analyzing musical signals. First, we discuss two constraints that aid factorization of musical signals: harmonic and co-occurrence constraints. We propose a novel dictionary learning method that imposes harmonic constraints upon the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral-temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of learned dictionary atoms. We also propose co-occurrence constraints -- three simple and convenient multiplicative update rules for nonnegative matrix factorization (NMF) that enforce dependence among atoms. Using examples in music transcription, we demonstrate the ability of these updates to represent each musical note with multiple atoms and cluster the atoms for source separation purposes. Second, we study how spectral and temporal information extracted by nonnegative factorizations can improve upon musical instrument recognition. Musical instrument recognition in melodic signals is difficult, especially for classification systems that rely entirely upon spectral information instead of temporal information. Here, we propose a simple and effective method of combining spectral and temporal information for instrument recognition. While existing classification methods use traditional features such as statistical moments, we extract novel features from spectral and temporal atoms generated by NMF using a biologically motivated multiresolution gamma filterbank. Unlike other methods that require thresholds, safeguards, and hierarchies, the proposed spectral-temporal method requires only simple filtering and a flat classifier. 
Finally, we study how to perform sparse factorization when a large dictionary of musical atoms is already known. Sparse coding methods such as matching pursuit (MP) have been applied to problems in music information retrieval such as transcription and source separation with moderate success. However, when the set of dictionary atoms is large, identification of the best match in the dictionary with the residual is slow -- linear in the size of the dictionary. Here, we propose a variant called approximate matching pursuit (AMP) that is faster than MP while maintaining scalability and accuracy. Unlike MP, AMP uses an approximate nearest-neighbor (ANN) algorithm to find the closest match in a dictionary in sublinear time. One such ANN algorithm, locality-sensitive hashing (LSH), is a probabilistic hash algorithm that places similar, yet not identical, observations into the same bin. While the accuracy of AMP is comparable to similar MP methods, the computational complexity is reduced. Also, by using LSH, this method scales easily; the dictionary can be expanded without reorganizing any data structures.
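The linear cost mentioned in the abstract above comes from the exhaustive search in each step of plain matching pursuit. The following is a minimal sketch of that baseline, not the dissertation's implementation: the function names `dot` and `matching_pursuit` are hypothetical, atoms are assumed to be unit-norm lists of floats, and the ANN/LSH lookup that AMP would substitute for the scan is not implemented here.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_iter=10):
    """Greedy matching pursuit (illustrative sketch).

    Assumes every atom in `atoms` has unit norm, so the inner product
    with the residual is also the coefficient to subtract.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Exhaustive scan: one inner product per atom, so each step
        # costs time linear in the dictionary size.
        scores = [dot(atom, residual) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        coeffs[best] += scores[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - scores[best] * a
                    for r, a in zip(residual, atoms[best])]
    return coeffs, residual
```

An approximate variant would replace the `scores` scan with a nearest-neighbor query against a prebuilt index; how that index is built and queried is specific to the ANN method chosen.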

    Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking

    Audio signals are information-rich nonstationary signals that play an important role in our day-to-day communication, perception of environment, and entertainment. Due to their nonstationary nature, time- or frequency-only approaches are inadequate in analyzing these signals. A joint time-frequency (TF) approach would be a better choice to efficiently process these signals. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that encapsulate a majority of the audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above-mentioned areas. A TF-based audio coding scheme with a novel psychoacoustic model, music classification, audio classification of environmental sounds, audio fingerprinting, and audio watermarking will be presented to demonstrate the advantages of using time-frequency approaches in analyzing and extracting information from audio signals.

    Automatic music transcription: challenges and future directions

    Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

    Context based multimedia information retrieval

    Music feature extraction and analysis through Python

    In the digital era, platforms like Spotify have become the primary channels of music consumption, broadening the possibilities for analyzing and understanding music through data. This project focuses on a comprehensive examination of a dataset sourced from Spotify, with Python as the tool for data extraction and analysis. The primary objective centers around the creation of this dataset, emphasizing a diverse range of songs from various subgenres. The intention is to represent both mainstream and niche musical landscapes, aligning with the Long Tail distribution concept, which highlights the market potential of less popular niche products. Through analysis, patterns in the evolution of musical features over past decades become evident. Shifts in features such as energy, loudness, danceability, and valence and their correlation with popularity emerge from the dataset. Parallel to this analysis is the conceptualization of a music recommendation system based on the content of the dataset. The aim is to connect tracks, especially lesser-known ones, with potential listeners. This project provides insights beneficial for music enthusiasts, data scientists, and industry professionals. The methodologies and analyses present a convergence of data science and the music industry in today's digital context.

    Blind source separation using statistical nonnegative matrix factorization

    PhD Thesis. Blind Source Separation (BSS) attempts to automatically extract and track a signal of interest in real world scenarios with other signals present. BSS addresses the problem of recovering the original signals from an observed mixture without relying on training knowledge. This research studied three novel approaches for solving the BSS problem based on extensions of the nonnegative matrix factorization model and sparsity regularization methods. 1) A framework amalgamating pruning and Bayesian regularized cluster nonnegative tensor factorization with Itakura-Saito divergence for separating sources mixed in a stereo channel format: The sparse regularization term was adaptively tuned using a hierarchical Bayesian approach to yield the desired sparse decomposition. The modified Gaussian prior was formulated to express the correlation between different basis vectors. This algorithm automatically detected the optimal number of latent components of the individual source. 2) Factorization for single-channel BSS which decomposes an information-bearing matrix into a complex of factor matrices that represent the spectral dictionary and temporal codes: A variational Bayesian approach was developed for computing the sparsity parameters for optimizing the matrix factorization. This approach combined the advantages of both complex matrix factorization (CMF) and variational sparse analysis. 3) An imitated-stereo mixture model developed by weighting and time-shifting the original single-channel mixture, where source signals can be modelled by autoregressive (AR) processes. The proposed mixture is analogous to a stereo signal created by two microphones, one real and the other virtual. The imitated-stereo mixture employed nonnegative tensor factorization for separating the observed mixture. The separability analysis of the imitated-stereo mixture was derived using Wiener masking.
All algorithms were tested with real audio signals. Performance of source separation was assessed by measuring the distortion between the original source and the estimated one according to the signal-to-distortion ratio (SDR). The experimental results demonstrate that the proposed uninformed audio separation algorithms surpass the conventional BSS methods, i.e. the IS-cNTF, SNMF and CMF methods, with an average SDR improvement ranging from 2.6 dB to 6.4 dB per source. Payap Universit
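As a rough illustration of the evaluation metric named above, the sketch below computes a basic signal-to-distortion ratio in decibels as 10·log10(‖s‖² / ‖s − ŝ‖²). This is an assumption about the simplest form of the measure; the thesis may use a more elaborate decomposition of the error, and the function name `sdr_db` is hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB (illustrative sketch).

    `reference` is the original source and `estimate` the separated one;
    both are equal-length sequences of samples.
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        # A perfect estimate has no distortion at all.
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)
```

Under this definition, an improvement of a few dB per source, as reported in the abstract, corresponds to a reduction of the error energy relative to the source energy by a factor of roughly two to four.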