107 research outputs found

    Audio source separation for music in low-latency and high-latency scenarios

    This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
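What makes Tikhonov regularization attractive for the low-latency setting is that the spectrum decomposition has a closed-form solution, so no iterative updates (as in NMF) are needed per frame. A minimal numpy sketch under toy assumptions — the harmonic-comb dictionary, frequency-bin count, and regularization weight are illustrative, not the thesis's actual setup:

```python
import numpy as np

def tikhonov_decompose(B, x, lam=1e-3):
    """Decompose one spectrum frame x over a basis B of spectral templates.

    Solves min_g ||B g - x||^2 + lam * ||g||^2, whose closed-form solution
    g = (B^T B + lam I)^{-1} B^T x requires no iterations -- the property
    that makes it suitable for low-latency processing.
    """
    K = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(K), B.T @ x)

# Toy example: two harmonic-comb templates and a mixture of both.
freqs = np.arange(64)
comb = lambda f0: np.isin(freqs, np.arange(f0, 64, f0)).astype(float)
B = np.stack([comb(5), comb(7)], axis=1)   # 64 bins x 2 components
x = 1.0 * comb(5) + 0.5 * comb(7)          # mixture frame
g = tikhonov_decompose(B, x)
print(np.round(g, 2))                      # gains close to [1.0, 0.5]
```

In a real pitch-tracking front end, B would hold one comb per candidate fundamental frequency and g would be read as a salience vector per frame.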

    Evaluating the spectral clustering segmentation algorithm for describing diverse music collections

    This paper presents an evaluation of the spectral clustering segmentation algorithm used for automating the description of musical structure within a song. This study differs from the standard evaluation in that it accounts for the effects of genre, class, tempo, song duration, and time signature on the evaluation metrics. The study applies standard metrics for segment-boundary placement accuracy and labeling accuracy across these song metadata, and reveals that song duration, tempo, class, and genre have a significant effect on evaluation scores. This study demonstrates how the algorithm may be evaluated to predict its performance for a given collection where these variables are known. The possible causes and implications of these effects on evaluation scores are explored based on the construction of the spectral clustering algorithm and its potential for use in describing diverse music collections. Master of Science in Library Science.
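The standard segment-boundary accuracy metric referred to above scores an estimated boundary as a hit when it lands within a tolerance window of a reference boundary. A sketch of the idea with greedy one-to-one matching (standard toolkits such as mir_eval use optimal matching; the boundary times below are made up):

```python
def boundary_f_measure(ref, est, window=0.5):
    """Segment-boundary evaluation: an estimated boundary is a hit if it
    falls within `window` seconds of a not-yet-matched reference boundary.
    Returns (precision, recall, F-measure); matching here is greedy."""
    ref, est = sorted(ref), sorted(est)
    matched = set()
    hits = 0
    for e in est:
        for i, r in enumerate(ref):
            if i not in matched and abs(e - r) <= window:
                matched.add(i)
                hits += 1
                break
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f

ref = [0.0, 12.3, 45.1, 60.0]   # annotated boundaries (seconds, hypothetical)
est = [0.2, 11.9, 30.0, 59.8]   # algorithm output (hypothetical)
p, r, f = boundary_f_measure(ref, est)
```

Evaluating this per song and grouping the scores by tempo, duration, or genre is how the metadata effects described in the abstract can be measured.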

    Signal Processing Methods for Music Synchronization, Audio Matching, and Source Separation

    The field of music information retrieval (MIR) aims at developing techniques and tools for organizing, understanding, and searching multimodal information in large music collections in a robust, efficient and intelligent manner. In this context, this thesis presents novel, content-based methods for music synchronization, audio matching, and source separation. In general, music synchronization denotes a procedure which, for a given position in one representation of a piece of music, determines the corresponding position within another representation. Here, the thesis presents three complementary synchronization approaches, which improve upon previous methods in terms of robustness, reliability, and accuracy. The first approach employs a late-fusion strategy based on multiple, conceptually different alignment techniques to identify those music passages that allow for reliable alignment results. The second approach is based on the idea of employing musical structure analysis methods in the context of synchronization to derive reliable synchronization results even in the presence of structural differences between the versions to be aligned. Finally, the third approach employs several complementary strategies for increasing the accuracy and time resolution of synchronization results. Given a short query audio clip, the goal of audio matching is to automatically retrieve all musically similar excerpts in different versions and arrangements of the same underlying piece of music. In this context, chroma-based audio features are a well-established tool as they possess a high degree of invariance to variations in timbre. This thesis describes a novel procedure for making chroma features even more robust to changes in timbre while keeping their discriminative power. Here, the idea is to identify and discard timbre-related information using techniques inspired by the well-known MFCC features, which are usually employed in speech processing. 
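The chroma features mentioned above fold spectral energy into 12 pitch classes; the thesis's timbre robustness comes from additionally discarding MFCC-like lower cepstral coefficients, which is not shown here. A sketch of the basic chroma folding step only (sample rate, FFT size, and tuning reference are illustrative assumptions):

```python
import numpy as np

def chroma_from_spectrum(mag, sr, n_fft, tuning_ref=440.0):
    """Fold a magnitude spectrum into 12 pitch classes (chroma).
    Each FFT bin is mapped to its nearest MIDI pitch, then reduced mod 12,
    giving a representation invariant to octave and largely to timbre."""
    chroma = np.zeros(12)
    freqs = np.arange(1, len(mag)) * sr / n_fft      # skip the DC bin
    pitch = 69 + 12 * np.log2(freqs / tuning_ref)    # MIDI pitch per bin
    classes = np.round(pitch).astype(int) % 12
    for cls, m in zip(classes, mag[1:]):
        chroma[cls] += m
    return chroma / (chroma.sum() + 1e-12)

# Pure tone at bin 44 of an 800-point FFT at 8 kHz -> 440 Hz -> pitch class A.
mag = np.zeros(401)
mag[44] = 1.0
c = chroma_from_spectrum(mag, sr=8000, n_fft=800)
```

Because the folding discards absolute octave, two arrangements of the same piece produce similar chroma sequences, which is what makes the feature suitable for audio matching.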
    Given a monaural music recording, the goal of source separation is to extract musically meaningful sound sources corresponding, for example, to a melody, an instrument, or a drum track from the recording. To facilitate this complex task, one can exploit additional information provided by a musical score. Based on this idea, this thesis presents two novel, conceptually different approaches to source separation. Using score information provided by a given MIDI file, the first approach employs a parametric model to describe a given audio recording of a piece of music. The resulting model is then used to extract sound sources as specified by the score. As a computationally less demanding and easier-to-implement alternative, the second approach employs the additional score information to guide a decomposition based on non-negative matrix factorization (NMF).
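One common way to let a score guide an NMF decomposition is to zero out activations outside the note-activity regions the score prescribes; zeros are preserved by multiplicative updates, so the constraint holds exactly. A toy sketch under that assumption (the synthetic spectrogram, rank, and iteration count are illustrative, not the thesis's actual method):

```python
import numpy as np

def score_guided_nmf(V, W, H_mask, n_iter=200, eps=1e-9):
    """NMF V ~= W H with multiplicative updates (Euclidean cost), where
    H_mask zeroes activations outside score-given note-activity regions.
    Masked entries of H stay zero under multiplicative updates, which is
    how the score steers the decomposition toward the intended sources."""
    H = np.random.default_rng(0).random((W.shape[1], V.shape[1])) * H_mask
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        H *= H_mask                          # keep the score constraint exact
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy mixture: two sources with known (score-given) activity regions.
rng = np.random.default_rng(1)
W_true = rng.random((20, 2))                 # 20 frequency bins, 2 sources
H_true = np.zeros((2, 30))
H_true[0, :15] = 1.0                         # source 0 active in frames 0-14
H_true[1, 10:] = 1.0                         # source 1 active in frames 10-29
V = W_true @ H_true
mask = (H_true > 0).astype(float)            # the "score" as a binary mask
W, H = score_guided_nmf(V, W_true.copy(), mask)
```

Separated source spectrograms are then obtained as the partial products, e.g. `np.outer(W[:, 0], H[0])` for the first source.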

    Towards the Automatic Analysis of Metric Modulations

    PhD thesis. The metrical structure is a fundamental aspect of music, yet its automatic analysis from audio recordings remains one of the great challenges of Music Information Retrieval (MIR) research. This thesis addresses the automatic analysis of changes in metrical structure over time, i.e. metric modulations. The evaluation of automatic musical analysis methods is a critical element of MIR research and is typically performed by comparing machine-generated estimates with human expert annotations, which are used as a proxy for ground truth. We present two new datasets of annotations for the evaluation of metrical structure and metric modulation estimation systems. Multiple annotations allowed for the assessment of inter-annotator (dis)agreement, and thereby for an evaluation of the reference annotations used to evaluate the automatic systems. The rhythmogram has been identified in previous research as a feature capable of capturing characteristics of the rhythmic content of a music recording. We present a direct evaluation of its ability to characterise the metrical structure and, as a result, propose a method to explicitly extract metrical structure descriptors from it. Despite generally good and improving performance, such rhythm feature extraction systems occasionally fail. When unpredictable, these failures are a barrier to usability and to the development of trust in MIR systems. To address this issue, we then propose a method to estimate the reliability of rhythm feature extraction. Finally, we propose a two-fold method to automatically analyse metric modulations from audio recordings. On the one hand, we propose a method to detect metrical structure changes from the rhythmogram feature in an unsupervised fashion. On the other hand, we propose a taxonomy of metric modulations rooted in music theory that relies on metrical structure descriptors that can be automatically estimated.
    Bringing these elements together lays the ground for the automatic production of a musicological interpretation of metric modulations. EPSRC award 1325200 and Omnifone Ltd.
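A rhythmogram of the kind described above can be sketched as the short-time autocorrelation of an onset-strength envelope: each column shows which rhythmic periodicities (lags) are active around that time, so a change in the column pattern can indicate a metric modulation. A minimal version with a synthetic envelope (frame length, hop, and the pulse periods are illustrative assumptions):

```python
import numpy as np

def rhythmogram(onset_env, frame_len=256, hop=64, max_lag=128):
    """Short-time autocorrelation of an onset-strength envelope.
    Returns a (lags x time) matrix; each column is the autocorrelation of
    one windowed excerpt, normalized by its lag-0 energy."""
    cols = []
    for start in range(0, len(onset_env) - frame_len + 1, hop):
        frame = onset_env[start:start + frame_len]
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        cols.append(ac[:max_lag] / (ac[0] + 1e-12))
    return np.array(cols).T

# Toy onset envelope: a pulse every 8 frames, then every 6 (a "modulation").
env = np.zeros(1024)
env[np.arange(0, 512, 8)] = 1.0
env[np.arange(512, 1024, 6)] = 1.0
R = rhythmogram(env)
```

Early columns of R peak at lag 8 and late columns at lag 6, so an unsupervised change detector run along the columns would flag the transition point.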

    16th Sound and Music Computing Conference SMC 2019 (28–31 May 2019, Malaga, Spain)

    The 16th Sound and Music Computing Conference (SMC 2019) took place in Malaga, Spain, on 28–31 May 2019, and was organized by the Application of Information and Communication Technologies (ATIC) research group of the University of Malaga (UMA). The associated SMC 2019 Summer School took place 25–28 May 2019, and the First International Day of Women in Inclusive Engineering, Sound and Music Computing Research (WiSMC 2019) took place on 28 May 2019. The SMC 2019 topics of interest included a wide selection of topics related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, etc.

    Recent Advances in Signal Processing

    Signal processing is a critical component of most new technologies and poses challenges across a wide variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, favoring closed-form tractability over real-world accuracy; these constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be grouped into five areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages. The chapters are completely independent and self-contained, so the interested reader can choose any chapter and skip to another without losing continuity.

    Computational Modeling and Analysis of Multi-timbral Musical Instrument Mixtures

    In the audio domain, the disciplines of signal processing, machine learning, psychoacoustics, information theory and library science have merged into the field of Music Information Retrieval (Music-IR). Music-IR researchers attempt to extract high-level information from music, such as pitch, meter, genre, rhythm and timbre, directly from audio signals, as well as semantic metadata over a wide variety of sources. This information is then used to organize and process data for large-scale retrieval and novel interfaces. On the content-creation side, hardware and software tools for producing music have become commonplace in the digital landscape. While the means to produce music are widely available, significant time must be invested to attain professional results: mixing multi-channel audio requires techniques and training far beyond the knowledge of the average music software user. As a result, there is significant growth and development in intelligent signal processing for audio, an emergent field combining audio signal processing and machine learning for producing music. This work focuses on methods for modeling and analyzing multi-timbral musical instrument mixtures and on automated processing techniques that improve audio quality according to quantitative and qualitative measures. The main contributions involve training models to predict mixing parameters for multi-channel audio sources and developing new methods to model how individual timbres contribute to an overall mixture. Linear dynamical systems (LDS) are shown to be capable of learning the relative contributions of individual instruments to re-create a commercial recording based on acoustic features extracted directly from audio. Variations in the model topology are explored to make it applicable to a more diverse range of input sources and to improve performance. An exploration of relevant features for modeling timbre and identifying instruments is also performed.
    Using various basis decomposition techniques, audio examples are reconstructed and analyzed in a perceptual listening test to evaluate their ability to capture salient aspects of timbre. These tests show that a 2-D decomposition captures much more perceptually relevant information with regard to the temporal evolution of the frequency spectrum of a set of audio examples. The results indicate that joint modeling of frequencies and their evolution is essential for capturing the higher-level concepts in audio that we wish to leverage in automated systems. Ph.D., Electrical Engineering, Drexel University, 201
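The simplest, static version of learning each instrument's contribution to a mixture is a least-squares fit of the mixture onto the stems; the linear dynamical systems used in the work above generalize this by letting the gains evolve over time. A toy sketch of the static case only (the synthetic "spectrograms" and gains are made up for illustration):

```python
import numpy as np

def estimate_mix_gains(stems, mixture):
    """Estimate each stem's contribution to a mixture by least squares:
    find g minimizing ||S g - m||^2, where each column of S is one stem's
    feature matrix flattened into a vector. A static stand-in for the
    time-varying contributions a linear dynamical system would track."""
    S = np.stack([s.ravel() for s in stems], axis=1)
    g, *_ = np.linalg.lstsq(S, mixture.ravel(), rcond=None)
    return g

rng = np.random.default_rng(0)
stems = [rng.random((64, 100)) for _ in range(3)]   # 3 stem "spectrograms"
true_g = np.array([0.8, 0.3, 0.5])
mix = sum(gain * s for gain, s in zip(true_g, stems))
g = estimate_mix_gains(stems, mix)
print(np.round(g, 2))                               # close to [0.8, 0.3, 0.5]
```

Fitting such gains frame by frame, and smoothing them with a state-space model, is one way to move from this sketch toward the time-varying modeling the thesis describes.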