    Signal Processing Methods for Music Synchronization, Audio Matching, and Source Separation

    The field of music information retrieval (MIR) aims at developing techniques and tools for organizing, understanding, and searching multimodal information in large music collections in a robust, efficient and intelligent manner. In this context, this thesis presents novel, content-based methods for music synchronization, audio matching, and source separation. In general, music synchronization denotes a procedure which, for a given position in one representation of a piece of music, determines the corresponding position within another representation. Here, the thesis presents three complementary synchronization approaches, which improve upon previous methods in terms of robustness, reliability, and accuracy. The first approach employs a late-fusion strategy based on multiple, conceptually different alignment techniques to identify those music passages that allow for reliable alignment results. The second approach is based on the idea of employing musical structure analysis methods in the context of synchronization to derive reliable synchronization results even in the presence of structural differences between the versions to be aligned. Finally, the third approach employs several complementary strategies for increasing the accuracy and time resolution of synchronization results. Given a short query audio clip, the goal of audio matching is to automatically retrieve all musically similar excerpts in different versions and arrangements of the same underlying piece of music. In this context, chroma-based audio features are a well-established tool as they possess a high degree of invariance to variations in timbre. This thesis describes a novel procedure for making chroma features even more robust to changes in timbre while keeping their discriminative power. Here, the idea is to identify and discard timbre-related information using techniques inspired by the well-known MFCC features, which are usually employed in speech processing. 
Given a monaural music recording, the goal of source separation is to extract musically meaningful sound sources corresponding, for example, to a melody, an instrument, or a drum track from the recording. To facilitate this complex task, one can exploit additional information provided by a musical score. Based on this idea, this thesis presents two novel, conceptually different approaches to source separation. Using score information provided by a given MIDI file, the first approach employs a parametric model to describe a given audio recording of a piece of music. The resulting model is then used to extract sound sources as specified by the score. As a computationally less demanding and easier to implement alternative, the second approach employs the additional score information to guide a decomposition based on non-negative matrix factorization (NMF)
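The score-guided NMF idea can be illustrated with a minimal sketch. It assumes one common way of injecting score information, which the abstract does not spell out: activation entries that the score rules out are initialized to zero, and the standard multiplicative updates for the Euclidean cost then keep them at zero. The function name `score_informed_nmf`, the binary mask `H_mask`, and the toy spectrogram are hypothetical and chosen only for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def score_informed_nmf(V, H_mask, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (freq x time) by W @ H using multiplicative updates.

    H_mask (components x time) is an assumed score-derived mask: 1 where a
    component may be active, 0 elsewhere. Entries of H that start at zero
    stay at zero, because every update multiplies the previous value.
    """
    rng = random.Random(seed)
    n_freq, n_comp = len(V), len(H_mask)
    W = [[rng.random() + 0.1 for _ in range(n_comp)] for _ in range(n_freq)]
    H = [[m * (rng.random() + 0.1) for m in row] for row in H_mask]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Toy magnitude spectrogram: component 0 sounds in frames 0-1,
# component 1 in frames 2-3 (hypothetical data).
V = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 2.0],
     [1.0, 1.0, 2.0, 2.0]]
H_mask = [[1, 1, 0, 0],
          [0, 0, 1, 1]]
W, H = score_informed_nmf(V, H_mask)
```

Each column of `W` can then be read as a spectral template for one score-specified source, and the matching row of `H` as its activation over time.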

    Distribution-Dissimilarities in Machine Learning

    Any binary classifier (or score-function) can be used to define a dissimilarity between two distributions. Many well-known distribution-dissimilarities are actually classifier-based: total variation, KL- or JS-divergence, Hellinger distance, etc. And many recent popular generative modeling algorithms compute or approximate these distribution-dissimilarities by explicitly training a classifier: e.g. generative adversarial networks (GAN) and their variants. This thesis introduces and studies such classifier-based distribution-dissimilarities. After a general introduction, the first part analyzes the influence of the classifiers' capacity on the dissimilarity's strength for the special case of maximum mean discrepancies (MMD) and provides applications. The second part studies applications of classifier-based distribution-dissimilarities in the context of generative modeling and presents two new algorithms: Wasserstein Auto-Encoders (WAE) and AdaGAN. The third and final part focuses on adversarial examples, i.e. targeted but imperceptible input-perturbations that lead to drastically different predictions of an artificial classifier. It shows that adversarial vulnerability of neural network based classifiers typically increases with the input-dimension, independently of the network topology
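As a small illustration of the maximum mean discrepancy mentioned above, the following sketch computes the biased (V-statistic) estimate of the squared MMD between two samples. The Gaussian kernel, its bandwidth, and the function names are assumptions made for this example and are not taken from the thesis.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of MMD^2 between the samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    with each expectation replaced by an average over all sample pairs.
    """
    def mean_k(a, b):
        return sum(kernel(p, q) for p in a for q in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Hypothetical one-dimensional samples: identical vs. clearly shifted.
sample_a = [(0.0,), (1.0,)]
sample_b = [(5.0,), (6.0,)]
same = mmd_squared(sample_a, sample_a)
shifted = mmd_squared(sample_a, sample_b)
```

Here `same` is zero up to rounding, while `shifted` is clearly positive. A narrower or wider bandwidth changes how strongly the estimate reacts to a given shift, which loosely mirrors the abstract's point about classifier capacity.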

    Some New Results on the Estimation of Sinusoids in Noise


    Style, structure and function in Cape Town Tsotsitaal

    Includes bibliographical references (leaves 214-223). The thesis applies a social constructionist framework and Foucauldian Discourse Analysis to demonstrate that while Tsotsitaal was perceived by many respondents as a language of gangsters and criminals, evidence suggests that it is actually part of an ongoing identity construction for young, black, primarily male urban township residents in South Africa, which is performed through a subcultural style. By applying Myers-Scotton's Matrix Language Frame model to questionnaire and interview data collected in two Cape Town townships, Gugulethu and Khayelitsha, the thesis identifies the syntactic framework of Cape Town Tsotsitaal as Xhosa.

    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
    Objective
    - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint.
    - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file).
    Representation
    - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat.
    - What format is to be used? Examples are: MIDI, piano roll or text.
    - How will the representation be encoded? Examples are: scalar, one-hot or many-hot.
    Architecture
    - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks.
    Challenge
    - What are the limitations and open challenges? Examples are: variability, interactivity and creativity.
    Strategy
    - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation.
    For each dimension, we conduct a comparative analysis of various models and techniques and we propose some tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
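To make the one-hot and many-hot encodings named under Representation concrete, here is a minimal sketch. The choice of 12 pitch classes as the vocabulary and the helper names `one_hot` and `many_hot` are assumptions for illustration; the survey itself covers many other representations.

```python
N_PITCH_CLASSES = 12  # assumed vocabulary: C, C#, D, ..., B


def one_hot(index, size=N_PITCH_CLASSES):
    """Encode a single symbol (e.g. one melody note) as a one-hot vector."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(size)]


def many_hot(indices, size=N_PITCH_CLASSES):
    """Encode simultaneous symbols (e.g. a chord) as a many-hot vector."""
    active = set(indices)
    if any(not 0 <= i < size for i in active):
        raise ValueError("index out of range")
    return [1 if i in active else 0 for i in range(size)]


note_e = one_hot(4)             # the single pitch class E
c_major = many_hot([0, 4, 7])   # pitch classes C, E, G sounding together
```

A one-hot vector has exactly one active entry and suits monophonic material, whereas a many-hot vector allows several active entries at once and can therefore describe polyphony in a single time step.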

    On the encoding of natural music in computational models and human brains

    This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music