    A Convolutional Approach to Melody Line Identification in Symbolic Scores

    In many musical traditions, the melody line is of primary significance in a piece. Human listeners can readily distinguish melodies from accompaniment; however, making this distinction given only the written score -- i.e., without listening to the music performed -- can be a difficult task. Solving this task is of great importance for both Music Information Retrieval and musicological applications. In this paper, we propose an automated approach to identifying the most salient melody line in a symbolic score. The backbone of the method consists of a convolutional neural network (CNN) estimating the probability that each note in the score (more precisely: each pixel in a piano roll encoding of the score) belongs to the melody line. We train and evaluate the method on various datasets, using manual annotations where available and solo instrument parts where not. We also propose a method to inspect the CNN and to analyze the influence exerted by notes on the prediction of other notes; this method can be applied whenever the output of a neural network has the same size as the input.
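The piano roll encoding mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the note format `(onset_step, duration_steps, midi_pitch)`, the helper names `piano_roll` and `note_scores`, and the choice to average per-pixel probabilities over a note's cells to get one score per note.

```python
from typing import List, Tuple

# Hypothetical note format: (onset_step, duration_steps, midi_pitch).
Note = Tuple[int, int, int]


def piano_roll(notes: List[Note], n_steps: int, n_pitches: int = 128) -> List[List[int]]:
    """Binary matrix: rows are MIDI pitches, columns are time steps."""
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for onset, duration, pitch in notes:
        # Clip notes that run past the end of the roll.
        for t in range(onset, min(onset + duration, n_steps)):
            roll[pitch][t] = 1
    return roll


def note_scores(notes: List[Note], prob_map: List[List[float]]) -> List[float]:
    """Average a per-pixel probability map over each note's cells.

    The averaging rule is an assumption; the abstract only says the CNN
    outputs a probability for each pixel of the piano roll.
    """
    scores = []
    for onset, duration, pitch in notes:
        cells = prob_map[pitch][onset:onset + duration]
        scores.append(sum(cells) / len(cells) if cells else 0.0)
    return scores


# Two notes: C4 for two steps, then E4 for two steps.
example_notes = [(0, 2, 60), (2, 2, 64)]
example_roll = piano_roll(example_notes, n_steps=4)
```

Because the probability map has the same shape as the piano roll, any network whose output matches its input size could supply `prob_map` in this sketch.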

    Convolutional Methods for Music Analysis

    Multimodal music information processing and retrieval: survey and future challenges

    To improve performance in various music information processing tasks, recent studies exploit different modalities that capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.

    Polyphonic music generation using neural networks

    In this project, the application of generative models to polyphonic music generation is investigated. Polyphonic music generation falls within algorithmic composition, a field that aims to develop models that automate, partially or completely, the composition of musical pieces. The process poses many challenges, both in generating musical pieces that are enjoyable and in evaluating the model robustly enough to guide improvements. An extensive survey of the development of the field and the state of the art is carried out. From this, two distinct generative models were chosen to apply to the problem of polyphonic music generation: the Restricted Boltzmann Machine and the Generative Adversarial Network (GAN). For the GAN, two architectures were used, the Deep Convolutional GAN and the Wasserstein GAN with gradient penalty. To train these models, a dataset containing over 9000 samples of classical musical pieces was used. Using a piano-roll representation, the pieces were converted into binary 2D arrays in which the vertical dimension represents pitch, the horizontal dimension represents time, and note events are represented by active units. The first 16 seconds of each piece were extracted and used for training after data cleansing and preprocessing. Using implementations of these models, samples of musical pieces were generated. In listening tests performed by participants, the Deep Convolutional GAN achieved the best scores, with its compositions rated on average 4.80 on a scale from 1 to 5 for how enjoyable the pieces were. For a more objective evaluation, musical features that describe rhythmic and melodic characteristics were extracted from the generated pieces and compared against the training dataset. 
These features included the musical key, detected with an implementation of the Krumhansl-Schmuckler algorithm, and the average information rate, used as an estimator of long-term musical structure. Within each set of generated samples, pairwise cross-validation using the Euclidean distance was performed for each feature. The same was done between each set of generated samples and the features extracted from the training data, resulting in two sets of distances: the intra-set and inter-set distances. Using kernel density estimation, the probability density functions of these distances were obtained. Finally, the Kullback-Leibler divergence between the intra-set and inter-set distances of each feature was calculated for each generative model. A lower divergence indicates that the distributions are more similar. On average, the Restricted Boltzmann Machine obtained the lowest Kullback-Leibler divergences.
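The intra-set versus inter-set comparison described above can be sketched for a single scalar feature as follows. This is a rough illustration under stated assumptions, not the project's code: the function names are hypothetical, the feature values are made up, the Gaussian kernel and its fixed bandwidth of 0.05 are arbitrary choices (the abstract does not specify either), and the divergence is approximated on a discrete grid. For a scalar feature the Euclidean distance reduces to an absolute difference.

```python
import math
from itertools import combinations


def intra_set_distances(values):
    """Pairwise distances within one set of scalar feature values."""
    return [abs(a - b) for a, b in combinations(values, 2)]


def inter_set_distances(values_a, values_b):
    """Distances between every element of one set and every element of the other."""
    return [abs(a - b) for a in values_a for b in values_b]


def gaussian_kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate as a callable pdf."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf


def kl_divergence(p_pdf, q_pdf, grid, eps=1e-12):
    """Discrete approximation of KL(p || q) on a shared grid."""
    p = [p_pdf(x) + eps for x in grid]  # eps avoids log(0) and division by zero
    q = [q_pdf(x) + eps for x in grid]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


# Made-up values of one feature, one number per piece.
generated = [0.42, 0.47, 0.51, 0.55, 0.60]
training = [0.40, 0.45, 0.50, 0.58, 0.63]

intra = intra_set_distances(generated)
inter = inter_set_distances(generated, training)
grid = [i * 0.01 for i in range(51)]  # distances from 0.00 to 0.50
divergence = kl_divergence(gaussian_kde(intra, 0.05), gaussian_kde(inter, 0.05), grid)
print(f"KL(intra || inter) = {divergence:.4f}")
```

A divergence near zero would mean the generated pieces differ from the training data about as much as they differ from each other, which is the sense in which a lower value indicates more similar distributions.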

    16th Sound and Music Computing Conference SMC 2019 (28–31 May 2019, Malaga, Spain)

    The 16th Sound and Music Computing Conference (SMC 2019) took place in Malaga, Spain, 28-31 May 2019 and was organized by the Application of Information and Communication Technologies Research group (ATIC) of the University of Malaga (UMA). The associated SMC 2019 Summer School took place 25-28 May 2019. The First International Day of Women in Inclusive Engineering, Sound and Music Computing Research (WiSMC 2019) took place on 28 May 2019. The SMC 2019 topics of interest included a wide selection of topics related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, etc.

    Kompozicionalni hierarhični model za pridobivanje informacij iz glasbe

    In recent years, deep architectures, most commonly based on neural networks, have advanced the state of the art in many research areas. Due to the popularity and the success of deep neural networks, other deep architectures, including compositional models, have been put aside from mainstream research. This dissertation presents the compositional hierarchical model as a novel deep architecture for music processing. Our main motivation was to develop and explore an alternative non-neural deep architecture for music processing which would be transparent, meaning that the encoded knowledge would be interpretable, trained in an unsupervised manner and on small datasets, and useful as a feature extractor for classification tasks, as well as a transparent model for unsupervised pattern discovery. We base our work on compositional models, as compositionality is inherent in music. The proposed compositional hierarchical model learns a multi-layer hierarchical representation of the analyzed music signals in an unsupervised manner. It provides transparent insights into the learned concepts and their structure. It can be used as a feature extractor: its output can be used for classification tasks using existing machine learning techniques. Moreover, the model's transparency enables an interpretation of the learned concepts, so the model can be used for analysis (exploration of the learned hierarchy) or discovery-oriented (inferring the hierarchy) tasks, which is difficult with most neural network based architectures. The proposed model uses relative coding of the learned concepts, which eliminates the need for large annotated training datasets that are essential in deep architectures with a large number of parameters. Relative coding contributes to slim models, which are fast to execute and have low memory requirements. 
The model also incorporates several biologically-inspired mechanisms that are modeled according to the mechanisms that exist at the lower levels of human perception (e.g., lateral inhibition in the human ear) and that significantly affect perception. The proposed model is evaluated on several music information retrieval tasks and its results are compared to the current state of the art. The dissertation is structured as follows. In the first chapter we present the motivation for the development of the new model. In the second chapter we elaborate on the related work in music information retrieval and review other compositional and transparent models. Chapter three introduces a thorough description of the proposed model. The model structure, its learning and inference methods are explained, as well as the incorporated biologically-inspired mechanisms. The model is then applied to several different music domains, which are divided according to the type of input data. In this we follow the timeline of the development and the implementation of the model. In chapter four, we present the model's application to audio recordings, specifically for two tasks: automatic chord estimation and multiple fundamental frequency estimation. In chapter five, we present the model's application to symbolic music representations. We concentrate on pattern discovery, emphasizing the model's ability to tackle such problems. We also evaluate the model as a feature generator for tune family classification. Finally, in chapter six, we show the latest progress in developing the model for representing rhythm and show that it exhibits a high degree of robustness in extracting high-level rhythmic structures from music signals. 
We conclude the dissertation by summarizing our work and the results, elaborating on forthcoming work in the development of the model and its future applications.
With the rise of deep architectures based on neural networks, results on problems in many fields have recently improved substantially. Because of the popularity and success of these neural-network-based deep approaches, other approaches, compositional ones in particular, have been pushed away from the center of research attention. This dissertation addresses the question of whether a deep architecture can be developed that overcomes the existing problems of deep architectures. To this end, we return to compositional models and present the compositional hierarchical model as an alternative deep architecture with the following properties: transparency, which allows the learned concepts to be explained easily; unsupervised learning and the ability to learn from small datasets; usefulness as a feature extractor; and the ability to exploit the model's transparency for pattern discovery. Our work is based on compositional models, which are intuitive in music. The proposed compositional hierarchical model is capable of unsupervised learning of a multi-layer representation of the musical input. The model allows the learned concepts to be inspected through transparent structures. It can be used as a feature generator: the model's output can be used for classification with other machine learning approaches. At the same time, the transparency of the proposed model can be used for analysis (exploring the learned hierarchy) in pattern discovery, which is difficult to achieve with other approaches based on neural networks. Relative coding of concepts within the model contributes to considerably smaller models and consequently reduces the need for the large datasets required to train the model. By introducing biologically inspired mechanisms, we aim to bring the model even closer to the human mode of perception. 
Some mechanisms, such as inhibition, are known to be present in human perception at the lower levels in the ear and to significantly affect perception. In the model we take the first steps towards this kind of processing, with the ultimate goal of building a model that fully reflects human perception. In the first chapter of the dissertation we present the motivation for developing the new model. The second chapter covers previously published achievements in the field. The subsequent chapters focus on the model itself. We first describe the theoretical design of the model, its learning procedure, and the operation of the biologically inspired mechanisms. Next, we apply the model to several different music domains, divided according to the type of input data. In doing so we follow the timeline of the model's development and implementations over the course of the doctoral studies. We first present the application of the model to time-frequency signals, on which we test the model for two tasks: automatic chord estimation and automatic transcription of fundamental frequencies. In the fifth chapter we present a second way of applying the model, this time to symbolic input data representing a musical score. With this approach we focus on pattern discovery, emphasizing the model's ability to solve such problems, which remains out of reach for other approaches. We also evaluate the model as a feature generator, on the problem of melodic similarity of songs and classification into tune families. Finally, in the sixth chapter, we show the latest development of the model, applied to the problem of understanding rhythm in music. We analyze the adapted model and demonstrate its ability to learn different rhythmic forms and its high degree of robustness in extracting high-level rhythmic structures. In the conclusions we summarize the work and results and outline further steps for the model's future development.

    Flamenco music information retrieval.

    Flamenco, a music genre centered on improvisation and spontaneity, originates in southern Spain and attracts a growing community of aficionados from countries all over the world. The steady growth and accessibility of digital flamenco collections, in music archives and online platforms, calls for the development of computational analysis and description methods in order to index and analyze the musical content automatically. Music Information Retrieval (MIR) is a multidisciplinary research area dedicated to the automatic extraction of musical information from audio recordings and scores. However, the great majority of existing tools target Western classical and popular music and often do not generalize well to non-Western music traditions, particularly when the underlying music-theoretical assumptions do not hold for these genres. Moreover, the specific characteristics and musical concepts of a music tradition can imply new computational challenges for which no suitable methods exist. This thesis addresses these existing limitations of the field by tackling several computational challenges that arise in the context of flamenco music. To this end, a series of contributions are made in the form of novel algorithms, comparative evaluations, and data-driven studies, directed at various musical dimensions and spanning several sub-areas of engineering, computational mathematics, statistics, optimization, and computational musicology. A particularity of the genre, which greatly influences the work presented in this thesis, is the absence of scores for flamenco singing (cante flamenco). Consequently, computational methods must rely solely on the analysis of recordings, or of automatically extracted transcriptions, which gives rise to a collection of new computational problems. 
A key aspect of flamenco is the presence of recurring melodic patterns, which are subject to variation and ornamentation during performance. From the computational perspective, we identify three tasks related to this characteristic that are addressed in this thesis: classification by melody, melodic sequence retrieval, and melodic pattern extraction. In addition, we approach the task of unsupervised detection of repeated melodic phrases and explore the use of deep learning methods for identifying singers (cantaores) in video recordings and for structural segmentation of audio recordings. Finally, we demonstrate in a data mining study how exploring annotations extracted automatically from a large corpus of recordings helps us discover interesting correlations and gain knowledge about this largely undocumented genre.
Flamenco is a rich performance-oriented art music genre from Southern Spain, which attracts a growing community of aficionados around the globe. The constantly increasing number of digitally available flamenco recordings in music archives, video sharing platforms and online music services calls for the development of genre-specific description and analysis methods, capable of automatically indexing and examining these collections in a content-driven manner. Music Information Retrieval is a multi-disciplinary research area dedicated to the automatic extraction of musical information from audio recordings and scores. Most existing approaches were, however, developed in the context of popular or classical music and often do not generalise well to non-Western music traditions, in particular when the underlying music theoretical assumptions do not hold for these genres. The specific characteristics and concepts of a music tradition can furthermore imply new computational challenges, for which no suitable methods exist. 
This thesis addresses these current shortcomings of Music Information Retrieval by tackling several computational challenges which arise in the context of flamenco music. To this end, a number of contributions to the field are made in the form of novel algorithms, comparative evaluations and data-driven studies, directed at various musical dimensions and encompassing several sub-areas of computer science, computational mathematics, statistics, optimisation and computational musicology. A particularity of flamenco, which immensely shapes the work presented in this thesis, is the absence of written scores. Consequently, computational approaches can rely solely on the direct analysis of raw audio recordings or automatically extracted transcriptions, and this restriction generates a set of new computational challenges. A key aspect of flamenco is the presence of recurring melodic templates, which are subject to heavy variation during performance. From a computational perspective, we identify three tasks related to this characteristic - melody classification, melody retrieval and melodic template extraction - which are addressed in this thesis. We furthermore approach the task of detecting repeated sung phrases in an unsupervised manner and explore the use of deep learning methods for image-based singer identification in flamenco videos and structural segmentation of flamenco recordings. Finally, we demonstrate in a data-driven corpus study how automatic annotations can be mined to discover interesting correlations and gain insights into a largely undocumented genre.