139 research outputs found

    From heuristics-based to data-driven audio melody extraction

    The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.

    Self-Supervised Representation Learning for Vocal Music Context

    In music and speech, meaning is derived at multiple levels of context. Affect, for example, can be inferred both from a short sound token and from sonic patterns over a longer temporal window such as an entire recording. In this paper we focus on inferring meaning from this dichotomy of contexts. We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks. We propose three self-supervised deep learning paradigms which leverage pseudotask learning of these two levels of context to produce latent representation spaces. We evaluate the usefulness of these representations by embedding unseen vocal contours into each space and conducting downstream classification tasks. Our results show that contextual representation can enhance downstream classification by as much as 15% compared to using traditional statistical contour features.

    Computational analysis of world music corpora

    PhD thesis. The comparison of world music cultures has been considered in musicological research since the end of the 19th century. Traditional methods from the field of comparative musicology typically involve the process of manual music annotation. While this provides expert knowledge, the manual input is time-consuming and limits the potential for large-scale research. This thesis considers computational methods for the analysis and comparison of world music cultures. In particular, Music Information Retrieval (MIR) tools are developed for processing sound recordings, and data mining methods are considered to study similarity relationships in world music corpora. MIR tools have been widely used for the study of (mainly) Western music. The first part of this thesis focuses on assessing the suitability of audio descriptors for the study of similarity in world music corpora. An evaluation strategy is designed to capture challenges in the automatic processing of world music recordings and different state-of-the-art descriptors are assessed. Following this evaluation, three approaches to audio feature extraction are considered, each addressing a different research question. First, a study of singing style similarity is presented. Singing is one of the most common forms of musical expression and it has played an important role in the oral transmission of world music. Hand-designed pitch descriptors are used to model aspects of the singing voice and clustering methods reveal singing style similarities in world music. Second, a study on music dissimilarity is performed. While musical exchange is evident in the history of world music, it is possible that some music cultures have resisted external musical influence. Low-level audio features are combined with machine learning methods to find music examples that stand out in a world music corpus, and geographical patterns are examined.
The last study models music similarity using descriptors learned automatically with deep neural networks. It focuses on identifying music examples that appear to be similar in their audio content but share no (obvious) geographical or cultural links in their metadata. Unexpected similarities modelled in this way uncover possible hidden links between world music cultures. This research investigates whether automatic computational analysis can uncover meaningful similarities between recordings of world music. Applications derive musicological insights from one of the largest world music corpora studied so far. Computational analysis as proposed in this thesis advances the state of the art in the study of world music and expands the knowledge and understanding of musical exchange in the world. Funding: Queen Mary Principal's research studentship.

    Data-driven, memory-based computational models of human segmentation of musical melody

    When listening to a piece of music, listeners often identify distinct sections or segments within the piece. Music segmentation is recognised as an important process in the abstraction of musical contents and researchers have attempted to explain how listeners perceive and identify the boundaries of these segments. The present study seeks the development of a system that is capable of performing melodic segmentation in an unsupervised way, by learning from non-annotated musical data. Probabilistic learning methods have been widely used to acquire regularities in large sets of data, with many successful applications in language and speech processing. Some of these applications have found their counterparts in music research and have been used for music prediction and generation, music retrieval or music analysis, but seldom to model perceptual and cognitive aspects of music listening. We present some preliminary experiments on melodic segmentation, which highlight the importance of memory and the role of learning in music listening. These experiments have motivated the development of a computational model for melodic segmentation based on a probabilistic learning paradigm. The model uses a Mixed-memory Markov Model to estimate sequence probabilities from pitch and time-based parametric descriptions of melodic data. We follow the assumption that listeners' perception of feature salience in melodies is strongly related to expectation. Moreover, we conjecture that outstanding entropy variations of certain melodic features coincide with segmentation boundaries as indicated by listeners. Model segmentation predictions are compared with results of a listening study on melodic segmentation carried out with real listeners.
Overall results show that changes in prediction entropy along the pieces exhibit significant correspondence with the listeners' segmentation boundaries. Although the model relies only on information-theoretic principles to make predictions on the location of segmentation boundaries, it was found that most predicted segments can be matched with boundaries of groupings usually attributed to Gestalt rules. These results question previous research supporting a separation between learning-based and innate bottom-up processes of melodic grouping, and suggest that some of these latter processes can emerge from acquired regularities in melodic data.
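
The entropy-based idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's Mixed-memory Markov Model: as a loudly simplified assumption it swaps in a plain first-order (bigram) model with add-alpha smoothing over a single symbolic feature, and proposes a boundary wherever the prediction entropy jumps by more than a hypothetical `threshold`. All function names and parameters here are illustrative only.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count symbol transitions (prev -> next) over a corpus of sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def prediction_entropy(counts, context, alphabet, alpha=1.0):
    """Shannon entropy (bits) of the smoothed next-symbol distribution.

    Add-alpha smoothing is an assumption of this sketch; an unseen
    context therefore yields the uniform distribution over `alphabet`.
    """
    ctx = counts.get(context, {})
    total = sum(ctx.values()) + alpha * len(alphabet)
    entropy = 0.0
    for symbol in alphabet:
        p = (ctx.get(symbol, 0) + alpha) / total
        entropy -= p * math.log2(p)
    return entropy


def entropy_profile(counts, seq, alphabet):
    """Prediction entropy after each symbol except the last."""
    return [prediction_entropy(counts, s, alphabet) for s in seq[:-1]]


def boundary_candidates(profile, threshold):
    """Positions where entropy rises by more than `threshold` bits."""
    return [i for i in range(1, len(profile))
            if profile[i] - profile[i - 1] > threshold]
```

In use, one would train on a corpus of melodies encoded as symbol sequences (for example pitch intervals), compute `entropy_profile` for a new melody, and compare the output of `boundary_candidates` against listener annotations. The choice of feature, model order and threshold would all need tuning.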

    Music as complex emergent behaviour : an approach to interactive music systems

    Access to the full-text thesis is no longer available at the author's request, due to third-party copyright restrictions. Access removed on 28.11.2016 by CS (TIS). Metadata merged with duplicate record (http://hdl.handle.net/10026.1/770) on 20.12.2016 by CS (TIS). This is a digitised version of a thesis that was deposited in the University Library. This thesis suggests a new model of human-machine interaction in the domain of non-idiomatic musical improvisation. Musical results are viewed as emergent phenomena issuing from complex internal systems behaviour in relation to input from a single human performer. We investigate the prospect of rewarding interaction whereby a system modifies itself in coherent though non-trivial ways as a result of exposure to a human interactor. In addition, we explore whether such interactions can be sustained over extended time spans. These objectives translate into four criteria for evaluation: maximisation of human influence; blending of human and machine influence in the creation of machine responses; the maintenance of independent machine motivations in order to support machine autonomy; and finally, a combination of global emergent behaviour and variable behaviour in the long run. Our implementation is heavily inspired by ideas and engineering approaches from the discipline of Artificial Life. However, we also address a collection of representative existing systems from the field of interactive composing, some of which are implemented using techniques of conventional Artificial Intelligence. All systems serve as a contextual background and comparative framework helping the assessment of the work reported here. This thesis advocates a networked model incorporating functionality for listening, playing and the synthesis of machine motivations.
The latter incorporate dynamic relationships instructing the machine to either integrate with a musical context suggested by the human performer or, in contrast, perform as an individual musical character irrespective of context. Techniques of evolutionary computing are used to optimise system components over time. Evolution proceeds based on an implicit fitness measure: the melodic distance between consecutive musical statements made by human and machine in relation to the currently prevailing machine motivation. A substantial number of systematic experiments reveal complex emergent behaviour inside and between the various system modules. Music scores document how global systems behaviour is rendered into actual musical output. The concluding chapter offers evidence of how the research criteria were accomplished and proposes recommendations for future research.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread to other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

    Contributions on education (EUDIA-8)

    136 p. To begin speaking scientifically about the classification of speech varieties, it is essential to mention how classifications are made in the sciences. In fact, when speaking of scientific methods, we must say before anything else that there is no single method: that is, the scientific method is not a rigid path. As in other fields, when we deal with scientific methods in the human sciences or, more precisely, in linguistics, we are speaking of objective methods. Taxonomy is concerned with distinguishing the objects it studies, finding a structure based on the relationships among them, and placing the objects within that structure, taking into account similarity, equality or proximity. Along this path, the set of procedures that has characterised science, in our case empirical science (there are also logical-deductive and inductive sciences), is the following: systematic observation, measurement, formulation and analysis; that is, since language cannot be taken in its entirety, a sample, a representative sample, must be chosen.