16 research outputs found

    Multimodal music information processing and retrieval: survey and future challenges

    Towards improving performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.
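The information fusion approaches the survey analyzes can operate at the decision level, combining per-modality classifier outputs into a single prediction. A minimal sketch of weighted late fusion, assuming hypothetical per-modality genre probabilities (the modality names and values are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical per-modality genre probabilities for one track,
# e.g. from separate audio, lyrics, and editorial-tag classifiers.
predictions = {
    "audio":  np.array([0.7, 0.2, 0.1]),
    "lyrics": np.array([0.5, 0.4, 0.1]),
    "tags":   np.array([0.6, 0.1, 0.3]),
}

def late_fusion(preds, weights=None):
    """Weighted average of per-modality probabilities (decision-level fusion)."""
    modalities = list(preds)
    if weights is None:  # default: weight all modalities equally
        weights = {m: 1.0 / len(modalities) for m in modalities}
    fused = sum(weights[m] * preds[m] for m in modalities)
    return fused / fused.sum()  # renormalize to a probability distribution

fused = late_fusion(predictions)
```

Feature-level (early) fusion would instead concatenate modality features before a single classifier; the survey's taxonomy covers both families.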

    Multilayer Music Representation and Processing: Key Advances and Emerging Trends

    This work is the introduction to the proceedings of the 1st International Workshop on Multilayer Music Representation and Processing (MMRP19), authored by the Program Co-Chairs. The idea is to explain the rationale behind such a scientific initiative, describe the methodological approach used in paper selection, and provide a short overview of the workshop's accepted works, trying to highlight the thread that runs through the different contributions and approaches.

    A Convolutional Approach to Melody Line Identification in Symbolic Scores

    In many musical traditions, the melody line is of primary significance in a piece. Human listeners can readily distinguish melodies from accompaniment; however, making this distinction given only the written score -- i.e., without listening to the music performed -- can be a difficult task. Solving this task is of great importance for both Music Information Retrieval and musicological applications. In this paper, we propose an automated approach to identifying the most salient melody line in a symbolic score. The backbone of the method consists of a convolutional neural network (CNN) estimating the probability that each note in the score (more precisely: each pixel in a piano-roll encoding of the score) belongs to the melody line. We train and evaluate the method on various datasets, using manual annotations where available and solo instrument parts where not. We also propose a method to inspect the CNN and to analyze the influence exerted by notes on the prediction of other notes; this method can be applied whenever the output of a neural network has the same size as the input.
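The piano-roll encoding and the same-size-output property the abstract relies on can be sketched in a few lines. This is an illustrative numpy toy (the note values, kernel, and single-layer "network" are assumptions for demonstration, not the paper's trained model): a score becomes a pitch-by-time binary matrix, and a 'same'-padded convolution plus sigmoid yields a per-pixel melody probability of the same shape as the input.

```python
import numpy as np

def piano_roll(notes, n_pitches=128, n_frames=None):
    """Encode (pitch, onset, duration) notes as a binary pitch-by-time matrix."""
    if n_frames is None:
        n_frames = max(onset + dur for _, onset, dur in notes)
    roll = np.zeros((n_pitches, n_frames), dtype=np.float32)
    for pitch, onset, dur in notes:
        roll[pitch, onset:onset + dur] = 1.0
    return roll

def conv2d_same(x, kernel):
    """'Same'-padded 2D convolution: output has the same shape as the input."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def melody_probabilities(roll, kernel):
    """One toy convolutional layer + sigmoid: per-pixel melody probability."""
    return 1.0 / (1.0 + np.exp(-conv2d_same(roll, kernel)))

# Toy score: two melody notes over a sustained bass note.
notes = [(60, 0, 4), (64, 4, 4), (48, 0, 8)]
roll = piano_roll(notes, n_pitches=72, n_frames=8)
probs = melody_probabilities(roll, np.ones((3, 3)) / 9.0)
```

Because `probs` has the same shape as `roll`, the abstract's inspection method (analyzing how input notes influence output predictions) applies pixel-for-pixel.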

    Integration of Audio Resources into a Digital Library: The BEIC Case Study

    The focus of this paper is on the integration of audio resources with other content types in digital libraries. As a case study, we present the most recent initiative of the Biblioteca Europea di Informazione e Cultura (BEIC), an Italian institution that pursues educational and instructional goals through the realization and management of a multimedia, free-access, open-shelf library. A new audio section will be added to the already-existing digital archive, allowing users to listen to about 1000 classical recordings in a multi-platform and cross-browser manner. This experience involves a number of heterogeneous fields, ranging from musicology to computer programming, from cataloging to digitization and archiving. In this paper, we apply a bottom-up technique in order to generalize from the specific case study, thus suggesting a methodological approach for similar initiatives.

    Transfer Learning for Improved Audio-Based Human Activity Recognition

    Human activities are accompanied by characteristic sound events, the processing of which might provide valuable information for automated human activity recognition. This paper presents a novel approach addressing the case where one or more human activities are associated with limited audio data, resulting in a potentially highly imbalanced dataset. Data augmentation is based on transfer learning; more specifically, the proposed method: (a) identifies the classes which are statistically close to the ones associated with limited data; (b) learns a multiple-input, multiple-output transformation; and (c) transforms the data of the closest classes so that it can be used for modeling the ones associated with limited data. Furthermore, the proposed framework includes a feature set extracted from signal representations in diverse domains, i.e., temporal, spectral, and wavelet. Extensive experiments demonstrate the relevance of the proposed data augmentation approach under a variety of generative recognition schemes.
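Steps (a)-(c) above can be sketched with a deliberately simple stand-in: distance between class mean feature vectors for "statistically close" in (a), and a least-squares linear map for the learned transformation in (b). The class names, feature dimensions, and synthetic data are all hypothetical; the paper's actual statistical criterion and transformation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical audio feature matrices: rows are samples, columns are features.
classes = {
    "door_slam":  rng.normal(0.0, 1.0, size=(200, 8)),
    "footsteps":  rng.normal(5.0, 1.0, size=(200, 8)),
    "rare_event": rng.normal(0.2, 1.0, size=(12, 8)),  # class with limited data
}

def closest_class(rare, candidates):
    """(a) Pick the class whose mean feature vector is nearest the rare class."""
    mu = rare.mean(axis=0)
    return min(candidates,
               key=lambda k: np.linalg.norm(candidates[k].mean(axis=0) - mu))

def fit_transform(src, dst):
    """(b) Least-squares linear map from the donor class to the rare class."""
    n = min(len(src), len(dst))
    W, *_ = np.linalg.lstsq(src[:n], dst[:n], rcond=None)
    return W

rare = classes.pop("rare_event")
donor = closest_class(rare, classes)
W = fit_transform(classes[donor], rare)
augmented = classes[donor] @ W  # (c) synthetic extra samples for the rare class
```

The donor class's abundant samples, pushed through the learned map, augment the training set for the under-represented class before the recognition model is fit.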