
    Music information retrieval: conceptual framework, annotation and user behaviour

    Understanding music is a process both based on and influenced by the knowledge and experience of the listener. Although content-based music retrieval has received increasing attention in recent years, much of the research still focuses on bottom-up retrieval techniques. To make a music information retrieval system appealing and useful to the user, more effort should be spent on constructing systems that both operate directly on the encoding of the physical energy of music and are flexible with respect to users’ experiences. This thesis is based on a user-centred approach, taking into account the mutual relationship between music as an acoustic phenomenon and as an expressive phenomenon. The issues it addresses are: the lack of a conceptual framework, the shortage of annotated musical audio databases, the lack of understanding of the behaviour of system users, and the shortage of user-dependent knowledge with respect to high-level features of music. In the theoretical part of this thesis, a conceptual framework for content-based music information retrieval is defined. The proposed conceptual framework - the first of its kind - is conceived as a coordinating structure between the automatic description of low-level music content and the description of high-level content by the system users. A general framework for the manual annotation of musical audio is outlined as well. A new methodology for the manual annotation of musical audio is introduced and tested in case studies. The results from these studies show that manually annotated music files can be of great help in the development of accurate analysis tools for music information retrieval. Empirical investigation is the foundation on which the aforementioned theoretical framework is built. Two elaborate studies involving different experimental issues are presented. In the first study, elements of signification related to spontaneous user behaviour are clarified. In the second study, a global profile of music information retrieval system users is given and their description of high-level content is discussed. This study uncovered relationships between the users’ demographic background and their perception of expressive and structural features of music. Such a multi-level approach is exceptional as it included a large sample of the population of real users of interactive music systems. Tests have shown that the findings of this study are representative of the targeted population. Finally, the multi-purpose material provided by the theoretical background and the results from the empirical investigations are put into practice in three music information retrieval applications: a prototype of a user interface based on a taxonomy, an annotated database of experimental findings and a prototype semantic user recommender system. Results are presented and discussed for all methods used. They show that, if reliably generated, knowledge about users can significantly improve the quality of music content analysis. This thesis demonstrates that an informed knowledge of human approaches to music information retrieval provides valuable insights, which may be of particular assistance in the development of user-friendly, content-based access to digital music collections.

    FROM MUSIC INFORMATION RETRIEVAL (MIR) TO INFORMATION RETRIEVAL FOR MUSIC (IRM)

    This thesis reviews and discusses certain techniques from the domain of (Music) Information Retrieval, in particular some general data mining algorithms, and describes their specific adaptations for use as building blocks in the CACE4 software application. The use of Augmented Transition Networks (ATN) from the field of (Music) Information Retrieval is, to a certain extent, adequate as long as one keeps the underlying tonal constraints and rules as a guide to understanding the structure one is looking for. However, since a large proportion of algorithmic music, including music composed by the author, is atonal, tonal constraints and rules are of little use. Analysis methods from Hierarchical Clustering Techniques (HCT) such as k-means and Expectation-Maximisation (EM) facilitate other approaches and are better suited to finding (clustered) structures in large data sets. ART2 (Adaptive Resonance Theory) neural networks, for example, can be used for analysing and categorising these data sets. Statistical tools such as histogram analysis and mean, variance and correlation calculations can provide information about connections between members of a data set. Altogether this provides a diverse palette of usable data analysis methods and strategies for creating algorithmic atonal music. Acting as (software) strategy tools, their use is determined by the quality of their output within a musical context, as demonstrated when developed and programmed into the Computer Assisted Composition Environment CACE4. Music Information Retrieval techniques are therefore inverted: their specific techniques and associated methods of Information Retrieval and general data mining are used to access the organisation and constraints of abstract (non-specifically musical) data in order to use and transform it in a musical composition.
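
    To make the clustering idea concrete, here is a minimal, self-contained sketch of how k-means (one of the HCT methods named above) might be used to find clustered structure in an abstract data set of musical events. The two-dimensional pitch/duration features and all values are illustrative assumptions, not data from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic groups of musical events, described by a hypothetical
# (MIDI pitch, duration in seconds) feature pair.
data = np.vstack([
    rng.normal(loc=(60, 0.25), scale=(2.0, 0.05), size=(50, 2)),
    rng.normal(loc=(72, 0.50), scale=(2.0, 0.05), size=(50, 2)),
    rng.normal(loc=(48, 1.00), scale=(2.0, 0.05), size=(50, 2)),
])

# k-means recovers the clustered structure without any tonal rules or constraints.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
for k in range(3):
    size = int((kmeans.labels_ == k).sum())
    centre = kmeans.cluster_centers_[k].round(2)
    print(f"cluster {k}: {size} events, centroid {centre}")
```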

    MusCaps: generating captions for music audio

    Content-based music information retrieval has seen rapid progress with the adoption of deep learning. Current approaches to high-level music description typically make use of classification models, such as in auto-tagging or genre and mood classification. In this work, we propose to address music description via audio captioning, defined as the task of generating a natural language description of music audio content in a human-like manner. To this end, we present the first music audio captioning model, MusCaps, consisting of an encoder-decoder with temporal attention. Our method combines convolutional and recurrent neural network architectures to jointly process audio-text inputs through a multimodal encoder, and leverages pre-training on audio data to obtain representations that effectively capture and summarise musical features in the input. Evaluation of the generated captions through automatic metrics shows that our method outperforms a baseline designed for non-music audio captioning. Through an ablation study, we show that this performance boost can be mainly attributed to pre-training of the audio encoder, while other design choices (modality fusion, decoding strategy and the use of attention) contribute only marginally. Our model represents a shift away from classification-based music description and combines tasks requiring both auditory and linguistic understanding to bridge the semantic gap in music information retrieval.
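
    As an illustration of the kind of architecture the abstract describes, the following is a schematic, runnable sketch of a convolutional encoder feeding a recurrent decoder with additive temporal attention. All dimensions, layer choices and names are assumptions made for illustration, not the published MusCaps implementation.

```python
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Convolutional encoder: mel-spectrogram -> sequence of frame features."""
    def __init__(self, n_mels=64, d_model=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, mel):                    # mel: (batch, n_mels, time)
        return self.conv(mel).transpose(1, 2)  # (batch, time, d_model)

class AttnDecoder(nn.Module):
    """Recurrent decoder with additive temporal attention over audio frames."""
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attn = nn.Linear(2 * d_model, 1)   # scores one (frame, state) pair
        self.rnn = nn.GRUCell(2 * d_model, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, enc, tokens):             # enc: (B, T, D); tokens: (B, L)
        B, T, D = enc.shape
        h = enc.mean(dim=1)                      # init hidden from mean audio feature
        logits = []
        for i in range(tokens.shape[1]):
            w = self.embed(tokens[:, i])
            # attention over the T audio frames, conditioned on the hidden state
            scores = self.attn(torch.cat([enc, h.unsqueeze(1).expand(B, T, D)], dim=-1))
            context = (scores.softmax(dim=1) * enc).sum(dim=1)
            h = self.rnn(torch.cat([w, context], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)        # (B, L, vocab)

encoder, decoder = AudioEncoder(), AttnDecoder(vocab_size=1000)
mel = torch.randn(2, 64, 128)                    # dummy batch of mel-spectrograms
tokens = torch.randint(0, 1000, (2, 12))         # dummy caption token ids
print(decoder(encoder(mel), tokens).shape)       # torch.Size([2, 12, 1000])
```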

    Reconsidering memorisation in the context of non-tonal piano music

    Performers, pedagogues and researchers have shared an interest in the topic of musical memorisation for centuries. A large and diverse body of studies on this subject has contributed to the current understanding of musicians’ views of performing from memory, as well as the mechanisms governing encoding and retrieval of musical information. Nevertheless, with a few exceptions, existing research is still largely based on tonal music and lacks further examination in the musical world of non-tonality. The convention of performing from memory is a well-established practice for particular instruments and musical genres, but an exception is often made for recent styles of repertoire moving away from tonality. No study to date has systematically investigated the reasons for such an exception and musicians’ views on this matter. Moreover, the existing principles of memorisation that are thought to apply to musicians at the highest levels of skill are strongly based on the use of conceptual knowledge of the tonal musical vernacular. Such knowledge is often obscured or absent in non-tonal repertoire. This thesis aims to extend the findings of previous research into musical memorisation in the context of non-tonal piano repertoire by documenting pianists’ views and practices in committing this music to memory. An interview study with pianists expert in contemporary music (Chapter 3) establishes the background for the thesis. A variety of views on performing contemporary music from memory were reported, with several pianists advocating benefits of performing this repertoire by heart and others of using the score. Memorisation accounts revealed idiosyncrasy and variety, but emphasised the importance of specific strategies, such as the use of mental rehearsal, principles of chunking applicable to this repertoire, and the importance of different types of memory and their combination. The second study (Chapter 4) explores the topic in further depth by thoroughly examining the author’s entire process of learning and memorising a newly commissioned non-tonal piece for prepared piano. This study extends findings from performance cue (PC) theory. This widely recognised account of expert memory in music suggests that musicians develop retrieval schemes hierarchically organised around their understanding of musical structure, using different types of PCs. The use of retrieval schemes in this context is confirmed by this study. The author organised the scheme around her own understanding of musical structure, which was gradually developed while working through the piece, since the music had no aural model available or ready-made structural framework to hold on to early in the process. Extending previous research, new types of PCs were documented and, for the first time, negative serial position effects were found for basic PCs (e.g., fingering, notes, patterns) in long-term recall. Finally, the study provided behavioural evidence for the use of chunking in non-tonal piano music. The third study (Chapters 5 and 6) extends these findings to a serial piece memorised by six pianists. Following a multiple-case study approach, this study observed in great depth the memorisation approaches of two of those pianists, who performed the music very accurately from memory, and of one pianist who performed less accurately. The first two pianists developed retrieval schemes based on their understanding of musical structure and different types of PCs, mainly basic and structural. Comparisons between the pianists revealed very different views of musical structure in the piece; even so, both musicians used such understanding to organise encoding and retrieval. The pianist with the least accurate performance adopted an unsystematic approach, mainly relying on incidental memorisation. The absence of a conceptual retrieval scheme resulted in an inability to fully recover from a major memory lapse in performance. The findings of this research provide novel insights into pianists’ views towards performing non-tonal music from memory and into the cognitive mechanisms governing the encoding and retrieval of this music, which have practical applications for musicians wishing to memorise non-tonal piano music.

    CaRo 2.0: an interactive system for expressive music rendering

    In several application contexts in the multimedia field (educational applications, extreme gaming), interaction with the user requires that the system be able to render music expressively. Expressiveness is the added value of a performance and is part of the reason that music is interesting to listen to. Understanding and modeling the communication of expressive content is important for many engineering applications in information technology (e.g., Music Information Retrieval, as well as several applications in the affective computing field). In this paper, we present an original approach to modifying the expressive content of a performance in a gradual way, applying smooth morphing among performances with different expressive content in order to adapt the audio's expressive character to the user's desires. The system won the final stage of Rencon 2011, the performance RENdering CONtest: a research project that organizes contests for computer systems generating expressive musical performances.
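
    The morphing idea can be sketched very simply: interpolate the expressive parameters of two renderings of the same phrase under a single morphing weight. The parameter set (inter-onset intervals and key velocities) and the linear scheme below are illustrative assumptions; the actual CaRo 2.0 model is more elaborate.

```python
import numpy as np

def morph(perf_a, perf_b, alpha):
    """Blend two performances note-by-note; alpha=0 -> A, alpha=1 -> B."""
    return {
        # inter-onset intervals control local tempo; velocities control dynamics
        "ioi": (1 - alpha) * perf_a["ioi"] + alpha * perf_b["ioi"],
        "velocity": (1 - alpha) * perf_a["velocity"] + alpha * perf_b["velocity"],
    }

# Two hypothetical renderings of an eight-note phrase:
# "bright" (fast, loud) versus "dark" (slow, soft).
bright = {"ioi": np.full(8, 0.20), "velocity": np.full(8, 100.0)}
dark   = {"ioi": np.full(8, 0.35), "velocity": np.full(8, 55.0)}

# Sweeping alpha renders a gradual change of expressive character.
for alpha in (0.0, 0.5, 1.0):
    m = morph(bright, dark, alpha)
    print(f"alpha={alpha}: ioi={m['ioi'][0]:.2f}s, velocity={m['velocity'][0]:.0f}")
```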

    Tracking beats and microtiming in Afro-Latin American music using conditional random fields and deep learning

    Paper presented at ISMIR 2019: 20th Conference of the International Society for Music Information Retrieval, Delft, Netherlands, 4-8 November 2019. Events in music frequently exhibit small-scale temporal deviations (microtiming) with respect to the underlying regular metrical grid. In some cases, as in music from the Afro-Latin American tradition, such deviations appear systematically, disclosing their structural importance in rhythmic and stylistic configuration. In this work we explore the idea of automatically and jointly tracking beats and microtiming in timekeeper instruments of Afro-Latin American music, in particular Brazilian samba and Uruguayan candombe. To that end, we propose a language model based on conditional random fields that integrates beat and onset likelihoods as observations. We derive those activations using deep neural networks and evaluate the model's performance on manually annotated data using a scheme adapted to this task. We assess our approach in controlled conditions suitable for these timekeeper instruments, and study the microtiming profiles’ dependency on genre and performer, illustrating promising aspects of this technique towards a more comprehensive understanding of these music traditions.
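
    The decoding step can be illustrated with a toy example: a linear-chain Viterbi search (standing in here for inference in the paper's conditional random field) that combines per-frame beat activations, of the kind a deep network would produce, with transition scores encoding the metrical grid. All activations and scores below are synthetic assumptions.

```python
import numpy as np

def viterbi(obs_log, trans_log):
    """obs_log: (T, S) per-frame state log-scores; trans_log: (S, S)."""
    T, S = obs_log.shape
    delta = obs_log[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + trans_log      # (prev state, next state)
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + obs_log[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):              # backtrack the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Two states: 0 = "no beat", 1 = "beat". Synthetic activations peak every
# 4 frames, mimicking the beat likelihood a deep network would output.
T = 16
act = np.full(T, 0.1)
act[::4] = 0.9
obs_log = np.log(np.stack([1 - act, act], axis=1))
# Transitions discourage adjacent beat frames, favouring a sparse beat grid.
trans_log = np.log(np.array([[0.8, 0.2],
                             [0.9, 0.1]]))
print(viterbi(obs_log, trans_log))  # beat states on roughly every 4th frame
```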

    MARBLE: Music Audio Representation Benchmark for Universal Evaluation

    In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue, we introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-sourced pre-trained models developed on music recordings as baselines. In addition, MARBLE offers an easy-to-use, extendable, and reproducible suite for the community, with a clear statement on dataset copyright issues. Results suggest recently proposed large-scale pre-trained musical language models perform best on most tasks, with room for further improvement. The leaderboard and toolkit repository are published at this https URL to promote future music AI research.
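
    The evaluation protocol the abstract describes, freezing a pre-trained representation and training a lightweight probe per downstream task, can be sketched as follows. The random feature extractor and synthetic genre labels are placeholders (so the probe's accuracy is near chance); MARBLE itself plugs in real open-sourced pre-trained models and datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
projection = rng.normal(size=(1024, 512))  # stands in for frozen encoder weights

def frozen_embed(audio_batch):
    """Placeholder for a frozen pre-trained encoder producing 512-d embeddings."""
    return audio_batch @ projection

# Synthetic "audio" clips and 4-class genre labels for one downstream task.
X_audio = rng.normal(size=(200, 1024))
y = rng.integers(0, 4, size=200)

# Extract embeddings once, then fit a lightweight linear probe on top.
emb = frozen_embed(X_audio)
train, test = slice(0, 150), slice(150, 200)
probe = LogisticRegression(max_iter=1000).fit(emb[train], y[train])
print("probe accuracy:", accuracy_score(y[test], probe.predict(emb[test])))
```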