
    Automatic transcription of polyphonic music exploiting temporal evolution

    PhD thesis. Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered solved, the creation of an automated system able to transcribe polyphonic music without restrictions on the degree of polyphony or the instrument type remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features that utilise temporal characteristics. Techniques for note onset and offset detection are also employed to improve transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modelling the temporal evolution of notes in a multiple-instrument setting and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. The proposed systems have been evaluated privately as well as publicly within the Music Information Retrieval Evaluation eXchange (MIREX) framework, and have been shown to outperform several state-of-the-art transcription approaches. The developed techniques have also been employed for other tasks related to music technology, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, the proposed music transcription models have also been utilised in a wider context, namely for modelling acoustic scenes.
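As a rough illustration of the kind of spectrogram factorisation that SI-PLCA builds on, the sketch below implements plain (non-shift-invariant) PLCA with EM updates in NumPy. The variable names, component count, and iteration budget are illustrative; the thesis's actual models additionally handle shift invariance, temporal constraints, and multiple instruments.

```python
import numpy as np

def plca(V, n_components=8, n_iter=100, rng=None):
    """Minimal 2-D PLCA: factorise a magnitude spectrogram V[f, t] into
    spectral templates P(f|z), activations P(t|z) and priors P(z) via EM.
    Illustrative sketch only, not the thesis's SI-PLCA model."""
    rng = np.random.default_rng(rng)
    F, T = V.shape
    V = V / V.sum()                                   # treat V as a joint distribution P(f, t)
    Pf_z = rng.random((F, n_components)); Pf_z /= Pf_z.sum(axis=0)
    Pt_z = rng.random((T, n_components)); Pt_z /= Pt_z.sum(axis=0)
    Pz = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior P(z|f,t), shape (F, T, Z)
        joint = Pf_z[:, None, :] * Pt_z[None, :, :] * Pz[None, None, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: reweight the posterior by the observed distribution
        weighted = V[:, :, None] * post
        Pf_z = weighted.sum(axis=1); Pf_z /= Pf_z.sum(axis=0) + 1e-12
        Pt_z = weighted.sum(axis=0); Pt_z /= Pt_z.sum(axis=0) + 1e-12
        Pz = weighted.sum(axis=(0, 1)); Pz /= Pz.sum()
    return Pf_z, Pt_z, Pz
```

Each column of `Pf_z` is a spectral template and the matching column of `Pt_z` its activation over time; thresholding the activations of pitched templates is a common route from such a factorisation to note events.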

    Improving Automatic Jazz Melody Generation by Transfer Learning Techniques

    In this paper, we tackle the problem of transfer learning for automatic Jazz generation. Jazz is one of the most representative genres of music, but the scarcity of Jazz data in MIDI format hinders the construction of a generative model for Jazz. Transfer learning is an approach that aims to solve the problem of data insufficiency by transferring features common to one domain to another. In view of its success in other machine learning problems, we investigate whether, and how much, it can help improve automatic music generation for under-resourced musical genres. Specifically, we use a recurrent variational autoencoder as the generative model, a genre-unspecified dataset as the source dataset, and a Jazz-only dataset as the target dataset. Two transfer learning methods are evaluated using six levels of source-to-target data ratios. The first method trains the model on the source dataset and then fine-tunes the resulting model parameters on the target dataset. The second method trains the model on both the source and target datasets at the same time, but adds genre labels to the latent vectors and uses a genre classifier to improve Jazz generation. The evaluation results show that the second method seems to perform better overall, but it cannot take full advantage of the genre-unspecified dataset. Comment: 8 pages, accepted to APSIPA ASC (Asia-Pacific Signal and Information Processing Association Annual Summit and Conference) 201
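A minimal sketch of the first method (pretrain on the source corpus, then fine-tune on the target corpus) is shown below, assuming PyTorch. The model is a toy stand-in and the tensors are random placeholders rather than the paper's recurrent VAE and MIDI datasets; only the two-stage training schedule is being illustrated.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, epochs, lr, device="cpu"):
    """Generic supervised training loop (stands in for the RVAE objective)."""
    model.to(device).train()
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model

# Hypothetical stand-in for a melody model: predicts the next pitch class
# from a feature vector; the paper's recurrent VAE would replace this.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 12))

# Method 1, stage 1: pretrain on the large genre-unspecified corpus (placeholder data) ...
source_loader = DataLoader(TensorDataset(torch.randn(512, 128),
                                         torch.randint(0, 12, (512,))), batch_size=32)
train(model, source_loader, epochs=5, lr=1e-3)

# ... stage 2: fine-tune the same parameters on the small Jazz-only corpus,
# typically with a smaller learning rate.
target_loader = DataLoader(TensorDataset(torch.randn(64, 128),
                                         torch.randint(0, 12, (64,))), batch_size=16)
train(model, target_loader, epochs=10, lr=1e-4)
```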

    Toward Interactive Music Generation: A Position Paper

    Music generation using deep learning has received considerable attention in recent years. Researchers have developed various generative models capable of imitating musical conventions, comprehending musical corpora, and generating new samples based on the learning outcome. Although the samples generated by these models are persuasive, they often lack musical structure and creativity. For instance, a vanilla end-to-end approach, which deals with all levels of music representation at once, does not offer human-level control and interaction during the learning process, leading to constrained results. Indeed, music creation is a recurrent process in which a musician follows certain principles, reusing or adapting various musical features. Moreover, a musical piece adheres to a musical style, which breaks down into the precise concepts of timbre style, performance style, and composition style, and the coherency between these aspects. Here, we study and analyze current advances in music generation using deep learning models against different criteria. We discuss the shortcomings and limitations of these models regarding interactivity and adaptability. Finally, we outline potential future research directions, addressing multi-agent systems and reinforcement learning algorithms, to alleviate these shortcomings and limitations.

    Computer Music Algorithms. Bio-inspired and Artificial Intelligence Applications

    2014 - 2015. Music is one of the arts that have most benefited from the invention of computers. Originally, the term Computer Music was used in the scientific community to identify the application of information technology to music composition. Over time it grew to include the theory and application of new or existing technologies in music, such as sound synthesis, sound design, acoustics, and psychoacoustics. Thanks to its interdisciplinary nature, Computer Music can be seen as the encounter of different disciplines. In recent years technology has redefined the way individuals work, communicate, share experiences, constructively debate, and actively participate in every aspect of daily life, ranging from business to education, from the political and intellectual to the social, and also in musical activity, such as playing and composing music. In this new context, Computer Music has become an emerging research area for the application of Computational Intelligence techniques, such as machine learning, pattern recognition, and bio-inspired algorithms. My research activity is concerned with bio-inspired and Artificial Intelligence applications in Computer Music. Some of the problems I addressed are summarized in the following.

Automatic composition of background music for games, films and other human activities: EvoBackMusic. Systems for real-time composition of background music respond to changes in the environment by generating music that matches the current state of the environment and/or of the user. We propose one such system, called EvoBackMusic. It is a multi-agent system that exploits a feed-forward neural network and a multi-objective genetic algorithm to produce background music. The neural network is trained to learn the preferences of the user, and such preferences are exploited by the genetic algorithm to compose the music. The composition process takes into account a set of controllers that describe several aspects of the environment, like the dynamism of both the user and the context, other physical characteristics, and the emotional state of the user. Previous systems mainly focus on the emotional aspect. A toy version of the evolutionary loop is sketched after the publication below. Publications:
• Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino: "An Evolutionary Composer for Real-Time Background Music". EvoMUSART 2016: 135-151.
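A minimal sketch of such an evolutionary composition loop follows, assuming two hypothetical objectives derived from environment descriptors (a target pitch register and a target note density) in place of EvoBackMusic's actual controllers and learned user preferences:

```python
import random

# Hypothetical environment descriptors (stand-ins for EvoBackMusic's controllers).
TARGET_REGISTER = 64      # desired average MIDI pitch for the current mood
TARGET_DENSITY = 0.5      # desired fraction of non-rest steps (dynamism)

def random_phrase(length=16):
    # A phrase is a list of MIDI pitches, with None meaning a rest.
    return [random.choice([None] + list(range(48, 84))) for _ in range(length)]

def objectives(phrase):
    # Two objectives to minimise: distance from target register and density.
    notes = [p for p in phrase if p is not None]
    register_err = abs((sum(notes) / len(notes) if notes else 0) - TARGET_REGISTER)
    density_err = abs(len(notes) / len(phrase) - TARGET_DENSITY)
    return register_err, density_err

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def crossover(p1, p2):
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(phrase, rate=0.1):
    return [random.choice([None] + list(range(48, 84))) if random.random() < rate else p
            for p in phrase]

def evolve(pop_size=40, generations=100):
    pop = [random_phrase() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(objectives(p), p) for p in pop]
        # Pareto-style selection: keep phrases not dominated by any other.
        front = [p for s, p in scored
                 if not any(dominates(s2, s) for s2, _ in scored)]
        pop = front + [mutate(crossover(random.choice(front), random.choice(front)))
                       for _ in range(pop_size - len(front))]
    return min(pop, key=lambda p: sum(objectives(p)))

print(evolve())
```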
Interaction modalities for music performances: MarcoSmiles. In this field we considered new interaction modalities for music performances that use the hands without the support of a real musical instrument. Exploiting natural user interfaces (NUIs), initially conceived for the game market, it is possible to enhance the traditional modalities of interaction when accessing technology, to build new forms of interaction by transporting users into a virtual dimension that fully reflects reality, and, finally, to improve the overall perceived experience. The increasing popularity of these innovative interfaces has led to their adoption in other fields, including Computer Music. We propose a system, named MarcoSmiles, specifically designed to allow individuals to perform music in an easy, innovative, and personalized way. We exploited Artificial Neural Networks to customize the virtual musical instrument, to provide the information for mapping hand configurations onto musical notes, and, finally, to train and test these configurations. We performed several tests to study the behavior of the system and its efficacy in terms of learning capabilities. Publications:
• Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino: "Natural Users Interfaces to support and enhance Real-Time Music Performance". AVI 2016.

Bio-inspired approach for automatic music composition. Here we describe a new bio-inspired approach to automatic music composition in a specific style: the Music Splicing System. Splicing systems were introduced by Tom Head (1987) as a formal model of a recombination process between DNA molecules. The existing literature on splicing systems mainly focuses on the computational power of these systems and on the properties of the generated languages; very few applications based on splicing systems have been introduced. We show a novel application of splicing systems to build an automatic music composer (a toy version of the splicing operation is sketched at the end of this entry). A performance study showed that our composer outperforms other meta-heuristics by producing better music according to a specific measure of quality evaluation, which also establishes the proposed system as a new, valid bio-inspired strategy for automatic music composition. Publications:
• Clelia De Felice, Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino, Rosalba Zizza: "Splicing Music Composition". Information Sciences Journal, 385: 196-215 (2017).
• Clelia De Felice, Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino, Rosalba Zizza: "Chorale Music Splicing System: An Algorithmic Music Composer Inspired by Molecular Splicing". EvoMusart 2015: 50-61.

Music and Visualization. Here we describe new approaches to learning the harmonic and melodic rules of classical music by means of visualization techniques: VisualMelody and VisualHarmony. Experienced musicians have the ability to understand the structural elements of music compositions. Such an ability is built over time through the study of music theory, the understanding of the rules that guide the composition of music, and countless hours of practice. The learning process is hard, especially for classical music, where the rigidity of the musical structures and styles requires great effort to understand, assimilate, and then master the learned notions. In particular, we focused our attention on a specific type of music composition, namely music in chorale style (4-voice music). Composing this type of music is often perceived as a difficult task because of the rules the composer has to adhere to. We propose a visualization technique that can help people lacking a strong knowledge of music theory. The technique exploits graphic elements to draw attention to possible errors in the composition. We then developed two interactive systems, named VisualMelody and VisualHarmony, that employ the proposed visualization techniques to facilitate the understanding of the structure of music compositions. The aim is to allow people to compose 4-voice music in a quick and effective way, i.e., avoiding errors, as dictated by classical music theory rules.
Publications:
• Roberto De Prisco, Delfina Malandrino, Donato Pirozzi, Gianluca Zaccagnino, Rocco Zaccagnino: "Understanding the structure of music compositions: is visualization an effective approach?" Information Visualization Journal, 2016. DOI: 10.1177/1473871616655468.
• Delfina Malandrino, Donato Pirozzi, Gianluca Zaccagnino, Rocco Zaccagnino: "A Color-Based Visualization Approach to Understand Harmonic Structures of Musical Compositions". IV 2015: 56-61.
• Delfina Malandrino, Donato Pirozzi, Gianluca Zaccagnino, Rocco Zaccagnino: "Visual Approaches for Harmonic Analysis of 4-part Music: Implementation and Evaluation". Major revision, Journal of Visual Languages and Computing, 2016.
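The sketch below is the toy splicing operation referenced above: two note sequences that share a common "site" are cut after the site and their suffixes exchanged, mirroring the DNA-recombination metaphor. It is purely illustrative; the actual chorale composer uses carefully designed rules and a quality-evaluation measure not shown here.

```python
def splice(seq1, seq2, site):
    """Toy splicing: if both sequences contain `site`, cut each one right
    after its first occurrence and exchange the suffixes, producing two
    recombined offspring."""
    def cut(seq):
        for i in range(len(seq) - len(site) + 1):
            if seq[i:i + len(site)] == site:
                j = i + len(site)
                return seq[:j], seq[j:]
        return None
    c1, c2 = cut(seq1), cut(seq2)
    if c1 is None or c2 is None:
        return None                       # the splicing rule does not apply
    return c1[0] + c2[1], c2[0] + c1[1]

# Two melodic fragments (MIDI pitches) sharing the site (67, 65).
a = [72, 71, 69, 67, 65, 64, 62, 60]
b = [60, 62, 64, 67, 65, 67, 69, 71]
print(splice(a, b, [67, 65]))
```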

    Deep Learning for Continuous Symbolic Melody Generation Conditioned on Lyrics and Initial Melodies

    Symbolic music generation is an interdisciplinary research area combining machine learning and music theory. This project focuses on the intersection of two problems within music generation: generating a continuation of music following a given seed (introduction), and rhythmically matching given lyrics. It enables artists to use AI as a creative aid, obtaining a complete song having only written the lyrics and an initial melody. We propose a method for targeted training of a recursive Generative Adversarial Network (GAN) for initial-melody-conditioned generation, and explore the possibilities of using other state-of-the-art deep learning generation techniques, such as Denoising Diffusion Probabilistic Models (DDPMs), Long Short-Term Memory networks (LSTMs), and the attention mechanism.
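A minimal sketch of seed-conditioned melody continuation with an adversarial objective, assuming PyTorch, an LSTM encoder-decoder generator, an LSTM discriminator over one-hot pitch sequences, and placeholder data; the project's actual recursive GAN, lyrics conditioning, and training schedule are not reproduced here.

```python
import torch
from torch import nn

SEED_LEN, CONT_LEN, PITCHES = 16, 16, 128   # illustrative sizes

class Generator(nn.Module):
    """Encodes the seed melody with an LSTM and emits a continuation."""
    def __init__(self, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(PITCHES, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, PITCHES)

    def forward(self, seed):
        _, state = self.encoder(self.embed(seed))          # summarise the seed
        zeros = torch.zeros(seed.size(0), CONT_LEN, self.embed.embedding_dim)
        h, _ = self.decoder(zeros, state)                  # unroll the continuation
        return self.out(h).softmax(-1)                     # soft pitch distribution per step

class Discriminator(nn.Module):
    """Scores (seed, continuation) pairs for coherence."""
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(PITCHES, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, seed_onehot, continuation):
        x = torch.cat([seed_onehot, continuation], dim=1)
        h, _ = self.rnn(x)
        return self.out(h[:, -1])                          # one realism logit per pair

# One adversarial loss computation on placeholder data (real melodies would come
# from MIDI); optimiser steps and the full training loop are omitted.
G, D = Generator(), Discriminator()
seed = torch.randint(0, PITCHES, (8, SEED_LEN))
seed_oh = nn.functional.one_hot(seed, PITCHES).float()
real = nn.functional.one_hot(torch.randint(0, PITCHES, (8, CONT_LEN)), PITCHES).float()
bce = nn.BCEWithLogitsLoss()
d_loss = bce(D(seed_oh, real), torch.ones(8, 1)) + bce(D(seed_oh, G(seed)), torch.zeros(8, 1))
g_loss = bce(D(seed_oh, G(seed)), torch.ones(8, 1))
```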

    Zero-Shot Blind Audio Bandwidth Extension

    Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During inference, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to non-blind, filter-informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/ Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing.
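A simplified illustration of the blind ingredient, assuming a one-parameter differentiable lowpass (a soft spectral mask with a learnable cutoff `cutoff_hz`), a placeholder denoised estimate, and plain gradient descent; BABE's actual degradation parametrization, diffusion sampler, and optimization procedure differ.

```python
import torch

def lowpass(x, cutoff_hz, sr=22050, sharpness=0.05):
    """Differentiable one-parameter lowpass: a soft spectral mask whose
    transition frequency is the learnable `cutoff_hz` (illustrative operator)."""
    X = torch.fft.rfft(x)
    freqs = torch.fft.rfftfreq(x.shape[-1], d=1.0 / sr)
    mask = torch.sigmoid((cutoff_hz - freqs) * sharpness)
    return torch.fft.irfft(X * mask, n=x.shape[-1])

# Placeholder signals: x0_hat stands in for the denoised estimate produced by the
# diffusion model at the current step; y is its bandlimited observation.
sr, n = 22050, 4096
x0_hat = torch.randn(n)
y = lowpass(x0_hat, torch.tensor(3000.0), sr)

# Blind step: infer the unknown cutoff by matching the filtered estimate to y.
cutoff = torch.tensor(8000.0, requires_grad=True)
opt = torch.optim.Adam([cutoff], lr=100.0)
for _ in range(200):
    opt.zero_grad()
    loss = torch.mean((lowpass(x0_hat, cutoff, sr) - y) ** 2)
    loss.backward()
    opt.step()

# In full posterior sampling, the gradient of this same mismatch with respect to
# the diffusion sample would also be added to the update as a data-consistency term.
```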