Automatic transcription of polyphonic music exploiting temporal evolution
PhD thesis. Automatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. Proposed systems have been privately as well as
publicly evaluated within the Music Information Retrieval Evaluation eXchange
(MIREX) framework. Proposed systems have been shown to outperform several
state-of-the-art transcription approaches.
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, proposed music transcription models
have also been utilized in a wider context, namely for modeling acoustic scenes.
Improving Automatic Jazz Melody Generation by Transfer Learning Techniques
In this paper, we tackle the problem of transfer learning for automatic Jazz
generation. Jazz is one of the most representative genres of music, but the lack of Jazz
data in the MIDI format hinders the construction of a generative model for
Jazz. Transfer learning is an approach aiming to solve the problem of data
insufficiency, so as to transfer the common feature from one domain to another.
In view of its success in other machine learning problems, we investigate
whether, and how much, it can help improve automatic music generation for
under-resourced musical genres. Specifically, we use a recurrent variational
autoencoder as the generative model, and use a genre-unspecified dataset as the
source dataset and a Jazz-only dataset as the target dataset. Two transfer
learning methods are evaluated using six levels of source-to-target data
ratios. The first method is to train the model on the source dataset, and then
fine-tune the resulting model parameters on the target dataset. The second
method is to train the model on both the source and target datasets at the same
time, but add genre labels to the latent vectors and use a genre classifier to
improve Jazz generation. The evaluation results show that the second method
seems to perform better overall, but it cannot take full advantage of the
genre-unspecified dataset.
Comment: 8 pages. Accepted to APSIPA ASC (Asia-Pacific Signal and Information Processing Association Annual Summit and Conference) 201
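The first transfer method described above (pretrain on the large genre-unspecified source set, then fine-tune the resulting parameters on the small target set) can be illustrated with a minimal sketch. The one-parameter linear model and the synthetic data below are stand-ins for illustration only, not the paper's recurrent variational autoencoder or its MIDI datasets.

```python
# Illustrative two-stage transfer: pretrain on a plentiful "source" set,
# then fine-tune the same parameters on a scarce "target" set.
# The 1-D linear model is a stand-in for the paper's generative model.

def train(w, b, data, lr=0.05, epochs=200):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y
            gw += 2 * err * x / len(data)
            gb += 2 * err / len(data)
        w -= lr * gw
        b -= lr * gb
    return w, b

# Source: plentiful genre-unspecified data following y = 2x + 1.
source = [(x / 10, 2 * (x / 10) + 1.0) for x in range(-10, 11)]
# Target: a handful of samples from a nearby distribution, y = 2x + 1.5.
target = [(0.0, 1.5), (0.5, 2.5), (1.0, 3.5)]

w, b = train(0.0, 0.0, source)          # stage 1: pretrain on source
w, b = train(w, b, target, epochs=400)  # stage 2: fine-tune on target
print(round(w, 2), round(b, 2))
```

The pretraining stage gives the fine-tuning stage a useful starting point, so the few target samples only need to shift the model slightly rather than fit it from scratch.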
Toward Interactive Music Generation: A Position Paper
Music generation using deep learning has received considerable attention in recent years. Researchers have developed various generative models capable of imitating musical conventions, comprehending musical corpora, and generating new samples based on the learning outcome. Although the samples generated by these models are persuasive, they often lack musical structure and creativity. For instance, a vanilla end-to-end approach, which deals with all levels of music representation at once, does not offer human-level control and interaction during the learning process, leading to constrained results. Indeed, music creation is a recurrent process in which a musician follows certain principles, reusing or adapting various musical features. Moreover, a musical piece adheres to a musical style, breaking down into precise concepts of timbre style, performance style, and composition style, and the coherency between these aspects. Here, we study and analyze the current advances in music generation using deep learning models through different criteria. We discuss the shortcomings and limitations of these models regarding interactivity and adaptability. Finally, we outline potential future research directions, addressing multi-agent systems and reinforcement learning algorithms to alleviate these shortcomings and limitations.
Computer Music Algorithms. Bio-inspired and Artificial Intelligence Applications
2014 - 2015. Music is one of the arts that have benefited most from the invention of computers. Originally, the term Computer Music was used in the scientific community to identify the application of information technology to music composition. Over time it began to include the theory and application of new or existing technologies in music, such as sound synthesis, sound design, acoustics, and psychoacoustics. Thanks to its interdisciplinary nature, Computer Music can be seen as the encounter of different disciplines. In recent years technology has redefined the way individuals work, communicate, share experiences, constructively debate, and actively participate in every aspect of daily life, ranging from business to education, from the political and intellectual to the social, and also in musical activities such as playing and composing music. In this new context, Computer Music has become an emerging research area for the application of Computational Intelligence techniques, such as machine learning, pattern recognition, and bio-inspired algorithms. My research activity concerns bio-inspired and Artificial Intelligence applications in Computer Music. Some of the problems I addressed are summarized in the following.
Automatic composition of background music for games, films and other human activities: EvoBackMusic.
Systems for real-time composition of background music respond to changes in the environment by generating music that matches the current state of the environment and/or of the user. We propose one such system, called EvoBackMusic. It is a multi-agent system that exploits a feed-forward neural network and a multi-objective genetic algorithm to produce background music. The neural network is trained to learn the preferences of the user, and these preferences are exploited by the genetic algorithm to compose the music. The composition process takes into account a set of controllers that describe several aspects of the environment, such as the dynamism of both the user and the context, other physical characteristics, and the emotional state of the user. Previous systems mainly focus on the emotional aspect.
Publications: • Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino: “An Evolutionary Composer for Real-Time Background Music”. EvoMUSART 2016: 135-151.
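The network-plus-genetic-algorithm loop described above can be sketched in miniature. In this toy version a hand-written scoring function stands in for the trained preference network, and the track encoding (a short list of MIDI pitches), note ranges, and GA parameters are invented for illustration only.

```python
import random

random.seed(0)

# Minimal sketch of an EvoBackMusic-style loop: a genetic algorithm
# evolves candidate background tracks, scored by a preference model.
# Here the model is a hypothetical stand-in function; in the actual
# system it is a feed-forward network trained on user feedback.

def preference_model(track):
    # Stand-in for the trained network: prefers mid-register notes.
    return -sum(abs(n - 60) for n in track)

def evolve(pop_size=30, length=8, generations=100):
    pop = [[random.randint(40, 80) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=preference_model, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fittest half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)   # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:           # point mutation
                child[random.randrange(length)] = random.randint(40, 80)
            children.append(child)
        pop = parents + children
    return max(pop, key=preference_model)

best = evolve()
print(best)
```

Swapping the stand-in scorer for a model trained on user feedback, and extending the genome to cover instrumentation and dynamics, would bring the sketch closer to the described system.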
Interaction modalities for music performances: MarcoSmiles.
In this field we considered new interaction modalities for music performances that use the hands without the support of a real musical instrument. By exploiting natural user interfaces (NUIs), initially conceived for the game market, it is possible to enhance the traditional modalities of interaction with technology, to build new forms of interaction by transporting users into a virtual dimension that fully reflects reality, and, finally, to improve the overall perceived experience. The increasing popularity of these innovative interfaces has led to their adoption in other fields, including Computer Music. We propose a system, named MarcoSmiles, specifically designed to allow individuals to perform music in an easy, innovative, and personalized way. We exploited Artificial Neural Networks to customize the virtual musical instrument, to provide the information for mapping hand configurations into musical notes, and, finally, to train and test these configurations. We performed several tests to study the behavior of the system and its efficacy in terms of learning capabilities.
Publications: • Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino: “Natural Users Interfaces to support and enhance Real-Time Music Performance”. AVI 2016.
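The core mapping from hand configurations to notes can be sketched as follows. The 3-D feature vectors, pose labels, and note assignments below are invented examples, and a nearest-centroid classifier stands in for the system's trained neural network.

```python
# Minimal sketch of mapping hand configurations to notes, in the
# spirit of MarcoSmiles. A hand pose is reduced to a feature vector
# (invented 3-D features here); a nearest-centroid lookup stands in
# for the trained Artificial Neural Network.

# Hypothetical calibration data: feature vector -> MIDI note.
training = {
    (0.9, 0.1, 0.1): 60,   # e.g. closed fist        -> C4
    (0.1, 0.9, 0.1): 64,   # e.g. open palm          -> E4
    (0.1, 0.1, 0.9): 67,   # e.g. two fingers raised -> G4
}

def classify(pose):
    """Return the MIDI note of the nearest stored hand configuration."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = min(training, key=lambda ref: dist(pose, ref))
    return training[nearest]

print(classify((0.8, 0.2, 0.0)))   # a noisy "fist" pose -> 60
```

Because the calibration table is per-user, the same gesture vocabulary can be remapped to different notes, which is the personalization the system aims for.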
3
Bio-inspired approach for automatic music composition
Here we describe a new bio-inspired approach to automatic music composition in a specific style: the Music Splicing System. Splicing systems were introduced by Tom Head (1987) as a formal model of the recombination process between DNA molecules. The existing literature on splicing systems mainly focuses on the computational power of these systems and on the properties of the generated languages; very few applications based on splicing systems have been introduced. We show a novel application of splicing systems to build an automatic music composer. A performance study showed that our composer outperforms other meta-heuristics by producing better music according to a specific measure of quality evaluation, which indicates that the proposed system can also be seen as a valid new bio-inspired strategy for automatic music composition.
Publications: • Clelia De Felice, Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino, Rosalba Zizza: “Splicing Music Composition”. Information Sciences Journal, 385: 196-215 (2017). • Clelia De Felice, Roberto De Prisco, Delfina Malandrino, Gianluca Zaccagnino, Rocco Zaccagnino, Rosalba Zizza: “Chorale Music Splicing System: An Algorithmic Music Composer Inspired by Molecular Splicing”. EvoMusart 2015: 50-61.
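Head's splicing operation itself is simple enough to show directly. A rule (u1, u2; u3, u4) cuts one string between u1 and u2, cuts another between u3 and u4, and recombines the left part of the first with the right part of the second. The note alphabet and the rule below are invented toy examples, not the actual rules of the Music Splicing System.

```python
# Toy illustration of Head's splicing operation on note strings.
# Rule (u1, u2; u3, u4): from x = x1 u1 u2 x2 and y = y1 u3 u4 y2,
# produce x1 u1 u4 y2.

def splice(x, y, rule):
    u1, u2, u3, u4 = rule
    i = x.find(u1 + u2)          # cut site in x
    j = y.find(u3 + u4)          # cut site in y
    if i < 0 or j < 0:
        return None              # rule does not apply
    return x[: i + len(u1)] + y[j + len(u3):]

x = "CDEFG"      # two "melodies" over a note-name alphabet
y = "GABCD"
rule = ("DE", "FG", "GA", "BCD")
print(splice(x, y, rule))        # -> "CDEBCD"
```

A composer built on this idea would iterate such recombinations over a population of melodic fragments, with rules chosen so that the spliced results respect the target style.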
Music and Visualization
Here we describe new approaches for learning the harmonic and melodic rules of classical music by using visualization techniques: VisualMelody and VisualHarmony. Experienced musicians have the ability to understand the structural elements of music compositions. Such an ability is built over time through the study of music theory, the understanding of the rules that guide the composition of music, and countless hours of practice. The learning process is hard, especially for classical music, where the rigidity of the musical structures and styles requires great effort to understand, assimilate, and then master the learned notions. In particular, we focused our attention on a specific type of music composition, namely, music in chorale style (4-voice music). Composing this type of music is often perceived as a difficult task because of the rules the composer has to adhere to. We propose a visualization technique that can help people lacking a strong knowledge of music theory. The technique exploits graphic elements to draw attention to possible errors in the composition. We then developed two interactive systems, named VisualMelody and VisualHarmony, that employ the proposed visualization techniques to facilitate the understanding of the structure of music compositions. The aim is to allow people to compose 4-voice music in a quick and effective way, i.e., avoiding errors, as dictated by classical music theory rules.
Publications: • Roberto De Prisco, Delfina Malandrino, Donato Pirozzi, Gianluca Zaccagnino, Rocco Zaccagnino: “Understanding the structure of music compositions: is visualization an effective approach?” Information Visualization Journal, 2016. DOI: 10.1177/1473871616655468 • Delfina Malandrino, Donato Pirozzi, Gianluca Zaccagnino, Rocco Zaccagnino: “A Color-Based Visualization Approach to Understand Harmonic Structures of Musical Compositions”. IV 2015: 56-61. • Delfina Malandrino, Donato Pirozzi, Gianluca Zaccagnino, Rocco Zaccagnino: “Visual Approaches for Harmonic Analysis of 4-part Music: Implementation and Evaluation”. Major revision - Journal of Visual Languages and Computing, 2016.
Deep Learning for Continuous Symbolic Melody Generation Conditioned on Lyrics and Initial Melodies
Symbolic music generation is an interdisciplinary research area combining machine learning and music theory. This project focuses on the intersection of two problems within music generation, namely generating continuous music following a given seed (introduction), and rhythmically matching given lyrics. It enables artists to use AI as a creative aid, obtaining a complete song having only written the lyrics and an initial melody. We propose a method for targeted training of a recursive Generative Adversarial Network (GAN) for initial-melody-conditioned generation, and explore the possibilities of using other state-of-the-art deep learning generation techniques, such as Denoising Diffusion Probabilistic Models (DDPMs), long short-term memory networks (LSTMs), and the attention mechanism.
Zero-Shot Blind Audio Bandwidth Extension
Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to non-blind
filter-informed methods when tested with synthetic data. Moreover, BABE
exhibits robust generalization capabilities when enhancing real historical
recordings, effectively reconstructing the missing high-frequency content while
maintaining coherence with the original recording. Subjective preference tests
confirm that BABE significantly improves the audio quality of historical music
recordings. Examples of historical recordings restored with the proposed method
are available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing
Knowledge transfer using latent variable models
In several applications, scarcity of labeled data is a challenging problem that hinders the predictive capabilities of machine learning algorithms. Additionally, the distribution of the data changes over time, rendering models trained with older data less capable of discovering useful structure in newly available data. Transfer learning is a convenient framework to overcome such problems, where the learning of a model specific to one domain can benefit the learning of models in other domains, through either simultaneous training of domains or sequential transfer of knowledge from one domain to the others. This thesis explores the opportunities of knowledge transfer in the context of a few applications pertaining to object recognition from images, text analysis, network modeling, and recommender systems, using probabilistic latent variable models as building blocks. Both simultaneous and sequential knowledge transfer are achieved through the latent variables, either by sharing these across multiple related domains (for simultaneous learning) or by adapting their distributions to fit data from a new domain (for sequential learning).
Electrical and Computer Engineering